JP2003348598A

JP2003348598A - Method and apparatus for memory efficient compressed domain video processing and for fast inverse motion compensation using factorization and integer approximation

Info

Publication number: JP2003348598A
Application number: JP2003107352A
Authority: JP
Inventors: William Chen; チェンウィリアム; Vasudev Bhaskaran; バスカランヴァスデヴ
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2002-04-12
Filing date: 2003-04-11
Publication date: 2003-12-05
Also published as: CN1225904C; CN1452396A

Abstract

PROBLEM TO BE SOLVED: To provide a method and apparatus for memory efficient compressed domain video processing and for fast inverse motion compensation using factorization and integer approximation.

SOLUTION: A method for reducing the memory required to decode a bit stream comprises: receiving a video bit stream; decoding a frame of the bit stream into a discrete cosine transform (DCT) domain representation; identifying non-zero coefficients of the DCT domain representation; assembling a hybrid data structure; and inserting the non-zero coefficients of the DCT domain representation into the hybrid data structure. A method for performing inverse motion compensation is also provided. The method begins by receiving a video bit stream, after which a transform matrix type is identified. The transform matrix type is either a half pixel matrix or a full pixel matrix. If the transform matrix type is a half pixel matrix, the method includes applying a factorization technique to decode the bit stream corresponding to the half pixel matrix. If the transform matrix type is a full pixel matrix, the method includes applying an integer approximation technique to decode the bit stream corresponding to the full pixel matrix.

COPYRIGHT: (C)2004,JPO

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to digital video technology, and more particularly to a method and apparatus for implementing efficient memory compression, and to a method and apparatus for implementing efficient inverse motion compensation for a compressed domain video decoder.

[0002]

2. Prior Art

Video access on mobile terminals such as cellular telephones and personal digital assistants faces a number of difficult challenges because of the inherent limitations of mobile systems. For example, low power handheld devices are constrained by bandwidth, power, memory, and cost requirements. Video data received by such a handheld device is decoded by a video decoder. The video decoders associated with such terminals perform motion compensation in the spatial, that is, decompressed, domain. Video compression standards such as H.263, H.261, and MPEG-1/2/4 use a motion compensated discrete cosine transform (DCT) scheme to encode video at low bit rates. As used here, a low bit rate is a bit rate of less than about 64 kilobits per second. The DCT scheme uses motion estimation (ME) and motion compensation (MC) to remove temporal redundancy, and uses the DCT to remove the remaining spatial redundancy.

[0003] FIG. 1 is a schematic diagram of a video decoder that decodes video data and performs motion compensation in the spatial domain. Bit stream 102 is received by decoder 100. Decoder 100 includes a variable length decoder (VLD) stage 104, a run-length decoder (RLD) stage 106, a dequantization (DQ) stage 108, an inverse discrete cosine transform (IDCT) stage 110, a motion compensation (MC) stage 112, and a memory (MEM) 114, also called a frame buffer. The first four stages (VLD 104, RLD 106, DQ 108, and IDCT 110) decode the compressed bit stream back into the pixel domain. For an intracoded block, the output of the first four stages 104, 106, 108, and 110 is used directly to reconstruct the block in the current frame. For an intercoded block, the output represents the prediction error and is added to the prediction formed from the previous frame to reconstruct the block in the current frame. The current frame is thus reconstructed block by block. Finally, the current frame is sent to the output of the decoder, that is, display 116, and is also stored in the frame buffer (MEM) 114.

[0004] MEM 114 holds the previously decoded picture required by motion compensation 112. The size of MEM 114 must scale with the incoming picture format. For example, H.263 supports five standardized picture formats: (1) sub-quarter common intermediate format (sub-QCIF), (2) quarter common intermediate format (QCIF), (3) common intermediate format (CIF), (4) 4CIF, and (5) 16CIF. Each format defines the width and height of the picture as well as its aspect ratio. As is well known, a picture is coded as one luminance component and two color difference components (Y, Cr, Cb). The components are sampled in a 4:2:0 configuration, and each component has a resolution of 8 bits per pixel. For example, the video decoder of FIG. 1 must allocate about 200 kilobytes of memory for MEM 114 while decoding an H.263 bit stream in CIF format. Furthermore, when multiple bit streams are decoded at once, as video conferencing systems require, the memory demand becomes excessive.

[0005] MEM 114 is the single largest consumer of memory in video decoder 100. One approach to reducing memory usage is to reduce the resolution of the color components of the incoming bit stream. For example, if the color display of a mobile terminal can show only 65,536 colors, the resolution of the color components (Y, Cr, Cb) can be reduced from 24 bits to 16 bits per pixel. Although this technique can potentially reduce memory usage by 30%, it is a display dependent solution that must be implemented in the circuitry of the video decoder. The technique is also inflexible, because it cannot easily be scaled to meet varying peak signal-to-noise ratio (PSNR) requirements.

[0006] Operating on data in the spatial domain requires more memory than compressed domain processing. In the spatial domain, it is easy to compute motion compensation and apply it to the images of successive frames. When operating in the compressed domain, however, motion compensation is less straightforward than simply having a motion vector point back to the previous frame, because the error values are no longer spatial values; that is, the error values in the compressed domain are not pixel values. Moreover, no existing method is capable of processing compressed domain data efficiently. Prior art approaches have focused mainly on transcoding, scaling, and sharpening applications in the compressed domain. In addition, inverse compensation applications for the compressed domain tend to have poor peak signal-to-noise ratio (PSNR) performance, and their response times, in terms of the number of frames that can be displayed per second, are unacceptably slow.

[0007]

[Patent Document 1] US Reissued Patent No. 6,240,210

[Patent Document 2] US Patent No. 6,157,740

[0008]

Problem to Be Solved by the Invention

Accordingly, there is a need to solve the problems of the prior art by providing a method and apparatus that minimize the memory required to decode low bit rate video data, and a method and apparatus that enable fast and efficient inverse motion compensation for a compressed domain video decoder.

[0009]

Means for Solving the Problem

Broadly speaking, the present invention addresses at least one aspect of these needs by providing a video decoder configured to minimize memory requirements by employing a hybrid data structure. This aspect of the invention can be implemented in numerous ways, including as a method, a system, a computer readable medium, or a device. Several embodiments of this aspect of the invention are described below.

[0010] In one embodiment, a method for reducing the memory required to decode a bit stream is provided. The method begins by receiving a video bit stream. A frame of the bit stream is then decoded into a transform (e.g., discrete cosine transform (DCT)) domain representation. Next, the non-zero coefficients of the transform domain representation are identified. A hybrid data structure is then assembled. The hybrid data structure includes a fixed size array and a variable size overflow vector. The non-zero coefficients of the transform domain representation are then inserted into the hybrid data structure.
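The hybrid data structure of this embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `HybridBlockStore`, the default capacity of 8 entries per block, and the storage of each coefficient as a (position, value) pair are all assumptions made for illustration only.

```python
class HybridBlockStore:
    """Illustrative store for non-zero DCT coefficients of a frame.

    Hypothetical layout: each block owns a fixed number of slots in a
    fixed size array; coefficients that do not fit go to one shared,
    variable size overflow vector.
    """

    def __init__(self, num_blocks, fixed_capacity=8):
        self.fixed_capacity = fixed_capacity          # assumed slot count per block
        self.fixed = [[] for _ in range(num_blocks)]  # fixed size array (per block)
        self.overflow = []                            # (block, position, value) entries

    def insert_block(self, block_index, coeffs):
        """Insert the non-zero entries of one 8x8 block (64 coefficients)."""
        for position, value in enumerate(coeffs):
            if value == 0:
                continue  # zero coefficients are never stored
            slots = self.fixed[block_index]
            if len(slots) < self.fixed_capacity:
                slots.append((position, value))
            else:
                self.overflow.append((block_index, position, value))

    def read_block(self, block_index):
        """Rebuild the dense 64-coefficient block from both parts."""
        dense = [0] * 64
        for position, value in self.fixed[block_index]:
            dense[position] = value
        for block, position, value in self.overflow:
            if block == block_index:
                dense[position] = value
        return dense
```

A block with few non-zero coefficients fits entirely in its fixed slots, so the overflow vector grows only for the occasional dense block.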

[0011] In another embodiment, a method for decoding video data is provided. The method begins by receiving a frame of video data in a compressed bit stream. A block of the frame is then decoded into a transform (e.g., DCT) domain representation in the compressed domain. Next, a hybrid data structure is defined. Data associated with the transform domain representation is then stored in the hybrid data structure. Inverse motion compensation is then performed in the compressed domain on the data associated with the transform domain representation. After inverse motion compensation has been performed, the data is decompressed for display.

[0012] In yet another embodiment, a computer readable medium is provided having program instructions for rearranging low rate bit stream data for storage in a hybrid data structure. The computer readable medium includes program instructions for identifying non-zero transform (e.g., DCT) coefficients associated with a coded block of a frame of data. Program instructions for arranging the non-zero transform coefficients in a fixed size array are included. Program instructions for determining whether the number of non-zero transform coefficients exceeds the capacity of the fixed size array are provided. Also included are program instructions for storing the non-zero transform coefficients that exceed the capacity of the fixed size array in a variable size overflow vector, and program instructions for translating the non-zero transform coefficients from the compressed domain to the spatial domain.

[0013] In still another embodiment, a circuit is provided. The circuit includes a video decoder integrated circuit chip. The video decoder integrated circuit chip includes circuitry for receiving a bit stream of data associated with a frame of video data. The video decoder includes circuitry for decoding the bit stream of data into a transform (e.g., DCT) domain representation. Circuitry is provided for arranging the non-zero transform coefficients of the transform domain representation in a hybrid data structure in a memory associated with the video decoder. Circuitry for decompressing the non-zero transform coefficients of the transform domain representation for display is also provided.

[0014] In another embodiment, a device configured to display an image is provided. The device includes a central processing unit (CPU), a random access memory (RAM), and a display screen configured to display the image. Decoder circuitry configured to convert a video bit stream into a transform (e.g., DCT) domain representation is included. The decoder circuitry is capable of arranging the non-zero transform coefficients of the transform domain representation in a hybrid data structure in a memory associated with the decoder circuitry. The decoder circuitry includes circuitry for selectively applying a hybrid factorization/integer approximation technique during inverse motion compensation. The device also includes a bus in communication with the CPU, the RAM, the display screen, and the decoder circuitry.

[0015] Broadly speaking, the present invention addresses at least another aspect of these needs by providing a video decoder that reduces memory requirements while delivering acceptable video quality, and that is capable of performing inverse motion compensation in the compressed domain. This aspect of the invention can be implemented in numerous ways, including as a method, a system, a computer readable medium, or a device. Several embodiments of this aspect of the invention are described below.

[0016] In one embodiment, a method for performing inverse motion compensation is provided. The method begins by receiving a video bit stream. A transform matrix type is then identified. The transform matrix type is either a half pixel matrix or a full pixel matrix. If the transform matrix type is a half pixel matrix, the method includes applying a factorization technique to decode the bit stream corresponding to the half pixel matrix. If the transform matrix type is a full pixel matrix, the method includes applying an integer approximation technique to decode the bit stream corresponding to the full pixel matrix.
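The selection step in this embodiment can be sketched as a small dispatch function. The sketch assumes, as an illustration only, that the matrix type is inferred from the motion vector: a vector with a half pixel component implies a half pixel matrix. The function name `select_technique`, the returned labels, and the use of integer half pixel units are hypothetical, and the two techniques themselves are not implemented here.

```python
def select_technique(mv_x_half, mv_y_half):
    """Pick the inverse motion compensation technique for one block.

    Arguments are motion vector components in half pixel units
    (an assumption), so <7.5, 4.5> is passed as (15, 9).
    """
    # An odd component in half pixel units means a fractional (x.5) offset.
    is_half_pixel = (mv_x_half & 1) == 1 or (mv_y_half & 1) == 1
    if is_half_pixel:
        return "factorization"          # half pixel matrix
    return "integer_approximation"      # full pixel matrix
```

Keeping the decision in one place lets the decoder apply either technique per block without changing the rest of the pipeline.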

[0017] In another embodiment, a method for decoding video data is provided. The method begins by receiving a frame of video data in a compressed bit stream. A block of the frame is then decoded into a transform (e.g., discrete cosine transform (DCT)) domain representation in the compressed domain. Data associated with the transform domain representation is then stored in a hybrid data structure. Inverse motion compensation is then performed in the compressed domain on the data associated with the transform domain representation. Performing the inverse motion compensation includes determining the type of transform matrix associated with a portion of the frame of video data, and applying a hybrid factorization and integer approximation technique to improve the inverse motion compensation.

[0018] In yet another embodiment, a computer readable medium having program instructions for performing inverse motion compensation in the compressed domain is provided. The computer readable medium includes program instructions for identifying a transform matrix. Program instructions for determining whether the transform matrix is a half pixel matrix or a full pixel matrix are included. Also included are program instructions for applying a factorization technique to decode a block of the bit stream corresponding to a half pixel matrix, and program instructions for applying an integer approximation technique to decode a block of the bit stream corresponding to a full pixel matrix.

[0019] In still another embodiment, a circuit is provided. The circuit includes an integrated circuit chip configured to decode video data. The integrated circuit chip includes circuitry for receiving a bit stream of data associated with a frame of video data. The integrated circuit chip includes circuitry for decoding the bit stream of data into a transform (e.g., DCT) domain representation. The integrated circuit chip also includes circuitry for identifying the type of a transform matrix and circuitry for performing inverse motion compensation by a hybrid factorization and integer approximation technique.

[0020] In another embodiment, a video decoder is provided. The video decoder includes a variable length decoder (VLD) configured to extract coefficient values and motion vector data from an incoming bit stream. A dequantization block in communication with the VLD is included. The dequantization block is configured to rescale the coefficient values. A downstream branch in communication with the dequantization block is provided. The downstream branch is configured to decode error coefficients into the spatial domain. An upstream branch in communication with the dequantization block is included. The upstream branch is configured to maintain an internal transform (e.g., DCT) domain representation. The upstream branch is further configured to generate a spatial domain output that can be added to the decoded error coefficients to reconstruct the current block.

[0021] Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate the principles of the invention by way of example.

[0022]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is described as a system, apparatus, and method for minimizing the memory required for compressed domain video decoding. It will be apparent to those skilled in the art, however, in view of the following description, that the invention may be practiced without some or all of the details described below. In other instances, well known process operations are not described in detail in order not to obscure the invention unnecessarily. FIG. 1 was described in the "Prior Art" section. As used herein, the term "about" refers to +/-10% of the referenced value.

[0023] The embodiments described herein provide a data structure that reduces the memory used when decoding video data in the compressed domain. In one embodiment, the video decoding pipeline is rearranged so that the current frame is stored, and inverse motion compensation is performed, in the frequency domain, that is, the compressed domain. The hybrid data structure allows data to be manipulated in the compressed domain without significant computational cost or loss of data. In one embodiment, the hybrid data structure exploits the fact that a coded block contains only a few non-zero discrete cosine transform (DCT) coefficients. Because only the non-zero DCT coefficients of the entire frame are stored, the memory requirement can be reduced. As described in more detail below, the hybrid data structure includes a fixed size array and a variable size overflow vector. The variable size overflow vector stores the non-zero DCT coefficients of a coded block that exceed the capacity of the fixed size array.

[0024] FIG. 2 is a schematic diagram of a video decoder arranged to perform inverse motion compensation in accordance with one embodiment of the invention. Here, bit stream 122 is received by video decoder 120. The first two stages, variable length decoder (VLD) stage 124 and dequantization (DQ) stage 126, decode the compressed bit stream into a DCT domain representation. The DCT domain representation is stored in memory (MEM) 130, also called a frame buffer, for use by motion compensation (MC) stage 128. Run-length decoder (RLD) stage 132 and inverse DCT (IDCT) stage 134 are executed after the motion compensation feedback loop that includes MC 128 and MEM 130. The internal representation of a decoded block therefore remains in the compressed domain. Because a coded block contains only a small number of non-zero DCT coefficients, this property can be exploited by developing a data structure for MEM 130 that stores only the non-zero DCT coefficients of each block in the frame. As shown in more detail below, the memory compression enabled by the hybrid data structure can reduce memory usage by 50% without loss of video quality. Because the human visual system is more sensitive to low order DCT coefficients than to high order ones, a thresholding scheme was developed that, as described below, filters out the high order DCT coefficients and trades off memory usage against varying power or peak signal-to-noise ratio requirements.

[0025] Fully compressed domain video decoding that is optimized for fast, memory efficient decoding is now described. In one embodiment, the TELENOR video decoder, a public domain H.263 compliant decoder, was used for the tests referred to herein. Although some of the embodiments described below refer to an H.263 bit stream, the embodiments are not limited to operating on H.263 bit streams. That is, any DCT based compressed bit stream carrying video data, such as Moving Picture Experts Group (MPEG) 1/2/4 or H.261, may be used. There are a number of fast inverse motion compensation algorithms for the discrete cosine transform (DCT) domain representation that enable efficient processing in the compressed domain. Because processing takes place in the compressed domain, a memory compression method that stores the non-zero DCT coefficients of a coded block can reduce the memory requirement. In addition, to demonstrate the various performance tradeoffs in optimizing for speed and memory, the performance of a video decoder that employs the inverse motion compensation techniques and the compressed domain processing with memory compression described herein is evaluated along three dimensions: computational complexity, memory efficiency, and PSNR.
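Of the three evaluation dimensions named above, PSNR is the easiest to illustrate. The sketch below uses the standard definition, 10 * log10(peak^2 / MSE), with a peak of 255 for 8-bit samples. The function name `psnr` and the flat-list input format are assumptions for illustration; the text does not describe how the tests computed PSNR.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between reference and decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)
```

Comparing the output of a spatial domain decoder with that of a compressed domain decoder in this way gives a single quality figure per frame.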

[0026]

(Equation 1)

[0027]

(Equation 2)

[0028]

(Equation 3)

[0029]

(Equation 4)

(Equation 5)

[0030] Low bit rate video, that is, video data with a bit rate of less than about 64 kilobits per second, is used in video conferencing applications and is intended for applications such as wireless video on cellular telephones, personal digital assistants (PDAs), and other handheld or battery powered devices. The H.263 standard is a schematic standard that specifies the bit stream syntax and algorithms for low bit rate video decoding. The algorithms include transform coding, motion estimation/compensation, coefficient quantization, and run-length coding. Apart from the baseline specification, version 2 of the standard also supports 16 negotiable options that improve coding performance and provide error resilience.

[0031] Video coded at low bit rates can exhibit visible distortion. This is especially true for video classified as high action, that is, blocks with active motion. As noted above, the embodiments described herein refer to the H.263 standard, but any suitable video codec standard may be used with the embodiments. For reference, some of the features of the H.263 standard are described below; this is not intended to limit the invention to use with the H.263 standard. One feature of the H.263 standard is the absence of a group of pictures (GOP) layer and higher layers. When a baseline coded sequence consists only of a single intra frame (I frame) followed by a long sequence of inter frames (P frames), the long P frame sequence achieves a higher compression ratio because temporal redundancy is removed between consecutive frames. However, motion estimation/motion compensation (ME/MC) introduces temporal dependencies, so errors introduced during the lossy coding process accumulate during decoding. Without enough I frames, the decoder cannot break this error accumulation. The H.263 standard therefore has a forced update mechanism, under which the encoder must code each macroblock as an intra block at least once in every 132 times it is coded. FIG. 4 illustrates the effect of the forced update mechanism. As shown in FIG. 4, the PSNR of the video fluctuates randomly, but does not drift in any direction for frames later in the sequence.

【００３２】 [0032] FIG. 5 is a schematic diagram illustrating how half-pixel values are determined in the H.263 standard. As is well known, the H.263 standard uses half-pixel interpolation for motion compensation. In the standard, half-pixel interpolation is indicated by a motion vector with 0.5 resolution (for example, <7.5, 4.5>). The encoder can specify interpolation in the horizontal direction only, in the vertical direction only, or in both directions. As shown in FIG. 5, half-pixel values are found by bilinear interpolation of the integer pixel positions surrounding the half-pixel position. Pixel positions A 150-1, B 150-2, C 150-3, and D 150-4 represent integer pixel positions, whereas pixel positions e 152-1, f 152-2, and g 152-3 represent half-pixel positions. Horizontal interpolation can be expressed as e = (A + B + 1) >> 1, vertical interpolation as f = (A + C + 1) >> 1, and interpolation in both directions as g = (A + B + C + D + 2) >> 2.
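The three interpolation formulas above can be checked with a short Python sketch. The function name `half_pixel` and the sample pixel values are illustrative and do not come from the specification; the placement of A, B, C, and D (A and B on one row, C and D on the row below) is inferred from the formulas, not taken from FIG. 5.

```python
def half_pixel(A, B, C, D):
    """Return (e, f, g) for four neighboring integer pixel values.

    Assumed layout (inferred from the formulas): A and B lie on one row,
    C and D on the row below, so e sits between A and B, f between A and C,
    and g in the middle of all four.
    """
    e = (A + B + 1) >> 1          # horizontal half-pixel, rounded
    f = (A + C + 1) >> 1          # vertical half-pixel, rounded
    g = (A + B + C + D + 2) >> 2  # half-pixel in both directions, rounded
    return e, f, g


print(half_pixel(10, 13, 20, 22))  # (12, 15, 16)
```

The added constants (+1 before a shift by 1, +2 before a shift by 2) round to the nearest integer instead of truncating.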

【００３３】 [0033] FIGS. 6A and 6B are schematic diagrams of a baseline spatial domain video decoder and a compressed domain video decoder, respectively. The block diagram of FIG. 6B is a partial rearrangement of the functional blocks of the spatial domain video decoder of FIG. 6A. In particular, RLD 132 and IDCT 134 are moved after the MC 128 feedback loop. This arrangement keeps the internal representation of the video in the compressed domain. In the arrangement of FIG. 6B, a compressed domain post-processing module can be inserted immediately after the MC 128 feedback loop. Certain video operations, such as compositing, scaling, and deblocking, are faster in the compressed domain than in the spatial domain. From a video codec standpoint, however, a spatial encoder does not perfectly match a compressed domain decoder. As shown in FIG. 6B, the compressed domain video decoder differs from the spatial domain video decoder of FIG. 6A at several points along the decoding pipeline. The differences are not merely a reordering of blocks; they involve nonlinear operations such as clipping and rounding. These nonlinear points produce video whose PSNR measurements differ between the two domains.

【００３４】 [0034]

【００３５】 [0035] Those skilled in the art will recognize that MEM 130 is the frame buffer that holds the previous frame for motion compensation. In a spatial domain decoder, the frame buffer allocates enough memory to hold the (Y, Cr, Cb) values for the incoming frame size. For example, CIF video sampled at 4:2:0 requires about 200 kilobytes of memory. Because MEM 130 is the single largest consumer of memory in the video decoder, the hybrid data structure and inverse motion compensation defined herein make it possible to reduce the memory usage of the compressed domain decoding pipeline. In one embodiment, a memory compression of two to three times is achieved without significant loss of image quality in the decoded video.

【００３６】 [0036] FIG. 7 is a block diagram illustrating block transformations during the video encoding and decoding process, according to one embodiment of the invention. The sequence of transformations above dotted line 170 illustrates the spatial compression method that the video encoder applies to a block in an I-frame, or to a block in a P-frame after motion compensation/motion estimation. Pixel block 172 is a full 8x8 matrix. At this point, any compression or truncation in the spatial domain directly affects the visible quality of the reconstructed block. After the DCT, however, transformed matrix 174 is compact, with its larger terms at low frequencies. The quantization step compacts the block further by zeroing the smaller high-frequency terms in block 176. The zigzag scan highlighted in block 176 orders the DCT coefficients from low frequency to high frequency. Run-length encoding ignores the zero coefficients and represents only the non-zero DCT coefficients in run-length representation 178, a compact list of two-valued elements, for example (run, level). Thus, memory compression can be achieved in the DCT domain by developing efficient data structures and methods for storing and accessing run-length representations of non-zero DCT coefficients.
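As a rough illustration of the zigzag scan and run-length step described above, the following Python sketch converts a quantized block into a list of (run, level) pairs. The function names and the sample block are hypothetical; the scan order is the conventional zigzag pattern, and "run" is taken here to mean the number of zero coefficients skipped since the previous non-zero coefficient.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals are walked from bottom to top
        order.extend((r, s - r) for r in rows)
    return order


def run_level(block):
    """Encode a square block as (run, level) pairs, skipping zero coefficients."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1                   # count zeros since the last non-zero value
        else:
            pairs.append((run, value))
            run = 0
    return pairs                       # trailing zeros are simply dropped


# Hypothetical quantized block with three non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 52, -3, 4
print(run_level(block))  # [(0, 52), (0, -3), (1, 4)]
```

Only three pairs are stored for the 64-coefficient block, which is the source of the memory saving the paragraph describes.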

【００３７】 [0037] In one embodiment, the semi-compressed (SC) representation is one such memory-efficient run-length representation. The run-length representation of the non-zero DCT coefficients is similar to run-length representations 178 and 180 of FIG. 7, with two modifications. Each two-valued element (run, level) is described by a composite 16-bit value of the following form.

【００３８】 [0038]
RL = binary 'rrrrllllllllllll'    (9)
The 12 least significant bits ('llllllllllll') define the value of the dequantized DCT coefficient from block 184. Block 184 is derived from quantized block 182 and is an example of a DCT domain representation. Those skilled in the art will appreciate that DCT coefficient values range from -2048 to 2047. Block 186 of FIG. 7 is the reconstruction of block 172 after the IDCT operation is applied to block 184. The 4 most significant bits ('rrrr') define the run value. The run gives the position of a non-zero DCT coefficient relative to the position of the previous non-zero DCT coefficient, based on the zigzag scan within the 8x8 block. Because the run of a non-zero coefficient can exceed 15, an escape sequence is defined to split the run into smaller units. The escape sequence RL = 'F0' is defined to represent a run of 15 zero coefficients followed by a coefficient of zero amplitude.
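The 16-bit layout of equation (9) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the level is stored in 12-bit two's complement (the text gives only the range), the function names are hypothetical, and the escape word is taken to be the 16-bit value with run 15 and level 0 (0xF000), which is one reading of the 'F0' escape sequence.

```python
ESCAPE = 0xF000  # assumed 16-bit form of the escape: run = 15, level = 0


def pack_rl(run, level):
    """Pack (run, level) into one 16-bit word: 4-bit run, 12-bit level."""
    if not 0 <= run <= 15:
        raise ValueError("run must fit in 4 bits")
    if not -2048 <= level <= 2047:
        raise ValueError("level must fit in 12 bits")
    return (run << 12) | (level & 0x0FFF)


def unpack_rl(word):
    """Recover (run, level) from a 16-bit word produced by pack_rl."""
    run = (word >> 12) & 0xF
    level = word & 0x0FFF
    if level & 0x800:        # sign bit of the 12-bit field is set
        level -= 0x1000
    return run, level


def pack_long_run(run, level):
    """Split a run longer than 15 into escape words plus one final word."""
    words = []
    while run > 15:
        words.append(ESCAPE)  # consumes 15 zeros plus one zero-amplitude slot
        run -= 16
    words.append(pack_rl(run, level))
    return words


print(hex(pack_rl(3, -5)))                     # 0x3ffb
print(unpack_rl(0x3FFB))                       # (3, -5)
print([hex(w) for w in pack_long_run(20, 7)])  # ['0xf000', '0x4007']
```

Every read or write of a coefficient goes through these pack and unpack helpers, which is the extra computational cost that the two-byte representation introduces.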

【００３９】 [0039] To reduce memory requirements, a data structure must be developed for storing and accessing the SC representation. The following data structures were considered: array, linked list, vector, and hybrid. In developing these structures, the need for memory compression was balanced against the need to keep computational complexity low, as described further below with reference to Table 1. The SC representation delivers the targeted memory compression but, depending on the data structure, it significantly increases decoder complexity in three areas. First, with the two-byte representation the (run, level) values are not immediately usable: every access or modification requires functions that pack and unpack bits. Second, the compact run-length representation complicates motion compensation. Third, adding the prediction error to the prediction requires a sort and merge operation.

【００４０】 [0040] FIG. 8 is a schematic diagram illustrating the use of a separate index to find the starting position of each 8x8 block in a run-length representation. When a single list 190, also called a vector, holds the run-length representations of all 8x8 blocks 192-1 through 192-4 in a frame, accessing a particular DCT block during motion compensation requires a separate index to look up its starting position, which complicates motion compensation.

【００４１】 [0041] FIGS. 9A and 9B illustrate the sort and merge operations required to add the prediction error to the prediction for an array-based data structure and a list data structure, respectively. In FIG. 9A, the array-based data structure requires only the addition of the values at corresponding array indices. However, the array-based data structure provides no memory compression. In FIG. 9B, the list (or vector) data structure additionally requires sort and merge operations; that is, the merge algorithm must support insertion and deletion, which is computationally very expensive for a data structure such as a vector. More specifically, if the indices are equal, the DCT coefficients can be added or subtracted, for example (0,20) + (0,620) = (0,640). If the index of an error coefficient precedes the index in the prediction, the DCT coefficient is inserted, for example insert (0,-3). If the addition of the DCT values yields zero, the DCT coefficient is deleted, for example (1,13) + (4,-13) = (1,0).
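The three cases of the list merge (add, insert, delete) can be sketched in Python. To keep the example short, this sketch assumes each list stores (index, value) pairs keyed by absolute zigzag position and sorted by index, not the relative runs of the SC representation; the function name and sample data are hypothetical.

```python
def merge_add(prediction, error):
    """Add two sparse coefficient lists of (index, value) pairs sorted by index."""
    merged, i, j = [], 0, 0
    while i < len(prediction) and j < len(error):
        p_idx, p_val = prediction[i]
        e_idx, e_val = error[j]
        if p_idx == e_idx:
            total = p_val + e_val
            if total != 0:               # a zero sum deletes the coefficient
                merged.append((p_idx, total))
            i += 1
            j += 1
        elif e_idx < p_idx:              # error index comes first: insert it
            merged.append((e_idx, e_val))
            j += 1
        else:                            # prediction coefficient is unchanged
            merged.append((p_idx, p_val))
            i += 1
    merged.extend(prediction[i:])        # copy whatever remains of either list
    merged.extend(error[j:])
    return merged


prediction = [(0, 20), (2, 13), (5, 7)]
error = [(0, 620), (1, -3), (2, -13)]
print(merge_add(prediction, error))  # [(0, 640), (1, -3), (5, 7)]
```

This version builds a new list in one pass. Updating a vector in place would instead shift elements on every insertion and deletion, which is the cost the paragraph warns about.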

【００４２】 [0042] Table 1 compares the memory compression ratios and computational costs of the various data structures. The array-based data structure incurs no computational cost beyond the 64 additions needed to update the prediction, but an array of DCT coefficients achieves no memory compression relative to an array of pixels, because each DCT coefficient needs 2 bytes of storage rather than 1 byte. A linked list or a vector of the semi-compressed (SC) representation can provide up to 2.5 times memory compression relative to an array of pixels. Neither solution is optimal, however. Insertion and deletion are expensive for a vector, particularly in the middle of the vector, and the memory overhead of a linked list is high because an internal pointer is created for every element of the list.

【表１】 [Table 1]

【００４３】 [0043] The hybrid data structure for the SC representation strikes an optimal balance between the competing concerns in Table 1. The hybrid data structure was developed to exploit both the low computational cost of the array structure of FIG. 9A and the high compression ratio of the vector structure of FIG. 9B. It consists of a fixed-size array that holds a fixed number of DCT coefficients per block and a variable-size overflow vector that holds the DCT coefficients of those blocks that exceed the fixed-size array allocation. The fixed-size array can be configured to hold any suitable number of DCT coefficients per block, where that number is less than 64. Naturally, the larger the fixed-size array, the smaller the memory compression. In one embodiment, the fixed-size array is configured to hold eight DCT coefficients per block.

【００４４】 [0044] FIG. 10 is a schematic diagram of a hybrid data structure, comprising an array structure and a vector structure, that provides both memory compression and computational efficiency according to one embodiment of the invention. DCT blocks 200-1, 200-2, through 200-n contain zero and non-zero DCT coefficients. DCT blocks 200-1 through 200-n represent DCT domain representations as described above with reference to FIG. 2. Further, blocks 200-1 through 200-n are associated with blocks of a frame of video data, for example block 184 of FIG. 7. The non-zero coefficients of each of blocks 200-1 through 200-n are identified and inserted into the fixed-size array 202 data structure. Fixed-size array 202 contains fixed-size blocks 204-1 through 204-n. In one embodiment, each of blocks 204-1 through 204-n is sized to hold eight DCT coefficients in an 8x8 data structure. The invention is not limited to blocks configured to hold eight DCT coefficients; any suitable size may be used. As noted above, memory compression decreases proportionally as block capacity increases.

【００４５】 [0045] Continuing with FIG. 10, when any of DCT blocks 200-1 through 200-n contains more than eight non-zero coefficients, the non-zero DCT coefficients that exceed the capacity of the corresponding fixed-size block 204-1 through 204-n are placed in overflow vector 206. Overflow vector 206 is configured as a variable-size overflow vector; that is, the overflow vector is dynamic. For example, block 200-1 contains nine non-zero DCT coefficients A1 through A9. DCT coefficients A1 through A8 are copied to fixed-size block 204-1, while DCT coefficient A9 is copied to overflow vector 206. Block 200-2 contains ten non-zero DCT coefficients B1 through B10. DCT coefficients B1 through B8 are copied to fixed-size block 204-2, while DCT coefficients B9 and B10 are copied to overflow vector 206, and so on for each block of the frame. Index table 208 contains entries that identify, for each entry of overflow vector 206, the corresponding fixed-size block 204-1 through 204-n. Because each entry is one byte, the size of the index table is negligible. Thus, for the frame of data corresponding to DCT blocks 200-1 through 200-n, the data from fixed-size array 202 and overflow vector 206 are combined to produce image 210. The memory savings are substantial: in most cases, each of DCT blocks 200-1 through 200-n is reduced from 64 zero and non-zero coefficients to eight or fewer non-zero coefficients held in fixed-size blocks 204-1 through 204-n. Of course, a block may contain more or fewer non-zero coefficients, and any non-zero coefficients beyond eight are held in overflow vector 206.
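The A1 through A9 and B1 through B10 example can be traced with a small Python sketch of the hybrid structure. The class and method names are hypothetical, and Python lists stand in for the fixed-size array (a real implementation would preallocate eight slots per block); only the split between fixed blocks, overflow vector, and index table follows the description above.

```python
class HybridStore:
    """Illustrative hybrid store: fixed-capacity blocks plus one overflow vector."""

    def __init__(self, num_blocks, capacity=8):
        self.capacity = capacity
        self.fixed = [[] for _ in range(num_blocks)]  # at most `capacity` each
        self.overflow = []     # coefficients that did not fit in a fixed block
        self.index_table = []  # index_table[k] = block that owns overflow[k]

    def insert_block(self, block_id, coefficients):
        """Store the non-zero coefficients of one DCT block."""
        self.fixed[block_id] = list(coefficients[:self.capacity])
        for coefficient in coefficients[self.capacity:]:
            self.overflow.append(coefficient)
            self.index_table.append(block_id)

    def read_block(self, block_id):
        """Return all stored coefficients of a block, fixed part first."""
        extra = [c for c, owner in zip(self.overflow, self.index_table)
                 if owner == block_id]
        return self.fixed[block_id] + extra


store = HybridStore(num_blocks=2)
store.insert_block(0, [f"A{k}" for k in range(1, 10)])   # A1..A9
store.insert_block(1, [f"B{k}" for k in range(1, 11)])   # B1..B10
print(store.overflow)      # ['A9', 'B9', 'B10']
print(store.index_table)   # [0, 1, 1]
```

Blocks with eight or fewer non-zero coefficients never touch the overflow vector, so the common case keeps the cheap array-style access.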

【００４６】 [0046] FIGS. 11A through 11C are graphs illustrating the factors evaluated in determining the capacity of the fixed-size blocks of the fixed-size array, and of the overflow vector, of the hybrid data structure according to one embodiment of the invention. In FIG. 11A, lines 220 and 222 plot the average number of non-zero DCT coefficients per luminance block for two typical CIF sequences. The number of non-zero coefficients per block ranges from 3 to 7; that is, of the 64 coefficients, on average only 2 to 7 are non-zero. Using the information of FIG. 11A as a guide, FIG. 11B shows that the size of the overflow vector decreases as the fixed-size array grows, which minimizes the cost of vector insertions and deletions. Here, line 220-1 corresponds to the CIF sequence of line 220 of FIG. 11A, and line 222-1 corresponds to the CIF sequence of line 222 of FIG. 11A. Those skilled in the art will recognize that memory compression decreases as the capacity of the fixed-size array increases. In addition, FIG. 11C shows that the load factor of the array also decreases, leaving the array mostly empty. In one embodiment, a fixed-size array holding eight DCT coefficients per block was chosen. Again, line 220-2 corresponds to the CIF sequence of line 220 of FIG. 11A, and line 222-2 corresponds to the CIF sequence of line 222 of FIG. 11A. This choice minimizes the size of the overflow vector, holding it to about 200 DCT coefficients, and keeps the load factor between about 9% and about 15%. It will be apparent to those skilled in the art that the fixed-size array is not limited to eight coefficients per block; any suitable number of coefficients per block may be chosen. Further, the individual blocks of the fixed-size array can have any suitable configuration. For example, a block capable of holding eight coefficients can be arranged as an 8x1 block or a 4x2 block, while a block capable of holding nine coefficients can be arranged as a 9x1 block or a 3x3 block.

【００４７】 [0047] FIG. 12 is a flowchart of the method operations for reducing the memory required to decode a bit stream, according to one embodiment of the invention. The method begins with operation 230, in which a video bit stream is received. In one embodiment, the bit stream is a low-rate bit stream. For example, the video stream may be associated with a video coding standard such as H.263, Moving Picture Experts Group (MPEG-1/2/4), H.261, or Joint Photographic Experts Group (JPEG). The method then proceeds to operation 232, in which a frame of the bit stream is decoded into a discrete cosine transform (DCT) domain representation of each block of data associated with the frame. Here, the video is processed by the first two stages of a decoder, such as the decoders shown in FIGS. 2, 6B, and 15. That is, the video data is processed by a variable-length decoder stage and a dequantization stage to decode the compressed bit stream into the DCT domain representation. The DCT domain representation is in a compressed format. The frame is decoded one block at a time. The method then proceeds to operation 234, in which the non-zero coefficients of the DCT domain representation are identified. Of the 64 DCT coefficients associated with the DCT domain representation of a block of data, typically only a relatively small number are non-zero.

【００４８】 [0048] Continuing with FIG. 12, the method then proceeds to operation 236, in which the hybrid data structure is assembled. The hybrid data structure includes a fixed-size array and a variable-size overflow vector. One exemplary hybrid data structure is the fixed-size array having a plurality of fixed-size blocks, together with the variable-size overflow vector, shown in FIG. 10. The method then proceeds to operation 238, in which the non-zero coefficients of the DCT domain representation are inserted into the hybrid data structure. As described with reference to FIG. 10, the non-zero coefficients of the DCT domain representation of a block of video data are associated with a fixed-size block in the fixed-size array. If the number of non-zero coefficients exceeds the capacity of the fixed-size block associated with the block of video data, the remaining non-zero coefficients are stored in the variable-size overflow vector. In one embodiment, an index table maps the data in the overflow vector to the appropriate fixed-size block in the fixed-size array. The hybrid data structure and the storing of only non-zero coefficients thus reduce the memory requirement. More specifically, the memory requirement can be reduced by 50% without loss of video quality.

【００４９】 [0049] The non-zero coefficients of each DCT domain representation associated with a frame of data are stored in the hybrid data structure. The stored frame data is then combined and decompressed for display. In one embodiment, once the next frame has been decoded into a DCT domain representation for storage in the hybrid data structure, the data associated with the previous frame is flushed from the hybrid data structure. As described further below, inverse motion compensation is performed on the stored data in the compressed domain. The inverse motion compensation uses integer approximation for full-pixel inverse motion compensation and factorization for half-pixel inverse motion compensation.

【００５０】 [0050] The main components of a spatial H.263 video decoder include run-length decoding, the inverse DCT, and inverse motion compensation. Using a timing profiler, the performance of TELENOR's H.263 video decoder running on a 1.1 GHz Pentium® 4 processor was measured on baseline data. Decoding the baseline data and ignoring system calls, the profiler measures the overall time required to decode 144 frames and details the timing characteristics of each component. Table 2 is the timing profile of the spatial H.263 video decoder and shows, in particular, the timing results for selected functions.

【表２】 [Table 2]

【００５１】 [0051] Table 3 is the timing profile of a non-optimized compressed domain H.263 video decoder. One exemplary decoder pipeline configuration is the decoder shown in FIG. 2.

【表３】 [Table 3]

【００５２】 [0052] As shown in Table 2, the spatial domain video decoder takes about 1.2 seconds to decode 144 frames. Most of this time is taken by the image display function, which converts the color values of each frame from YUV to RGB for display on a suitable operating system, for example WINDOWS™. Functions such as run-length decoding, the inverse DCT, and inverse motion compensation take about 25% of the total time required to decode the video. Inverse motion compensation is particularly fast in the spatial domain: full-pixel motion compensation simply sets a pointer to a location in memory, or in the frame buffer, and copies a block of data, while half-pixel motion compensation sets a pointer in memory and interpolates values using the shift operator. By contrast, Table 3 shows some of the timing results for the non-optimized compressed domain video decoder, which takes about 13.67 seconds to decode the same 144 frames.
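The two totals quoted above can be turned into average decoding rates with a few lines of Python. This is simple arithmetic on the figures stated in the text (144 frames in about 1.2 seconds and in about 13.67 seconds); the function name is illustrative.

```python
def frames_per_second(frames, seconds):
    """Average decoding rate implied by a total frame count and total time."""
    return frames / seconds


spatial_fps = frames_per_second(144, 1.2)       # spatial domain decoder
compressed_fps = frames_per_second(144, 13.67)  # non-optimized compressed domain

print(f"{spatial_fps:.1f} fps, {compressed_fps:.1f} fps")  # 120.0 fps, 10.5 fps
```

On these figures the non-optimized compressed domain decoder is roughly eleven times slower than the spatial decoder.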

【００５３】 [0053]

【数６】 (Equation 6)

【００５４】 [0054]

【表４】 [Table 4]

【００５５】 [0055] Each 8x8 matrix multiplication requires 512 multiplications and 448 additions. As is well known, matrix multiplication is computationally expensive. Table 5 compares optimization schemes, such as matrix multiplication, matrix factorization, and shared blocks for a macroblock, with a hybrid scheme for a compressed domain video pipeline such as the pipelines shown in FIGS. 2, 6B, and 15. To provide acceptable quality on a handheld device that supports a video format such as the Common Intermediate Format (CIF), in which each frame of data contains 288 lines with 352 pixels per line, the compressed domain video decoding pipeline should decode at a rate of about 15 to 25 frames per second (fps).
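The operation counts quoted for an 8x8 matrix product follow from the standard row-by-column definition: each of the 64 output entries is a dot product of length 8, which costs 8 multiplications and 7 additions. A minimal Python check, with a hypothetical function name:

```python
def matmul_op_count(n=8):
    """Count scalar operations in a straightforward n x n matrix product."""
    multiplications = n * n * n    # n products for each of the n*n outputs
    additions = n * n * (n - 1)    # n - 1 sums to combine those n products
    return multiplications, additions


print(matmul_op_count(8))  # (512, 448)
```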

【表５】 [Table 5]

【００５６】 [0056] One enhancement to the compressed domain video decoding pipeline is to reduce the number of TMi operations in equation (10) through block alignment. Block alignment was measured, for example, as follows: decoding a sequence of 144 frames, the block alignment rate was measured at 36.7% of all blocks. FIG. 13 is a schematic diagram illustrating three examples of block alignment that reduce matrix multiplications: block alignment 240 (w = 8, h = 4), block alignment 242 (w = 4, h = 8), and block alignment 244 (w = 8, h = 8). In each of examples 240, 242, and 244, a TMi operation is eliminated wherever the overlap with the corresponding block is zero. In the DCT domain (compressed domain), however, block alignment yields no savings when half-pixel interpolation is specified. The equations for half-pixel motion compensation in the compressed domain are given below. In the (w = 8, h = 8) example, half-pixel interpolation still requires four TMi operations, as shown in equations 12 and 13. Table 6 is provided for reference in defining the half-pixel transform matrices C_hpij.

【数７】 (Equation 7)

【表６】 [Table 6]

【００５７】 [0057] Even for a perfectly aligned block, half-pixel interpolation creates overlap with neighboring pixels. FIG. 14 is a schematic diagram of half-pixel interpolation for a perfectly aligned DCT block. Half-pixel interpolation produces an overlap into the neighboring blocks of one pixel in width and one pixel in height.

【００５８】図２のデコーダの機能ブロックの並べ替え
により圧縮領域復号化パイプラインの処理スピードを上
げることができる。表２及び３で、逆DCTブロックの処
理時間は、圧縮領域（632 ms）と比べ、空間領域のほう
がはるかに短い（3 ms）。空間領域で、逆DCTは、フィ
ードバックループの前にイントラブロック及び誤差係数
に対して用いられる。具体的に言えば、イントラブロッ
ク及び誤差係数は、ビデオの全ブロックの１５％未満に
あたる。その他の８５％では、単に逆DCT機能は省かれ
る。圧縮領域では、逆DCTが、パイプラインの最終ステ
ージで、ビデオの各フレームのブロック１００％に用い
られる。Rearranging the functional blocks of the decoder of FIG. 2 can increase the processing speed of the compressed domain decoding pipeline. In Tables 2 and 3, the processing time of the inverse DCT block is far shorter in the spatial domain (3 ms) than in the compressed domain (632 ms). In the spatial domain, the inverse DCT is applied to intra blocks and error coefficients before the feedback loop. Specifically, intra blocks and error coefficients account for less than 15% of all blocks in the video; for the other 85%, the inverse DCT is simply skipped. In the compressed domain, the inverse DCT is applied to 100% of the blocks of every video frame, at the final stage of the pipeline.

【００５９】図１５は、本発明の１実施例による、ビデ
オデータの処理を向上させる圧縮領域ビデオデコーダの
機能ブロックの並べ替えを説明している概略図である。
ここで、機能ブロックは並べ替えられ、圧縮領域パイプ
ラインは２ヶ所で分けられる。最初のスプリットはVLD
１２４及びDQ１２６の後に点（i）２５２で起きる。上
流のブランチで、パイプラインはメモリ圧縮１２８の内
部DCT領域表現を保っている。下流のブランチで、パイ
プラインは、誤差係数を空間領域に復号化するために、
RLD及びIDCTを前方に移動する。２番目のスプリット
は、動き補償（MC）時に点（ii）２５４で発生する。動
き補償時に、方程式（７）によると、空間領域出力が生
成される可能性がある。ディスプレイ１３６に表示する
ために点（iii）２５６で現ブロックを再構築するべ
く、出力を誤差係数に直に加算することができる。内部
DCT表現を維持するためにDCTブロック２５０がフィード
バックループに挿入される。点（i）２５２でのRLD１３
２とIDCT１３４との組合せ及び点（ii）２５４でのDCT
は、図２のパイプラインの最終ステージでのIDCTブロッ
クと比べ、必要な計算が少ない。表７は、本書で説明し
ている他の最適化スキームに加えて組合せ可能な図１５
に示した並べ替えにより２０％のスピードアップを図れ
ることを実証している。FIG. 15 is a schematic diagram illustrating a rearrangement of the functional blocks of a compressed domain video decoder that improves the processing of video data, according to one embodiment of the present invention. Here, the functional blocks are rearranged and the compressed domain pipeline is split at two points. The first split occurs at point (i) 252, after VLD 124 and DQ 126. In the upstream branch, the pipeline maintains the internal DCT domain representation of memory compression 128. In the downstream branch, the pipeline moves RLD and IDCT forward in order to decode the error coefficients into the spatial domain. The second split occurs at point (ii) 254, during motion compensation (MC). During motion compensation, a spatial domain output can be generated according to equation (7). This output can be added directly to the error coefficients to reconstruct the current block at point (iii) 256 for display on display 136. DCT block 250 is inserted in the feedback loop to maintain the internal DCT representation. The combination of RLD 132 and IDCT 134 at point (i) 252 and the DCT at point (ii) 254 requires less computation than the IDCT block at the final stage of the pipeline of FIG. 2. Table 7 demonstrates that the rearrangement shown in FIG. 15, which can be combined with the other optimization schemes described herein, yields a 20% speedup.

【表７】 [Table 7]

【００６０】１実施例で、方程式（１１、１３）の基本
的TM演算に必要な掛け算の数を減らすことにより逆動き
補償が加速される。完全８ｘ８行列乗算を計算する代わ
りに、方程式１４に示すように、DCT行列Sが疎行列の列
に因数分解される。方程式（１７）の疎行列は、順序行
列(A₁,A₂,A₃,A₄,A₅,A₆)及び対角行列(D,M)を含む。この
因数分解を方程式（１５）の中に代入すると、方程式
（１６）のTM_iの完全に因数分解された方程式を導き出
すことができる。これは、方程式（１１、１３）よりも
掛け算の数が少ない。In one embodiment, inverse motion compensation is accelerated by reducing the number of multiplications required for the basic TM operations of equations (11, 13). Instead of computing full 8x8 matrix multiplications, the DCT matrix S is factored into a sequence of sparse matrices, as shown in equation 14. The sparse matrices of equation (17) include permutation matrices (A₁, A₂, A₃, A₄, A₅, A₆) and diagonal matrices (D, M). Substituting this factorization into equation (15) yields the fully factored expression for TM_i in equation (16), which requires fewer multiplications than equations (11, 13).
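For orientation, the unfactored operation that this factorization speeds up can be sketched as follows. The equations themselves are not reproduced in this text, so the sketch relies only on the standard orthonormal 8x8 DCT matrix S and the identity SᵀS = I. The helper names (`shift_matrix`, `tm_term`) and the shape of the windowing matrices are assumptions for illustration, and the sparse factors A₁ to A₆, D, and M are not reconstructed here.

```python
import math

N = 8


def dct_matrix():
    """Orthonormal 8x8 DCT-II matrix S."""
    return [[(math.sqrt(0.5) if k == 0 else 1.0) * 0.5
             * math.cos((2 * n + 1) * k * math.pi / 16)
             for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def shift_matrix(h):
    """Pre-multiplying by this matrix moves the bottom h rows of a block to the top."""
    return [[1.0 if j == i + (N - h) else 0.0 for j in range(N)]
            for i in range(N)]


def to_dct(x):
    s = dct_matrix()
    return matmul(matmul(s, x), transpose(s))


def tm_term(x_dct, h, w):
    """One full-pixel term computed entirely on DCT coefficients (unfactored)."""
    s = dct_matrix()
    st = transpose(s)
    pre = matmul(matmul(s, shift_matrix(h)), st)
    post = matmul(matmul(s, transpose(shift_matrix(w))), st)
    return matmul(matmul(pre, x_dct), post)
```

Applying `tm_term` to the DCT of a block gives the same coefficients as shifting the block in the spatial domain and then transforming it. Each call costs several dense 8x8 products, which is the cost the sparse factorization aims to reduce.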

【数８】 (Equation 8)

【００６１】従って、行列乗算は、行列順列で置き換え
られる。但し、方程式（１６）に示すように、項TM_i,の
完全に因数分解された式は必ずしも逆動き補償をスピー
ドアップするとは限らない。基本的に、掛け算がメモリ
アクセスとトレードされたので、メモリアクセスが多す
ぎると、実のところ復号化プロセスを遅くする。従っ
て、これらの競合する機能間のバランスをとるべく行列
の再グループ化が行なわれる。行列S (= G₀G₁)は２つの
項、G₀=DA₁A₂A₃（順列と乗算の混合）と、G₁=MA₄A₅A
₆（順列と加算の混合）とに因数分解される。方程式
（２４）で逆動き補償に因数分解された式を形成するた
めに、固定行列J_i, K_iが定義され、方程式（１０及び１
２）の中に代入される。Matrix multiplication is thus replaced by matrix permutation. However, the fully factored expression for the term TM_i shown in equation (16) does not necessarily speed up inverse motion compensation. In essence, multiplications have been traded for memory accesses, and too many memory accesses actually slow the decoding process. The matrices are therefore regrouped to strike a balance between these competing costs. The matrix S (= G₀G₁) is factored into two terms: G₀ = DA₁A₂A₃ (a mix of permutations and multiplications) and G₁ = MA₄A₅A₆ (a mix of permutations and additions). To form the factored expression for inverse motion compensation in equation (24), fixed matrices J_i, K_i are defined and substituted into equations (10 and 12).

【数９】半画素補間でも同様に、(Equation 9) Similarly, in half-pixel interpolation,

【数１０】 (Equation 10)

【００６２】固定行列J_i, K_iによる高速乗算を実現する
ことにより、スピードをさらに向上させることができ
る。固定行列には構造の繰り返しが入っている。例え
ば、行列J₆は次のように定義される。Speed can be improved further by implementing fast multiplication by the fixed matrices J_i, K_i. The fixed matrices contain repeated structure. For example, the matrix J₆ is defined as follows.

【数１１】 (Equation 11)

【００６３】ここでは、a=0.7071、b=0.9239、c=0.3827
とする。u= {u₁,…,u₈}及びv={v₁,…,v₈}と仮定する
と、u = J₆vを計算するには、方程式の列を次のステッ
プに従って計算する：Here, a = 0.7071, b = 0.9239, and c = 0.3827. Given u = {u₁, ..., u₈} and v = {v₁, ..., v₈}, u = J₆v is computed by evaluating a sequence of equations in the following steps:

【数１２】 (Equation 12)
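As a numerical aside, the three constants quoted above coincide, to four decimal places, with cos(π/4), cos(π/8), and sin(π/8). The text does not say how the constants were derived, so this correspondence is an observation rather than the patent's definition. A quick check:

```python
import math

a = math.cos(math.pi / 4)
b = math.cos(math.pi / 8)
c = math.sin(math.pi / 8)

print(round(a, 4), round(b, 4), round(c, 4))  # 0.7071 0.9239 0.3827
```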

【００６４】 [0064]

【００６５】逆動き補償の更なるスピードアップを図る
ために、方程式（１１、１３）の基本的なＴＭ演算に必
要な掛け算を削除する。完全画素及び半画素の行列、C
_ij及びC_hpijが、2^-5のベキに一番近い２進数に近似され
る。これらの行列を２進数に近似すると、方程式（１
０、１２）の逆動き補償を解くために、右シフトや加算
など基本的な整数演算を用いることによって行列乗算を
行なうことができる。例えば、h=1の場合、完全画素行
列C₁₁を以下のように調べる。なお、その他の行列は同
じように近似されている。To speed up inverse motion compensation further, the multiplications required for the basic TM operations of equations (11, 13) are eliminated. The full-pixel and half-pixel matrices C_ij and C_hpij are approximated by binary numbers to the nearest powers of 2^-5. With the matrices approximated in this way, the matrix multiplications needed to solve the inverse motion compensation of equations (10, 12) can be performed with basic integer operations such as right shifts and additions. For example, for h = 1, consider the full-pixel matrix C₁₁ below. The other matrices are approximated in the same way.

【数１３】行列の各要素を２のベキに一番近い値に丸めると、行列
（４７）が生まれる。(Equation 13) Rounding each element of the matrix to the value closest to a power of 2 yields a matrix (47).
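The rounding step can be sketched as below. The matrices themselves are not reproduced in this text, so the sketch works on individual entries. It assumes "nearest" means nearest in absolute difference and that entries smaller than half of 2^-5 are flushed to zero; both conventions, and the function names, are assumptions of this example.

```python
import math


def nearest_power_of_two(value, min_exp=-5):
    """Return (sign, exp) so that sign * 2**exp approximates value; (0, 0) means zero."""
    if value == 0:
        return (0, 0)
    sign = 1 if value > 0 else -1
    mag = abs(value)
    lo = math.floor(math.log2(mag))
    # Pick whichever neighbouring power of two is closer in absolute terms.
    exp = min((lo, lo + 1), key=lambda e: abs(mag - 2.0 ** e))
    if exp < min_exp:
        # Below the smallest representable power: flush to zero or clamp (assumed rule).
        return (0, 0) if mag < 2.0 ** (min_exp - 1) else (sign, min_exp)
    return (sign, exp)


def shift_multiply(coef, sign, exp):
    """Multiply an integer coefficient by sign * 2**exp using shifts only."""
    if sign == 0:
        return 0
    shifted = coef << exp if exp >= 0 else coef >> -exp
    return shifted if sign > 0 else -shifted


sign, exp = nearest_power_of_two(0.3)  # closest power of two is 0.25
print(shift_multiply(1024, sign, exp))
```

Once every entry is stored as a sign and an exponent, a matrix-vector product reduces to shifts, negations, and additions.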

【数１４】 [Equation 14]

【００６６】DCT要素は[-2048 to 2047]の範囲内に入っ
ているから、DCT係数を直接シフトすることでほとんど
の値がゼロに駆動される。中間結果の精度を維持するた
めに、復号化パイプライン全体で各DCT係数を2⁸でスケ
ーリングする。この倍率は、量子化及び反量子化のステ
ップで導入されるから、余分な演算は発生しない。Because DCT elements lie in the range [-2048, 2047], shifting the DCT coefficients directly would drive most values to zero. To preserve the precision of intermediate results, each DCT coefficient is scaled by 2⁸ throughout the decoding pipeline. This scale factor is introduced in the quantization and dequantization steps, so it incurs no extra computation.
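A small worked example shows why the 2⁸ scaling matters. The helper names are hypothetical. The scaling is applied explicitly here for clarity, whereas the text folds it into the dequantization step.

```python
SCALE_BITS = 8  # each coefficient is carried as coef * 2**8


def shift_unscaled(coef, shift):
    """Right-shift a raw coefficient: small values collapse to zero."""
    return coef >> shift


def shift_scaled(coef, shift):
    """Right-shift a pre-scaled coefficient; the result is in units of 2**-8."""
    return (coef << SCALE_BITS) >> shift


raw = shift_unscaled(5, 3)   # 5 / 8 truncated to an integer
kept = shift_scaled(5, 3)    # 5 / 8 expressed in 1/256 units
print(raw, kept, kept / 2 ** SCALE_BITS)
```

Dividing the scaled result by 2⁸ recovers 0.625, the exact value of 5/8, while the unscaled shift loses the value entirely.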

【００６７】さらに、積和のルールに従って項をグルー
プ化することにより高速行列乗算を実現する（方程式
（４８〜５０）を参照）。Further, high-speed matrix multiplication is realized by grouping terms according to the rule of sum of products (see equations (48 to 50)).

【数１５】 (Equation 15)

【００６８】 [0068]

【数１６】 (Equation 16)

【００６９】 [0069]

【００７０】ビデオの動きに基づいて選択的に適用され
る変換行列TMのハイブリッド因数分解／整数近似は、容
認できる画質を維持しながら、同時に、好ましい約１５
〜２５fpsのフレームレートを実現する。先に触れたよ
うに、整数近似技法は、デコーダの計算量を低減するだ
けでなく、復号化されたビデオのPSNRも低下させる。同
時に、因数分解技法では、良好なPSNRは維持できるけれ
ども、好ましいフレームレートを満たすにはデコーダの
計算量は低下しない。整数近似技法の計算量に低さと因
数分解技法の精度の高さとを統合することで、低レート
ビデオビットストリームをサポートするための圧縮領域
ビデオ復号化パイプラインを得ることができる。The hybrid factorization/integer approximation of the transform matrix TM, applied selectively according to the motion in the video, maintains acceptable image quality while achieving the preferred frame rate of about 15 to 25 fps. As mentioned earlier, the integer approximation technique reduces the computational complexity of the decoder but also lowers the PSNR of the decoded video. The factorization technique, in contrast, maintains good PSNR but does not reduce the decoder's complexity enough to meet the preferred frame rate. Combining the low complexity of the integer approximation technique with the high precision of the factorization technique yields a compressed domain video decoding pipeline that supports low-rate video bitstreams.

【００７１】２つのタイプの変換行列、つまり、方程式
（１１）に示した完全画素動き補償TM_iと、方程式（１
３）に示した半画素動き補償TM_hpiについて説明してき
た。TM _iに近似行列を用いると、完全画素動き補償は、
８ｘ８浮動小数点行列を用いた場合と比べ、計算量が２
８％ですむ。しかしながら、半画素変換行列TM_hpiに近
似技法を直接用いる場合、TM_hpiの近似行列を用いる
と、半画素動き補償はＰＳＮＲを低下させる（表８）と
共に復号化されたビデオに目に見える歪みを発生させる
ことが観察された。誤差源は２つある。その一つは、半
画素変換行列TM_hpiが近似技法により敏感に反応するこ
とである。表８で、TM_hpiはTM_i以外にも数多くの項から
なる複合行列である。２つ目は、先に図６A及び６Ｂに
おいて説明したように、半画素補間時の非線形処理が、
近似技法によって発生した誤差と組み合わされて、中程
度から高度の動き領域で目に見える誤差の累積を生じさ
せる。Two types of transform matrices have been described: the full-pixel motion compensation matrix TM_i of equation (11) and the half-pixel motion compensation matrix TM_hpi of equation (13). When approximate matrices are used for TM_i, full-pixel motion compensation requires only 28% of the computation needed with 8x8 floating-point matrices. When the approximation technique is applied directly to the half-pixel transform matrix TM_hpi, however, half-pixel motion compensation with the approximate matrices was observed both to lower the PSNR (Table 8) and to introduce visible distortion into the decoded video. There are two sources of error. First, the half-pixel transform matrix TM_hpi is more sensitive to the approximation technique: in Table 8, TM_hpi is a composite matrix made up of many more terms than TM_i. Second, as described earlier with reference to FIGS. 6A and 6B, the nonlinear processing in half-pixel interpolation combines with the error introduced by the approximation technique to produce a visible accumulation of error in regions of medium to high motion.

【００７２】半画素行列に因数分解技法を選択的に適用
することでこうした誤差の問題を解決することができ
る。先に説明したように、因数分解技法は浮動小数点の
精度を維持するので、上に説明した誤差を最小限にする
ことができる。例えば、因数分解技法は、TM_hpiを有す
る行列乗算を、方程式（２５〜４５）に示したのと同じ
ような方程式の列に還元する。これらの方程式は３２ビ
ット浮動小数点の精度を維持するので、近似誤差が生ま
れない。さらに、因数分解技法は動き補償時にDCTブロ
ックを空間領域に復号化するので、図１５において説明
した最適化をいま説明した最適化と組み合わせることが
できる。表５はハイブリッド法で１５fpsの目標フレー
ムレートを達成できることを実証しているのに対し、表
８はハイブリッド法のＰＮＳＲは容認できるPSNRを実現
することを示している。The error problem can be solved by selectively applying the factorization technique to the half-pixel matrices. As described earlier, the factorization technique preserves floating-point precision and therefore minimizes the errors described above. For example, the factorization technique reduces matrix multiplication by TM_hpi to a sequence of equations similar to those shown in equations (25-45). These equations maintain 32-bit floating-point precision, so no approximation error is introduced. Moreover, because the factorization technique decodes the DCT block into the spatial domain during motion compensation, the optimization described with reference to FIG. 15 can be combined with the one just described. Table 5 demonstrates that the hybrid method reaches the target frame rate of 15 fps, and Table 8 shows that the hybrid method achieves an acceptable PSNR.

【表８】 [Table 8]

【００７３】図１６は、本発明の１実施例による、圧縮
領域で逆動き補償を行なうための方法のオペレーション
のフローチャート図である。この方法は、圧縮されたビ
ットストリーム内のビデオデータフレームを受け取るオ
ペレーション２６０から始まる。１実施例で、ビットス
トリームは低レートビットストリームである。例えば、
ビットストリームは、MPEG 4、H.263、H.261など、公知
のビデオ符号化規格と関連付けられていて構わない。こ
の方法は次にオペレーション２６２に進み、そこでビッ
トストリームのフレームのブロックが離散コサイン変換
（DCT）領域表現に復号化される。ここで、ビデオは、
図２、６Ｂ、１５に示したデコーダなど、デコーダの最
初の２ステージで処理される。すなわち、ビデオデータ
は、圧縮ビットストリームをDCT領域表現に復号化する
ために、可変長デコーダのステージ及び反量子化のステ
ージで処理される。なお、DCT領域表現は圧縮状態のフ
ォーマットになっている。この方法は次にオペレーショ
ン２６４に進み、そこでDCT領域表現と関連付けられる
データがハイブリッドデータ構造に保持される。適した
ハイブリッドデータ構造は、図１０及び１２において説
明したハイブリッドデータ構造である。１実施例で、ハ
イブリッドデータ構造は、例えば、セルラー電話、PD
A、ウェブタブレット、ポケットパソコンなど、ビデオ
データを表示するためのディスプレイ画面を有する携帯
用電子機器のメモリ所要量を低減する。FIG. 16 is a flowchart of the operations of a method for performing inverse motion compensation in the compressed domain, according to one embodiment of the present invention. The method begins with operation 260, in which a frame of video data in a compressed bitstream is received. In one embodiment, the bitstream is a low-rate bitstream. For example, the bitstream may be associated with a known video coding standard such as MPEG-4, H.263, or H.261. The method then proceeds to operation 262, where the blocks of the frame of the bitstream are decoded into a discrete cosine transform (DCT) domain representation. Here, the video is processed by the first two stages of a decoder such as the decoders shown in FIGS. 2, 6B, and 15. That is, the video data passes through the variable-length decoder stage and the dequantization stage to decode the compressed bitstream into the DCT domain representation. Note that the DCT domain representation is in a compressed format. The method then proceeds to operation 264, where the data associated with the DCT domain representation is stored in a hybrid data structure. A suitable hybrid data structure is the one described with reference to FIGS. 10 and 12. In one embodiment, the hybrid data structure reduces the memory requirements of a portable electronic device having a display screen for displaying video data, such as a cellular phone, PDA, web tablet, or pocket personal computer.

【００７４】引き続き、図１６において、この方法はオ
ペレーション２６６に進み、そこでDCT領域表現と関連
付けられるデータに圧縮領域で逆動き補償が実行され
る。ここで、逆動き補償には、表５及び９において説明
したハイブリッド因数分解／整数近似技法が選択的に適
用される。この方法は次にオペレーション２６８に進
み、そこでハイブリッド因数分解／整数近似技法が、処
理中のビデオデータのブロックと関連付けられる変換行
列のタイプを識別する。１実施例では、いま復号化され
ているビットストリームのビットセットの中の情報によ
り変換行列のタイプが検出される。変換行列が半画素行
列の場合には、この方法はオペレーション２７０に進
み、そこでビットストリームを復号化するために因数分
解技法が用いられる。１実施例では、先に方程式２５〜
４５において説明したように、因数分解技法により行列
乗算が一連の方程式に還元される。すなわち、行列乗算
が行列順列で置き換えられる。決定のオペレーション２
６８で変換行列が完全画素行列であると判定された場合
には、この方法はオペレーション２７２に進み、そこで
ビットストリームを復号化するために整数近似技法が用
いられる。ここでは、方程式４６〜５８において先に説
明したように、逆動き補償を解くために、基本的な整数
演算を用いて行列乗算を実行することができる。従っ
て、容認できる画質を有しながら先に説明したハイブリ
ッドデータ構造により達成できたメモリの低減を可能に
する程度のフレームレートを実現するために、ハイブリ
ッド因数分解／整数近似技法を選択的に適用することに
より、圧縮領域での処理が実行される。Continuing with FIG. 16, the method proceeds to operation 266, where inverse motion compensation is performed in the compressed domain on the data associated with the DCT domain representation. Here, the hybrid factorization/integer approximation technique described with reference to Tables 5 and 9 is applied selectively to the inverse motion compensation. The method then proceeds to operation 268, where the hybrid factorization/integer approximation technique identifies the type of transform matrix associated with the block of video data being processed. In one embodiment, the type of transform matrix is detected from information in a bit set of the bitstream being decoded. If the transform matrix is a half-pixel matrix, the method proceeds to operation 270, where the factorization technique is used to decode the bitstream. In one embodiment, the factorization technique reduces the matrix multiplications to a sequence of equations, as described earlier for equations 25-45; that is, matrix multiplication is replaced by matrix permutation. If decision operation 268 determines that the transform matrix is a full-pixel matrix, the method proceeds to operation 272, where the integer approximation technique is used to decode the bitstream. Here, as described earlier for equations 46-58, the matrix multiplications can be carried out with basic integer operations to solve the inverse motion compensation. Thus, processing in the compressed domain is carried out by selectively applying the hybrid factorization/integer approximation technique, achieving a frame rate with acceptable image quality while still permitting the memory reduction achieved by the hybrid data structure described earlier.

【００７５】図１７は、本発明の１実施例による、ハイ
ブリッド因数分解／整数近似技法の選択的適用の概略図
である。ディスプレイ画面２８０は低ビットレートビデ
オによって定義された画像を表示するように構成されて
いる。例えば、ディスプレイ画面２８０を、PDA、セル
ラー電話、ポケットパソコン、ウェブタブレットなど、
携帯用電子機器と関連付けることができる。ボール２８
２はビデオで垂直方向に移動している。ブロック２８４
は移動するオブジェクトの周囲に位置し、高度又は中程
度の動き領域と考えられ、フレームからフレームで変わ
る。ブロック２８６はバックグラウンドを表わし、フレ
ームからフレームで実質的に同じままである。従って、
圧縮ビットストリームの復号化時に、データフレームの
ブロック２８４は、フレームからフレームで、高度な動
き領域と関連付けられるのに対し、ブロック２８６はフ
レームからフレームで実質的に同じままである。高度動
き領域と関連付けられるブロック２８４は、複合化の技
法、つまり、因数分解時には、より高度な精度を必要と
するが、ブロック２８６は実質的に変わらないので計算
量の低い補間法、つまり、整数近似で許される。従っ
て、因数分解技法を高度から中程度の動き領域ブロック
２８４に適用し、整数近似法をバックグラウンドブロッ
ク２８６に適用する。先に説明したように、ブロックが
高度の動きと関連付けられるかどうか、つまり、因数分
解による半画素動き補償を適用するかどうか、もしくは
ブロックがバックグラウンドデータかどうか、つまり、
整数近似による完全画素動き補償を適用するかどうかを
判定するために、ビットストリームに埋め込まれた情報
が検出される。１実施例では、図２、６Ｂ、１５に示し
た動きベクトルが、動き補償が半画素か又は完全画素の
動き補償かどうかを指定している。FIG. 17 is a schematic diagram of the selective application of the hybrid factorization/integer approximation technique, according to one embodiment of the present invention. Display screen 280 is configured to display images defined by low bit rate video. For example, display screen 280 may be associated with a portable electronic device such as a PDA, cellular phone, pocket personal computer, or web tablet. Ball 282 is moving vertically in the video. Blocks 284 lie around the moving object, are regarded as a region of high or medium motion, and change from frame to frame. Blocks 286 represent the background and remain substantially the same from frame to frame. Thus, when the compressed bitstream is decoded, blocks 284 of a data frame are associated with a high-motion region from frame to frame, whereas blocks 286 remain substantially the same. Blocks 284, associated with the high-motion region, require the more precise decoding technique, namely factorization, whereas blocks 286 change little and can tolerate the lower-complexity technique, namely integer approximation. The factorization technique is therefore applied to the high to medium motion blocks 284 and the integer approximation technique to the background blocks 286. As described earlier, information embedded in the bitstream is detected to determine whether a block is associated with high motion, in which case half-pixel motion compensation by factorization is applied, or is background data, in which case full-pixel motion compensation by integer approximation is applied. In one embodiment, the motion vectors shown in FIGS. 2, 6B, and 15 specify whether the motion compensation is half-pixel or full-pixel.
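The per-block selection can be sketched as a small dispatch on the motion vector. This is an illustrative sketch only: it assumes motion vector components are stored as integers in half-pixel units, so that an odd component indicates a half-pixel offset. The function names and returned labels are hypothetical.

```python
def is_half_pixel(mv_x, mv_y):
    """True if either component has a half-pixel part (half-pel units assumed)."""
    return mv_x % 2 != 0 or mv_y % 2 != 0


def select_technique(mv_x, mv_y):
    """Choose the inverse motion compensation path for one block."""
    if is_half_pixel(mv_x, mv_y):
        return "factorization"          # half-pixel matrix: keep floating-point precision
    return "integer_approximation"      # full-pixel matrix: shifts and additions suffice


print(select_technique(4, -2))
print(select_technique(3, 0))
```

The first call selects integer approximation (both components are whole pixels) and the second selects factorization.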

【００７６】なお、先に説明した実施例はソフトウェア
で実行しても、ハードウェアで実行しても構わない。当
業者ならば、デコーダを、先に説明した機能を実現でき
るように構成された論理ゲートを含む半導体チップとし
て実施できることが分かる。例えば、ビデオデコーダを
ハードウェアで実現するには、本書で説明した必要な機
能を実現するための論理ゲートのレイアウト及びファー
ムウェアを合成するために、例えば、VERILOGなど、ハ
ードウェア記述言語（HDL）を採用することができる。The embodiments described above may be implemented in software or in hardware. Those skilled in the art will appreciate that the decoder can be implemented as a semiconductor chip containing logic gates configured to provide the functions described above. For example, to implement the video decoder in hardware, a hardware description language (HDL) such as Verilog can be used to synthesize the firmware and the layout of the logic gates that provide the necessary functions described herein.

【００７７】図１８は、本発明の１実施例による、メモ
リ所要量を最小限にするためのハイブリッドデータ構造
を活用すると共に、ビットストリームデータを効率よく
復号化するためのハイブリッド因数分解／整数近似技法
を適用するように構成されたデコーダ回路構成を有する
携帯用電子機器の簡約概略図である。携帯用電子機器２
９０は、中央処理機構（CPU）２９４、メモリ２９２、
ディスプレイ画面１３６、デコーダ回路構成２９８を含
み、これらは全てバス２９６で互いに通信し合ってい
る。デコーダ回路構成２９８は、先に説明したビデオ処
理並びに圧縮領域で逆動き補償を実行するのに要するメ
モリ所要量を低減する機能を提供できるように構成され
た論理ゲートを含む。当業者ならば、デコーダ回路構成
２９８はデコーダ回路構成が入っているチップ上にメモ
リを有していても、或いはメモリはチップの外に配置さ
れていても構わないことが明らかである。FIG. 18 is a simplified schematic diagram of a portable electronic device having decoder circuitry configured to use a hybrid data structure to minimize memory requirements and to apply the hybrid factorization/integer approximation technique to decode bitstream data efficiently, according to one embodiment of the present invention. Portable electronic device 290 includes a central processing unit (CPU) 294, memory 292, display screen 136, and decoder circuitry 298, all of which communicate with one another over bus 296. Decoder circuitry 298 includes logic gates configured to provide the functionality that reduces the memory required for the video processing described above and for performing inverse motion compensation in the compressed domain. It will be apparent to those skilled in the art that decoder circuitry 298 may have memory on the chip that contains the decoder circuitry, or the memory may be located off chip.

【００７８】図１９は、本発明の１実施例による、図１
８のデコーダ回路構成のより詳細な概略図である。入っ
てくるビットストリーム１２２は、デコーダ２９８の可
変長デコーダ（VLD）回路構成３００によって受け取ら
れる。当業者ならば、デコーダ回路構成２９８をプリン
ト配線板上に配置された半導体チップ上に設置して構わ
ないことが分かる。ＶＬＤ回路構成３００は、動き補償
回路構成３０６に動きベクトル信号を供給する。ビデオ
処理メモリ３０８は、圧縮領域で反量子化回路構成３０
２からのビデオの内部表現を保持する。DCT回路構成３
０４は動き補償回路構成３０６からのビデオの内部ＤＣ
Ｔ表現を維持する。ランレングス復号（RLD）回路３１
０及び逆離散コサイン変換（IDCT）回路構成３１２は、
ディスプレイ画面１３６に表示できるようにビデオデー
タを解凍する。なお、ここで説明している回路構成のブ
ロックは、図２、６Ｂ、１５において説明したブロック
／ステージと同じような機能を提供する。FIG. 19 is a more detailed schematic diagram of the decoder circuitry of FIG. 18, according to one embodiment of the present invention. Incoming bitstream 122 is received by variable-length decoder (VLD) circuitry 300 of decoder 298. Those skilled in the art will appreciate that decoder circuitry 298 may be placed on a semiconductor chip mounted on a printed circuit board. VLD circuitry 300 supplies a motion vector signal to motion compensation circuitry 306. Video processing memory 308 holds, in the compressed domain, the internal representation of the video from dequantization circuitry 302. DCT circuitry 304 maintains the internal DCT representation of the video from motion compensation circuitry 306. Run-length decoding (RLD) circuitry 310 and inverse discrete cosine transform (IDCT) circuitry 312 decompress the video data for display on display screen 136. The circuitry blocks described here provide functions similar to those of the blocks/stages described with reference to FIGS. 2, 6B, and 15.

【００７９】[0079]

【発明の効果】要約すると、今まで説明してきた発明
は、ビデオメモリ量を減らし、圧縮領域で逆動き補償を
実行する圧縮領域ビデオデコーダを提供するものであ
る。現フレームを定義するために基準フレームの非零DC
T係数を保持及び操作するように構成されたハイブリッ
ドデータ構造によりメモリ低減を実現する。ハイブリッ
ドデータ構成は、ビデオデータフレームの各ブロックと
関連付けられる固定サイズブロックを有する固定サイズ
アレイを含んでいる。固定サイズブロックの容量を超え
る非零係数を収容できるように、ハイブリッドデータ構
造には可変サイズオーバーフローベクトルが備わってい
る。圧縮領域ビデオデコーダにより達成されたメモリ圧
縮量は、空間領域ビデオデコーダと比べ、上限で2倍で
ある。圧縮領域ビデオデコーダの逆動き補償は、容認で
きる画質のビデオで毎秒約１５〜２５フレームを実現で
きるように最適化される。復号中のブロックにハイブリ
ッド因数分解／整数近似が選択的にかけられる。因数分
解／整数近似技法のうちのどの補間を適用するかを判定
する基準は、変換行列に基づく。つまり、半画素行列に
は因数分解が適用されるのに対し、完全画素行列には整
数近似が適用される。なお、１実施例では、本書で説明
した圧縮領域パイプラインをＭＰＥＧ−４のシンプルプ
ロファイルビデオデコーダに取り入れることができる。
さらに、実施例により、例えば、電池で動く（CPU制約
型）機器のパワースケーラブル復号化やビデオ会議シス
テムの複合化など、多種多様なアプリケーションを追求
できるようになる。In summary, the invention described above provides a compressed domain video decoder that reduces the amount of video memory and performs inverse motion compensation in the compressed domain. The memory reduction is achieved by a hybrid data structure configured to store and manipulate the non-zero DCT coefficients of the reference frame used to define the current frame. The hybrid data structure includes a fixed size array having a fixed size block associated with each block of a frame of video data. A variable size overflow vector is provided in the hybrid data structure to hold non-zero coefficients that exceed the capacity of a fixed size block. The memory compression achieved by the compressed domain video decoder is up to a factor of two relative to a spatial domain video decoder. Inverse motion compensation in the compressed domain video decoder is optimized to deliver about 15 to 25 frames per second with video of acceptable quality. Hybrid factorization/integer approximation is applied selectively to the blocks being decoded. The criterion for choosing between the factorization and integer approximation techniques is based on the transform matrix: factorization is applied to half-pixel matrices, and integer approximation to full-pixel matrices. In one embodiment, the compressed domain pipeline described herein can be incorporated into an MPEG-4 Simple Profile video decoder. The embodiments also make it possible to pursue a wide variety of applications, such as power-scalable decoding on battery-powered (CPU-constrained) devices and integration of video conference systems.
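The hybrid data structure summarized above can be sketched as a fixed-size array of per-block slots plus a shared, growable overflow vector. The class below is a minimal illustration under stated assumptions: the class name, the slot capacity of 8, the (position, value) entry layout, and the write-once usage per block are all hypothetical, and the layout in FIG. 10 may differ.

```python
class HybridBlockStore:
    """Fixed-size slots per block plus a variable-size overflow vector (sketch)."""

    def __init__(self, num_blocks, capacity=8):
        self.capacity = capacity  # assumed slot count per block
        self.fixed = [[None] * capacity for _ in range(num_blocks)]
        self.counts = [0] * num_blocks
        self.overflow = []  # entries of (block_index, position, value)

    def store_block(self, block_index, coefficients):
        """Insert the non-zero coefficients of one block (call once per block)."""
        for position, value in enumerate(coefficients):
            if value == 0:
                continue  # zeros are never stored
            used = self.counts[block_index]
            if used < self.capacity:
                self.fixed[block_index][used] = (position, value)
                self.counts[block_index] = used + 1
            else:
                self.overflow.append((block_index, position, value))

    def load_block(self, block_index, size=64):
        """Rebuild the dense coefficient list for one block."""
        dense = [0] * size
        for position, value in self.fixed[block_index][:self.counts[block_index]]:
            dense[position] = value
        for owner, position, value in self.overflow:
            if owner == block_index:
                dense[position] = value
        return dense


store = HybridBlockStore(num_blocks=2)
coeffs = [0] * 64
for i in range(10):
    coeffs[i * 3] = i + 1  # ten non-zero coefficients
store.store_block(0, coeffs)
print(len(store.overflow))
```

With a capacity of 8 and ten non-zero coefficients, two entries spill into the overflow vector, and `load_block(0)` reproduces the original coefficients.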

【００８０】上述の実施例を考慮すれば、本発明は、コ
ンピュータシステムに保持されたデータを必要とするい
ろいろなコンピュータで実行されるオペレーションを採
用することができることが分かる。こうしたオペレーシ
ョンには、物理的数量の物理的操作が含まれる。必ずし
もそうとは限らないが、普通、こうした数量は、保持、
変換、結合、比較、さもなければ操作の対象となり得る
電気信号又は磁気信号の形をとる。さらに、実行される
操作は、生成、識別、判定、又は比較といったような言
葉で示されることが多い。With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transformed, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing.

【００８１】以上説明した本発明は、ハンドヘルド機
器、マイクロプロセッサシステム、マイクロプロセッサ
ベースの或いはプログラマブル消費者向けエレクトロニ
クス、ミニコンピュータ、メインフレームコンピュータ
など、その他のコンピュータシステム構成と共に実施す
ることができる。本発明は、通信ネットワークを経由し
てリンクされている遠隔処理装置によってタスクが実行
される分散型コンピューティング環境において実施する
ことができる。The invention described above can be implemented with other computer system configurations, such as handheld devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention can be implemented in distributed computing environments where tasks are performed by remote processing devices that are linked via a communications network.

【００８２】本発明は、コンピュータ可読媒体上のコン
ピュータ可読コードとして実施することもできる。コン
ピュータ可読媒体は、データを保持し、後からそのデー
タをコンピュータシステムが判読できればどんなデータ
記憶装置でも構わない。コンピュータ可読媒体の例とし
ては、ハードドライブ、ネットワーク接続記憶装置（NA
S）、読み取り専用メモリ、ランダムアクセスメモリ、C
D-ROM、CD-R、CD-RW、磁気テープや、その他の光学式お
よび非光学式データ記憶装置がある。分散してコンピュ
ータ可読コードを保持及び実行されるように、コンピュ
ータ可読媒体をネットワークでつながったコンピュータ
システムで分散することもできる。コンピュータ可読媒
体は、コンピュータコードを含んだ電磁搬送波でも構わ
ない。The present invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium can be any data storage device that stores data which can later be read by a computer system. Examples of computer readable media include hard drives, network attached storage (NAS), read-only memory, random access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over networked computer systems so that the computer readable code is stored and executed in a distributed fashion. The computer readable medium may also be an electromagnetic carrier wave containing computer code.

【００８３】以上、本発明を、はっきり理解できるよう
に、より詳細に説明してきたが、添付した特許請求の範
囲内で変更、修正が可能なことは明らかである。従っ
て、本願の実施例は説明のためで、制約するためのもの
ではない。また、本発明は、本書で説明した詳細に限定
されるものではないけれども、特許請求の範囲内及び特
許請求と同等の範囲内で修正可能である。請求項におい
て、要素及び／又は工程は、請求項にはっきりと明記さ
れていない限り、何らオペレーションの特定の順序を暗
に示しているものではない。While the invention has been described in more detail for a clearer understanding, it will be apparent that changes and modifications can be made within the scope of the appended claims. Accordingly, the embodiments of the present application are illustrative and not restrictive. Also, the present invention is not limited to the details described herein, but can be modified within the scope and equivalents of the claims. In the claims, elements and / or steps do not imply any particular order of operation, unless explicitly stated in the claims.

[Brief description of the drawings]

本発明は、添付の図面と共に以下に述べる詳細な説明に
より容易に理解できるだろう。類似した構造上の要素を
類似の参照番号で示している。The invention will be more readily understood from the following detailed description taken in conjunction with the accompanying drawings. Similar structural elements are indicated by similar reference numerals.

【図１】ビデオデータを復号化すると共に空間領域で
動き補償を実行するためのビデオデコーダの概略図。FIG. 1 is a schematic diagram of a video decoder for decoding video data and performing motion compensation in the spatial domain.

【図２】本発明の１実施例による、逆動き補償が圧縮
領域で実行されるように構成されたビデオデコーダの概
略図。FIG. 2 is a schematic diagram of a video decoder configured to perform inverse motion compensation in a compression domain, according to one embodiment of the present invention.

【図３】空間領域で実行される逆動き補償を説明して
いる概略図。FIG. 3 is a schematic diagram illustrating inverse motion compensation performed in the spatial domain.

【図４】 H.263規格と関連付けられる強制更新メカニ
ズムの有効性を実証するために、複数のフレームのピー
クの信号対雑音比（PSNR）を説明しているグラフ。FIG. 4 is a graph illustrating peak signal-to-noise ratio (PSNR) of multiple frames to demonstrate the effectiveness of a forced update mechanism associated with the H.263 standard.

【図５】 H.263規格における半画素値の判定を説明し
ている概略図。FIG. 5 is a schematic diagram illustrating determination of a half-pixel value in the H.263 standard.

【図６】Ａ：ベースライン空間ビデオデコーダの概略
図。Ｂ：本発明の１実施例による、圧縮領域ビデオデコ
ーダの概略図。FIG. 6A: Schematic diagram of a baseline space video decoder. B: Schematic diagram of a compressed domain video decoder according to one embodiment of the present invention.

【図７】本発明の１実施例による、ビデオ符号化及び
復号化プロセス時のブロック変換を説明しているブロッ
ク図。FIG. 7 is a block diagram illustrating block conversion during a video encoding and decoding process according to one embodiment of the present invention.

【図８】ランレングス表現における各８ｘ８ブロック
の開始位置を見出すために個別のインデックスの使用を
説明している概略図。FIG. 8 is a schematic diagram illustrating the use of a separate index to find the starting position of each 8 × 8 block in a run-length representation.

【図９】Ａ、Ｂ：それぞれ、アレイベースのデータ構
造及びリストデータ構造で、予測に予測誤差を加算する
ために必要な類別及び併合操作を説明する図。FIGS. 9A and 9B are diagrams for explaining classification and merging operations necessary for adding a prediction error to prediction in an array-based data structure and a list data structure, respectively.

【図１０】本発明の１実施例による、メモリ圧縮及び
計算効率性を可能にするアレイ構造及びベクトル構造を
含むハイブリッドデータ構造の概略図。FIG. 10 is a schematic diagram of a hybrid data structure including an array structure and a vector structure that enables memory compression and computational efficiency, according to one embodiment of the present invention.

【図１１】Ａ、Ｂ、Ｃ：本発明の１実施例による、ハ
イブリッドデータ構造の固定サイズアレイの固定サイズ
ブロックの容量及びオーバーフローベクトルを判定する
際に評価される因子を説明しているグラフ。FIGS. 11A, 11B and 11C are graphs illustrating factors evaluated in determining the capacity and overflow vector of fixed size blocks of a fixed size array of a hybrid data structure, according to one embodiment of the present invention.

【図１２】本発明の１実施例による、ビットストリー
ムを復号化するためのメモリ所要量を低減する方法のオ
ペレーションを説明しているフローチャート。FIG. 12 is a flowchart illustrating the operation of a method for reducing memory requirements for decoding a bitstream, according to one embodiment of the present invention.

【図１３】行列乗算を減らすためのブロックアライメ
ントの３つの例を説明している概略図。FIG. 13 is a schematic diagram illustrating three examples of block alignment for reducing matrix multiplication.

【図１４】完璧にアライメントされたDCTブロックの
半画素補間の概略図。FIG. 14 is a schematic diagram of half-pixel interpolation of a perfectly aligned DCT block.

【図１５】本発明の１実施例による、ビデオデータの
処理を向上させる圧縮領域ビデオデコーダの機能ブロッ
クの並べ替えを説明する概略図。FIG. 15 is a schematic diagram illustrating the reordering of functional blocks of a compressed domain video decoder to enhance processing of video data, according to one embodiment of the present invention.

【図１６】本発明の１実施例による、圧縮領域で逆動
き補償を実行するための方法のオペレーションを示すフ
ローチャート。FIG. 16 is a flowchart illustrating operations of a method for performing inverse motion compensation on a compressed domain, according to one embodiment of the invention.

【図１７】本発明の１実施例による、ハイブリッド因
数分解／整数近似技法の選択的な適用の概略図。FIG. 17 is a schematic diagram of the selective application of a hybrid factorization / integer approximation technique according to one embodiment of the present invention.

FIG. 18 is a simplified schematic diagram of a portable electronic device having a decoder circuit configuration that is configured to use a hybrid data structure to minimize memory requirements and to apply a hybrid factorization/integer approximation technique to decode a bit data stream efficiently, according to one embodiment of the present invention.

FIG. 19 is a more detailed schematic diagram of the decoder circuit configuration of FIG. 18, according to one embodiment of the present invention.

[Explanation of symbols]

100, 120: decoder
102, 122: bitstream
104, 124: variable length decoder
106, 132: run length decoder
108, 126: dequantization
110, 134: inverse discrete cosine transform
112, 128: motion compensation
114, 130: memory
116, 136: display
160: half-pixel interpolation

Continuation of front page

(72) Inventor: Vasudev Bhaskaran, 190 North Murphy Avenue, Sunnyvale, California, United States

F-term (reference): 5C059 KK08 MA00 MA05 MA23 MC11 MC38 ME05 NN15 NN21 PP04 SS10 SS20 TA61 TC00 UA05 UA33; 5J064 AA03 BA09 BB05 BC01 BC02 BC08 BC09 BC14 BC23 BD03

Claims

1. A method for reducing the amount of memory required to decode a bitstream, comprising: receiving a video bitstream; decoding a frame of the bitstream into a transform domain representation; identifying non-zero coefficients of the transform domain representation; assembling a hybrid data structure including a fixed size array and a variable size overflow vector; and inserting the non-zero coefficients of the transform domain representation into the hybrid data structure.

2. The method of claim 1, wherein the video bitstream is a low-rate video bitstream.

3. The method of claim 1, wherein decoding the frame of the bitstream into a transform domain representation comprises processing the bitstream through a variable length decoder and a dequantization block.

4. The method of claim 1, wherein the fixed size array includes fixed size blocks.

5. The method of claim 4, wherein the fixed size block is configured to hold eight non-zero coefficients of the transform domain representation.

6. The method of claim 1, wherein inserting the non-zero coefficients of the transform domain representation into the hybrid data structure comprises, for each block of a frame, mapping coefficients in the fixed size array to corresponding coefficients in the variable size overflow vector.
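
The storage scheme of claims 1 and 4 to 6 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the class name `HybridStore`, the `(index, value)` entry layout, the 64-entry block size, and the way overflow entries are tied back to their block through a stored block id are all hypothetical choices made for this example. Only the capacity of eight non-zero coefficients per fixed size block comes from claim 5.

```python
from dataclasses import dataclass, field

BLOCK_CAPACITY = 8  # claim 5: each fixed size block holds eight non-zero coefficients


@dataclass
class HybridStore:
    """Fixed size array of fixed size blocks plus one variable size overflow vector."""
    num_blocks: int
    fixed: list = field(init=False)     # per block: at most BLOCK_CAPACITY (index, value) pairs
    overflow: list = field(init=False)  # (block_id, index, value) entries beyond capacity

    def __post_init__(self):
        self.fixed = [[] for _ in range(self.num_blocks)]
        self.overflow = []

    def insert_block(self, block_id, coeffs):
        """Store only the non-zero entries of one coefficient block."""
        for index, value in enumerate(coeffs):
            if value == 0:
                continue
            slot = self.fixed[block_id]
            if len(slot) < BLOCK_CAPACITY:
                slot.append((index, value))
            else:
                # Hypothetical mapping: the block id links the spilled entry to its block.
                self.overflow.append((block_id, index, value))

    def read_block(self, block_id, size=64):
        """Rebuild the dense coefficient block from both parts of the structure."""
        dense = [0] * size
        for index, value in self.fixed[block_id]:
            dense[index] = value
        for bid, index, value in self.overflow:
            if bid == block_id:
                dense[index] = value
        return dense
```

A block with ten non-zero coefficients would place eight entries in its fixed size block and two in the overflow vector, and `read_block` would return the original dense block.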

7. A method for decoding video data, comprising: receiving a frame of video data in a compressed bitstream; decoding blocks of the frame into a transform domain representation in a compressed domain; defining a hybrid data structure; holding data associated with the transform domain representation in the hybrid data structure; performing inverse motion compensation on the data associated with the transform domain representation in the compressed domain; and, after performing the inverse motion compensation on the data, decompressing the data for display.

8. The method of claim 7, wherein the hybrid data structure includes a fixed size array of fixed size blocks and a variable size overflow vector.

9. The method of claim 7, wherein holding data associated with the transform domain representation in the hybrid data structure comprises: identifying non-zero coefficients of the transform domain representation; holding the non-zero coefficients in a fixed size block of the fixed size array of the hybrid data structure until the capacity of the fixed size block is reached; and, after the capacity of the fixed size block is reached, holding non-zero coefficients exceeding that capacity in the overflow vector.

10. The method of claim 7, wherein the compressed bit stream is a low rate bit stream.

11. The method of claim 7, wherein performing inverse motion compensation on the data associated with the transform domain representation in the compressed domain includes applying a hybrid factorization and integer approximation technique to the data associated with the transform domain representation.

12. A computer-readable medium having program instructions for reordering a low-rate bit stream for retention in a hybrid data structure, comprising: program instructions for identifying non-zero transform coefficients associated with an encoded block of a data frame; program instructions for arranging the non-zero transform coefficients in a fixed size array; program instructions for determining whether the quantity of the non-zero transform coefficients exceeds the capacity of the fixed size array; program instructions for holding non-zero transform coefficients exceeding the capacity of the fixed size array in a variable size overflow vector; and program instructions for translating the non-zero transform coefficients from a compressed domain to a spatial domain.

13. The computer-readable medium of claim 12, wherein the fixed size array includes a plurality of fixed size blocks.

14. The computer-readable medium of claim 13, wherein the fixed-size blocks are each configured to hold eight non-zero transform coefficients.

15. The computer-readable medium of claim 12, further comprising program instructions for mapping, for each block of a data frame, the coefficients of the fixed size array to corresponding coefficients of the variable size overflow vector.

16. The computer-readable medium of claim 12, further comprising program instructions for performing inverse motion compensation on the non-zero transform coefficients using hybrid factorization and integer approximation techniques.

17. A circuit, comprising: a video decoder integrated circuit chip, the video decoder integrated circuit chip including: a circuit configuration for receiving a bit stream of data associated with a frame of video data; a circuit configuration for decoding the bit stream of data into a transform domain representation; a circuit configuration for arranging non-zero transform coefficients of the transform domain representation in a hybrid data structure in a memory associated with the video decoder; and a circuit configuration for decompressing the non-zero transform coefficients of the transform domain representation for display.

18. The circuit according to claim 17, wherein said bit stream is an H.263 bit stream.

19. The circuit according to claim 17, wherein said memory is separate from said video decoder integrated circuit chip.

20. The circuit of claim 17, further comprising circuitry for performing inverse motion compensation by hybrid factorization and integer approximation techniques.

21. The circuit according to claim 17, wherein said memory is a static random access memory.

22. An apparatus configured to display a video image, comprising: a central processing unit (CPU); a random access memory (RAM); a display screen configured to display the image; a decoder circuit configuration configured to convert a video bit stream into a transform domain representation, wherein the decoder circuit configuration is capable of arranging non-zero transform coefficients of the transform domain representation in a hybrid data structure in a memory associated with the decoder circuit configuration, and wherein the decoder circuit configuration includes a circuit configuration for selectively applying a hybrid factorization/integer approximation technique during inverse motion compensation; and a bus in communication with the CPU, the RAM, the display screen, and the decoder circuit configuration.

23. The device according to claim 22, wherein the device is a portable electronic device.

24. The device of claim 23, wherein the portable electronic device is selected from the group consisting of a personal digital assistant, a cellular phone, a web tablet, and a pocket personal computer.

25. The apparatus of claim 22, wherein the hybrid data structure includes a fixed size array having a plurality of fixed size blocks and a variable size overflow vector.

26. The apparatus of claim 25, wherein the plurality of fixed size blocks are each configured to hold eight non-zero transform coefficients.

27. The apparatus of claim 26, wherein non-zero transform coefficients in excess of eight are held in the variable size overflow vector.

28. The apparatus of claim 22, wherein the decoder circuitry includes an on-chip memory configured to hold data associated with the hybrid data structure.

29. The apparatus, wherein the circuit configuration for selectively applying the hybrid factorization/integer approximation technique during the inverse motion compensation includes: a circuit configuration for identifying whether a block of a frame of the video image is associated with an active motion area or an inactive motion area; and a circuit configuration for performing the inverse motion compensation by applying the factorization technique to blocks associated with active motion areas and the integer approximation technique to blocks associated with inactive motion areas.

30. The apparatus of claim 22, wherein the video bit stream is a low-rate video bit stream.

31. A method for performing inverse motion compensation, comprising: receiving a video bitstream; identifying a transform matrix type selected from a group consisting of a half-pixel matrix and a full pixel matrix; if the transform matrix type is a half-pixel matrix, applying a factorization technique to decode the bitstream corresponding to the half-pixel matrix; and, if the transform matrix type is a full pixel matrix, applying an integer approximation technique to decode the bitstream corresponding to the full pixel matrix.

32. The method of claim 31, wherein the video bit stream is a low rate video bit stream.

33. The method of claim 31, wherein applying a factorization technique to decode the bitstream corresponding to the half-pixel matrix includes factorizing the half-pixel matrix into a series of sparse matrices, the sparse matrices including permutation matrices and diagonal matrices.

34. The method of claim 31, wherein applying an integer approximation technique to decode the bitstream corresponding to the full pixel matrix includes approximating each element of the full pixel matrix with a binary number.

35. The method of claim 34, wherein each element is rounded to the nearest power of two.
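
One way to picture the integer approximation of claims 34 and 35 is as fixed-point arithmetic: each matrix element is replaced by a nearby binary fraction so that a matrix product needs only integer multiplies, adds, and shifts. The Python sketch below is an illustration under assumptions rather than the claimed procedure. The function names, the choice of five fractional bits, and the rounding of each element to the nearest multiple of 2^-5 are hypothetical; the claims do not state a precision.

```python
FRACTION_BITS = 5  # assumed precision; the claims do not fix a value


def to_fixed(matrix, bits=FRACTION_BITS):
    """Round every element to the nearest multiple of 2**-bits, stored as an integer."""
    scale = 1 << bits
    return [[int(round(x * scale)) for x in row] for row in matrix]


def fixed_matvec(fixed_matrix, vector, bits=FRACTION_BITS):
    """Multiply with integer arithmetic only, then shift back with rounding."""
    half = 1 << (bits - 1)
    return [
        (sum(m * v for m, v in zip(row, vector)) + half) >> bits
        for row in fixed_matrix
    ]
```

With this precision an element such as 0.7071 is stored as the integer 23, that is 23/32, and the final right shift takes the place of a floating-point multiplication. Coarser or finer precision trades accuracy against the width of the integer arithmetic.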

36. A method for decoding video data, comprising: receiving a frame of video data in a compressed bitstream; decoding blocks of the frame into a transform domain representation in a compressed domain; holding data associated with the transform domain representation in a hybrid data structure; and performing inverse motion compensation on the data associated with the transform domain representation in the compressed domain, wherein performing the inverse motion compensation comprises: determining a type of transformation matrix associated with a portion of the frame of video data; and applying a hybrid factorization and integer approximation technique to improve the inverse motion compensation.

37. The method of claim 36, wherein the compressed bit stream is associated with a standard selected from the group consisting of H.26[…], H.261, and Motion Picture Expert Group (MPEG).

38. The method of claim 36, wherein the hybrid data structure includes a fixed size array and a variable size overflow vector.

39. The method of claim 36, wherein the type of the transformation matrix is selected from a group consisting of a half pixel matrix and a full pixel matrix.

40. The method of claim 39, wherein the half-pixel matrix is associated with a high motion region of the image, while the full pixel matrix is associated with a minimal motion region of the image.

41. The method of claim 36, wherein applying a hybrid factorization and integer approximation technique to improve the inverse motion compensation comprises: applying a factorization technique to a matrix associated with a block corresponding to a high motion region of the frame; and applying an integer approximation technique to the remaining blocks.

42. The method of claim 36, wherein said compressed bit stream is a low rate bit stream.

43. A computer readable medium having program instructions for performing inverse motion compensation in a compressed domain, comprising: program instructions for identifying a transformation matrix; program instructions for determining whether the transformation matrix is a half-pixel matrix or a full pixel matrix; program instructions for applying a factorization technique to decode a block of a bitstream corresponding to the half-pixel matrix; and program instructions for applying an integer approximation technique to decode a block of the bitstream corresponding to the full pixel matrix.

44. The computer-readable medium of claim 43, wherein the program instructions for performing the inverse motion compensation are executed in a compressed domain.

45. The computer readable medium of claim 43, further comprising program instructions for extracting motion vector data, wherein the motion vector data identifies the transformation matrix as one of a half-pixel matrix and a full pixel matrix.

46. The computer-readable medium of claim 43, further comprising program instructions for arranging non-zero transform coefficients associated with the encoded data frame block into a hybrid data structure.

47. The computer readable medium of claim 43, wherein the program instructions for applying an integer approximation technique to decode a block of the bitstream corresponding to the full pixel matrix include program instructions for approximating each element of the full pixel matrix with a binary number.

48. The computer-readable medium of claim 43, wherein the program instructions for applying a factorization technique to decode a block of the bitstream corresponding to the half-pixel matrix include program instructions for factorizing the half-pixel matrix into a series of sparse matrices, the sparse matrices including permutation matrices and diagonal matrices.
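
The selection step shared by claims 31, 43 and 45 can be sketched in Python as a small dispatch on the motion vector. This is a hedged illustration, not the claimed implementation: it assumes motion vector components are integers expressed in half-pixel units, so that an odd component signals a half-pixel displacement, and the function names and returned labels are hypothetical.

```python
def matrix_type(mv_x, mv_y):
    """Classify a motion vector given in half-pixel units (an assumption of this sketch)."""
    # An odd component means the displacement falls between full pixel positions.
    return "half_pixel" if (mv_x % 2 or mv_y % 2) else "full_pixel"


def select_technique(mv_x, mv_y):
    """Half-pixel matrices get factorization; full pixel matrices get integer approximation."""
    if matrix_type(mv_x, mv_y) == "half_pixel":
        return "factorization"
    return "integer_approximation"
```

For example, a motion vector of (3, 0) half-pixel units would select the factorization path, while (-4, 2) would select integer approximation.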

49. A circuit, comprising: an integrated circuit chip configured to decode video data, the integrated circuit chip including: a circuit configuration for receiving a bit stream of data associated with a frame of video data; a circuit configuration for decoding the bit stream of data into a transform domain representation; a circuit configuration for identifying a type of transform matrix; and a circuit configuration for performing inverse motion compensation by hybrid factorization and integer approximation techniques.

50. The circuit of claim 49, wherein said integrated circuit chip further has a circuit configuration for arranging non-zero transform coefficients of said transform domain representation in a hybrid data structure.

51. The circuit according to claim 49, wherein said bit stream is a low rate bit stream.

52. The circuit of claim 49, wherein the circuit configuration for performing inverse motion compensation by the hybrid factorization and integer approximation techniques is configured to apply a factorization technique to a half-pixel transformation matrix and an integer approximation technique to a full pixel transformation matrix.

53. The circuit of claim 49, further comprising a memory in communication with said integrated circuit chip.

54. The circuit of claim 49, wherein said factorization and integer approximation techniques are applied to data in a compressed domain.

55. A video decoder, comprising: a variable length decoder (VLD) configured to extract coefficient values and motion vector data from an incoming bit stream; a dequantization block in communication with the VLD, the dequantization block being configured to rescale the coefficient values; a downstream branch in communication with the dequantization block, the downstream branch being configured to decode error coefficients into a spatial domain; and an upstream branch in communication with the dequantization block, the upstream branch being configured to maintain an internal transform domain representation and to generate a spatial domain output that can be added to the decoded error coefficients to reconstruct a current block.

56. The video decoder according to claim 55, wherein said video decoder is implemented as software.

57. The video decoder according to claim 55, wherein said video decoder is implemented as hardware.

58. The video decoder according to claim 55, wherein said incoming bit stream is a low rate bit stream.

59. The video decoder according to claim 55, wherein the upstream branch includes a feedback loop, and the feedback loop includes a frame buffer, a motion compensation block, and a discrete cosine transform block.

60. The video decoder of claim 55, wherein the downstream branch includes a run length decoding block and an inverse compensation block.

61. The video decoder according to claim 55, wherein an inverse motion compensation operation is performed in a compressed domain.

62. The video decoder of claim 55, wherein non-zero coefficients of the transform domain representation are arranged in a hybrid data structure in a memory associated with the video decoder to reduce memory requirements.

63. The video decoder of claim 62, wherein the hybrid data structure includes a fixed size array and a variable size overflow vector.

64. The video decoder of claim 61, wherein the inverse motion compensation includes hybrid factorization and integer approximation techniques.

65. The video decoder of claim 64, wherein the hybrid factorization and integer approximation technique is configured to apply a factorization technique to a half-pixel transformation matrix and an integer approximation technique to a full pixel transformation matrix.