JP2001527352A

JP2001527352A - Partial decoding of compressed video sequences

Info

Publication number: JP2001527352A
Application number: JP2000526055A
Authority: JP
Inventors: Stuart Jay Golin; Charles Martin Wine
Original assignee: Sarnoff Corporation
Priority date: 1997-12-23
Filing date: 1998-12-22
Publication date: 2001-12-25
Also published as: CA2310652C; CA2310652A1; WO1999033275A1; AU1937799A; KR20010033550A; EP1048174A4; EP1048174A1

Abstract

(57) [Abstract] A compressed video sequence generated using a transform (e.g., a DCT) is partially decoded (202) to recover low-frequency transform coefficients (e.g., only the DC coefficients) from the encoded bitstream (204). Blocks of image data are then generated from the low-frequency coefficients, and motion-compensated inter-frame addition is applied to the blocks of image data (206) to produce partially decoded images. The partially decoded images can be used for subsequent processing, such as histogram analysis for video parsing (208). The invention yields satisfactory results while avoiding computationally expensive inverse transform operations.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] (Field of the Invention) The present invention relates to video processing and, more particularly, to the decoding of compressed video sequences.

[0002] (Cross-Reference to Related Applications) This application claims the benefit of the filing date of U.S. Provisional Application No. 60/068,774, filed December 23, 1997.

[0003] (Statement Regarding Federally Sponsored Research or Development) The Government of the United States of America has certain rights in at least part of this invention pursuant to Government Contract No. MDA-904-95-C-3126.

[0004] (Description of the Related Art) A video image (i.e., a frame) in a digital video sequence is typically represented by an array of picture elements, or pixels, where each pixel is represented by one or more components. For example, in a monochrome gray-scale image, each pixel is represented by a single component whose value corresponds to the pixel's intensity. In the RGB color format, each pixel is represented by a red component (R), a green component (G), and a blue component (B). Similarly, in the YUV color format, each pixel is represented by an intensity (or luminance) component Y and two color (or chrominance) components U and V. In 24-bit versions of these color formats, each pixel component is represented by an 8-bit value.

[0005] A typical video sequence is made up of sets of consecutive frames called shots, where the frames of a given shot correspond to the same basic scene. A shot is an unbroken sequence of frames from one camera. There is currently considerable interest in parsing digital video into its constituent shots. This interest is driven by the growing availability of video material and the rapidly increasing need to index video databases. Video parsing is also useful for video editing and compression.

[0006] One way to identify the different shots in a digital video sequence is to analyze histograms corresponding to the video frames. A frame histogram represents the distribution of component values for one frame of video data. For example, a 32-bin histogram can be generated for the 8-bit Y components of a video frame represented in a 24-bit YUV color format, where the first bin indicates how many pixels in the frame have Y values from 0 to 7, the second bin indicates how many pixels have Y values from 8 to 15, and so on, up to the 32nd bin, which indicates how many pixels have Y values from 248 to 255.
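The 32-bin histogram described above can be sketched in a few lines. This is only an illustrative sketch: the function name `luma_histogram` and the flat list of Y values are assumptions made for the example, not something the patent specifies.

```python
def luma_histogram(y_values, num_bins=32):
    """Count 8-bit Y values into equal-width bins (32 bins gives a width of 8)."""
    bin_width = 256 // num_bins
    hist = [0] * num_bins
    for y in y_values:
        hist[y // bin_width] += 1  # Y in 0..7 -> bin 0, 8..15 -> bin 1, ...
    return hist


# Hypothetical frame with six pixels; two fall in each of bins 0, 1 and 31.
hist = luma_histogram([0, 7, 8, 15, 248, 255])
print(hist[0], hist[1], hist[31])
```

Integer division by the bin width is equivalent to keeping the 5 most significant bits of each 8-bit value.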

[0007] Multi-dimensional histograms can also be generated from combinations of bits from different components. For example, a three-dimensional (3D) histogram can be generated for YUV data using the 4 most significant bits (MSBs) of the Y components and the 3 MSBs of the U and V components. Each bin in the 3D histogram then corresponds to the number of pixels in the frame having the same 4 MSBs of Y, the same 3 MSBs of U, and the same 3 MSBs of V.
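As a rough illustration of the 3D histogram, the 4 + 3 + 3 retained bits can be packed into a single 10-bit bin index, giving 1024 bins. The packing order and the function names below are assumptions chosen for this sketch; the text only states which bits are used.

```python
def bin_index_3d(y, u, v):
    """Pack the 4 MSBs of Y and the 3 MSBs of U and V into a 10-bit index."""
    return ((y >> 4) << 6) | ((u >> 5) << 3) | (v >> 5)


def histogram_3d(pixels):
    """Build a 1024-bin histogram from an iterable of 8-bit (Y, U, V) triples."""
    hist = [0] * 1024
    for y, u, v in pixels:
        hist[bin_index_3d(y, u, v)] += 1
    return hist


# Two pixels that share the same MSBs land in the same bin.
hist = histogram_3d([(16, 32, 32), (31, 63, 63), (255, 255, 255)])
```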

[0008] A general characteristic of video sequences is that, for certain types of histograms, the histograms for frames within a given shot are typically more similar to one another than they are to the histograms for frames in shots corresponding to different scenes. One way to parse digital video into its constituent shots is therefore to look for transitions between shots by comparing the histograms generated for the frames of the video sequence. In general, frames with similar histograms are more likely to correspond to the same scene than frames with dissimilar histograms.
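The comparison of consecutive frame histograms might look like the following sketch. The text does not name a particular difference measure or decision rule, so the sum of absolute bin differences and the fixed `threshold` used here are assumptions made purely for illustration.

```python
def histogram_difference(h1, h2):
    """Sum of absolute bin differences (an assumed metric, not from the patent)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_transitions(histograms, threshold):
    """Return indices i where frame i differs strongly from frame i - 1."""
    return [
        i
        for i in range(1, len(histograms))
        if histogram_difference(histograms[i - 1], histograms[i]) > threshold
    ]


# Toy two-bin histograms: the large jump between frames 2 and 3 is flagged.
frames = [[4, 0], [4, 0], [3, 1], [0, 4], [0, 4]]
print(find_transitions(frames, threshold=4))
```

A threshold that is too low produces false positives and one that is too high produces false negatives, which is the trade-off discussed later in the description.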

[0009] Video sequences are typically stored and/or transmitted in compressed digital form, in which the original pixel data are processed to exploit the redundancy that exists both within and between video frames. There are many well-known algorithms for compressing video data. Some of these employ a transform, such as the discrete cosine transform (DCT), to convert pixel intensity data into transform coefficients in a spatial frequency domain. The DCT is applied either directly to the pixel values or, when motion estimation is employed, to inter-frame pixel differences, that is, differences between pixel data in the current frame and motion-compensated pixel data from a reference frame. Inter-frame differencing may be performed with or without motion estimation and motion compensation. After the DCT is applied to the appropriate pixel data, the resulting DCT coefficients are quantized and then run-length (RL) encoded, for example using a zig-zag RL pattern. The RL-encoded data are then optionally further encoded for storage and/or transmission as a compressed video data stream. The motion vectors derived during motion estimation and used in motion-compensated inter-frame differencing are also encoded into the compressed video data stream. For multi-component video data, such as RGB or YUV data, the planes of the different pixel component values for each video frame are typically compressed separately.
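The zig-zag scan and run-length step can be illustrated with a small sketch. It assumes the JPEG-style zig-zag order and a simple (zero run, level) pair format with trailing zeros left implicit; the patent does not fix either detail, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in JPEG-style zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals; odd diagonals run downward, even ones upward.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_length_encode(block):
    """Encode a quantized block as (zeros skipped, nonzero level) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are implied by the end of the block


# Hypothetical quantized block: DC value 12 plus two small AC coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[0][2] = 12, 3, -1
print(run_length_encode(block))
```

In this sketch the DC coefficient at position (0, 0) is the first value in the scan, so it is available before any of the higher-frequency coefficients are examined.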

[0010] To recover a video sequence from a compressed video stream, the encoding steps are reversed to reconstruct complete video images suitable for display. For example, for the compression algorithm described above, the compressed data are RL decoded and dequantized as needed to recover decoded DCT coefficients. An inverse DCT is then applied to the decoded DCT coefficients to recover pixel data. If the pixel data correspond to inter-frame differences, motion-compensated inter-frame addition is performed using the motion vectors decoded from the compressed video stream. The result is the decoded pixel intensity values for the current frame of the decoded video sequence.

[0011] One way to parse a compressed video sequence into its constituent shots is to fully decompress the compressed video stream to recover fully decoded video frames and then apply histogram analysis to the decompressed video sequence. This approach is computationally expensive and slow, because operations such as the inverse DCT are very time-consuming.

[0012] Another technique for parsing compressed video sequences generated with transform-based compression algorithms, such as those using the DCT, relies on a low-resolution decoding scheme. In this scheme, all DCT coefficients in the compressed video stream are ignored except the lowest spatial-frequency coefficient (i.e., the DC coefficient of the DCT), and only the DC coefficients are used to generate a low-resolution decoded image for each frame of the compressed video stream. Each pixel of the low-resolution decoded image is the DC coefficient for a block of pixels in the corresponding frame of the original video sequence. Histogram analysis is then applied to the low-resolution images.

[0013] FIGS. 1(A) and 1(B) show, respectively, an original frame 102 of an original video sequence and a low-resolution frame 104 of the corresponding low-resolution decoded video sequence. Original frame 102 has 480 rows and 512 columns of pixels. Under one possible compression algorithm, original frame 102 is divided into (8×8) blocks of pixels, giving 60 rows and 64 columns of (8×8) blocks. Motion estimation is performed for each (8×8) block to identify a motion vector that relates the block to a corresponding (8×8) block of a reference frame (not shown). Motion-compensated inter-frame differencing is applied to generate an (8×8) block of inter-frame pixel differences for each (8×8) block of original frame 102. An (8×8) DCT is applied to each (8×8) block of inter-frame pixel differences to generate an (8×8) block of DCT coefficients, in which the DC coefficient is conventionally located in the upper left corner. Each (8×8) block of DCT coefficients is then quantized, run-length encoded, and possibly further encoded, along with the motion vectors, for storage and/or transmission as the compressed video stream.

[0014] Under the low-resolution decoding scheme, to parse the compressed video stream (e.g., to identify transitions between shots), run-length decoding is applied to the compressed video data to recover the quantized DCT coefficients. Where appropriate, the quantized DC coefficient from each set of DCT coefficients is dequantized, and motion-compensated inter-frame addition is applied to the decoded DC coefficients (using decoded motion vectors appropriately scaled by a factor of 8) to generate low-resolution frame 104 of FIG. 1. Each pixel of low-resolution frame 104 corresponds only to a recovered DC coefficient. Each (8×8) block of original frame 102 is thus represented by a single pixel in low-resolution frame 104, which therefore has only 60 (i.e., 480/8) rows and 64 (i.e., 512/8) columns of pixels. Because the DC coefficient of the DCT is equal to the average intensity of the 64 pixels in the corresponding (8×8) pixel block, low-resolution frame 104 is a low-resolution approximation of original frame 102.
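The relationship between the original frame and the low-resolution frame can be sketched by averaging each (8×8) block directly. Two assumptions apply. First, the sketch treats the frame as intra-coded, so no motion-compensated addition is involved. Second, it uses the block mean as a stand-in for the dequantized DC coefficient. The two agree only up to a normalization constant: for an orthonormal (8×8) DCT, for example, the DC coefficient is 8 times the block mean.

```python
def low_resolution_frame(frame, block=8):
    """Replace each block x block tile of a frame by its (integer) mean value."""
    rows, cols = len(frame), len(frame[0])
    return [
        [
            sum(frame[r + i][c + j] for i in range(block) for j in range(block))
            // (block * block)
            for c in range(0, cols, block)
        ]
        for r in range(0, rows, block)
    ]


# Hypothetical 480 x 512 frame whose value is constant inside every 8 x 8 block.
frame = [[(r // 8) * 10 + (c // 8) for c in range(512)] for r in range(480)]
low = low_resolution_frame(frame)
print(len(low), len(low[0]))  # 60 64
```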

[0015] Histogram analysis is then applied to the sequence of low-resolution frames to parse the compressed video sequence into its constituent shots. Because this low-resolution decoding scheme avoids the computationally expensive inverse DCT, the compressed video sequence can be parsed faster and more cheaply than when histogram analysis is applied to fully decoded images generated with the inverse DCT. Unfortunately, the resolution of frames decoded with this conventional low-resolution scheme is so low that accurate parsing results are not obtained: the scheme produces too many false positives (i.e., identifying shot transitions that do not actually exist in the original video sequence) and/or false negatives (i.e., missing real transitions in the original video sequence).

[0016] (Summary of the Invention) The present invention is directed to a scheme for partially decoding compressed video streams for applications such as video parsing. According to one embodiment, the compressed video stream is decoded to recover one or more low-frequency transform coefficients for each block of original image data. A block of low-frequency image data is generated from each set of low-frequency transform coefficients corresponding to each block of original image data. Motion-compensated inter-frame differencing is applied to each block of low-frequency image data to generate a partially decoded image for each frame of the compressed video stream.

[0017] (Detailed Description) According to embodiments of the present invention, a transform-based compressed video stream is partially decoded to generate partially decoded images, which are then subjected to subsequent processing, such as histogram analysis for video parsing. The partially decoded images are generated by forming blocks using only the decoded DC transform coefficients recovered from the compressed video stream.

[0018] FIG. 2 shows a flow diagram of the processing according to one embodiment of the present invention. The compressed video stream is partially decoded to recover the DC coefficients of the encoded transform coefficients (step 202 of FIG. 2). For example, if the compressed video stream was encoded using an (8×8) DCT, the decoding of step 202 (e.g., RL decoding and possibly dequantization) means decoding the compressed video stream only as far as needed to recover from the bitstream the decoded DC DCT coefficient corresponding to each (8×8) block of pixels in the original video sequence.

[0019] Blocks of image data are then generated using only the DC coefficients, by replicating the corresponding DC coefficient for every pixel of the block (step 204). In one embodiment, each block is a sub-block, meaning that it is smaller than the corresponding region of the original video sequence used to generate the transform coefficients. For example, if the DCT was applied to (8×8) blocks of the original video sequence, the sub-blocks generated in step 204 may be (2×2) or (4×4), although other sizes are also possible. Alternatively, the blocks of replicated DC coefficients may be the same size as the transform (e.g., 8×8).
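Step 204 amounts to expanding a grid of DC values into sub-blocks of identical pixels. The sketch below assumes the DC coefficients are already available as a 2-D list with one value per (8×8) block; the name `replicate_dc` and the `sub` parameter are hypothetical.

```python
def replicate_dc(dc_grid, sub=4):
    """Expand each DC value into a sub x sub block of identical pixels."""
    image = []
    for dc_row in dc_grid:
        pixel_row = [dc for dc in dc_row for _ in range(sub)]
        # Emit `sub` independent copies of the expanded row.
        image.extend(list(pixel_row) for _ in range(sub))
    return image


# A 60 x 64 grid of DC values with (4 x 4) sub-blocks yields a 240 x 256 image.
image = replicate_dc([[0] * 64 for _ in range(60)], sub=4)
print(len(image), len(image[0]))  # 240 256
```

With `sub=1` the result is the prior-art low-resolution frame, and with `sub=8` it has the same dimensions as the original frame.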

[0020] Where appropriate, motion-compensated inter-frame addition is then performed to generate the partially decoded images (step 206). The partially decoded images are then subjected to further processing, such as histogram analysis for video parsing (step 208). If the blocks of replicated DC coefficients are smaller than the original transform size, the decoded motion vectors used in the motion-compensated inter-frame addition must be scaled accordingly. One advantage of using (4×4) or (2×2) sub-blocks is that the motion vectors can be scaled by a factor of 2 or 4, respectively, simply by shifting bits, without performing any division.
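The bit-shift scaling of motion vectors can be sketched as follows, assuming motion vectors are integer (dx, dy) pairs in full-resolution pixel units and that the sub-block size is a power of two dividing the transform size. The rounding behaviour is also an assumption: Python's `>>` rounds toward negative infinity, and a real decoder might round differently.

```python
def scale_motion_vector(mv, sub=4, transform=8):
    """Scale a full-resolution motion vector down to sub-block resolution."""
    # transform // sub is 2 for (4 x 4) and 4 for (2 x 2); take its log2 as the shift.
    shift = (transform // sub).bit_length() - 1
    dx, dy = mv
    return (dx >> shift, dy >> shift)  # arithmetic shift, floors negative values


print(scale_motion_vector((6, -4), sub=4))  # (3, -2): divided by 2
print(scale_motion_vector((6, -4), sub=2))  # (1, -1): divided by 4
```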

[0021] FIGS. 3(A) and 3(B) show, respectively, an original frame 302 of an original video sequence and a partially decoded frame 304 of the corresponding partially decoded video sequence, produced by the processing of FIG. 2. Like frame 102 of FIG. 1(A), original frame 302 is a (480×512) frame. Partially decoded image 304 is generated from (4×4) blocks of replicated DC coefficients and therefore has 240 rows and 256 columns of pixels.

[0022] FIG. 3(C) shows a (4×4) block 306 of replicated DC coefficients, representing a sub-block used to generate partially decoded image 304 of FIG. 3(B). Every pixel in a given sub-block of replicated DC coefficients contains the same information. However, when motion-compensated inter-frame addition is performed using appropriately scaled decoded motion vectors, the corresponding sub-block in the resulting partially decoded image will typically not contain replicas of the same data, because most motion vectors have components that are not integer multiples of 4.
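The effect described here, where a uniform sub-block stops being uniform after motion-compensated addition, can be reproduced with a toy example. The sketch assumes the reference image and the replicated-DC residual are plain 2-D lists and that the motion vector is already scaled to sub-block resolution. The function name is hypothetical and no bounds checking is done.

```python
def motion_compensated_add(reference, residual, top, left, mv):
    """Add a residual block to the reference block displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [
        [
            reference[top + dy + i][left + dx + j] + residual[i][j]
            for j in range(len(residual[0]))
        ]
        for i in range(len(residual))
    ]


# Reference made of four uniform (4 x 4) sub-blocks with values 10, 20, 30, 40.
reference = [[10] * 4 + [20] * 4 for _ in range(4)] + [
    [30] * 4 + [40] * 4 for _ in range(4)
]
residual = [[5] * 4 for _ in range(4)]  # replicated DC value of the difference

print(motion_compensated_add(reference, residual, 0, 0, (2, 0))[0])  # [15, 15, 25, 25]
print(motion_compensated_add(reference, residual, 0, 0, (4, 0))[0])  # [25, 25, 25, 25]
```

A displacement of 2 straddles two reference sub-blocks and yields mixed values, while a displacement of 4 lands exactly on a sub-block boundary and keeps the result uniform.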

[0023] The inventors have found that this partial decoding scheme parses compressed video streams better (i.e., with fewer false positives and/or false negatives) than the prior-art low-resolution decoding scheme. It imposes some additional computational load, for example because motion-compensated inter-frame addition is applied to larger decoded images, but it shares with the prior-art low-resolution decoding scheme the advantage of avoiding the inverse transform. Depending on the processing constraints, the invention can therefore provide better results for an affordable increase in computational cost.

[0024] FIGS. 4(A)-(D) plot inter-frame histogram differences against frame number for a 1500-frame test sequence encoded with the Px64 video compression scheme, for four different DC block sizes (the Y-axis scales are arbitrary). FIG. 4(A) corresponds to prior-art processing, in which each (8×8) pixel block is represented by a single DC value (i.e., a (1×1) block) in the low-resolution image. FIG. 4(B) corresponds to processing according to the present invention in which each (8×8) pixel block is represented by a (2×2) block of replicated DC values in the partially decoded image. Similarly, FIGS. 4(C) and 4(D) correspond to processing according to the invention in which each (8×8) pixel block is represented by a (4×4) block and an (8×8) block, respectively, of replicated DC values in the partially decoded image. The high peaks in these figures indicate scene changes in the video sequence.

[0025] FIGS. 4(A)-(D) show that the background noise level decreases as the block of replicated DC values grows from (1×1) to (8×8). This is one advantage of the present invention over the prior-art low-resolution scheme of FIG. 4(A).

[0026] Although the invention has been described in the context of a two-dimensional (8×8) DCT, those skilled in the art will understand that it can equally be implemented with other DCTs, such as two-dimensional DCTs of sizes other than (8×8) or one-dimensional DCTs, or with other transforms, such as one-dimensional or two-dimensional slant or Haar transforms. Similarly, although the invention has been described in the context of motion estimation and motion compensation performed on (8×8) blocks of pixel data, it will be understood that motion analysis can be performed on blocks of other sizes, which may differ from the transform size. For example, in common video compression schemes, motion estimation and motion compensation are performed on (16×16) blocks of pixel data even though the transform is an (8×8) DCT.

[0027] Furthermore, although the invention has been described in the context of generating partially decoded images using only the DC transform coefficients, those skilled in the art will understand that, in alternative embodiments, two or more low-frequency transform coefficients (including the DC coefficient) can be used to generate the partially decoded images.

[0028] Similarly, although the invention has been described in the context of video parsing based on histogram analysis, those skilled in the art will understand that it can also be used for other applications in which low resolution is acceptable, such as picture-in-picture generation, fast-forward playback of video sequences, target recognition, and motion detection.

[0029] The present invention can be embodied in the form of methods and apparatuses for practicing those methods. It can also be embodied in the form of program code embodied in tangible media, such as floppy disks, CD-ROMs, hard disk drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The invention can also be embodied in the form of program code, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as electrical wiring or cabling, optical fiber, or electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.

[0030] It will be further understood that various changes in the details, materials, and arrangements of the parts that have been described and illustrated to explain the nature of this invention may be made by those skilled in the art without departing from the principle and scope of the invention as expressed in the following claims.

[Brief description of the drawings]

Other aspects, features, and advantages of the present invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings.

[FIG. 1] (A) and (B) show, respectively, an original frame of an original video sequence and a low-resolution frame of the corresponding low-resolution decoded video sequence.

[FIG. 2] A flow diagram of the processing according to one embodiment of the present invention.

[FIG. 3] (A) and (B) show, respectively, an original frame of an original video sequence and a partially decoded frame of the corresponding partially decoded video sequence, according to the processing of FIG. 2; (C) shows a (4×4) block of replicated DC coefficients, representing a sub-block used to generate the partially decoded image of FIG. 3(B).

[FIG. 4] (A)-(D) plot inter-frame histogram differences against frame number for a 1500-frame test sequence encoded with the Px64 video compression scheme, for four different DC block sizes (the Y-axis scales are arbitrary).

Continuation of front page

(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, GH, GM, HR, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, UA, UG, UZ, VN, YU, ZW

F-terms (reference): 5C059 KK01 KK40 MA23 PP04 RF09 SS20 TA76 TB02 TC04 TD10 TD11

Claims

[Claims]

1. A method for partially decoding a transform-based compressed video stream, comprising the steps of: (a) decoding the compressed video stream to recover one or more low-frequency transform coefficients for each block of original image data; (b) generating a block of low-frequency image data from each set of low-frequency transform coefficients corresponding to each block of original image data; and (c) applying motion-compensated inter-frame differencing to each block of low-frequency image data to generate a partially decoded image for each frame of the compressed video stream.

2. The invention of claim 1, further comprising applying a histogram analysis to the partially decoded image to analyze the compressed video stream.

3. The invention according to claim 1, wherein each block of the low-frequency image data is smaller than the transform size.

4. The invention according to claim 1, wherein the transform is a discrete cosine transform (DCT).

5. The invention of claim 4, wherein the transform is an (8×8) DCT and each block of low-frequency image data is either (2×2) or (4×4).

6. The invention of claim 5, wherein each block of low-frequency image data is smaller than the size of the transform, further comprising the step of applying histogram analysis to the partially decoded images to parse the compressed video stream.

7. The invention of claim 1, wherein step (a) comprises the step of decoding the compressed video stream to recover only the DC transform coefficient for each block of original image data, and step (b) comprises the step of generating a block of image data.

8. The invention of claim 7, wherein each block of low frequency image data is generated by duplicating a corresponding DC transform coefficient.

9. An apparatus for partially decoding a transform-based compressed video stream, comprising: (a) means for decoding the compressed video stream to recover one or more low-frequency transform coefficients for each block of original image data; (b) means for generating a block of low-frequency image data from each set of low-frequency transform coefficients corresponding to each block of original image data; and (c) means for applying motion-compensated inter-frame differencing to each block of low-frequency image data to generate a partially decoded image for each frame of the compressed video stream.

10. A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform a method for partially decoding a transform-based compressed video stream, the method comprising the steps of: (a) decoding the compressed video stream to recover one or more low-frequency transform coefficients for each block of original image data; (b) generating a block of low-frequency image data from each set of low-frequency transform coefficients corresponding to each block of original image data; and (c) applying motion-compensated inter-frame differencing to each block of low-frequency image data to generate a partially decoded image for each frame of the compressed video stream.