JP2006261950A

JP2006261950A - Method for estimating movement of block matching in frequency region

Info

Publication number: JP2006261950A
Application number: JP2005075327A
Authority: JP
Inventors: Yung-Sen Chen; 陳永森; De-Yu Kao; 高得▼よ▲; Ying-Yuan Tang; 唐英原
Original assignee: Princeton Technology Corp
Current assignee: Princeton Technology Corp
Priority date: 2005-03-16
Filing date: 2005-03-16
Publication date: 2006-09-28

Abstract

PROBLEM TO BE SOLVED: To provide a method that generates a motion vector from low-frequency information when performing block matching for motion estimation, thereby reducing the amount of computation. SOLUTION: With only a very small change to the hardware and minor modifications to the video compression method, motion estimation is performed in the frequency domain. Because the human eye is less sensitive to high-frequency content than to low-frequency content, the motion vector can be searched for using only the low-frequency data. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a method of generating a motion vector from low-frequency information during block matching for motion estimation. Block matching is applied only to the low frequencies rather than to the entire frequency range, so the method reduces the computational complexity of digital video processing.

Digital video shown on the screens of computers, televisions, mobile phones and similar devices is compressed to reduce memory space or transmission bandwidth. The most commonly used digital video compression standards today are MPEG-2, MPEG-4, AVS and H.264. Each of these standards uses "motion estimation" to exploit the correlation between preceding and succeeding frames in the time domain and thereby compress the amount of data. Continuous video generally needs 20 to 30 frames per second for motion to appear smooth. Motion estimation determines how matching image content moves between two consecutive frames.
In motion estimation, a frame is divided into macroblocks (MBs), each a 16 × 16 matrix of 256 pixels. For each MB, the optimal motion vector relative to the previous frame is then found. Referring to FIG. 1, frame (a) and frame (b) are two consecutive frames. When frame (b) is transmitted (or stored), only the motion vector of the train (indicated by the dotted arrow) is sent. The background that the train covered in frame (a) is then added, and frame (b) is generated by combining the original train and background data. This method reduces transmission bandwidth (or memory capacity) but increases computational complexity.
To calculate the motion vector of an MB in frame (a), each pixel of that MB is subtracted from the corresponding pixel of a candidate MB in frame (b). The 256 absolute differences are then added to obtain the "sum of absolute differences" (SAD). Evaluating candidate MBs across frame (b) produces many SADs, and the comparison point with the smallest SAD is the target point. The displacement of the target point relative to the comparison point in frame (a) is called the "motion vector". To reduce the amount of calculation, a small search range is usually set. If an SAD found within this range is smaller than a predetermined threshold, the displacement of that comparison point is taken as the motion vector.
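The SAD search described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation. It assumes frames are lists of lists of integer pixel values, that `sad` and `full_search` are hypothetical helper names, and that candidate blocks falling outside the reference frame are skipped. The returned displacement points from the current block to its best match in the reference frame.

```python
# Minimal sketch of SAD-based full-search block matching (pure Python).
# Assumptions (not from the patent): frames are lists of lists of ints,
# block size n and search radius are parameters, and out-of-frame
# candidates are skipped.

def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        row_c = cur[cy + dy]
        row_r = ref[ry + dy]
        for dx in range(n):
            total += abs(row_c[cx + dx] - row_r[rx + dx])
    return total


def full_search(cur, ref, cx, cy, n, radius):
    """Return ((dx, dy), best_sad) minimising SAD over every displacement
    in [-radius, radius] x [-radius, radius] that stays inside `ref`."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

The sketch keeps the exhaustive minimum. The threshold variant mentioned in the text would instead return as soon as `cost` drops below a preset value.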
Referring to FIG. 2, consider full-search motion estimation with a 32 × 32 pixel search range and 16 × 16 MBs. To find the motion vector of one MB, the MB must be compared with every candidate MB position in the search range. Because the MB can move within a 17 × 17 range of positions, this requires 17 × 17 = 289 MB comparisons. Each comparison computes the sum of absolute differences (denoted MAD in this description), and the minimum is selected. Each MB pixel value is subtracted from the corresponding pixel value of the other MB, the absolute value is taken, and the results are accumulated. This takes 767 operations in total: 256 subtractions + 256 absolute values + 255 additions = 767. With 289 comparisons of 767 operations each, 289 × 767 = 221,663 operations complete the motion vector search for a single MB.
A 720 × 480 pixel frame can be divided into 1,350 MBs (45 × 30). Computing the motion vectors of this frame therefore takes approximately 2.99 × 10⁸ operations (1350 × 221,663). Ordinary continuous video requires at least 22 frames per second, so the total computation per second is about 6.58 × 10⁹ operations (22 × 2.99 × 10⁸).
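The operation counts in the two paragraphs above can be reproduced directly. The sketch below assumes, as the text does, that one "operation" is a single subtraction, absolute value or addition, and it ignores memory accesses and comparisons. The function names are illustrative only.

```python
# Reproduces the full-search operation counts given in the text.
# Assumption: one "operation" = one subtraction, absolute value or addition.

def ops_per_comparison(mb=16):
    pixels = mb * mb
    # subtractions + absolute values + additions to accumulate the sum
    return pixels + pixels + (pixels - 1)


def full_search_ops(frame_w=720, frame_h=480, mb=16, search=32, fps=22):
    positions = (search - mb + 1) ** 2             # 17 x 17 = 289
    per_mb = positions * ops_per_comparison(mb)    # 289 x 767
    mbs = (frame_w // mb) * (frame_h // mb)        # 45 x 30 = 1350
    per_frame = mbs * per_mb
    return positions, per_mb, mbs, per_frame, per_frame * fps


print(full_search_ops())  # (289, 221663, 1350, 299245050, 6583391100)
```

The last two values correspond to the approximately 2.99 × 10⁸ operations per frame and 6.58 × 10⁹ operations per second quoted above.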
Motion estimation therefore requires very large processing power. The system must be equipped with a high system clock and a large DSP. As power consumption rises, the batteries of portable electronic devices cannot sustain the load, and costs increase. Many new solutions have therefore been developed, and they fall into two categories. The first reduces the number of comparison points; the second reduces the computation per comparison. Both can be applied together to minimize the amount of computation.
Many solutions reduce the number of comparison points, and "three-step search" (TSS), "four-step search" (FSS) and similar methods are commonly used. These methods evaluate a few points in the current search range to find the one with the minimum MAD value. They then perform a local search around that point.
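As one concrete instance of the reduced-comparison-point methods named above, here is a sketch of the classic three-step search. It assumes a hypothetical `cost(dx, dy)` callback that returns the matching error (SAD or MAD) for a displacement. The starting step of 4 covers displacements up to about ±7. Like all such fast searches, it can miss the global minimum when the error surface is not unimodal.

```python
# Sketch of the three-step search (TSS). `cost(dx, dy)` is an assumed
# callback returning the block-matching error for displacement (dx, dy).

def three_step_search(cost, start_step=4):
    """Evaluate the centre and its 8 neighbours at distance `step`, move the
    centre to the best point, halve the step, and stop after step 1.
    Returns ((dx, dy), best_cost, number_of_points_evaluated)."""
    cx, cy = 0, 0
    best_cost = cost(cx, cy)
    evaluated = 1
    step = start_step
    while step >= 1:
        bx, by = cx, cy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # the centre was already evaluated
                c = cost(cx + dx, cy + dy)
                evaluated += 1
                if c < best_cost:
                    best_cost, bx, by = c, cx + dx, cy + dy
        cx, cy = bx, by
        step //= 2
    return (cx, cy), best_cost, evaluated
```

With `start_step=4`, only 1 + 3 × 8 = 25 points are evaluated, compared with 15 × 15 = 225 for a full search over the same ±7 range.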
Few solutions reduce the computation itself; a commonly used one is the inequality:
$$\sum_{i} |a_i - b_i| \geq \left| \sum_{i} a_i - \sum_{i} b_i \right|$$
Here a_i and b_i are the pixel values at position i of the two MBs. The inequality states that the sum of the absolute differences between corresponding pixels of two MBs (the MAD operation above) is greater than or equal to the absolute difference between the sums of their pixel values. Evaluating the right-hand side is called the approximate calculation. Because the pixel sums can be computed once and reused, the right-hand side is a cheap lower bound on the MAD.
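The inequality above is typically used to skip candidates, in the spirit of the successive elimination algorithm. A candidate whose lower bound already reaches the best SAD found so far cannot win, so its full SAD is never computed. The sketch assumes blocks are flattened lists of pixel values, and the helper names are hypothetical. For clarity it recomputes each candidate sum; a practical encoder would compute the sums incrementally.

```python
# Candidate elimination using SUM(ABS(a - b)) >= ABS(SUM(a) - SUM(b)).
# Blocks are assumed to be flat lists of pixel values.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match_with_bound(block, candidates):
    """Return (index, sad, skipped). A candidate is skipped when its lower
    bound |sum(block) - sum(cand)| is already >= the best SAD so far; by the
    inequality it cannot improve on the current best, so the chosen match
    is the same as with exhaustive search."""
    s_block = sum(block)
    best_idx, best_sad, skipped = None, None, 0
    for i, cand in enumerate(candidates):
        bound = abs(s_block - sum(cand))
        if best_sad is not None and bound >= best_sad:
            skipped += 1
            continue
        d = sad(block, cand)
        if best_sad is None or d < best_sad:
            best_idx, best_sad = i, d
    return best_idx, best_sad, skipped
```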
All of the methods described above operate in the time domain. However, we have found that the block matching algorithm can be improved further once the data is converted from the time domain to the frequency domain.

The object of the present invention is to reduce the amount of computation by performing motion estimation in the frequency domain. This requires only a minimal change to the hardware and a few modifications to the video compression method.

Because the human eye is less sensitive to high-frequency content than to low-frequency content, the present invention uses only the low-frequency data to search for motion vectors.
Motion estimation defines the motion relationship between matching images in two digital video frames. Video compression uses the DCT to transform an image from the time domain to frequency-domain data and arranges the data in the order DC, low frequency, high frequency. A quantization block then reduces the redundant high-frequency content of the data.
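The transform and quantization steps just described can be sketched on a single 8 × 8 block. This uses the orthonormal 2-D DCT-II written out directly, which is slow but easy to check. The single uniform quantization step `qstep` is an assumption for illustration. Real codecs use per-frequency quantization matrices that quantize high frequencies more coarsely.

```python
import math

N = 8  # block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of lists).
    out[u][v]: u is the vertical frequency, v the horizontal frequency."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, qstep):
    """Uniform quantization with one step size (an assumption; Python's
    round() rounds halves to the nearest even integer)."""
    return [[int(round(c / qstep)) for c in row] for row in coeffs]
```

For a flat block of value 100, all of the energy lands in the DC coefficient (800), and `quantize(dct2(block), 16)` leaves a single nonzero value of 50.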

The method of the present invention reduces computational complexity by applying motion estimation, after the quantization step, to the frequency-domain data of the two digital video frames.

Most current video standards compress data using several algorithms. The human eye is less sensitive to high-frequency content than to low-frequency content. Most video compression standards therefore use the discrete cosine transform (DCT) to convert the input image from the time domain into frequency-domain data. The data are then arranged in the order DC, low frequency, high frequency. Quantization is applied to reduce high-frequency redundancy, and variable length coding (VLC) reduces redundancy in the code space. Finally, the data that have passed through decompression, inverse DCT and inverse VLC are used for motion estimation in the time domain, which reduces the overlap between consecutive pictures (see FIG. 3).

Referring to FIGS. 4 and 5, video compression uses the DCT to convert a sample picture from the time domain (FIG. 4) to the frequency domain (FIG. 5). The data are then arranged in the order DC, low frequency, high frequency along a zigzag (sawtooth) path or another scan shape (FIG. 6). A quantization block (Q in FIG. 3) then compresses the high-frequency information to which humans are insensitive. Finally, VLC compresses the data in the code space, and the image code is output through the buffer (FIG. 3).
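The zigzag arrangement described above can be generated by walking the anti-diagonals of the block and alternating direction. The sketch below produces the familiar JPEG/MPEG-style order, starting at DC and moving toward the highest frequencies. Other scan shapes mentioned in the text, such as alternate scans, are not covered, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order,
    from DC through low to high frequencies."""
    order = []
    for s in range(2 * n - 1):  # s = row + col: one anti-diagonal at a time
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()      # even diagonals run bottom-left -> top-right
        order.extend(diag)
    return order


def zigzag_scan(block):
    """Flatten an n x n block into a list in zigzag order."""
    n = len(block)
    return [block[r][c] for r, c in zigzag_order(n)]


print(zigzag_order()[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```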

Referring again to FIG. 3, the conventional scheme achieves temporal compression by running block matching for motion estimation only after inverse quantization (iQ), inverse formatting (iF, which undoes the zigzag or other scan order) and inverse DCT (iDCT). All of the information is restored to the time domain first, and block matching is then applied to the time-domain data.
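The conventional reconstruction path above (iQ, then iF, then iDCT) can be sketched as three small functions. This sketch assumes a single uniform quantization step and an 8 × 8 orthonormal DCT, and the helper names are hypothetical. Block matching in the conventional scheme would then operate on the pixel blocks that `idct2` returns.

```python
import math

N = 8  # block size


def zigzag_order(n=N):
    """(row, col) positions in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def dequantize(block, qstep):
    """iQ: undo uniform quantization (single step size assumed)."""
    return [[v * qstep for v in row] for row in block]


def inverse_zigzag(seq, n=N):
    """iF: place a zigzag-ordered coefficient list back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(seq, zigzag_order(n)):
        block[r][c] = value
    return block


def idct2(coeffs):
    """iDCT: inverse of the orthonormal 2-D DCT-II (a 2-D DCT-III)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out
```

The functions are applied in reverse order of the encoder. For example, the scanned sequence `[50] + [0] * 63` with a step of 16 reconstructs to a flat block of value 100: `idct2(inverse_zigzag(dequantize(...)))` in whichever order matches the encoder, here `idct2(dequantize(inverse_zigzag(seq), 16))`.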

Referring to FIG. 7, the present invention proposes a new method. Because any block matching algorithm can be applied in the frequency domain, motion estimation is performed before the iDCT process, as indicated by arrows 1 and 2 in FIG. 7. The human eye is most sensitive to low frequencies, so the best matching point can be found from the low-frequency information alone and the high-frequency information can be discarded. For example, suppose only the first 8 of the 64 coefficients of each 8 × 8 DCT block (FIG. 5) are used for motion estimation. The computational complexity then falls to 12.5% (8/64) of the original. Where necessary, part of the motion estimation can still be completed after the iDCT. Complexity drops for two reasons: the whole comparison algorithm runs in the frequency domain, and discarding the high-frequency information reduces the number of comparison points.
The present invention reduces computation using existing blocks and algorithms, simply by changing the order of processing.
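The frequency-domain matching described above can be illustrated with a minimal sketch. It compares only the first `k` zigzag-ordered quantized DCT coefficients (k = 8 in the text's example). The sketch assumes the coefficient lists for the target block and every candidate block are already available, and the function names are hypothetical. The example data show the intended effect: a candidate that differs only in high frequencies can win under low-frequency matching but lose under full 64-coefficient matching.

```python
def low_freq_sad(coeffs_a, coeffs_b, k=8):
    """SAD over the first k zigzag-ordered coefficients (DC + lowest AC)."""
    return sum(abs(a - b) for a, b in zip(coeffs_a[:k], coeffs_b[:k]))


def match_in_frequency_domain(target, candidates, k=8):
    """Return (index, cost) of the candidate whose quantized, zigzag-ordered
    DCT coefficients best match `target` on the first k coefficients."""
    costs = [low_freq_sad(target, cand, k) for cand in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


def coefficient_ratio(k=8, total=64):
    """Fraction of coefficients compared: 8/64 = 12.5% as in the text."""
    return k / total


target = [50, 3, -2, 1] + [0] * 60
cands = [[10] + [0] * 63,                         # poor low-frequency match
         [49, 3, -2, 0] + [0] * 4 + [5] * 56,     # differs mostly in high freq.
         [47, 3, -2, 1] + [0] * 60]
print(match_in_frequency_domain(target, cands))        # (1, 2)
print(match_in_frequency_domain(target, cands, k=64))  # (2, 3)
```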

Preferred embodiments of the present invention have been disclosed above, but they are not intended to limit the invention. Anyone familiar with the art may make various changes and refinements without departing from the spirit and scope of the invention. The scope of protection is therefore defined by the claims.

FIG. 1 is a diagram showing a motion vector in motion estimation.
FIG. 2 is a diagram showing conventional full-search motion estimation.
FIG. 3 is a diagram showing an MPEG-4 system used for video compression.
FIG. 4 is a diagram showing a sample picture for video compression.
FIG. 5 is a diagram showing the result of applying the DCT to the sample picture.
FIG. 6 is a diagram showing the zigzag (sawtooth) scan order used after the DCT.
FIG. 7 is a diagram showing the video compression system of the present invention.

Claims

A motion estimation method for block matching in the frequency domain, comprising:
motion estimation defining a motion relationship between matching images of two digital video frames;
video compression using a DCT (discrete cosine transform) process to convert the input image from time-domain data to frequency-domain data;
thereafter arranging the data in the order of DC, low frequency and high frequency; and
applying quantization to reduce high-frequency redundancy in the data;
wherein, after the quantization, the motion estimation is applied to the data of the two digital video frames to reduce computational complexity.