JP4470440B2

JP4470440B2 - Method for calculating the wavelet temporal coefficients of a group of pictures (GOP)

Info

Publication number: JP4470440B2
Application number: JP2003356768A
Authority: JP
Inventors: ジリアニフランセスコ; レイチェルジュリアン
Original assignee: ビジオウェーブサール
Priority date: 2002-10-30
Filing date: 2003-10-16
Publication date: 2010-06-02
Anticipated expiration: 2023-10-16
Also published as: CA2446271A1; JP2004153818A

Description

The present invention relates to a method for calculating the wavelet temporal coefficients of a GOP (Group Of Pictures) of length 2ⁿ by recursively applying a temporal transform that generates n decomposition levels, each decomposition level comprising a function M(a_i, b_i) and a function D(a_i, b_i) of each pair of input signals a_i, b_i. The technique is based on wavelets and is intended for digital video coding.

An image is a matrix of numerical values, each of which represents, for example, the color intensity or luminance at a pixel, or a value computed by an algorithm. When the image is encoded in bytes, each value lies between 0 and 255. An image is called a frame when it represents the matrix of color intensities or luminances at the pixels.

A digital video signal consists of a sequence of frames. Displaying a given number of frames per second (typically 25 or 30) gives the viewer the impression of motion. In this sense a digital video signal is actually a 3-D (three-dimensional) signal: two dimensions represent the image plane (a single frame) and the third dimension represents time (the sequence of successive frames, FIG. 1).

To compress such a 3-D signal efficiently, both its spatial (2-D) and its temporal (3-D) redundancy must be exploited. Wavelets have been widely used to exploit the spatial redundancy of video sequences, for example by applying a Haar transform.

The Haar transform is well known in wavelet theory and is one of the wavelet transforms with the smallest support. Given two input values A and B, the corresponding Haar coefficients are simply their half difference Δ and their mean μ.
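The pairwise Haar step described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the sign convention (B - A) / 2 is an assumption chosen so that the inputs 0, 4 map to 2, 2 as in the worked example given later in the description.

```python
def haar_pair(a, b):
    """Return (mean, half_difference) of two input values.

    The sign convention (b - a) / 2 is an assumption; the text only
    speaks of "half difference".
    """
    mean = (a + b) / 2
    half_diff = (b - a) / 2
    return mean, half_diff


def inverse_haar_pair(mean, half_diff):
    """Recover the two inputs from their mean and half difference."""
    return mean - half_diff, mean + half_diff


mu, delta = haar_pair(0, 8)
restored = inverse_haar_pair(mu, delta)
```

Because the mean and the half difference together determine both inputs, the step is exactly invertible as long as the coefficients are not quantized.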

In video coding, the Haar transform has been used both as a spatial and as a temporal transform. In the latter case, the temporal decomposition is applied to a GOP (group of pictures) of size 2ⁿ. The input data may be the raw images of the video sequence (luminance and chrominance values) or their spatial decomposition (transform coefficients) obtained with any 2-D linear transform.

When the Haar transform is applied as a temporal transform, the scheme shown in FIG. 2 is obtained. Here the input consists of four images (F₁, F₂, F₃, F₄) generated by the 2-D Haar transform of the frames of the input sequence.

For given F₁ and F₂, the corresponding temporal Haar transform consists of a mean image μ₁ and a difference image Δ₁; together, μ₁ and Δ₁ represent the Haar transform of F₁ and F₂. The Haar transform can be applied recursively to generate several decomposition levels. The example of FIG. 2 shows the two-level decomposition obtained by first applying the Haar transform to the image pairs (F₁, F₂) and (F₃, F₄) and then to the two mean images (μ₁, μ₂). The resulting Haar transform is represented by Δ₁, Δ₂, Δ₃, μ₃. Note that the Haar decomposition is very efficient for video coding when the input images are similar to one another: the difference images are then close to zero and are easy to compress with an entropy coder.

In the Haar transform, the decomposition is applied uniformly to all inputs at every decomposition level.

The present invention proposes a method for calculating the wavelet temporal coefficients of a GOP that can efficiently exploit the temporal redundancy of the transform coefficients computed by any 2-D linear transform.

The method according to the invention is characterized in that, at each of the subsequent n-1 decomposition levels, the transform block of the level is controlled by a control signal corresponding to the sum of two functions D(a_i, b_i) output by the preceding decomposition level, while the corresponding functions M(a_i, b_i) of the preceding level are the input signals of that transform block. When the control signal is zero, the outputs of the transform block are the functions M(M(a_i, b_i), M(a_{i+1}, b_{i+1})) and D(M(a_i, b_i), M(a_{i+1}, b_{i+1})) of the input signals; when the control signal is not zero, the output signals are the input signals themselves.
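The rule for one controlled transform block can be written compactly as follows. The shorthand symbols c (control signal), m_i (input signals) and out₁, out₂ (block outputs) are introduced here only for illustration and do not appear in the patent text; reading "the sum of two functions D" as the sum over the two adjacent pairs is likewise an interpretation.

```latex
\[
c = D(a_i, b_i) + D(a_{i+1}, b_{i+1}), \qquad
m_i = M(a_i, b_i), \quad m_{i+1} = M(a_{i+1}, b_{i+1}),
\]
\[
(\mathrm{out}_1, \mathrm{out}_2) =
\begin{cases}
\bigl(M(m_i, m_{i+1}),\; D(m_i, m_{i+1})\bigr) & \text{if } c = 0,\\[4pt]
(m_i,\; m_{i+1}) & \text{if } c \neq 0.
\end{cases}
\]
```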

According to a preferred embodiment, the function M(a_i, b_i) is the mean of the signals a_i, b_i, and the function D(a_i, b_i) is the quantized mean difference of the signals a_i, b_i.

The proposed technique efficiently exploits the temporal redundancy of the transform coefficients computed by a 2-D linear transform, preferably by means of a novel method that the inventors call the "dynamic temporal transform". Unlike other solutions, this technique requires no motion estimation procedure, which is a computationally expensive task.

According to one embodiment, an input sequence of 2ⁿ frames is first transformed frame by frame with a 2-D linear transform into a GOP of 2ⁿ images, each image of the GOP containing the transform coefficients of one input frame. The transform coefficients are passed to a temporal transform that generates the first level of temporal decomposition, each decomposition comprising a temporal mean and a temporal difference. At each of the subsequent n-1 decomposition levels, the control signal is the sum of the two quantized temporal mean differences output by the preceding decomposition level.

Note that the dynamic temporal transform proposed for this embodiment is applied to the result of the 2-D linear transform, not to the raw images. This choice is advantageous in terms of encoder complexity.

According to another embodiment, the inputs to the first decomposition level are raw images, and the control signal is the sum of the mean differences of the two signals output by the preceding level, taken after those differences have been encoded, quantized, dequantized and decoded.

According to a preferred embodiment, the 2-D linear transform is a 2-D 5.3 wavelet transform whose low-pass filter is a 5-tap filter and whose high-pass filter is a 3-tap filter, and the temporal transform is a Haar temporal transform.

The invention is described below with reference to the accompanying drawings, in which the 2-D linear transform and the temporal transform are Haar transforms.

FIG. 3 shows the general scheme of the technique proposed by the first embodiment.

The proposed dynamic temporal transform can be applied to any GOP of length 2ⁿ. In FIG. 3 the GOP has length 4 (2²). First, the images are transformed individually with a 2-D wavelet (a parallel implementation may suit this step). This yields four spatial decompositions containing the wavelet coefficients of each input image; each decomposition contains as many coefficients as there are pixels in the input image. The coefficients are then passed to the dynamic temporal transform, which generates four temporal decompositions. Again, each decomposition contains as many coefficients as there are pixels in the input image. As explained in more detail below, one of these decompositions is a temporal mean and three are temporal differences.

The length of the GOP (2ⁿ) determines the number of temporal decomposition levels (n). The larger n is, the better the temporal redundancy can be exploited. However, n also determines the delay introduced by the encoder, so in practical applications n cannot be made very large.

The proposed method defines a procedure that determines dynamically which input coefficients are transformed further and which are simply passed on.

The resulting scheme is shown in FIG. 4, where the procedure spans eight input images I_t to I_t-7. At the first level, the standard Haar transform described above (FIG. 2) is applied to all inputs. Each transform produces two images: the upper one is the mean μ and the lower one is the difference Δ. Q, the quantized version of the difference image, is used as the control signal for the next decomposition level, while the mean image becomes one of the inputs of the controlled transform block at the next decomposition level.

Depending on the control signal, the output of the controlled transform block is either the mean and difference of the input signals or the input signals themselves. FIG. 5 shows the controlled transform block in detail. The input signals are C_n(x, y, t₁) and C_n(x, y, t₂), which denote the wavelet coefficients generated by the n-th decomposition at subband position x, y at times t₁ and t₂ respectively.

Two cases are illustrated. In the first case, the control signal Q is 0; the output signals are then simply the Haar transform of the inputs (roughly speaking, the mean and difference of the input signals). In the second case, the control signal Q is not 0; the output signals are then the input signals themselves.
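The two cases can be sketched for a single coefficient position as follows. This is an illustrative sketch only: the function names are hypothetical, and the half-difference sign convention (second input minus first) is an assumption.

```python
def controlled_block(c1, c2, q):
    """Controlled transform block for one coefficient position.

    c1, c2 : the two input coefficients (for times t1 and t2)
    q      : the control signal for this position
    """
    if q == 0:
        # Case 1: ordinary Haar step (mean and half difference).
        return (c1 + c2) / 2, (c2 - c1) / 2
    # Case 2: the inputs are passed through unchanged.
    return c1, c2


def inverse_controlled_block(out1, out2, q):
    """Undo controlled_block, given the same control signal q."""
    if q == 0:
        return out1 - out2, out1 + out2
    return out1, out2


haar_case = controlled_block(0, 4, q=0)
passthrough_case = controlled_block(0, 4, q=2)
```

Since the inverse needs only the outputs and the same control value, a decoder that can derive q from data it already holds can undo the block without side information.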

To illustrate the mechanism of the proposed dynamic temporal transform, FIG. 6 presents two examples. The first shows the standard Haar transform. The input signal is piecewise constant, simulating an intensity discontinuity in the time domain (for example, a moving object). The eight (2³) input values are decomposed in three stages. At each stage, two coefficients are obtained from each pair of input signals: the mean of the inputs and half of their difference (in this example, the inputs 0, 0 become 0, 0 as mean and half difference, and the inputs 0, 4 become 2, 2). After three stages of decomposition, the eight input values are represented by eight coefficients: one is the last mean computed (5 in this example) and the other seven are the computed differences (3, 2, 0, 0, 4, 0, 0 in this example). The input values can be reconstructed from these coefficients.
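The standard three-stage Haar example can be reproduced with the short sketch below. The input sequence 0, 0, 0, 8, 8, 8, 8, 8 is taken from the decoded result quoted in the description; the function name and the coarsest-first ordering of the differences are choices made here for illustration.

```python
def haar_decompose(values):
    """Full Haar temporal decomposition of 2**n scalar values.

    Returns (final_mean, diffs), with the differences listed from the
    coarsest level down to the finest level.
    """
    levels = []
    current = list(values)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        levels.append([(b - a) / 2 for a, b in pairs])   # half differences
        current = [(a + b) / 2 for a, b in pairs]        # means feed the next stage
    diffs = [d for level in reversed(levels) for d in level]
    return current[0], diffs


final_mean, diffs = haar_decompose([0, 0, 0, 8, 8, 8, 8, 8])
print(final_mean, diffs)
```

With this input the final mean is 5 and the seven differences are 3, 2, 0, 0, 4, 0, 0, matching the figures given in the text.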

The proposed dynamic temporal transform was applied to the same input signal. The difference from the standard Haar transform appears at the second decomposition level, where the inputs 0, 4 yield 0, 4 instead of 2, 2, because the control signal of that transform block is not zero (in FIG. 6, non-zero control signals are drawn as dotted lines). As a result, the following eight output coefficients are obtained after three levels of decomposition: 0 is the final mean, and 8, 4, 0, 0, 4, 0, 0 are the differences. Quantizing the result of the standard Haar transform with a step of 2 gives the coefficients 2, 1, 1, 0, 0, 2, 0, 0, which decode to 0, 0, 0, 8, 6, 6, 6, 6. Doing the same with the coefficients of the dynamic temporal transform gives 0, 4, 2, 0, 0, 2, 0, 0, which decode to 0, 0, 0, 8, 8, 8, 8, 8, exactly the input. In this example, therefore, the dynamic temporal transform is a better coding method than the standard Haar transform.
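The comparison above can be checked with the following sketch, which encodes and decodes the same eight values with and without the control mechanism. It is a simplified scalar illustration under stated assumptions, not the patented implementation: quantization is assumed to be division by the step with truncation toward zero, the control signal is the plain sum of the two quantized outputs of the preceding level, and all function names are hypothetical.

```python
STEP = 2  # quantization step of the worked example


def quantize(x, step=STEP):
    return int(x / step)  # truncation toward zero (assumed rounding rule)


def dequantize(q, step=STEP):
    return q * step


def encode(values, dynamic=True):
    """Return (q_mean, q_levels); q_levels[0] is the finest level."""
    means, q_levels = list(values), []
    while len(means) > 1:
        new_means, q_diffs = [], []
        for k in range(len(means) // 2):
            m1, m2 = means[2 * k], means[2 * k + 1]
            ctrl = 0  # the first level is always a plain Haar step
            if dynamic and q_levels:
                prev = q_levels[-1]
                ctrl = prev[2 * k] + prev[2 * k + 1]
            if ctrl == 0:
                m, d = (m1 + m2) / 2, (m2 - m1) / 2
            else:
                m, d = m1, m2  # pass the inputs through unchanged
            new_means.append(m)
            q_diffs.append(quantize(d))
        q_levels.append(q_diffs)
        means = new_means
    return quantize(means[0]), q_levels


def decode(q_mean, q_levels, dynamic=True):
    """Invert encode(); the control signals are rebuilt from q_levels."""
    means = [dequantize(q_mean)]
    for depth in range(len(q_levels) - 1, -1, -1):
        restored = []
        for k, m in enumerate(means):
            d = dequantize(q_levels[depth][k])
            ctrl = 0
            if dynamic and depth > 0:
                prev = q_levels[depth - 1]
                ctrl = prev[2 * k] + prev[2 * k + 1]
            restored += [m - d, m + d] if ctrl == 0 else [m, d]
        means = restored
    return means


signal = [0, 0, 0, 8, 8, 8, 8, 8]
haar_decoded = decode(*encode(signal, dynamic=False), dynamic=False)
dynamic_decoded = decode(*encode(signal, dynamic=True), dynamic=True)
print(haar_decoded)     # [0, 0, 0, 8, 6, 6, 6, 6]
print(dynamic_decoded)  # [0, 0, 0, 8, 8, 8, 8, 8]
```

The decoder derives every control signal from quantized coefficients it has already received, so no extra signalling is needed in this sketch.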

In the context of video coding, the proposed dynamic temporal transform significantly improves on the performance of the standard Haar transform. Its main advantage is that, at a given bit rate, it does not produce annoying artifacts such as ghosting around moving objects, whereas the standard Haar transform does. This robustness against artifacts makes it possible to lengthen the GOP and to exploit the temporal redundancy of the input signal better. With the standard Haar transform, the artifacts that appear when moving objects are in the scene limit the GOP length to 2⁴, and this limit affects the coding performance of the standard Haar transform.

Another important advantage of the proposed implementation is its reduced complexity compared with standard approaches such as MPEG-2/4. Indeed, the proposed coding process does not require a decoding procedure inside the encoder in order to exploit temporal redundancy.

The proposed implementation of the dynamic temporal transform introduces no drawbacks. Because the transform is symmetric and reversible, no additional control signal has to be sent to the decoder.

The main field of application of the proposed dynamic temporal transform is the compression of video surveillance signals, for the following reasons.

1. In security video, most of the scene remains static. Exploiting temporal redundancy over long GOPs therefore has a significant effect on compression performance. The use of the standard Haar transform on long GOPs is limited by ghosting artifacts; the proposed dynamic temporal transform corrects this.
2. In security applications, real-time constraints are very strong. The proposed dynamic temporal transform has a very low computational complexity compared with the standard MPEG-2/4 approach, so it can easily be implemented in hardware.

The proposed dynamic temporal transform can nevertheless also be used in other fields of application, such as those that demand substantial computing power or in which static scenes are compressed. Examples include video telephony, video forums and video conferencing.

The method described in the above embodiment is applied in the domain of a linear transform, for example to transform coefficients such as wavelet coefficients. According to another embodiment of the invention, however, it can be extended and generalized to the image domain, and can therefore also be applied to the color information of an image. In this sense, instead of applying the dynamic temporal transform in the transform domain (for example, to the wavelet coefficients shown in FIG. 2), the dynamic Haar transform can be applied in the image domain (for example, to color intensities in any format: RGB, YUV, YCbCr, ...), as shown in FIG. 7.

Note that this generalization requires the encoder to have access to the same information that is available to the decoder. The scheme of FIG. 4 must therefore be generalized as shown in FIG. 8. To make the scheme easier to follow, only a single decomposition level is shown as an example. Here the inputs I_t to I_t-3 are raw input images rather than 2-D transforms as in FIG. 4.

In this generalization, the controlled transform block remains the one shown in FIG. 5. As described above, the controlled transform block of the first decomposition produces two images: the mean difference Δ and the mean μ. Δ is encoded with any available encoder (in FIG. 8, the output of encoder 1 is called δ). δ is then decoded with the corresponding decoder 1. The result is Δ′, an approximation of the difference of the input images. It will be appreciated that this frame is also available at the decoder.

In the example of FIG. 8, the first decomposition level yields two images Δ′. Their pixel-wise sum is the control signal for the subsequent decomposition level. Depending on whether the differences of the encoded, quantized, dequantized and decoded images are zero or non-zero, the controlled transform block decides either to code the mean difference and the mean of the decoded images or to transfer the actual values without temporal coding. The choice is made at pixel level and requires no additional information to be sent from the encoder to the decoder, because both derive their decisions from the same data. The resulting images μ and Δ are each encoded with an arbitrary encoder, generating two streams Ω and δ. Finally, the streams sent to the decoder are Ω, the encoded final mean μ, and three δ streams corresponding to the three encoded mean differences. The decoder can thus reconstruct each control signal Δ′ from the corresponding δ.
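The pixel-level decision described above can be sketched as follows. The sketch is a toy illustration under several assumptions: `codec_roundtrip` is a hypothetical stand-in for "encoder 1" followed by "decoder 1" (here a plain quantize and dequantize with an arbitrary step of 4), frames are flat lists of pixel values, and the frame names are illustrative.

```python
def codec_roundtrip(delta, step=4):
    """Hypothetical encoder-1/decoder-1 pair: quantize, then dequantize."""
    return [int(v / step) * step for v in delta]


def first_level(f1, f2):
    """Plain Haar step on two raw frames: (mean image, difference image)."""
    mu = [(a + b) / 2 for a, b in zip(f1, f2)]
    delta = [(b - a) / 2 for a, b in zip(f1, f2)]
    return mu, delta


def controlled_level(mu1, mu2, d1_dec, d2_dec):
    """Second level: per-pixel choice driven by the decoded differences."""
    out_m, out_d = [], []
    for a, b, c1, c2 in zip(mu1, mu2, d1_dec, d2_dec):
        if c1 + c2 == 0:          # control signal is zero: Haar step
            out_m.append((a + b) / 2)
            out_d.append((b - a) / 2)
        else:                     # control signal non-zero: pass through
            out_m.append(a)
            out_d.append(b)
    return out_m, out_d


# Four two-pixel frames; only the second pixel of the last frame changes.
frame0, frame1, frame2, frame3 = [10, 0], [10, 0], [10, 0], [10, 8]

mu_a, delta_a = first_level(frame0, frame1)
mu_b, delta_b = first_level(frame2, frame3)
out_m, out_d = controlled_level(
    mu_a, mu_b, codec_roundtrip(delta_a), codec_roundtrip(delta_b)
)
```

In this toy input the static first pixel is averaged across the two pairs, while the changing second pixel is passed through untouched; the decision uses only decoded differences, which the decoder can also compute.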

Note that in the MPEG-4 standard a similar result is obtained by transmitting information on whether a block has been intra coded or inter coded. In that case, however, the choice is made at block-size resolution and at the additional cost of sending instructions to the encoder.

Although illustrative embodiments of the invention have been shown and described, a wide range of modifications, changes and substitutions is contemplated in the foregoing disclosure, and in some instances some features of the invention may be employed without a corresponding use of other features. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention.

FIG. 1 shows a video sequence regarded as a 3-D signal.
FIG. 2 shows the scheme of the Haar temporal decomposition.
FIG. 3 shows the general scheme of the method of the invention.
FIG. 4 shows the scheme of the method of the invention.
FIG. 5 shows the controlled transform block.
FIG. 6 illustrates the decomposition according to the standard Haar transform and according to the transform of the invention.
FIG. 7 shows the scheme of the temporal Haar transform applied in the image domain.
FIG. 8 shows the method of the invention applied in the image domain.

Claims

A method for calculating the wavelet temporal coefficients of a GOP (group of pictures) of length 2ⁿ by recursively applying a temporal transform that generates n decomposition levels, each decomposition level comprising a function M(a_i, b_i) and a function D(a_i, b_i) of each pair of input signals a_i, b_i, characterized in that, at each of the subsequent n-1 decomposition levels, the transform block of the level is controlled by a control signal corresponding to the sum of two functions D(a_i, b_i) output by the preceding decomposition level, the corresponding functions M(a_i, b_i) of the preceding decomposition level are the input signals of the transform block, the outputs of the transform block are the functions M(M(a_i, b_i), M(a_{i+1}, b_{i+1})) and D(M(a_i, b_i), M(a_{i+1}, b_{i+1})) of the input signals when the control signal is zero, and the output signals are said input signals when the control signal is not zero.

The method of claim 1, wherein the function M(a_i, b_i) is the mean of the signals a_i and b_i, and the function D(a_i, b_i) is the quantized mean difference of the signals a_i and b_i.

The method according to claim 2, wherein an input sequence of 2ⁿ frames is first transformed frame by frame into a GOP of 2ⁿ images by an arbitrary 2-D linear transform (wavelet, DCT, ...), each image containing the transform coefficients of one input frame; the resulting spatial transform coefficients are passed to a temporal transform that generates the first level of temporal decomposition, each decomposition comprising a temporal mean and a temporal difference; and, at each of the subsequent n-1 decomposition levels, the control signal is the sum of the two quantized temporal mean differences output by the preceding decomposition level.

The method of claim 2, wherein the inputs to the first decomposition level are raw images, and the control signal is the sum of the mean differences of the two signals output by the preceding level after they have been encoded, quantized, dequantized and decoded.

The method wherein the 2-D linear transform is a 2-D 5.3 wavelet transform whose low-pass filter is a 5-tap filter and whose high-pass filter is a 3-tap filter.

The method according to claim 1, wherein the temporal transform is a Haar temporal transform.