JPH0646269A

JPH0646269A - Expansion method and compression method for still picture data or device executing the methods

Info

Publication number: JPH0646269A
Application number: JP5033813A
Authority: JP
Inventors: Pii Booritsuku Maatein; ピーボーリックマーティン; Dei Aren Jieimusu; ディアレンジェイムス; Emu Buronsutain Suteiibun; エムブロンスタインスティーブン
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1992-02-28
Filing date: 1993-02-24
Publication date: 1994-02-18
Anticipated expiration: 2016-04-09
Also published as: GB2264609A; DE4306010C2; GB2264609B; JP3155383B2; DE4306010A1; GB9304142D0

Abstract

PURPOSE: To provide a decompression method and a compression method for still image data, or a device executing these methods, that make optimal use of bits while remaining compatible with the JPEG standard. CONSTITUTION: The decompression method for still image data, which inverts a transform that converts a sequence of real values into a sequence of transform-domain coefficients, comprises a step in which each transform-domain coefficient is multiplied by a Q factor stored in an N-bit storage register in the form of an M-bit exponent, identified as the Q exponent, and an (N-M)-bit mantissa, identified as the Q mantissa, so that the Q factor, given by Q factor = Q mantissa * 2^(Q exponent), can take values over a range larger than 2^N; and a step in which the sequence of multiplied transform-domain coefficients is transformed into a second sequence approximating the sequence of real values.
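As a rough illustration of the Q-factor format described in the abstract, the following Python sketch packs and unpacks an N-bit word holding an M-bit exponent and an (N-M)-bit mantissa, and forms Q = mantissa * 2^exponent. The field widths (N = 16, M = 4), the placement of the exponent in the high bits, and the function names are assumptions made for this example; the abstract does not fix them.

```python
N_BITS = 16   # assumed register width N
M_BITS = 4    # assumed exponent width M
MANT_BITS = N_BITS - M_BITS

def pack_q(mantissa: int, exponent: int) -> int:
    """Pack a Q factor into one N-bit word (exponent in the high bits, by assumption)."""
    if not 0 <= mantissa < (1 << MANT_BITS):
        raise ValueError("mantissa out of range")
    if not 0 <= exponent < (1 << M_BITS):
        raise ValueError("exponent out of range")
    return (exponent << MANT_BITS) | mantissa

def unpack_q(word: int) -> int:
    """Return Q = mantissa * 2**exponent for a packed N-bit word."""
    mantissa = word & ((1 << MANT_BITS) - 1)
    exponent = word >> MANT_BITS
    return mantissa << exponent

def dequantize(coeffs, q_words):
    """Multiply each transform-domain coefficient by its Q factor."""
    return [c * unpack_q(w) for c, w in zip(coeffs, q_words)]
```

With these assumed widths the largest representable Q factor is 4095 * 2^15, far beyond what a plain 16-bit integer could hold, which is the point of the exponent/mantissa split.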

Description

Detailed Description of the Invention

[0001]

BACKGROUND AND SUMMARY OF THE INVENTION The present invention relates to a decompression method and a compression method for still image data that are compatible with the JPEG (Joint Photographic Experts Group) still image compression/decompression standard, and to a corresponding apparatus.

[0002] When a high-quality image must be compressed to save memory or transmission requirements, it is common to first transform the image into another space in which the information can be represented more compactly. This is usually done block by block with a linear transform (matrix multiplication). A typical arrangement performs an 8-point transform on each row of 8 pixels and then an 8-point transform on each column of 8 pixels of the row-transformed image. Equivalently, a single 64-point transform can be performed on a block of 64 pixels arranged as an 8 × 8 block.
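The equivalence between the row-then-column procedure and a single 64-point transform can be checked numerically. The sketch below is an illustration only: it accepts any 8 × 8 transform matrix `m` (the function names and the flattening order are choices made for this example) and builds the 64-point transform as the Kronecker product of `m` with itself.

```python
def rows_then_columns(block, m):
    """Apply the 1-D transform m to every row, then to every column."""
    n = len(m)
    # n-point transform of each row: rows[j][u] = sum_i m[u][i] * block[j][i]
    rows = [[sum(m[u][i] * row[i] for i in range(n)) for u in range(n)]
            for row in block]
    # n-point transform of each column of the row-transformed block
    return [[sum(m[v][j] * rows[j][u] for j in range(n)) for u in range(n)]
            for v in range(n)]

def single_64_point(block, m):
    """Apply one (n*n)-point transform to the flattened block."""
    n = len(m)
    flat = [block[j][i] for j in range(n) for i in range(n)]
    # Kronecker-product entry: K[(v, u)][(j, i)] = m[v][j] * m[u][i]
    out = [sum(m[v][j] * m[u][i] * flat[j * n + i]
               for j in range(n) for i in range(n))
           for v in range(n) for u in range(n)]
    return [out[v * n:(v + 1) * n] for v in range(n)]
```

The separable form needs two passes of 8-point transforms, whereas the 64-point form multiplies by a 64 × 64 matrix, which is why practical implementations use the former.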

[0003] A good choice for the one-dimensional transform is the discrete Chebychev transform shown in Equation 1.

[Equation 1] $F(u) = c(u) \sum_{i=0}^{7} f(i) \cos\big(u(2i+1)\pi/16\big)$

where

[Equation 2]
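A direct (slow) evaluation of Equation 1 can serve as a reference when checking faster algorithms. In the sketch below the normalisation `c(u)` is an assumption (1/sqrt(2) for u = 0 and 1 otherwise, a common DCT convention), because Equation 2 is not reproduced in this text; the function names are likewise chosen for the example.

```python
import math

def c(u):
    # Assumed normalisation; Equation 2 is not reproduced in this text.
    return math.sqrt(0.5) if u == 0 else 1.0

def dct8(f):
    """Equation 1: F(u) = c(u) * sum over i = 0..7 of f(i) * cos(u(2i+1)pi/16)."""
    return [c(u) * sum(f[i] * math.cos(u * (2 * i + 1) * math.pi / 16)
                       for i in range(8))
            for u in range(8)]
```

For a constant input all of the energy lands in F(0), and the remaining seven coefficients vanish up to rounding error.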

[0004] This transform has several advantages:
a) the compression is nearly optimal by several criteria;
b) fast computational algorithms exist for performing the transform and its inverse; and
c) under certain assumptions, deblurring (enhancement of the initial image) is easily performed in the transform space, as described in Acheroy, M., "Use of the DCT for the restoration of an image sequence", SPIE Vol. 593, Medical Image Processing (1985).

[0005]

OBJECTS OF THE INVENTION It is an object of the present invention to provide a decompression method and a compression method for still image data, and a corresponding apparatus. A further object is to provide a still image data compression method, and a corresponding apparatus, that remain compatible with the JPEG standard. Another object is to optimize the use of bits in the quantization and compression stages of data compression. Another object is to minimize the mean-square error in a data compression scheme that integrates quantization and coefficient compression. A further object is to use a fixed number of bits in a way that optimizes both the range and the resolution of data compression. A further object is to match, in resolution, the JPEG standard and the H.261 specification for small quantization values. More specifically, it is an object to provide a method that permits pre-compression of the quantization with a dynamic range of 28 bits by using a 16-input, 1-output multiplexer and a 16-bit multiplier. Another object is to exploit the speed of the generalized Chen transform to maximum advantage in a pipelined implementation. A further object is to minimize the number of gates required to perform the transform. More specifically, it is an object to use the speed advantage of the adder-network portion of the transform so that the additions of the vertical and horizontal transforms are performed by the same hardware. Further objects, advantages and novel features of the invention are set forth in the detailed description that follows, and will become apparent to those skilled in the art upon examination of that description or may be learned from embodiments of the invention. The objects and features of the invention may be realized and attained by means of the elements and combinations pointed out in the claims.

[0006]

THEORETICAL DISCUSSION OF THE INVENTION A complete system for the compression and reconstruction (decompression) of images can be expressed as follows:

64-pixel input
↓ A) Discrete Chebychev transform (or similar transform) on rows
↓ B) Discrete Chebychev transform (or similar transform) on columns
↓ Z) (Optional) Classify difficulty
↓ C) Multiply by rate scaler
↓ D) Multiply by psychoadaptive weights
↓ E) Multiply by deblurring weights
↓ F) Threshold, quantize, encode and transmit
↓ G) Receive, decode and interpolate
↓ H) Multiply by inverse rate scaler
↓ I) Multiply by inverse psychoadaptive weights
↓ J) Inverse discrete Chebychev transform
↓ K) Inverse discrete Chebychev transform
↓ L) Smooth around pixel-block boundaries
↓ Reconstructed 64 pixels → adjacent pixels

[0007] The above procedure describes the present invention and, with the optional steps (L, Z) omitted, also describes the current state of the art. The multiplication by the deblurring weights (step E) can also be performed in the decoding stage (for example, after step I). Deblurring compensates for the point-spread function of the input device. It must be tuned to the device, or omitted if the input image has already been enhanced. Other good ways of sharpening an image exist, but the method shown here is computationally cheap and suits certain applications, such as color copiers.

[0008] The computation of the forward transform (A, B) can be arranged so that much of the computational load consists of a final multiplication stage. By precomputing the products of these multipliers with those of steps (C, E), the compression process can be sped up. Similarly, the computation of the inverse transform (J, K) can be arranged so that much of the computational load consists of a preliminary multiplication stage. Here too, precomputing the products effectively eliminates the work of steps (H, I). Furthermore, another transform is substituted for the two-dimensional discrete cosine transform (2-D DCT), which simplifies the computation further. In addition, the psychoadaptive weights can be selectively varied so that the combined multipliers of steps (B, D) are computationally efficient, for example powers of two. Small changes in the psychoadaptive weights of the low-energy output transform elements have little effect on image quality or compression ratio. Finally, attention is drawn to steps (L, Z) of FIG. 1, described later: classification of image difficulty and smoothing of block boundaries. Since these are optional and independent of the subject matter of the invention, they receive only minimal discussion here.

[0009] ... Chen algorithm ... The one-dimensional Chen algorithm (see Chen, W. et al., "A fast computational algorithm for the DCT", IEEE Trans. Commun. COM-25 (1977)) states that

[Equation 3] $X = \frac{2}{N} A_N x$

where x is the data vector, X is the transformed vector, and A_N is given by

[Equation 4] $A_N = c(k) \cos\big((2j+1)k\pi/2N\big);\ j, k = 0, 1, 2, \ldots, N-1$

[0010] Furthermore, A_N can be decomposed into the following matrix form:

[Equation 5]

where R_{N/2} is

[Equation 6] $R_{N/2} = c(2k+1) \cos\big((2j+1)(2k+1)\pi/2N\big);\ j, k = 0, 1, 2, \ldots, N/2-1$

[0011] Note that the matrix Z is Chen's matrix. The notation has been changed in this application to avoid confusion with the matrix P.

[0012] ... Example of the 8-point (N = 8) one-dimensional Chen transform ... For 8 points, the Chen algorithm of Equation 5 is applied recursively twice. The first iteration uses the matrices Z₈, R₄ and B₈. The second iteration solves for A₄ using the matrices Z₄, R₂, A₂ and B₄. These are easily derived from the equations above or from Chen's paper (Chen, W. et al., "A fast computational algorithm for the DCT", IEEE Trans. Commun. COM-25 (1977)).

[0013]

[Equation 7]

[0014] Here Z₈ is given by Equation 8, B₈ by Equation 9, R₄ by Equation 10, Z₄ by Equation 11, B₄ by Equation 12, R₂ by Equation 13 and A₂ by Equation 14.

[Equation 8]

[Equation 9]

[Equation 10]

[Equation 11]

[Equation 12]

[Equation 13]

[Equation 14]

where, from Equation 6,

[Equation 15] C_n = cos(nπ/16)

[0015] ... Chen-Wu (modified) or parameterized transform ... What has been described so far is the Chen transform. Multiplying it out already yields computational savings over a multiplication-intensive DCT implementation. This, however, is not what the applicant provides. To reduce the number of multiplications to a minimum, the matrices are reparameterized as in Equations 16 to 18. This is what the applicant calls the Chen-Wu (modified) transform, and it is the applicant's creation.

[0016]

[Equation 16]

[Equation 17]

[Equation 18]

[0017] Here a, b, c and r are given by Equation 19.

[Equation 19]
a = C1/C7 = sin(7π/16)/cos(7π/16) = tan(7π/16)
b = C2/C6 = tan(6π/16)
c = C3/C5 = tan(5π/16)
r = C4 = cos(4π/16)
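The parameters of Equation 19 can be evaluated numerically from C_n = cos(nπ/16) (Equation 15). The short Python sketch below does so; the helper name `C` is chosen for the example. It takes r = C4 = cos(4π/16) = sqrt(1/2), which agrees with the value r = sqrt(1/2) = 0.70710678... quoted later for the "true cosine transform".

```python
import math

def C(n):
    """C_n = cos(n*pi/16), as in Equation 15."""
    return math.cos(n * math.pi / 16)

a = C(1) / C(7)   # equals tan(7*pi/16), about 5.0273
b = C(2) / C(6)   # equals tan(6*pi/16), about 2.4142
c = C(3) / C(5)   # equals tan(5*pi/16), about 1.4966
r = C(4)          # equals sqrt(1/2),    about 0.7071

print(f"a={a:.4f} b={b:.4f} c={c:.4f} r={r:.4f}")
```

These values are the targets that the simpler rational parameters discussed later in the description (for example a = 5, b = 2.5, c = 1.5, r = 0.75) approximate.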

[0018] Note that the diagonal matrix RF₄ contains the normalization factors of the unparameterized matrix RA₄. Note also that a diagonal matrix can be formed from the constants of R₂ and A₂.

[0019] In reconstructing the A₈ matrix, two matrices are kept separate: the diagonal matrix is held apart from the main matrix. The main matrix is multiplied by the B_N terms. After appropriate reordering and multiplication by constant terms, Equation 3 reduces to Equation 20.

[0020]

[Equation 20] X = Q(a, b, c) P(a, b, c, r) x

where Q(a, b, c) is given by Equation 21 and P(a, b, c, r) by Equation 22.

[Equation 21]

[Equation 22]

[0021] ... Generalized transform ... The generalized 8-point DCT transform is determined by the four parameters a, b, c and r, and can be written as in Equation 23.

[Equation 23] T(a, b, c, r) = P(a, b, c, r) × Q(a, b, c)

where P() and Q() are as described above.

[0022] Transforming an image requires two such transforms T, namely T_v and T_h, for the vertical and horizontal directions respectively. The complete two-dimensional transform is expressed by Equation 24.

[Equation 24] [F] = [T_v]^t [f] [T_h]

where f is the input image block, F is the block of output transform coefficients, and the superscript "t" denotes matrix transposition. All matrices here are 8 × 8.

[0023] A diagonal matrix (such as Q) is its own transpose, and for all matrices

[Equation 25]
[A]^t [B]^t = ([B][A])^t
[T_v] = [P_v][Q_v]
[T_h] = [P_h][Q_h]

so Equation 24 can be rewritten as

[Equation 26] [F] = [Q_v][P_v]^t [f] [P_h][Q_h]

[0024] This can also be expressed as in Equation 27.

[Equation 27] F(i,j) = q(i,j) * g(i,j)

where

[Equation 28]
[g] = [P_v]^t [f] [P_h]
q(i,j) = Q_v(i,i) * Q_h(j,j)
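The step from Equation 26 to Equation 27 relies on the fact that multiplying a matrix on the left and right by diagonal matrices is the same as scaling each element by the product of the two corresponding diagonal entries. A small Python sketch of that identity follows; the helper names are chosen for the example, and a 3 × 3 case is used in place of 8 × 8 for brevity.

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

def scale_pointwise(g, qv, qh):
    """Equation 27: F(i,j) = q(i,j) * g(i,j), with q(i,j) = Qv(i,i) * Qh(j,j)."""
    return [[qv[i] * qh[j] * g[i][j] for j in range(len(qh))]
            for i in range(len(qv))]

g = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
qv, qh = [2, 3, 5], [7, 11, 13]
assert matmul(matmul(diag(qv), g), diag(qh)) == scale_pointwise(g, qv, qh)
```

Because q(i,j) is a fixed table, it can be merged with any other pointwise weights applied to the coefficients, which is what the description exploits later.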

[0025] To transform an image block, [g] is first computed using the Chen-Wu transform and then multiplied by the coefficients q(i,j). Given

[Equation 29]
P_v = P(a, b, c, r_v)
P_h = P(a, b, c, r_h)

the inverse of the above transform is expressed by Equation 30.

[Equation 30] [f] = [P_v′][Q_v][F][Q_h][P_h′]^t

where P_v′ and P_h′ are given by Equation 31.

[Equation 31]
P_v′ = P(a, b, c, 1/(2 r_v))
P_h′ = P(a, b, c, 1/(2 r_h))

This, too, is computed via the Chen-Wu transform.

[0026] ... Chen's algorithm ... Several methods have been devised to speed up computation of the one- and two-dimensional Chebychev transforms and their inverses. A well-known algorithm (Chen; see Cooley, J. W. and Tukey, J. W., "An algorithm for (fast) Fourier series", Math. Comput., Vol. 19, No. 90, pp. 296-301, 1965, or Chen, W. et al., "A fast computational algorithm for the DCT", IEEE Trans. Commun. COM-25 (1977)) multiplies an arbitrary 8-tuple by the above matrix T using only 16 multiplications, 13 additions and 13 subtractions. This algorithm does not depend on any special properties of the parameters a, b, c and r.

[0027] ... Chen-Wu algorithm (modified) ... As described above, factoring [T] = [P][Q] splits Chen's algorithm into two stages: multiplication by [Q] uses 8 multiplications, and multiplication by [P] uses 8 multiplications plus the remaining arithmetic. This is a consequence of the choice of [Q]: several elements of [P] become 1 or −1, so the corresponding multiplications vanish.

[0028] As pointed out above, similar simplifications apply to the inverse transform, the two-dimensional transform and the inverse two-dimensional transform. For an 8 × 8 block, either the forward or the inverse two-dimensional transform uses 128 multiplications (excluding the multiplications by [q]). Looking at the internal data flow of Chen's algorithm, these multiplications are embedded in a structure of eight add/subtract stages and four multiply stages.

[0029] It is important to stress that Chen's algorithm works regardless of the parameters a, b, c and r. The 8-point DCT used in the prior art, however, has the following "true cosine transform" parameters:

a = tan(7π/16)
b = tan(6π/16)
c = tan(5π/16)
r = sqrt(1/2) = 0.70710678...

where r is chosen as the value necessary and sufficient for the matrix T to be orthogonal.

[0030] ... Choice of parameter values ... The Chen transform works regardless of the values chosen for the parameters a, b, c and r. This is because the transform generated by QP is orthogonal. It is entirely possible to use arbitrary numbers and still have a transform that performs the desired decorrelation of the image data to be compressed. Note that such a transform is not a discrete cosine transform, nor is it an approximation of the DCT. It is a transform in its own right.

[0031] However, the DCT is generally regarded as highly desirable for efficient decorrelation of the input image and for transformation into coefficients of relatively meaningful spatial frequency (see Lee, B. C., "Fast cosine transform", IEEE ASSP, Vol. 33 (1985)). Therefore, to realize the advantages of the DCT, the parameters are set to approximate those of the DCT given in Equation 19. The opposing consideration is computational efficiency. Since addition is cheaper than multiplication (in hardware the saving is in silicon resources, in software it is in cycle count), the parameters are chosen to be computationally efficient.

[0032] ... Other algorithms ... Other computational schemes have also been devised for the discrete Chebychev transform. For example, an algorithm due to Lee performs the 8-point one-dimensional and 64-point two-dimensional transforms with 12 and 144 multiplications respectively (see Wu, H. R. and Paolini, F. J., "A 2-D fast cosine transform", IEEE Conference on Image Processing, Vol. 1 (1989), or Lee, B. C., "Fast cosine transform", IEEE ASSP, Vol. 33 (1985)).

[0033] However, these "faster" algorithms have several drawbacks compared with Chen's algorithm:
a) The simplification T = P × Q (and the analogous factorization for the inverse transform) no longer works. Separating out the diagonal matrix Q is essential to the simplifications that follow.
b) These algorithms do not work for arbitrary parameters a, b, c and r. Instead, they rely on various trigonometric identities that hold specifically for the true cosine parameters.
c) These algorithms are structurally more complex. This can be an engineering obstacle and increases the potential for numerical instability.

[0034] ... Detailed description of the invention ... A] Referring again to the system in the theoretical discussion above, it can be seen that steps (C, D, E) can be folded into the forward-transform post-multiplier derived from [Q]. Similarly, steps (H, I) can be folded into the inverse-transform pre-multiplier. This is because the rate-scaler operation, the psychoadaptive weighting operation (commonly known as the quantization values) and the deblurring weighting operation are all pointwise multiplications. If b, c, d and e are the outputs of steps B, C, D and E respectively, then

[Equation 32]
c(i,j) = b(i,j) * q(i,j)
d(i,j) = c(i,j) * r(i,j) = b(i,j) * q(i,j) * r(i,j)
e(i,j) = d(i,j) * u(i,j) = b(i,j) * q(i,j) * r(i,j) * u(i,j)

or

[Equation 33] e(i,j) = b(i,j) * all(i,j)

where

[Equation 34] all(i,j) = q(i,j) * r(i,j) * u(i,j)

Here q(i,j) is the rate scaler, r(i,j) is the psychoadaptively chosen (or user-chosen) quantization weight, and u(i,j) is the deblurring weight. Steps H and I can be combined in the same way.
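Equations 32 to 34 amount to precomputing one table of combined weights and then applying a single multiplication per coefficient. A minimal Python sketch, with function names chosen for the example and small 2 × 2 tables standing in for the 8 × 8 case:

```python
def combine_weights(q, r, u):
    """Equation 34: all(i,j) = q(i,j) * r(i,j) * u(i,j), computed once per table."""
    return [[q[i][j] * r[i][j] * u[i][j] for j in range(len(q[0]))]
            for i in range(len(q))]

def apply_weights(b, weights):
    """Equation 33: e(i,j) = b(i,j) * all(i,j), one multiplication per coefficient."""
    return [[b[i][j] * weights[i][j] for j in range(len(b[0]))]
            for i in range(len(b))]

# Step-by-step evaluation of Equation 32 gives the same result.
b = [[1, 2], [3, 4]]
q = [[2, 2], [2, 2]]
r = [[1, 3], [3, 5]]
u = [[7, 1], [1, 1]]
step_by_step = [[b[i][j] * q[i][j] * r[i][j] * u[i][j] for j in range(2)]
                for i in range(2)]
assert apply_weights(b, combine_weights(q, r, u)) == step_by_step
```

The table from `combine_weights` only needs recomputing when the rate scaler, the quantization weights or the deblurring weights change.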
(i, j) is a quantization weight selected as a psychological factor (or selected by the user), and u (i, j) is a debraling weight. Similarly, stages H and I can be combined.

[0035] This plainly means that the rate scaler, adaptive weighting and deblurring functions are provided with no additional computational overhead. As noted above, this approach is not applicable to "fast" algorithms such as Lee's.

[0036] B] Because Chen's algorithm works for any parameters a, b, c and r, values can be chosen that give quality and compression comparable to the DCT while allowing fast multiplication.

[0037] The following parameters are reasonably close to those of the DCT but are far more efficient computationally:

a = 5.0
b = 2.5
c = 1.5
r = 0.75

Multiplications are now replaced by much simpler arithmetic. For example, multiplying by 5 becomes copy; shift-left-2; add. Multiplying by 1.5 becomes copy; shift-right-1; add. Alternatively, the inverse of the numerator of a rational multiplier can be factored into the combined multiplier [q]. Thus multiplication by 2.5 can become multiplications by 5 and by 2 for the affected and unaffected terms respectively.
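The shift-and-add replacements described above can be sketched on integers as follows. The function names are chosen for the example. The last function shows one reading of the "2.5 becomes 5 and 2" remark: the affected term is multiplied by 5 and the unaffected term by 2, leaving an overall factor of 2 that is assumed to be absorbed into the combined multiplier [q].

```python
def times_5(x: int) -> int:
    # copy; shift-left-2; add
    return (x << 2) + x

def times_1_5(x: int) -> int:
    # copy; shift-right-1; add (equals floor(1.5 * x); exact when x is even)
    return x + (x >> 1)

def scaled_sum_2_5(x: int, y: int) -> int:
    """Return 5*x + 2*y, which is 2 * (2.5*x + y); the factor 2 is left for [q]."""
    return times_5(x) + (y << 1)
```

For example, `scaled_sum_2_5(4, 3)` returns 26, twice the value 2.5 * 4 + 3 = 13, using only shifts and additions.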

[0038] With the latter approach, handling the parameter r = 0.75 in the straightforward Chen algorithm requires 96 multiplications by 4 and 4 multiplications by 3. The Wu-Paolini improvement for the two-dimensional implementation eliminates an entire multiplication stage, giving 36 multiplications by 16, 24 multiplications by 12 and 4 multiplications by 9 (the inverse transform uses 36 multiplications by 9, 24 multiplications by 6 and 4 multiplications by 4).

[0039] At some cost in computation speed, parameter values closer to the cosine transform can also be chosen. The substitutions b = 12/5 and/or r = 17/24 are possible. Another interesting substitution is:

rRow = 0.708333 (17/24)
rCol = 0.7 (7/10)

[0040] Here slightly different transforms (different parameters r) are used for rows and columns. This is done to simplify the multipliers that arise in the Wu-Paolini method. This approach yields 36 multiplications by 15, 12 multiplications by 85/8, 12 multiplications by 21/2 and 4 multiplications by 119/16 (the inverse transform uses 36 multiplications by 119/16, 12 multiplications by 85/16, 12 multiplications by 21/4 and 4 multiplications by 15/4).

[0041] With the methods described above, all multiplications have become fast and cheap except those by the combined multiplier [q] in the compressor and by the combined multiplier [q] in the decompressor. Each of these requires one multiplication per transform element. The latter is simplified in that most of the transform coefficients are 0, and most of the non-zero coefficients are integers very close to 0 that can be handled specially.

[0042] C] In the compressor, a further technique is used to reduce the computational cost of the combined multiplier [q]. Since the rate scaler is in practice an arbitrary value, it can be adjusted point by point so that the [q] matrix elements take computationally simple values, for example powers of two. These 64 adjustments need to be performed only once (after the rate scaler and deblurring filter have been specified).

[0043] For example, if an element C of the combined multiplier and the corresponding decompression multiplier element D were C = 0.002773 and D = 0.009367, the approximation C ≈ 3/1024 = 0.002930 could be found and used to simplify the multiplication. This gives C′ = 3/1024 and D′ = D * C / C′ ≈ 0.008866.
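The adjustment in this example keeps the product of the compressor and decompressor multipliers unchanged while making the compressor multiplier cheap to apply. A minimal Python check of the quoted numbers, with the function name chosen for the example:

```python
def adjust_pair(c: float, d: float, c_simple: float):
    """Replace c by a simpler value and rescale d so that c * d is preserved."""
    d_adjusted = d * c / c_simple
    return c_simple, d_adjusted

C, D = 0.002773, 0.009367
C_prime, D_prime = adjust_pair(C, D, 3 / 1024)
print(f"C' = {C_prime:.6f}, D' = {D_prime:.6f}")
```

Multiplying by 3/1024 needs only a shift, an add and a right shift by 10, whereas the general multiplier D′ is applied in the decompressor.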

[0044]

DETAILED DESCRIPTION OF THE PRIMARY PROCESS ... Notes ...
a) In the quantized transform space it is convenient and effective to give the non-zero steps of the "AC" coefficient quantization a constant width (w) and to give the zero step a width (w * q). Moreover, q = 2 is arithmetically convenient and is nearly optimal in quality over a wide range of compression factors. The description assumes q = 2 ("double-width zero"), but the invention can use any possible q.
b) The algorithm below is designed for limited-precision two's-complement binary integer arithmetic, except for the intermediate determinations of steps 2, 4 and 8, which are performed only once, using high-precision arithmetic. With the further exception of step 9.1, the integer multiplications described here are optimized for cost and speed. For example, consider the multiplication Nrr * Nrc = Drr′ * Drc′ = 1.75 * 4.25 = 7.4375. Choosing the identity 7.4375 = (8 − 1) * (1 + 1/16) lets the multiplication be done efficiently with shifts and adds.
c) The deblurring multipliers are shown here in step 8, but they should normally be applied in step 4. In many applications the decompressor does not "know" how, or whether, the image should be deblurred. Note that the best values of Thr() depend on the input device and the deblurring method. The recommended approach is to compute the values m(i,j) (see step 8) at compression time (see step 4) and to transmit or store them as part of the compressed image.
d) There are several obvious ways to parallelize, time-sequence or fragment the computations that follow. The preferred method for a given hardware configuration is self-evident.
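The identity 7.4375 = (8 − 1) * (1 + 1/16) mentioned in note b) translates directly into two shifts, one subtraction and one addition. The sketch below uses plain Python integers; the function name is chosen for the example, and the right shift truncates low-order bits, so the result is exact only when the intermediate value is a multiple of 16.

```python
def times_7_4375(x: int) -> int:
    """Multiply by 7.4375 = (8 - 1) * (1 + 1/16) using shifts and adds only."""
    y = (x << 3) - x       # (8 - 1) * x
    return y + (y >> 4)    # (1 + 1/16) * y, truncating the low four bits

# Both factorizations quoted in the text give the same constant.
assert 1.75 * 4.25 == 7.4375
assert (8 - 1) * (1 + 1 / 16) == 7.4375
```

In a fixed-point implementation the input would typically carry extra fractional bits so that the truncation in the final shift stays below the required precision.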

[0045] ... Pseudo-Code Embodiment ...
This part of the application is essentially a description, in text and pseudo-code, of a preferred embodiment of the invention. It has several sections: parametrization, the calculation over all (i,j) as in Equation 34 above, execution of the main body of the forward GCT, the calculation over all (i,j) for the inverse direction, and execution of the main body of the inverse GCT.

[0046] Step 1
The parameters a, b, c and r are as described above. Note that there is a value of r for both rows and columns. The two-dimensional GCT is a separable transform and can be performed in two passes, but nothing constrains it to be symmetric. The scaling factors can therefore be asymmetric, as shown.

[0047] The numerator N and denominator D entries show possible combinations of numerators and denominators that can equal the above values. The designer of a GCT implementation has latitude in choosing the actual values used in the adder array. The choice of values is corrected in the final multiplication stage.

[0048] That is,

[Equation 35]
tan(7*pi/16) ≈ a = Na/Da
tan(6*pi/16) ≈ b = Nb/Db
tan(5*pi/16) ≈ c = Nc/Dc
sqrt(0.5) ≈ rRow = Nrr/Drr
sqrt(0.5) ≈ rCol = Nrc/Drc
0.5/rRow = rRow' = Nrr'/Drr'
0.5/rCol = rCol' = Nrc'/Drc'

are selected as the parameters of the generalized Chen transform, as described above. The "numerators" N and "denominators" D need not be integers, but are chosen for computational convenience. Some useful combinations are:

Na = 5, Nb = 3, Nc = 1.5, Nrr = 1.75, Nrc = 4.25, Nrr' = 1.25, Nrc' = 3,
Da = 1, Db = 1.25, Dc = 1, Drr = 2.5, Drc = 6, Drr' = 1.75, Drc' = 4.25.

[0049] Again, the invention covers all rational approximations to the above tangents. From these parameters, the required normalizing scalers are computed.

[0050] Step 2
Also write

[Equation 36]
U(0) = U(4) = sqrt(0.5)
U(1) = U(7) = 1/sqrt(Na*Na + Da*Da)
U(2) = U(6) = 1/sqrt(Nb*Nb + Db*Db)
U(3) = U(5) = 1/sqrt(Nc*Nc + Dc*Dc)
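As a rough check of Steps 1 and 2, the following Python sketch evaluates the ratios and the U() scalers for the example combination listed in Step 1. The dictionary keys (with `_inv` standing for the primed quantities) and the function names are illustrative assumptions, not names from the patent.

```python
import math

# Example numerators and denominators from Step 1; "_inv" marks primed values.
PARAMS = {
    "Na": 5, "Da": 1, "Nb": 3, "Db": 1.25, "Nc": 1.5, "Dc": 1,
    "Nrr": 1.75, "Drr": 2.5, "Nrc": 4.25, "Drc": 6,
    "Nrr_inv": 1.25, "Drr_inv": 1.75, "Nrc_inv": 3, "Drc_inv": 4.25,
}


def ratios(p):
    """Rational approximations a, b, c, rRow, rCol and their inverse-side partners."""
    return {
        "a": p["Na"] / p["Da"],
        "b": p["Nb"] / p["Db"],
        "c": p["Nc"] / p["Dc"],
        "rRow": p["Nrr"] / p["Drr"],
        "rCol": p["Nrc"] / p["Drc"],
        "rRow_inv": p["Nrr_inv"] / p["Drr_inv"],
        "rCol_inv": p["Nrc_inv"] / p["Drc_inv"],
    }


def scalers(p):
    """U(0)..U(7) as defined in Equation 36."""
    u0 = math.sqrt(0.5)
    ua = 1 / math.hypot(p["Na"], p["Da"])
    ub = 1 / math.hypot(p["Nb"], p["Db"])
    uc = 1 / math.hypot(p["Nc"], p["Dc"])
    return [u0, ua, ub, uc, u0, uc, ub, ua]


r = ratios(PARAMS)
for name, exact in (("a", math.tan(7 * math.pi / 16)),
                    ("b", math.tan(6 * math.pi / 16)),
                    ("c", math.tan(5 * math.pi / 16))):
    print(name, r[name], "approximates", round(exact, 4))
```

With these values, rRow' equals 0.5/rRow and rCol' equals 0.5/rCol, as Equation 35 requires.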

[0051] Step 3
Let i be an index over {0, 1, 2, 3, 4, 5, 6, 7} denoting vertical position (in image space) or the sequence of vertical change (in transform space). Likewise, let j be an index over {0, 1, 2, 3, 4, 5, 6, 7} denoting horizontal position (in image space) or the sequence of horizontal change (in transform space). Let Debl(i,j) denote the deblurring factors, with Debl() = 1 when no deblurring is performed. Let Thr(i,j) denote the inverse psycho-adaptive weights, for example as recommended by the CCITT. Let v(i,j) denote the luminance values in image (spatial) space. Let L(i,j) denote the transformed luminance values in transform (compressed) space. Let S be an arbitrary small integer denoting the arithmetic precision used in reconstruction.

[0052] The psycho-adaptive weights 1/Thr(i,j) should be re-optimized for each set of parameters of the generalized Chen transform. However, the parameters given in step 1 are close enough to the CCITT parameters that the same matrix Thr() is optimal.

[0053] Step 4
This step is the computation over all (i,j). Iterate over the 64 transform positions (i,j), solving for k(i,j) and s(i,j) so as to satisfy

[Equation 37]
g(i,j) < {M * U(i) * U(j) * 2^s(i,j)} / {k(i,j) * Zr(i) * Zc(j) * Thr(i,j)}

with the right-hand side as close as possible to g(i,j) and with s(i,j) an integer, where

[Equation 38]
g(i,j) = 1.0, k(i,j) in {1, 3, 5, 7, 9} ; for i + j < 4
g(i,j) = 0.9, k(i,j) in {1, 3, 5} ; for i + j = 4
g(i,j) = 0.7, k(i,j) = 1 ; for i + j > 4
Zr(i) = 1 ; when i = 0, 1, 2 or 3
Zr(i) = Drr ; when i = 4, 5, 6 or 7
Zc(j) = 1 ; when j = 0, 1, 2 or 3
Zc(j) = Drc ; when j = 4, 5, 6 or 7
Zr'(i) = 1 ; when i = 0, 1, 2 or 3
Zr'(i) = Drr' ; when i = 4, 5, 6 or 7
Zc'(j) = 1 ; when j = 0, 1, 2 or 3
Zc'(j) = Drc' ; when j = 4, 5, 6 or 7

The factors g(i,j) are intended to make the quantization bias independent of the size chosen.

[0054] Step 5
... Execution of the Forward GCT ...
Step 5 is the pseudo-code execution of the forward transform. The following steps perform the two-dimensional transform in interleaved form. Iterate over the whole image, performing the following on each 8×8 block of luminance values v( , ).

[0055] Step 5.1
Prepare the values

[Equation 39]
M(i,0) = V(i,0) + V(i,7)
M(i,1) = V(i,1) + V(i,6)
M(i,2) = V(i,2) + V(i,5)
M(i,3) = V(i,3) + V(i,4)
M(i,4) = V(i,3) - V(i,4)
M5(i) = V(i,2) - V(i,5)
M6(i) = V(i,1) - V(i,6)
M(i,5) = M6(i) + M5(i)
M(i,6) = M6(i) - M5(i)
M(i,7) = V(i,0) - V(i,7)
for i = 0, 1, 2, ..., 7.

[0056] Step 5.2
Prepare the values

[Equation 40]
H(0,j) = M(0,j) + M(7,j)
H(1,j) = M(1,j) + M(6,j)
H(2,j) = M(2,j) + M(5,j)
H(3,j) = M(3,j) + M(4,j)
H(4,j) = M(3,j) - M(4,j)
H5(j) = M(2,j) - M(5,j)
H6(j) = M(1,j) - M(6,j)
H(5,j) = H6(j) + H5(j)
H(6,j) = H6(j) - H5(j)
H(7,j) = M(0,j) - M(7,j)
for j = 0, 1, 2, ..., 7.

[0057] Step 5.3
Multiply each H(i,j) by

[Equation 41]
when i = 0, 1, 2 or 3:
  Nrc (j = 5 or 6)
  Drc (j = 4 or 7)
  1 (no action) (j = 0, 1, 2 or 3)
when i = 4 or 7:
  Drr * Nrc (j = 5 or 6)
  Drr * Drc (j = 4 or 7)
  Drr (j = 0, 1, 2 or 3)
when i = 5 or 6:
  Nrr * Nrc (j = 5 or 6)
  Nrr * Drc (j = 4 or 7)
  Nrr (j = 0, 1, 2 or 3)

[0058] Step 5.4
Prepare the values

[Equation 42]
E(0,j) = H(0,j) + H(3,j)
E(1,j) = H(7,j) + H(5,j)
E(2,j) = H(0,j) - H(3,j)
E(3,j) = H(7,j) - H(5,j)
E(4,j) = H(1,j) + H(2,j)
E(5,j) = H(6,j) - H(4,j)
E(6,j) = H(1,j) - H(2,j)
E(7,j) = H(6,j) + H(4,j)
F(0,j) = E(4,j) + E(0,j)
F(4,j) = E(0,j) - E(4,j)
F(2,j) = Db*E(6,j) + Nb*E(2,j)
F(6,j) = Db*E(2,j) - Nb*E(6,j)
F(1,j) = Da*E(7,j) + Na*E(1,j)
F(7,j) = Da*E(1,j) - Na*E(7,j)
F(3,j) = Dc*E(5,j) + Nc*E(3,j)
F(5,j) = Dc*E(3,j) - Nc*E(5,j)
for j = 0, 1, 2, ..., 7.

[0059] Step 5.5
Prepare the values

[Equation 43]
Z(i,0) = F(i,0) + F(i,3)
Z(i,2) = F(i,0) - F(i,3)
Z(i,4) = F(i,1) + F(i,2)
Z(i,6) = F(i,1) - F(i,2)
Z(i,1) = F(i,7) + F(i,5)
Z(i,3) = F(i,7) - F(i,5)
Z(i,5) = F(i,6) - F(i,4)
Z(i,7) = F(i,6) + F(i,4)
G(i,0) = Z(i,4) + Z(i,0)
G(i,4) = Z(i,0) - Z(i,4)
G(i,2) = Db*Z(i,6) + Nb*Z(i,2)
G(i,6) = Db*Z(i,2) - Nb*Z(i,6)
G(i,1) = Da*Z(i,7) + Na*Z(i,1)
G(i,7) = Da*Z(i,1) - Na*Z(i,7)
G(i,3) = Dc*Z(i,5) + Nc*Z(i,3)
G(i,5) = Dc*Z(i,3) - Nc*Z(i,5)
for i = 0, 1, 2, ..., 7.

[0060] Alternatively, the transform can be split into two passes of a one-dimensional transform. The following is one embodiment of a one-dimensional transform pass. FIGS. 8 and 9 show these steps.

【００６１】[0061]

[Equation 44]
Note that all the multiplications in the equations of Equation 44 are implemented by shift-and-add operations. To relate this to the matrix form of the GCT, the vector point Y6 is worked through as an example.

【００６２】[0062]

[Equation 45]
Y6 = C1 - C4
   = (1.25 B1) - (3 B4)
   = 1.25 (A1 - A2) - 3 (A4 - A3)
   = 1.25 ((X0 + X7) - (X3 + X4)) - 3 ((X1 + X6) - (X2 + X5))
   = 1.25 X0 - 3 X1 + 3 X2 - 1.25 X3 - 1.25 X4 + 3 X5 - 3 X6 + 1.25 X7
Y6/1.25 = X0 - 2.4 X1 + 2.4 X2 - X3 - X4 + 2.4 X5 - 2.4 X6 + X7
        = | 1 -b b -1 -1 b -b 1 | x

where b = 2.4. This is row 6 of the matrix P of the equations. The division by 1.25 is a scaling factor that is collected into the rate-scaler matrix. The row data of an 8×8 pixel block passes through this adder array. The resulting one-dimensional frequency components are transposed and passed through the same array again.
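A one-dimensional pass can be sketched in Python by applying the butterfly pattern of Steps 5.1, 5.3 and 5.4 to a single 8-sample vector. This is a floating-point illustration under stated assumptions, not the adder array of the figures: the function names are hypothetical, the rotation signs follow the pattern of Step 5.5, the output scaling (the U() scalers of Step 2, a factor of 0.5, and division of the odd outputs by Dr) is chosen here so that the "true cosine" parameters reproduce an orthonormal 8-point DCT, and the row values Nrr and Drr stand in for r.

```python
import math


def gct8_forward(x, Na, Da, Nb, Db, Nc, Dc, Nr, Dr):
    """Unnormalized 1-D forward pass: mirror butterflies, r stage, rotations."""
    # Sums and differences of mirrored samples (pattern of Step 5.1).
    m0, m1, m2, m3 = x[0] + x[7], x[1] + x[6], x[2] + x[5], x[3] + x[4]
    m4, m7 = x[3] - x[4], x[0] - x[7]
    t5, t6 = x[2] - x[5], x[1] - x[6]
    m5, m6 = t6 + t5, t6 - t5
    # r = Nr/Dr applied by cross-multiplication (pattern of Step 5.3).
    m4, m7 = Dr * m4, Dr * m7
    m5, m6 = Nr * m5, Nr * m6
    # Second butterfly layer (pattern of Step 5.4).
    e0, e2, e4, e6 = m0 + m3, m0 - m3, m1 + m2, m1 - m2
    e1, e3, e5, e7 = m7 + m5, m7 - m5, m6 - m4, m6 + m4
    # Rotations by a, b, c expressed as numerator/denominator pairs.
    f = [0.0] * 8
    f[0], f[4] = e4 + e0, e0 - e4
    f[2], f[6] = Db * e6 + Nb * e2, Db * e2 - Nb * e6
    f[1], f[7] = Da * e7 + Na * e1, Da * e1 - Na * e7
    f[3], f[5] = Dc * e5 + Nc * e3, Dc * e3 - Nc * e5
    return f


def gct8_normalized(x, Na, Da, Nb, Db, Nc, Dc, Nr, Dr):
    """Apply the U() scalers and undo the Dr factor carried by odd outputs."""
    f = gct8_forward(x, Na, Da, Nb, Db, Nc, Dc, Nr, Dr)
    u0 = math.sqrt(0.5)
    ua, ub, uc = (1 / math.hypot(n, d) for n, d in ((Na, Da), (Nb, Db), (Nc, Dc)))
    U = [u0, ua, ub, uc, u0, uc, ub, ua]
    return [0.5 * U[k] * f[k] / (Dr if k % 2 else 1.0) for k in range(8)]


def dct8_reference(x):
    """Direct orthonormal 8-point DCT-II, used only for comparison."""
    out = []
    for k in range(8):
        s = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * s * (math.sqrt(0.5) if k == 0 else 1.0))
    return out


TRUE_COSINE = (math.tan(7 * math.pi / 16), 1.0, math.tan(6 * math.pi / 16), 1.0,
               math.tan(5 * math.pi / 16), 1.0, math.sqrt(0.5), 1.0)
PATENT_EXAMPLE = (5, 1, 3, 1.25, 1.5, 1, 1.75, 2.5)

x = [52, 55, 61, 66, 70, 61, 64, 73]
print([round(v, 2) for v in gct8_normalized(x, *PATENT_EXAMPLE)])
print([round(v, 2) for v in dct8_reference(x)])
```

With the exact tangents the two outputs agree to rounding error; with the rational example parameters the AC terms differ slightly, which is the approximation that the rate scaler and quantization are meant to absorb.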

[0063] Step 6
After step 5.5, in each image sub-block and for each of the 64 positions (i,j), use k(i,j) and s(i,j) from step 4 to prepare the value

[Equation 46]
L(i,j) = G(i,j) * k(i,j) * 2^(-s(i,j))

but if this is negative (or if i = j = 0), add 1 to it. The result is the transform coefficient L(i,j).

[0064] ... Notes on Step 6 ...
The calculation here is simple because
- k(i,j) is always 1, 3, 5, 7 or 9, and is usually 1;
- multiplication by 2^(-s(i,j)) is simply a right shift (or possibly a left shift if M was chosen very large).
An arithmetic right shift always rounds downward. Rounding toward zero is what is actually wanted, hence the clause "if (negative) add 1". The addition of 1 when i = j = 0 relies on v(i,j) ≥ 0 and is merely a device to simplify the statement of step 9.1 below.
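The quantization rule of Step 6 can be written as a short Python sketch. The function name and the `is_dc` flag (standing for the case i = j = 0) are illustrative assumptions; Python's `>>` on integers is an arithmetic shift that rounds downward, which matches the behavior the notes describe.

```python
def quantize(g: int, k: int, s: int, is_dc: bool = False) -> int:
    """L = G * k * 2^(-s) by shifting, then the 'add 1' adjustment of Step 6."""
    scaled = g * k
    # Right shift for s >= 0; a left shift covers the case of a very large M.
    value = scaled >> s if s >= 0 else scaled << -s
    if value < 0 or is_dc:
        value += 1
    return value


print(quantize(9, 1, 2))    # 9/4 = 2.25, shifted down to 2
print(quantize(-9, 1, 2))   # -9 >> 2 is -3; adding 1 gives -2
```

For a negative input that is not an exact multiple of 2^s, the adjustment turns the downward rounding of the shift into rounding toward zero.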

[0065] Step 7
Encode, store and/or transmit the values L(i,j). Eventually these values are retrieved and the image is reconstructed by the following steps.

[0066] Step 8
This is the inverse counterpart of the computation over all (i,j). Iterate over the 64 transform positions (i,j), solving for m(i,j) as the nearest integer to

[Equation 47]
m(i,j) = {U(i)^2 * U(j)^2 * Zr(i) * Zc(j) * Debl(i,j)} / {Zr'(i) * Zc'(j) * k(i,j) * 2^(4-S-s(i,j))}

where s(i,j) and k(i,j) were solved in step 4 and the "Z" terms are defined in step 4. Also choose A(i,j) as the nearest integer to

[Equation 48]
A(0,0) = {(2^(S-2)) / Drc' * Drr'} - 0.5 * m(0,0)
A(i,j) = m(i,j) * (25 - i - j) / 64 ; for i ≠ 0 or j ≠ 0

[0067] ... Notes on Step 8 ...
The values m(i,j) may have been precalculated in step 4 and transmitted with the compressed image. This is unnecessary for A(i,j), which depends only on constants and on m(i,j). In applications where the rate scaler and deblurring weights are fixed, m(i,j) and A(i,j) are constants. The factor 2^S reflects extra bits of precision that are subsequently removed by the arithmetic right shifts in steps 9.2 and 10. The adjustment to A(0,0) corrects a rounding bias, so that the outputs below can be used without rounding correction. As noted above, A(0,0) relies on the addition of 1 to L(0,0) in step 6. The interpolation "(25 - i - j)/64" is heuristic, but is approximately optimal in a mean-square-error sense. Furthermore, it is a version interleaved into 20.

[0068] Step 9
Iterate over the transformed image, performing the following on each 8×8 block of transformed luminance values L( , ) derived in step 5 above.

[0069] Step 9.1
Prepare the values

[Equation 49]
E(i,j) = L(i,j) * m(i,j) + A(i,j) ; when L(i,j) > 0
E(i,j) = L(i,j) * m(i,j) - A(i,j) ; when L(i,j) < 0
E(i,j) = 0 ; when L(i,j) = 0
for each (i,j), i = 0, 1, 2, ..., 7, j = 0, 1, 2, ..., 7.

This means that A(0,0) is always added. The invention also encompasses variants in which the test "L(0,0) > 0" is not performed and in which steps 6 and 8 are not (optionally) simplified as above. In practice, small multiplications, for example -11 < L(i,j) < 11, should be recognized as special cases so as to save the computational cost of a multiplication.
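The three cases of Step 9.1 amount to a sign-dependent offset around the product L * m. A minimal Python sketch, with a hypothetical function name and arbitrary example numbers:

```python
def dequantize(l: int, m: int, a: int) -> int:
    """E = L*m + A for positive L, L*m - A for negative L, and 0 for L = 0."""
    if l > 0:
        return l * m + a
    if l < 0:
        return l * m - a
    return 0


# Example values only; m and A would come from Step 8.
print(dequantize(3, 10, 4), dequantize(-3, 10, 4), dequantize(0, 10, 4))
```

The offset A moves each non-zero coefficient away from zero, toward the interior of its quantization step, while zero coefficients stay exactly zero.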

[0070] Step 9.2
If convenient for reducing the cost of the semiconductor device, right-shift the values E(i,j) by an arbitrary number of positions S1. Note that these shifts are "free" in some implementations of the method. In implementations where the shifts are not free, one may choose to omit them when E(i,j) is zero (or one may choose to eliminate all shifts by setting S1 = 0).

[0071] Step 9.3
Once again in two-dimensional form, prepare the values

[Equation 50]
F(0,j) = E(4,j) + E(0,j)
F(4,j) = E(0,j) - E(4,j)
F(2,j) = Db*E(6,j) + Nb*E(2,j)
F(6,j) = Db*E(2,j) - Nb*E(6,j)
F(1,j) = Da*E(7,j) + Na*E(1,j)
F(7,j) = Da*E(1,j) - Na*E(7,j)
F(3,j) = Dc*E(5,j) + Nc*E(3,j)
F(5,j) = Dc*E(3,j) - Nc*E(5,j)
H(0,j) = F(0,j) + F(2,j)
H(1,j) = F(4,j) + F(6,j)
H(2,j) = F(4,j) - F(6,j)
H(3,j) = F(0,j) - F(2,j)
H(4,j) = F(7,j) - F(5,j)
H5(j) = F(7,j) + F(5,j)
H6(j) = F(1,j) - F(3,j)
H(5,j) = H6(j) - H5(j)
H(6,j) = H6(j) + H5(j)
H(7,j) = F(1,j) + F(3,j)
for j = 0, 1, 2, ..., 7.

[0072] Step 9.4
Prepare the values

[Equation 51]
G(i,0) = H(i,4) + H(i,0)
G(i,4) = H(i,0) - H(i,4)
G(i,2) = Db*H(i,6) + Nb*H(i,2)
G(i,6) = Db*H(i,2) - Nb*H(i,6)
G(i,1) = Da*H(i,7) + Na*H(i,1)
G(i,7) = Da*H(i,1) - Na*H(i,7)
G(i,3) = Dc*H(i,5) + Nc*H(i,3)
G(i,5) = Dc*H(i,3) - Nc*H(i,5)
M(i,0) = G(i,0) + G(i,2)
M(i,1) = G(i,4) + G(i,6)
M(i,2) = G(i,4) - G(i,6)
M(i,3) = G(i,0) - G(i,2)
M(i,4) = G(i,7) - G(i,5)
M5(i) = G(i,7) + G(i,5)
M6(i) = G(i,1) - G(i,3)
M(i,5) = M6(i) - M5(i)
M(i,6) = M6(i) + M5(i)
M(i,7) = G(i,1) + G(i,3)
for i = 0, 1, 2, ..., 7.

[0073] Step 9.5
Multiply each M(i,j) according to Equation 52.

[Equation 52]
when i = 0, 1, 2 or 3:
  Nrc' (j = 5 or 6)
  Drc' (j = 4 or 7)
  1 (no action) (j = 0, 1, 2 or 3)
when i = 4 or 7:
  Drr' * Nrc' (j = 5 or 6)
  Drr' * Drc' (j = 4 or 7)
  Drr' (j = 0, 1, 2 or 3)
when i = 5 or 6:
  Nrr' * Nrc' (j = 5 or 6)
  Nrr' * Drc' (j = 4 or 7)
  Nrr' (j = 0, 1, 2 or 3)

[0074] Step 9.6
Prepare the values

[Equation 53]
Z(i,0) = M(i,0) + M(i,7)
Z(i,1) = M(i,1) + M(i,6)
Z(i,2) = M(i,2) + M(i,5)
Z(i,3) = M(i,3) + M(i,4)
Z(i,4) = M(i,3) - M(i,4)
Z(i,5) = M(i,2) - M(i,5)
Z(i,6) = M(i,1) - M(i,6)
Z(i,7) = M(i,0) - M(i,7)
for i = 0, 1, 2, ..., 7.

[0075] Step 9.7
Prepare the values

[Equation 54]
Y(0,j) = Z(0,j) + Z(7,j)
Y(1,j) = Z(1,j) + Z(6,j)
Y(2,j) = Z(2,j) + Z(5,j)
Y(3,j) = Z(3,j) + Z(4,j)
Y(4,j) = Z(3,j) - Z(4,j)
Y(5,j) = Z(2,j) - Z(5,j)
Y(6,j) = Z(1,j) - Z(6,j)
Y(7,j) = Z(0,j) - Z(7,j)
for j = 0, 1, 2, ..., 7.

[0076] Step 10
After step 9.7, in each image sub-block and for each of the 64 positions (i,j), prepare the value

[Equation 55]
V(i,j) = Y(i,j) * 2^(S1-S)

where S and S1 are the arbitrary integers defined in steps 7 and 9.2 above. Again, the multiplication is in practice a right shift.

[0077] Step 11
Depending on the system, range checking may be required at this point. For example, if the permitted range of luminance is 0 ≤ v(i,j) ≤ 255, then values of V(i,j) below 0 or above 255 are replaced by 0 and 255 respectively. The values v(i,j) are now the luminance values of the reconstructed image.
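Steps 10 and 11 together reduce to a shift followed by a clamp. A minimal Python sketch, assuming integer inputs and the 0 to 255 luminance range used in the example above; the function name and argument order are hypothetical.

```python
def reconstruct_pixel(y: int, s: int, s1: int, lo: int = 0, hi: int = 255) -> int:
    """V = Y * 2^(S1 - S) as a shift (Step 10), then range checking (Step 11)."""
    shift = s - s1
    v = y >> shift if shift >= 0 else y << -shift
    return max(lo, min(hi, v))


print(reconstruct_pixel(1000, 4, 1))   # 1000 >> 3 = 125
print(reconstruct_pixel(5000, 4, 1))   # 625 is clamped to 255
print(reconstruct_pixel(-40, 4, 1))    # -5 is clamped to 0
```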

【００７８】[0078]

[Discussion of Secondary Processing]
It is usual to supplement the primary processing with further measures to improve the compression or quality of the image. After step 10, image fidelity can be improved by iterating over all pixel pairs V(8I+7, j), V(8I+8, j) and all pixel pairs V(i, 8J+7), V(i, 8J+8) (that is, adjacent pixels that were separated into different image blocks) and incrementing and decrementing their values v1 and v2 respectively by, for example, (v2 - v1) / max(2, 11 sqrt(M)), where M is the rate scaler used in step 4 and the divisor is again a convenient approximation to the optimum.
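The block-boundary adjustment described above can be sketched in Python for a single row of pixels. The function names are hypothetical, floating-point values are used for clarity, and the block width of 8 follows the pixel pairs named in the text.

```python
import math


def smooth_pair(v1: float, v2: float, m: float):
    """Move two pixels that straddle a block boundary toward each other."""
    delta = (v2 - v1) / max(2.0, 11.0 * math.sqrt(m))
    return v1 + delta, v2 - delta


def smooth_row(row, m):
    """Apply smooth_pair at every boundary between 8-pixel blocks in one row."""
    out = list(row)
    for left in range(7, len(row) - 1, 8):
        out[left], out[left + 1] = smooth_pair(row[left], row[left + 1], m)
    return out


row = [100] * 8 + [120] * 8
print(smooth_row(row, 0.01)[6:10])
```

For a small rate scaler the divisor bottoms out at 2, so the two boundary pixels meet at their average; for larger M the correction shrinks.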

[0079] Before performing step 6, it is desirable to classify the objective difficulty of the local image area into one of three classes, single, double or quadruple precision, with the prefix code "0", "10" or "11" output respectively. The calculation in step 6 is then replaced by

[Equation 56]
L(i,j) = G(i,j) * k(i,j) * 2^(p - s(i,j))

where p = 0, 1 or 2 for single, double and quadruple precision respectively. This is compensated in step 9.2, where the additional precision must be removed by an (increased) right shift.

[0080] Unfortunately, no single highly effective classification scheme has been found. At present, a cumbersome scheme is used that derives the difficulty measure P from four sources:
a) P_left and P_up, the difficulty measures of the neighboring image areas;
b) sum((i+j) * G(i,j)^2) / sum(G(i,j)^2), the transform energy skew;
c) -G(0,0), the inverse mean luminance;
d) max(sum over fixed width(Histogram(v(i,j)))), the uniformity.

[0081] In step 7, the transform data L( , ) to be stored or transmitted can be further reduced by an entropy coding method. We use, and recommend, the CCITT zigzag run-and-template code with several default Huffman tables chosen according to the bit rate. For definiteness, an example is described in detail in the following sections.

[0082] ... Example of a Compressed File Format ...
A compressed image is represented as follows:
1) Prefix (image width, height, rate scaler M, etc.)
2) Pixel block 0
   Pixel block 1
   Pixel block 2
   ...
   Pixel block N-1
3) Suffix (if any)

[0083] Here each pixel block is represented as follows:
1) Precision code (as determined in selection stage Z)
2) DC coefficient delta code
3) AC coefficient code (repeated zero or more times)
4) End-of-block code

[0084] Here each AC coefficient code is represented as follows:
1) Nine-zeros extension (repeated E times, E ≥ 0)
2) Run-and-template code denoting (R, T)
3) Sign of the coefficient value (1 bit)
4) Absolute value of the coefficient with the most significant bit deleted (T bits)

[0085] Here "R + 9 * E" is the number of zero-valued coefficients preceding this one in "zigzag" order, and T is the bit position of the most significant bit of the absolute value of the coefficient. For example, T = 3 when the coefficient is 11 or -11:

Bit position: 8 7 6 5 4 3 2 1 0
11 =          0 0 0 0 0 1 0 1 1 (binary)
                        ^ most significant bit
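The fields of one AC coefficient code can be derived as in the Python sketch below. It assumes that each nine-zeros extension accounts for nine preceding zero coefficients, so that the run splits as R + 9 * E, and it assumes a sign bit of 1 for negative values; the sign convention, the function name and the field names are illustrative, and the Huffman coding of (R, T) itself is not shown.

```python
def ac_fields(coeff: int, zeros_before: int) -> dict:
    """Split one non-zero AC coefficient into the fields of its code."""
    if coeff == 0:
        raise ValueError("zero coefficients are carried by the run length")
    magnitude = abs(coeff)
    t = magnitude.bit_length() - 1       # bit position of the most significant bit
    e, r = divmod(zeros_before, 9)       # E extensions of nine zeros, remainder R
    return {
        "E": e,
        "R": r,
        "T": t,
        "sign": 1 if coeff < 0 else 0,   # assumed convention: 1 means negative
        "rest": magnitude - (1 << t),    # magnitude without its top bit (T bits)
    }


print(ac_fields(11, 20))
```

For the coefficient 11 (binary 1011) this gives T = 3 and a 3-bit remainder of 011, matching the example in the text.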

[0086] The choice or coding of the DC coefficient delta is not detailed here, but an example Huffman code for the AC run-and-template codes that is useful at higher bit rates is given below. Here {0} denotes n consecutive zeros (n = 0, 1, 2, 3, ...), xx denotes 2 bits interpreted as w = 0, 1, 2 or 3, and x denotes 1 bit interpreted as w = 0 or 1.

[0087] ... 128-Point and 256-Point Transforms ...
The above method can be used with the larger 8×16 or 16×16 generalized Chen transforms. The way to generalize the Chen transform further should become clear on noting that the one-dimensional 16-point GCT is given by the following (with the rows in "butterfly order" and with the normalizing post-multiplication omitted).

【００８８】[0088]

【数５７】 [Equation 57]

[0089] Here GCT8(a, b, c, r) and GQ8(e, f, g, h, r, s, t) are given by Equation 58.

【数５８】 [Equation 58]

[0090] The "true cosine" parameters are given by the following.

[Equation 59]
e = tan(15 pi/32) ≈ 10.1532
a = tan(14 pi/32) ≈ 5.0273
f = tan(13 pi/32) ≈ 3.2966
b = tan(12 pi/32) ≈ 2.4142
g = tan(11 pi/32) ≈ 1.8709
c = tan(10 pi/32) ≈ 1.4966
h = tan(9 pi/32) ≈ 1.2185
r = cos(8 pi/32) ≈ 0.7071
t = cos(12 pi/32) ≈ 0.3827
s = cos(4 pi/32) = t * b

[0091] The parameters actually used are as follows.

[Equation 60]
e = 10
a = 5
f = 3.25
b = 2.4
g = 1.875
c = 1.5
h = 1.25
r = 17/24 ≈ 0.708333
t = 5/13 ≈ 0.384615
s = t * b = 12/13

The inverse of GQ8(e, f, g, h, r, s, t) is the transpose of GQ8(e, f, g, h, 1/2r, t' * b, t').

[0092] where

[Equation 61]
b = s/t
t' = 1/(t + t * b * b)
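The rational parameters can be checked exactly with Python's `fractions` module. The sketch below evaluates t' from Equation 61 for t = 5/13 and b = 12/5 (that is, 2.4); the function name is a hypothetical helper.

```python
from fractions import Fraction


def inverse_t(t: Fraction, b: Fraction) -> Fraction:
    """t' = 1 / (t + t*b*b), as in Equation 61."""
    return 1 / (t + t * b * b)


t = Fraction(5, 13)
b = Fraction(12, 5)
s = t * b
print(s, inverse_t(t, b), s * s + t * t)
```

For this choice s and t satisfy s*s + t*t = 1 exactly (a 5-12-13 triangle), so t' comes out equal to t.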

[0093] ... Example Matrices ...
Transpose of matrix TP, cosine transform (a = 5.02734, b = 2.41421, c = 1.49661, r = 0.70711)

[Equation 62]
0.1768  0.1768  0.1768  0.1768  0.1768  0.1768  0.1768  0.1768
0.2452  0.2079  0.1389  0.0488 -0.0488 -0.1389 -0.2079 -0.2452
0.2310  0.0957 -0.0957 -0.2310 -0.2310 -0.0957  0.0957  0.2310
0.2079 -0.0488 -0.2452 -0.1389  0.1389  0.2452  0.0488 -0.2079
0.1768 -0.1768 -0.1768  0.1768  0.1768 -0.1768 -0.1768  0.1768
0.1389 -0.2452  0.0488  0.2079 -0.2079 -0.0488  0.2452 -0.1389
0.0957 -0.2310  0.2310 -0.0957 -0.0957  0.2310 -0.2310  0.0957
0.0488 -0.1389  0.2079 -0.2452  0.2452 -0.2079  0.1389 -0.0488
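The cosine-transform matrix above can be regenerated with a few lines of Python. The sketch assumes that the printed entries are those of an orthonormal 8-point DCT-II multiplied by 0.5, which is the scaling that reproduces the leading value 0.1768; the function name is hypothetical.

```python
import math


def scaled_cosine_matrix():
    """8x8 DCT-II basis, scaled so that the constant row equals 0.1768."""
    rows = []
    for k in range(8):
        scale = 0.25 * (math.sqrt(0.5) if k == 0 else 1.0)
        rows.append([round(scale * math.cos((2 * n + 1) * k * math.pi / 16), 4)
                     for n in range(8)])
    return rows


for row in scaled_cosine_matrix():
    print(" ".join(f"{v:7.4f}" for v in row))
```

Each row is symmetric or antisymmetric about its center, which gives a quick way to spot transcription errors in a printed copy of the matrix.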

[0094] Related Chen transform (a = 5.0, b = 2.4, c = 1.5, r = 0.7)

[Equation 63]
0.1768  0.1768  0.1768  0.1768  0.1768  0.1768  0.1768  0.1768
0.2451  0.2059  0.1373  0.0490 -0.0490 -0.1373 -0.2059 -0.2451
0.2308  0.0962 -0.0962 -0.2308 -0.2308 -0.0962  0.0962  0.2308
0.2080 -0.0485 -0.2427 -0.1387  0.1387  0.2427  0.0485 -0.2080
0.1768 -0.1768 -0.1768  0.1768  0.1768 -0.1768 -0.1768  0.1768
0.1387 -0.2427  0.0485  0.2080 -0.2080 -0.0485  0.2427 -0.1387
0.0962 -0.2308  0.2308 -0.0962 -0.0962  0.2308 -0.2308  0.0962
0.0490 -0.1373  0.2059 -0.2451  0.2451 -0.2059  0.1373 -0.0490

【００９５】[0095]

[Detailed Description of the Apparatus]
Now that the invention has been described in detail, an apparatus embodying aspects of the invention is described. Throughout the following description, a "point" denotes a scalar register or data path of arbitrary precision, typically 8 to 12 bits. Methods for determining the appropriate precision are known (see Jalali and Rao, "Limited Wordlength and FDCT Processing Accuracy", IEEE ASSP-81, Vol. 3, pp. 1180-2).

【００９６】ソフトウェアによる方法において、変換段
は統合されウ・パオリーニ拡張が採用された。好適実施
例の半導体装置では、単に８点変換装置を垂直方向及び
水平方向の検出に一つずつ２台提供するのが最も便利で
ある。垂直方向及び水平方向の変換の間で６４点シフト
アレイを提供する必要があり、同様に変換部と符号化部
の間に緩衝装置を提供する必要がある。In the software method, the transform stages were merged and the Wu-Paolini enhancement was adopted. In the semiconductor device of the preferred embodiment, it is most convenient simply to provide two 8-point transform units, one for the vertical direction and one for the horizontal direction. A 64-point shift array must be provided between the vertical and horizontal transforms, and a buffer must likewise be provided between the transform section and the coding section.

【００９７】本発明は、白黒用装置、及び／又は、圧縮
と伸長のための別個の装置を含むが、好適実施例（図
７）は３原色データを操作するコンプレッサ（画像デー
タ圧縮装置…図１（ａ))とデコンプレッサ（画像データ
伸長装置…図１（ｂ))の両方を含んでいる。Although the invention encompasses monochrome devices and/or separate devices for compression and decompression, the preferred embodiment (FIG. 7) includes both a compressor (image data compression device, FIG. 1(a)) and a decompressor (image data decompression device, FIG. 1(b)), each operating on tri-color data.

【００９８】データは８画素のベクトルでコンプレッサ
へ収容され（図２（ａ）参照）、これがさらに辞書の順
序で６４画素のブロックに配置される。ブロックの処理
はパイプライン化されている（図２（ｂ))。コンプレッ
サへの画素入力は、“Ｒ”（赤）と“Ｇ”（緑）と
“Ｂ”（青）よりなる。これらはルミナンス・クロミナ
ンス空間にすぐに変換される（このような変換の理由は
周知である）。Data is admitted to the compressor in vectors of 8 pixels (see FIG. 2(a)), which are further arranged in lexicographic order into blocks of 64 pixels. Block processing is pipelined (FIG. 2(b)). The pixel input to the compressor consists of "R" (red), "G" (green), and "B" (blue) components. These are immediately transformed into a luminance-chrominance space (the reasons for such a transformation are well known).

【００９９】変換は、任意の固定又はプログラム可能な
係数（図３（ａ))を使用でき、又は、専用の用途で簡単
な値に「ハードワイヤ結線」しておくことも可能であ
る。変換空間は、ここではＸＹＺで表記しているが、３
原色入力のあらゆる線形フォームを使用してもよく、Ｃ
ＣＩＴＴ規格の（Ｙ，Ｒ−Ｙ，Ｂ−Ｙ）もあり得る。実
際に、Ｘ，Ｙ，Ｚの３つの値は、各々別個の白黒コンプ
レッサに供給される。デコンプレッサは図３と同一又は
同等の回路を使用するが、ＸＹＺベクトルが、ここでは
ＲＧＢベクトルに変換される点で異なっている。The transformation can use arbitrary fixed or programmable coefficients (FIG. 3(a)), or can be "hard-wired" to simple values in a dedicated application. The transform space is denoted XYZ here, but any linear form of the tri-color input may be used, including the CCITT standard (Y, R-Y, B-Y). In practice, each of the three values X, Y, and Z is fed to a separate monochrome compressor. The decompressor uses a circuit identical or equivalent to that of FIG. 3, except that an XYZ vector is now transformed into an RGB vector.
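The 3-point color transform described above can be sketched in Python as a 3×3 matrix applied to each pixel. This is only an illustration for the (Y, R-Y, B-Y) case: the luma weights 0.299, 0.587, and 0.114 are the commonly used Rec. 601 values and are an assumption here, since the text leaves the coefficients programmable, and all function names are hypothetical.

```python
# Assumed luma weights (Rec. 601); the text does not fix the coefficients.
KR, KG, KB = 0.299, 0.587, 0.114

FORWARD = [
    [KR, KG, KB],            # Y
    [1 - KR, -KG, -KB],      # R - Y
    [-KR, -KG, 1 - KB],      # B - Y
]

def transform3(matrix, vec):
    """Apply a 3x3 matrix to a 3-component pixel."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def rgb_to_ydiff(r, g, b):
    """Compressor side: RGB to (Y, R-Y, B-Y)."""
    return transform3(FORWARD, (r, g, b))

def ydiff_to_rgb(y, r_y, b_y):
    """Decompressor side: invert the transform above."""
    r = y + r_y
    b = y + b_y
    g = (y - KR * r - KB * b) / KG
    return [r, g, b]

print(rgb_to_ydiff(200, 100, 50))
```

A hard-wired variant would replace the floating-point weights with simple shift-and-add constants, at the cost of a slightly different color space.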

【０１００】値Ｙ，Ｘ，Ｚは３つのシフトレジスタへ入
力されて（図５参照）、第１の変換ユニットへの供給に
待機する。変換ユニットは、（２＋２／３）画素倍だけ
動作するので、データの幾らかは、図示したように遅延
されることになる。表示“ＸＹＺ”は不適切ビットであ
る。最適化した符号化方式はルミナンス（“Ｙ”）を第
１に処理する必要がある。The values Y, X, and Z are input to three shift registers (see FIG. 5), where they await delivery to the first transform unit. Because the transform unit operates in (2 + 2/3) pixel times, some of the data must be delayed as shown. The label "XYZ" is somewhat unfortunate: optimized coding schemes require that luminance ("Y") be processed first.

【０１０１】伸長処理中、ＸＹＺ歪曲（スキュー）の問
題は反転する。レジスタの５点が伸長中のＹ及びＺシフ
トレジスタの使用を反転することで、好適実施例におい
て節約されていることに注意されたい。During decompression, the XYZ skew problem is reversed. Note that five points of register storage are saved in the preferred embodiment by reversing the roles of the Y and Z shift registers during decompression.

【０１０２】図１（ａ）を参照すると、コンプレッサの
主要部分は入力をＸＹＺ空間に変換し、これを後続の変
換ユニット３への転送のために緩衝する入力部分１，２
を含む。各８画素の区間について変換１ユニットは３倍
のサイクルを行なう（Ｘ，Ｙ，Ｚのデータ各々について
１回ずつ）。変換１の出力はシフトアレイ４に配置さ
れ、ここで、８×８画素ブロックが完全に読取られるま
で保持される。変換２ユニット５，６は予め読取った画
素ブロックを操作し、各々の８画素ブロックの区間で３
倍のサイクルを行ない、データを符号化回路入力バッフ
ァ７，８へ提供する。符号化回路９，１０，１１は、ま
た、３原色座標の間で共有されているが、全ルミナンス
ブロックは割込みなしに符号化され、クロミナンスブロ
ックの各々が後続する。これら３ブロックの処理が６４
画素区間内に完了し得ない場合、タイミング兼制御論理
回路は外部入力回路に対し画素クロックを保持したまま
にする。記憶領域（入力シフトレジスタ２、シフトアレ
イ４、及び符号化回路入力バッファ７，８）は３原色の
ために３組作られる必要があるが、計算ユニット３，
５，６，９，１０，１１はＹ，Ｘ，Ｚのデータの間で共
有（時分割）される。Referring to FIG. 1(a), the main parts of the compressor include input sections 1 and 2, which transform the input into XYZ space and buffer it for transfer to the following transform unit 3. For each 8-pixel interval, the transform-1 unit cycles three times (once for each of the X, Y, and Z data). The output of transform 1 is placed in shift array 4, where it is held until the 8×8 pixel block has been completely read. The transform-2 units 5 and 6 operate on the previously read pixel block, again cycling three times in each 8-pixel interval, and deliver the data to the coder input buffers 7 and 8. The coder circuits 9, 10, and 11 are also shared among the three color coordinates, but an entire luminance block is coded without interruption, followed by each of the chrominance blocks. If the processing of these three blocks cannot be completed within a 64-pixel interval, the timing and control logic holds up the pixel clock to the external input circuitry. The storage areas (input shift register 2, shift array 4, and coder input buffers 7 and 8) must be triplicated for the three colors, whereas the computation units 3, 5, 6, 9, 10, and 11 are shared (time-multiplexed) among the Y, X, and Z data.

【０１０３】符号化回路９，１０，１１、符号化回路入
力バッファ７，８、符号プログラミング１２，１３，１
４及びタイミング兼制御論理回路（図示せず）は、従来
技術又は従来法を踏襲してもよい。同様に、３原色を単
一回路によって時分割するための方法も周知である。３
点変換部１（図３参照）及びシフトレジスタ２（図５参
照）もまた公知である。The coder circuits 9, 10, and 11, the coder input buffers 7 and 8, the code programming blocks 12, 13, and 14, and the timing and control logic (not shown) may follow existing art. Likewise, methods for time-multiplexing three colors through a single circuit are well known. The 3-point transform section 1 (see FIG. 3) and the shift register 2 (see FIG. 5) are also known.

【０１０４】スケーラ６（図１）は、以下に説明する本
発明の量子化乗算器を使用する。これは簡便な実現であ
る。一般化チェン変換の定義と適切なパラメータを与え
れば、８点変換回路（図８及び図９参照）もまた簡便で
ある。シフトアレイ（図６（ａ))は、特に議論に値す
る。現在の入力ブロックから垂直方向のベクトル（に変
換されたベクトル）は、直前の画素ブロックからの水平
方向のベクトルが水平方向変換回路へ供給される間に組
立てられる。特別な設計なしで１２８個のレジスタが必
要とされ（現在のブロックと直前のブロックに各々６４
個ずつ）るのは、点が受信した順序とは異なる順序で使
用されるためである。しかし、この必要性は偶数番号の
画素ブロックの間にデータを左から右へシフトし奇数番
号の画素ブロックの間に上から下へシフトすることによ
り排除される。解説したシフトアレイは双方向性であ
る。４方向性シフトアレイがある種の実施例では好適で
ある。Scaler 6 (FIG. 1) uses the quantization multiplier of the invention described below; its implementation is straightforward. Given the definition of the generalized Chen transform and suitable parameters, the 8-point transform circuit (see FIGS. 8 and 9) is also straightforward. The shift array (FIG. 6(a)) merits special discussion. Vertical (transformed) vectors from the current input block are assembled while horizontal vectors from the previous pixel block are delivered to the horizontal transform circuit. Without special design, 128 registers would be required (64 each for the current block and the previous block), because the points are used in a different order from that in which they are received. This requirement is eliminated, however, by shifting data left to right during even-numbered pixel blocks and top to bottom during odd-numbered pixel blocks. The shift array described is bidirectional; a four-directional shift array is preferred in some embodiments.

【０１０５】図６（ｂ）は、同図（ａ）のシフトアレイ
の態様をさらに詳細に図示している。同図（ｂ）におい
て、ベクトルは底部でシフトアレイから一つずつ除去さ
れ、同図（ａ）の８点ＤＣＴ５部分へ送出される。その
間に、他の８点ＤＣＴ部分からの垂直方向ベクトルが上
部でシフトアレイに入力されている。段階的に古いベク
トルがシフトアレイから除去され、シフトアレイは次の
画素ブロックからの垂直方向ベクトルで完全に埋められ
る。FIG. 6(b) shows the shift array of FIG. 6(a) in greater detail. In FIG. 6(b), vectors are removed one at a time from the bottom of the shift array and sent to the 8-point DCT section 5 shown in view (a). Meanwhile, vertical vectors from the other 8-point DCT section are input at the top of the shift array. The old vectors are removed from the shift array step by step until the array is completely filled with vertical vectors from the next pixel block.

【０１０６】次の画素ブロックで、データの流れる方向
は直前の画素ブロックのデータの流れの方向とは９０度
異なる。この方法で、水平方向ベクトルはシフトアレイ
の右から除去されて８点ＤＣＴへ送出され、新しく垂直
方向ベクトルが左から入ってくる。ブロック（Ｎ＋２）
まで進むと、別の９０度回転により元の形態に戻り、さ
らにこれが続く。In the next pixel block, the direction of data flow differs by 90 degrees from that of the preceding pixel block. In this way, horizontal vectors are removed from the right side of the shift array and sent to the 8-point DCT while new vertical vectors enter from the left. On proceeding to block (N+2), a further 90-degree rotation returns the array to its original configuration, and so on.
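The register saving described for the shift array can be illustrated with a small Python simulation. This is a sketch under assumptions: the class name `ShiftArray`, the choice of which direction comes first, and the `unrotate` helper are all hypothetical. The model keeps only 64 cells; each step removes one vector from the far edge and admits a new one at the near edge, and the shift direction alternates from block to block. Vectors of the previous block then emerge as its transpose, rotated by 180 degrees, which fixed wiring could undo.

```python
N = 8

class ShiftArray:
    """64-cell array that shifts top-to-bottom or left-to-right on alternate blocks."""

    def __init__(self, n=N):
        self.n = n
        self.cells = [[None] * n for _ in range(n)]
        self.vertical = True           # direction used for the current block

    def step(self, vec):
        """Shift once: return the vector leaving the array, admit vec."""
        n, c = self.n, self.cells
        if self.vertical:
            out = c[n - 1][:]                          # bottom row leaves
            for i in range(n - 1, 0, -1):
                c[i] = c[i - 1]
            c[0] = list(vec)                           # new vector enters at the top
        else:
            out = [c[i][n - 1] for i in range(n)]      # right column leaves
            for i in range(n):
                c[i] = [vec[i]] + c[i][:n - 1]         # new vector enters at the left
        return out

    def push_block(self, vectors):
        """Feed one block of n vectors; returns the n vectors of the previous block."""
        outs = [self.step(v) for v in vectors]
        self.vertical = not self.vertical              # turn 90 degrees for the next block
        return outs

def unrotate(outs):
    """Undo the 180-degree rotation so the result is a plain transpose."""
    return [row[::-1] for row in outs[::-1]]
```

Feeding block N+1 while reading block N back out is what lets a single 8×8 array stand in for the 128 registers a naive double buffer would need.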

【０１０７】デコンプレッサ（図１（ｂ）参照）は、同
図（ａ）に示すコンプレッサと極めて類似した構造を有
しているが、データの流れる方向が逆である点で異な
る。好適実施例では、単一の装置がコンプレッサ又はデ
コンプレッサの何れかの２つのモードで動作する。The decompressor (see FIG. 1(b)) has a structure very similar to that of the compressor shown in FIG. 1(a), except that the direction of data flow is reversed. In the preferred embodiment, a single device operates in either of two modes, as a compressor or as a decompressor.

【０１０８】可能なＶＬＳＩの配置は（図４参照）、圧
縮（図４（ｂ)(ｃ))と伸長（図４（ｅ)(ｆ))で異なるデ
ータの流れとなる。これ以外のデータの流れも、以下の
章で詳述するパイプライン化した実現方法などで可能で
ある。変換及びシフトアレイユニットの動作は、一方の
配置では圧縮と伸長の両方について同一の方向的意味を
有するが、他方ではそうではない（図４（ａ）参照）。
これは、統合されたコンプレッサ／デコンプレッサのデ
ータ流れ（図７）を考えた場合に、一層明確に分かる。
２つの変換ユニットがＲＧＢ及び圧縮データ各々に関与
している場合（図４（ａ))、４方向シフトアレイを使用
しない限り配置の困難は解決されない。従って、２つの
変換ユニットを各々シフトアレイの入力及び出力部分に
関連させている（図４（ｄ))。Possible VLSI layouts (see FIG. 4) lead to different data flows for compression (FIGS. 4(b) and 4(c)) and decompression (FIGS. 4(e) and 4(f)). Other data flows are possible, for example with the pipelined implementation detailed in a later section. The operation of the transform and shift-array units has the same directional sense for both compression and decompression in one layout, but not in the other (see FIG. 4(a)). This is seen more clearly when the data flow of the combined compressor/decompressor (FIG. 7) is considered. If the two transform units are associated with the RGB data and the compressed data respectively (FIG. 4(a)), the layout difficulty is not resolved unless a four-directional shift array is used. The two transform units are therefore associated with the input and output sections of the shift array, respectively (FIG. 4(d)).

【０１０９】一つの実施例において、コンプレッサ中の
変換ユニット（図８参照）は、３８個の加算器を用いて
いる。右に１つ（“Ｒ１”）、２つ（“Ｒ２”）、又は
４つ（“Ｒ４”）位置をシフトするか左に１つ（“Ｌ
１”）位置をシフトするのは簡単に行なえる。図示した
回路はパラメータ（a,b,c,r）＝（5,2.4,1.5,17/24）を
用いている。ｂ＝２．５とした実現方法では、もう一つ
の実施例において３６個の加算器しか必要としなかっ
た。In one embodiment, the transform unit in the compressor (see FIG. 8) uses 38 adders. Shifting right by one ("R1"), two ("R2"), or four ("R4") positions, or left by one ("L1") position, is easily done. The circuit shown uses the parameters (a, b, c, r) = (5, 2.4, 1.5, 17/24). An implementation with b = 2.5 required only 36 adders in another embodiment.

【０１１０】デコンプレッサの逆向き変換ユニットには
付随回路が必要である。「出力イネーブル」信号の注意
深い使用により、前向き変換回路中の大半の加算器は再
利用することが可能である。これの実現は当業者には容
易であろう。スケーラは、プログラムされたＲＡＭ又は
ＲＯＭ、及び無条件シフトとマルチプレクサと加算回路
のシステムを使用する。これは簡便な実現である。デス
ケーラは、各種の方法で実現可能だが、小さなハードワ
イヤ結線したＲＡＭ付き乗算器と、アキュムレータと、
タイミング兼制御論理回路及び小さなテンプレートカッ
トオフが望ましい。専用の低コスト用途において、デス
ケーラはデブラリング重みが広い範囲に渡ってほぼ最適
であることに注意して単純化することが可能である。従
って、単純なスケーリングをスケーラ内に使用すること
が可能である。デスケーラは、図１（ｂ）及び図７に図
示してあるように、符号化回路とその出力バッファの
間、又は出力バッファと変換回路の間の何れかに配置す
ることができる。符号化回路入力バッファは各種の方法
で実現可能で、シフトアレイと同様のサイクル共有レジ
スタ縮小構成を含む。より簡便な設計では、３８４×１
０ビットＲＡＭに６４×７ビットＲＯＭを使用してＲＡ
Ｍアドレスを提供している。A companion circuit is required for the inverse transform unit of the decompressor. With careful use of "output enable" signals, most of the adders in the forward transform circuit can be reused; doing so would be straightforward for one skilled in the art. The scaler uses a programmed RAM or ROM and a system of unconditional shifts, multiplexers, and adders; its implementation is straightforward. The descaler can be realized in various ways, but a small hard-wired multiplier with RAM, an accumulator, timing and control logic, and a small template cutoff is preferred. In dedicated low-cost applications, the descaler can be simplified by noting that the deblurring weights are nearly optimal over a wide range, so that simple scaling can be used as in the scaler. The descaler can be placed either between the coder and its output buffer or between the output buffer and the transform circuit, as shown in FIGS. 1(b) and 7. The coder input buffer can be realized in various ways, including a cycle-sharing register-reduction arrangement similar to the shift array. A simpler design uses a 384×10-bit RAM with a 64×7-bit ROM that provides the RAM addresses.

【０１１１】動作サイクルの例を図１（ａ）及び同図
（ｂ）との関連で解説する。同図（ａ）において、デー
タは３原色情報、赤、緑、青としてコンプレッサに入力
される。これは、すぐにＸＹＺと呼ばれる代替空間に変
換される。３つの要素Ｘ，Ｙ，Ｚは各々のシフトレジス
タへ入力される。シフトレジスタ（ステップ２）からこ
れらは８点ＤＣＴユニットへ進む。Ｘ，Ｙ，Ｚの３原色
の間で多重使用される８点ＤＣＴユニット１個か、又は
個々に独立した８点ＤＣＴユニットを各々が有するか、
の何れかが有り得る。情報は６４点シフトアレイ４へ入
力される。各色について個別のシフトアレイが存在す
る。情報はブロック４のシフトアレイから、ブロック３
と同様のブロック５の別のＤＣＴユニットへ進む。情報
はここで圧縮され、これが加算されたシフトのさらなる
層となる。情報は水平方向及び垂直方向の双方にだけ変
換される。シフトアレイはデータを９０度実際に概念的
に回転させ、これが他の方向に変換できるようになす。
データの圧縮後、データはブロック７，８で示される
（Ｚ１及びＺ２）別のバッファへ進み、最終的に符号化
されてチップから出力されるようにデータが保持される
（Ｚ１及びＺ２は等しくジグザグである）。An example operating cycle is described with reference to FIGS. 1(a) and 1(b). In FIG. 1(a), data enters the compressor as tri-color information: red, green, and blue. It is immediately transformed into an alternate space called XYZ. The three components X, Y, and Z are each input to their own shift register. From the shift registers (block 2) they pass to the 8-point DCT unit. There may be either a single 8-point DCT unit multiplexed among the three colors X, Y, and Z, or an independent 8-point DCT unit for each. The information then enters the 64-point shift array 4; there is a separate shift array for each color. From the shift array of block 4, the information passes to another DCT unit in block 5, which is similar to block 3. The information is compressed here, which adds a further layer of shifts and adds. The information is transformed in both the horizontal and the vertical direction: the shift array conceptually rotates the data by 90 degrees so that it can be transformed in the other direction. After compression, the data passes to another buffer, shown as blocks 7 and 8 (Z1 and Z2), where it is held until it can finally be coded and output from the chip (Z1 and Z2 both denote zigzag).

【０１１２】概念的には、これはブロック４のシフトア
レイと同様でデータが９０度回転されていない点で異な
っている。その代り、従来からこれらのことに用いられ
ておりＣＣＩＴＴ規格で使用されているジグザグの順序
に変更されている。情報はブロック９のラン及びテンプ
レート制御ユニットに渡され、ここで、０を検出して０
のランを生成し、非０を検出して値の対数値の推定値を
検出する。これは、テンプレートと呼ばれる。ランとテ
ンプレートの組合せは、ＲＴ符号と呼ばれて、ＲＡＭ又
はＲＯＭ内に参照され、これがチップから出力される。Conceptually, this buffer resembles the shift array of block 4, except that the data is not rotated by 90 degrees. Instead, it is reordered into the zigzag order conventionally used for this purpose and used in the CCITT standard. The information is then passed to the run and template control unit of block 9, which detects zeros and forms runs of zeros, and detects non-zero values and forms an estimate of the logarithm of each value, called the template. The combination of run and template, called the RT code, is looked up in a RAM or ROM, and the resulting code is output from the chip.
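The zigzag reordering and run/template formation described above can be sketched in Python as follows. The zigzag order is the usual 8×8 diagonal scan. The template is taken here to be the bit length of the coefficient magnitude, which is only one plausible "estimate of the logarithm"; that choice and both function names are assumptions for illustration.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals; odd diagonals run top-right to bottom-left,
    # even diagonals run bottom-left to top-right.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_template_pairs(coeffs):
    """Return ([(run, template, value), ...], trailing_zero_run) for a coefficient list."""
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1                                   # extend the current run of zeros
        else:
            pairs.append((run, abs(v).bit_length(), v))  # template ~ log2 of magnitude
            run = 0
    return pairs, run

print([r * 8 + c for r, c in zigzag_order()][:10])
print(run_template_pairs([5, 0, 0, -3, 1, 0, 0, 0]))
```

In a coder of this kind, each (run, template) pair would index a code table, and the coefficient bits not implied by the template would be sent alongside the code.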

【０１１３】変換係数の上位ビットである仮数部もチッ
プから出力される。仮数部及びランとテンプレート符号
は任意の長さ、１ビット、２ビットなどで良く、チップ
からの出力は必ず１６ビット又は８ビット、３２ビット
などとなるため、ブロック１１（整列）がこれを容易に
する。The mantissa, which consists of the most significant bits of the transform coefficient, is also output from the chip. Because the mantissa and the run-and-template code can be of arbitrary length (1 bit, 2 bits, and so on) whereas the output from the chip always has a fixed width (16 bits, or 8 bits, 32 bits, etc.), block 11 (alignment) handles the packing.
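The job of the alignment block can be illustrated with a small bit packer in Python: variable-length fields are concatenated and emitted as fixed-width words. This is a hedged sketch; the class name `WordPacker`, the most-significant-bit-first ordering, and zero padding on flush are assumptions, not details given in the text.

```python
class WordPacker:
    """Pack variable-length bit fields into fixed-width output words (MSB first)."""

    def __init__(self, word_bits=16):
        self.word_bits = word_bits
        self.acc = 0          # pending bits not yet emitted
        self.nbits = 0        # number of pending bits
        self.words = []

    def put(self, value, length):
        """Append the low `length` bits of value to the stream."""
        self.acc = (self.acc << length) | (value & ((1 << length) - 1))
        self.nbits += length
        mask = (1 << self.word_bits) - 1
        while self.nbits >= self.word_bits:
            self.nbits -= self.word_bits
            self.words.append((self.acc >> self.nbits) & mask)
        self.acc &= (1 << self.nbits) - 1              # keep only the unsent bits

    def flush(self):
        """Emit any leftover bits, padded with zeros on the right."""
        if self.nbits:
            mask = (1 << self.word_bits) - 1
            self.words.append((self.acc << (self.word_bits - self.nbits)) & mask)
            self.acc = self.nbits = 0
        return self.words

p = WordPacker()
p.put(0b101, 3)
p.put(0b1, 1)
p.put(0xABC, 12)
print([hex(w) for w in p.flush()])
```

Changing `word_bits` to 8 or 32 models the other output widths mentioned in the text.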

【０１１４】図１（ａ）に図示したその他のブロック
（任意）のプログラミングブロック１２，１３は、各々
任意のＲＧＢをＸＹＺ変換、任意のレート・スケーラ及
び心理要因の重み、及びランとテンプレート用の任意の
修正ホフマン符号に設定できる。The other (optional) blocks shown in FIG. 1(a), programming blocks 12 and 13, allow setting, respectively, an arbitrary RGB-to-XYZ transform, arbitrary rate scalers and psychovisual weights, and an arbitrary modified Huffman code for the runs and templates.

【０１１５】図１（ｂ）は同図（ａ）と極めて類似して
いる。ランとテンプレート符号はここではランとテンプ
レートの組合せに復号される必要があり、必要な数の０
が無視されねばならない。FIG. 1(b) is very similar to FIG. 1(a). Here the run-and-template codes must be decoded back into run and template combinations, and the required number of zeros must be skipped.

【０１１６】図１（ａ）において、スケーラ７は加算回
路とシフト回路の単純アレイである。同図（ｂ）におい
て、デスケーラ１５は極めて小さいハードウェアの乗算
器として実現されている。In FIG. 1(a), scaler 7 is a simple array of adders and shifters. In FIG. 1(b), descaler 15 is implemented as a very small hardware multiplier.

【０１１７】図１０は２次元一般化チェン変換の非パイ
プライン化実装の略図を示す。パイプライン化実装は後
章で解説する。画素は上部から入り、通常８ビット幅で
ある。画素は標準１２８ビットのデータ幅で水平方向の
変換回路１０内の広い加算回路のアレイを通過する。水
平方向の変換回路からの出力は移項用ＲＡＭ１２を通過
して水平方向から垂直方向へ情報を回転する。データは
次にこれも加算回路だけからなる（通常１２８ビット
幅）垂直方向の変換回路１６を通過する。出力係数は最
終的におよそ１６ビットの幅に縮小され、本発明におい
てＪＰＥＧ互換となしている単一の乗算器２０を通過す
る。FIG. 10 shows a schematic of a non-pipelined implementation of the two-dimensional generalized Chen transform; a pipelined implementation is described in a later section. Pixels enter at the top and are typically 8 bits wide. They pass through a wide array of adders in the horizontal transform circuit 10, with a typical data width of 128 bits. The output of the horizontal transform circuit passes through the transposition RAM 12, which rotates the information from horizontal to vertical. The data then passes through the vertical transform circuit 16, which also consists only of adders (typically 128 bits wide). The output coefficients are finally reduced to a width of approximately 16 bits and pass through a single multiplier 20, which in the present invention provides JPEG compatibility.
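The data flow of FIG. 10 (horizontal pass, transposition, vertical pass) can be sketched in Python as a separable 2-D transform. Because the generalized Chen transform itself is defined elsewhere in the document, this sketch substitutes the orthonormal 8-point DCT as the 1-D stage; that substitution and all function names are assumptions for illustration only.

```python
import math

def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix (stand-in for the 1-D transform stage)."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n))
             for j in range(n)] for k in range(n)]

def transform_rows(t, block):
    """Apply the 1-D transform t to every row of block."""
    return [[sum(t[k][j] * row[j] for j in range(len(row))) for k in range(len(t))]
            for row in block]

def transpose(block):
    """Model of the transposition RAM: swap rows and columns."""
    return [list(col) for col in zip(*block)]

def transform_2d(block, t=None):
    """Horizontal pass, transposition, vertical pass, then restore orientation."""
    t = t or dct_matrix(len(block))
    horizontal = transform_rows(t, block)
    turned = transpose(horizontal)
    vertical = transform_rows(t, turned)
    return transpose(vertical)

flat = [[16] * 8 for _ in range(8)]
print(round(transform_2d(flat)[0][0], 6))
```

The same 1-D routine serves both passes, which mirrors how a single transform unit can be time-shared between the horizontal and vertical directions.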

【０１１８】図１１は本発明によるＶＬＳＩ実装のブロ
ック図である。図１１において、データはブロック４０
で入力され入力ラッチ４２内にラッチされ、マルチプレ
クサ４４を通過してＧＣＴ変換回路５０の前半へ進む
（これは、図８に示したように加算器ネットワークより
なる）。ＧＣＴ変換回路５０の後半は中断ラッチ５４の
右側へ接続される。出力はマルチプレクサ６２を通って
水平方向から垂直方向の変換が行なわれる移項用ＲＡＭ
６６へ進む。移項用ＲＡＭ６６の出力は、タイムシェア
リング又はタイムスライシング構成における垂直方向の
変換の前半を形成する目的で、ＧＣＴ変換回路５０の第
１段への背景に供給される。ＧＣＴ変換回路５０の出力
は垂直方向のＧＣＴ変換回路６０の第２段の入力へ供給
される。最後のＧＣＴ変換回路６０の出力が出力ラッチ
・マルチプレクサ７０から取出され、乗算器７４とまる
め回路７６を経由してジグザグ順序配置回路８０へ進
み、これの出力が１２ビット係数としてブロック８４か
ら出力される。FIG. 11 is a block diagram of a VLSI implementation according to the invention. In FIG. 11, data is input at block 40, latched in input latch 42, and passed through multiplexer 44 to the first half of the GCT transform circuit 50 (an adder network as shown in FIG. 8). The second half of the GCT transform circuit 50 is connected to the right of the intermediate latch 54. The output passes through multiplexer 62 to the transposition RAM 66, where the horizontal-to-vertical conversion takes place. The output of the transposition RAM 66 is fed back to the first stage of the GCT transform circuit 50 in order to perform the first half of the vertical transform in a time-sharing (time-slicing) arrangement. The output of the GCT transform circuit 50 is supplied to the input of the second stage 60 of the vertical GCT transform. Finally, the output of GCT transform circuit 60 is taken from the output latch/multiplexer 70 and passed through multiplier 74 and rounding circuit 76 to the zigzag ordering circuit 80, whose output is delivered from block 84 as 12-bit coefficients.

【０１１９】さらに、図１１を参照し、本発明の逆向き
変換の過程を簡潔に解説する。図１１において、１２ビ
ット係数はブロック８４を通ってジグザグ順序回路８０
のＹ入力へ供給される。ジグザグ順序回路８０の出力
は、前向き処理において実行されたのと類似の逆向きの
量子化処理を実行する乗算器７４とまるめ回路７６を経
由する。乗算器７４の出力は逆向き変換処理の第１段で
あるラッチ４２へ入力される。ラッチ４２から、逆向き
変換処理は前向き処理が辿ったのと同じ２段階の時間多
重経路を辿る。出力は出力ラッチ７０に出現し、これの
出力はまるめ回路７６によりまるめられた画素で、まる
め回路７６の出力は出力用２のブロック４０へ供給され
る。The inverse transform process of the invention is now described briefly, again with reference to FIG. 11. In FIG. 11, 12-bit coefficients are supplied through block 84 to the Y input of the zigzag ordering circuit 80. The output of the zigzag ordering circuit 80 passes through multiplier 74 and rounding circuit 76, which perform an inverse quantization analogous to the quantization performed in the forward process. The output of multiplier 74 is input to latch 42, the first stage of the inverse transform process. From latch 42, the inverse transform follows the same two-stage, time-multiplexed path as the forward process. The result appears at output latch 70; it is rounded to pixels by rounding circuit 76, and the output of rounding circuit 76 is supplied to block 40 for output.

【０１２０】[0120]

【本発明の量子化乗算器】符号化すべき大量のデータを
圧縮するには、頻度領域係数Ｆ(i,j) が正の整数の量子
値Ｑ(i,j) で除され、さらに、最も近い整数にまるめら
れる（Ｑ(i,j) は、この章で量子行列式を表わすために
使用しており、直前の章とは対照的であることに注意さ
れたい）。逆に、逆向きの動作には、Ｑ(i,j) による乗
算が要求されることになる。大きな量子値は大幅な圧縮
を提供するが、画像の品位の大幅な劣化を招来する（自
乗平均誤差（ＭＳＥ）による測定で）。小さな量子値は
大幅な圧縮を提供しないが、もっと小さなＭＳＥを生成
する。[Quantization Multiplier of the Invention] To compress the large amount of data to be coded, each frequency-domain coefficient F(i,j) is divided by a positive integer quantization value Q(i,j) and rounded to the nearest integer (note that Q(i,j) is used in this section to denote the quantization matrix, in contrast to the preceding section). Conversely, the inverse operation requires multiplication by Q(i,j). Large quantization values provide greater compression but cause greater degradation of image quality (as measured by mean square error, MSE). Small quantization values provide less compression but yield a smaller MSE.

【０１２１】量子化係数Ｑ(i,j) は、ここで前向きスケ
ーリング行列Ｓf(i,j)と称する段階Ｃ、Ｄ及びＥの行列
と組合せることができる。同様に、量子化係数の反転
も、ここで反転スケーリング行列Ｓi(i,j)と称する段階
Ｈ及びＩの行列式と組合せることができる。従って、前
向き変換はＳf ／Ｑ（指数部は便利のために削除した）
の応用に関連し、逆向き変換はＳi ＊Ｑの応用に関連す
る。前向き操作は除算であるため、逆の相関がＱの大き
さとＳの数学的解像度の間に存在する。計算効率につい
てみると、整数の除算は一般に乗算とシフトによって実
行される。例えば、１６ビットの計算において、整数ｋ
による除算は２¹⁶／ｋ＝６５５３６／ｋの乗算と、それ
に続く１６ビットの右シフトにより、さらに便宜的に実
行し得るものである。The quantization coefficients Q(i,j) can be combined with the matrices of stages C, D, and E, referred to here as the forward scaling matrix Sf(i,j). Likewise, the inverse quantization can be combined with the matrices of stages H and I, referred to here as the inverse scaling matrix Si(i,j). The forward transform therefore involves applying Sf/Q (indices omitted for convenience), and the inverse transform involves applying Si*Q. Because the forward operation is a division, there is an inverse relationship between the magnitude of Q and the numerical resolution of S. For computational efficiency, integer division is generally carried out as a multiplication followed by a shift. For example, in 16-bit arithmetic, division by an integer k can be performed more conveniently as multiplication by 2¹⁶/k = 65536/k followed by a 16-bit right shift.
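The multiply-and-shift division mentioned above can be tried out in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and rounding the reciprocal 65536/k upward is a choice made here (not stated in the text) so that the result matches true integer division for small operands.

```python
def reciprocal16(k):
    """Smallest integer not below 2**16 / k (ceiling of the scaled reciprocal)."""
    return -(-(1 << 16) // k)

def divide_by(x, k):
    """Approximate x // k as a multiplication followed by a 16-bit right shift."""
    return (x * reciprocal16(k)) >> 16

print(divide_by(1000, 16), 1000 // 16)
print(divide_by(9, 3), 9 // 3)
```

With a fixed-width reciprocal the quotient is exact only over a limited input range; for 12-bit non-negative inputs and divisors up to 16 it agrees with `x // k`, and larger operands call for a wider reciprocal or a correction step.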

【０１２２】逆向き変換において、ＱとＳi の乗算のた
め、ＱとＳi の範囲の間に逆の相関が存在し、それによ
って、逆相間はＱの範囲と積の解像度の間に存在するこ
とになる。ＪＰＥＧの基準システムにおいて、量子化値
は符号なし１１ビットである。よって、可能な最大の量
子化係数は１０２３又は２¹⁰である。乗算が１６ビット
計算で実行された場合、Ｓi は２⁶ の範囲を有する。Ｑ
の値が小さいとＳi の解像度はＭＳＥより関与が大き
い。In the inverse transform, because Q and Si are multiplied, there is a trade-off between the ranges of Q and Si, and hence between the range of Q and the resolution of the product. In the JPEG baseline system, quantization values are unsigned 11-bit numbers, so the largest possible quantization coefficient is 1023, or about 2¹⁰. If the multiplication is performed in 16-bit arithmetic, Si has a range of 2⁶. When Q is small, the resolution of Si has a greater influence on the MSE.

【０１２３】…従来の方法… 最も近代的なコンピュータ、マイクロプロセッサ及び専
用のデジタル信号処理チップは、３２ビット（３２ｂ）
乗算を有し、正しく使用した場合にこの問題を解決する
には十分以上である。... Conventional Methods ... Most modern computers, microprocessors, and dedicated digital signal processing chips have 32-bit (32b) multiplication, which is more than sufficient to solve this problem when used correctly.

【０１２４】高速の専用ハードウェアにおいて、前向き
と逆向き変換両方について、同一の乗算器を使用するこ
とが望ましい。「リアルタイム」の速度（ビデオ画像に
ついて、およそ３０メガサイクル又はそれ以上）では、
１６ｂ乗算回路は最も実現しやすい解像度に近い。さら
に、大掛りな乗算器はさらにシリコンを必要とし、実行
速度が遅くなる。幾つかのＪＰＥＧ変換チップでは、一
般化チェン変換の代りにディスクリート・コサイン変換
ＤＣＴを使用しており、スケーリング及び予備スケーリ
ング、即ち、Ｓf とＳi の必要を有していない。他方
で、多くのＤＣＴ実装はＧＣＴが呼出すスケーリングの
形式を必要としている。In high-speed dedicated hardware, it is desirable to use the same multiplier for both the forward and inverse transforms. At "real-time" speeds (roughly 30 megacycles or more for video images), a 16b multiplier is close to the highest resolution that is readily achievable. Moreover, larger multipliers require more silicon and run more slowly. Some JPEG transform chips use the discrete cosine transform (DCT) rather than the generalized Chen transform and have no need for scaling and prescaling, that is, for Sf and Si. On the other hand, many DCT implementations do require the kind of scaling that the GCT calls for.

【０１２５】しかし、妥当なＭＳＥのためには、３２ビ
ット出力の殆どを自由に使用できる必要があることには
注意されたい。前向きモードでは、除算は大きな標準化
数から数値を縮小することにより達成している。出力の
順位の高いビットから結果を取出す必要がある。逆向き
モードでは、数値は乗算され、よって、小さい標準化数
が望ましい。出力の低い順位のビットから結果を取出す
必要がある。組合せることによる乗算器ハードウェアに
おいて、不必要なビットの切捨てなど、殆ど又は全く縮
小が行なわれない。Note, however, that for a reasonable MSE, almost all of the 32-bit output must be available. In forward mode, the division is accomplished by scaling the number down from a large normalizing value, so the result must be taken from the high-order bits of the output. In inverse mode, the number is multiplied up, so a small normalizing value is desirable and the result must be taken from the low-order bits of the output. In a combined multiplier, there is therefore little or no hardware reduction to be had from, for example, truncating unneeded bits.

【０１２６】乗算が１６ビットに制限されている場合、
性能は大幅に低下し、これは米国特許出願番号０７／５
１１，２４５号の一般化チェン変換について相互参照し
ている例に相当する（以下の性能の議論の章参照）。特
定すれば、逆向き変換において、Ｑの範囲はＳi の範囲
と競合する。Ｓi の解像度は量子化値が低い場合、最も
重要だが、これは、大きい量子化数は乗算の解像度が意
味をなさないほど大きな歪曲を付加するためである。If the multiplication is restricted to 16 bits, performance degrades markedly, as in the example given for the generalized Chen transform of the cross-referenced U.S. patent application Ser. No. 07/511,245 (see the performance discussion below). Specifically, in the inverse transform the range of Q competes with the range of Si. The resolution of Si matters most when the quantization values are low, because large quantization values add so much distortion that the resolution of the multiplication becomes insignificant.

【０１２７】…本発明の説明… 本発明の目的の一つは、前向きと逆向き両方の量子化で
１６ビット・ハードウェア乗算器、即ち、１６ビット計
算を用いて最大限の性能を提供することである。これに
は、範囲と解像度の間の平衡を必要とする。... Description of the Invention ... One object of the invention is to provide maximum performance for both forward and inverse quantization using a 16-bit hardware multiplier, that is, 16-bit arithmetic. This requires a balance between range and resolution.

【０１２８】…前向きスケーリングと量子化… 前向きモードにおいて、経験的な結果から、１６ビット
・ハードウェア乗算器は、十分な解像度を提供し得ると
示されている。最も大きい値（Ｓf ×２¹⁶）を（２¹⁶−
１）となるように選択することが可能である。大きなＱ
は、（Ｓf ／Ｑ×２¹⁶）の値の範囲を減少させるが、こ
の数値の解像度の欠如に起因するエラーは量子化により
もたらされるエラーと比較すれば小さい。... Forward Scaling and Quantization ... In forward mode, empirical results show that a 16-bit hardware multiplier can provide sufficient resolution. The largest value of (Sf × 2¹⁶) can be chosen to be (2¹⁶ − 1). A large Q reduces the range of values of (Sf/Q × 2¹⁶), but the error due to the loss of resolution in this value is small compared with the error introduced by the quantization itself.

【０１２９】入力並びにＳf ／Ｑを正しくスケーリング
することにより、出力は乗算器出力の上位Ｎビットに出
現する。即ち、By properly scaling the input as well as Sf / Q, the output appears in the upper N bits of the multiplier output. That is,

【数６４】結果＝（入力＊Ｑ係数）≫Ｎここで、“≫”は右へのシフト操作を表わす。また、[Equation 64] Result = (input * Q coefficient) >> N where ">>" represents a shift operation to the right. Also,

【数６５】Ｑ係数＝Ｓf／Ｑ＊２¹⁶ である。２つのＮビット係数の乗算は、一般に、２Ｎビ
ットの積となる。結果がハードウェア乗算器の上位側１
６ビットから取出されるので、下位側１６ビットを供給
するゲートを切り詰めることが可能である。下位側Ｎビ
ットから必要とされることの全ては関連性を担う項だけ
である。[Equation 65] Q coefficient = (Sf / Q) × 2¹⁶. Multiplying two N-bit factors generally yields a 2N-bit product. Because the result is taken from the upper 16 bits of the hardware multiplier, the gates that produce the lower 16 bits can be trimmed; all that is needed from the lower N bits is the carry term.

【０１３０】これは、図１２に示された２モード・ハー
ドウェアによる実現方法に図示されている。前向き変換
を実行する場合、前向き入力（／Ｆorward）はＬレベル
（即ち、０）である。従って、制御マルチプレクサ（Ｍ
ＵＸ）１００は０のＧＮＤ信号を１６入力１出力のＭＵ
Ｘ１０４へ送出する。前向き入力／Ｆorward上の０信号
はＭＵＸ１０８へ向かい、乗算器１０６で符号を付けら
れた１６ビット×１６ビットの入力Ａ０〜Ａ３へＱ指数
部の４ビットを送信する。この例ではThis is illustrated in the dual-mode hardware implementation shown in FIG. 12. When the forward transform is performed, the forward input (/Forward) is low (that is, 0). The control multiplexer (MUX) 100 therefore passes the GND signal, 0, to the 16-input, 1-output MUX 104. The 0 signal on /Forward also goes to MUX 108, which passes the four bits of the Q exponent to inputs A0 to A3 of the signed 16-bit × 16-bit multiplier 106. In this example,

【数６６】Ｑ係数＝（Ｑ仮数部≪４）＋Ｑ指数部であり、乗算器１０６は積“Ｑ係数＊入力”を生成す
る。出力Ｒesult は下位１６桁が使用しないワードとし
て破棄されることから、（Ｑ係数＊入力≫１６）に等し
いことになる。[Equation 66] Q coefficient = (Q mantissa << 4) + Q exponent, and multiplier 106 forms the product "Q coefficient * input". Because the low-order 16 bits are discarded as an unused word, the output Result equals ((Q coefficient * input) >> 16).
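Forward-mode quantization as described by Equations 64 to 66 can be sketched in Python. The function names are hypothetical, the saturation at 2¹⁶ − 1 follows the largest value mentioned in the text, and the rounding circuit is left out, so results are truncated rather than rounded.

```python
def forward_q_coefficient(sf, q):
    """16-bit multiplier word for Sf / Q, saturated at 2**16 - 1."""
    return min(round(sf / q * (1 << 16)), (1 << 16) - 1)

def split_fields(qcoef):
    """View the 16-bit word as (Q mantissa, Q exponent) = (upper 12 bits, lower 4 bits)."""
    return qcoef >> 4, qcoef & 0xF

def forward_quantize(x, qcoef):
    """Multiply and keep the upper 16 bits of the 32-bit product."""
    return (x * qcoef) >> 16

qc = forward_q_coefficient(1.0, 16)
print(qc, split_fields(qc), forward_quantize(1000, qc))
```

In forward mode the mantissa and exponent fields are simply concatenated back into one 16-bit multiplier operand, so all 16 bits contribute resolution.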

【０１３１】…逆向き予備スケーリングと脱量子化… 本発明は、１６ビット演算における最高の正確度を可能
にするような範囲と解像力の間の妥協を配分することに
より、逆向き脱量子化を補助する過程に関連をなすもの
である。経験的に、解像度約１２ビットが所望するＭＳ
Ｅに必要であると決定された。ＪＰＥＧ規格仕様におけ
る量子化には１０ビットが必要とされているので、範囲
としては、２４ビットが必要である。これは、１６ビッ
ト係数のうち上位１２ビットを仮数部として用い、下位
４ビットを２を底とする指数項として用いることで達成
している。２⁴ の可能なシフト値と（１６−４）ビット
の仮数部の組合せにより、有効範囲は、... Inverse Prescaling and Dequantization ... The invention aids inverse dequantization by apportioning the trade-off between range and resolution so as to obtain the best accuracy available in 16-bit arithmetic. Empirically, about 12 bits of resolution were found necessary for the desired MSE. Since 10 bits are needed for quantization under the JPEG specification, 24 bits of range are required. This is achieved by using the upper 12 bits of the 16-bit coefficient as a mantissa and the lower 4 bits as a base-2 exponent. With 2⁴ possible shift values and a (16−4)-bit mantissa, the effective range is

【数６７】有効範囲＝［（１６−４）＋２⁴ ］ビット＝２８ビットである。図１２の２モード・ハードウェアによる実装で
図示してあるように、逆向きモードにおいて、前向き入
力／ＦorwardがＨレベルの場合、１６入力１出力のＭＵ
Ｘ１０４への制御入力は、入力値にｉ桁の左シフトを生
成する。Ｑ仮数部の１２ビットは、乗算器１０６の入力
Ａ４からＡ１５に入力される。前向き入力／Ｆorwardか
らＭＵＸ１０８へＨレベル側にある制御値は、ＧＮＤ信
号から乗算器１０６のビットＡ０〜Ａ３へ０を送出す
る。ここでも、出力結果は、乗算既出力の上位側１６ビ
ットに存在しており、Ｌレベル側１６桁が未使用ワード
として破棄されることに注意されたい。結果は、従っ
て、次式のように決定されることになる[Equation 67] Effective range = [(16 − 4) + 2⁴] bits = 28 bits. As shown in the dual-mode hardware implementation of FIG. 12, in inverse mode the /Forward input is high, and the control input to the 16-input, 1-output MUX 104 produces a left shift of the input value by i places. The 12 bits of the Q mantissa are applied to inputs A4 to A15 of multiplier 106. The high level on /Forward causes MUX 108 to pass 0, from the GND signal, to bits A0 to A3 of multiplier 106. Note again that the result is taken from the upper 16 bits of the multiplier output, the low-order 16 bits being discarded as an unused word. The result is therefore given by

【数６８】結果＝（（入力≪Ｑ指数部）＊Ｑスケーラ）≫１６ここで、[Equation 68] Result = ((input << Q exponent part) * Q scaler) >> 16 where:

【数６９】Ｑスケーラ≪Ｑ指数部＝Ｓi ×Ｑ×２¹⁶ Ｑスケーラ＜２¹² また、[Equation 69] Q scaler << Q exponent = Si × Q × 2¹⁶, with Q scaler < 2¹²; and

【数７０】０＜Ｑ指数部＜（２⁴ −１）である。[Equation 70] 0 < Q exponent < (2⁴ − 1).

【０１３２】入力値が左にシフトされることから、入力
が制限される必要がある。さもなくば、値がオーバーフ
ローしてしまい、偽の結果が生成される。しかし、これ
らの数が、ここで乗算に用いられている係数によって量
子化されているという事実から、このことは無条件で行
なわれている。本発明があらゆる乗算に一般化し得ない
理由はこれである。Because the input value is shifted left, the input must be limited in range; otherwise the value would overflow and produce a spurious result. This limit is satisfied implicitly, however, because these numbers were quantized by the very coefficients now being used in the multiplication. This is why the invention cannot be generalized to arbitrary multiplication.
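The inverse-mode arithmetic of Equations 68 to 70 can be sketched in Python as a small floating-point-style coefficient: a 12-bit Q scaler in the upper bits and a 4-bit Q exponent in the lower bits. The function names, the way the exponent is chosen, and the signed 16-bit overflow check are assumptions made for illustration; rounding is again omitted.

```python
MANT_BITS, EXP_BITS = 12, 4

def pack_inverse(si, q):
    """Encode Si * Q * 2**16 as a 12-bit Q scaler and a 4-bit Q exponent."""
    target = round(si * q * (1 << 16))
    exp = max(0, target.bit_length() - MANT_BITS)   # shift needed to fit 12 bits
    if exp >= (1 << EXP_BITS):
        raise ValueError("Si * Q is too large for a 4-bit exponent")
    scaler = target >> exp                          # low bits beyond 12-bit resolution are dropped
    return (scaler << EXP_BITS) | exp               # scaler in the upper 12 bits

def inverse_dequantize(x, word):
    """Result = ((input << Q exponent) * Q scaler) >> 16."""
    scaler, exp = word >> EXP_BITS, word & ((1 << EXP_BITS) - 1)
    shifted = x << exp
    if not -(1 << 15) <= shifted < (1 << 15):       # assumed signed 16-bit multiplier input
        raise OverflowError("shifted input no longer fits in 16 bits")
    return (shifted * scaler) >> 16

word = pack_inverse(1.0, 16)
print(word >> 4, word & 0xF, inverse_dequantize(62, word))
```

The overflow check makes the range limit visible: an input of 62 (a value that quantization by 16 could have produced from a 10-bit sample) passes, while slightly larger inputs would not fit after the shift.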

【０１３３】本発明の拡張は、図１３に図示してあり、
ここでは、ＭＵＸ１１０による左シフトが１６ビット×
１６ビットのＭＵＸ１１２による乗算段階の後で発生し
ている。前向き変換において、前向き入力／Ｆorwardは
Ｌレベルである。ＭＵＸ１１４の制御入力への０信号
は、乗算器１１２の入力Ａ０〜Ａ３へＱ指数部の４ビッ
トを送信する。乗算器１１２の３２ビットの積の最上位
１６桁（Ｑ３１〜Ｑ１６）は１６入力１出力のマルチプ
レクサによりＭＵＸ１１６からのＧＮＤ信号に従って選
択される。An extension of the invention is illustrated in FIG. 13, in which the left shift, performed by MUX 110, takes place after the multiplication stage performed by the 16-bit × 16-bit multiplier 112. In the forward transform, the /Forward input is low. The 0 signal at the control input of MUX 114 passes the four bits of the Q exponent to inputs A0 to A3 of multiplier 112. The most significant 16 bits (Q31 to Q16) of the 32-bit product of multiplier 112 are selected by the 16-input, 1-output multiplexer, under control of the GND signal from MUX 116.

[0134] In the inverse transform, the /Forward signal is at the high level. The GND signal is therefore routed by MUX 114 to inputs A0 to A3 of the 16-bit × 16-bit signed multiplier 112. Bits Qi to Qj of the 32-bit output of multiplier 112, where i = 32 − Q exponent and j = i − 15, are selected as the result according to the value of the Q exponent input supplied via MUX 116 to the 16-input, 1-output multiplexer 110. Because no left shift of the input value is performed, the input value is not limited in range (that is, it is not preformatted). In this case the operation is expressed mathematically as follows.

[Equation 71] $\text{Result} = (\text{input} \times \text{Q scaler}) \gg \text{Q exponent} \gg 16$

[0135] ... Performance discussion ... Table 1 shows experimentally obtained mean squared error (MSE) results for the Barbara image, a CCITT 704 × 576 × 8-bit gray-level test image. The quantization values are all 1 in the first example and, in the second example, are taken from the suggested luminance quantization table of the JPEG standard. The image was processed by software simulation of the chip using a 32-bit multiplier, a 16-bit multiplier, and the present invention implemented as a 16-bit multiplier with a 12-bit mantissa and a 4-bit exponent as described in the preceding sections. The results are shown in Table 1 below.

[0136]

[Table 1]

[0137] As might be expected, the main differences between the multiplication schemes occur when Q is small, that is, when high-quality reproduction is desired. Even with less hardware, the invention provides accuracy close to that of 32 bits. The difference in MSE is not visually significant. However, to pass the CCITT Recommendation H.261 transform mismatch compliance test, the invention needs to use parameter values that approximate the discrete cosine transform more closely.
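
For reference, the error measure used in this comparison can be computed as in the sketch below. It uses the standard definition of mean squared error over two equal-length pixel sequences; the function name and the flat-list representation of the images are assumptions for illustration, and no values from Table 1 are reproduced.

```python
def mean_squared_error(original, reconstructed):
    """Average of squared per-pixel differences between two images.

    Both arguments are flat sequences of pixel values of equal length.
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("images must be non-empty and the same size")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)
```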

[0138] Implementing a 32-bit multiplier would use roughly 85% more silicon area than a 16-bit multiplier (an estimate based on 1.0 μm CMOS standard-cell technology, the technology in which the HGCT is implemented), which is a significant problem in integrated circuit technology. The invention adds only 30% to this area. It is worth noting that a single multiplier uses roughly 10% of the silicon.

[0139]

[Pipelined Implementation of the GCT] ... Background ... It is desirable to manufacture a VLSI chip that implements the international standard image compression system proposed by the JPEG committee of the CCITT. Many applications require the VLSI chip to operate at video rates, which means (depending on resolution) about 8 to 10 million pixels per second. Each pixel normally consists of three color components, such as red, green, and blue. Most VLSI implementations operate on one component at a time, so the required clock frequency is three times the pixel rate. This pushes the chip clock frequency to roughly 25 to 30 MHz, a high clock speed even by 1991 standards.
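
The clock estimate follows directly from the stated pixel rates. As a worked check, assuming one clock cycle per color component and the symbols $f_{\text{clock}}$ and $f_{\text{pixel}}$ introduced here for illustration:

```latex
f_{\text{clock}} = 3 \times f_{\text{pixel}}, \qquad
3 \times 8\ \text{MHz} = 24\ \text{MHz}, \qquad
3 \times 10\ \text{MHz} = 30\ \text{MHz}
```

This agrees with the approximate range of 25 to 30 MHz given in the text.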

[0140] The most conventional implementations of the DCT perform the transform with a combination of multipliers and adders. The multipliers are usually the bottleneck in such implementations. Other functions, such as RAM and ROM, are secondary bottlenecks. Long pipeline structures are used to overcome these bottlenecks. The pipeline in a typical DCT chip can be as long as 200 clock cycles, meaning that 200 operations are in progress in parallel inside the chip.

[0141] FIG. 15 shows a conventional pipeline structure for the discrete cosine transform. Pixel components arrive at the left of the figure and are latched in latch unit 120 as parallel 1 × 8 vectors. These 1 × 8 vectors are passed to the one-dimensional transform circuit 122, which performs the DCT. The 1 × 8 row vectors are then transposed by transposition unit 124 into 8 × 1 column vectors. After transposition, the conventional system feeds the transposed vectors to a second DCT unit 126 for transformation. While this second transform is in progress, the first transform unit 122 is occupied with the next 1 × 8 row vectors, so the pipeline is used effectively. The final multiplication is performed in multiplication unit 128. Because the DCT is the computational bottleneck of the system, a structure of this kind is needed to achieve video rates.

[0142] FIG. 15 is simplified for clarity, but it is important to understand the constraints on the overall transform. Recall that multiplication is the bottleneck of the system. Because transform units 122 and 126 contain multiplications, they and the final multiplier 128 constitute roughly equal bottlenecks. Assume that a single multiplication takes x nanoseconds. In FIG. 15, which has two transform units 122 and 126, each transform unit computes 8 components simultaneously. Each transform unit therefore performs its computation in 8x nanoseconds. This is achievable with present-day architectures.

[0143] ... The present invention ... The generalized Chen transform (GCT) of the present invention requires no multiplication in the main transform, only a single multiplication per component at the end of the transform process. The main one-dimensional GCT consists of an array of some 38 adders arranged in at most seven discrete levels (see FIGS. 8, 9, and 10). The adder array includes hardwired shifters, which produce the multiplications and divisions by powers of 2 described above. Furthermore, splitting the seven stages into two separate parts (an easy split, given the simple structure of the GCT) reduces the maximum number of adder levels to four. With this split, the transform is no longer a bottleneck for the data flow; the maximum throughput is governed by the design of these elements. Because the final multiplication is now the bottleneck, the transform unit can be exploited further. FIG. 14 illustrates such a configuration.
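
The idea of replacing multipliers with adders and hardwired shifts can be sketched as follows. The two constants are chosen purely for illustration and are not GCT parameters; the function names are likewise hypothetical.

```python
def times_five(x: int) -> int:
    """x * 5 computed as (x * 4) + x: one fixed left shift and one addition."""
    return (x << 2) + x


def times_three_quarters(x: int) -> int:
    """x * 0.75 computed as x - (x / 4): one fixed right shift and one subtraction.

    The right shift truncates, so the result is exact only when x is a
    multiple of 4.
    """
    return x - (x >> 2)
```

Because each shift amount is fixed, a hardware shifter of this kind is just wiring, which is why an adder network built this way can be much faster than a general multiplier.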

[0144] In FIG. 14, the input latch 130 for the 8 × 1 row vectors is followed by a MUX 132 that selects one of two inputs to feed the one-dimensional transform circuit 134. The important difference here is that there is only one transform unit 134. After a certain amount of time, transform unit 134 completes the transform of the input row vectors. After passing through the transposition RAM 136, the transformed row vectors are returned by a second MUX 138 to the first MUX 132 and passed once more to the single transform unit 134, where the columns are now transformed. After the columns have been transformed and transposed, the result is forwarded to multiplier 140. Clearly, the transform unit must on average operate in 4x nanoseconds. This is where the GCT network of simple adders offers a major advantage: because adders are much faster than multipliers, this kind of time-division multiplexing becomes possible.
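
The data flow of FIG. 14 (one add-only transform unit used twice, a transposition step between the passes, and a single multiplication per component at the end) can be sketched as below. The one-dimensional transform used here is an 8-point Walsh-Hadamard transform, a stand-in chosen only because it needs additions and subtractions alone; it is not the GCT adder network. All function names and the `scale` table are assumptions for the example.

```python
def wht8(v):
    """Stand-in 1-D transform: 8-point Walsh-Hadamard, add/subtract only."""
    out = list(v)
    h = 1
    while h < 8:
        for start in range(0, 8, 2 * h):
            for k in range(start, start + h):
                a, b = out[k], out[k + h]
                out[k], out[k + h] = a + b, a - b
        h *= 2
    return out


def transpose(block):
    """Role of the transposition RAM: swap rows and columns."""
    return [list(col) for col in zip(*block)]


def transform_2d(block, transform_1d, scale):
    """Two passes through the same 1-D unit, then one multiply per component."""
    rows_done = [transform_1d(row) for row in block]     # first pass (rows)
    cols_in = transpose(rows_done)                       # transposition
    cols_done = [transform_1d(col) for col in cols_in]   # second pass, same unit
    result = transpose(cols_done)                        # restore orientation
    return [[result[r][c] * scale[r][c] for c in range(8)] for r in range(8)]
```

For an 8 × 8 block of ones and a scale table of ones, the only nonzero output is the value 64 at position (0, 0).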

[0145] The GCT is itself a substantial saving over the DCT. The implementation shown in FIG. 14 provides a further 50% saving simply by having one transform unit instead of two. To put this in perspective, the design of the invention has only a single transform unit 134, and this unit occupies 40% to 50% of the chip. The remaining 50% is allocated to RAM, latches, the multiplier 140, I/O, and so on. It will be appreciated that a second transform unit would increase the silicon area by roughly 50%.

[0146] Using an adder-only network together with time-division multiplexing yields an efficient JPEG implementation whose performance exceeds video rate by more than 50%.

[0147] In summary, transforms such as the DCT are useful for image compression, and DCT-like methods are desirable for their computational simplicity. In this respect, the method and apparatus disclosed here perform quantization at a speed comparable to a 16-bit transform while achieving a mean squared error comparable to that of a 32-bit transform. Factoring the transform into a set of relatively fast additions and a single set of multiplications yields an effectively pipelined data flow, in which the addition portions of the vertical and horizontal transforms are executed by the same hardware before the final multiplication portion.

[0148]

[Generalization] Although the embodiments in this disclosure are limited to transforms for image coding, the multiplier of the invention can be generalized to any quantization scheme in which the output is multiplied by the same number by which the input was divided. Because several algorithms use similar quantization schemes, some generalization is possible, but the multiplier of the invention is meaningful only in the context of quantization and dequantization. The preferred embodiment uses 16-bit arithmetic, but in general the invention applies to such processing using N-bit arithmetic. The invention is also compatible with existing standards such as the JPEG standard. The preferred embodiment was chosen and described to best explain the principles of the invention and its practical application, so that those skilled in the art can best use the invention and its various embodiments, with such modifications as suit the particular use contemplated. The scope of the invention is intended to be defined only by the claims.

[0149]

[Effects of the Invention] Configured as described above, the invention maintains compatibility with the JPEG standard for the still image data decompression method, the compression method, and the corresponding apparatus. It optimizes the use of bits in the quantization and compression stages of data compression, and it minimizes the mean squared error in data compression schemes that integrate quantization and coefficient compression. In optimizing the range and resolution of data compression, a fixed number of bits suffices, and for small quantization values the resolution can be made to conform to the H.261 specification. More specifically, using a 16-input, 1-output multiplexer and a 16-bit multiplier enables precompression for quantization with a dynamic range of 28 bits. Furthermore, a pipelined implementation of the transform can exploit the speed of the generalized Chen transform to full advantage. The number of gates required to perform the transform can also be minimized; in particular, the speed advantage of the adder network portion allows the addition portions of both the vertical and horizontal transforms to be executed by the same hardware.

[Brief description of drawings]

FIG. 1 shows an embodiment of the present invention; (a) is a block diagram of the compressor configuration, and (b) is a block diagram of the decompressor configuration.

FIG. 2 explains the operation; (a) is an explanatory diagram showing the order of input pixels, (b) is a block timing diagram, and (c) is a vector timing diagram.

FIG. 3 is a block diagram showing the three-point transform of data from RGB to XYZ.

FIG. 4 is a schematic diagram showing the VLSI layout.

FIG. 5 is a schematic block diagram showing an example shift register configuration.

FIG. 6 shows an example shift array configuration; (a) is a schematic block diagram, and (b) is a block diagram of its specific configuration.

FIG. 7 is a schematic diagram showing the integrated data flow.

FIG. 8 is a block diagram showing an example adder array configuration for forward processing.

FIG. 9 is a block diagram showing another example adder array configuration for forward processing.

FIG. 10 is a schematic block diagram showing the two-dimensional generalized Chen transform.

FIG. 11 is a block diagram showing a preferred embodiment of the present invention.

FIG. 12 is a block diagram showing an example hardware configuration for inverse precompression and quantization with the shift before multiplication.

FIG. 13 is a block diagram showing an example hardware configuration for inverse precompression and quantization with the shift after multiplication.

FIG. 14 is a block diagram schematically showing the flow of an implementation of the two-dimensional generalized Chen transform that exploits the speed of the transform of the present invention.

FIG. 15 is a block diagram schematically showing the flow of a conventional implementation of the two-dimensional DCT computation.

[Explanation of symbols]

12 transposition memory means
16 transform means
50 first GCT adder network stage
60 first GCT adder network stage
66 transposition memory means
74 multiplication table means
76 rounding means
80 zigzag ordering means
134 transform means
136 transposition memory means

Claims

[Claims]

1. A method of decompressing still image data that inverts a transform converting a sequence of original values into a sequence of transform domain coefficients, the method comprising: a step of multiplying each transform domain coefficient by a Q coefficient stored in an N-bit storage register in the form of an M-bit exponent identified as the Q exponent and an (N−M)-bit mantissa identified as the Q mantissa, the form Q coefficient = Q mantissa * 2^(Q exponent) giving the Q coefficient a range of values larger than 2^N; and a step of converting the sequence of multiplied transform domain coefficients into a second sequence that approximates the sequence of original values.

2. The method of decompressing still image data as claimed in claim 1, wherein the step of multiplying includes the steps of multiplying by Q mantissa using an integer multiplier unit and shifting left by Q exponent bits.

3. The method of decompressing still image data according to claim 2, wherein the step of left shifting by the Q exponent bits follows the step of multiplying by the Q mantissa using an integer multiplier unit.

4. The method of decompressing still image data according to claim 2, wherein the step of multiplying by the Q mantissa using the integer multiplier unit follows the step of left shifting by the Q exponent bits.

5. The still image data decompression method according to claim 2, further comprising a scaling coefficient and a quantized coefficient, and a Q coefficient being equal to a product of the scaling coefficient and the quantized coefficient.

6. The method of decompressing still image data according to claim 1, wherein the sequence of transform domain coefficients has length L, and the inverse transform operation is divided into an initial multiplication of said length L and a series of additions and shifts performed by a network of adder arrays.

7. The method of decompressing still image data according to claim 1, wherein the inverse transform operation is divided into a product of a diagonal matrix and a non-diagonal matrix, and, through the use of an adder array, the operation by the diagonal matrix can be performed by the adder array.

8. The method of decompressing still image data according to claim 7, wherein the adder circuit array has seven stages or less and comprises 39 or less adder circuit units.

9. The method of decompressing still image data according to claim 7, wherein the transform is a generalized Chen transform that approximates a discrete cosine transform.

10. The still image data expansion method according to claim 5, wherein the scaling coefficient includes a psychological factor weighting coefficient.

11. The method of decompressing still image data according to claim 5, wherein the scaling coefficient includes a deblurring coefficient.

12. The method of decompressing still image data according to claim 2, wherein the original sequence values represent a two-dimensional grid of image pixels.

13. The method of decompressing still image data according to claim 1, wherein N is equal to 16 and M is equal to 4.

14. The method of decompressing still image data according to claim 2, wherein the Q coefficient is standardized in advance by 2 ^ N, and the lower N digits of the multiplication product are discarded.

15. A method of compressing still image data that transforms a sequence of original values into a sequence of transform domain coefficients, the method comprising: transforming the sequence of original values into a sequence of transformed values; and converting the sequence of transformed values into the sequence of transform domain coefficients by multiplying each transformed value by a Q coefficient and discarding the lower N bits of the output, wherein the Q coefficient is normalized in advance by a factor of 2^N and is stored in an N-bit storage register.

16. The method of compressing still image data according to claim 15, further comprising a scaling coefficient and a quantized coefficient, wherein the Q coefficient is equal to 2 ^ N times the scaling coefficient divided by the quantized coefficient. .

17. The method of compressing still image data according to claim 16, wherein the maximum value of all Q coefficients is (2^N − 1).

18. The method of compressing still image data according to claim 15, wherein the sequence of original values has length L and the transform operation is divided into a series of additions and a final multiplication of said length L.

19. The method of compressing still image data according to claim 15, wherein the transform operation is divided into a product of a diagonal matrix and a non-diagonal matrix, and the operation by the non-diagonal matrix can be executed by an adder array.

20. The adder circuit array has 7 stages or less.
20. The method of compressing still image data according to claim 19, characterized by comprising not more than one adder circuit unit.

21. The method of compressing still image data according to claim 19, wherein the transform is a generalized Chen transform that approximates a discrete cosine transform.

22. The method of compressing still image data according to claim 16, wherein the scaling coefficient includes an inverse psychological factor weighting coefficient.

23. The method of compressing still image data according to claim 15, wherein the original sequence values represent a two-dimensional grid of image pixels.

24. The method of compressing still image data according to claim 15, wherein N is equal to 16.

25. A two-mode (forward mode / reverse mode) processing device for multiplying an input integer by a Q value to compute a product, comprising: a shifter for left-shifting the input integer by a shift integer number of bits; an integer multiplier for multiplying the shifted input integer by a coefficient; and a two-mode Q value processor, wherein, when the device is in forward mode, the shift integer is set to 0 and the coefficient is set equal to the Q value, and, when the device is in reverse mode, the shift integer is set to the Q exponent and the coefficient is set equal to the Q mantissa, where Q value = Q mantissa * 2^(Q exponent).

26. Including a compression factor, the Q factor is inversely proportional to the compression factor in forward mode and the Q factor is inversely proportional to the compression factor in reverse mode, whereby the device functions as a data compression / decompression device. The two-mode processing device according to claim 25, wherein:

27. The two-mode processing device according to claim 26, comprising a 4-bit storage register, a 12-bit storage register, and a 16-bit storage register, wherein the Q exponent is stored in the 4-bit storage register, the Q mantissa is stored in the 12-bit storage register, and the input integer is stored in the 16-bit storage register.

28. The two-mode processing device according to claim 25, wherein the Q value is premultiplied by 2^N and N bits are clipped from the multiplier output.

29. The two-mode processing device according to claim 25, wherein the input integer is a generalized Chen transform coefficient in reverse mode.

30. The Q value in a forward mode is proportional to a psychological factor weighting coefficient, and the Q value in a reverse mode is proportional to a reverse psychological factor weighting coefficient. Mode processor.

31. The two-mode processing device according to claim 26, wherein the Q value is proportional to a deblurring coefficient in forward mode.

32. A two-mode (forward mode / reverse mode) processing device for multiplying an input integer by a Q value to compute a product, comprising: an integer multiplier unit for first multiplying the input integer by a coefficient; and a shifter for left-shifting the multiplier output by a shift integer number of bits, wherein, when the device is in forward mode, the shift integer is set to 0 and the coefficient is set to the Q value, and, when the device is in reverse mode, the shift integer is set to the Q exponent and the coefficient is set equal to the Q mantissa, where Q value = Q mantissa * 2^(Q exponent).

33. A compression factor is included, in a forward mode the Q factor is inversely proportional to the compression factor, and in the reverse mode the Q factor is proportional to the compression factor, whereby the device is a data compression / decompression device. 33. The bimodal processor of claim 32, which is functional.

34. The two-mode processing device according to claim 33, comprising a 4-bit storage register, a 12-bit storage register, and a 16-bit storage register, wherein the Q exponent is stored in the 4-bit storage register, the Q mantissa is stored in the 12-bit storage register, and the input integer is stored in the 16-bit storage register.

35. The two-mode processing device according to claim 33, wherein the Q value is premultiplied by 2^N and N bits are clipped from the multiplier output.

36. The two-mode processing device according to claim 35, comprising three 16-bit storage registers in which the Q value, the product, and the input integer are stored.

37. The two-mode processing device according to claim 32, wherein the input integer is a generalized Chen transform coefficient in reverse mode.

38. The Q value in a forward mode is proportional to a psychological factor weighting coefficient, and the Q value in a reverse mode is proportional to a reverse psychological factor weighting coefficient. Mode processor.

39. The two-mode processing device according to claim 33, wherein the Q value is proportional to a deblurring coefficient in forward mode.

40. A two-dimensional transform method that uses a pipelined structure to perform a two-dimensional transform, comprising the steps of: dividing the two-dimensional transform into two consecutive one-dimensional transforms; dividing each of the two one-dimensional transforms into a fast stage and a slow stage, the fast stage having a shorter computation time than the slow stage; and executing the two fast stages of the two one-dimensional transforms in two parts of the pipeline structure.

41. The two-dimensional conversion method according to claim 40, wherein each of the high-speed steps is executed by a network of adder circuit arrays.

42. The two-dimensional conversion method according to claim 40, wherein the two high-speed steps are successively executed by a network of substantially the same adder circuit array.

43. The two-dimensional transform method according to claim 42, wherein the two slow stages are algebraically integrated and executed as a single step.

44. The two-dimensional transform method according to claim 43, wherein the one-dimensional transform is a generalized Chen transform that approximates a discrete cosine transform.

45. A two-dimensional transform device having a pipelined structure for executing a two-dimensional transform divided into two one-dimensional transforms, each of the one-dimensional transforms being divisible into a first part and a second part, the device comprising: a first processing device in the pipeline of the pipelined structure for performing the first part of the one-dimensional transforms; a transposition device in the pipeline for rearranging a first set of vectors to generate a second set of vectors, such that the Nth entry of the Mth vector of the first set is the Mth entry of the Nth vector of the second set; a second processing device in the pipeline for performing the second part of the one-dimensional transforms; means for introducing a third set of vectors into the first processing device to generate the first set of vectors; means for introducing the first set of vectors into the transposition device to generate the second set of vectors; means for introducing the second set of vectors into the first processing device to generate a fourth set of vectors; and means for introducing the fourth set of vectors into the second processing device to generate a two-dimensionally transformed set of vectors.

46. The two-dimensional conversion apparatus according to claim 45, wherein the set of the first vector, the second vector, the third vector, and the fourth vector is an M × 1 vector consisting of M elements.

47. The two-dimensional transform device according to claim 45, wherein the processing time of the first processing device for generating the first and fourth sets of vectors is not significantly larger than the processing time of the second processing device for generating the set of transformed vectors.

48. The two-dimensional conversion device according to claim 47, wherein the first processing device comprises a network of adder circuit arrays.

49. The two-dimensional conversion apparatus according to claim 48, wherein the second processing device algebraically integrates the second parts of the two one-dimensional conversions.

50. The two-dimensional conversion device according to claim 49, wherein the one-dimensional conversion comprises a generalized Chen transform that approximates a discrete cosine transform.
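
As an illustration only, the data flow recited in claims 45 and 49 can be sketched in Python. The sketch assumes that the first part of each one-dimensional conversion is an add/subtract-only matrix `A` and that the second part is a per-entry scaling `d`; both are hypothetical stand-ins chosen for readability and are not the generalized Chen transform itself. The function names are likewise assumptions. Under these assumptions the two second parts can be deferred and combined into one multiplication per entry, and the result equals the two full one-dimensional conversions applied in turn.

```python
# Hypothetical first part: a 4-point add/subtract-only network (not the GCT).
A = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]
# Hypothetical second part: one scale factor per output entry.
d = [0.5, 0.25, 0.5, 0.25]


def first_processor(vectors):
    """Apply the first part (A) to every vector of the set."""
    return [[sum(a * x for a, x in zip(row, v)) for row in A] for v in vectors]


def transposer(vectors):
    """Entry N of vector M becomes entry M of vector N."""
    return [list(col) for col in zip(*vectors)]


def second_processor(vectors):
    """Apply both second parts at once: a single multiplication per entry."""
    return [[d[m] * d[n] * x for n, x in enumerate(v)]
            for m, v in enumerate(vectors)]


def pipeline_2d(third_set):
    """Order of operations as recited in claim 45."""
    first_set = first_processor(third_set)
    second_set = transposer(first_set)
    fourth_set = first_processor(second_set)
    return second_processor(fourth_set)


def direct_2d(vectors):
    """Reference: full 1-D conversion, transposition, full 1-D conversion."""
    def full_1d(vs):
        return [[d[i] * y for i, y in enumerate(v)] for v in first_processor(vs)]
    return full_1d(transposer(full_1d(vectors)))
```

In this sketch the first processing device is used twice and contains no multiplications, while the second processing device is used once, which is the property the pipelined arrangement relies on.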

51. A device for compressing still image data, comprising: converting means for receiving input pixels having a certain bit width and converting the input pixels in the horizontal direction or the vertical direction using only adder circuit array means in a time division configuration; transfer memory means for rotating the horizontally or vertically converted pixels to the vertical or horizontal direction, the converting means including means for receiving the vertical or horizontal pixels and for converting the vertical or horizontal pixels in the vertical or horizontal direction using the adder circuit array means; and single multiplication circuit means for receiving the converted pixels and performing a single multiplication function on the converted pixels to provide compressed pixel data representing the input pixels.

52. A method of compressing still image data, comprising: a step of receiving input pixels having a certain bit width and converting the input pixels in the horizontal direction or the vertical direction using only adder circuit array means in a time division configuration; a step of rotating the converted pixels to the vertical or horizontal direction; a step of converting the vertical or horizontal pixels using only the adder circuit array means; and a step of performing a single multiplication function on the converted pixels to provide compressed pixel data representing the input pixels.

53. A system for compressing still image data, comprising: means for receiving input pixel data representing an image; and generalized Chen transform (GCT) means for compressing the pixel data, the generalized Chen transform means including GCT adder circuit means for converting the pixel data in the horizontal direction using only adder circuits to produce horizontally converted pixels, transfer memory means for rotating the horizontally converted pixels to the vertical direction, the GCT adder circuit means including means for converting the vertical pixels in the vertical direction using only the adder circuits, and multiplication circuit means for performing a multiplication function on the converted vertical pixels to provide compressed pixel data representing the input pixels, wherein the GCT adder circuit means includes a first GCT adder circuit network stage for performing the first half of the horizontal and vertical conversions and a second GCT adder circuit network stage for performing the second half of the horizontal and vertical conversions.

54. The still image data compression system of claim 53, wherein the first and second GCT adder circuit network stages convert the pixels horizontally and vertically in a time division configuration.

55. The still image data compression system of claim 54, wherein the multiplication circuit means includes zigzag ordering means.

56. A still image data compression system according to claim 55, wherein the multiplication circuit means includes rounding means.

57. A still image data compression system as set forth in claim 56, wherein the multiplication circuit means includes multiplication table means.
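
As an illustration only, the multiplication circuit means of claims 55 to 57 can be sketched in Python as a stage that reads the converted block in zigzag order, multiplies each entry by a value from a multiplication table, and rounds the product. The function names, the round-half-up rule, and the choice to apply the zigzag ordering before the multiplication are assumptions made for this sketch; the claims do not fix them. The zigzag sequence used is the conventional JPEG diagonal scan.

```python
import math


def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in JPEG-style zigzag order."""
    def key(rc):
        r, c = rc
        diagonal = r + c
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, r if diagonal % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def multiply_stage(block, table):
    """One multiplication per entry, rounded (half up), emitted in zigzag order.

    `table` is a hypothetical multiplication table with one factor per entry.
    """
    out = []
    for r, c in zigzag_order(len(block)):
        product = block[r][c] * table[r][c]
        out.append(int(math.floor(product + 0.5)))
    return out
```

For example, `multiply_stage([[10, 7], [-3, 4]], [[0.5, 0.5], [0.5, 0.5]])` yields `[5, 4, -1, 2]`.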