JPH0746136A

JPH0746136A - Acoustic or picture conversion processor, acoustic or picture data processor, acoustic or picture data processing method, arithmetic processor, and data processor

Info

Publication number: JPH0746136A
Application number: JP6002391A
Authority: JP
Inventors: Dei Aren Jieimusu; ディアレンジェイムス; Pii Booritsuku Maatein; ピィボーリックマーティン
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1993-01-21
Filing date: 1994-01-14
Publication date: 1995-02-14

Abstract

PURPOSE: To increase the throughput of an acoustic or picture compression system. CONSTITUTION: In a transform image compression system, the devices up to an N×N division unit 20 and the devices from a quantizing unit 40 onward are the same as in a standard system. Outputs 25a and 25b of the N×N division unit 20 consist of N×N picture element blocks. A pair of adjacent picture element blocks 25a and 25b is combined into a double vector 29 by a double vector generating unit 27, and this double vector 29 is transformed by a two-dimensional transformation unit 30, which executes the GCT (Generalized Chen Transform) using addition, subtraction, and shift. An output double vector 31 is decomposed into standard numerical form by a number extracting unit 33 to obtain transformation coefficients 35. The transformation coefficients 35 are quantized by the quantizing unit 40 and encoded by an encoder 50.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to an apparatus and method for compressing image data, and to an apparatus and method for improving the throughput of a processor that performs addition, subtraction and shift operations.

【０００２】[0002]

[Prior Art] This application is related to U.S. Patent No. 5,129,015, issued July 7, 1992; to co-pending U.S. patent application Ser. No. 07/743,517 (filed August 9, 1991; title: Apparatus and method for compressing still images), which is a continuation-in-part of U.S. Patent No. 5,129,015; and to co-pending U.S. patent application Ser. No. 07/811,468 (filed December 19, 1991; title: Apparatus and method for compressing still images), which is a continuation-in-part of application Ser. No. 07/743,517. The details of these related patent applications are incorporated herein by reference.

[0003] However, aspects of the present invention are not limited to such image compression systems. Rather, the present invention is directed to an apparatus and method for improving the throughput of general data signal processing systems, such as acoustic data compression systems.

[0004] Signal processing often requires time-consuming calculations. Today's general-purpose processors (for example, the Motorola 68020) typically have a 32-bit arithmetic data path, which is more precision than is needed for important applications such as the acoustic or image processing with which the present invention is concerned. In an acoustic or image processing environment, higher speed is preferable even at the cost of lower precision.

[0005] For example, the Generalized Chen Transform (GCT) described in the related patent and patent applications mentioned above is implemented using only addition, subtraction and shift operations, with a final addition, to approximate the discrete cosine transform (DCT). 32-bit precision is not needed to obtain acceptable image quality. The throughput of the GCT can therefore be increased dramatically by combining two 16-bit data words into one 32-bit word, processing the two 16-bit words in parallel on a 32-bit processor, and extracting two 16-bit words of output data. This process allows a single-instruction, single-data (SISD) machine to operate as a single-instruction, multiple-data (SIMD) machine.

【０００６】[0006]

[Problems to Be Solved by the Invention] One object of the present invention is to provide an apparatus and method for compressing data, in particular still image data.

[0007] A particular object of the present invention is to provide an apparatus and method for compressing acoustic data or still images while increasing throughput.

[0008] Another object of the present invention is to provide a method for improving throughput by operating a single-instruction, single-data machine as a single-instruction, multiple-data machine.

【０００９】[0009]

[Means for Solving the Problems] According to the present invention, there are provided an apparatus and method for increasing the throughput of a data processing system such as an acoustic or image compression system. In this apparatus or method, the additions or subtractions of two sets of several numbers are accomplished in parallel by combining the numbers into a pair of "double vectors", adding or subtracting the pair of double vectors, and separating the resulting double vector to obtain values representing the results of the additions or subtractions on the original numbers. Similarly, left shifts of several numbers are accomplished by combining the numbers into one double vector, left-shifting this double vector, and extracting several numbers that represent the left-shifted values of the original numbers.

【００１０】[0010]

[Operation] According to the apparatus or method of the present invention described above, the calculation speed can be increased substantially by performing a linear transform such as the Generalized Chen Transform on double vectors using addition, subtraction and shift.

[0011] For example, when a single-instruction, single-data general-purpose processor with a 32-bit arithmetic data path is used, two 16-bit data words are combined into one 32-bit word, the two 16-bit words are processed in parallel by the processor, and two 16-bit words of output data are extracted. The single-instruction, single-data machine can thereby be operated as a single-instruction, multiple-data machine, so that the throughput of the GCT and similar transforms can be increased dramatically.

[0012] Features and advantages of the present invention other than those described above will become apparent from the following description.

【００１３】[0013]

[Embodiments] Before the embodiments of the present invention are described in detail, the related patent and related patent applications mentioned above are briefly reviewed. As already pointed out, however, the present invention is not limited to such compression systems. Rather, the present invention provides a method and apparatus in which several input data quantities are combined to yield a smaller number of "double vectors", a processor operates on the input double vectors to yield output double vectors, and several output data quantities are extracted from the output double vectors, the number of output double vectors being smaller than the number of output data quantities.

[0014] First, transform image compression is described. Transform image compression compresses an image by converting the values of a set of neighboring pixels into a set of transform coefficients. The advantage is that the values of the transform coefficients tend to be less correlated than the corresponding pixel values.

[0015] Neighboring pixels in an image usually have similar values; that is, the "energy" distribution of the pixels is relatively uniform. When the coefficients of a transform vary widely, that is, when they are uncorrelated, the transform gives good energy compaction. For the same reconstructed signal quality, lossy quantization of the transform coefficients achieves far better compression than lossy quantization of the original data. (Alternatively, for the same amount of compression, lossy quantization of the transform coefficients gives better reconstructed signal quality than lossy quantization of the original data.) A system for this transform image compression is now described. An example of a typical transform image compression system is shown in FIG. 6. (The baseline system of the JPEG still image compression standard is such a system.) In FIG. 6, the pixels of an original color image 8 are converted by a color conversion unit 10 into an opponent color space such as YC_RC_B. Throughout the rest of the system, only one color component is processed at a time. The color-converted image data 15 is divided by an N×N division unit 20 into N×N blocks (8×8 blocks in the case of JPEG). These N×N opponent-color blocks 25 are transformed by a two-dimensional transform unit 30.

[0016] If a two-dimensional transform can be carried out as two passes of a one-dimensional transform, the amount of computation is reduced considerably. In that case the two-dimensional transform is called a separable transform. FIG. 7 shows the circuit elements for computing this type of transform. First, a 1D transform unit 32 applies a one-dimensional transform to the N rows of 1×N image elements 25. The resulting transform coefficients 31 are rearranged (transposed) by a permutation unit 34, and the new N rows of 1×N coefficients 33 are transformed by a 1D transform unit 36, yielding a set of two-dimensionally transformed coefficients 35.
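The row-column scheme of FIG. 7 can be sketched as follows. This is only an illustration under stated assumptions: `transform_1d` is a hypothetical stand-in (pairwise sums and differences), not the GCT, and all function names are introduced here for the example.

```python
def transform_1d(row):
    """Hypothetical stand-in for a 1-D transform: pairwise sums, then pairwise differences."""
    half = len(row) // 2
    sums = [row[2 * i] + row[2 * i + 1] for i in range(half)]
    diffs = [row[2 * i] - row[2 * i + 1] for i in range(half)]
    return sums + diffs


def transpose(block):
    """Swap the rows and columns of a square block."""
    return [list(column) for column in zip(*block)]


def transform_2d(block):
    """Separable 2-D transform: 1-D pass over the rows, reorder, 1-D pass again."""
    first_pass = [transform_1d(row) for row in block]   # role of 1D transform unit 32
    reordered = transpose(first_pass)                   # role of unit 34
    return [transform_1d(row) for row in reordered]     # role of 1D transform unit 36
```

For an N×N block this costs 2N one-dimensional transforms of length N.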

[0017] As shown in FIG. 6, the block of transform coefficients 35 is quantized by a quantization unit 40. Quantization is achieved by dividing each coefficient 35 by a known number. This reduces the number of bits needed to encode the quantized coefficients 45. (During decompression, the quantization is reversed by multiplying each coefficient by the same known number.) Note that, because of the rounding error introduced by integer division, quantization is "lossy", so the exact transform values generally cannot be recovered after quantization. The quantized coefficients 45 are encoded without loss in a lossless encoder 50 by some algorithm such as Huffman coding, yielding a set of encoded transform coefficients 55.
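The loss introduced by quantization can be seen in a minimal sketch. The function names and the step size of 8 are assumptions chosen for illustration; the text only says that each coefficient is divided by a known number.

```python
def quantize(coefficient, step):
    """Divide by a known step size, truncating toward zero (integer division)."""
    sign = -1 if coefficient < 0 else 1
    return sign * (abs(coefficient) // step)


def dequantize(level, step):
    """Reverse the quantization by multiplying by the same known step size."""
    return level * step
```

With a step of 8, the coefficient 37 is stored as level 4 and comes back as 32, so the original value is not recovered; a coefficient that is an exact multiple of the step, such as 40, survives unchanged.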

[0018] When this system is implemented in software on a general-purpose processor, the slowest part of the system is the transform itself. Good energy compaction requires a complex transform such as the discrete cosine transform (DCT). The arithmetic is very demanding of processor time. Because shifts and additions can be performed much faster than division, the computation can be sped up by replacing slow operations, multiplications in particular, with additions, subtractions and shifts.
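Replacing a constant multiplication by shifts and additions can be sketched as below. The constants 5 and 10 and the function names are assumptions chosen for illustration, not values taken from the GCT.

```python
def times5(x):
    """5*x computed as 4*x + x: one left shift and one addition."""
    return (x << 2) + x


def times10(x):
    """10*x computed as 8*x + 2*x: two left shifts and one addition."""
    return (x << 3) + (x << 1)
```

Both functions are exact for any integer, since a left shift by k is a multiplication by 2^k.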

[0019] The Generalized Chen Transform, disclosed in U.S. Patent No. 5,129,015 and incorporated herein by reference, uses only additions and shifts in the transform. All multiplications required by the transform are merged into the quantization multiplications, so the speed of quantization does not decrease while the speed of the transform increases substantially.

[0020] The present invention exploits the fact that transforms such as the GCT consist largely of additions and shifts, by transforming the elements of two blocks in parallel on a single ordinary general-purpose processor.

[0021] One aspect of the present invention is described below. Signal processing often requires time-consuming arithmetic operations. Today's general-purpose processors typically have a 32-bit arithmetic data path, which is more precision than is needed for important applications such as the acoustic or image processing with which the present invention is concerned. In such an acoustic or image processing environment, higher speed is preferable even at the cost of lower precision.

[0022] The present invention achieves this by packing several numbers into a single 32-bit "double vector".

[0023] For example, four 8-bit data items, or three 10-bit data items (with 2 bits spare), or two 16-bit data items can be packed into one 32-bit double vector. Because left shifts lead to a loss of precision, the transform process is preferably designed to limit the significant bits of each element to, for example, 14 bits, with the 32-bit double vector made up of two 16-bit numbers.
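The three layouts mentioned above (4×8, 3×10 and 2×16 bits) can be sketched with one generic helper. The names `pack_fields` and `unpack_fields` are hypothetical, and the fields are treated as raw bit patterns, so sign handling is left out of this sketch.

```python
def pack_fields(values, width):
    """Pack equal-width fields into one 32-bit word, first value in the highest field."""
    assert len(values) * width <= 32, "fields must fit in one 32-bit word"
    mask = (1 << width) - 1
    word = 0
    for value in values:
        word = (word << width) | (value & mask)
    return word


def unpack_fields(word, count, width):
    """Recover the raw (unsigned) field contents in the order they were packed."""
    mask = (1 << width) - 1
    return [(word >> (width * (count - 1 - i))) & mask for i in range(count)]
```

Three 10-bit fields occupy 30 bits, leaving the 2 spare bits noted in the text at the top of the word.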

[0024] A preferred embodiment of the present invention, which uses the double vector method to speed up a two-dimensional transform, is now described with reference to FIG. 1. In the transform image compression system shown there, the devices up to the N×N division unit 20 and the devices from the quantization unit 40 onward are identical to those of the standard compression system of FIG. 6.

[0025] As shown in FIG. 1, the outputs 25a and 25b of the N×N division unit 20 consist of N×N pixel blocks. A pair of adjacent pixel blocks 25a, 25b is combined in a double vector generation unit 27 to produce a double vector 29. This double vector 29 is transformed by a two-dimensional transform unit 30, which performs the GCT using additions, subtractions and shifts as described in the aforementioned U.S. patent. The output double vector 31 is decomposed into standard numerical form by a number extraction unit 33, yielding transform coefficients 35. These transform coefficients 35 are quantized as described above and then encoded.

[0026] One object of the present invention is to perform linear transforms, such as those of the 8-element GCT transform units 32 and 36 shown in FIG. 7, at high speed on a 32-bit processor such as the Motorola MC68020. In a preferred embodiment, two standard numbers are combined in a 32-bit register. The SISD (single-instruction, single-data) processor is thus effectively treated as a SIMD (single-instruction, multiple-data) machine. A linear transform consists of additions, subtractions and multiplications by constants. These multiplications are replaced by some combination of additions, subtractions, left shifts and right shifts. Table lookups and multiplications are avoided.

[0027] Note that additions and subtractions move the significant bits to the left. Adding two numbers that each have 6 significant bits, for instance, can produce a number with 7 significant bits. For example:

【００２８】[0028]

【数１】 [Equation 1]

[0029] Because the maximum value is known at every point in the computation of the transform, the maximum values of the numbers are limited by design so as to prevent overflow. However, the sign bits of a negative number may move into the low-order bits of the upper number of the double vector. Depending on the size of the transform and the accuracy required, a right shift may be needed to "re-center" the data. It may therefore be preferable, instead of multiplying by 5, to multiply the first value by 2.5 and its "partner" value by 2.0 (when it is known that the two values will eventually be multiplied together after some division).

[0030] According to the present invention, an N-bit number A and an M-bit number B are combined to obtain an (N+M)-bit "double vector" number C. The relationship between this double vector C and its element numbers A and B is written C = [A, B]. Once the double vector number has been obtained, it is subjected to a series of operations (most of them simple arithmetic operations) to produce a double vector output. This double vector output is then decomposed to obtain the element numbers. For example, the additions X = A1 + A2 and Y = B1 + B2 can be performed simultaneously by forming the double vectors C1 = [A1, B1] and C2 = [A2, B2], performing the addition of these two double vectors, Z = C1 + C2 = [X, Y], and extracting the results X and Y from the double vector Z. Subtraction is performed in a similar way.

[0031] Similarly, n-bit left shifts of two (monovector) numbers A and B can be performed simultaneously by defining the double vector C = [A, B], left-shifting C to form the output vector C′ = C << n (where "<< n" denotes a left shift by n bits, with zeros filled in on the right), and decomposing the double vector C′ to obtain A′ and B′ (where C′ = [A′, B′], A′ = A << n, B′ = B << n).

[0032] The first method of generating a double vector is called the linear method. In the linear method, the double vector C generated from an m-bit number A and an n-bit number B is defined by C = [A, B] = A * 2^n + B. A and B are extracted from C by B = C - ((C >> n) * 2^n) and A = (C - B) / 2^n, where ">> n" means a right shift by n bits with the sign bit propagated in from the left. In the preferred embodiment, A and B are signed two's-complement integers, C is a 32-bit double vector, and C = [A, B] = A * 2^16 + B. Extraction is then given by B = C - ((C >> 16) * 2^16) and A = (C - B) / 2^16 (or by A = C >> 16, or by any other equivalent combination of arithmetic and Boolean operations). For example, when A = 0x0041 = 65 and B = 0xFFF7 = -9, C = [A, B] = (65 × 2^16) - 9 = 0x0040FFF7. Here, numbers with the prefix 0x are written in hexadecimal.
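The linear method can be sketched in Python as follows. The helper names (`to_signed`, `linear_pack`, `linear_unpack`) are assumptions introduced for illustration, and Python's unbounded integers are masked to 32 bits to imitate a 32-bit register. The sketch reads the low 16 bits as a two's-complement value B and then applies A = (C - B) / 2^16; a plain C >> 16 would come out one too low whenever B is negative.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def linear_pack(a, b, n=16):
    """C = A * 2^n + B, kept as a 32-bit register pattern."""
    return (a * (1 << n) + b) & 0xFFFFFFFF


def linear_unpack(c, n=16):
    """Recover (A, B): B is the signed low field, A = (C - B) / 2^n."""
    c = to_signed(c, 32)
    b = to_signed(c, n)
    a = (c - b) >> n          # exact, because C - B is a multiple of 2^n
    return a, b
```

Packing A = 65 and B = -9 this way gives the register value 0x0040FFF7 stated in the text, and unpacking returns the original pair.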

[0033] Using the linear method, the addition of the two 16-bit numbers 0x0041 and 0x0041 can be combined with the addition of the two 16-bit numbers 0xFFF7 and 0xFFF7 by performing the following addition of two 32-bit double vectors:

【００３４】[0034]

【数２】 [Equation 2] 0x0040FFF7 + 0x0040FFF7 = 0x0081FFEE

[0035] This single 32-bit addition is all that has to be performed.

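[0036] Decomposing the sum thus obtained by the method described above yields the sums 0x0082 = 130 and 0xFFEE = -18. (The upper half-word of the 32-bit sum 0x0081FFEE reads 0x0081; the extraction A = (C - B) / 2^16 adds back the borrow caused by the negative lower element, giving 65 + 65 = 130.)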
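The combined addition can be checked with a short sketch. The helper names are assumptions introduced here; the packing follows the linear method C = A * 2^16 + B, and the result is masked to 32 bits to imitate a 32-bit register.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def linear_pack(a, b):
    return (a * 65536 + b) & MASK32


def linear_unpack(c):
    c = to_signed(c, 32)
    b = to_signed(c, 16)
    return (c - b) >> 16, b


def add_double(c1, c2):
    """One 32-bit addition performs both 16-bit additions."""
    return (c1 + c2) & MASK32


def sub_double(c1, c2):
    """One 32-bit subtraction performs both 16-bit subtractions."""
    return (c1 - c2) & MASK32
```

Adding the double vector [65, -9] to itself gives a word that unpacks to 65 + 65 = 130 and -9 + -9 = -18.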

[0037] Another method of generating a double vector is called the pack method. In the pack method, an (m+n)-bit double vector C is generated from an m-bit number A and an n-bit number B by C = (A << n) || B, where "||" denotes the (bitwise) logical OR operation.

[0038] A is thus placed unchanged in the high-order part of C, and B is placed in the lowest n bits of C. The inverse operations are A = C >> n and B = C && (2^n - 1), where "&&" denotes the (bitwise) logical AND operation. In the preferred embodiment in particular, A and B are 16-bit numbers, so C = (A << 16) || B, and the inverse operations are A = C >> 16 and B = C && 0xFFFF. For example, if A = 0x0041 and B = 0xFFF7, then C = 0x0041FFF7. Although A, B and C have been treated as numbers above, the linear method and the pack method can also be applied to arrays or matrices.
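A minimal sketch of the pack method for two 16-bit fields follows. The function names are assumptions, and `|` and `&` are Python's bitwise operators, standing in for the "||" and "&&" of the text; Python integers are masked to 32 bits to imitate a register.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def pack(a, b, n=16):
    """C = (A << n) | B, with B reduced to its n-bit pattern first."""
    return ((a << n) | (b & ((1 << n) - 1))) & 0xFFFFFFFF


def unpack(c, n=16):
    """A = C >> n and B = C & (2^n - 1), each read as a signed field."""
    a = to_signed(c >> n, 32 - n)
    b = to_signed(c & ((1 << n) - 1), n)
    return a, b
```

Unlike the linear method, the upper field here holds the bit pattern of A itself, so no borrow correction is applied on extraction.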

[0039] With the pack method, the addition of the two 16-bit numbers 0x0041 and 0x0041 and the addition of the two 16-bit numbers 0xFFF7 and 0xFFF7 are converted into the addition of the two 32-bit double vectors 0x0041FFF7 and 0x0041FFF7, giving 0x0083FFEE, which is decomposed into the sum 0x0083 = 131 and 0xFFEE = -18. This example shows that the pack method sometimes produces an error in the least significant bit of the number placed in the upper part of the double vector. The linear method has no such problem. However, because shifts, ORs and ANDs are faster than multiplications and additions, the pack method is faster than the linear method. FIG. 2 compares the results of adding pairs of 4-bit numbers held in 8-bit double vectors by the linear method and by the pack method. For the first two sets of additions, [(3 + 2) and (1 + 1)] and [(-3 + 2) and (1 + 1)], both the linear method and the pack method give correct results. For the last two sets of additions, [(3 + 2) and (1 + (-1))] and [(3 + 2) and (1 + (-1))], however, the result of the linear method is correct but the result of the pack method is wrong. In general, a negative number in the lower part of a double vector causes an error in the least significant bit of the number in the upper part. The numbers stored in a double vector are usually large, so such errors do not significantly affect the computation. Another approach is to scale the numbers so that extra digits exist on the right (to the right of the radix point).
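The comparison described for FIG. 2 can be reproduced with 4-bit fields in an 8-bit word. This is a sketch under assumptions: the helper names are hypothetical, and the linear extraction uses A = (C - B) / 2^4 with B read as a signed 4-bit value.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def pack4(a, b):
    return ((a << 4) | (b & 0xF)) & 0xFF       # pack method


def linear4(a, b):
    return (a * 16 + b) & 0xFF                 # linear method


def split_packed(c):
    return to_signed(c >> 4, 4), to_signed(c, 4)


def split_linear(c):
    b = to_signed(c, 4)
    a = to_signed((to_signed(c, 8) - b) >> 4, 4)
    return a, b


def add_pairs(make, split, a1, b1, a2, b2):
    """Add [a1, b1] and [a2, b2] with a single 8-bit addition."""
    return split((make(a1, b1) + make(a2, b2)) & 0xFF)
```

For (3 + 2) and (1 + (-1)) the linear variant returns (5, 0), while the pack variant returns (6, 0): the negative lower element disturbs the least significant bit of the upper element.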

[0040] FIG. 4 outlines a circuit for performing addition or subtraction of double vectors. A storage register array 80 stores 32-bit double vectors generated by the linear method or the pack method. A pair of 32-bit double vectors, selected from the register array 80 according to the nature of the computation, is sent to an arithmetic logic unit (ALU) 82 over 32-bit lines 84 and 86. The ALU 82 is, for example, a Motorola MC68020, an Intel 80386 or 80486, or a processor of the Sun SPARC family. The output of the ALU 82 is a 64-bit number, which is sent over a 64-bit line 88 to a selected register in the register array 80. The number stored in this register is trimmed to a 32-bit number (here, a double vector) by discarding the upper 32 bits.

[0041] FIG. 3 compares the results of left shifts by the linear method and by the pack method. When the number in the lower part of the double vector is positive, as in the first example [(3 << 1) and (1 << 1)], both the linear method and the pack method give correct results. When the number in the lower part of the double vector is negative, as in the second example [(3 << 1) and (-1 << 1)], an error occurs in the least significant bit of the number in the upper part of the double vector whether the linear method or the pack method is used. Note that a left shift may invert the sign of an element of a double vector, just as it may invert the sign of an ordinary two's-complement binary number. In general, the number stored in an m-bit register is an (m - k)-bit number, and to prevent overflow the number of bits by which it is left-shifted does not exceed k.
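The two left-shift cases can be tried out for the pack method with 4-bit fields in an 8-bit word. The sketch covers the pack method only, and its helper names are assumptions introduced here.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def pack4(a, b):
    """Pack method with two 4-bit fields in an 8-bit word."""
    return ((a << 4) | (b & 0xF)) & 0xFF


def split_packed(c):
    return to_signed(c >> 4, 4), to_signed(c, 4)


def shift_left_packed(a, b, k):
    """Left-shift both fields with one shift of the 8-bit word."""
    return split_packed((pack4(a, b) << k) & 0xFF)
```

Shifting [3, 1] left by one bit gives (6, 2) as expected. Shifting [3, -1] gives (7, -2): the lower field is correct, but its sign bit has been shifted into the least significant bit of the upper field.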

[0042] A right shift of a double vector differs from a left shift, because when a right shift is performed the sign bit of the number in the lower part of the double vector must be preserved. Suppose that the double vector C = 0x0040FFF7, generated by the pack method from the two 16-bit numbers A = 0x0040 = 64 and B = 0xFFF7 = -9, is to be shifted right by 2 bits. An ordinary right shift of the 32-bit data gives C >> 2 = 0x00103FFD, whose element numbers under the pack method are 16 and +16,381, whereas shifting the element numbers right by 2 bits directly gives 16 and -3. Such a sign discrepancy is unacceptable.

[0043] As shown in FIG. 5, a correct right shift of a double vector preserves the sign bit of each number element by copying the sign bit of each number element of the double vector. In this example, the high-order bits (including the sign bit) of the number in the lower part of the double vector must remain 1, and the correct result is 0x0010FFFD.

[0044] Using the MC68000 instruction set, for example, a correct right shift of a double vector requires the following command sequence.

[0045] asrw 2,C; swap C; asrw 2,C; swap C

Here, the swap C command exchanges the upper 16 bits and the lower 16 bits of the 32-bit double vector C. The (asrw 2,C) command is a 2-bit arithmetic right shift applied to the lower 16 bits of C. This sequence of operations preserves the sign of the 16-bit number in the lower half of the double vector. When several right shifts are applied to one double vector, only the first swap of each asrw/swap/asrw/swap sequence needs to be performed; the final swap is needed only when an odd number of right shifts has been performed.
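The effect of the asrw/swap/asrw/swap sequence can be emulated on a 32-bit value as follows. This is a sketch, not 68000 code: `asr_w` and `swap` are Python stand-ins written for this example that mimic a word-sized arithmetic shift of the low 16 bits and an exchange of the two 16-bit halves.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def asr_w(c, k):
    """Arithmetic right shift of the low 16 bits only; the upper 16 bits are untouched."""
    low = to_signed(c, 16) >> k
    return (c & 0xFFFF0000) | (low & 0xFFFF)


def swap(c):
    """Exchange the upper and lower 16-bit halves of a 32-bit word."""
    return ((c << 16) | (c >> 16)) & 0xFFFFFFFF


def shift_right_double(c, k):
    """Sign-preserving right shift of both 16-bit elements."""
    c = asr_w(c, k)   # shift the lower element
    c = swap(c)       # bring the upper element into the low half
    c = asr_w(c, k)   # shift the (former) upper element
    return swap(c)    # restore the original order
```

Applied to 0x0040FFF7 with k = 2, this returns 0x0010FFFD, whereas a plain 32-bit shift `0x0040FFF7 >> 2` gives 0x00103FFD.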

【００４６】[0046] A specific example will now be described. Using the double vector method of the present invention, the following linear transformation

【００４７】[0047]

【数３】 [Equation 3]

【００４８】[0048] and the following transformation

【００４９】[0049]

【数４】 [Equation 4]

【００５０】[0050] can be combined. To do so, the double vectors p = [x1, x2] and q = [y1, y2] are defined, the following calculation

【００５１】[0051]

【数５】 [Equation 5]

【００５２】[0052] is performed, and the solutions x1′, x2′, y1′, y2′ are extracted from the relations p′ = [x1′, x2′] and q′ = [y1′, y2′].
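The define, compute and extract steps can be sketched in Python. The equations of this example are not reproduced in the text, so the sketch substitutes a hypothetical sum/difference transform (x′ = x + y, y′ = x − y). It builds the double vectors with the relation C = A * 2ⁿ + B from claim 10, and the extraction assumes the low half is read as a signed n-bit number. The function names are illustrative.

```python
N = 16  # bits per number element

def make_double_vector(a, b):
    """Linear method: C = A * 2**n + B, with A and B signed."""
    return a * (1 << N) + b

def extract(z):
    """Recover (X, Y): Y is the low n bits read as signed, X = (Z - Y) / 2**n."""
    y = z & ((1 << N) - 1)
    if y >= 1 << (N - 1):
        y -= 1 << N
    x = (z - y) >> N
    return x, y

# Two input pairs (x1, y1) and (x2, y2), processed together.
x1, x2 = 64, -9
y1, y2 = 5, 7
p = make_double_vector(x1, x2)
q = make_double_vector(y1, y2)

p_out = p + q   # one addition yields x1 + y1 and x2 + y2
q_out = p - q   # one subtraction yields x1 - y1 and x2 - y2
```

With these inputs, `extract(p_out)` gives (69, -2) and `extract(q_out)` gives (59, -16), the same values obtained by transforming each pair separately. The sketch relies on every intermediate element fitting in 16 signed bits.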

【００５３】[0053] For example, consider the following linear transformation matrix.

【００５４】[0054]

【数６】 [Equation 6]

【００５５】[0055] Since the elements of the matrix M are simple rational numbers, the matrix operation M on (x1, y1)^T can be carried out by a combination of additions, subtractions, left shifts and right shifts. That is,

【００５６】[0056]

【数７】 [Equation 7]

【００５７】[0057] When this transformation M is applied to the two 1x2 matrices (x1, y1)^T and (x2, y2)^T, the computation time can be reduced by performing the following calculation and then extracting the number components of p′ and q′.

【００５８】[0058]

【数８】 [Equation 8]

【００５９】[0059] Note that during the double vector calculation, the individual components of a double vector need not be addressed until the final stage, where the number components are extracted. (Each of the two components of (p′, q′)^T involves one right shift, but because the right-shifted value is added to a value that has not been right-shifted, the swap cannot be omitted.)

【００６０】[0060] The application of the present invention to transform image coding will now be described. Many useful transforms require multiplication by irrational numbers and therefore cannot be decomposed into additions, subtractions and shifts. However, because the GCT transform executed by the conversion units 32 and 36 of FIG. 7 can be factored, these irrational multiplications can be merged into the quantization operation in the quantization unit 40, where they involve no additional computation at all. For an N-dimensional transform, these multiplications can be coalesced, so their cost is 1 per point rather than N. The other elements of the transform matrix can be replaced by rational numbers without compromising the orthogonality of the transform. Useful transforms with this property include the fast Hartley transform, the discrete sine transform and the discrete cosine transform.

【００６１】[0061] In the case of the important image compression standard known as "JPEG" (Joint Photographic Experts Group), as described in the U.S. patents cited above, substituting the values shown in Table 1 for the elements of the transform matrix allows the GCT transform to provide an approximation of the discrete cosine transform that is sufficient even for medical applications.

【００６２】[0062]

【表１】 [Table 1]

【００６３】[0063] Useful transforms can thus be performed using simple rational numbers. Because the multiplications can be implemented as additions and shifts, the present invention becomes applicable. For example, the multiplication by 1/sqr2 (=~ 0.70711) required by the GCT ("sqr2" means the square root of 2, and "=~" means approximately equal) can be done by table lookup, but it can also be done with shifts and additions as 0.70711*A =~ ((A + A>>)>>1)*(1 + >>2) + A>>4.
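As an illustration of multiplying by approximately 1/sqr2 with shifts and additions only, the Python sketch below uses the decomposition 181/256 = 1/2 + 1/8 + 1/16 + 1/64 + 1/256 = 0.70703125. This decomposition is chosen for the sketch and is not the expression printed in the source, which has a shift count missing; the function name is hypothetical.

```python
def mul_inv_sqrt2(a):
    """Approximate a / sqrt(2) as a * 181/256 using only shifts and adds."""
    return (a >> 1) + (a >> 3) + (a >> 4) + (a >> 6) + (a >> 8)
```

Each shift truncates, so for inputs that are not multiples of 256 the result can fall a few units below the exact product. Grouping the shifts differently changes the operation count and the rounding behaviour.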

【００６４】[0064] Table 2 lists the basic calculations and their costs, both for the standard method and for the method of the present invention.

【００６５】[0065]

【表２】 [Table 2]

【００６６】[0066] According to the preferred embodiment of the present invention, additions, subtractions and left shifts of numbers of 16 bits or fewer are performed on 32-bit double vectors at twice the throughput of standard additions, subtractions and left shifts. A right shift of a double vector, however, requires more operations than a right shift of an ordinary number, by a factor of one to two or more.
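A left shift of a double vector can be sketched in the same way. The Python example below assumes the relation C = A * 2ⁿ + B from claim 10 and a signed reading of the low half on extraction; the helper names are illustrative, and the sketch assumes the shifted elements still fit in 16 signed bits.

```python
N = 16  # bits per number element

def make_double_vector(a, b):
    """Linear method: C = A * 2**n + B, with A and B signed."""
    return a * (1 << N) + b

def extract(z):
    """Recover (X, Y): Y is the low n bits read as signed, X = (Z - Y) / 2**n."""
    y = z & ((1 << N) - 1)
    if y >= 1 << (N - 1):
        y -= 1 << N
    return (z - y) >> N, y

c = make_double_vector(64, -9)
shifted = c << 2   # a single left shift acts on both elements
```

Here `extract(shifted)` returns (256, -36), that is, each element multiplied by four with one shift instruction.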

【００６７】[0067] Table lookup and multiplication of double vectors are not used in the preferred embodiment of the present invention. If multiplication or table lookup is to be performed on a double vector, the number components of the double vector are extracted, the operation is performed on those components, and finally the numbers are put back into a double vector for subsequent processing. (Note that a table lookup on a 32-bit double vector would require a very large table (2³² entries). Also, because multiplying two 16-bit numbers produces a result with 32 significant bits, direct multiplication of double vectors destroys the data.) As shown in Table 3, the method of the present invention halves the number of additions, subtractions and left shifts for the GCT transform of an 8x8 pixel block.

【００６８】[0068]

【表３】 [Table 3]

【００６９】[0069] In the preferred embodiment for the MC68020, the pack method is used for the forward transform process and the linear method is used for the inverse transform process. In the forward transform operation, 8-bit pixel components are converted into 11-bit coefficients, so an error in the least significant digit has a magnitude of only 1/2048. In the inverse transform operation, however, 11-bit pixel components are converted into 8-bit coefficients, so an error in the least significant digit has a magnitude of 1/256. The accuracy of the forward transform may therefore be sacrificed to gain speed.

【００７０】[0070] The errors that arise when the forward and inverse transforms are performed can be decomposed into the error that naturally results from the limited precision of integer arithmetic, the quantization error, and the above-mentioned error associated with the pack operation. In typical quantization, the quantization error is far larger than the other errors. In practice, therefore, the method and apparatus described above prove effective.
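One way an error of this kind can arise is a carry out of the lower element into the upper one. The Python sketch below illustrates this under the assumption that the pack method follows C = (A << n) || B with extraction Y = Z && (2ⁿ − 1), X = Z >> n (claims 12 and 13), and that the linear method follows C = A * 2ⁿ + B (claim 10). It is not taken from the patent's own error analysis, and the helper names are hypothetical.

```python
N = 16
MASK = (1 << N) - 1

def signed(v):
    """Read the low n bits of v as a two's-complement number."""
    v &= MASK
    return v - (1 << N) if v >> (N - 1) else v

def pack(a, b):
    """Pack method: C = (A << n) | B."""
    return ((a & MASK) << N) | (b & MASK)

def unpack(z):
    """Pack-method extraction: X = Z >> n, Y = Z & (2**n - 1)."""
    return signed(z >> N), signed(z)

def linear(a, b):
    """Linear method: C = A * 2**n + B."""
    return a * (1 << N) + b

def unlinear(z):
    """Linear-method extraction: Y is the signed low half, X = (Z - Y) / 2**n."""
    y = signed(z)
    return (z - y) >> N, y

# Add the element pairs (64, -9) and (1, -9); the exact sums are (65, -18).
packed_sum = (pack(64, -9) + pack(1, -9)) & 0xFFFFFFFF
linear_sum = linear(64, -9) + linear(1, -9)
```

In this case `unpack(packed_sum)` yields (66, -18), one unit too high in the upper element, while `unlinear(linear_sum)` yields the exact (65, -18).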

【００７１】[0071] After the double vector pixel data has been transformed by the conversion units 32 and 35, the data is converted back into standard numbers for processing by the other parts of the compression system.

【００７２】[0072] The foregoing description of the preferred embodiment is intended solely to illustrate the present invention and not to limit it; many modifications and variations are possible in light of the embodiment. The present invention is also compatible with existing standards such as JPEG (Joint Photographic Experts Group). The present invention can also be applied to signed binary numbers. The preferred embodiment was chosen and described in order to explain the principles of the invention and its application, so that those skilled in the art can use the invention and its various embodiments with modifications suited to particular uses. Various other modifications are possible. For example, although most of the above description is directed to calculations on double vectors, vectors holding more than two numbers can also be constructed.

【００７３】[0073] (The term "double vector" is used for convenience.) For example, since the DEC Alpha series has 64-bit arithmetic operations, vectors of four numbers according to the present invention can be used. Although the invention has been described with respect to transforms for data compression, it is also useful for other kinds of transforms, for example for spectral analysis. Although the invention has been described with respect to transforms, it can be used for other kinds of arithmetic operations. Examples of combinations of arithmetic and Boolean operations for the linear method and the pack method have been shown, but other equivalent combinations of operations may be substituted. Two double vector methods have been described in detail, but other related methods can also be used.
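The four-number case can be sketched as a direct generalization. The Python example below assumes four signed 16-bit elements combined by repeated use of the relation C = A * 2ⁿ + B; the function names are illustrative, and Python's unbounded integers stand in for a 64-bit register (the values used fit in 64 signed bits).

```python
N = 16  # bits per number element

def make_vector(nums):
    """Combine signed elements; the first one ends up in the highest position."""
    c = 0
    for v in nums:
        c = c * (1 << N) + v
    return c

def extract(z, count):
    """Peel off `count` signed n-bit elements, lowest position first."""
    out = []
    for _ in range(count):
        low = z & ((1 << N) - 1)
        if low >= 1 << (N - 1):
            low -= 1 << N
        out.append(low)
        z = (z - low) >> N
    return out[::-1]

u = make_vector([64, -9, 5, 7])
v = make_vector([1, 2, -3, 4])
total = u + v        # four additions in one operation
difference = u - v   # four subtractions in one operation
```

Here `extract(total, 4)` gives [65, -7, 2, 11] and `extract(difference, 4)` gives [63, -11, 8, 3].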

【００７４】[0074]

【発明の効果】[Effects of the Invention] According to the present invention, in a data processing system such as an audio or image compression system, two sets of numbers are added or subtracted in parallel by combining the numbers into a pair of "double vectors", adding or subtracting the pair of double vectors, and separating the resulting double vector to obtain values representing the results of the additions or subtractions on the original numbers. Similarly, a left shift of several numbers is accomplished by combining the numbers into one double vector, left-shifting the double vector, and extracting the numbers that represent the left-shifted values of the original numbers. For example, with a single-instruction, single-data general-purpose processor having a 32-bit arithmetic data path, two 16-bit words of data are combined into one 32-bit word, the two 16-bit words are processed in parallel in the processor, and two 16-bit words of output data are extracted. The single-instruction, single-data machine can thereby be operated as a single-instruction, multiple-data machine, and the computation speed can be increased greatly. The throughput of processing such as the GCT in an audio or image compression system can therefore be increased dramatically.

[Brief description of drawings]

【図１】FIG. 1 is a block diagram of an image compression apparatus according to a preferred embodiment of the present invention.

【図２】FIG. 2 compares the results of addition using the linear method and the pack method according to the present invention.

【図３】FIG. 3 compares the results of a left shift using the linear method and the pack method according to the present invention.

【図４】FIG. 4 shows an example of a circuit configuration for performing addition or subtraction of double vectors according to the present invention.

【図５】FIG. 5 shows an example of a right shift operation according to the present invention.

【図６】FIG. 6 is a block diagram of the elements of a standard compression apparatus.

【図７】FIG. 7 is a block diagram of the elements of a separable 2D conversion unit.

[Explanation of symbols]

8 Original color image
10 Color conversion unit
20 NxN division unit
27 Double vector generation unit
29 Double vector
30 Two-dimensional (2D) conversion unit
32 One-dimensional (1D) conversion unit
33 Number extraction unit
35 Transform coefficients
34 Substitution unit
36 One-dimensional (1D) conversion unit
40 Quantization unit
50 Lossless encoder
55 Coded transform coefficients
80 Storage register array
82 Arithmetic logic unit (ALU)
84, 86 32-bit lines
88 64-bit line

Continuation of the front page: (51) Int.Cl.⁶, Identification code, Internal reference number, FI, Technical display location: H04N 1/41 C 9070-5C B 9070-5C

Claims

[Claims]

1. An audio or image conversion processor in an audio or image compression device, comprising: first, second and third register means each having a high-order portion and a low-order portion; means for combining first and second multi-bit audio or image data numbers and placing them in the first register means, and for combining third and fourth audio or image data numbers and placing them in the second register means, wherein the first register means and the second register means have more bit locations than any of the data numbers, the first data number being directed to the low-order portion of the first register means, the second data number to the high-order portion of the first register means, the third data number to the low-order portion of the second register means, and the fourth data number to the high-order portion of the second register means; means for adding or subtracting the contents of the first register means and the second register means and storing the result in the third register means; and means for obtaining a first output number from the low-order portion of the third register means and a second output number from the high-order portion of the third register means.

2. The audio or image conversion processor according to claim 1, wherein the low-order portion and the high-order portion of each of the first, second and third register means consist of a first half and a second half of that register means, respectively.

3. A sound or image conversion processor according to claim 2, wherein the first, second and third register means have a length of 32 bits.

4. The sound or image conversion processor according to claim 3, wherein the number of data is 16 bits long.

5. A data processing device comprising: first and second register means; means for combining first and second multi-bit data numbers and placing them in the first register means, and for combining third and fourth multi-bit data numbers and placing them in the second register means, wherein the first register means has a number of bit locations greater than or equal to the sum of the first and second multi-bit data numbers, and the second register means has a number of bit locations greater than or equal to the sum of the third and fourth multi-bit data numbers; and means for adding the first and second data numbers and the third and fourth data numbers by a single addition of the contents of the first and second register means.

6. A method of processing audio or image data in an audio or image compression apparatus comprising an audio or image conversion processor and first and second multi-bit data path register means, the method comprising: filling the first register means, which has a number of bit locations equal to or greater than their sum, with first and second multi-bit audio or image data numbers; filling the second register means, which has a number of bit locations equal to or greater than their sum, with third and fourth multi-bit audio or image data numbers; and generating added audio or image data by adding the contents of the first and second register means in one addition operation.

7. A single-instruction, single-data arithmetic processor for performing a function operation f() on a first n-bit number A to obtain a first result X = f(A) and performing the function operation f() on a second m-bit number B to obtain a second result Y = f(B), the processor comprising: double vector generating means for creating an (n + m)-bit double vector C from the first number A and the second number B; an arithmetic logic unit for performing the function operation f() on the double vector C to obtain an output double vector Z = f(C); and means for extracting numbers from the output double vector Z to obtain the first result X and the second result Y, whereby the processor acts as a single-instruction, multiple-data machine.

8. The arithmetic processor according to claim 7, wherein the functional operation is a left shift operation.

9. The arithmetic processor according to claim 7, wherein n = 16 and m = 16.

10. The arithmetic processor according to claim 7, wherein the double vector C is generated from the first number A and the second number B according to the relationship C = A * 2ⁿ + B.

11. The arithmetic processor according to claim 10, wherein the first result X and the second result Y are extracted from the output double vector Z according to the relationships Y = Z − ((Z >> n) * 2ⁿ) and X = (Z − Y) / 2ⁿ.

12. The arithmetic processor according to claim 7, wherein the double vector C is generated from the first number A and the second number B according to the relationship C = (A << n) || B.

13. The arithmetic processor according to claim 12, wherein the first result X and the second result Y are extracted from the output double vector Z according to the relationships Y = Z && (2ⁿ − 1) and X = Z >> n.

14. A single-instruction, single-data arithmetic processor for performing a function operation f() on a first n-bit number A1 and a second n-bit number A2 to obtain a first result X = f(A1, A2), and performing the function operation f() on a third m-bit number B1 and a fourth m-bit number B2 to obtain a second result Y = f(B1, B2), the processor comprising: double vector generating means for generating a first (n + m)-bit double vector C1 from the first number A1 and the third number B1, and a second (n + m)-bit double vector C2 from the second number A2 and the fourth number B2; an arithmetic logic unit for performing the function operation f() on the first double vector C1 and the second double vector C2 to obtain an output double vector Z = f(C1, C2); and means for extracting numbers from the output double vector Z to obtain the first result X and the second result Y, whereby the processor acts as a single-instruction, multiple-data machine.

15. The arithmetic processor according to claim 14, wherein the functional operation f () is addition.

16. The arithmetic processor according to claim 14, wherein the functional operation f () is a subtraction.

17. The arithmetic processor according to claim 14, wherein n = 16 and m = 16.

18. The arithmetic processor according to claim 14, wherein the first double vector C1 is generated from the first number A1 and the third number B1 according to the relationship C1 = A1 * 2ⁿ + B1, and the second double vector C2 is generated from the second number A2 and the fourth number B2 according to the relationship C2 = A2 * 2ⁿ + B2.

19. The arithmetic processor according to claim 18, wherein the first result X and the second result Y are extracted from the output double vector Z according to the relationships Y = Z − ((Z >> n) * 2ⁿ) and X = (Z − Y) / 2ⁿ.

20. The arithmetic processor according to claim 14, wherein the first double vector C1 is generated from the first number A1 and the third number B1 according to the relationship C1 = (A1 << n) || B1, and the second double vector C2 is generated from the second number A2 and the fourth number B2 according to the relationship C2 = (A2 << n) || B2.

21. The arithmetic processor according to claim 20, wherein the first result X and the second result Y are extracted from the output double vector Z according to the relationships Y = Z && (2ⁿ − 1) and X = Z >> n.

22. A data processor for generating a first output data array by performing a transform calculation on a first input data array X1 and a second output data array by performing the transform calculation on a second input data array X2, the data processor comprising: means for creating an input double vector array Y from the first data array X1 and the second data array X2; means for obtaining an output double vector data array by performing the transform calculation on the input double vector array Y as a series of arithmetic operations and Boolean operations; and means for extracting the first and second output data arrays from the output double vector data array.

23. The data processor of claim 22, wherein the series of operations includes addition, subtraction and shift, but not multiplication.

24. The data processor of claim 23, wherein the transform is a generalized Chen transform.

25. The data processor according to claim 24, wherein the creating means creates the input double vector Y according to Y = X1 * 2 ⁿ + X2.

26. A data processor according to claim 24, wherein said creating means creates said input double vector Y according to Y = (X1 << n) || X2.