JPH11122116A

JPH11122116A - Device and method for compression

Info

Publication number: JPH11122116A
Application number: JP16941798A
Authority: JP
Inventors: Tomasz Thomas Prokop; Trevor Robert Elbourne; Mark Pulver
Original assignee: Canon Information Systems Research Australia Pty Ltd; Canon Inc
Current assignee: Canon Information Systems Research Australia Pty Ltd; Canon Inc
Priority date: 1997-04-30
Filing date: 1998-04-30
Publication date: 1999-04-30
Also published as: JP4101253B2; JP2005348410A

Abstract

PROBLEM TO BE SOLVED: To shorten the computation time and improve the performance of a DCT (discrete cosine transform) or inverse DCT. This is done by providing a DCT apparatus with transposed matrix storage means and an arithmetic circuit consisting of a combinational circuit that performs the DCT operation without using any clocked storage means. SOLUTION: The transpose memory 1118 of the DCT unit converts column-ordered data into row-ordered data so that the second pass of the two-dimensional discrete cosine transform can be performed. Data from an input circuit 1126 and from the transpose memory 1118 are multiplexed by a multiplexer 1124 and sent to a mathematical circuit 1122. After the second pass ends, the result of the mathematical circuit 1122 is sent to an output circuit 1120. A control circuit 1116 controls the flow of data in the DCT apparatus. The mathematical circuit 1122 is a combinational circuit with no storage locations for intermediate results.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a decoder for decoding variable-length codes into which zero or more uncoded variable-length bit fields have been inserted, and to a decoder that leaves some of those codes unchanged. The present invention also relates to a discrete cosine transform (DCT) apparatus that uses a data path without pipelining or storage means and can operate at high speed.

[0002]

2. Description of the Related Art

In general, large amounts of data are compressed and decompressed for a variety of reasons, including transmission, storage and retrieval, using several stages of processing that include variable-length coding such as Huffman coding. Huffman coding was first disclosed by D. A. Huffman in "A Method for the Construction of Minimum-Redundancy Codes", Proc. IRE, 40:1098, 1952. In many cases the variable-length codes in a coded bit stream are not contiguous: other, uncoded bit fields are inserted between them. These bit fields represent control and/or format information and/or supply additions to the coded data, such as marker headers, marker codes, stuff bytes, padding bits and appended additional bits, as in JPEG-coded data.

[0003] In variable-length coding, codes of different lengths are assigned to different input data according to how likely each input is to occur, so that statistically frequent inputs receive shorter codes than infrequent ones, and infrequent inputs receive longer codes. Codes are assigned either statically or adaptively. With static assignment, a given input always receives the same output code regardless of which data block is being processed. With adaptive assignment, output codes are assigned on the basis of a statistical analysis of a particular input block or set of blocks and of the changes expected between blocks (or between sets of blocks).
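As an illustration of static code assignment, the sketch below builds a Huffman code (the construction from Huffman's 1952 paper cited above) and reports only the code length assigned to each symbol. The function name, symbols and frequencies are invented for the example.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a static Huffman code."""
    if len(freqs) == 1:
        # A lone symbol still needs a 1-bit codeword.
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees...
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # ...which pushes every leaf in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


lengths = huffman_code_lengths({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(lengths)
```

Here the most frequent symbol, a, gets a 1-bit code, and the two rarest, e and f, get 4-bit codes. A static coder would reuse these lengths for every block. An adaptive coder would recompute them from the statistics of each block or set of blocks.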

[0004] Serious problems arise when high-speed variable-length decoding is required, particularly when the coded data stream contains uncoded variable-length bit fields interleaved with the coded data, as in the JPEG standard. A major difficulty in decoding such data at high speed arises when the length of a particular uncoded bit field cannot be determined until the (coded) data preceding it has been completely decoded, as is the case in the JPEG standard. Because the start position of the next coded datum is not known until the preceding code has been fully decoded, pipelining generally cannot be applied directly to the decoder.
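The serial dependency described above can be seen in a small bit-level decoder. In this sketch the code table is hypothetical and only loosely modelled on JPEG, where each Huffman-coded symbol states how many uncoded magnitude bits follow it. The decoder cannot know where the next codeword starts until it has matched the current codeword and skipped the uncoded field whose length that codeword specifies.

```python
# Hypothetical prefix code: each codeword maps to the number of raw
# (uncoded) bits that follow it, loosely modelled on JPEG's "size" category.
CODE_TABLE = {"00": 0, "01": 1, "100": 2, "101": 3, "110": 4}


def decode(bits):
    """Sequentially decode (size, raw_bits) pairs from a '0'/'1' string."""
    out, pos = [], 0
    while pos < len(bits):
        # 1. Grow the candidate until it matches a codeword; its length
        #    is not known in advance.
        end = pos + 1
        while bits[pos:end] not in CODE_TABLE:
            if end >= len(bits):
                raise ValueError(f"invalid or truncated code at bit {pos}")
            end += 1
        size = CODE_TABLE[bits[pos:end]]
        # 2. Only now is the length of the uncoded field known.
        raw = bits[end:end + size]
        if len(raw) < size:
            raise ValueError("truncated uncoded field")
        out.append((size, raw))
        # 3. The next codeword starts after both fields, so the next
        #    iteration cannot begin before this one finishes.
        pos = end + size
    return out


print(decode("0111001000"))
```

Each iteration's starting position is computed from the previous iteration's result, which is exactly why a naive pipeline or a bank of parallel decoders cannot start on the next code early.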

[0005] Existing solutions are too slow for many applications. They either require several steps (clock cycles) to decode one input datum, or use iterative units to decode more than one symbol pseudo-simultaneously in one stage (clock cycle). However, adding further decoding blocks often makes such decoders uneconomical and still does not deliver the required speed, because the decoders do not operate fully in parallel: each decoder still depends on the result of the preceding decoder, which determines where the next input datum begins. Consequently, even when several symbols are decoded in one stage (clock cycle), that stage (clock period) is relatively long, and the decoder as a whole is too slow for many applications.

[0006] There is therefore a clear need for a decoder for variable-length codes interleaved with uncoded variable-length bit fields that overcomes one or more of the problems of conventional decoders. In addition, the discrete cosine transform (DCT) apparatus shown in FIG. 77 performs a complete two-dimensional (2-D) transform of an 8×8 pixel block by first applying a one-dimensional DCT (1-D DCT) to the rows of the block and then applying another 1-D DCT to its columns. Such an apparatus typically comprises an input circuit 1096, an arithmetic circuit 1104, a control circuit 1098, a transposed matrix memory circuit 1090 and an output circuit 1092.

[0007] The input circuit 1096 accepts 8-bit pixels from the 8×8 block. The input circuit 1096 is connected to the arithmetic circuit 1104 via intermediate multiplexers 1100 and 1102. The arithmetic circuit 1104 performs arithmetic operations on an entire row or column of the 8×8 block. The control circuit 1098 controls all of the other circuits and implements the DCT algorithm. The output of the arithmetic circuit is connected to the transposed matrix memory 1090, a register 1095 and the output circuit 1092. The transposed matrix memory is connected to the multiplexer 1100, which supplies its output to the next multiplexer 1102. The multiplexer 1102 also receives an input from a register 1094. The transposed matrix memory 1090 accepts the 8×8 block of data by rows and produces it by columns. The output circuit 1092 supplies the DCT coefficients of the 8×8 pixel data block.
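The row-column data flow of FIG. 77 can be sketched in software. This is a minimal sketch that assumes an orthonormal floating-point DCT-II computed directly from its definition; it is not the fixed-point, staged arithmetic of circuit 1104, and the function names are hypothetical. A plain list transpose stands in for the transposed matrix memory 1090, which is written by rows and read by columns.

```python
import math

N = 8  # block size used throughout the description


def dct_1d(v):
    """Orthonormal 8-point DCT-II of one row or column."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        out.append(scale * s)
    return out


def transpose(block):
    """Stands in for the transposed matrix memory: write rows, read columns."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """Row pass, transpose, column pass; block[y][x] -> coeffs[u][v]."""
    rows_done = [dct_1d(row) for row in block]   # pass 1: every row
    cols = transpose(rows_done)                  # via transposed matrix memory
    cols_done = [dct_1d(c) for c in cols]        # pass 2: every column
    # Transpose back so u is the vertical and v the horizontal frequency.
    return transpose(cols_done)


coeffs = dct_2d([[100] * N for _ in range(N)])
print(round(coeffs[0][0], 6))
```

For a constant block all of the energy lands in the DC coefficient: a block of 100s gives coeffs[0][0] of 800 (to floating-point precision), and every other coefficient is essentially zero.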

[0008] In a typical DCT apparatus the arithmetic circuit is the most complex part, so its speed essentially determines the overall speed. The arithmetic circuit 1104 of FIG. 77 divides the computation into several stages, described below with reference to FIG. 78. These stages 1144, 1148, 1152 and 1156 can be implemented as a single circuit built from a shared pool of components such as adders and multipliers. However, because one shared circuit implements several stages, such a circuit 1104 is slower than an optimized one, and it also requires storage means for holding intermediate results. The clock cycle allotted to such a circuit must be at least as long as its slowest stage, so the total time can exceed the sum of the delays of all the stages.

[0009] FIG. 78 shows a typical arithmetic data path forming part of a four-stage DCT in the apparatus of FIG. 77. The figure reflects function rather than the actual implementation. The four stages 1144, 1148, 1152 and 1156 are all implemented by one reconfigurable circuit, which is reconfigured on each cycle to act as each of the four arithmetic stages of the 1-D DCT in turn. In this circuit, the four stages 1144, 1148, 1152 and 1156 share a common pool of components (adders and multipliers), which minimizes the hardware.

[0010] The drawback of this circuit, however, is that it is slower than an optimized one. Each of the four stages 1144, 1148, 1152 and 1156 is built from the same pool of adders and multipliers. The clock period is set by the slowest stage, which in this example is block 1144 at 20 ns. Adding the delays of the input and output multiplexers 1146 and 1154 (2 ns each) and of the flip-flop 1150 (3 ns) gives a total of 27 ns. This DCT element can therefore operate with a 27 ns clock period.
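The clock-period budget in [0010] can be written out explicitly; the symbols below are introduced only for illustration.

```latex
T_{\text{clk}} = \max_i t_{\text{stage},i} + t_{\text{mux,in}} + t_{\text{mux,out}} + t_{\text{FF}}
             = 20 + 2 + 2 + 3 = 27\ \text{ns}
\qquad\Rightarrow\qquad
f_{\text{clk}} = \frac{1}{27\ \text{ns}} \approx 37\ \text{MHz}
```

Every cycle pays the full 7 ns of multiplexer and flip-flop overhead on top of the slowest stage, and any stage faster than 20 ns leaves part of its cycle unused. This is the inefficiency that [0008] attributes to the shared, clocked design.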

[0011] Pipelined DCT elements are also well known. Their drawback is that they require a large amount of hardware. The present invention does not match the performance (processing speed) of a pipelined element, but it offers a very good compromise between performance and size, and it is faster than most current DCT elements. There is therefore a clear need for an improved DCT/inverse DCT method and apparatus that solves one or more of the problems of the prior art. In particular, there is a need for a method and apparatus that shorten the time taken by the central arithmetic circuit that computes the required results in a DCT/inverse DCT apparatus, thereby improving the overall performance of the DCT or inverse DCT.

[0012]

SUMMARY OF THE INVENTION

A first aspect of the present invention is a decoding apparatus for a plurality of data blocks containing data encoded as a plurality of variable-length codes interleaved with uncoded variable-length bit fields, together with uncoded fixed-length fields. The apparatus comprises: a preprocessing unit that removes the fixed-length uncoded fields and outputs both the variable-length codes interleaved with uncoded variable-length bit fields and a plurality of position signals indicating where the fixed-length uncoded fields were located in the data blocks; and transfer means that passes the position signals to the output of the decoding apparatus in synchronization with the data being decoded, so that the decoded data for the variable-length coded data that was input between fixed-length uncoded fields is output from the decoding apparatus between the corresponding position signals.
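A software sketch of the preprocessing unit follows, under explicit assumptions that the source does not state. The input is taken to be a JPEG entropy-coded segment, the fixed-length uncoded fields are taken to be the 2-byte restart markers 0xFFD0 to 0xFFD7, and the 0x00 stuff byte that JPEG inserts after a data byte of 0xFF is dropped in the same pass. The position signals are returned as byte offsets in the cleaned output; the function name and return format are hypothetical.

```python
def preprocess(stream: bytes):
    """Hypothetical JPEG-style preprocessor.

    Removes stuffed 0x00 bytes after 0xFF and 2-byte restart markers
    (0xFFD0 to 0xFFD7); all other bytes pass through unchanged. Returns the
    cleaned entropy-coded bytes and the byte offsets (in the cleaned
    output) at which markers were removed, i.e. the position signals.
    """
    data = bytearray()
    positions = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b == 0xFF and i + 1 < len(stream):
            nxt = stream[i + 1]
            if nxt == 0x00:           # stuffed byte: keep the 0xFF only
                data.append(0xFF)
                i += 2
                continue
            if 0xD0 <= nxt <= 0xD7:   # fixed-length RST marker: drop it
                positions.append(len(data))
                i += 2
                continue
        data.append(b)
        i += 1
    return bytes(data), positions


print(preprocess(bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0, 0x56])))
```

For the input bytes 12 FF 00 34 FF D0 56, the sketch returns 12 FF 34 56 with a single position signal at offset 3, which marks where the restart marker was removed. The decoder then sees a contiguous coded stream, and the position list travels alongside it.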

[0013] Preferably, the decoding apparatus further comprises a first processing unit, having a first barrel shifter set and a first register, that processes the variable-length codes interleaved with uncoded variable-length bit fields, and a second processing unit, having a second barrel shifter set and a second register, that processes the position signals indicating the positions of the fixed-length uncoded fields in the data blocks. The first and second processing units are identical, and the outputs of the first and second barrel shifter sets and the first and second processing units receive the same control signals.

[0014] In another preferred arrangement, the output of the second processing unit, which processes the position signals indicating the positions of the fixed-length uncoded fields, is used to determine the size of an uncoded variable-length field that is removed during decoding from the data held in the data register. In yet another preferred arrangement, the preprocessing unit removes the fixed-length uncoded fields and outputs the variable-length codes interleaved with uncoded variable-length bit fields as a plurality of fixed-length codes, each consisting of a plurality of fixed-length bit fields. One of these fixed-length bit fields carries a tag indicating whether the preprocessing unit passed the field through or removed it, and the tag is placed before or after the marker indicating a fixed-length uncoded field.
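[0013] and [0014] can be pictured as two identical shifters driven by one shift amount: one holds the coded bits, the other holds one position bit per data bit. The sketch below is a hypothetical software model; the class and method names are illustrative only. It assumes the position bit is set on the first data bit after a removed marker. Under that assumption, once the decoder knows it has finished a segment, the distance to the next set flag is the size of the uncoded padding field to discard (in JPEG, the 1-bits that pad a segment to a byte boundary before a restart marker).

```python
class LockstepShifter:
    """Hypothetical model of two identical shifter/register pairs.

    `data` holds coded bits; `flags` holds one position bit per data bit,
    set to "1" on the first bit that followed a removed marker. Both are
    always shifted by the same amount (the shared control signal).
    """

    def __init__(self, data: str, flags: str):
        if len(data) != len(flags):
            raise ValueError("data and flag paths must be the same width")
        self.data, self.flags = data, flags

    def shift(self, n: int) -> str:
        """Consume n bits from both paths and return the data bits taken."""
        taken = self.data[:n]
        self.data, self.flags = self.data[n:], self.flags[n:]
        return taken

    def padding_before_marker(self) -> int:
        """Bits up to the next marker position (or to the end if none)."""
        idx = self.flags.find("1")
        return idx if idx >= 0 else len(self.flags)


# 3 coded bits, 3 padding bits, then a new segment after a removed marker.
s = LockstepShifter("101" + "111" + "0110", "000" + "000" + "1000")
s.shift(3)                          # decoder consumes the last codeword
s.shift(s.padding_before_marker())  # discard the padding field
print(s.data, s.flags)
```

Because both paths always shift by the same amount, each flag stays aligned with the data bit it labels. That alignment is what lets the decoder emit the position signal in step with the decoded data.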

[0015] More preferably, the data blocks are Huffman coded. A second aspect of the present invention is a method of decoding a plurality of data blocks containing data encoded as a plurality of variable-length codes interleaved with uncoded variable-length bit fields, together with uncoded fixed-length fields. The method comprises: a preprocessing step of removing the fixed-length uncoded fields and outputting both the variable-length codes interleaved with uncoded variable-length bit fields and a plurality of position signals indicating the positions of the fixed-length uncoded fields in the data blocks; and a transfer step of passing the position signals to the output of the decoding apparatus in synchronization with the data being decoded, so that the decoded data for the variable-length coded data input between fixed-length uncoded fields is output from the decoding apparatus between the corresponding position signals.

[0016] Preferably, the method further comprises a step of processing the variable-length codes interleaved with uncoded variable-length bit fields using a first processing unit having a first barrel shifter set and a first register, and a step of processing the position signals indicating the positions of the fixed-length uncoded fields in the data blocks using a second processing unit having a second barrel shifter set and a second register. The first and second processing units are identical, and the outputs of the first and second barrel shifter sets and the first and second processing units receive the same control signals.

[0017] The method may further comprise a step of determining, according to the output of the second processing unit that processes the position signals indicating the positions of the fixed-length uncoded fields, the size of an uncoded variable-length field to be removed during decoding from the data held in the data register. Preferably, the preprocessing step removes the fixed-length uncoded fields and outputs the variable-length codes interleaved with uncoded variable-length bit fields as a plurality of fixed-length codes, each consisting of a plurality of fixed-length bit fields. One of these fixed-length bit fields carries a tag indicating whether the preprocessing step passed the field through or removed it, and the tag is placed before or after the marker indicating a fixed-length uncoded field.

[0018] Preferably, the data blocks are Huffman coded. In the detailed description below, particular attention should be paid to FIGS. 82 to 91 and the related description, in addition to the rest of the description. A third aspect of the present invention is a discrete cosine transform (DCT) apparatus comprising transposed matrix storage means and an arithmetic circuit, interconnected with the transposed matrix storage means, that consists of a combinational circuit for performing the DCT operation without using clocked storage means.

[0019] Preferably, the combinational circuit has a predetermined number of stages for performing the DCT operation, and the stages are arranged sequentially. The DCT apparatus preferably also has multiplexing means for multiplexing the data input to the DCT apparatus with the output of the transposed matrix storage means, and control means for controlling the operation of the DCT apparatus.

[0020] A fourth aspect of the present invention is an inverse discrete cosine transform (inverse DCT) apparatus comprising transposed matrix storage means and an arithmetic circuit, interconnected with the transposed matrix storage means, whose main component is a combinational circuit for performing the inverse DCT operation without using clocked storage means.

[0021] A fifth aspect of the present invention is a method of performing a discrete cosine transform (DCT) on data, comprising the steps of: performing a DCT on the input data along a first direction using an arithmetic circuit whose main component is a combinational circuit that performs the DCT operation without clocked storage means; storing the input data, transformed along the first direction, in transposed matrix storage means interconnected with the combinational circuit; and performing, using the arithmetic circuit, a DCT on the data stored in the transposed matrix storage means along a second direction to obtain transformed data.

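[0022] Preferably, in the DCT method the DCT is computed in a predetermined number of sequentially arranged stages, and the method may further comprise a step of multiplexing the input data with the output of the transposed matrix storage means. A sixth aspect of the present invention is an inverse discrete cosine transform (inverse DCT) method comprising the steps of: performing an inverse DCT on input coefficients along a first direction using an arithmetic circuit whose main component is a combinational circuit that performs the inverse DCT operation without clocked storage means; storing the coefficients, transformed along the first direction, in transposed matrix storage means interconnected with the combinational circuit; and performing, using the arithmetic circuit, an inverse DCT on the coefficients stored in the transposed matrix storage means along a second direction to obtain inverse-transformed data.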
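The fifth and sixth aspects share one data flow: a first-direction pass through the arithmetic circuit, storage in the transposed matrix storage means, and a second-direction pass. The sketch below is a minimal floating-point illustration that assumes the orthonormal DCT-II and DCT-III definitions; the function names are hypothetical. The 1-D transform is passed in as a parameter, so the same function serves both passes, much as one arithmetic circuit serves both directions. A forward-then-inverse round trip recovers the input block.

```python
import math

N = 8


def _c(k):
    """Orthonormal scale factor for frequency k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_1d(v):
    """Orthonormal 8-point DCT-II."""
    return [_c(k) * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n in range(N)) for k in range(N)]


def idct_1d(X):
    """Orthonormal 8-point inverse DCT (DCT-III)."""
    return [sum(_c(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]


def two_pass(block, transform_1d):
    """First-direction pass, transpose storage, second-direction pass."""
    first = [transform_1d(row) for row in block]         # first direction
    transposed = [list(c) for c in zip(*first)]          # write rows, read columns
    second = [transform_1d(col) for col in transposed]   # second direction
    return [list(c) for c in zip(*second)]               # restore row order


block = [[(x * y + 3 * x) % 256 for x in range(N)] for y in range(N)]
restored = two_pass(two_pass(block, dct_1d), idct_1d)
print(max(abs(restored[y][x] - block[y][x]) for y in range(N) for x in range(N)))
```

The printed maximum error should be at the level of floating-point rounding. A hardware version would use fixed-point arithmetic and would accept a small, bounded reconstruction error instead.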
Is the inverse discrete cosine transform (DCT) method, and the inverse D
A step of performing a DCT operation by adjusting an input coefficient in a first direction using an operation circuit mainly including a combination circuit that performs a CT operation without clocked storage means; and adjusting an input coefficient subjected to DCT in a first direction. Storing the coefficients in the transposed matrix storage means interconnected with the combinational circuit, and performing an inverse DCT operation on the coefficients stored in the transposed matrix storage means in accordance with the second direction using an arithmetic circuit to obtain inverse transformed data. And a step of obtaining.

[0023] In the detailed description below, particular attention should be paid to FIGS. 79, 80 and 81 and the related description, in addition to the rest of the description.

[0024]

BEST MODE FOR CARRYING OUT THE INVENTION

Table of Contents
1.0 Brief Description of the Drawings
2.0 List of Tables
3.0 Preferred and Other Embodiments
3.1 Overview of the Multiple-Stream Architecture
3.2 Host/Coprocessor Queuing
3.3 Coprocessor Register Description
3.4 Format of the Multiple Streams
3.5 Determining the Currently Active Stream
3.6 Fetching Instructions from the Currently Active Stream
3.7 Decoding and Executing Instructions
3.8 Updating the Instruction Controller Registers
3.9 Register Access Semaphore Semantics
3.10 Instruction Controller
3.11 Description of the Local Register File Module
3.12 Register Read/Write Handling
3.13 Memory Area Read/Write Handling
3.14 C Bus Structure
3.15 Coprocessor Data Types and Data Manipulation
3.16 Data Normalization
3.17 Image Processing Operations of the Accelerator Card
3.17.1 Compositing
3.17.2 Color Space Conversion Instructions
a. Single Output General Color Space (SOGCS) Conversion Mode
b. Multiple Output Color Space Mode
3.17.3 JPEG Encoding/Decoding
a. Encoding
b. Decoding
3.17.4 Table Indexing
3.17.5 Data Coding Instructions
3.17.6 A Fast DCT Apparatus
3.17.7 Huffman Decoding
3.17.8 Image Transformation Instructions
3.17.9 Convolution Instructions
3.17.10 Matrix Multiplication
3.17.11 Halftoning
3.17.12 Hierarchical Image Format Decompression
3.17.13 Memory Copy Instructions
a. General-Purpose Data Movement Instructions
b. Local DMA Instructions
3.17.14 Flow Control Instructions
3.18 Modules of the Accelerator Card
3.18.1 Pixel Organizer
3.18.2 MUV Buffer
3.18.3 Result Organizer
3.18.4 Operand Organizers B and C
3.18.5 Main Data Path Unit
3.18.6 Data Cache Controller and Cache
a. Normal Cache Mode
b. Single Output General Color Space Conversion Mode
c. Multiple Output General Color Space Conversion Mode
d. JPEG Encoding Mode
e. Slow JPEG Decoding Mode
f. Matrix Multiplication Mode
g. Disabled Mode
h. Invalidate Mode
3.18.7 Input Interface Switch
3.18.8 Local Memory Controller
3.18.9 Other Modes
3.18.10 External Interface Controller
3.18.11 Peripheral Interface Controller

Table Index
Table 1: Register Description
Table 2: Opcode Description
Table 3: Operand Types
Table 4: Operand Description
Table 5: Module Setup Order
Table 6: C Bus Signal Definitions
Table 7: C Bus Transaction Types
Table 8: Data Manipulation Register Format
Table 9: Desired Data Types
Table 10: Symbol Descriptions
Table 11: Compositing Operations
Table 12: Address Composition for SOGCS Mode
Table 12A: Instruction Encoding for Color Space Conversion
Table 13: Minor Opcode Encoding for Color Conversion Instructions
Table 14: Huffman and Quantization Tables Stored in the Data Cache
Table 15: Fetch Addresses
Table 16: Tables for Huffman Encoding
Table 17: Bank Addresses for Huffman and Quantization Tables
Table 18: Instruction Word (Minor Opcode Field)
Table 19: Instruction Word (Minor Opcode Field)
Table 20: Instruction Operand/Result Word
Table 21: Instruction Word
Table 22: Instruction Operand/Result Word
Table 23: Instruction Word
Table 24: Instruction Operand/Result Word
Table 25: Instruction Word (Minor Opcode Field)
Table 26: Instruction Word (Minor Opcode Field)
Table 27: Fraction Table

3.0 Description of the Preferred and Other Embodiments

In the preferred embodiment, significant advantages are obtained by performing hardware rasterization with a hardware accelerator that uses two independent instruction streams. While the first instruction stream is preparing the current page for printing, the next instruction stream can be preparing the next page. Hardware resources are used particularly efficiently when the hardware accelerator can operate faster than the output device.
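The benefit of a second instruction stream can be illustrated with a simple timing model. This is a minimal sketch under stated assumptions, not the coprocessor's actual scheduling. The per-page preparation and print times, the round-robin use of streams, and the rule that a stream stays busy until its page has printed are all hypothetical, as is the function name.

```python
def pages_done_time(prep, prt, n_pages, n_streams):
    """Time at which the last page finishes printing.

    Hypothetical model: the coprocessor prepares one page at a time
    (taking `prep`), the printer prints one page at a time (taking `prt`),
    and each instruction stream holds its page until that page is printed.
    """
    coproc_free = 0.0
    printer_free = 0.0
    stream_free = [0.0] * n_streams
    for page in range(n_pages):
        s = page % n_streams                      # streams used round-robin
        start = max(coproc_free, stream_free[s])  # need the coprocessor and a free stream
        ready = start + prep
        coproc_free = ready
        print_start = max(ready, printer_free)    # printer handles pages in order
        printer_free = print_start + prt
        stream_free[s] = printer_free             # stream released once printed
    return printer_free


print(pages_done_time(2, 5, 3, 1), pages_done_time(2, 5, 3, 2))
```

With a preparation time of 2 and a print time of 5 units for three pages, one stream finishes at 21 and two streams finish at 17. With two streams the coprocessor prepares each next page while the current one prints, so the printer is never starved. In this model, when the accelerator is faster than the printer, the total approaches one preparation time plus the printer's own time for all pages.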

[0025] The preferred embodiment uses two instruction streams. Configurations with more than two instruction streams are also possible, and using more streams can be advantageous even when hardware trade-offs are taken into account. With two streams, the hardware resources of the raster image coprocessor can always be occupied preparing the next page, band, strip or the like while the current page, band or strip (depending on the output device) is being transferred to the printing device.

3.1 General Configuration of the Multiple-Stream Architecture

FIG. 1 schematically illustrates a computer hardware configuration 201 incorporating the preferred embodiment. The configuration 201 includes a standard host computer system comprising a host CPU 202 connected to host storage memory 203 via a bridge 204. The host computer system provides the usual computer system functions, such as the operating system, applications and information display, and is connected to a standard PCI bus 206 via a PCI bus interface 207. PCI is a well-known industry standard, and most commercially available computer systems, particularly those running the Microsoft Windows (TM) operating system, have a PCI bus 206. The PCI bus 206 makes it easy to add to the configuration 201 one or more PCI cards (e.g. 209), each of which may further include a PCI bus interface 210, other devices 211, local memory 212 and the like.

[0026] In the preferred embodiment, a raster image accelerator card 220 is provided to speed up graphics operations expressed in a page description language. The raster image accelerator card 220 (which has a PCI bus interface 221) is designed to operate, like other PCI cards such as 209, in a loosely coupled shared-memory arrangement with the host CPU 202. Additional image accelerator cards 220 can be added to the host computer system if required. The raster image accelerator card speeds up the complex, high-volume operations involved in raster image processing, including:
(a) compositing
(b) generalized color space conversion
(c) JPEG encoding/decoding
(d) Huffman, run-length and predictive encoding/decoding
(e) Hierarchical Image™ decoding
(f) generalized affine image transformation
(g) small-kernel convolution
(h) matrix operations
(i) halftoning
(j) bulk arithmetic and memory copy operations
The raster image accelerator card 220 further includes a local memory 223 connected to a raster image coprocessor 224, which operates the raster image accelerator card 220 according to instructions from the host CPU 202. The coprocessor 224 is preferably an application-specific integrated circuit (ASIC). The raster image coprocessor 224 can also control at least one printer device 226, as required, via a peripheral interface 225. The image accelerator card 220 can also control input/output devices such as scanners. In addition, the accelerator card 220 has a general external interface 227 connected to the raster image coprocessor 224, which can be used for monitoring and testing.

[0027] In execution mode, the host CPU 202 sends a series of instructions and data over the PCI bus 206, and the raster image coprocessor 224 carries out the image generation. The transmitted data is stored in the local memory 223, and also in the cache 230 in the raster image coprocessor 224 or in the registers 229 in the coprocessor 224.

[0028] FIG. 2 shows the raster image coprocessor 224 in more detail. The coprocessor 224, which accelerates the processing described above, consists of a number of units operating under the control of an instruction controller 235. To communicate with the outside world, the coprocessor has a local memory controller 236 for communicating with the local memory 223 of FIG. 1. A peripheral interface controller 237 is used to communicate with printer devices, using standard formats such as the Centronics interface standard or other video interface formats. The peripheral interface controller 237 is internally connected to the local memory controller 236. The local memory controller 236 and an external interface controller 238 are connected through an input interface switch 252, which is in turn connected to the instruction controller 235. The input interface switch 252 is also connected to a pixel organizer 246 and to a data cache controller 240. The input interface switch 252 routes data from the external interface controller 238 and the local memory controller 236 to the instruction controller 235, the data cache controller 240 or the pixel organizer 246.

[0029] The external interface controller 238 is provided in the raster image coprocessor 224 to communicate with the PCI bus 206 of FIG. 1, and is connected to the instruction controller 235. A miscellaneous module 239, which is connected to the instruction controller and works together with the coprocessor 224, is also provided for test diagnostics and for supplying clock and global signals.

[0030] A data cache 230 operates under the control of the data cache controller 240 to which it is connected. The data cache 230 serves various purposes, but is mainly used to hold recently used values that the coprocessor 224 is likely to use again. The acceleration described above is achieved mainly by processing multiple data streams in a JPEG encoder/decoder 241 and a main data path unit 242. Units 241 and 242 are connected in parallel to the pixel organizer 246 and to two operand organizers 247 and 248. The streams processed by units 241 and 242 are passed to a result organizer 249, where they are further processed and reformatted if necessary. Because it is often desirable to keep intermediate results, a multi-used value (MUV) buffer 250 is provided between the pixel organizer 246 and the result organizer 249, in addition to the data cache 230. Results from the result organizer 249 are output, as required, to the external interface controller 238, the local memory controller 236 and the peripheral interface controller 237.

[0031] As shown by the dotted lines in FIG. 2, a further (third) data path unit 243 can also be connected "in parallel" with the other two data paths, namely the JPEG encoder/decoder 241 and the main data path unit 242. Four or more data paths are equally possible. Note that although the paths are connected "in parallel", they do not operate in parallel: only one path is active at any one time.

[0032] The overall design of the ASIC of FIG. 2 rests on the following considerations. First, a printed page must be free of even small or transient image-quality degradation. In a video signal, such minor degradation goes unnoticed by the human eye, but on a printed page it is permanent and may become conspicuous. Furthermore, if data reaches the printer late, white unprinted regions can appear on the page as it moves through the printer, which is unsightly. It is therefore essential to deliver high-quality results at high speed, and an approach that relies on fast hardware is preferable to a software approach.

[0033] Second, if all the operation steps (algorithms) needed to carry out printing are listed and dedicated hardware is provided for each step, the total amount of hardware becomes enormous and very expensive. Moreover, hardware speed is fundamentally limited by the rate at which the data needed for processing can be fetched, or the data produced by processing can be transferred; that is, operating speed is constrained by interface bandwidth.

[0034] In contrast, the overall ASIC design is based on the surprising observation that, when the total hardware requirement is mapped out, the various required hardware parts (a) overlap and (b) are never used at the same time. This is especially evident in the overhead of transferring data before it is processed.

[0035] In view of this, the amount of hardware was reduced in several steps while keeping every part of the hardware as busy as possible. The first step was to recognize that image manipulation often requires repeating the same basic type of operation. Accordingly, when data arrives as a stream, the processing unit is configured to perform a particular operation, processes a long data stream, and is then reconfigured for the next type of processing required. When the data stream is sufficiently long, the reconfiguration time is negligible compared with the overall processing time, so throughput improves.
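As a rough illustration of why long streams amortize reconfiguration, the following Python sketch computes the fraction of time a processing unit spends on useful work when a fixed reconfiguration cost is paid once per stream. The cost model (constant time per item, one reconfiguration per stream) and the function name are assumptions for illustration, not figures from this document.

```python
def pipeline_utilization(stream_length: int, cycles_per_item: float,
                         reconfig_cycles: float) -> float:
    """Fraction of total time spent on useful work for one stream.

    Assumed model: one fixed reconfiguration cost per stream, then a
    constant per-item processing cost.
    """
    if stream_length <= 0:
        raise ValueError("stream_length must be positive")
    busy = stream_length * cycles_per_item
    return busy / (busy + reconfig_cycles)


if __name__ == "__main__":
    # Hypothetical costs: 1 cycle per item, 1000 cycles per reconfiguration.
    for n in (100, 10_000, 1_000_000):
        print(n, round(pipeline_utilization(n, 1.0, 1000.0), 4))
```

With these hypothetical costs, a 100-item stream keeps the unit busy less than 10% of the time, while a million-item stream reaches about 99.9%.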

[0036] Providing multiple data processing paths also avoids wasting reconfiguration time, because one path can be reconfigured while another is in use. That is, while the main data path unit 242 performs more general-purpose processing, another data path can perform more specialized processing, such as JPEG encoding/decoding in unit 241 or, if the additional unit 243 is present, entropy coding or Huffman coding.
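The benefit of reconfiguring an idle path while another path runs can be seen with a small scheduling model. In this Python sketch, which is a simplification and not the coprocessor's control logic, jobs execute strictly one at a time (matching the note that only one path is active at once), but a path may be reconfigured while a different path is executing. The `Job` fields, the path labels and the `makespan` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    path: str        # which data path runs the job (hypothetical labels)
    reconfig: float  # time to reconfigure that path for this job
    execute: float   # time to process the job's data stream


def makespan(jobs: list[Job], overlap: bool = True) -> float:
    """Total time to run `jobs` in order when only one path executes at a time.

    overlap=True: a path can be reconfigured while a different path executes.
    overlap=False: every reconfiguration stalls all processing.
    """
    exec_done = 0.0                   # time the previous job finished executing
    path_idle: dict[str, float] = {}  # time each path last finished executing
    for job in jobs:
        if overlap:
            # Reconfiguration starts as soon as this job's own path is idle.
            ready = path_idle.get(job.path, 0.0) + job.reconfig
        else:
            # Reconfiguration can only start once all processing has stopped.
            ready = exec_done + job.reconfig
        start = max(exec_done, ready)  # only one path may execute at a time
        exec_done = start + job.execute
        path_idle[job.path] = exec_done
    return exec_done


jobs = [Job("main", 2, 10), Job("jpeg", 2, 10), Job("main", 2, 10)]
print(makespan(jobs, overlap=False), makespan(jobs))
```

For these three jobs alternating between two paths, the serial schedule takes 36 time units and the overlapped one 32, because two of the three reconfigurations are hidden behind the other path's execution. Consecutive jobs on the same path gain nothing, since that path cannot be reconfigured while it is busy.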

[0037] Furthermore, data can be fetched to, and transferred from, the processing units while processing is in progress. Standardizing and unifying the various data types further increases speed and makes effective use of hardware resources. As a result, the overall overhead of fetching and transferring data is reduced.

[0038] Importantly, the coprocessor 224 operates under the control of the host CPU 202 (FIG. 1). Within the coprocessor, the instruction controller 235 oversees control of the entire coprocessor 224. The instruction controller 235 operates the coprocessor 224 via a control bus 231 called the CBus. The CBus 231 is connected to each of the modules 236-250, including a set of registers in each module (231 in FIG. 1), and thereby enables the operation of the coprocessor 224 as a whole. To keep FIG. 2 legible, the connections from the control bus 231 to the individual modules 236-250 are not shown.

[0039] FIG. 3 schematically shows a layout 260 of the available module registers. The layout 260 includes registers 261 for overall control of the coprocessor 224 and of the instruction controller 235. The coprocessor modules 236-260 include similar registers 262.

3.2 Host/Coprocessor Queuing

The architecture described above clearly requires close coordination between the host processor 202 and the image coprocessor 224. However, the solution to this problem is general rather than specific to this architecture, so it is described below for a more general computing hardware environment.

[0040] Modern computer systems need some form of memory management to support dynamic memory allocation. In a system with one or more coprocessors, a mechanism is needed to synchronize dynamic memory allocation with memory use by the coprocessors. A typical hardware configuration has a CPU and a specialized coprocessor that share a common pool of memory. In such a system, the CPU is the only component that can allocate memory dynamically. Once the CPU has allocated memory for the coprocessor's use, the coprocessor is free to use it until it is no longer needed and the CPU releases it. Some form of synchronization between the CPU and the coprocessor is therefore required to guarantee that memory is released only after the coprocessor has finished with it. Various solutions to this synchronization problem have been proposed, but they are not always satisfactory in terms of performance.

[0041] Statically allocated memory avoids the synchronization problem, but makes it impossible to adapt memory use dynamically. Similarly, the CPU could be blocked until the coprocessor has finished executing, but this sacrifices parallelism and overall system performance. The coprocessor could also raise an interrupt to signal completion, but when coprocessor throughput is very high, handling these interrupts imposes a large processing overhead.

[0042] Besides meeting high performance requirements, such a system must cope gracefully with dynamic memory shortages. Many computer systems can be configured with various amounts of memory, and in systems with a lot of memory it is important to make full use of the available resources to maximize performance. Likewise, a system with the minimum memory configuration should still operate adequately with little memory, and at the very least its performance should degrade gracefully when memory runs short.

[0043] To solve these problems, a synchronization mechanism is needed that maximizes system performance and dynamically adapts the coprocessor's memory use to the system's capacity and to the complexity of the processing being performed. FIG. 4 shows a preferred configuration for synchronizing the (host) CPU with the coprocessor. Reference numerals in the figure are the same as those used in the description of FIG. 1.

[0044] In FIG. 4, the CPU 202 is responsible for all memory management in the system. The CPU 202 allocates memory 203 for its own use or for use by the coprocessor 224. The coprocessor 224 has a graphics-specific instruction set and can execute instructions 1022 from the memory 203 that it shares with the host processor 202. Each of these instructions can write results 1024 to the shared memory 203 and read operands from the memory 203. The amount of memory 203 needed to store the operands 1023 and results 1024 of a coprocessor instruction depends on the complexity and type of the processing.

[0045] The CPU 202 also generates the instructions 1022 to be executed by the coprocessor 224. To maximize parallelism between the CPU 202 and the coprocessor 224, instructions generated by the CPU 202 are queued, as shown at 1022, before being executed by the coprocessor 224. Each instruction in the queue 1022 can refer to operands 1023 and results 1024 in the shared memory 203 that the host CPU 202 has allocated for the coprocessor 224.

[0046] As shown in FIG. 5, an instruction generator 1030, a memory manager 1031 and a queue manager 1032 are connected to carry out this processing. All of these modules run on the host CPU 202 as a single process. Instructions for execution by the coprocessor 224 are generated by the instruction generator 1030, which uses the services of the memory manager 1031 to allocate space for the operands 1023 and results 1024 of each generated instruction. The instruction generator 1030 also uses the services of the queue manager 1032 to queue instructions for execution by the coprocessor 224.

[0047] Once an instruction has been executed by the coprocessor 224, the CPU 202 can release the memory that the memory manager 1031 allocated for the instruction's operands. The result of one instruction may also serve as an operand of the next instruction, in which case its memory is released by the CPU afterwards. Rather than having the coprocessor 224 raise an interrupt and release memory as soon as it finishes an instruction, the system invokes a cleanup mechanism at some point after the coprocessor 224 has finished the instruction, and then releases the resources the instruction needed. When the cleanup mechanism is invoked depends on the interaction between the memory manager 1031 and the queue manager 1032, and can be adapted dynamically to the amount of system memory available and the amount of memory each coprocessor instruction needs.

[0048] FIG. 6 schematically shows the structure of the coprocessor instruction queue 1022. The host CPU 202 inserts instructions into a pending instruction queue 1040, from which the coprocessor 224 reads and executes them. When the coprocessor 224 has finished executing an instruction, the instruction is moved to a cleanup queue 1041, where the resources it needed are released after the coprocessor 224 is done with it.
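The two-queue arrangement of FIG. 6 can be modeled in a few lines. This Python sketch is illustrative only: the class and method names are hypothetical, the coprocessor is simulated by an ordinary method call, and the resources an instruction holds are represented as plain strings.

```python
from collections import deque


class InstructionQueues:
    """Toy model of a pending-instruction queue and a cleanup queue (hypothetical names)."""

    def __init__(self):
        self.pending = deque()  # filled by the host, drained by the coprocessor
        self.cleanup = deque()  # executed, but resources not yet released
        self.released = []      # resources released so far, kept for inspection

    def host_insert(self, name, resources):
        """Host side: queue an instruction together with the buffers it uses."""
        self.pending.append((name, list(resources)))

    def coprocessor_step(self):
        """Simulated coprocessor: execute the oldest pending instruction."""
        name, resources = self.pending.popleft()
        # (the actual graphics operation would run here)
        self.cleanup.append((name, resources))
        return name

    def host_cleanup(self):
        """Host side: release the resources of every instruction already executed."""
        while self.cleanup:
            _, resources = self.cleanup.popleft()
            self.released.extend(resources)
```

The point of the second queue is that finishing an instruction and freeing its memory are separate events, so the host can defer the release to a convenient moment instead of reacting to each completion.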

[0049] The instruction queue 1022 itself is implemented as a circular buffer of fixed or dynamically variable size. The instruction queue 1022 decouples instruction generation by the CPU 202 from instruction execution by the coprocessor 224. The operand and result memory for each instruction is allocated by the memory manager 1031 (FIG. 5) at the request of the instruction generator 1030 when the instruction is generated. Memory allocation for newly generated instructions triggers the cooperation between the memory manager 1031 and the queue manager 1032 described below, which lets the system adapt automatically to the amount of available memory and to the complexity of the instructions.
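A fixed-size circular buffer of the kind described for the instruction queue 1022 can be sketched as follows. This is a generic ring buffer in Python, not the patent's actual data layout; the class name and the choice to raise an exception when the buffer is full (rather than blocking) are assumptions.

```python
class RingQueue:
    """Fixed-capacity circular buffer (sketch; the capacity is an assumption)."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity
        self._head = 0   # index of the oldest entry (next to be executed)
        self._count = 0  # number of occupied slots

    def __len__(self) -> int:
        return self._count

    def is_full(self) -> bool:
        return self._count == len(self._slots)

    def push(self, item) -> None:
        if self.is_full():
            raise OverflowError("instruction queue full")
        tail = (self._head + self._count) % len(self._slots)  # wraps around
        self._slots[tail] = item
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("instruction queue empty")
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference to the finished entry
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item
```

Because both indices wrap modulo the capacity, slots vacated by completed instructions are reused without moving any entries.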

[0050] The instruction queue manager 1032 can wait until the coprocessor 224 has finished executing the instructions generated by the instruction generator 1030. However, if the instruction queue 1022 and the memory 203 allocated by the memory manager 1031 are large enough, there is no need to wait for the coprocessor 224 at all, or at least not until the entire instruction sequence has finished. In large jobs these waits can amount to several minutes, so the benefit is considerable. However, peak memory usage can easily exceed the available memory. At that point, the queue manager 1032 and the memory manager 1031 begin to cooperate.

[0051] The instruction queue manager 1032 can be told at any convenient time to "clean up" completed instructions and release their dynamically allocated memory. When the memory manager 1031 detects that available memory is running low or has run out, it instructs the queue manager 1032 to perform cleanup, releasing memory that the coprocessor 224 no longer uses. This lets the memory manager 1031 satisfy the memory requests of newly generated instructions from the instruction generator 1030 without the CPU 202 waiting for, or synchronizing with, the coprocessor 224.

[0052] If the memory manager 1031 asks the queue manager 1032 to clean up completed instructions but not enough memory is released to satisfy the new request from the instruction generator, the memory manager 1031 asks the queue manager 1032 to wait until some fraction, for example half, of the in-flight instructions in the pending instruction queue 1040 have completed. The CPU 202 is thus blocked until some of the coprocessor 224 instructions have finished. When they finish, their operands are released and enough memory becomes available to satisfy the request. Because the CPU waits for only some of the in-flight instructions, at least some instructions remain in the pending instruction queue 1040, and the coprocessor 224 is kept busy. In many cases, waiting for and cleaning up part of the pending instruction queue 1040 frees enough memory for the memory manager 1031 to satisfy the request from the instruction generator 1030.

[0053] In the special case where waiting for, say, half of the pending instructions to complete still does not free enough memory to satisfy the request, the memory manager 1031 takes the last resort of waiting until all pending coprocessor instructions have finished. Except for very large, complex jobs that exceed the system's current memory capacity, this frees enough resources to satisfy the request from the instruction generator 1030.

[0054] This cooperation between the memory manager 1031 and the queue manager 1032 maximizes throughput efficiently within the amount of memory 203 available to the system. With more memory, synchronization is needed less often and throughput is higher. Conversely, with less memory, the CPU more often has to wait for the coprocessor 224 to finish the work that uses the scarce memory 203; the system still operates with little memory, but performance degrades.

[0055] The steps taken by the memory manager 1031 to satisfy a request from the instruction generator 1030 are summarized below. The steps are executed in order, and after each step the memory manager 1031 checks whether enough memory 203 is available to satisfy the request. If so, the request is satisfied and no further steps are taken. If not, the memory manager moves on to the next, more drastic step.
1. Try to satisfy the request from the available memory 203.
2. Clean up all completed instructions.
3. Wait for some of the pending instructions to complete.
4. Wait for all pending instructions to complete.
Other options may also be used to satisfy the request, such as waiting for a different fraction of the pending instructions (e.g. 1/3 or 2/3), or waiting for particular instructions known to use a lot of memory.
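The four escalating steps can be simulated to check that each one is reached only when the previous ones fail. In this Python sketch the coprocessor is simulated synchronously: "waiting" for instructions just marks the oldest pending ones as finished. The class, its method names and the integer memory model are assumptions for illustration, and "some of the pending instructions" in step 3 is taken to be half, rounded up.

```python
class ToyMemoryManager:
    """Simulates the escalating allocation policy (hypothetical model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_use = 0
        self.pending: list[tuple[str, int]] = []   # issued, not finished (oldest first)
        self.finished: list[tuple[str, int]] = []  # finished, not yet cleaned up

    def free(self) -> int:
        return self.capacity - self.in_use

    def _cleanup(self) -> None:
        # Release the memory held by every finished instruction.
        for _, size in self.finished:
            self.in_use -= size
        self.finished.clear()

    def wait_for(self, count: int) -> None:
        # Stand-in for blocking until the `count` oldest pending instructions finish.
        done, self.pending = self.pending[:count], self.pending[count:]
        self.finished.extend(done)

    def _wait_half(self) -> None:
        self.wait_for((len(self.pending) + 1) // 2)
        self._cleanup()

    def _wait_all(self) -> None:
        self.wait_for(len(self.pending))
        self._cleanup()

    def allocate(self, name: str, size: int) -> int:
        """Allocate `size` units for instruction `name`; return the step (1-4) that succeeded."""
        steps = (lambda: None, self._cleanup, self._wait_half, self._wait_all)
        for step_no, step in enumerate(steps, start=1):
            step()
            if self.free() >= size:
                self.in_use += size
                self.pending.append((name, size))
                return step_no
        raise MemoryError(f"{name}: request of {size} exceeds capacity {self.capacity}")
```

After step 4 every tracked allocation has been released, so in this model an error is raised only when a single request is larger than the whole memory, mirroring the exception for jobs that exceed the system's capacity.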

[0056] Referring to FIG. 7, in addition to cooperating with the memory manager 1031, the queue manager 1032 can also synchronize with the coprocessor 224 when the fixed-length instruction queue buffer 1050 overflows. FIG. 7 shows this situation for a pending instruction queue 1040 that is ten instructions long. The most recently added instruction has the highest number, so when the buffer is full the newest instruction occupies position 9. The next instruction to be fed to the coprocessor 224 waits at position 0.
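The overflow handling of FIG. 7 can be sketched in the same spirit. In this Python model the queue length of ten follows the FIG. 7 example, and draining half the queue on overflow follows the "for example, half" wording of the description; the class name, its parameters and the synchronous simulated coprocessor are assumptions.

```python
from collections import deque


class FixedInstructionQueue:
    """Sketch of scheduling into a fixed-length pending queue (hypothetical model)."""

    def __init__(self, length: int = 10, drain_fraction: float = 0.5):
        self.length = length
        self.drain_fraction = drain_fraction
        self.slots = deque()     # slots[0] is the next instruction for the coprocessor
        self.finished_log = []   # order in which instructions were seen to finish

    def _wait_for_oldest(self, n: int) -> None:
        # Stand-in for blocking until the coprocessor finishes n instructions.
        for _ in range(n):
            self.finished_log.append(self.slots.popleft())

    def schedule(self, instr) -> None:
        if len(self.slots) == self.length:                  # 1. test for space
            n = max(1, int(self.length * self.drain_fraction))
            self._wait_for_oldest(n)                        # 2. wait for some to finish
        self.slots.append(instr)                            # 3. insert the new instruction
```

Draining only part of the queue on overflow leaves instructions queued, so in the real system the coprocessor would not run dry while the host resumes generating work.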

[0057] When the buffer overflows, the queue manager 1032 waits until the coprocessor 224 has finished, for example, half of the pending instructions. This wait normally frees enough space for the new instruction that the queue manager 1032 is inserting. When scheduling a new instruction, the queue manager 1032 operates as follows:
1. Test whether enough space remains in the instruction queue 1040.
2. If not enough space remains, wait until the coprocessor has finished a predetermined number of instructions.
3. Insert the new instruction into the queue.
When instructed to wait for a given instruction to finish, the queue manager 1032 operates as follows:
1. Wait until the coprocessor 224 indicates that the instruction has finished.
2. While there are finished instructions that have not been cleaned up, remove the next finished instruction from the queue.
When generating a new instruction, the instruction generator 1030 operates as follows:
1. Request the memory needed for the instruction operands 1023 from the memory manager 1031.
2. Generate the instruction to be issued.
3. Pass the coprocessor instruction to the queue manager 1032 for execution.
An example of these operations in pseudocode is shown below.

[0058] Memory manager
ALLOCATE_MEMORY
BEGIN
    IF not enough memory is available to satisfy the request THEN
        clean up all finished instructions
    ENDIF
    IF enough memory is still not available to satisfy the request THEN
        call WAIT_FOR_INSTRUCTION to wait for half of the pending instructions to finish
    ENDIF
    IF enough memory is still not available to satisfy the request THEN
        output an error and return
    ENDIF
    return the allocated memory
END

Queue manager
SCHEDULE_INSTRUCTION
BEGIN
    IF not enough space is available in the instruction queue THEN
        wait until the coprocessor has finished a given number of instructions
    ENDIF
    append the new instruction to the queue
END

WAIT_FOR_INSTRUCTION(i)
BEGIN
    wait until the coprocessor indicates that instruction i has finished
    WHILE there are finished instructions that have not been cleaned up DO
        IF the next finished instruction has a clean-up function THEN
            call the clean-up function
        ENDIF
        remove the finished instruction from the queue
    DONE
END

Instruction generator
GENERATE_INSTRUCTIONS
BEGIN
    call ALLOCATE_MEMORY to allocate, in the memory manager, the memory needed for the instruction operands
    generate the instructions to be transferred
    call SCHEDULE_INSTRUCTION to transfer the coprocessor instructions to the queue manager for execution
END

3.3 Description of the coprocessor registers
As described with reference to FIGS. 1 and 3, the coprocessor 224 has a plurality of registers for executing each instruction stream.

[0059] For the modules in FIG. 2, Table 1 gives the name, type and description of each register used in the coprocessor 224. Appendix B describes each field of each register.
Register description

[0060]

[Table 1A]

[0061]

[Table 1B]

[0062]

[Table 1C]

[0063]

[Table 1D]

[0064]

[Table 1E]

[0065]

[Table 1F]

[0066]

[Table 1G]

[0067] The following registers are of particular note.
(a) Instruction pointer registers (ic_ipa and ic_ipb). These registers store the virtual address of the currently executing instruction. Instructions are fetched and executed in ascending virtual address order, and a jump instruction is used when control passes to a non-contiguous virtual address. Each instruction is assigned a 32-bit sequence number, which increases by one with each instruction. Both the coprocessor 224 and the host CPU 202 use the sequence numbers to synchronize instruction generation with instruction execution.
(b) Finished registers (ic_fna and ic_fnb). These registers store the sequence numbers of completed instructions.
(c) ToDo registers (ic_tda and ic_tdb). These registers store the sequence numbers of queued instructions.
(d) Interrupt registers (ic_inta and ic_intb). These registers store the sequence number at which an interrupt is to be raised.
(e) Interrupt status bits (ic_stat.a_primed and ic_stat.b_primed). These store the "primed" bits, flags that trigger an interrupt when the interrupt register and the finished register match. The bits are held in the interrupt status (ic_stat) register, together with the other interrupt enable bits and other status/configuration information.
(f) Register access semaphores (ic_sema and ic_semb). Some register accesses to the coprocessor 224 must be atomic, i.e. they require more than one register write. The host CPU 202 must obtain a semaphore before any such access. Register accesses that need not be atomic can be performed at any time. Holding the semaphore has a drawback: coprocessor execution stalls when the currently executing instruction completes and stays stalled until the semaphore is released. The register access semaphore is implemented as one bit of the configuration/status register of the coprocessor 224.
These registers reside in the register area of the instruction controller. As described above, each submodule of the coprocessor has its own configuration/status registers, which are set during normal instruction execution. All of these registers appear in the register map, and many are modified implicitly by instruction execution. The host can read their contents through the register map.

3.4 Multiple Stream Format
As mentioned above, the coprocessor 224 executes one of two independent instruction streams. This makes the most effective use of resources and delivers output to external peripherals at high speed. Normally, one instruction stream corresponds to the current output page, which the output device needs at the right time. The second instruction stream uses the modules of the coprocessor 224 while the first stream is idle. The overriding requirement is to deliver the necessary output data on time. At the same time, resources should be used as fully as possible to prepare subsequent pages, bands and so on. The coprocessor 224 is therefore designed to execute two instruction streams, hereinafter called A and B. The two streams are completely independent but are executed in the same way. The instructions are preferably generated by software running on the host CPU 202, transferred to the raster image accelerator card 220 and executed by the coprocessor 224. In normal operation, one instruction stream (stream A) runs at a higher priority than the other (stream B). The instruction streams, or queues, are written into one or more buffers in the host RAM 203 (FIG. 1). The buffers are allocated at start-up and locked into the physical memory of the host RAM 203 while the application runs. Each instruction is preferably stored in the virtual memory environment of the host RAM 203. The raster image coprocessor 224 translates virtual addresses into physical addresses to find the physical address of the next instruction in the host RAM 203. The instructions are then stored in turn in the local memory of the coprocessor 224.
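The virtual-to-physical step can be illustrated with a small Python sketch. The text gives neither the page size nor the form of the translation table. The 4 KiB page size and the dictionary page table (virtual page number to physical frame number) are therefore assumptions. Because the buffers are locked in physical memory, a static table is a reasonable stand-in.

```python
PAGE_SHIFT = 12                          # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(vaddr: int, page_table: dict) -> int:
    """Map a 32-bit virtual instruction address to a physical host-RAM address."""
    vpn = vaddr >> PAGE_SHIFT            # virtual page number
    try:
        frame = page_table[vpn]          # physical frame holding that page
    except KeyError:
        raise LookupError(f"virtual page {vpn:#x} is not mapped") from None
    return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)
```

For example, `translate(0x10ABC, {0x10: 0x80})` returns `0x80ABC`. The page offset is kept and only the page number is replaced.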

[0068] FIG. 8 shows the format of the two streams A and B stored in the host RAM 203. The formats of streams A and B are essentially identical. A simple execution model in the coprocessor 224 consists of the following:
* There are two virtual instruction streams, the A stream and the B stream.
* Normally only one instruction is executed at a time.
* Either stream can be given priority, or priority can alternate between them in "round robin" fashion.
* Either stream can be "locked" so that it is guaranteed to execute, regardless of stream priority or whether the other stream is ready to execute instructions.
* Either stream may be empty.
* Either stream may be unavailable.
* Either stream may contain an instruction that "overlaps" the execution of the next instruction, provided the following instruction does not itself "overlap".
* Each instruction has a "unique" 32-bit sequence number that increases by one per instruction.
* Each instruction may carry a code that raises an interrupt or halts instruction execution.
* Instructions may be prefetched to minimize the effect of external interface latency.
The instruction controller 235 implements this instruction execution model. It exercises overall execution control of the coprocessor 224 and fetches instructions from the host RAM 203 when needed. For each instruction, the instruction controller 235 decodes the instruction, configures the various registers in the modules via the CBus 231, and has the relevant module execute the instruction.

[0069] FIG. 9 shows in simplified form the instruction execution cycle performed by the instruction controller 235. The cycle consists of four main stages 276-279. The first stage 276 checks whether an instruction is pending in an instruction stream. If one is, the instruction is fetched (277), decoded and executed (278), and the registers are updated (279).

3.5 Determining the current active stream
The first stage must perform two steps:
1. Determine whether an instruction is pending.
2. Determine which instruction stream to fetch from next.
To determine whether an instruction is pending, the following are examined:
1. whether the instruction controller is enabled;
2. whether the instruction controller is paused because of an internal error or interrupt;
3. whether any external error condition is pending;
4. whether the A or B stream is locked;
5. whether sequence numbering is enabled for either stream;
6. whether either stream has pending instructions.
The following pseudocode shows an algorithm that applies these rules to decide whether an instruction is pending. The algorithm can be implemented in hardware as a state machine in the instruction controller 235, using known techniques.

[0070]
if not in error mode and enabled and not in bypass mode and in self-test mode
    if stream A is locked and not paused
        if stream A is enabled and (stream A sequence numbering is disabled or stream A has instructions)
            an instruction is pending
        else
            no instruction is pending
        endif
    else if stream B is locked and not paused
        if stream B is enabled and (stream B sequence numbering is disabled or stream B has instructions)
            an instruction is pending
        else
            no instruction is pending
        endif
    else /* no stream is locked */
        if stream A is enabled and not paused and (stream A sequence numbering is disabled or stream A has instructions)
            an instruction is pending
        else
            no instruction is pending
        endif
    endif
else /* the instruction controller is not running */
    no instruction is pending
endif
If no instruction is pending, the instruction controller 235 "spins" (stays idle) until it finds a pending instruction.

[0071] To decide which stream is active and which stream to execute next, the following conditions are examined:
1. whether either stream is locked;
2. which priorities are assigned to streams A and B, and which instruction stream was executed last;
3. whether either stream is enabled;
4. whether either stream has pending instructions.
The following pseudocode, implemented by the instruction controller, shows how the next active stream is determined.

[0072]
if stream A is locked
    next stream = A
else if stream B is locked
    next stream = B
else /* neither stream is locked */
    if stream A is enabled and (stream A sequence numbering is disabled or stream A has instructions)
       and not (stream B is enabled and (stream B sequence numbering is disabled or stream B has instructions))
        next stream = A
    else if stream B is enabled and (stream B sequence numbering is disabled or stream B has pending instructions)
       and not (stream A is enabled and (stream A sequence numbering is disabled or stream A has instructions))
        next stream = B
    else /* both streams, or neither, have instructions */
        if pri = 0 /* A high, B low */
            next stream = A
        else if pri = 1 /* A low, B high */
            next stream = B
        else if pri = 2 or 3 /* round robin */
            if last stream = A
                next stream = B
            else
                next stream = A
            endif
        endif
    endif
endif
Because these conditions change constantly, they must all be evaluated within a short time.

3.6 Fetching an instruction from the current active stream
Once the next active instruction stream has been determined, the instruction controller 235 fetches the instruction. It uses the address in the corresponding instruction pointer register (ic_ipa or ic_ipb). However, if a valid instruction is already in the prefetch buffer of the instruction controller 235, the instruction controller 235 does not fetch it.
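Transliterating the next-stream selection pseudocode of [0072] into Python makes the order of its tests explicit. In this sketch, `StreamState` and its fields are hypothetical stand-ins for each stream's lock, enable and sequence-numbering state. The `has_instructions` field abstracts the test for queued instructions, for example a ToDo register that differs from the finished register. `pri` takes the values 0 to 3 used in the pseudocode.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    locked: bool = False
    enabled: bool = False
    seq_enabled: bool = True         # sequence numbering enabled for this stream
    has_instructions: bool = False   # e.g. ToDo register differs from finished register

    def ready(self) -> bool:
        # "enabled and (sequence numbering disabled or instructions present)"
        return self.enabled and (not self.seq_enabled or self.has_instructions)


def next_stream(a: StreamState, b: StreamState, pri: int, last: str) -> str:
    """Return "A" or "B" following the [0072] pseudocode."""
    if a.locked:
        return "A"
    if b.locked:
        return "B"
    if a.ready() and not b.ready():
        return "A"
    if b.ready() and not a.ready():
        return "B"
    # Both streams, or neither, are ready: fall back on the priority setting.
    if pri == 0:                     # A high, B low
        return "A"
    if pri == 1:                     # A low, B high
        return "B"
    return "B" if last == "A" else "A"   # pri 2 or 3: round robin
```

A locked stream always wins. A stream that is the only one ready comes next. Priority, and the round-robin history, only decide ties.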

[0073] An instruction in the prefetch buffer is valid when the following conditions are met:
1. The prefetch buffer is valid.
2. The instruction in the prefetch buffer comes from the same stream as the current active stream.
The validity of the prefetch buffer contents is indicated by the prefetch bit in the ic_stat register. This bit is set when an instruction prefetch succeeds. Any external write to any register of the instruction controller 235 invalidates the contents of the prefetch buffer.

3.7 Decoding and executing an instruction
Once an instruction has been fetched and accepted, the instruction controller 235 decodes it. It then configures the registers 229 of the coprocessor 224 to execute it.

[0074] The instruction format used by the raster image coprocessor 224 differs from a conventional processor instruction set. Its instructions are generated by the host CPU 202 and so are a direct overhead on the host. Instructions are also stored in the host RAM 203 and transferred to the coprocessor 224 over the PCI bus 206 of FIG. 1, so they should be as compact as possible. Preferably, a single instruction should be enough to start an operation on the coprocessor 224. The instruction set should also remain as flexible as possible, to accommodate future changes. Instructions executed by the coprocessor 224 should preferably apply to long streams of operand data, to achieve optimum performance. Under the coprocessor's instruction-decoding "philosophy", "common" instructions are decoded simply and quickly. For "uncommon" operations, the host system can still control the operation of the coprocessor 224 in fine detail.

[0075] FIG. 10 shows the format of a single instruction 280, which consists of eight 32-bit words. Each instruction contains an instruction word (opcode) 281 and an operand/result type data word 282 that indicates the operand types. It also contains the addresses 283-285 of the three operands A, B and C, and a result address 286. An area 287 holds information about the instruction for use by the host CPU 202.
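The fixed eight-word layout makes instructions straightforward to build on the host. The following Python sketch packs and unpacks one instruction with the standard `struct` module. The text gives only the list of fields, so two things are assumptions: the word order (words 281 to 286, then two words for area 287) and the little-endian byte order.

```python
import struct
from typing import NamedTuple, Tuple


class Instruction(NamedTuple):
    opcode: int                  # word 281
    operand_desc: int            # word 282: operand/result type descriptor
    operand_a: int               # word 283
    operand_b: int               # word 284
    operand_c: int               # word 285
    result: int                  # word 286
    host_info: Tuple[int, int]   # area 287, assumed to fill the last two words


_FMT = "<8I"                     # eight unsigned 32-bit words, little-endian (assumed)


def pack(instr: Instruction) -> bytes:
    return struct.pack(_FMT, instr.opcode, instr.operand_desc, instr.operand_a,
                       instr.operand_b, instr.operand_c, instr.result, *instr.host_info)


def unpack(raw: bytes) -> Instruction:
    w = struct.unpack(_FMT, raw)
    return Instruction(*w[:6], host_info=(w[6], w[7]))
```

Every packed instruction is exactly 32 bytes, which is convenient for advancing an instruction pointer through a buffer in the host RAM.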

[0076] FIG. 11 shows the structure 290 of the instruction opcode 281. The opcode is 32 bits long and contains the following fields:
* major opcode 291
* minor opcode 292
* interrupt (I) bit 293
* partial decode (Pd) bit 294
* register length (R) bit 295
* lock (L) bit 296
* length field 297
The fields of the instruction word 290 are described in the following table.

[0077] Opcode description

[0078]

[Table 2A]

[0079]

[Table 2B]

[0080] Setting the I bit field 293 codes an instruction so that execution is interrupted and paused when the instruction completes. This interrupt is called an "instruction completed interrupt".

The partial decode bit 294 provides a partial decode function. When the bit is set and partial decoding is enabled in the ic_cfg register, various modules are microcoded before the instruction executes, as described below.

The lock bit 296 is used for operations that need more than one instruction to set up. Earlier instructions set the various registers, and the current instruction stream is "locked" for the next instruction. When the L bit 296 is set, the next instruction is fetched from the same stream once the current instruction completes.

The length field 297 is defined generally for every instruction. It is 16 bits long and gives the number of "input data items" or "output data items" required. Some streams have 64K or more input data items, which is more than the 16-bit field can hold. For these, the R bit 295 is set and the input length is taken from the po_len register in the pixel organizer 246 of FIG. 2. That register is set immediately before such an instruction.
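The length selection described above can be sketched as follows. Only the 16-bit width of the length field 297 comes from the text. The bit positions of the I, Pd, R and L bits are hypothetical placeholders for the layout defined in Table 2. `po_len` stands for the value held in the pixel organizer's po_len register.

```python
from dataclasses import dataclass

# Hypothetical bit positions; the real layout is defined in Table 2.
LENGTH_MASK = 0xFFFF    # length field 297 (16 bits, per the text)
L_BIT = 1 << 16         # lock bit 296 (position assumed)
R_BIT = 1 << 17         # register-length bit 295 (position assumed)
PD_BIT = 1 << 18        # partial-decode bit 294 (position assumed)
I_BIT = 1 << 19         # interrupt bit 293 (position assumed)


@dataclass
class OpcodeFlags:
    interrupt: bool         # pause with an "instruction completed interrupt"
    partial_decode: bool    # microcode modules before execution
    register_length: bool   # take the length from po_len instead of the opcode
    lock: bool              # fetch the next instruction from the same stream
    length: int


def decode_flags(opcode: int) -> OpcodeFlags:
    return OpcodeFlags(
        interrupt=bool(opcode & I_BIT),
        partial_decode=bool(opcode & PD_BIT),
        register_length=bool(opcode & R_BIT),
        lock=bool(opcode & L_BIT),
        length=opcode & LENGTH_MASK,
    )


def effective_length(opcode: int, po_len: int) -> int:
    """Number of data items: po_len when R is set, otherwise the 16-bit field."""
    flags = decode_flags(opcode)
    return po_len if flags.register_length else flags.length
```

Only streams too long for the 16-bit field need the extra register write, so short instructions stay self-contained.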

[0081] In FIG. 10, the number of operands 283-286 that an instruction needs depends on the instruction type. The following table gives the number of operands and the definition of the length for each instruction type.
Operand types

[0082]

[Table 3]

[0083] FIG. 12 shows two data word formats of the operand descriptor 282 of FIG. 10: format 300 for a three-operand instruction and format 301 for a two-operand instruction. The following table details the encoding of the operand descriptor.
Operand descriptor

[0084]

[Table 4]

[0085] In the table above, constant data address mode works as follows. The coprocessor 224 fetches or computes a single internal data item and uses it over the full instruction length for that operand. In tile address mode, the coprocessor 224 cycles through a set of data to produce a "tiling effect". If the L bit of the operand descriptor is zero, the data is short, meaning that the data item is held in the operand word itself.
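The following Python gives a rough illustration of the two addressing modes and the short-data case. The function names are hypothetical, and "tile" mode is modelled simply as cycling through a short block of data. One further point is an assumption consistent with [0086], not a quotation of Table 4: an operand word with the L bit set is treated as an address into a memory map.

```python
from itertools import cycle, islice, repeat
from typing import Dict, Iterable, List


def first_item(operand_word: int, l_bit: bool, memory: Dict[int, int]) -> int:
    # L bit clear: "short" data, so the item is the operand word itself.
    # L bit set (assumed): the operand word is a virtual address into memory.
    return memory[operand_word] if l_bit else operand_word


def constant_operand(item: int, length: int) -> List[int]:
    """Constant mode: one data item is reused for the whole instruction length."""
    return list(repeat(item, length))


def tiled_operand(tile: Iterable[int], length: int) -> List[int]:
    """Tile mode: cycle through a short block of data to produce `length` items."""
    return list(islice(cycle(tile), length))
```

For instance, `tiled_operand([1, 2, 3], 7)` yields `[1, 2, 3, 1, 2, 3, 1]`: the block repeats until the instruction length is reached.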

[0086] In FIG. 10, each operand/result word 283-286 contains one of two things: the value of the operand itself, or a 32-bit virtual address giving the start of the operand/result data. The instruction controller 235 of FIG. 2 decodes instructions in two stages. First, it checks whether the major opcode of the instruction is valid and generates an error if the major opcode (FIG. 11) is invalid. Next, the instruction controller 235 executes the instruction by setting various registers via the CBus 231, so that the operation specified by the instruction is performed. Some instructions have no registers to set.

[0087] The registers of each module fall into several types according to how they are accessed.
* Status registers are "read-only" for other modules and "read/write" for the module that contains them.
* The first type of configuration register (hereinafter config1) is "read/write" from outside the module and "read-only" for the module that contains it. These registers are generally used to hold large configuration items such as address values.
* The second type of configuration register (hereinafter config2) can be read and written by all modules but only read by the module that contains it. This type is used when the register must be addressed bit by bit.

[0088] There are also several types of control register. The first type (hereinafter control1) can be read and written by all modules, including the module that contains it. Control1 registers are used to hold large control items such as address values. The second type of control register (hereinafter control2) is likewise set bit by bit.

[0089] The last register type is the interrupt register. Its bits are set to 1 by the module that contains the register, and each set bit is reset to zero when a 1 is written to it from outside. Registers of this type are used to handle interrupt/error signals from each module.

[0090] Each module of the coprocessor 224 asserts the c_active line on the CBus 231 while it is busy executing an instruction. The instruction controller 235 can therefore tell when an instruction has completed by ORing the c_active lines from all modules on the CBus 231. Two modules can execute overlapped instructions: the local memory control module 236 and the peripheral interface control module 237. Each has a c_background line that it asserts while executing an overlapped instruction. Overlapped instructions are "local DMA" instructions that transfer data between the local memory interface and the peripheral interface.
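Completion detection reduces to an OR over one CBus line per module. The sketch below models each line as a Python boolean; the function names are hypothetical.

```python
from typing import Iterable


def wired_or(lines: Iterable[bool]) -> bool:
    """OR of one CBus signal across every module that drives it."""
    return any(lines)


def instruction_complete(c_active: Iterable[bool]) -> bool:
    # The current instruction has finished once no module asserts c_active.
    return not wired_or(c_active)


def overlapped_instruction_running(c_background: Iterable[bool]) -> bool:
    # Only the modules 236 and 237 can assert c_background.
    return wired_or(c_background)
```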

[0091] The execution cycle of an overlapped local DMA instruction differs from that of other instructions. Before starting an overlapped instruction, the instruction controller 235 checks whether an overlapped instruction is already executing. If one is, or if overlapped execution is disabled, the instruction controller 235 waits for that instruction to complete before executing the new one. If no overlapped instruction is executing and overlapped execution is enabled, the instruction controller 235 decodes the overlapped instruction immediately. It then configures the peripheral interface controller 237 and the local memory controller 236 and executes the instruction. Once the registers are configured, the instruction controller 235 updates its own registers (finished registers, status register, instruction pointers and so on). It does not wait for the instruction to complete in the conventional sense. At this point, the finished sequence number may equal the interrupt sequence number. In that case, the "overlapped instruction completed" interrupt is only primed, not raised. It is raised when the overlapped instruction has fully completed.
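The deferred "overlapped instruction completed" interrupt can be modelled as a primed flag that is raised only when the DMA actually finishes. This Python sketch rests on several assumptions:
- The field and function names are hypothetical.
- Sequence numbering is assumed to be enabled.
- Sequence numbers wrap at 32 bits.
- The instruction pointer is assumed to advance by one 32-byte instruction (eight 32-bit words, byte addressing).

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF
INSTRUCTION_BYTES = 8 * 4              # eight 32-bit words; byte addressing assumed


@dataclass
class StreamRegs:
    ip: int = 0                        # ic_ipa / ic_ipb
    finished: int = 0                  # ic_fna / ic_fnb
    interrupt: int = 0                 # ic_inta / ic_intb
    overlap_irq_primed: bool = False   # hypothetical flag


def start_overlapped(regs: StreamRegs) -> None:
    """Registers are updated as soon as the local DMA has been configured."""
    regs.ip = (regs.ip + INSTRUCTION_BYTES) & MASK32
    regs.finished = (regs.finished + 1) & MASK32
    if regs.finished == regs.interrupt:
        regs.overlap_irq_primed = True  # prime the interrupt, do not raise it yet


def overlapped_dma_finished(regs: StreamRegs) -> bool:
    """Called when the DMA really ends; returns True if the interrupt is raised now."""
    raised = regs.overlap_irq_primed
    regs.overlap_irq_primed = False
    return raised
```

Updating the registers early lets the next instruction be scheduled while the DMA still runs. The primed flag keeps the interrupt from reporting a completion that has not yet happened.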

[0092] Once an instruction has been decoded, the instruction controller prefetches the next instruction while the current one executes. For most instructions, execution takes considerably longer than fetching and decoding. The instruction controller 235 prefetches an instruction when all of the following conditions are met:
1. The currently executing instruction is not interrupted or paused.
2. The currently executing instruction is not a jump instruction.
3. The next instruction stream can be prefetched.
4. Another instruction is pending.
When the instruction controller 235 decides that prefetching is possible, it requests the next instruction, places it in the prefetch buffer and marks the buffer valid. The instruction controller 235 then has nothing further to do until the currently executing instruction completes. It simply monitors the c_active and c_background lines on the CBus 231 for completion.

3.8 Updating the instruction controller registers
When an instruction completes, the instruction controller 235 updates its registers to reflect the new state. This must be done atomically to avoid synchronization problems with external accesses. The atomic update proceeds as follows:
1. Obtain the appropriate register access semaphore. If an agent outside the instruction controller 235 holds the semaphore, the instruction execution cycle waits until it is released and then proceeds.
2. Update the appropriate registers. If the instruction is not a jump instruction, the instruction pointer (ic_ipa or ic_ipb) is incremented by the size of the instruction. For a jump instruction, the jump target is loaded into the instruction pointer. If sequence numbering is enabled, the finished register (ic_fna or ic_fnb) is then incremented.

[0093] The status register (ic_stat) is also updated to reflect the new state, and a pause bit may be set if necessary. The instruction control unit 235 pauses in two cases: when an interrupt occurs while pause-on-interrupt is enabled, or when an error occurs. A pause is initiated by setting the instruction stream pause bit (a_pause or b_pause) in the status register. These bits must be reset to zero before instruction execution can resume.
3. Assert the c_end signal on CBus 231 for one clock cycle. This notifies the other modules in the coprocessor 224 that the instruction has finished.
4. Raise an interrupt if necessary. An interrupt is raised in either of the following situations:
a. A "sequence number finished" interrupt occurs, that is, the sequence number in the end register (ic_fna or ic_fnb) matches the interrupt sequence number. The interrupt is raised if it is primed and sequence numbering is enabled.
b. The finished instruction is encoded to interrupt on completion. In this case the interrupt mechanism is activated.
3.9 Semantics of the Register Access Semaphore
The register access semaphore is a mechanism that provides fast access to several instruction control registers. The registers that require fast access are:
1. The instruction pointer registers (ic_ipa and ic_ipb)
2. The ToDo registers (ic_tda and ic_tdb)
3. The end registers (ic_fna and ic_fnb)
4. The interrupt registers (ic_inta and ic_intb)
5. The pause bits in the configuration register (ic_cfg)
External agents can safely read any of these registers at any time. External agents can also write to them at any time. Before writing, however, an external agent must obtain the register access semaphore, so that the instruction control unit 235 does not overwrite the values it writes. The instruction control unit cannot update these registers while the semaphore is held externally. To maintain speed, the instruction control unit 235 updates all of the above registers within a single clock cycle.
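The update rules in 3.8 and 3.9 can be modelled in a few lines of Python. This sketch is illustrative only and rests on several assumptions:

- The class and method names are hypothetical.
- Registers are taken to be 32 bits wide.
- A single semaphore stands in for the per-stream semaphores.
- The end register is assumed to advance by one for each finished instruction.

The example shows an external write made under the semaphore, and the instruction control unit's update stalling until the semaphore is released.

```python
MASK32 = 0xFFFFFFFF
FAST_REGS = ("ic_ipa", "ic_ipb", "ic_tda", "ic_tdb", "ic_fna", "ic_fnb",
             "ic_inta", "ic_intb", "ic_cfg")


class InstructionControlRegs:
    """Toy model of the register update rules (hypothetical API).

    Simplification: one semaphore guards every register, whereas the text
    describes one register access semaphore per instruction stream.
    """

    def __init__(self):
        self.regs = {name: 0 for name in FAST_REGS}
        self.held_externally = False  # True while an external agent owns it

    # External agent side
    def read(self, name):
        return self.regs[name]  # reads are always safe

    def acquire(self):
        if self.held_externally:
            return False
        self.held_externally = True
        return True

    def release(self):
        self.held_externally = False

    def external_write(self, name, value):
        if not self.held_externally:
            raise PermissionError("obtain the register access semaphore first")
        self.regs[name] = value & MASK32

    # Instruction control unit side
    def finish_instruction(self, stream, instr_size, jump_target=None,
                           seq_enabled=True):
        """Steps 1 and 2 of the end-of-instruction update for stream "a" or "b".

        Returns False while the semaphore is held externally (the execution
        cycle must wait and retry). Otherwise all changes are committed
        together, mirroring the single-clock-cycle update.
        """
        if self.held_externally:
            return False
        ip, fn = "ic_ip" + stream, "ic_fn" + stream
        updates = {}
        if jump_target is None:
            updates[ip] = (self.regs[ip] + instr_size) & MASK32
        else:
            updates[ip] = jump_target & MASK32
        if seq_enabled:
            # Assumed: one sequence number per instruction, wrapping at 2**32.
            updates[fn] = (self.regs[fn] + 1) & MASK32
        self.regs.update(updates)
        return True


icr = InstructionControlRegs()
icr.acquire()
icr.external_write("ic_ipa", 0x1000)
print(icr.finish_instruction("a", 24))   # False: stalled while semaphore held
icr.release()
print(icr.finish_instruction("a", 24))   # True
print(hex(icr.read("ic_ipa")), icr.read("ic_fna"))  # 0x1018 1
```

The instruction size of 24 in the usage lines is arbitrary.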

[0094] As described above, when sequence numbering is enabled, each instruction is assigned a 32-bit "sequence number". Instruction sequence numbers increase sequentially and wrap from 0xFFFFFFFF to 0x00000000. When an external agent writes to an interrupt register (ic_inta or ic_intb), the instruction control unit 235 immediately performs the following comparisons and updates:
1. The interrupt sequence number (the value in the interrupt register) is "greater", in modulo arithmetic, than the end sequence number (the value in the end register) of the same stream. The instruction control unit then primes the "sequence number finished" interrupt mechanism by setting the primed bit in the status register (a_primed or b_primed in ic_stat).
2. The interrupt sequence number is "less" than the end sequence number, an overlapped instruction is executing in that stream, and the interrupt sequence number equals the sequence number of the last overlapped instruction (the value in the ic_loa or ic_lob register). The instruction control unit then primes the "overlapped instruction sequence number finished" interrupt mechanism by setting the a_ol_primed or b_ol_primed bit in the ic_stat register.
3. The interrupt sequence number is "less" than the end sequence number and an overlapped instruction is executing in that stream, but the interrupt sequence number differs from the last overlapped instruction sequence number. The interrupt sequence number then refers to an instruction that has already finished, and no interrupt mechanism is primed.
4. The interrupt sequence number is "less" than the end sequence number and no overlapped instruction is executing in that stream. The interrupt sequence number then refers to an instruction that has already finished, and no interrupt mechanism is primed.
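Here is a minimal Python sketch of the comparisons above. The text says only that the comparison is "modulo". This sketch therefore assumes serial-number style arithmetic: a is after b if the forward distance from b to a is non-zero and less than half the 32-bit range. The function names are hypothetical. The text does not describe the case where the two sequence numbers are equal; here it is treated like the "less" case.

```python
MASK32 = 0xFFFFFFFF
HALF_RANGE = 1 << 31


def seq_after(a, b):
    """True if 32-bit sequence number a comes after b, allowing for wrap-around.

    Assumed reading of the "modulo" comparison: the forward distance from b
    to a must be non-zero and less than half of the number space.
    """
    distance = (a - b) & MASK32
    return 0 < distance < HALF_RANGE


def prime_on_interrupt_write(int_seq, end_seq, overlap_active, last_overlap_seq):
    """Return which interrupt mechanism a write to ic_inta/ic_intb primes, if any."""
    if seq_after(int_seq, end_seq):
        return "primed"      # case 1: set a_primed / b_primed in ic_stat
    # Interrupt sequence number is not after the end sequence number.
    if overlap_active and int_seq == last_overlap_seq:
        return "ol_primed"   # case 2: set a_ol_primed / b_ol_primed
    return None              # cases 3 and 4: instruction already finished


print(seq_after(0x00000002, 0xFFFFFFFE))        # True: wrapped, 4 steps later
print(prime_on_interrupt_write(7, 5, False, 0))  # primed
```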

[0095] An external agent can set the interrupt primed bits (a_primed, a_ol_primed, b_primed and b_ol_primed) in the status register. Each interrupt mechanism can therefore be armed or disarmed independently.
3.10 Instruction Control Unit
FIG. 13 shows the instruction control unit 235 in more detail. The instruction control unit 235 includes an execution control unit 305, which processes the instruction execution cycle and manages overall execution control of the coprocessor 224. The execution control unit 305 performs the following tasks:
- manages the overall execution control of the instruction control unit 235;
- determines the instruction sequence;
- fetches and prefetches instructions;
- decodes instructions;
- updates the instruction control registers.
The instruction control unit also includes an instruction decoder 306. The decoder receives instructions from the prefetch buffer controller 307 and decodes them as described above. It also configures the registers in the other coprocessor modules so that those modules execute the instruction.
The prefetch buffer controller 307 manages reads from and writes to the prefetch buffer it contains. It also manages the interface between the instruction decoder 306 and the input interface switch 252 (FIG. 2), and it manages updates of the two instruction pointer registers (ic_ipa and ic_ipb).
Three modules can access CBus 231 (FIG. 2): the instruction control unit 235, the miscellaneous module 239 (FIG. 2) and the external interface controller 238 (FIG. 2). Their access requests pass through a "CBus" arbiter 308, which arbitrates between them. Requests are then transferred over CBus 231 to the register sections of the various modules.

[0096] FIG. 14 shows the execution control unit 305 of FIG. 13 in more detail. As described above, the execution control unit manages the processing of the instruction execution cycle 275 of FIG. 9. In particular, it performs the following steps:
1. Determines from which instruction stream the next instruction is to be fetched.
2. Starts fetching that instruction.
3. Instructs the instruction decoder to decode the instruction stored in the prefetch buffer.
4. Determines whether to prefetch the next instruction, and starts the prefetch.
5. Determines when the instruction has finished.
6. Updates the registers when the instruction has finished.

[0097] The execution control unit has a large core state machine 310 (hereinafter called the "central unit") that manages the whole instruction execution cycle. FIG. 15 shows the state transition diagram of the central unit 310 for managing that cycle.

In FIG. 14, the execution control unit includes instruction prefetch logic 311. This logic determines whether there is an instruction to execute and to which instruction stream it belongs. The start 312 and prefetch 313 states in the transition diagram of FIG. 15 use this information to obtain instructions.

The register management unit 317 of FIG. 14 monitors the register access semaphores of both instruction streams and updates all the necessary registers in each module. It also compares the end registers (ic_fna and ic_fnb) with the interrupt registers (ic_inta and ic_intb) to decide whether a "sequence number finished" interrupt should be raised. In addition, it handles interrupt priming.

The overlapped instruction unit 318 manages the completion of overlapped instructions by maintaining the appropriate status bits in the ic_stat register. Finally, the execution control unit includes a decoder interface unit 319, which connects the central unit 310 to the instruction decoder 306 of FIG. 13.

[0098] FIG. 16 shows the instruction decoder 306 in more detail. The instruction decoder configures the coprocessor to execute the instruction held in the prefetch buffer.

The instruction decoder 306 includes an instruction decoding sequencer 321, a large state machine composed of many smaller state machines. The instruction decoding sequencer 321 communicates with a CBus dispatcher 312, which sets the registers in each module.

The instruction decoding sequencer 321 also passes relevant information to the execution control unit. This information includes whether the instruction is valid and whether it overlaps. The validity check determines whether the instruction opcode is a reserved opcode.

[0099] FIG. 17 shows the instruction dispatcher sequencer 321 of FIG. 16 in more detail. The instruction dispatcher sequencer 321 comprises an overall sequence control state machine 324 and a series of per-module configuration sequencer state machines (e.g., 325 and 326), one for each module to be configured. Together, these state machines define the coprocessor microprogramming of the modules.

Each state machine (e.g., 325) instructs the CBus dispatcher to set various registers, using the whole CBus, and so configures the corresponding module for processing. To write to a particular register, execution of the instruction must begin. In general, executing an instruction takes longer than it takes the sequencer 321 to configure the coprocessor registers for processing.

Appendix A describes the microprogramming performed by the coprocessor's instruction sequencer and the formats set up by the instruction sequencer 321.

[0100] In practice, the instruction decoding sequencer 321 does not configure every module in the coprocessor for each instruction. The following table shows, for each instruction class, the order in which the modules are configured. It covers these modules:
- the pixel organizer 246 (PO)
- the data cache controller 240 (DCC)
- operand organizer B 247 (OOB)
- operand organizer C 248 (OOC)
- the main data path 242 (MDP)
- the result organizer 249 (RO)
- the JPEG encoder 241 (JC)

The following modules are not configured during instruction decoding:
- the external interface controller 238 (EIC)
- the local memory controller 236 (LMC)
- the instruction control unit 235 itself (IC)
- the input interface switch 252 (IIS)
- the miscellaneous module (MM)

[0101] Module startup order

【０１０２】[0102]

【表５】 [Table 5]

[0103] In FIG. 17, each module configuration sequencer (e.g., 325) performs the register accesses needed to configure a particular module. The overall sequence control state machine 324 manages the operation of the module configuration sequencers in the order given above. FIG. 18 is a state transition diagram 330 of the overall sequence control, which activates the relevant module configuration sequencers according to the above table. Each module configuration sequencer drives the CBus dispatcher to change register contents. In this way it sets the various registers used during the module's execution.
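The split of work between the overall sequence control and the per-module configuration sequencers can be sketched in Python. Table 5 is not reproduced here, so the module order for an instruction class is passed in as a parameter. The dispatch callback stands in for the CBus dispatcher. The function name, register names and values in the usage lines are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

RegWrite = Tuple[str, int]        # (register name, value)
CBusWrite = Tuple[str, str, int]  # (module, register name, value)


def run_overall_sequencer(
    module_order: Iterable[str],
    module_sequencers: Dict[str, Callable[[], List[RegWrite]]],
    dispatch: Callable[[CBusWrite], None],
) -> None:
    """Start each per-module configuration sequencer in turn.

    module_order: modules configured for this instruction class, in order
        (given by Table 5 in the source; supplied by the caller here).
    module_sequencers: one callable per module returning its register writes.
    dispatch: stands in for the CBus dispatcher that performs each write.
    """
    for module in module_order:
        # Modules not listed for the instruction class are never touched.
        for register, value in module_sequencers[module]():
            dispatch((module, register, value))


# Hypothetical usage: register names and values are illustrative only.
log: List[CBusWrite] = []
sequencers = {
    "PO": lambda: [("po_cfg", 1)],
    "MDP": lambda: [("mdp_cfg", 7), ("mdp_go", 1)],
}
run_overall_sequencer(["PO", "MDP"], sequencers, log.append)
print(log)
```

Keeping the per-module sequencers independent mirrors the structure in FIG. 17. Adding a module only means adding one sequencer, while the overall control simply walks the order for the instruction class.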

[0104] FIG. 19 shows the prefetch buffer controller 307 of FIG. 13 in more detail. The prefetch buffer controller includes a prefetch buffer 335 that stores a single coprocessor instruction (six 32-bit words). The prefetch buffer has two ports:
- one write port, controlled by the IBus sequencer 336;
- one read port, which supplies data to the instruction decoder, the execution control unit and the instruction control unit's CBus interface.
The IBus sequencer 336 monitors the bus protocol on the connection between the prefetch buffer 335 and the input interface switch.
The controller also includes an address manager 337 that generates the addresses for fetching instructions. The address manager 337 has three functions:
- it selects either ic_ipa or ic_ipb and connects it to the bus to the input interface switch;
- it increments ic_ipa or ic_ipb, depending on which stream the last instruction was fetched from;
- it stores jump destination addresses into the ic_ipa and ic_ipb registers.
The PBC controller 339 controls the prefetch buffer controller 307 as a whole.
3.11 Module Local Register File
As shown in FIG. 13, each module, including the instruction control module itself, has an internal set of registers 304. Each module also has the CBus interface controller 303 shown in FIG. 20, which accepts CBus requests and updates the internal registers accordingly. A module is controlled by writing to the registers 304 in the module through the CBus interface 302.
The CBus arbiter 308 (FIG. 13) decides which of the instruction control unit 235, the external interface controller and the miscellaneous module controls the CBus. The selected module acts as the CBus master and performs register writes and reads.

[0105] FIG. 20 shows the standard configuration of the CBus interface 303 used in each module. The standard CBus interface 303 receives read and write requests from CBus 302. It includes a register file 304, which the various submodules within the module update via 341. A control line 344 is also provided for updating the memory areas of the submodules, including reading those memory areas. The standard CBus interface 303 acts as a CBus destination. It accepts read and write requests for the registers 304 and for memory objects in other submodules.

[0106] The "c_reset" signal 345 sets all registers in the standard CBus interface 303 to their default states. However, "c_reset" does not reset the state machine that controls the exchange of signals between the interface and the CBus master. Therefore, even if "c_reset" is asserted during a CBus transaction, that transaction still completes in some form. The "c_int" 347, "c_exp" 348 and "c_err" 349 signals are generated from the contents of the module's err_int and err_int_en registers according to the following equations.

【０１０７】[0107]

【数１】 (Equation 1)

【０１０８】[0108]

【数２】 (Equation 2)

【０１０９】[0109]

【数３】 (Equation 3)

[0110] The signals "c_sdata_in" and "c_svalid_in" 345 are the data and valid signals from the previous module in the module chain. The signals "c_sdata_out" and "c_svalid_out" 350 are the data and valid signals to the next module in the chain. The functions of the standard CBus interface 303 include:
1. Register read/write handling
2. Memory area read/write handling
3. Test mode read/write handling
4. Submodule monitoring/update handling
3.12 Register Read/Write Handling
The standard CBus interface 303 accepts register read/write requests and bit-set requests on the CBus. It handles two types of CBus command:
1. Type A. Another module reads or writes 1, 2, 3 or 4 bytes of a register in the standard CBus interface 303. The type fields for register write and register read are "1000" and "1001" respectively.
   The standard CBus interface 303 decodes the instruction to check whether it is addressed to the module and whether it is a read or a write.
   For a read, the interface uses the "reg" field of the CBus transaction to select which register output drives the "c_sdata" bus 350. When the read completes, it returns the data and asserts "c_svalid" 350 at the same time.
   For a write, the data cycle occurs in the clock cycle immediately after the instruction cycle. The interface uses the "reg" and "byte" fields to write the data into the selected register. When the write completes, it responds by asserting "c_svalid" 350.
2. Type C. Another module writes one or more bits into one byte of a register. The instruction and the data are combined into a single word.

[0111] The standard CBus interface 303 checks whether the instruction is addressed to the module. It then decodes the "reg", "byte" and "enable" fields to generate the required enable signals. It also extracts the data field of the instruction and replicates it across all four bytes of the word. As a result, the required bits are written at the enabled bit positions in every enabled byte. No response is required for this operation.
3.13 Memory Area Read/Write Handling
The standard CBus interface 303 accepts memory read/write requests on the CBus. On receiving such a request, it checks whether the request is addressed to the module. It then decodes the address field of the instruction. From this field it generates the appropriate address, and an address strobe signal 344 for the submodule that performs the memory read or write. For a write, the standard CBus interface passes the byte enable signals from the instruction to the submodule.
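The Type C bit-set behaviour described above amounts to a masked read-modify-write. This Python sketch makes three assumptions:

- The "byte" field is a 4-bit byte-enable mask, with bit 0 selecting the least significant byte.
- The "enable" and "data" fields are each 8 bits wide.
- The function name is hypothetical.

The actual field layout is defined in FIG. 21 and Table 7, which are not reproduced here.

```python
def type_c_bit_set(reg, byte_enables, bit_enables, data):
    """Apply a Type C bit-set to one 32-bit register value (sketch).

    byte_enables: 4-bit mask of the register bytes to touch (assumed encoding)
    bit_enables: 8-bit mask of the bits to write within each enabled byte
    data: 8-bit data taken from the instruction's data field
    """
    # Replicate the data byte and the bit enables into all four byte lanes.
    data_word = (data & 0xFF) * 0x01010101
    enable_word = (bit_enables & 0xFF) * 0x01010101
    # Keep only the lanes selected by the byte enables.
    lane_mask = 0
    for lane in range(4):
        if byte_enables & (1 << lane):
            lane_mask |= 0xFF << (8 * lane)
    write_mask = enable_word & lane_mask
    # Enabled bits take the data value; every other bit keeps its old value.
    return (reg & ~write_mask & 0xFFFFFFFF) | (data_word & write_mask)


print(hex(type_c_bit_set(0x00000000, 0b0010, 0x0F, 0x05)))  # 0x500
```

Replicating the data byte before masking means the same hardware path serves any selected byte lane. This matches the text's statement that the data field is copied to all four bytes of the word.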

[0112] The standard CBus interface 303 is controlled by a read/write controller 352. The controller decodes the type field of the CBus instruction on CBus 302. It then generates the appropriate enable signals for the register file 304 and the output selector 353. In the next cycle, the data is either latched into the register file 304 or passed to another submodule via 344. The controller acts according to the instruction type:
- For a register read, it enables the output selector 353, which places the correct register output on the "c_sdata" bus 345.
- For a register write, it enables the register file 304, which captures the data in the next cycle.
- For a memory area read or write, it generates the appropriate signals 344 to control the memory area managed by the module.
The register file 304 consists of four parts:
- a register select decoder 355;
- the output selector 353;
- the interrupt 356, error 357 and exception 358 generators, together with an unmasked error generator 359;
- a register section 360 that holds the module's registers.
The register select decoder 355 decodes the signals "ref_en" (register file enable), "write" and "reg" from the read/write controller 352. From these it generates the enable signal for the addressed register. For a register read, the output selector 353 uses the "reg" signal from the read/write controller 352 to select the correct register data. It outputs that data on the c_sdata_out line.
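A Python sketch of how a standard CBus interface might handle a Type A transaction. It uses the type codes given in the text: "1000" for a register write and "1001" for a register read. The function name and the module and register identifiers are hypothetical. The "byte" field is assumed to be a 4-bit byte-enable mask, with lane 0 as the least significant byte. Other type codes from Table 7 are not modelled.

```python
REG_WRITE = 0b1000  # type field for a register write (from the text)
REG_READ = 0b1001   # type field for a register read (from the text)


def byte_lane_mask(byte_enables):
    """Expand a 4-bit byte-enable mask into a 32-bit bit mask."""
    mask = 0
    for lane in range(4):
        if byte_enables & (1 << lane):
            mask |= 0xFF << (8 * lane)
    return mask


def handle_type_a(regs, own_module, module, type_field, reg,
                  byte_enables=0xF, data=0):
    """Handle one Type A transaction at a module's CBus interface (sketch).

    Returns None if the transaction is for another module, the register value
    for a read (driven on c_sdata together with c_svalid), or True for a
    completed write (acknowledged with c_svalid).
    """
    if module != own_module:
        return None
    if type_field == REG_READ:
        return regs[reg]
    if type_field == REG_WRITE:
        mask = byte_lane_mask(byte_enables)
        regs[reg] = (regs[reg] & ~mask & 0xFFFFFFFF) | (data & mask)
        return True
    raise ValueError("type field not handled by this sketch")


regs = {"cfg": 0x11223344}
handle_type_a(regs, 3, 3, REG_WRITE, "cfg", byte_enables=0b0011, data=0xAABBCCDD)
print(hex(handle_type_a(regs, 3, 3, REG_READ, "cfg")))  # 0x1122ccdd
```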

[0113] When an error is detected on their inputs, the exception generators 356 to 359 produce output error signals (for example, 347 to 349 and 362). Each output error is computed as described above. The register section 360 can be of various types as required, as discussed in the description of the register set configuration in Table 5.
3.14 CBus Structure
As described above, the CBus (control bus) controls each module as a whole. It does so by transferring the information used to set the registers in each module's standard CBus interface. As the description of the standard CBus interface makes clear, the CBus serves two purposes:
1. A control bus for driving each module
2. An access bus for the RAM, FIFOs and status information in each module
The CBus controls the modules by setting their configuration registers using an instruction-address-data protocol. In general, registers are set for each instruction, but they can be modified at any time. The CBus also collects status and other information, and it accesses RAM and FIFO data in the various modules by requesting the data.

[0114] For each transaction, the CBus is driven by one of the following three sources:
1. The instruction control unit 235 (FIG. 2), when executing an instruction
2. The external interface controller 238 (FIG. 2), when performing a target (slave) mode bus operation
3. An external device, when the external CBus interface is configured
In each case, the driving module is the CBus source module, and all other modules are potential destination modules. Bus arbitration is performed by the instruction control unit.

[0115] The following table shows one definition of the CBus signals suitable for use in the preferred embodiment.
CBus signal definitions

【０１１６】[0116]

【表６】 [Table 6]

[0117] The CBus c_iad signal carries address and data information. The controller drives it in two different cycles:
1. The instruction cycle (c_valid high), in which a CBus instruction and address are driven on c_iad
2. The data cycle (c_valid low), in which data is driven on c_iad (write operations) or on c_sdata (read operations)
For a write, the data for the instruction is placed on the c_iad bus immediately after the instruction cycle. For a read, the target module drives the c_sdata signal until the end of the data cycle.

[0118] Referring to FIG. 21, the bus carries a 32-bit instruction-address-data field. This field has three types (370 to 372):
1. Type A operations (370) are used to read and write registers in the coprocessor and the data areas of each module. They are generated by three sources:
- the external interface controller 238, when executing a target-mode PCI cycle;
- the instruction control unit 235, when configuring the coprocessor for an instruction;
- the external CBus interface.

[0119] For these operations, the clock cycle immediately after the instruction cycle is the data cycle.
2. Type B operations (371) are used in diagnostic mode to access local memory or to generate cycles on the general interface. They are generated by the external interface controller when executing a target-mode PCI cycle, or by the external CBus interface. The data cycle may occur at any time after the instruction cycle. The destination module signals it using the c_svalid signal.
3. Type C operations (372) are used to set individual bits in a module's registers. They are generated by the instruction control unit 235 when configuring the coprocessor for an instruction, or by the external CBus interface. Type C operations have no data cycle; the data is contained in the instruction cycle.
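The cycle structure of the three operation types can be summarized with a small Python sketch. For one operation, it lists the clock cycle of each bus event. The function name and its latency parameter are hypothetical. The text says only that a Type B data cycle may come at any time after the instruction cycle, so the delay is left to the caller.

```python
def cbus_schedule(op_type, start_cycle=0, read=False, latency=1):
    """Return (clock cycle, event) pairs for one CBus operation (sketch).

    op_type: "A", "B" or "C"
    read: True for a read, False for a write (Types A and B only)
    latency: Type B only; cycles until the destination answers (at least 1)
    """
    events = [(start_cycle, "instruction cycle: c_valid high, command on c_iad")]
    if op_type == "C":
        # No data cycle: the data travels inside the instruction word.
        return events
    if op_type == "A":
        data_cycle = start_cycle + 1  # data cycle immediately follows
        note = ""
    elif op_type == "B":
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        data_cycle = start_cycle + latency  # any later cycle
        note = ", signalled by the destination with c_svalid"
    else:
        raise ValueError(f"unknown operation type {op_type!r}")
    bus = "c_sdata" if read else "c_iad"  # reads return data on c_sdata
    events.append((data_cycle, f"data cycle: c_valid low, data on {bus}{note}"))
    return events


for cycle, event in cbus_schedule("A", start_cycle=10):
    print(cycle, event)
```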

【０１２０】各命令のタイプフィールドは、以下の表に
従って関連するＣＢｕｓ処理を符号化したものである。ＣＢｕｓ処理タイプThe type field of each instruction encodes the associated CBus operation according to the following table. CBus operation types

【０１２１】[0121]

【表７】 [Table 7]

【０１２２】バイトフィールドは、レジスタ中のビット
をセットするために用いられる。モジュールフィールド
はＣＢｕｓ上の命令のアドレス先モジュールを指定する
フィールドである。レジスタフィールドはモジュール中
のどのレジスタを更新するかを指定するフィールドであ
る。アドレスフィールドは、動作を行うメモリ部位を指
定するフィールドである、ＲＡＭ，ＦＩＦＯなどのアド
レスを指定するものである。イネーブルフィールドは、
ビット設定命令が用いられたときに選択されたバイト中
の選択されたビットをイネーブルにするフィールドであ
る。データフィールドは、更新されるべきバイトに書き
込まれるビットデータを含む。The byte field is used when setting bits in a register. The module field designates the target module of the instruction on the CBus. The register field specifies which register in that module is to be updated. The address field specifies the address within a memory region, such as a RAM or FIFO, on which the operation is performed. The enable field enables the selected bits within the selected byte when a bit-set instruction is used. The data field contains the bit data to be written into the byte being updated.
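The byte, enable, and data fields described above combine into a read-modify-write of a single register byte. The following Python sketch models that bit-set behavior on a 32-bit register. The function name, the byte numbering (byte 0 taken as the least significant byte), and the 8-bit widths of the enable and data fields are assumptions for illustration; the actual field layout of FIG. 21 is not reproduced here.

```python
def apply_bit_set(register: int, byte_index: int, enable: int, data: int) -> int:
    """Hypothetical model of a CBus bit-set (Type C) operation.

    register:   current 32-bit register value
    byte_index: byte selected by the byte field (assumed 0 = least significant)
    enable:     8-bit mask from the enable field; a 1 bit may be changed
    data:       8-bit value from the data field
    """
    if not 0 <= byte_index <= 3:
        raise ValueError("a 32-bit register has bytes 0..3")
    shift = 8 * byte_index
    mask = (enable & 0xFF) << shift           # bits allowed to change
    kept = register & ~mask & 0xFFFFFFFF      # bits left untouched
    written = ((data & 0xFF) << shift) & mask
    return kept | written


# Only the low nibble of byte 1 is enabled, so 0x56 becomes 0x5B.
print(hex(apply_bit_set(0x12345678, 1, 0x0F, 0xAB)))  # 0x12345b78
```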

【０１２３】前述の通り、ＣＢｕｓは各モジュールごと
に、モジュールが命令実行中のときに送出されるｃ＿ａ
ｃｔｉｖｅラインを含む。命令制御部はこの信号に基づ
いて命令の終了時を知ることができる。また、ＣＢｕｓ
は各モジュールごとにバックグラウンドモード時に動作
するｃ＿ｂａｃｋｇｒｏｕｎｄラインを、リセット、エ
ラー検出、インタラプトを行うためのリセット、エラ
ー、インタラプトラインとともに含む。３．１５コプロセッサデータタイプとデータ操作図２において、コプロセッサ部２２４の動作、特にＪＰ
ＥＧ符号化器２４１や主データパスのコプロセッサ中の
主な計算処理動作を簡潔にするため、コプロセッサは外
部フォーマットと内部フォーマットとを差別化するデー
タモデルを用いる。外部データフォーマットは、ローカ
ルメモリインタフェースやＰＣＩバスなどのコプロセッ
サの外部インタフェースに現われるデータフォーマット
である。逆に、内部データフォーマットは、コプロセッ
サ２２４の主機能モジュール間で現われるフォーマット
である。図２２は、種々の入力／出力フォーマットを模
式的に示した図である。入力外部フォーマット３８１
は、ピクセルオーガナイザ２４６、オペランドオーガナ
イザＢ２４７，オペランドオーガナイザＣ２４８への入
力フォーマットである。これらのオーガナイザは、入力
外部フォーマットを、ＪＰＥＧ符号化器２４１や主デー
タパス部２４２へ入力される入力内部フォーマット３８
２に再フォーマットする。また、これら２つの機能部は
出力データを出力内部フォーマットで出力し、結果オー
ガナイザ２４９が出力内部フォーマットを所望出力フォ
ーマット３０４に変換する。As described above, the CBus includes, for each module, a c_active line that is asserted while the module is executing an instruction. The instruction control unit uses this signal to determine when an instruction has finished. The CBus also includes, for each module, a c_background line that is asserted while the module operates in background mode, together with reset, error, and interrupt lines used for reset, error detection, and interrupts.
3.15 Coprocessor Data Types and Data Manipulation
Referring to FIG. 2, to simplify the operation of the coprocessor unit 224, and in particular the main computational operations of the JPEG encoder 241 and the main data path, the coprocessor uses a data model that distinguishes between external and internal formats. The external data format is the format in which data appears on the external interfaces of the coprocessor, such as the local memory interface or the PCI bus. The internal data format, conversely, is the format in which data passes between the main functional modules of the coprocessor 224. FIG. 22 schematically shows the various input/output formats. The input external format 381 is the format presented to the pixel organizer 246, operand organizer B 247, and operand organizer C 248. These organizers reformat the input external format into the input internal format 382, which is fed to the JPEG encoder 241 and the main data path unit 242. These two functional units in turn produce their output data in the output internal format, and the result organizer 249 converts it into the desired output external format 384.

【０１２４】実施例では、外部データフォーマットは３
つのタイプに分けられる。第一のタイプは、データごと
に４つまでのチャネルを有し、各チャネルが１、２、
４、８、あるいは１６ビットサンプルから成り立ってい
るような連続ストリームから成るデータの「パックスト
リーム」である。パックストリームは、ピクセル、ピク
セルに変換されるデータ、まとめられたビットなどを表
現する際に用いられる。また、コプロセッサはリトルエ
ンディアンバイトアドレッシングとバイト中ではビッグ
エンディアンビットアドレッシングを用いる。図２３は
パックストリームフォーマットの第一の例を示してい
る。ここでは、各オブジェクト３８７は、各チャネルご
とに２ビットのチャネル０、チャネル１、チャネル２の
三つのチャネルから構成される。このフォーマットのデ
ータ配置が３８８である。図２４の次の例３９０では、
各データオブジェクトが３２ビットワードを有し、チャ
ネルごとに８ビット有する４チャネルオブジェクト３９
５が示されている。図２５の第三の例３９５では、ビッ
トアドレス３９７から始まるチャネルごとに８ビットを
有するチャネルオブジェクト３９６が示されている。も
ちろん、アプリケーションに応じて、データチャネルの
実際の幅や数は変化する。In the embodiment, external data formats fall into three types. The first type is the "packed stream": a continuous stream of data objects, each having up to four channels, with each channel consisting of a 1-, 2-, 4-, 8-, or 16-bit sample. Packed streams are used to represent pixels, data to be converted into pixels, groups of bits, and the like. The coprocessor uses little-endian byte addressing and, within each byte, big-endian bit addressing. FIG. 23 shows a first example of the packed stream format, in which each object 387 consists of three channels (channel 0, channel 1, and channel 2) of 2 bits each; the resulting data layout is shown at 388. The next example 390, in FIG. 24, shows a 4-channel object 395 in which each data object occupies a 32-bit word with 8 bits per channel. The third example 395, in FIG. 25, shows a channel object 396 with 8 bits per channel starting at bit address 397. The actual width and number of data channels naturally vary with the application.
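The addressing convention above (little-endian bytes, big-endian bits within each byte) determines where each sample of a packed stream sits. The Python sketch below reads samples of 1, 2, 4, or 8 bits under the assumption that bit address 0 of a byte is its most significant bit and that bit addresses run on through consecutive bytes; the function names are hypothetical, and 16-bit samples are left out because their byte order within the stream is not stated here.

```python
def extract_sample(stream: bytes, bit_address: int, width: int) -> int:
    """Read one sample of `width` bits (1, 2, 4 or 8) starting at bit_address.

    Assumes big-endian bit addressing inside each byte (bit address 0 is the
    MSB) and byte addresses increasing through the stream.
    """
    if width not in (1, 2, 4, 8):
        raise ValueError("this sketch handles 1-, 2-, 4- and 8-bit samples")
    offset = bit_address % 8                  # position counted from the MSB
    if offset + width > 8:
        raise ValueError("sample would cross a byte boundary")
    byte = stream[bit_address // 8]
    shift = 8 - offset - width                # move the sample down to bit 0
    return (byte >> shift) & ((1 << width) - 1)


def unpack_objects(stream: bytes, channels: int, width: int) -> list[tuple[int, ...]]:
    """Split a packed stream into objects of `channels` samples each."""
    bits_per_object = channels * width
    count = (len(stream) * 8) // bits_per_object
    return [tuple(extract_sample(stream, o * bits_per_object + c * width, width)
                  for c in range(channels))
            for o in range(count)]


# Three 2-bit channels per object, as in FIG. 23: 0b00_01_10_11 -> (0, 1, 2)
print(unpack_objects(bytes([0b00011011]), 3, 2))  # [(0, 1, 2)]
```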

【０１２５】外部データフォーマットの第二のタイプは
「アンパックバイトストリーム」であり、各ワード中の
１バイトのみが有効であるような３２ビットワードのシ
ーケンスである。このフォーマットの例が図２６の３９
９として示されており、各ワード中の単一バイト４００
のみが用いられる。さらなる外部データフォーマットは
「他」フォーマットとして分類されるオブジェクトで表
現される。一般に、これらのデータオブジェクトは色空
間変換表、ハフマン符号化表などの大きな表型のデータ
である。The second type of external data format is the "unpacked byte stream", a sequence of 32-bit words in which only one byte of each word is valid. An example of this format is shown at 399 in FIG. 26, where only a single byte 400 of each word is used. The remaining external data formats are represented by objects classified as "other" formats. Typically, these data objects are large tabular data such as color space conversion tables and Huffman coding tables.

【０１２６】コプロセッサは４つの内部データタイプを
用いる。第一のタイプは「パックバイト」フォーマット
であり、最後の３２ビットワードを除いて４アクティブ
バイトの３２ビットワードから成るフォーマットであ
る。図２７に、ワードが４バイトであるパックバイトフ
ォーマットの例４０２を示す。図２８に示す次のデータ
タイプは「ピクセル」フォーマットであり、４アクティ
ブバイトチャネルの３２ビットワード４０３から成るフ
ォーマットである。このピクセルフォーマットは４つの
チャネルデータとして解釈される。The coprocessor uses four internal data types. The first is the "packed byte" format, consisting of 32-bit words in which all four bytes are active, except possibly the last word. FIG. 27 shows an example 402 of the packed byte format, with four bytes per word. The next data type, shown in FIG. 28, is the "pixel" format, consisting of 32-bit words 403 with four active byte channels; each word is interpreted as four channels of data.

【０１２７】図２９に示す次の内部データタイプは「ア
ンパックバイト」フォーマットであり、各ワードは一つ
のアクティブバイトチャネル４０５と三つの非アクティ
ブバイトチャネルから成るフォーマットである。この
際、アクティブバイトチャネルは最小バイトを占める。
他の内部データオブジェクトは「他」データフォーマッ
トとして区分される。外部フォーマットの入力データは
適切な内部フォーマットに変換される。図３０は、種々
のオーガナイザによって実行される外部フォーマット４
１０から入力フォーマット４１１への変換形態を示して
いる。図３１は、結果オーガナイザ２４９によって実行
される内部フォーマット４１２から外部フォーマット４
１３への変換形態を示している。The next internal data type, shown in FIG. 29, is the "unpacked byte" format, in which each word consists of one active byte channel 405 and three inactive byte channels; the active byte channel occupies the least significant byte. Other internal data objects are classified as the "other" data format. Input data in an external format is converted into the appropriate internal format. FIG. 30 shows the conversions from external formats 410 to input formats 411 performed by the various organizers. FIG. 31 shows the conversions from internal formats 412 to external formats 413 performed by the result organizer 249.

【０１２８】以下、変換を実行する処理をより詳細に説
明する。まず入力データ外部フォーマットから内部フォ
ーマットへの変換であるが、図３２は変換処理において
種々のオーガナイザによって用いられる手法を示してい
る。はじめは外部他フォーマット４１６であるが、これ
は種々のオーガナイザを経ずに単に通過する。次に、外
部アンパックバイトフォーマット４１７は、アンパック
正規化４１８を行って内部アンパックバイトと呼ばれる
フォーマット４１９を生成する。アンパック正規化４１
８処理は、外部アンパックバイトストリームから非アク
ティブ３バイトを取り除く処理を行う。図３３はアンパ
ック正規化処理を示したものであるが、４バイトチャネ
ルを有する入力のうち１つのバイトチャネルのみが出力
フォーマット４１９において有効な結果となっており、
単なるバイトを出力している様子を示している。The conversion processing is described in more detail below, beginning with the conversion of input data from external to internal formats. FIG. 32 shows the methods used by the various organizers in this conversion. First, data in the external "other" format 416 simply passes through the organizers unchanged. Next, data in the external unpacked byte format 417 undergoes unpack normalization 418 to produce the format 419, called internal unpacked byte. Unpack normalization 418 removes the three inactive bytes from each word of the external unpacked byte stream. FIG. 33 illustrates unpack normalization: of the four byte channels in each input word, only one is carried into the output format 419, so that a single byte is output per word.
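Unpack normalization, as described above, keeps one byte per 32-bit word. This Python sketch assumes the active byte lane is fixed for the whole stream and is passed in as a parameter; the source does not say which lane is active on input, so `lane` is a hypothetical parameter.

```python
def unpack_normalize(words: list[int], lane: int) -> bytes:
    """Keep the active byte of each 32-bit word and drop the three inactive ones.

    lane: index of the active byte (0 = least significant); assumed fixed
          for the whole stream in this sketch.
    """
    if not 0 <= lane <= 3:
        raise ValueError("lane must be 0..3")
    return bytes((w >> (8 * lane)) & 0xFF for w in words)


print(unpack_normalize([0x000000AA, 0x7F7F7FBB], 0).hex())  # aabb
```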

【０１２９】図３２において、パック正規化４２１処理
は、外部パックストリーム４２２中の要素オブジェクト
をバイトストリーム４２３に変換する処理を行う。チャ
ネルの各要素のサイズがバイト以下であれば、サンプル
は８ビット値に補間される。例えば、４ビット単位をバ
イト単位に変換する場合には、４ビット値０ｘＮはバイ
ト値０ｘＮＮに変換される。１バイト以上のオブジェク
トの場合には切り捨てが行われる。ストリーム４２２で
サポートされる入力オブジェクトサイズは、１、２、
４、８、１６ビットサイズである。なお、これらは、本
発明が適用されるシステム中のデータオブジェクトやワ
ードの全幅に依存する。In FIG. 32, pack normalization 421 converts the element objects in the external packed stream 422 into a byte stream 423. If a channel element is narrower than a byte, each sample is expanded ("interpolated") to an 8-bit value by repeating its bits; for example, when 4-bit units are converted into bytes, the 4-bit value 0xN becomes the byte value 0xNN. Objects wider than one byte are truncated. The input object sizes supported for stream 422 are 1, 2, 4, 8, and 16 bits; these sizes depend on the data object and word widths of the system to which the invention is applied.
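The 0xN to 0xNN example generalizes to bit replication for every sub-byte width, with truncation to the most significant byte for 16-bit objects. A minimal Python sketch of that per-sample rule (the function name is hypothetical):

```python
def expand_to_byte(sample: int, width: int) -> int:
    """Normalize one packed-stream sample to an 8-bit value.

    Samples narrower than a byte are widened by repeating their bit pattern
    (0xN -> 0xNN for 4-bit samples); 16-bit samples are truncated to their
    most significant byte; 8-bit samples pass through unchanged.
    """
    if width == 16:
        return (sample >> 8) & 0xFF
    if width not in (1, 2, 4, 8):
        raise ValueError("supported widths are 1, 2, 4, 8 and 16 bits")
    sample &= (1 << width) - 1
    result = 0
    for _ in range(8 // width):
        result = (result << width) | sample
    return result


print([hex(expand_to_byte(s, w)) for s, w in [(0xA, 4), (0b10, 2), (1, 1), (0x1234, 16)]])
# ['0xaa', '0xaa', '0xff', '0x12']
```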

【０１３０】図３４は、チャネルごとに（図２３のデー
タフォーマット３８６ごとのように）２ビット有する３
チャネルオブジェクト形式の入力データ４２２が入力さ
れたときのパック正規化４２１の様子を示している。出
力データはバイトチャネルフォーマット４２３になって
いる。この際、必要であれば各チャネルに「補間処理」
が施され、８ビットサンプルが生成される。FIG. 34 shows pack normalization 421 applied to input data 422 consisting of 3-channel objects with 2 bits per channel (as in data format 386 of FIG. 23). The output data is in byte channel format 423; where necessary, each channel is "interpolated" to generate an 8-bit sample.

【０１３１】図３２において、ピクセルストリームはそ
の後、パック処理４２５、アンパック処理４２６、要素
選択処理４２７のいずれかに送られる。図３５はパック
処理４２５の例を示したもので、単に非アクティブバイ
トチャネルが取り除かれ、ワードごとの４アクティブバ
イトにパックされたバイトストリームが生成される様子
を示している。即ち、単一の有効バイトストリーム４３
０がワードごとの４アクティブバイトを有するフォーマ
ット４３１に圧縮される。アンパック処理４２６はほぼ
パック処理の反対の処理であり、アンパックバイトがワ
ードの最小バイトとなる。図３６は、パックバイトスト
リーム４３３がアンパックされ結果４３４が得られる様
子を示している。In FIG. 32, the pixel stream is then sent to one of pack processing 425, unpack processing 426, or element selection processing 427. FIG. 35 shows an example of pack processing 425: the inactive byte channels are simply removed, producing a byte stream packed with four active bytes per word. That is, a stream 430 with a single valid byte per word is compressed into format 431 with four active bytes per word. Unpack processing 426 is essentially the inverse of packing: each unpacked byte is placed in the least significant byte of a word. FIG. 36 shows packed byte stream 433 being unpacked to give the result 434.
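Packing and unpacking, as described above, are inverses over 32-bit words. The Python sketch below assumes that, consistent with little-endian byte addressing, the first byte of each group goes into the least significant byte of the packed word; the function names are hypothetical.

```python
def pack_bytes(stream: bytes) -> list[int]:
    """Pack a byte stream into 32-bit words with four active bytes per word.

    The first byte of each group goes into the least significant byte
    (little-endian byte addressing); only the last word may be partial.
    """
    return [int.from_bytes(stream[i:i + 4], "little")
            for i in range(0, len(stream), 4)]


def unpack_words(words: list[int], count: int) -> list[int]:
    """Inverse of pack_bytes: one byte per output word, in its least significant byte."""
    out: list[int] = []
    for w in words:
        for k in range(4):
            if len(out) == count:
                return out
            out.append((w >> (8 * k)) & 0xFF)
    return out


packed = pack_bytes(bytes([1, 2, 3, 4, 5]))
print([hex(w) for w in packed])   # ['0x4030201', '0x5']
print(unpack_words(packed, 5))    # [1, 2, 3, 4, 5]
```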

【０１３２】図３７は要素選択４２７処理を示したもの
であり、Ｎを単位ごとの入力チャネル数とすると、入力
ストリームからＮ要素を選択する処理である。アンパッ
ク処理は「プロトタイプピクセル」、例えば４３７を生
成するときに用いられる。なお、ピクセルチャネルは最
小バイトから埋められる。図３８は、形式４３６の入力
データが要素選択部４２７によって変換され、プロトタ
イプピクセルフォーマット４３７が生成される様子を示
している。FIG. 37 shows element selection processing 427, which selects N elements from the input stream, where N is the number of input channels per unit. The unpacking process is used to generate "prototype pixels" such as 437; the pixel channels are filled starting from the least significant byte. FIG. 38 shows input data in format 436 being converted by the element selection unit 427 into the prototype pixel format 437.
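Element selection groups N consecutive elements into one prototype pixel, filling channels from the least significant byte. A minimal Python sketch, assuming the unused upper bytes are left as zero and a trailing partial group is dropped (both assumptions; the function name is hypothetical):

```python
def element_select(stream: bytes, n: int) -> list[int]:
    """Take N elements at a time from a byte stream and build prototype pixels.

    Channels are filled from the least significant byte; unused upper bytes
    of each 32-bit pixel stay zero. A trailing group shorter than N is
    ignored in this sketch.
    """
    if not 1 <= n <= 4:
        raise ValueError("a pixel has 1 to 4 channels")
    return [int.from_bytes(stream[i:i + n], "little")
            for i in range(0, len(stream) - n + 1, n)]


print([hex(p) for p in element_select(bytes([1, 2, 3, 4, 5, 6]), 3)])
# ['0x30201', '0x60504']
```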

【０１３３】要素選択が行われると、要素入替処理４４
０（図３２）が行われる。図３８は要素入替処理の様子
を示したもので、内部データレジスタ４４１に格納され
た一定値で選択要素を入れ替え、例のように出力要素２
４２を生成する様子を示している。図３２において、処
理段４２５、５２６、４４０の出力はレーンスワップ処
理４４４に送られる。図３９に示されているように、レ
ーンスワップ処理はあるレーンを他のレーンにバイトご
とに多重化する処理であり、あるレーンを他のレーンに
複製する処理をも含む。図３８の例では、チャネル３と
チャネル１とを入れ替え、チャネル３をチャネル２とチ
ャネル１に複製する様子が示されている。After element selection, element replacement processing 440 (FIG. 32) is performed. FIG. 38 illustrates element replacement: selected elements are replaced with a constant value stored in the internal data register 441, producing the output element 442 shown in the example. In FIG. 32, the outputs of processing stages 425, 426, and 440 are sent to lane swap processing 444. As shown in FIG. 39, lane swapping routes bytes from one lane onto another lane, and also includes duplicating one lane into other lanes. The example of FIG. 38 shows channels 3 and 1 being exchanged and channel 3 being copied into channels 2 and 1.
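Element replacement and lane swapping are both byte-lane operations on a 32-bit pixel. In the Python sketch below, replacement takes the selected lanes from a constant standing in for the internal data register 441, and lane swapping is expressed as a per-output-lane source table. The table encoding and function names are assumptions for illustration, not the register format of the coprocessor.

```python
def replace_elements(pixel: int, constant: int, lanes: list[int]) -> int:
    """Replace the selected byte lanes of a pixel with the same lanes of a constant."""
    mask = 0
    for lane in lanes:
        mask |= 0xFF << (8 * lane)
    return (pixel & ~mask & 0xFFFFFFFF) | (constant & mask)


def lane_swap(pixel: int, select: list[int]) -> int:
    """Output lane i takes input lane select[i]; repeating an index duplicates a lane."""
    out = 0
    for i, src in enumerate(select):
        out |= ((pixel >> (8 * src)) & 0xFF) << (8 * i)
    return out


p = 0x44332211
print(hex(replace_elements(p, 0xAABBCCDD, [3])))  # 0xaa332211
print(hex(lane_swap(p, [0, 3, 2, 1])))            # 0x22334411 (channels 3 and 1 exchanged)
print(hex(lane_swap(p, [0, 3, 3, 3])))            # 0x44444411 (channel 3 duplicated)
```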

【０１３４】図３２において、レーンスワップ処理４４
４が終わると、データストリームが再読み出しされて複
製処理４４６に移る前に、マルチユースト値ＲＡＭ２５
０に格納されることもある。複製処理４４６は単にデー
タオブジェクトを複製する処理である。図４０は、複製
処理４４６をピクセルデータに適用した様子であり、複
製ファクターは１である。In FIG. 32, after lane swap processing 444, the data stream may be stored in the multi-use value RAM 250 before being read back for duplication processing 446. Duplication processing 446 simply replicates data objects. FIG. 40 shows duplication processing 446 applied to pixel data with a duplication factor of one.

【０１３５】図４１は、複製処理をパックバイトデータ
に適用した様子である。図４２は、出力内部フォーマッ
ト３８３から出力外部フォーマット３８４にデータを変
換する結果オーガナイザ２４９の処理を示したものであ
る。この処理では、図３２に示した変換処理と同様の処
理４２４、４２５、４２６、４４０を含むが、処理４５
０では更に要素非選択４５１、非正規化４５２、バイト
アドレシング４５３、書き込みマスキング４５４の処理
を含んでいる。図４３に示した要素非選択処理４５１
は、図３７の要素選択処理の逆処理であり、不必要なデ
ータが削除される。例えば、図４３では、入力中の３つ
の有効チャネルのみが取り出され、データ項目４５６に
パックされる。FIG. 41 shows duplication processing applied to packed byte data. FIG. 42 shows the processing performed by the result organizer 249 to convert data from the output internal format 383 into the output external format 384. This processing 450 includes the same stages 424, 425, 426, and 440 as the conversion shown in FIG. 32, and additionally includes element deselection 451, denormalization 452, byte addressing 453, and write masking 454. Element deselection 451, shown in FIG. 43, is the inverse of the element selection of FIG. 37 and discards unneeded data. For example, in FIG. 43 only the three valid channels of the input are extracted and packed into data item 456.
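Element deselection keeps only the valid channels of each pixel and packs them back into a byte stream. A Python sketch, assuming the valid channels are the N least significant bytes (the function name is hypothetical):

```python
def element_deselect(pixels: list[int], n: int) -> bytes:
    """Keep the N least significant channels of each 32-bit pixel and pack them."""
    if not 1 <= n <= 4:
        raise ValueError("a pixel has 1 to 4 channels")
    out = bytearray()
    for p in pixels:
        out += (p & 0xFFFFFFFF).to_bytes(4, "little")[:n]
    return bytes(out)


print(list(element_deselect([0xFF030201, 0xFF060504], 3)))  # [1, 2, 3, 4, 5, 6]
```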

【０１３６】図４４に示した非正規化処理は、図３４で
示したパック正規化処理４２１のほぼ反対の動作をす
る。非正規化処理では、バイト単位で扱われていた各オ
ブジェクトあるいはデータ項目を非バイト値に変換する
処理が行われる。図４２のバイトアドレシング処理４５
３は、バイトアドレシングに必要なバイトごとの再構成
処理を行う。外部アンパックバイト出力ストリームで
は、ストリームアドレスの最小２ビットがアクティブス
トリームに対応する。バイトアドレシング処理４５３で
は、外部アンパックバイトが用いられているとき（図４
５）、１つのバイトチャネルから他のチャネルバイトに
出力ストリームが再マップされる。外部パックストリー
ムが用いられているときは（図４６）、バイトアドレシ
ングモジュール４５３は出力ストリームの開始アドレス
を図示のように再マップする。The denormalization processing shown in FIG. 44 is essentially the inverse of the pack normalization 421 shown in FIG. 34: each object or data item that was handled as a byte is converted back into a non-byte value. Byte addressing processing 453 in FIG. 42 performs the byte-by-byte rearrangement required for byte addressing. In an external unpacked byte output stream, the two least significant bits of the stream address identify the active byte lane. When the external unpacked byte format is used (FIG. 45), byte addressing 453 remaps the output stream from one byte channel to another. When an external packed stream is used (FIG. 46), the byte addressing module 453 remaps the start address of the output stream as shown.
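Denormalization and unpacked-byte addressing can each be sketched in a line or two. Below, denormalization assumes the inverse of bit-replicating normalization is to keep the top `width` bits of the byte, and byte addressing places a byte in the lane selected by the two least significant address bits, as stated above. Both function names are hypothetical.

```python
def denormalize(byte_value: int, width: int) -> int:
    """Inverse of bit-replicating normalization: keep the top `width` bits."""
    if width not in (1, 2, 4, 8):
        raise ValueError("supported widths are 1, 2, 4 and 8 bits")
    return (byte_value & 0xFF) >> (8 - width)


def place_unpacked_byte(byte_value: int, address: int) -> int:
    """Put a byte in the lane selected by the two least significant address bits."""
    lane = address & 0b11
    return (byte_value & 0xFF) << (8 * lane)


print(hex(denormalize(0xAA, 4)))                # 0xa
print(hex(place_unpacked_byte(0x5A, 0x1002)))   # 0x5a0000
```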

【０１３７】図４２の書き込みマスク処理４５４を図４
７に示す。書き込みされないパックストリームのあるチ
ャネル（例えば４６０）をマスクする処理である。適用
される入力／出力データタイプ変換は、以下のデータ操
作レジスタの内容に基づいて決められる。＊ピクセルオーガナイザデータ操作レジスタ（ｐｏ＿ｄ
ｍｒ）＊オペランドオーガナイザＢとオペランドオーガナイザ
Ｃデータ操作レジスタ（ｏｏｒ＿ｄｍｒ，ｏｏｃ＿ｄｍ
ｒ）＊結果オーガナイザデータ操作レジスタ（ｒｏ＿ｄｍ
ｒ）命令のための各データ操作レジスタの設定は、以下の２
つの方法によってなされる。１．命令実行の直前にコプロセッサレジスタに書き込む
標準手法を用いて設定される２．現在の命令に基づいてコプロセッサ自身で設定され
る命令復号処理では、コプロセッサはデータの命令ワード
やデータワードの内容を調べ、種々のデータ操作レジス
タをどのように設定するかを決定する処理を他の処理と
ともに行う。なお、命令とオペランドのすべての組み合
わせが有効であるわけではない。いくつかの命令ではオ
ペランドフォーマットを規定しているものもある。不適
切なオペランドを含む命令の場合、「定義されていな
い」結果が生成されることになるが、エラーを生じるこ
となく終了してしまうこともある。対応するデータ記述
子の「Ｓ」ビットが０であれば、コプロセッサはデータ
操作レジスタをセットし、現命令を反映させる。FIG. 47 shows the write mask processing 454 of FIG. 42, which masks out channels (for example, 460) of a packed stream that are not to be written. The input/output data type conversions to be applied are determined by the contents of the following data manipulation registers:
* pixel organizer data manipulation register (po_dmr)
* operand organizer B and operand organizer C data manipulation registers (oor_dmr, ooc_dmr)
* result organizer data manipulation register (ro_dmr)
The data manipulation registers for an instruction are set in one of two ways:
1. by the standard technique of writing to the coprocessor registers immediately before the instruction is executed; or
2. by the coprocessor itself, based on the current instruction.
In the instruction decoding process, the coprocessor examines, among other things, the contents of the instruction word and the data words, and decides how to set the various data manipulation registers. Not all combinations of instructions and operands are valid; some instructions prescribe a particular operand format. An instruction with inappropriate operands produces an "undefined" result, but may complete without raising an error. If the "S" bit of the corresponding data descriptor is 0, the coprocessor sets the data manipulation register to reflect the current instruction.

【０１３８】図４８はデータ操作レジスタのフォーマッ
トを示した図である。以下の表は、図４８に示されたレ
ジスタ中の種々のビットフォーマットを示している。データ操作レジスタフォーマットFIG. 48 shows the format of the data manipulation registers. The following tables show the bit fields of the registers shown in FIG. 48. Data manipulation register format

【０１３９】[0139]

【表８Ａ】 [Table 8A]

【０１４０】[0140]

【表８Ｂ】 [Table 8B]

【０１４１】各１つの命令において、複数の内部／外部
データタイプが用いられることがある。オペランド、結
果、命令タイプのすべて組み合わせは有効ではあるが、
これらの組み合わせの一部のみが意味のある結果を生成
する。各命令に対して期待されるオペランドと結果デー
タタイプの具体的な組み合わせを表９に示す。表９は、
外部／内部フォーマットにおいて期待されるデータタイ
プをまとめたものである。A single instruction may use several internal/external data types. Although every combination of operand, result, and instruction type is permitted, only some of these combinations produce meaningful results. Table 9 lists the specific combinations of operand and result data types expected for each instruction, summarizing the expected data types in both external and internal formats.

【０１４２】期待されるデータタイプExpected data type

【０１４３】[0143]

【表９】 [Table 9]

【０１４４】なお、表９において用いたシンボルは以下
の通りである。シンボルの説明The symbols used in Table 9 are as follows. Explanation of symbols

【０１４５】[0145]

【表１０】 [Table 10]

【０１４６】３．１６データ正規化回路図４９は、３つの主機能ブロックを含むコンピュータグ
ラフィックスプロセッサを示している。３つの主機能ブ
ロックは、ピクセルオーガナイザ２４６とオペランドオ
ーガナイザＢ、Ｃ２４７、２４８中のデータ正規化部１
０６２、主データパス２４２あるいはＪＰＥＧ部２４１
の中央グラフィックスエンジン、命令制御部２３５中の
プログラミングエージェント１０６４である。データ正
規化部１０６２と中央グラフィックスエンジンの動作
は、プログラミングエージェント１０６４への命令スト
リーム１０６４によって決定される。各命令ごとに、プ
ログラミングエージェント１０６４は復号処理を行い、
内部制御信号１０６７と１０６８をシステム中の他のブ
ロックに出力する。各入力データワード１０６９ごと
に、正規化部１０６２は現命令に基づいてデータのフォ
ーマットを行い、処理結果をさらなる処理が実行される
中央グラフィックスエンジン１０６３に送出する。
3.16 Data Normalization Circuit
FIG. 49 shows a computer graphics processor comprising three main functional blocks: the data normalizer 1062 in the pixel organizer 246 and operand organizers B and C (247, 248); the central graphics engine 1063, which is the main data path 242 or the JPEG unit 241; and the programming agent 1064 in the instruction control unit 235. The operation of the data normalizer 1062 and the central graphics engine 1063 is determined by the instruction stream supplied to the programming agent 1064. For each instruction, the programming agent 1064 performs decoding and outputs internal control signals 1067 and 1068 to the other blocks in the system. For each input data word 1069, the normalizer 1062 formats the data according to the current instruction and sends the result to the central graphics engine 1063 for further processing.

【０１４７】データ正規化部は、簡潔にはピクセルオー
ガナイザとオペランドオーガナイザＢ，Ｃを意味する。
これらのオーガナイザはデータ正規化回路を含み、入力
データを適切に正規化した後、ＪＰＥＧ符号化あるいは
主データパス中で中央グラフィックスエンジンに結果を
送出する。中央グラフィックスエンジン１０６３は、３
２ビットピクセルである標準フォーマットのデータに対
して動作する。従って、正規化部は入力データを３２ビ
ットピクセルフォーマットに変換する処理を行う。正規
化部への入力データワード１０６９も３２ビット幅を有
するが、パック要素あるいはアンパックバイトのいずれ
かのフォーマットであってもよい。パック要素入力スト
リームは、データオブジェクトが１，２，４，８，１６
バイト幅であるようなデータワード中での連続するオブ
ジェクトから成る。一方、アンパックバイト入力ストリ
ームは、８ビットのバイトのみが有効であるような３２
ビットのワードから成る。更に、正規化部で生成される
ピクセルデータ１１は、チャネルが８ビット幅で定義さ
れるような１，２，３，４個の有効チャネルから成る。In short, the data normalizer refers to the pixel organizer and operand organizers B and C. These organizers contain data normalization circuits which, after appropriately normalizing the input data, pass the results to the central graphics engine in the JPEG encoder or the main data path. The central graphics engine 1063 operates on data in a standard format, namely 32-bit pixels, so the normalizer converts its input data into the 32-bit pixel format. The input data words 1069 supplied to the normalizer are also 32 bits wide, but may be in either packed element or unpacked byte format. A packed element input stream consists of contiguous objects within data words, where each object is 1, 2, 4, 8, or 16 bits wide. An unpacked byte input stream, by contrast, consists of 32-bit words in which only one 8-bit byte is valid. The pixel data 11 generated by the normalizer consists of 1, 2, 3, or 4 valid channels, each 8 bits wide.

【０１４８】図５０は、データ正規化部１０６２の具体
的なハードウェア構成を示した図である。データ正規化
部１０６２は、ＦＩＦＯバッファ（ＦＩＦＯ）１０７
３、３２ビット入力レジスタ（ＲＥＧ１）、３２ビット
出力レジスタ（ＲＥＧ２）、正規化マルチプレクサ１０
７５，制御部１０７６から成る。入力データワード１０
６９はＦＩＦＯ１０７３に格納された後、（ＲＥＧ１）
１０７４にすべての入力ビットが所望出力フォーマット
に変換されるまでラッチされる。正規化マルチプレクサ
１０７５は、（ＲＥＧ１）１０７４中の値と（ＦＩＦ
Ｏ）１０７３の現出力とからのビットを選択すること
で、ＲＥＧ２にラッチされるピクセルを生成するような
３２組み合わせスイッチを備える。即ち、正規化マルチ
プレクサ１０７５はｘ［６３．．３２］とｘ［３１．．
０］とで示される２つの３２ビット入力ワード１０７
７、１０７８を入力とする。FIG. 50 shows a specific hardware configuration of the data normalizer 1062. The data normalizer 1062 comprises a FIFO buffer (FIFO) 1073, a 32-bit input register (REG1) 1074, a 32-bit output register (REG2), a normalizing multiplexer 1075, and a control unit 1076. Each input data word 1069 is stored in the FIFO 1073 and then latched into REG1 1074, where it is held until all of its input bits have been converted into the desired output format. The normalizing multiplexer 1075 comprises 32 combinational switches that generate the pixel to be latched into REG2 by selecting bits from the value in REG1 1074 and from the current output of the FIFO 1073. That is, the normalizing multiplexer 1075 takes as inputs two 32-bit words 1077 and 1078, denoted x[63..32] and x[31..0], where x[31..0] is the value held in REG1.

【０１４９】このような手法を用いることで、特に命令
処理においてＦＩＦＯが少なくとも２つの有効データワ
ードを有する場合に、装置の全体スループットを向上さ
せることができる。これは、データワードをメモリから
フェッチする手法によるものである。所望データワード
あるいはオブジェクトがＦＩＦＯバッファ中の隣接する
入力データワードに拡散あるいは「ラップ」されている
ことがあるが、入力レジスタ１０７４を用いることで、
ＦＩＦＯバッファ中の隣接データワードからの要素を用
いて完全な入力データを再構成することができ、主デー
タ操作処理段に先立って必要となるさらなる記憶装置や
ビットストリップ処理を省くことができる。類似のタイ
プの複数データワードが正規化部に入力されるような場
合には、このような構成が大きな利点となる。This arrangement improves the overall throughput of the device, particularly when the FIFO holds at least two valid data words during instruction processing. The benefit arises from the way data words are fetched from memory: a desired data object may be split, or "wrapped", across adjacent input data words in the FIFO buffer. Using the input register 1074, the complete input object can be reconstructed from elements of adjacent data words in the FIFO buffer, avoiding the additional storage and bit-stripping processing that would otherwise be needed before the main data manipulation stage. This configuration is particularly advantageous when multiple data words of a similar type are input to the normalizer.
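The point of presenting x[63..32] (FIFO head) and x[31..0] (REG1) together is that an object wrapped across two words can be read in one step. The Python sketch below models that 64-bit window; the LSB-based bit numbering and the function name are assumptions for illustration, since the actual selection in hardware follows the bit-offset rules given later.

```python
def read_window(reg1: int, fifo_head: int, pos: int, width: int) -> int:
    """Read `width` bits at LSB-based position `pos` of the 64-bit window.

    The window is x[63..32] = fifo_head and x[31..0] = reg1, so an object
    that wraps from REG1 into the next FIFO word is read in one step.
    """
    if pos < 0 or pos + width > 64:
        raise ValueError("field must lie inside the 64-bit window")
    x = ((fifo_head & 0xFFFFFFFF) << 32) | (reg1 & 0xFFFFFFFF)
    return (x >> pos) & ((1 << width) - 1)


# A 16-bit object whose low byte is the top byte of REG1 and whose high
# byte is the bottom byte of the next FIFO word.
print(hex(read_window(0xAB000000, 0x000000CD, 24, 16)))  # 0xcdab
```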

【０１５０】制御部は、ＲＥＧ１１０７４やＲＥＧ２
１０７６を更新するイネーブル信号ＲＥＧ１＿ＥＮ１
０８０やＲＥＧ２＿ＥＮ［３．．０］１０８１を生成す
るとともに、ＦＩＦＯ１０７３や正規化マルチプレク
サ１０７５を制御する信号をも生成する。図４９のプロ
グラミングエージェント１０６４はデータ正規化部１０
６２に対して次のような構成信号を送出する。ＦＩＦＯ
＿ＷＲ４信号、正規化ファクターｎ［２．．０］、ビッ
トオフセットｂ［２．．０］、チャネルカウントｃ
［１．．０］、外部フォーマット（Ｅ）といった信号で
ある。入力データは，有効データが存在するクロックサ
イクルごとにＦＩＦＯ＿ＷＲ信号１０８５を送出するこ
とにより、ＦＩＦＯ１０７３に書き込まれる。領域が得
られないときには、ＦＩＦＯはｆｉｆｏ＿ｆｕｌｌ状態
フラグ１０８６を送出する。３２ビット入力データが与
えられると、外部フォーマット信号を用いて、入力がパ
ックストリームフォーマット（Ｅ＝１）であるかアンパ
ックバイト（Ｅ＝０）であるかが調べられる。Ｅ＝１の
場合には、正規化ファクターはパックストリームの各要
素サイズとなる。即ち、ｎ＝０は１ビット幅の要素、ｎ
＝１は２ビット幅要素、ｎ＝２は４ビット幅要素、ｎ＝
３は８ビット幅要素、ｎ＞３は１６ビット幅要素を示
す。また、チャネルカウントは、所望有効バイト数でピ
クセルを生成するためにクロックサイクルごとにフォー
マットする連続した入力オブジェクトの最大数である。
具体的には、ｃ＝１は最小バイトのみが有効であるピク
セル、ｃ＝２は最小２バイトが有効であるピクセル、ｃ
＝３は最小３バイトが有効であるピクセル、ｃ＝０はす
べての４バイトが有効であるピクセルである。The control unit generates the enable signals REG1_EN 1080 and REG2_EN[3..0] 1081, which update REG1 1074 and REG2 1076, as well as the signals that control the FIFO 1073 and the normalizing multiplexer 1075. The programming agent 1064 of FIG. 49 sends the following configuration signals to the data normalizer 1062: the FIFO_WR signal, the normalization factor n[2..0], the bit offset b[2..0], the channel count c[1..0], and the external format flag E. Input data is written into the FIFO 1073 by asserting the FIFO_WR signal 1085 in every clock cycle in which valid data is present. When no space is available, the FIFO asserts the fifo_full status flag 1086. For each 32-bit input word, the external format signal indicates whether the input is in packed stream format (E = 1) or unpacked byte format (E = 0). When E = 1, the normalization factor gives the element size of the packed stream: n = 0 denotes 1-bit elements, n = 1 2-bit elements, n = 2 4-bit elements, n = 3 8-bit elements, and n > 3 16-bit elements. The channel count is the maximum number of consecutive input objects formatted in each clock cycle to produce a pixel with the desired number of valid bytes. Specifically, c = 1 denotes a pixel in which only the least significant byte is valid, c = 2 a pixel in which the two least significant bytes are valid, c = 3 a pixel in which the three least significant bytes are valid, and c = 0 a pixel in which all four bytes are valid.
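The encodings of n[2..0] and c[1..0] listed above decode as follows. This Python sketch turns them into an element width in bits and a mask of valid pixel bytes (bit i set means byte i is valid); the function names are hypothetical.

```python
def element_width(n: int) -> int:
    """Element size in bits encoded by the normalization factor n[2..0]."""
    if not 0 <= n <= 7:
        raise ValueError("n is a 3-bit field")
    return 1 << n if n <= 3 else 16


def valid_byte_mask(c: int) -> int:
    """Mask of valid pixel bytes for channel count c[1..0] (c = 0 means all four)."""
    if not 0 <= c <= 3:
        raise ValueError("c is a 2-bit field")
    count = 4 if c == 0 else c
    return (1 << count) - 1


print([element_width(n) for n in range(8)])        # [1, 2, 4, 8, 16, 16, 16, 16]
print([bin(valid_byte_mask(c)) for c in range(4)])  # ['0b1111', '0b1', '0b11', '0b111']
```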

【０１５１】パックストリームが８ビット幅以下の要素
から成る場合には、ビットオフセットがＲＥＧ１に格納
されている値であるｘ［３１．．０］中のデータ処理開
始位置を決定する。ビットオフセットがはじめの入力バ
イトの最大ビットからの偏移である場合には、出力デー
タバイトｙ［７．．０］の生成方法は以下の式で与えら
れる。ｎ＝０の場合、ｙ［ｉ］＝ｘ［７−ｂ］０≦ｉ≦７のときｎ＝１の場合、ｙ［ｉ］＝ｘ［７−ｂ］ｉ＝１，３，５，７のと
きｙ［ｉ］＝ｘ［６−ｂ］ｉ＝０，２，４，６のと
きｎ＝２の場合、ｙ［３］＝ｘ［７−ｂ］ｙ［２］＝ｘ［６−ｂ］ｙ［１］＝ｘ［５−ｂ］ｙ［０］＝ｘ［４−ｂ］ｙ［７］＝ｙ［３］ｙ［６］＝ｙ［２］ｙ［５］＝ｙ［１］ｙ［４］＝ｙ［０］ｎ＝３の場合、ｙ［ｉ］＝ｘ［ｉ］０≦ｉ≦７のときｎ＞３の場合、ｙ［７．．．０］＝ｘ［１５．．．８］出力データバイトｙ［１５．．８］，ｙ［２３．．１
６］，ｙ［３１．．２４］を生成する式も同様である。If the packed stream consists of elements 8 bits wide or narrower, the bit offset b determines the position in x[31..0], the value held in REG1, at which data processing starts. Where the bit offset is measured from the most significant bit of the first input byte, the output data byte y[7..0] is generated as follows:

$$
\begin{aligned}
n = 0:&\quad y[i] = x[7-b] \quad (0 \le i \le 7)\\
n = 1:&\quad y[i] = x[7-b] \quad (i = 1, 3, 5, 7), \qquad y[i] = x[6-b] \quad (i = 0, 2, 4, 6)\\
n = 2:&\quad y[3] = y[7] = x[7-b],\; y[2] = y[6] = x[6-b],\; y[1] = y[5] = x[5-b],\; y[0] = y[4] = x[4-b]\\
n = 3:&\quad y[i] = x[i] \quad (0 \le i \le 7)\\
n > 3:&\quad y[7..0] = x[15..8]
\end{aligned}
$$

The equations generating the output data bytes y[15..8], y[23..16], and y[31..24] are analogous.
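The equations for y[7..0] can be transcribed directly into code. The Python sketch below follows them term by term for a single output byte, with x as the REG1 value and b counted from the MSB of the first input byte; it assumes b is aligned to the element size (b ≤ 6 for n = 1, b ≤ 4 for n = 2), and the function name is hypothetical.

```python
def bit(x: int, k: int) -> int:
    """Return bit k of x."""
    return (x >> k) & 1


def normalize_byte(x: int, n: int, b: int) -> int:
    """Compute y[7..0] from x (the REG1 value), normalization factor n and offset b."""
    if n == 0:                                  # 1-bit element copied to all 8 bits
        return 0xFF if bit(x, 7 - b) else 0x00
    if n == 1:                                  # odd bits <- x[7-b], even bits <- x[6-b]
        hi, lo = bit(x, 7 - b), bit(x, 6 - b)
        return sum((hi if i % 2 else lo) << i for i in range(8))
    if n == 2:                                  # nibble placed in y[3..0] and y[7..4]
        nibble = (bit(x, 7 - b) << 3) | (bit(x, 6 - b) << 2) \
               | (bit(x, 5 - b) << 1) | bit(x, 4 - b)
        return (nibble << 4) | nibble
    if n == 3:                                  # y[i] = x[i]
        return x & 0xFF
    return (x >> 8) & 0xFF                      # n > 3: y[7..0] = x[15..8]


print(hex(normalize_byte(0xA0, 2, 0)), hex(normalize_byte(0x0B, 2, 4)))  # 0xaa 0xbb
```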

【０１５２】なお、以上の手法は、入力ストリームの要
素を入力し、必要な回数の複製処理を行い標準幅の出力
オブジェクトを生成することで、いかなる長さの出力ア
レイをも生成することができるように拡張できる。ま
た、入力要素の処理順は、リトルエンディアンでもビッ
グエンディアンでも良い。なお、上述の例では、常に処
理が入力バイトの最大ビットから始まるため、ビッグエ
ンディアン要素順を用いている。リトルエンディアン順
を用いる場合には、ビットオフセットを入力バイトの最
小ビットに対する値として再定義する必要がある。ま
た、入力要素幅が標準出力幅以上のときには、出力要素
は入力要素を切り捨てる、一般には適当な数の最小ビッ
トを削除することによって生成される。上式では、１６
ビットデータオブジェクトの最大バイトを選択すること
により、１６ビット入力要素を切り捨てて８ビット幅標
準出力を生成している。The above method can be extended to generate output arrays of any length, by taking elements of the input stream and replicating them as many times as necessary to produce output objects of the standard width. The input elements may be processed in either little-endian or big-endian order. The example above uses big-endian element order, since processing always starts from the most significant bit of the input byte. If little-endian order is used, the bit offset must be redefined relative to the least significant bit of the input byte. When the input element is wider than the standard output width, the output element is generated by truncating the input element, typically by discarding an appropriate number of least significant bits. In the equations above, a 16-bit input element is truncated to an 8-bit standard output by selecting the most significant byte of the 16-bit data object.

【０１５３】図５０の制御部はｎ［２．．０］とｃ
［１．．０］の復号を行い、これらとｂ［２．．０］と
を用いて正規化マルチプレクサのための選択信号やＲＥ
Ｇ１やＲＥＧ２のためのイネーブル信号を生成する。ま
た、ＦＩＦＯは命令中において空になることもあるた
め、制御部はＲＥＧ１中に入力データを選択する現在の
ビット位置ｉｎ＿ｂｉｔ［４．．０］と、出力データの
書き込みを始める現在のバイト位置ｏｕｔ＿ｂｙｔｅ
［４．．０］を記憶するカウンタを備える。制御部は、
処理が終了した時点で、ｉｎ＿ｂｉｔ［４．．０］の値
とＲＥＧ１の最終オブジェクトの位置とを比較すること
で入力ワードを検出し、ＦＩＦＯが空でない１クロック
サイクルにおいてＦＩＦＯ＿ＲＤ信号を送出することで
ＦＩＦＯ読み出し動作を開始する。信号ｆｉｆｏ＿ｅｍ
ｐｔｙ，ｆｉｆｏ＿ｆｕｌｌはＦＩＦＯ状態フラグであ
り、ＦＩＦＯが有効なデータを有していないときにｆｉ
ｆｏ＿ｅｍｐｔｙ＝１、ＦＩＦＯがフルのときにｆｉｆ
ｏ＿ｆｕｌｌ＝１となる。ＦＩＦＯ＿ＲＤが送出された
クロックサイクルにおいて、ＲＥＧ１＿ＥＮの送出さ
れ、新しいデータがＲＥＧ１に取り込まれる。ＲＥＧ２
のイネーブル信号は、それぞれが出力レジスタの各バイ
トに対応ごとに４つある。制御部は、復号されたｃ
［１．．０］、ＲＥＧ１内の処理待機中の有効要素数、
ＲＥＧ２において未使用チャネル数の３つの値中での最
小値をとることで、ＲＥＧ２＿ＥＮ［３．．０］を計算
する。Ｅ＝０の場合には、ＲＥＧ１中には一つの有効要
素しか存在しない。ＲＥＧ２を占めるチャネル数が復号
されたｃ［３．．０］と等しい場合に、完全な出力ワー
ドが得られる。The control unit of FIG. 50 decodes n[2..0] and c[1..0] and uses them, together with b[2..0], to generate the select signals for the normalizing multiplexer and the enable signals for REG1 and REG2. Because the FIFO may become empty during an instruction, the control unit also contains counters that hold the current bit position in_bit[4..0] in REG1 from which input data is selected, and the current byte position out_byte[4..0] at which writing of output data begins. The control unit detects when processing of the input word has finished by comparing in_bit[4..0] with the position of the last object in REG1, and then starts a FIFO read by asserting the FIFO_RD signal in a clock cycle in which the FIFO is not empty. The signals fifo_empty and fifo_full are FIFO status flags: fifo_empty = 1 when the FIFO holds no valid data, and fifo_full = 1 when the FIFO is full. In the clock cycle in which FIFO_RD is asserted, REG1_EN is also asserted and new data is loaded into REG1. There are four REG2 enable signals, one for each byte of the output register. The control unit computes REG2_EN[3..0] by taking the minimum of three values: the decoded c[1..0], the number of valid elements in REG1 still awaiting processing, and the number of unused channels in REG2. When E = 0, REG1 contains only one valid element. A complete output word is obtained when the number of channels occupied in REG2 equals the decoded c.
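The minimum-of-three rule for REG2_EN can be sketched as follows. This Python model assumes that "unused channels in REG2" means the channels of the current pixel not yet written (decoded c minus channels already filled), and that the enabled byte lanes start just above the filled ones. Both readings are assumptions, and the function name is hypothetical.

```python
def reg2_enables(c_decoded: int, pending_in_reg1: int, filled: int) -> tuple[int, int]:
    """Return (bytes written this cycle, REG2_EN[3..0] mask).

    c_decoded:       channels per output pixel (1..4)
    pending_in_reg1: valid elements in REG1 still awaiting processing
    filled:          channels of REG2 already written for this pixel
    """
    unused = c_decoded - filled                  # assumed meaning of "unused channels"
    count = max(0, min(c_decoded, pending_in_reg1, unused))
    mask = ((1 << count) - 1) << filled          # enable the next `count` byte lanes
    return count, mask


print(reg2_enables(4, 3, 2))  # (2, 12): lanes 2 and 3 enabled, pixel complete
```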

【０１５４】本発明の好適な実施例では、制御部と正規
化マルチプレクサにおいて用いられるオフセットの一部
のみを用いるなど、ビットオフセットパラメータを制限
する機能を付加することにより、図５０の装置が占める
回路領域を大幅に低減することができる。このオフセッ
ト制限機能は正規化ファクターに依存するものであり、
以下の式に応じて動作する。In the preferred embodiment of the invention, the circuit area occupied by the apparatus of FIG. 50 can be significantly reduced by adding a function that restricts the bit offset parameter, so that only a subset of the possible offsets is used in the control unit and the normalizing multiplexer. This offset restriction depends on the normalization factor and operates according to the following equation.

【０１５５】ｂ＿ｔｒｕｎｃ［２．．．０］＝０ｎ≧３の場合＝ｂ［２．．．０］ｎ＝０の場合＝ｂ［２．．．１］ｎ＝１の場合＝ｂ［２］＆”００” ｎ＝２の場合（「＆」はビットごとの結合処理を示す）このような処理により、図５０においてＭＵＸ０、ＭＵ
Ｘ１．．．ＭＵＸ３１で示されている各正規化マルチプ
レクサのサイズが、制限機能を用いないときの３２−１
からビットオフセット制限を行ったときの最大サイズ２
０−１まで低減される。このサイズ縮小により回路速度
の向上も図ることができる。

$$
b_{\text{trunc}}[2..0] =
\begin{cases}
0 & n \ge 3\\
b[2..0] & n = 0\\
b[2..1] & n = 1\\
b[2] \,\&\, \text{"00"} & n = 2
\end{cases}
$$

where "&" denotes bitwise concatenation. With this restriction, each normalizing multiplexer, shown as MUX0, MUX1, ..., MUX31 in FIG. 50, shrinks from 32-to-1 without the restriction to at most 20-to-1 with it. The reduction in size also improves circuit speed.
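The offset restriction can be sketched in Python as below. The sketch reads b[2..1] and b[2] & "00" as keeping those bits in place and zeroing the lower ones, so that the offset stays aligned to the element size; that reading of the notation is an assumption, and the function name is hypothetical.

```python
def truncate_offset(b: int, n: int) -> int:
    """Restrict bit offset b[2..0] according to normalization factor n.

    Assumes b[2..1] and b[2] & "00" mean those bits are kept in place
    and the lower bits are forced to zero.
    """
    b &= 0b111
    if n >= 3:
        return 0
    if n == 0:
        return b            # 1-bit elements may start at any bit
    if n == 1:
        return b & 0b110    # 2-bit elements start at even offsets
    return b & 0b100        # n == 2: 4-bit elements start at 0 or 4


print(sorted({truncate_offset(b, 1) for b in range(8)}))  # [0, 2, 4, 6]
```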

【０１５６】以上のように、好適な実施例では、データ
をいくつかの正規化形式に変換する効率的な回路を備え
る。３．１７アクセラレータカードの画像処理動作図２と表２において、命令制御部２３５はコプロセッサ
２２４において実行される動作に帰着される命令を「実
行する」。実行される命令は、主データパス部２４２に
おいて有用な機能が実行されるような種々の命令を含
む。これらの有用な命令の１つが合成処理である。As described above, the preferred embodiment provides an efficient circuit for converting data into several normalized forms.
3.17 Image Processing Operations of the Accelerator Card
Referring to FIG. 2 and Table 2, the instruction control unit 235 "executes" instructions, each of which results in operations being performed in the coprocessor 224. These include various instructions that cause useful functions to be performed in the main data path unit 242. One such instruction is compositing.

[0157] 3.17.1 Compositing
FIG. 51 shows the compositing model implemented in the main data path unit 242. The compositing model 462 generally has three data input sources and an output data sink 463. One input source is pixel data 464 read from the same destination in memory as the output 463. Another is the instruction operand 465, used as a data source for color and opacity, which may each be flat, blend, pixel, or tile. Flats and blends are generated internally by the blend generator 467, because generating them internally is faster than fetching them through the input/output interface. Finally, the inputs also include attenuation data 466, which attenuates the operand data 465.

[0158] As mentioned above, pixel data normally consists of four channels, each one byte wide, with the byte at the highest address being the opacity channel. For the operation and usefulness of compositing, see standard articles such as Thomas Porter and Tom Duff, "Compositing Digital Images", Computer Graphics, vol. 18, no. 3, July 1984.

[0159] The coprocessor can also work with premultiplied data. Premultiplication is the process of multiplying each color channel by the opacity channel in advance. For this purpose, two optional premultipliers 468 and 469 are provided; when required, they multiply the color data by the opacity channels 470 and 471 to produce premultiplied outputs 472 and 473. The compositing unit 475 combines the two inputs according to the current instruction data. Table 11 below lists the compositing operators.

[0160] Compositing operations

[0161]

[Table 11]

[0162] Here, (aco, ao) denotes a premultiplied pixel with color ac and opacity ao. R is an offset value, and "wc" is the wrapping/clamping operator described below. Note that the compositing unit 475 also provides the reverse of each operator in the table above. The clamp/wrap unit 476 clamps or wraps data into the range 0 to 255. If required, the data can also undergo optional "unpremultiplication" 477 to restore the original pixel values. Finally, the output data 463 is generated and written back to memory.
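Because Table 11 is not reproduced here, the pipeline of FIG. 51 (premultiply, composite, clamp, unpremultiply) can only be illustrated with a standard operator. The sketch below uses the Porter-Duff "over" operator from the article cited in [0158], on 8-bit RGBA tuples with opacity last; whether the coprocessor uses exactly this integer form and round-to-nearest arithmetic is an assumption, and all function names are hypothetical.

```python
def premultiply(p):
    """(r, g, b, a) with 8-bit channels -> (r*a/255, g*a/255, b*a/255, a), rounded."""
    r, g, b, a = p
    return tuple((c * a + 127) // 255 for c in (r, g, b)) + (a,)


def unpremultiply(p):
    """Inverse of premultiply up to rounding; a fully transparent pixel maps to zero."""
    r, g, b, a = p
    if a == 0:
        return (0, 0, 0, 0)
    return tuple(min(255, (c * 255 + a // 2) // a) for c in (r, g, b)) + (a,)


def over(a, b):
    """Porter-Duff 'a over b' on premultiplied pixels: a_c + b_c * (255 - a_alpha) / 255.

    min(255, ...) saturates the result, loosely playing the role of the clamp unit.
    """
    inv = 255 - a[3]   # fraction of b that shows through a, in 1/255 units
    return tuple(min(255, ca + (cb * inv + 127) // 255) for ca, cb in zip(a, b))


src = premultiply((200, 100, 50, 128))   # (100, 50, 25, 128)
dst = premultiply((0, 0, 200, 255))      # (0, 0, 200, 255)
print(unpremultiply(over(src, dst)))     # (100, 50, 125, 255)
```

Unpremultiplying is not lossless in 8-bit arithmetic: `unpremultiply((100, 50, 25, 128))` gives `(199, 100, 50, 128)` rather than the original red value of 200, which is one reason the step is optional.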

[0163] FIG. 52 shows the instruction format sent to the main data path unit for compositing. If the X field in the main opcode is 1, an addition operator from the table above is applied; if it is 0, one of the other (non-addition) operators is applied. The Pa field indicates whether the first data stream 464 (FIG. 51) is to be premultiplied, the Pb field whether the second data stream 465 is to be premultiplied, and the Pr field whether the result is to be unpremultiplied by unit 477. The C field indicates whether results that overflow or underflow the range 0-255 are wrapped or clamped, and the "com-code" field indicates which operator to apply. The addition operators can also use an offset register (mdp_por); this offset is subtracted from the result of the addition before wrapping or clamping is performed. For the addition operators, the com-code field instead indicates, for each channel, whether the offset register is enabled.
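The addition path described above (add, subtract the offset, then wrap or clamp, with the offset enabled per channel) can be sketched as follows. This is a hypothetical model: the function names are illustrative, and the mapping of bit i of the enable mask to channel i is an assumption, since the bit layout of FIG. 52 is not given here.

```python
def add_channel(a: int, b: int, offset: int = 0, clamp: bool = True) -> int:
    """One 8-bit channel of an addition operator with offset and wrap/clamp.

    The offset (e.g. from a register such as mdp_por) is subtracted before the
    wrap/clamp step; clamp=True saturates to 0..255, clamp=False wraps mod 256.
    """
    s = a + b - offset
    if clamp:
        return max(0, min(255, s))
    return s & 0xFF   # wrap: keep the low 8 bits (also correct for negative s)


def add_pixel(pa, pb, offset, enable_mask, clamp=True):
    """Apply add_channel to RGBA tuples; bit i of enable_mask enables the offset on channel i."""
    return tuple(
        add_channel(a, b, offset if (enable_mask >> i) & 1 else 0, clamp)
        for i, (a, b) in enumerate(zip(pa, pb))
    )


print(add_pixel((200, 10, 0, 255), (100, 20, 0, 0), 16, 0b0011))   # (255, 14, 0, 255)
```

Clamping is usually the right choice for images, while wrapping preserves the low-order bits, which is useful when the channels carry non-color data.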

[0164] The standard instruction word encoding 280 of FIG. 10, described earlier, is modified for compositing operands. Because the output data is written back to the same destination as the original source, operand A is always identical to the result word. The operand A field can therefore be used together with the operand B field, so that operand B can be described at greater length. As with other instructions, the A descriptor in the instruction describes the input format and the R descriptor specifies the output format.

[0165] FIG. 53 shows, as a first example 470, the instruction word format of the blend instruction. A blend is defined by a start value 471 and an end value 472 for each channel. Similarly, FIG. 54 shows the tile instruction format, defined by a tile address 476, a start offset 477 and a length 478. All tile addresses and sizes are specified in bytes. Tiling is performed modularly (it wraps around), and FIG. 55 illustrates fields 476 to 478 of FIG. 54: the tile address 476 gives the start address of the tile in memory, the tile start offset 477 gives the first byte used at the start of the tile, and the tile length 478 gives the total length of the tile after which it wraps.
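The start/end blend parameters and the modular tile addressing can be illustrated with two small helpers. This is a sketch under stated assumptions: the source does not give the blend formula, so a linear ramp rounded to nearest is assumed. The tile helper simply applies the modular wrap of [0165] to the three fields. Both function names are hypothetical.

```python
def blend_ramp(start: int, end: int, n: int) -> list[int]:
    """n values of one channel ramping linearly from start to end (linear ramp assumed)."""
    if n == 1:
        return [start]
    d = n - 1
    # Integer rounding to nearest; the first value is start and the last is exactly end.
    return [start + ((end - start) * k + d // 2) // d for k in range(n)]


def tile_bytes(memory: bytes, tile_address: int, start_offset: int,
               length: int, count: int) -> list[int]:
    """Read count bytes from a tile that starts at start_offset and wraps every length bytes."""
    return [memory[tile_address + (start_offset + i) % length] for i in range(count)]


print(blend_ramp(0, 255, 4))                          # [0, 85, 170, 255]
print(tile_bytes(bytes(range(100)), 10, 2, 4, 6))     # [12, 13, 10, 11, 12, 13]
```

Generating these values internally only needs the start/end pair or the three tile fields, which is why flats and blends are cheaper to produce inside the coprocessor than to fetch.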

[0166] Referring to FIG. 51, the color components and opacity may be attenuated by the attenuation value 466. The attenuation value is obtained in one of the following three ways:
1. Flat attenuation: software specifies the attenuation factor in the operand C word of the instruction.
2. Bitmap attenuation, in which a 1 bit means on and a 0 bit means off: software specifies the address of the bitmap in the operand C word of the instruction.
3. Bytemap attenuation: the address of a bytemap is given in the operand C word of the instruction.

[0167] Because the attenuation value is an unsigned integer from 0 to 255, a premultiplied color channel is multiplied by the attenuation factor by computing
$$
C_{oa} = \frac{C_o \times A}{255}
$$
where A is the attenuation factor, Co is the premultiplied color channel, and Coa is the attenuated result.
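A sketch of the attenuation formula together with the three ways of obtaining A listed in [0166] follows. It is illustrative only: round-to-nearest division, a bitmap "on" value of 255, and least-significant-bit-first bit order within each bitmap byte are all assumptions, and the function names are hypothetical.

```python
def attenuate(co: int, a: int) -> int:
    """Coa = Co * A / 255 for 8-bit Co and A (round-to-nearest assumed)."""
    return (co * a + 127) // 255


def attenuation_for_pixel(mode: str, operand_c: int, i: int, memory: bytes = b"") -> int:
    """Attenuation factor A for pixel i under a hypothetical model of the three sources."""
    if mode == "flat":
        return operand_c & 0xFF                      # factor held directly in operand C
    if mode == "bitmap":
        byte = memory[operand_c + i // 8]            # operand C holds the bitmap address
        return 255 if (byte >> (i % 8)) & 1 else 0   # 1 = on (full), 0 = off; LSB first assumed
    if mode == "bytemap":
        return memory[operand_c + i]                 # one factor byte per pixel
    raise ValueError(f"unknown attenuation mode: {mode}")


mem = bytes([0b00000101, 7, 8, 9])
print([attenuation_for_pixel("bitmap", 0, i, mem) for i in range(3)])   # [255, 0, 255]
print(attenuate(255, 128))                                              # 128
```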

[0168] 3.17.2 Color Space Conversion Instructions
Referring to FIG. 2 and Table 2, color conversion is performed mainly by the main data path unit 242 and the data cache 230. Color space conversion converts data from a first color space format (for example, one suitable for an RGB color display) to a second color space format (for example, one suitable for CMY or CMYK printing). The conversion is designed to support any color space and can be used for any function, from one-dimensional to multidimensional. Via the CBus 231, the instruction controller 235 configures the main data path unit 242, data cache controller 240, input interface switch 252, pixel organizer 246, MUV buffer 250, operand organizer B 247, operand organizer C 248 and result organizer 249, and controls them to operate in color conversion mode. In this mode, an input image consisting of multiple lines of pixels is sent to the main data path unit 242 as a pixel stream, one pixel line at a time. The main data path unit 242 (FIG. 2) receives the pixel stream from the input interface switch 252 via the pixel organizer 246 and performs the color space conversion pixel by pixel. Beforehand, the interval table and fraction table are loaded into the MUV buffer 250, and the color conversion table is loaded into the data cache 230. The main data path 242 accesses these tables through operand organizers B and C, converts each pixel, for example from the RGB color space to the CMY or CMYK color space, and sends the converted pixels to the result organizer 249. Under the control of the instruction controller 235, the main data path unit 242, data cache 230, data cache controller 240 and the other devices mentioned above operate in either single-output general color space (SOGCS) conversion mode or multiple-output general color space (MOGCS) conversion mode. For details of the data cache controller 240 and the data cache 230, see the section "Data cache controller and cache" 240, 230 (FIG. 2).

[0169] Accurate color space conversion is a complex, nonlinear process. For example, converting RGB pixels to a single primary color component (e.g., cyan) of the CMYK color space is linear in theory, but in practice nonlinearities arise, mainly in the output device that renders the color components of the pixels. The same applies to conversion from RGB pixels to the other primary color components (yellow, magenta, black) of the CMYK color space. Nonlinear color space conversion is therefore generally used to compensate for the nonlinearity of each color component. Because of this nonlinearity, complex transfer functions are built in or lookup tables are used. For example, for an input color space of 24-bit RGB pixels, a lookup table that maps these pixels to the 8-bit cyan component of the CMYK color space requires 16 megabytes or more. Similarly, a lookup table that maps 24-bit RGB pixels to all four 8-bit primary components of the CMYK color space requires 64 megabytes or more, an enormous capacity. Instead, the main data path 242 (FIG. 2) uses a lookup table stored in the data cache 230 that associates coarse output color values with points in the input color space, and obtains intermediate outputs by interpolating between these output color values.
a. Single-Output General Color Space (SOGCS) Conversion Mode
In both the single-output (SOGCS) and multiple-output (MOGCS) color conversion modes, the RGB color space consists of 24-bit pixels with 8-bit red, green and blue components. Each RGB dimension of the RGB color space is divided into 15 intervals, whose lengths are set to follow the inverse of the nonlinearity of the printer's RGB-to-CMYK transfer function: where the transfer function is strongly nonlinear the intervals are made short, and where it is nearly linear they are made long. To identify the nonlinear regions of the transfer function, it is desirable to characterize the color space of each output printer accurately. However, the transfer function can also be approximated or modeled from know-how and from measured characteristics of the printer type (for example, inkjet). For each color channel of an input pixel, the position of the component value among the 15 intervals is determined. The main data path unit 242 uses two tables for this: one to determine which interval contains the input component value, and one to determine the position of the value within that interval. Different tables may, of course, be used for output printers with different transfer functions.

[0170] As described above, each RGB dimension is divided into 15 intervals. The RGB color space therefore forms a three-dimensional lattice partitioned by the intervals, and the pixels at the interval endpoints form a coarse sampling of the input color space. Only the output color values corresponding to these interval endpoints are stored in the lookup table. The output color value of an input pixel is therefore obtained by determining the output color values at the endpoints of the interval containing the input pixel and interpolating between them according to the pixel's position within the interval. This approach reduces the need for a large memory.
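The memory saving can be checked with a short calculation, assuming one byte per stored output component. The full tables match the 16 MB and 64 MB figures of [0169]. The coarse lattice has 15 + 1 = 16 endpoints per axis, which gives 4096 entries. That matches the 12-bit address of [0177] and the eight 512-entry banks of [0181].

```python
ENTRY_BYTES = 1                          # one 8-bit output component per entry

full_single = 2**24 * ENTRY_BYTES        # every 24-bit RGB input -> one component (e.g. cyan)
full_cmyk = 2**24 * 4 * ENTRY_BYTES      # every 24-bit RGB input -> C, M, Y and K
points_per_axis = 15 + 1                 # 15 intervals are bounded by 16 endpoints
coarse_single = points_per_axis**3 * ENTRY_BYTES

print(full_single, full_cmyk, coarse_single)   # 16777216 67108864 4096
```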

[0171] FIG. 56 shows an example 480 of determining, for an input RGB color pixel, the corresponding interval and the position within that interval. The conversion is performed for each 8-bit input color channel of the 24-bit input pixel, using an interval table 482 and an intra-interval position table 483. In FIG. 56, the 8-bit input color component 481 is the decimal value 4 in binary form, and this value is used as the lookup index into both the interval table and the intra-interval position table. The interval table 482 outputs, as 4 bits, which of the intervals 0 to 14 contains the input component value 481. Similarly, the intra-interval position table 483 indicates the position of the input value 481 within that interval. The intra-interval position table stores 8-bit values from 0 to 255, each interpreted as a fraction of 256. Thus, for the input value 481 of decimal 4, looking up the interval table 482 produces the output 0, and looking up the intra-interval position table 483 produces the output 160, representing the fraction 160/256. As tables 482 and 483 show, the intervals are not of uniform length; as described above, the interval lengths are determined by the nonlinearity of the transfer function.
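The two per-channel tables can be generated from the 16 interval endpoints of a channel. The sketch below does this for hypothetical breakpoints: equal-length intervals are used only for illustration, the breakpoints of FIG. 56 are not known, and a real table would use shorter intervals where the printer response is most nonlinear. It assumes the position is the floor of (v - lo) * 256 / (hi - lo), saturated to 255 for the top value 255.

```python
from bisect import bisect_right


def build_tables(breakpoints):
    """Interval table and intra-interval position table for one 8-bit channel.

    breakpoints: 16 strictly increasing values, the first 0 and the last 255,
    delimiting the 15 intervals (hypothetical values, not those of FIG. 56).
    """
    assert len(breakpoints) == 16 and breakpoints[0] == 0 and breakpoints[-1] == 255
    interval_table, position_table = [], []
    for v in range(256):
        # Interval i covers [breakpoints[i], breakpoints[i+1]); 255 goes in interval 14.
        i = min(bisect_right(breakpoints, v) - 1, 14)
        lo, hi = breakpoints[i], breakpoints[i + 1]
        interval_table.append(i)                                       # 4-bit value 0..14
        position_table.append(min(255, (v - lo) * 256 // (hi - lo)))   # fraction of 256
    return interval_table, position_table


uniform = [17 * k for k in range(16)]        # equal-length intervals, for illustration only
intervals, positions = build_tables(uniform)
print(intervals[4], positions[4])            # 0 60
```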

[0172] As described above, using the interval table and the intra-interval position table for each RGB color component yields three interval outputs and three intra-interval position outputs. The interval and intra-interval position tables for each color component are loaded into the MUV buffer 250 (FIG. 2) and accessed by the main data path 242 when needed. FIG. 57 shows the organization of the MUV buffer 250 for color conversion. The MUV buffer 250 (FIG. 57) is divided into three regions 488, 489 and 490, one for each color component. Each region (e.g., 488) is further divided into a 4-bit interval table and an 8-bit intra-interval position table. The main data path unit 242 reads a 12-bit output 492 from the MUV buffer 250 for each input color channel. For the single input color component of decimal 4 in the example above, the 12-bit output is 0000 10100000, that is, the 4-bit interval 0000 followed by the 8-bit position 10100000 (decimal 160).
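The 12-bit word read for each channel can be modelled as a simple bit packing. This sketch assumes the 4-bit interval occupies the upper bits and the 8-bit position the lower bits, matching the order in which [0172] lists the two tables; the helper names are hypothetical.

```python
def muv_entry(interval: int, position: int) -> int:
    """Pack a 4-bit interval and an 8-bit intra-interval position into one 12-bit word."""
    assert 0 <= interval < 16 and 0 <= position < 256
    return (interval << 8) | position


def unpack_entry(entry: int) -> tuple[int, int]:
    """Split a 12-bit word back into (interval, position)."""
    return entry >> 8, entry & 0xFF


word = muv_entry(0, 160)
print(format(word, "012b"))   # 000010100000
print(unpack_entry(word))     # (0, 160)
```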

[0173] FIG. 58 shows an example of the interpolation. The main task of the interpolation is to map from one three-dimensional space 500 (for example, the RGB color space) to another color space (for example, CMY or CMYK). The pixels P0 to P7 form a coarse sampling of the RGB input color space and have corresponding output color values CV(P0) to CV(P7) in the output color space. The output color component value of an input pixel Pi lying among the pixels P0 to P7 is determined as follows. First, the endpoints P0, P1, ..., P7 of the interval cell surrounding the input pixel Pi are determined. Next, the intra-interval position components frac_r, frac_g and frac_b are determined. Finally, the output value is interpolated between the output color values CV(P0) to CV(P7) at the endpoints P0 to P7, using the intra-interval position components.

[0174] The interpolation first performs one-dimensional interpolation in the red (R) direction, computing temp11, temp12, temp13 and temp14 from the following equations:
$$
\begin{aligned}
\mathrm{temp}_{11} &= CV(P_0) + \mathit{frac}_r\,\bigl(CV(P_1) - CV(P_0)\bigr)\\
\mathrm{temp}_{12} &= CV(P_2) + \mathit{frac}_r\,\bigl(CV(P_3) - CV(P_2)\bigr)\\
\mathrm{temp}_{13} &= CV(P_4) + \mathit{frac}_r\,\bigl(CV(P_5) - CV(P_4)\bigr)\\
\mathrm{temp}_{14} &= CV(P_6) + \mathit{frac}_r\,\bigl(CV(P_7) - CV(P_6)\bigr)
\end{aligned}
$$
Next, it performs one-dimensional interpolation in the green (G) direction, computing temp21 and temp22 with the following equations.

[0175]
$$
\begin{aligned}
\mathrm{temp}_{21} &= \mathrm{temp}_{11} + \mathit{frac}_g\,(\mathrm{temp}_{12} - \mathrm{temp}_{11})\\
\mathrm{temp}_{22} &= \mathrm{temp}_{13} + \mathit{frac}_g\,(\mathrm{temp}_{14} - \mathrm{temp}_{13})
\end{aligned}
$$
Finally, the final one-dimensional interpolation in the blue (B) direction gives the final color output value:
$$
\mathit{final} = \mathrm{temp}_{21} + \mathit{frac}_b\,(\mathrm{temp}_{22} - \mathrm{temp}_{21})
$$
The input and output ranges often do not match. If the output range is narrower than the input range, the range often has to be clamped at both ends, and undesirable distortion often occurs when converting colors near the ends of the range. FIG. 59 illustrates an example of this problem, showing a one-dimensional mapping from input range values to output range values. Suppose the output values for the input values are given by points 510 and 511. If the maximum output value is clamped at point 512, point 511 must take this clamped value. Interpolating between points 510 and 511 then gives interpolation line 515, and input point 516 maps to output value 517. However, if the output value without the range constraint would be point 518, this is not necessarily the optimal color mapping: the interpolation line between 510 and 518 gives output value 519 for input point 516. The difference between the two output values 517 and 519 often produces visible distortion, particularly when printing colors near the end of the range. To avoid this problem, the main data path unit can also compute in an extended output color space and scale or clamp the result into the appropriate range using the following formula:

[0176] Referring to FIG. 58, the interpolation is performed both in SOGCS conversion mode, which converts RGB pixels to a single output color component (for example, cyan), and in MOGCS mode, which converts RGB pixels to all output color components simultaneously. When color conversion is applied to every pixel of an image, several million pixels are converted independently. High-speed operation therefore requires that the eight values (P0-P7) surrounding the input value be found quickly.

[0177] As described with reference to FIG. 57, the main data path unit 242 extracts, for each input color channel, a 12-bit output consisting of a 4-bit interval part and an 8-bit intra-interval position part. The main data path unit 242 concatenates the 4-bit interval parts of the red, green and blue channels to form a single 12-bit address (IR, IG, IB), shown as 520 in FIG. 60. FIG. 60 is a data flow diagram showing how a single output color component 563 is obtained from the single 12-bit address 520. The 12-bit address 520 is first sent to an address generator of the data cache controller 240, such as the generator 1881 (FIG. 141), which produces eight 9-bit line/byte addresses 521 for the memory banks (B0, B1, ..., B7). The data cache 230 (FIG. 2) is divided into eight independent memory banks 522, each independently addressed by one of the eight line/byte addresses. The address generator converts the 12-bit address 520 into the eight line/byte addresses according to the following table.

[0178] Address synthesis in SOGCS mode

[0179]

[Table 12A]

[0180] Here, BIT[8:6], BIT[5:3] and BIT[2:0] denote bits 6-8, 3-5 and 0-2 of the 9-bit bank address, respectively, and R[3:1], G[3:1] and B[3:1] denote bits 1 to 3 of the 4-bit intervals IR, IG and IB of the 12-bit address 520. The 12-bit to 9-bit mapping is explained in detail for memory bank 5 of Table 12: bits 1-3 of the 4-bit red interval Ir in the 12-bit address 520 are mapped to bits 6-8 of the 9-bit address B5; bits 1-3 of the 4-bit green interval Ig, with an addition applied, are mapped to bits 3-5 of B5; and bits 1-3 of the 4-bit blue interval Ib are mapped to bits 0-2 of B5.

[0181] The eight line/byte addresses 521 are used as addresses into the corresponding memory banks 522 of 512 × 8 bits, and the corresponding 8-bit output color component 523 is latched from each memory bank 522. With this addressing scheme, the output color values CV(P0) to CV(P7) for the endpoints P0 to P7 may lie at different addresses in the memory banks. For example, the 12-bit address 0000 0000 0000 gives the same bank address 000000000 in every bank, whereas the 12-bit address 0000 0000 0001 gives different bank addresses: 000000000 for banks 7, 5, 3 and 1, and 000000001 for banks 6, 4, 2 and 0. In this way, the eight output color values CV(P0) to CV(P7) surrounding an input pixel value are obtained simultaneously, one from each memory bank, and no output color value has to be duplicated across the memory banks.
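Table 12A itself is not reproduced here. However, the bank 5 description in [0180] and the two examples above are consistent with a parity-interleaved layout. In that layout, bank k holds the lattice points whose (r, g, b) coordinates have parities ((k >> 2) & 1, (k >> 1) & 1, k & 1), and it stores each point at address (r >> 1, g >> 1, b >> 1). The sketch below is a reconstruction under that assumption, not the patent's table, and the function name is hypothetical. For each axis, bank k receives whichever of I and I + 1 has the bank's parity. This yields I[3:1] when the parities allow it and I[3:1] + 1 otherwise, which is the "addition" seen for green in bank 5.

```python
def bank_addresses(ir: int, ig: int, ib: int) -> list[int]:
    """9-bit address for each of the 8 banks, given the 4-bit intervals IR, IG, IB (0..14).

    Assumed layout: bank k stores lattice points whose (r, g, b) parities equal
    ((k >> 2) & 1, (k >> 1) & 1, k & 1), at address (r >> 1, g >> 1, b >> 1).
    """
    addrs = []
    for k in range(8):
        fields = []
        for i, parity in ((ir, (k >> 2) & 1), (ig, (k >> 1) & 1), (ib, k & 1)):
            corner = i if (i & 1) == parity else i + 1   # I or I+1 on this axis
            fields.append(corner >> 1)                    # I[3:1], or I[3:1] + 1 when needed
        r, g, b = fields
        addrs.append((r << 6) | (g << 3) | b)             # BIT[8:6], BIT[5:3], BIT[2:0]
    return addrs


print(bank_addresses(0, 0, 1))   # [1, 0, 1, 0, 1, 0, 1, 0]
```

Which corner Pk a given bank returns depends on the parities of IR, IG and IB, so the eight bank outputs still have to be routed to the correct interpolator inputs; this sketch does not model that routing.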

[0182] FIG. 61 shows the organization of a memory bank of the data cache 230 used in single-color conversion mode. Each memory bank consists of 128 line entries, and each line entry is 32 bits long, made up of four 8-bit memories 533 to 536. The upper 7 bits of the memory address 521 select the corresponding data line in the memory bank, which is latched 542 as the memory bank output. The lower 2 bits are a byte address that is input to the multiplexer 543 and selects 544 which of the four 8-bit entries is output. Data for each of the eight memory banks is output every clock cycle and sent to the main data path unit 242. That is, the data cache controller receives the 12-bit byte address from the operand organizer 248 (FIG. 2) and outputs the 8-bit output color values used for interpolation in the main data path unit 242 to the operand organizers 247 and 248.
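The line/byte split of FIG. 61 amounts to a shift and a mask. In the sketch below, a bank is modelled as a list of 128 32-bit integers. Treating byte 0 as the least significant byte of a line is an assumption, since the byte order is not specified, and the function name is hypothetical.

```python
def read_bank(lines: list[int], addr9: int) -> int:
    """Read one 8-bit value from a bank of 128 lines of 32 bits.

    The upper 7 bits of the 9-bit address select the line; the lower 2 bits
    select one of its four bytes (byte 0 = least significant, an assumption).
    """
    assert len(lines) == 128 and 0 <= addr9 < 512
    line = lines[addr9 >> 2]          # 7-bit line address
    byte_sel = addr9 & 0b11           # 2-bit byte address driving the multiplexer
    return (line >> (8 * byte_sel)) & 0xFF


bank = [0] * 128
bank[1] = 0x44332211
print(hex(read_bank(bank, (1 << 2) | 2)))   # 0x33
```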

[0183] Referring to FIG. 60, the main data path unit 242 (FIG. 2) performs the interpolation in three steps. In the first step, each multiply/add unit (e.g., 550) takes as inputs the color values output from the corresponding memory banks (e.g., 522) and the red intra-interval position component 551, and the four output values are computed according to the first step of the equations above. The first-step outputs (e.g., 553, 554) are sent to the second step 556, which computes output 558 from the frac_g input 557 according to the second-step equation above. Finally, the final output color 563 is computed from the second-step outputs 558 and 559 and the frac_b input 562 according to the final equation above.
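The three interpolation steps (four red-direction, two green-direction, one blue-direction) can be written in fixed point, with each fraction held as an 8-bit value in units of 1/256 as in the intra-interval position table. This is a sketch, not the hardware datapath: floor rounding after each multiply is an assumption, and the corner numbering (bit 0 of k steps in R, bit 1 in G, bit 2 in B) follows the pairings in the equations of [0174] and [0175].

```python
def lerp8(v0: int, v1: int, frac: int) -> int:
    """v0 + frac * (v1 - v0) / 256 with an 8-bit fraction (floor rounding assumed)."""
    return v0 + ((frac * (v1 - v0)) >> 8)


def trilinear(cv, frac_r: int, frac_g: int, frac_b: int) -> int:
    """Interpolate CV(P0)..CV(P7); bit 0 of k steps in R, bit 1 in G, bit 2 in B."""
    t11 = lerp8(cv[0], cv[1], frac_r)   # step 1: four interpolations in the R direction
    t12 = lerp8(cv[2], cv[3], frac_r)
    t13 = lerp8(cv[4], cv[5], frac_r)
    t14 = lerp8(cv[6], cv[7], frac_r)
    t21 = lerp8(t11, t12, frac_g)       # step 2: two interpolations in the G direction
    t22 = lerp8(t13, t14, frac_g)
    return lerp8(t21, t22, frac_b)      # step 3: final interpolation in the B direction


print(trilinear([10, 20, 30, 40, 50, 60, 70, 80], 0, 0, 128))   # 30
```

Each step only depends on the previous one, which is why the three stages pipeline naturally, as [0184] notes.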

[0184] The processing shown in FIG. 60 is pipelined to maximize overall throughput. The technique of FIG. 60 is used when a single output color component 563 is required. For example, it is used when the cyan component of the output image is generated first and the cached tables are then reloaded between passes to generate the magenta, yellow and black components of the output image. This is particularly suited to four-pass printing, in which each output color is printed in a separate pass.
b. Multiple-Output General Color Space (MOGCS) Mode
The coprocessor 224 can also operate in MOGCS mode, which works in much the same way as SOGCS mode apart from a few differences. In MOGCS mode, the main data path unit 242, the data cache controller 240 and the data cache of FIG. 2 cooperate to output all four primary color components simultaneously. This would require a data cache 230 four times as large; to save storage, in MOGCS mode the data cache controller 240 stores only one quarter of all the output color values of the output color space. The remaining output color values are stored in slower external memory and fetched when needed. The apparatus and method rely on the surprising fact that the miss rate for the coarse color conversion table in the cache is very low. This is based on the observation that in many color images the variation in color value from one pixel to the next is small, so the coarse output color values are very likely to be the same for neighboring pixels.
The operation is almost the same as in the S mode. In the MOGCS mode, the main data path unit 242, the data cache control unit 240, and the data cache in FIG. 2 cooperate to simultaneously output the four main color elements to be output. For this purpose, the size of the data cache 230 is required to be four times, but in the MOGCS operation mode, the data cache control unit 240 uses only one-fourth of all output color values of the output color space to save storage space. Store. The remaining output color values of the output color space are stored in a low speed external memory and retrieved when needed. The present apparatus and method are based on the surprising fact that the coarse color conversion table in the cache system has a very low miss rate. This is based on the finding that in many color images, the variance of the color value between one pixel and another pixel is small. Also, the coarse output color values are very likely to be the same for neighboring pixels.

[0185] FIG. 62 shows how the coprocessor performs multi-channel cached color conversion. After each input pixel has been decomposed into its color components, the corresponding interval table values (FIG. 56) are determined as described above, giving three 4-bit intervals Ir, Ig and Ib 570. The combined 12-bit number 570 is converted according to Table 12 above into eight 9-bit addresses. Each address (e.g. 572) is remapped as described below with reference to FIG. 63, and the corresponding memory bank 573 is looked up to obtain the four color output channels 574. Although each memory bank 573 could hold 512 × 32-bit entries in total, only 128 × 32-bit entries are actually stored in it. The memory banks 573 form part of the data cache 230 and operate as a cache, as described with reference to FIG. 63.

[0186] FIG. 63 shows how the 9-bit bank input 578 is remapped to 579. Reordering bits 580 to 582 removes aliasing in the memory access pattern, which reduces the probability that adjacent pixel values alias to the same cache element. The remapped memory address 579 is used as the address into the corresponding memory bank (e.g. 585), each of which consists of 128 entries of 32 bits. Accessing the memory 585 with the 7-bit line address yields an output that is latched 586 for each memory bank. Each memory bank (e.g. 585) has an associated tag memory 587 consisting of 128 entries of 2 bits each, and the same 7-bit line address is used to access the corresponding tag in the tag memory 587. Whether the output color value is held in the cache is determined by comparing the two most significant bits of address 579 with the corresponding tag in the tag memory 587. These two most significant bits of the 9-bit address correspond to the most significant bits of the red and green data intervals (see Table 12). Thus, in the MOGCS mode the RGB input color space is effectively divided into four quadrants along the red and green dimensions, and the two most significant bits of the 9-bit address specify a quadrant of the RGB input color space. In other words, the output color values are effectively partitioned into the four quadrants identified by the 2-bit tags. As a result, the output color values corresponding to the different tag values of a given line lie far apart in the output color space, which reduces aliasing in the memory access pattern.

[0187] If the 2-bit tags do not match, the data cache control unit records a cache miss and initiates the necessary memory read together with the cache lookup process. The cache lookup process stalls until all values of the line corresponding to the 2-bit tag entry have been read from external memory and stored in the cache; this involves reading the relevant line of the color conversion table held in external memory. Because the process 575 of FIG. 63 is executed for each memory bank (e.g. 573) of FIG. 62, a memory bank may, depending on the cache contents, take some time to output its result (e.g. 586). The eight 32-bit data sets 586 are then transferred to the main data path unit 242, where the three interpolation steps 590 to 592 described above (FIG. 62) are performed on all color channels simultaneously in a pipeline, generating the four color outputs 595 sent to the printer device.
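The tag check and miss handling of paragraphs [0186] and [0187] behave like a direct-mapped cache. The sketch below models a single memory bank under stated assumptions: the 9-bit address has already been remapped so that its top two bits are the tag and its low seven bits are the line address, each line holds one 32-bit entry (for example four 8-bit output channels), and the hypothetical `external_memory` list stands in for the full 512-entry table held in slower memory. It illustrates the idea only and is not the coprocessor's actual logic.

```python
class BankCache:
    """Direct-mapped model of one colour-cache memory bank (illustrative).

    128 lines, one 32-bit entry per line, each guarded by a 2-bit tag.
    """

    LINES = 128  # addressed by the 7-bit line address

    def __init__(self, external_memory):
        # external_memory: hypothetical list of 512 32-bit words for this bank
        self.ext = external_memory
        self.data = [0] * self.LINES
        self.tags = [None] * self.LINES  # None marks a line not yet filled
        self.misses = 0

    def read(self, addr9):
        """Return the entry for a 9-bit (already remapped) bank address."""
        tag = (addr9 >> 7) & 0x3   # two most significant bits
        line = addr9 & 0x7F        # seven-bit line address
        if self.tags[line] != tag:
            # Cache miss: stall, fetch from external memory, update the tag.
            self.misses += 1
            self.data[line] = self.ext[addr9]
            self.tags[line] = tag
        return self.data[line]
```

Reading address 5 and then 133 (same line 5, different tag) evicts the first entry, so a later read of 5 misses again. This is the kind of conflict that the bit remapping of FIG. 63 is intended to make rare for neighbouring pixels.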

[0188] Experiments show that the cache mechanism of FIGS. 62 and 63 is effective: for typical images, the average cache miss rate is 0.01 to 0.03 cache line fetches per pixel. In many cases, such a cache mechanism significantly reduces the number of memory accesses outside the data cache.

[0189] The instruction encoding for the two color space conversion modes (FIG. 10) performed by the coprocessor has the following structure.

Instruction encoding in color space conversion

[0190]

[Table 12B]

[0191] FIG. 64 shows the instruction field encoding of the color space conversion instructions. The minor opcodes of the color conversion instructions are encoded as follows.

Minor opcode encoding in color conversion instructions

[0192]

[Table 13]

[0193] FIG. 65 shows how an RGB pixel stream is converted into CMYK color values in the MOGCS mode. In step S1, a 24-bit RGB pixel stream is input to the pixel organizer 246 (FIG. 2). In step S2, as described with reference to FIGS. 56 and 57, the pixel organizer 246 uses look-up tables to determine a 4-bit interval value and an 8-bit position within the interval for each input pixel. The interval value and the intra-interval position indicate which interval the input pixel falls in and where within that interval it lies. In step S3, the main data path unit 242 combines the 4-bit intervals of the red, green and blue components of the input pixel into a 12-bit address word and sends it to the data cache control unit 240 (FIG. 2). In step S4, as described with reference to Table 12 and FIG. 62, the data cache control unit 240 converts the 12-bit address word into eight 9-bit addresses, which give the locations in the memory banks 573 (FIG. 62) of the eight output values CV(P0) to CV(P7). In step S5, the data cache control unit 240 (FIG. 2) remaps the eight 9-bit addresses as described with reference to FIG. 63, so that the most significant bits of the red and green 4-bit intervals are mapped to the two most significant bits of each 9-bit address.
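Steps S2 and S3 can be sketched as follows. The interval tables of FIGS. 56 and 57 are not reproduced in this text, so the sketch assumes 16 uniform intervals of 16 levels per 8-bit channel and a red-green-blue ordering of the intervals in the 12-bit word; `interval_lookup` and `pixel_to_address` are hypothetical names. The conversion of the 12-bit word into eight 9-bit bank addresses (Table 12) is not modelled.

```python
def interval_lookup(v):
    """Split an 8-bit channel value into (4-bit interval, 8-bit position).

    Assumes 16 uniform intervals of 16 levels each; the real look-up
    tables may use non-uniform intervals.
    """
    interval = v >> 4
    position = (v & 0x0F) << 4  # offset within the interval, scaled to 8 bits
    return interval, position


def pixel_to_address(r, g, b):
    """Return the 12-bit interval address word and the three positions."""
    ir, fr = interval_lookup(r)
    ig, fg = interval_lookup(g)
    ib, fb = interval_lookup(b)
    addr12 = (ir << 8) | (ig << 4) | ib  # red, green, blue intervals concatenated
    return addr12, (fr, fg, fb)
```

For example, the pixel (0x12, 0x34, 0x56) falls in intervals 1, 3 and 5, giving the address word 0x135 and positions (32, 64, 96).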

[0194] In step S6, the data cache control unit 240 compares the two most significant bits of each 9-bit address with the 2-bit tag in the memory 587 (FIG. 63). If the 2-bit tag does not match, the output color values CV(P0) to CV(P7) are not present in the cache memory 230, so in step S7 the output color values corresponding to the 2-bit tag entry are read from external memory into the data cache 230. If the 2-bit tag matches the two most significant bits of the 9-bit address, the data cache control unit 240 retrieves the eight output color values CV(P0) to CV(P7) in step S8 in the manner described with reference to FIG. 62. In this way, the main data path unit 242 fetches from the data cache 230 the eight output color values CV(P0) to CV(P7) surrounding the input pixel. Finally, the main data path unit 242 interpolates the output color values CV(P0) to CV(P7) using the intra-interval position determined in step S2, and outputs the interpolated output color value.
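The final interpolation can be illustrated with ordinary trilinear interpolation, performed as one linear-interpolation stage per axis on all output channels at once. The corner ordering (bit 2 of the corner index selects red, bit 1 green, bit 0 blue) and the scaling of the 8-bit positions by 1/256 are assumptions; the source does not give the exact arithmetic of steps 590 to 592.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1)."""
    return a + (b - a) * t


def trilinear(cv, fr, fg, fb):
    """Interpolate eight corner colours CV(P0)..CV(P7) at an 8-bit position.

    cv[i] is a tuple of output channels (e.g. C, M, Y, K); corner i is
    assumed to lie at red = bit 2, green = bit 1, blue = bit 0 of i.
    """
    tr, tg, tb = fr / 256, fg / 256, fb / 256
    result = []
    for ch in range(len(cv[0])):
        c = [corner[ch] for corner in cv]
        # Stage 1: along blue (corners 2k and 2k+1 differ only in blue).
        x = [lerp(c[2 * k], c[2 * k + 1], tb) for k in range(4)]
        # Stage 2: along green.
        y = [lerp(x[0], x[1], tg), lerp(x[2], x[3], tg)]
        # Stage 3: along red.
        result.append(lerp(y[0], y[1], tr))
    return result
```

Because every channel goes through the same three stages, a hardware data path can process all four output channels in parallel, which is consistent with the simultaneous multi-channel interpolation described above.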

[0195] It will be apparent to those skilled in the art that the storage required in the data cache can be reduced further by dividing the RGB color space and the corresponding output color values into more than four regions, for example into 32 blocks. With 32 blocks, the data cache need hold only 1/32 of the output color values. It will also be apparent that the data cache mechanism used in the MOGCS mode can be used in the single output general conversion mode, where it likewise reduces the storage required in the data cache.

[0196] 3.17.3 JPEG encoding/decoding

Encoding images before storing them brings enormous benefits, particularly in memory savings and in the speed of transferring images from one place to another. A variety of widely adopted image coding standards have emerged. One of the best known is the JPEG standard; for a detailed description, see the well-known book by Pennebaker and Mitchell, "JPEG: Still Image Data Compression Standard", published by Van Nostrand Reinhold in 1993. The coprocessor 224 stores images using a subset of the JPEG standard. An advantage of the JPEG standard is that high compression ratios can be obtained while maintaining image quality. Of course, other standards may be used to compress and store images. The JPEG standard is well known to those skilled in the art, and various JPEG implementations suitable for use in ASICs, including JPEG core products, are commercially available from a number of manufacturers.

[0197] The coprocessor 224 can JPEG-encode and decode images consisting of one, three or four color components. A single-component image may be meshed or unmeshed; that is, a single color component can be extracted from either meshed or unmeshed data. An example of meshed data is three color components per pixel (that is, RGB for each pixel); an example of unmeshed data is an image whose color components are stored separately, so that each component can be processed independently. For a three-component image, the coprocessor 224 uses one pixel per word, on the assumption that the three color channels are encoded in a minimum of three bytes.

[0198] The JPEG standard divides an image into small two-dimensional regions called minimum coded units (MCUs), each of which is processed independently. The JPEG coder (FIG. 2) handles MCUs of 16 pixels wide by 8 pixels high for downsampled images, or of 8 by 8 pixels for images that have not been downsampled.

[0199] FIG. 66 shows how a three-component image is downsampled. The original pixel data 600 is stored in the MUV buffer 250 (FIG. 2) in pixel format, each pixel 601 consisting of Y, U and V components in the YUV color space. This data is first converted into an MCU consisting of four data blocks 601 to 604 that contain the different color components: blocks 601 and 602 are directly sampled Y components, and blocks 603 and 604 are the U and V components, which are subsampled in the example of FIG. 3. The coprocessor 224 provides two kinds of subsampling. The first is direct sampling without filtering, in which the odd-numbered pixels are kept and the even-numbered pixels are discarded. Alternatively, the U and V components can be filtered by averaging adjacent values.
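The two chroma subsampling options can be sketched for one U or V block of 8 rows by 16 columns, reduced horizontally to 8 × 8. The sketch assumes 1-based pixel numbering, so keeping the "odd-numbered" pixels means keeping 0-based indices 0, 2, 4 and so on; the round-to-nearest rule for averaging is also an assumption, and the function names are hypothetical.

```python
def subsample_direct(row):
    """Keep every other sample (0-based indices 0, 2, 4, ...), no filtering."""
    return row[0::2]


def subsample_average(row):
    """Average each horizontally adjacent pair, rounding to nearest."""
    return [(row[i] + row[i + 1] + 1) // 2 for i in range(0, len(row) - 1, 2)]


def subsample_block(block, average=False):
    """Reduce a chroma block of 8 rows x 16 columns to 8 x 8."""
    f = subsample_average if average else subsample_direct
    return [f(row) for row in block]
```

Direct sampling costs nothing but can alias fine chroma detail; pairwise averaging acts as a simple low-pass filter before decimation.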

[0200] Another form of JPEG subsampling is the four-color-channel subsampling shown in FIG. 67. Here the 16 × 8 pixel data block 610 has four components 611, comprising an opacity component (O) in addition to the usual Y, U and V components. The pixel data 610 is subsampled as in FIG. 66, but in this case the opacity channel is used to create the data blocks 612 and 613.

[0201] FIG. 68 shows the JPEG coder 241 of FIG. 2 in more detail. The JPEG encoder/decoder 241 performs both JPEG encoding and decoding. For encoding, block data is received from the pixel organizer 246 (FIG. 2) over the bus 620, stored in the MUV buffer 250 and processed block by block. The JPEG encoding process is divided into the following distinct steps:

1. discrete cosine transform in the DCT unit 621;
2. quantization of the DCT output in the quantizer 622;
3. arrangement of the DCT coefficients by a zigzag scan, performed by the quantizer 622;
4. predictive coding of the DC DCT coefficients and run-length coding of the AC DCT coefficients, performed by the coefficient coder 623;
5. variable-length coding of the coefficient coder output, performed by the Huffman coder 624.

The output is sent through the multiplexer 625 and the Rbus 626 to the result organizer 629 (FIG. 2).
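The zigzag scan of step 3 visits the 64 coefficients of an 8 × 8 block along its anti-diagonals, alternating direction on each one. A minimal sketch that produces the raster-order indices (row × 8 + column) in the standard JPEG zigzag order follows; the function name is illustrative.

```python
def zigzag_order(n=8):
    """Return raster indices (row * n + col) in JPEG zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order
```

Reading a coefficient buffer as `[buf[i] for i in zigzag_order()]` gives the encoder's zigzag sequence; writing `buf[order[k]] = coeff[k]` gives the inverse zigzag order used when decoding.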

[0202] JPEG decoding is the reverse of JPEG encoding. A compressed JPEG block is input from the bus 620 and sent over the bus 630 to the Huffman coder 624, where it is decoded into DC differences and AC run lengths. The data is then passed to the coefficient coder 623, which decodes the AC and DC coefficients and restores them to normal scan order. The quantizer 622 then dequantizes the DCT coefficients by multiplying each by the corresponding quantization value. Finally, the DCT unit 621 applies the inverse discrete cosine transform to restore the original data, which is sent over the bus 631 to the multiplexer 625 and then over the bus 626 to the result organizer. The JPEG coder 241 operates in the usual way through a standard Cbus interface 632, which includes registers set by the instruction controller to start the JPEG coder. The quantizer 622 and the Huffman coder 624 also require tables, which are loaded from the data cache 230 when needed. The table data is accessed through the Obus interface unit 634, which is connected to the operand organizer B 247 and interacts with the data cache control unit 240.

[0203] The DCT unit 621 performs the discrete cosine transform and the inverse discrete cosine transform on pixel data. Many ways of implementing the DCT are known, and several are described in "JPEG: Still Image Data Compression Standard" (ibid.); the DCT 621, however, uses the high-speed technique described in detail in the section "High-speed DCT device" below. The DCT may also be based on the method of Arai et al., "A Fast DCT-SQ Scheme for Images", The Transactions of the IEICE, vol. E71, no. 11, November 1988, p. 1095.
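For reference, the 8 × 8 forward DCT used by JPEG is separable, so it can be computed as a pass over the columns, a transpose, and a pass over the rows, which mirrors the two-pass structure with a transpose memory described for this DCT device. The sketch below uses the direct O(N²) one-dimensional formula with JPEG's C(u) scaling; it is neither the fast Arai et al. method nor the high-speed technique of the DCT 621, and the function names are hypothetical.

```python
import math


def dct_1d(v):
    """8-point DCT-II with JPEG scaling: G(u) = C(u)/2 * sum f(x) cos((2x+1)u*pi/16)."""
    out = []
    for u in range(8):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(v[x] * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8))
        out.append(c * s / 2)
    return out


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct_2d(block):
    """8x8 forward DCT as two 1-D passes with a transpose in between.

    block[y][x] holds samples; the result F[v][u] has vertical frequency v
    as the row index and horizontal frequency u as the column index.
    """
    pass1 = [dct_1d(col) for col in transpose(block)]  # first pass: columns
    return [dct_1d(row) for row in transpose(pass1)]   # second pass: rows
```

A constant block of value 10 transforms to a single DC coefficient of 8 × 10 = 80 with all AC coefficients zero (up to rounding error).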

[0204] The quantizer 622 performs quantization and inverse quantization of the DCT coefficients, fetching the relevant values from the corresponding tables stored in the data cache through the Obus interface unit 634. In quantization, the input data stream is divided by values read from a quantization table in the data cache; this division is implemented as a fixed-point multiplication. In inverse quantization, the data stream is multiplied by the values in the inverse quantization table.
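A minimal sketch of quantization as a fixed-point multiplication follows. The 16-bit reciprocal precision, the round-half-up rule and the function names are assumptions, not values from the source; with a finite-precision reciprocal the result can differ from exact division by one step near rounding boundaries for large coefficients.

```python
SHIFT = 16  # assumed fixed-point precision of the reciprocal table


def reciprocal(q):
    """Fixed-point reciprocal of a quantization step, as a table might store it."""
    return round((1 << SHIFT) / q)


def quantize(coef, q):
    """Approximate coef / q, rounded half up in magnitude, via multiply and shift."""
    mag = (abs(coef) * reciprocal(q) + (1 << (SHIFT - 1))) >> SHIFT
    return -mag if coef < 0 else mag


def dequantize(level, q):
    """Inverse quantization is a plain multiplication by the table value."""
    return level * q
```

Replacing the division by a multiplication lets one multiplier serve both directions, which matches the sharing of the multiplier 643 between quantization and inverse quantization.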

[0205] FIG. 69 shows the quantizer 622 in more detail. The quantizer 622 has a DCT interface 640 that passes data to and receives data from the DCT module 621 over a local bus. During quantization, the quantizer 622 receives two DCT coefficients per clock cycle and writes them into one of its internal buffers 641 and 642, which are dual-port buffers for buffering input data. When a buffer is full, its data is read out in zigzag scan order and multiplied in the multiplier 643 by the quantization values received through the Obus interface unit 634. The result is transferred to the coefficient coder 623 (FIG. 68) through the coefficient coder interface 645. Meanwhile, the coefficients of the next block are written into the other buffer. During JPEG decoding, the quantization module performs inverse quantization by multiplying the decoded DCT coefficients by the values stored in the table. Because quantization and inverse quantization never take place at the same time, the multiplier 643 is shared by both. The position of a coefficient within the 8 × 8 block is used as the index into the inverse quantization table.

[0206] As in quantization, the two buffers 641 and 642 are used in inverse quantization to buffer the coefficient data input from the coefficient coder 623 (FIG. 68). The data is multiplied by the quantization values and written into a buffer in inverse zigzag scan order. When the buffer is full, the dequantized coefficients are read out two at a time in normal order and sent to the DCT sub-module 621 (FIG. 68) through the DCT interface 640. The coefficient coder interface module 645 thus serves as the interface to the coefficient coder, sending data to it and reading data from it over a local bus. This module reads data from the buffer in zigzag scan order during encoding and writes data into the buffer in inverse zigzag scan order during decoding. Since both the DCT interface module 640 and the CC interface module 645 can read from and write to the buffers, an address/control multiplexer 647 determines which buffer each interface is working with, under the control of a control module 648 consisting of state machines that control all the modules of the quantizer. The multiplier 643 may be a 16 × 8 two's complement multiplier that multiplies the DCT coefficients by the quantization table values.

[0207] Referring to FIG. 68, the coefficient coder 623 performs the following functions:

(a) predictive encoding/decoding of the DC coefficients in JPEG mode;
(b) run-length encoding/decoding of the AC coefficients in JPEG mode.

Apart from JPEG operation, the coefficient coder 623 can preferably also be used, when required, for predictive encoding/decoding of pixels and for memory copy operations. The coefficient coder 623 performs the prediction and run-length encoding/decoding of the DC/AC coefficients as specified in the Pink Book. In addition to the run-length encoding/decoding of JPEG AC coefficients defined in the JPEG standard, it also provides standard predictive encoding/decoding.
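The DC prediction and AC run-length coding can be sketched on one block of 64 quantized coefficients in zigzag order. The sketch follows the baseline JPEG convention of (run, value) pairs, with ZRL (15, 0) standing for sixteen zeros and EOB (0, 0) ending the block; mapping the pairs to size categories and Huffman codes is omitted, and the function names are hypothetical.

```python
def encode_block(zz, prev_dc):
    """Code one block of 64 quantized coefficients given in zigzag order.

    Returns (dc_diff, ac_pairs, new_prev_dc). ac_pairs holds (run, value)
    pairs; (15, 0) is ZRL (sixteen zeros) and (0, 0) is EOB.
    """
    dc_diff = zz[0] - prev_dc  # DC is predicted from the previous block
    pairs, run = [], 0
    for v in zz[1:]:
        if v == 0:
            run += 1
            continue
        while run > 15:
            pairs.append((15, 0))  # ZRL
            run -= 16
        pairs.append((run, v))
        run = 0
    if run:
        pairs.append((0, 0))  # EOB: remaining coefficients are all zero
    return dc_diff, pairs, zz[0]


def decode_block(dc_diff, pairs, prev_dc):
    """Inverse of encode_block: rebuild the 64 zigzag-ordered coefficients."""
    zz = [prev_dc + dc_diff] + [0] * 63
    k = 1
    for run, v in pairs:
        if (run, v) == (0, 0):
            break  # EOB
        k += run   # skip zeros; ZRL (15, 0) also writes one zero below
        zz[k] = v
        k += 1
    return zz
```

For example, the block [50, 3, 0, 0, -2, 0, ...] coded after a block whose DC was 45 gives a DC difference of 5 and the pairs (0, 3), (2, -2), (0, 0).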

[0208] The Huffman coder 624 performs Huffman encoding and decoding of JPEG data streams. In Huffman encoding mode, run-length-coded data is received from the coefficient coder 623 and a Huffman stream of packed bytes is generated. In Huffman decoding mode, the Huffman stream is read in packed-byte format from the Pbus interface 620, and the Huffman-decoded coefficients are sent to the coefficient coding module 623. The Huffman coder 624 uses Huffman tables stored in the data cache and accessed through the Obus interface 634. Alternatively, the Huffman tables can be hard-wired for higher speed.

[0209] When the data cache is used for Huffman coding, the eight banks of the data cache store the data tables described in detail for each table below.

Huffman and quantization tables stored in the data cache

[0210]

[Table 14]

[0211] Referring to FIG. 70, the Huffman coder 624 consists mainly of two independent blocks, an encoder 660 and a decoder 661, which share the same Obus interface through a multiplexer module 662. Each block has its own input and output, and only one of them is active at any time, depending on the function being performed by the JPEG coder.

a. Encoding

In JPEG-mode encoding, the DC difference values and AC run-length values sent from the CC sub-module to the HC sub-module are assigned variable-length codes (up to 16 bits per code) using a Huffman table, which must be preloaded from the data cache before operation starts. Each variable-length code is then combined with the additional bits of the DC or AC coefficient sent from the CC sub-module, producing a packed-byte stream. Whenever packing produces an X'FF byte, an X'00 byte is inserted after it. When an RSTm marker is required, as indicated by the CC sub-module, the marker is inserted; before it, the last Huffman code is padded to a byte boundary with "1" bits, and an X'00 byte is inserted if the padded byte is X'FF. The HC sub-module also inserts an EOI marker at the end of the image when instructed by the "last" signal on the Pbus-CC slave interface; this requires the same packing, padding and X'00 insertion as an RSTm marker. Finally, the output stream is sent as packed bytes to the result organizer 249 and written to external memory.
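The packing, X'00 insertion and "1"-bit padding described above can be sketched as follows. Codes are given as hypothetical (value, length) pairs, most significant bit first; RSTm and EOI marker insertion is not shown.

```python
def pack_bits(codes):
    """Pack variable-length codes into bytes with JPEG-style byte stuffing.

    codes: list of (value, length) pairs, MSB first. Every X'FF output byte
    is followed by a X'00 byte, and the final partial byte is padded with 1s.
    """
    out = bytearray()
    acc, nbits = 0, 0
    for value, length in codes:
        acc = (acc << length) | (value & ((1 << length) - 1))
        nbits += length
        while nbits >= 8:
            nbits -= 8
            byte = (acc >> nbits) & 0xFF
            out.append(byte)
            if byte == 0xFF:
                out.append(0x00)  # stuffing byte
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        pad = 8 - nbits
        byte = ((acc << pad) | ((1 << pad) - 1)) & 0xFF  # fill with 1 bits
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)
    return bytes(out)
```

For example, two 4-bit codes 1111 followed by the 3-bit code 101 pack to X'FF, X'00, X'BF: the first byte is X'FF and gets a stuffing byte, and the last byte is 101 padded with five 1 bits.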

[0212] In non-JPEG mode, data is sent to the encoder as unpacked data from the CC sub-module (Pbus-CC slave interface). Each byte is coded independently using a table preloaded into the cache (as in JPEG mode), and the variable-length symbols are assembled into packed bytes and sent to the result organizer 249. The last byte of the output stream is padded with "1" bits.

b. Decoding

Two decoding algorithms are provided: a high-speed (real-time) one and a low-speed one. The high-speed algorithm operates only in JPEG mode, while the low-speed algorithm operates in both JPEG and non-JPEG modes.

[0213] The high-speed JPEG Huffman decoding algorithm maps Huffman symbols to either DC difference values or AC run-length values. It is designed specifically for JPEG and assumes that the example Huffman tables (K3, K4, K5, K6) were used for encoding. These tables are hard-wired into the algorithm, so decoding requires no access to the cache memory. This decoding path is intended for cases where a decoded image must be printed at a guaranteed data rate. When decoding a band (a block delimited by RSTm markers), the HC sub-module delivers approximately one DC/AC coefficient per clock cycle. An extra clock cycle may occasionally be needed between the HC sub-module and the CC sub-module to remove an inserted X'00 byte from the data stream, but this depends strongly on the data.

[0214] In high-speed mode, the Huffman decoder extracts one Huffman symbol every clock cycle; the high-speed Huffman decoder is described below under "Variable-length code decoder". The Huffman decoder 661 also implements a low-speed decoding algorithm based on a heap, with the structure 670 shown in FIG. 71.

[0215] For a JPEG-encoded stream, the stripper 671 removes X'00 inserted bytes, X'FF padding bytes and RSTm markers, and passes the Huffman symbols, together with the concatenated additional bits, to the shifter 672. This step is skipped for Huffman-only encoded streams. The first step in decoding a Huffman symbol is to look up one of the 256 entries of the HUFVAL table stored in the cache, addressed by the first 8 bits of the Huffman data stream. If this lookup resolves the Huffman symbol, the decoded value is transferred to the output formatter 676, and the symbol length and the number of additional bits of the decoded value are fed back to the shifter 672. The shifter then transfers the associated additional bits to the output formatter 676 and aligns the new start of the Huffman stream sent to the decoding unit 673. The number of additional bits is a function of the decoded value. If the first lookup does not yield a decoded value, that is, if the Huffman symbol is longer than 8 bits, a heap address is computed and further heap accesses (to locations in the cache) are made until a match is found or an "illegal Huffman symbol" condition is met. On a match, processing continues as above; if the "illegal Huffman symbol" condition is met, an interrupt is raised.

[0216] The heap-based decoding algorithm is as follows.

    loop until end of image
        set symbol length N = 8
        store the first 8 bits of the input stream in INDEX
        fetch HUFVAL(INDEX)
        if HUFVAL(INDEX) == 00xx000111..                 -- (ILL)
            signal "improper Huffman symbol"
            exit
        elseif HUFVAL(INDEX) == 1nnneeeeeeee             -- (HIT)
            transfer eeeeeeee as the value
            transfer symbol length N = decimal(nnn)      /* 000 denotes symbol length 8 */
            adjust the input stream
            break
        else /* HUFVAL(INDEX) == 01iiiiiiiiii */         -- (MISS)
            set HEAPINDEX = iiiiiiiiii                   (heap base assumed to be 0)
            set N = 9
            if the 9th bit of the input stream is 0
                increment HEAPINDEX by 1
            fi
            fetch VALUE = HEAP(HEAPINDEX)
            loop
                if VALUE == 000100001111                 -- (NL)
                    signal "improper Huffman symbol"
                    exit
                elseif VALUE == 1000eeeeeeeeeeeeeeee     -- (HIT)
                    transfer eeeeeeeeeeeeeeee as the value
                    transfer symbol length N
                    adjust the input stream
                    break
                else /* VALUE == 01iiiiiiiiii */         -- (MISS)
                    set N = N + 1
                    set HEAPINDEX = iiiiiiiiii
                    if the Nth bit of the input stream is 0
                        increment HEAPINDEX by 1
                    fi
                    fetch VALUE = HEAP(HEAPINDEX)
                fi
            pool
        fi
    pool

The stripper 671 removes X'00 insertion bytes, X'FF stuffing bytes and RSTm markers from the input JPEG-encoded stream. It transfers the "clean" Huffman symbols, with the concatenated additional bits, to the shifter 672. Huffman-only encoding has no additional bits, so in that mode the transferred stream consists solely of Huffman symbols.
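The heap-based algorithm above can be modelled in software as follows. This is a minimal Python sketch, not the hardware implementation. Table entries are kept as tuples rather than the packed HUFVAL/HEAP bit formats, and the tables are built from a hypothetical `{bit string: value}` code dictionary (`build_tables` and `decode` are illustrative names). It keeps the pseudocode's convention that the child for a 0 bit sits at HEAPINDEX + 1.

```python
from typing import Dict, List, Tuple

# Simplified entry formats (assumed, not the packed hardware words):
#   ("HIT", value, length)  decoded symbol and its code length
#   ("MISS", base)          continue in the heap; bit 1 -> base, bit 0 -> base + 1
#   ("ILL",)                improper Huffman symbol
Entry = Tuple


def build_tables(codes: Dict[str, int]) -> Tuple[List[Entry], List[Entry]]:
    """Build a 256-entry first-level table plus a heap for codes longer than 8 bits."""
    heap: List[Entry] = []

    def node(prefix: str) -> Entry:
        if prefix in codes:
            return ("HIT", codes[prefix], len(prefix))
        if any(c.startswith(prefix) for c in codes):
            return ("MISS", alloc_pair(prefix))
        return ("ILL",)

    def alloc_pair(prefix: str) -> int:
        base = len(heap)
        heap.extend([("ILL",), ("ILL",)])    # reserve both child slots first
        heap[base] = node(prefix + "1")      # next bit 1 -> HEAPINDEX
        heap[base + 1] = node(prefix + "0")  # next bit 0 -> HEAPINDEX + 1
        return base

    table: List[Entry] = []
    for index in range(256):
        bits8 = format(index, "08b")
        short = next((c for c in codes if len(c) <= 8 and bits8.startswith(c)), None)
        table.append(("HIT", codes[short], len(short)) if short is not None else node(bits8))
    return table, heap


def decode(bits: str, table: List[Entry], heap: List[Entry]) -> List[int]:
    """Decode a string of '0'/'1' characters into symbol values."""
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + 8].ljust(8, "0")  # zero-fill past the end of the stream
        entry, n = table[int(window, 2)], 8
        while entry[0] == "MISS":
            n += 1
            if pos + n > len(bits):
                raise ValueError("truncated Huffman symbol")
            # Bit N selects the child: 1 -> HEAPINDEX, 0 -> HEAPINDEX + 1
            entry = heap[entry[1] + int(bits[pos + n - 1] == "0")]
        if entry[0] == "ILL":
            raise ValueError("improper Huffman symbol")
        _, value, length = entry
        if pos + length > len(bits):
            raise ValueError("truncated Huffman symbol")
        out.append(value)
        pos += length  # adjust the input stream
    return out
```

With a prefix-free table that includes 9- and 10-bit codes, symbols of 8 bits or fewer resolve in the first lookup, and longer ones walk the heap one bit at a time, as in the MISS branch of the pseudocode.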

[0217] The shifter 672 block has a 16-bit output register and presents the next Huffman symbol to the decoding unit 673 (as a bit stream ordered from MSB to LSB). Symbols are often shorter than 16 bits, and it is left to the decoding unit 673 to determine how many bits to analyse. The shifter 672 receives feedback 678 from the decoding unit 673: the length of the current symbol and (in JPEG mode) the length of the additional bits that follow it. It uses this feedback to align the start of the next symbol correctly.
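A minimal Python model of the shifter's behaviour follows. It assumes that the decoding unit reports the symbol length and additional-bit length after each symbol. The class and method names are hypothetical.

```python
class Shifter:
    """Hypothetical model of shifter 672: a 16-bit MSB-first window over the stream."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit offset of the start of the next symbol

    def _bits(self, start: int, count: int) -> int:
        """Read `count` bits MSB-first from bit offset `start`, zero-filled past the end."""
        value = 0
        for i in range(start, start + count):
            byte = i // 8
            bit = (self.data[byte] >> (7 - i % 8)) & 1 if byte < len(self.data) else 0
            value = (value << 1) | bit
        return value

    def window(self) -> int:
        """The 16 bits presented to the decoding unit."""
        return self._bits(self.pos, 16)

    def feedback(self, symbol_len: int, extra_len: int) -> int:
        """Apply feedback 678: return the additional bits, then align to the next symbol."""
        extra = self._bits(self.pos + symbol_len, extra_len)
        self.pos += symbol_len + extra_len
        return extra
```

For example, after a 3-bit symbol followed by 4 additional bits, `feedback(3, 4)` returns those 4 bits and the next `window()` starts 7 bits further on.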

[0218] The decoding unit 673 implements the core of the heap-based algorithm and is connected to the data cache via the Obus 674. It comprises a data cache fetch block, a lookup value comparator, a symbol length counter, a heap index adder, and a decoder for the number of additional bits (derived from the decoded value). The fetch address is interpreted as follows.

[0219] Fetch address

[0220]

[Table 15]

[0221] The output formatter block 676 decodes 8-bit values (stand-alone Huffman mode), or combines 24-bit values, the additional bits and RSTm marker information into 32-bit words (JPEG mode). The shifter 672 transfers the additional bits to the output formatter 676 after the decoding unit 673 has determined where the additional bits for the current symbol start. The output formatter 676 also includes a two-deep FIFO buffer that uses a one-word delay to anticipate the final value word. During decoding (in either fast or slow mode), the shifter 672 may attempt to decode the padding bits at the end of the input bit stream. The shifter normally detects this condition. Instead of raising an "improper symbol" interrupt, it then issues a "forced end" (kill) signal. When an active kill signal is issued, the output formatter 676 emits the most recent decoded word still in the FIFO as the "last" word. It discards any more recent words that do not belong to the decoded stream.
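The one-word delay and the kill behaviour can be sketched as follows. This is an interpretation, not the patent's circuit. It assumes that when kill is asserted, the newest word in the FIFO is the spurious one produced from the trailing padding bits. The class name and the `(word, is_last)` output format are hypothetical.

```python
from collections import deque
from typing import List, Tuple


class OutputFormatter:
    """Hypothetical model of a two-deep FIFO that delays each word by one word."""

    def __init__(self) -> None:
        self.fifo: deque = deque()
        self.out: List[Tuple[int, bool]] = []  # (word, is_last)

    def push(self, word: int) -> None:
        # A word is released only once a newer word arrives, so it cannot be the last one.
        if len(self.fifo) == 2:
            self.out.append((self.fifo.popleft(), False))
        self.fifo.append(word)

    def finish(self, kill: bool) -> None:
        if kill and self.fifo:
            self.fifo.pop()  # assumed: newest word was decoded from padding bits
        while self.fifo:
            word = self.fifo.popleft()
            self.out.append((word, not self.fifo))  # flag the final remaining word as last
```

The delay is what makes the kill case recoverable: the spurious word has not yet left the FIFO when the shifter detects that it was decoding padding.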

[0222] FIG. 72 shows the Huffman encoder 660 of FIG. 70 in detail. The Huffman encoder 660 maps byte data to Huffman symbols through lookup tables. It comprises an encoding unit 681, a shifter 682, an output formatter 683 and lookup tables accessed from the cache. An input value 685 is encoded by the encoding unit 681 using the encoding tables stored in the data cache. Two tables are required: one holding the code for each value to be encoded and one holding the code length. Even so, the cache 230 needs to be accessed only once to encode a symbol. In JPEG compression, separate tables are required for the AC and DC coefficients. When subsampling is performed, separate tables are also required for subsampled and non-subsampled components. Non-JPEG compression requires only two tables (code and size). The codes are processed by the shifter 682, which builds the output stream at the bit level. The shifter 682 also inserts RSTm and EOI markers, together with the byte padding that they require. The data bytes are then transferred to the output formatter 683. The formatter inserts X'00 bytes after X'FF data bytes, inserts the X'FF bytes that precede marker codes, and formats the packed bytes. In non-JPEG mode, only the formatting of the packed bytes is performed.
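To illustrate the encoder data path, here is a minimal Python sketch. It assumes a code table and a size table indexed by the value to be encoded (as in the text) and MSB-first bit packing. For brevity, it folds the shifter's bit packing and the formatter's X'00 insertion into one hypothetical function. Padding the final partial byte with 1-bits follows the JPEG convention, and marker insertion is omitted.

```python
from typing import Dict, Iterable


def huffman_encode(values: Iterable[int], code: Dict[int, int], size: Dict[int, int],
                   jpeg: bool = True) -> bytes:
    """Pack Huffman codes MSB-first; in JPEG mode insert X'00 after every X'FF byte."""
    out = bytearray()
    acc, nbits = 0, 0  # bit accumulator and number of valid bits in it

    def emit(byte: int) -> None:
        out.append(byte)
        if jpeg and byte == 0xFF:
            out.append(0x00)  # X'00 insertion byte

    for v in values:
        acc = (acc << size[v]) | code[v]  # one lookup yields both code and length
        nbits += size[v]
        while nbits >= 8:
            nbits -= 8
            emit((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        pad = 8 - nbits
        emit(((acc << pad) | ((1 << pad) - 1)) & 0xFF)  # pad the last byte with 1s
    return bytes(out)
```

For instance, with codes `0 -> 0`, `1 -> 10` and `2 -> 11111111`, encoding `[2, 1, 0]` produces `FF 00 9F` in JPEG mode and `FF 9F` otherwise.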

[0223] Because markers are inserted by the shifter 682, the output formatter 683 needs to know which bytes from the shifter 682 are markers so that it can insert an X'FF byte in front of them. This is done by providing, in the shifter 682, a tag register with one tag per byte. Each marker, which lies on a byte boundary, is tagged by the shifter 682 during marker insertion. The output formatter 683 does not perform X'00 insertion after the X'FF byte that precedes a marker. The tags are shifted in synchronization with the main shift register.
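The following sketch shows how a per-byte tag lets the formatter treat marker bytes differently. It assumes that the shifter delivers `(byte, is_marker)` pairs; the function name and pair format are hypothetical.

```python
from typing import Iterable, Tuple


def format_bytes(tagged: Iterable[Tuple[int, bool]]) -> bytes:
    """Hypothetical output formatter: prefix tagged markers with X'FF, stuff X'00 after data X'FF."""
    out = bytearray()
    for byte, is_marker in tagged:
        if is_marker:
            out += bytes([0xFF, byte])  # marker: X'FF prefix, no X'00 insertion
        else:
            out.append(byte)
            if byte == 0xFF:
                out.append(0x00)        # data X'FF must not look like a marker
    return bytes(out)
```

Because the tag travels with the byte, the formatter needs no lookahead to tell an RSTm code (e.g. X'D0) from ordinary data.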

[0224] The Huffman encoder uses four or eight tables in JPEG compression, and two tables for direct Huffman encoding. The tables used are listed below.
Tables used in the Huffman encoder

[0225]

[Table 16]

[0226] 3.17.4 Table Indexing
The Huffman tables are stored locally in the coprocessor data cache 230. The data cache 230 is organized as a direct-mapped cache of 128 lines, each consisting of 8 words. Each word in a cache line can be addressed independently, and the Huffman decoder uses this feature to access several tables simultaneously. Because the tables are small (at most 256 entries), indices into several tables can be included in the 32-bit Obus address field.
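A short sketch makes the cache geometry given above (128 lines of 8 words) concrete. Two parts of it are assumptions for illustration only: how a word address splits into tag, line and word fields, and how several 8-bit table indices might share one 32-bit address. The actual field layout is defined by the fetch-address table (Table 15). Both function names are hypothetical.

```python
from typing import List, Tuple

LINES, WORDS_PER_LINE = 128, 8  # 1024 words in total, per the text


def cache_slot(word_address: int) -> Tuple[int, int, int]:
    """Assumed direct-mapped split of a word address into (tag, line, word)."""
    word = word_address % WORDS_PER_LINE
    line = (word_address // WORDS_PER_LINE) % LINES
    tag = word_address // (WORDS_PER_LINE * LINES)
    return tag, line, word


def pack_indices(indices: List[int]) -> int:
    """Hypothetical packing of up to four 8-bit table indices into one 32-bit field."""
    if len(indices) > 4 or any(not 0 <= i < 256 for i in indices):
        raise ValueError("need at most four indices, each below 256")
    address = 0
    for i in indices:
        address = (address << 8) | i
    return address
```

The 8-bit width per index follows from the "at most 256 entries" bound in the text; only the packing order is guessed.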

[0227] As described above, in the JPEG low-speed decoding mode the data cache is used to store the various Huffman tables. The format of the data cache is shown below.
Huffman/quantization table bank addresses

[0228]

[Table 17]

[0229] Before a JPEG instruction is executed in the JPEG encoder 241 (FIG. 2), an appropriate image width must be set in the image dimension register (PO_IDR or RO_IDR). As with other instructions, the instruction length corresponds to the number of input data items to be processed. This count includes any padding data and depends on the subsampling option and the number of color channels used.

[0230] All instructions issued by the coprocessor 224 use two functions to limit the amount of output data generated. These functions are most useful when the input and output data sizes differ, and especially when the output size is unknown in advance, as in JPEG encoding and decoding. They determine whether output data is written out or simply discarded while the instruction still appears to have executed normally. By default these functions are off; they are turned on by enabling the appropriate bit in the RO_CFG register. JPEG instructions, however, provide a special option for setting this bit. When JPEG compression is used, the coprocessor 224 preferably supports the "cut" and "limit" functions for output data.

[0231] The cut and limit processing is described with reference to FIG. 73. An input image 690 has a height 691 and a width 692. Often only part of the image is of interest, and the other parts are irrelevant for printing. However, the JPEG coding system operates on 8×8 pixel blocks. As a result, the image width may not be a multiple of 8, or the region of interest formed by the MCUs 695 may not align exactly with block boundaries. The output cut register RO_CUT therefore specifies the number of output bytes to discard at the beginning 696 of the output data stream. The output limit register RO_LMT specifies the maximum number of output bytes to generate, including the bytes that are not written to memory because of the cut register. In this way a final output byte 698 can be defined, after which no data is output.
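The effect of RO_CUT and RO_LMT on an output stream can be sketched as a generator. It follows the semantics described above: the limit counts generated bytes, including those discarded by the cut. This is a hypothetical software model, not the coprocessor interface.

```python
from typing import Iterable, Iterator


def cut_and_limit(generated: Iterable[int], cut: int, limit: int) -> Iterator[int]:
    """Yield only bytes with index in [cut, limit); stop producing once `limit` is reached."""
    for count, byte in enumerate(generated):
        if count >= limit:
            break        # RO_LMT: no data after the final output byte
        if count >= cut:
            yield byte   # bytes before RO_CUT are discarded, but still counted
```

For example, with a cut of 3 and a limit of 7, a 10-byte stream yields only bytes 3 to 6. A generator fits here because the total output size of a JPEG instruction is not known in advance.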

[0232] The cut and limit functions of the JPEG decoder are particularly useful in two cases. In the first case, shown in FIG. 74, a portion 700 of one strip 701 of the decoded image is extracted or decompressed. In the second case, shown in FIG. 75, several complete strips (for example 711, 712 and 713) of the whole image 714 must be extracted or decompressed.

[0233] FIG. 76 shows the instruction format and field encoding of the JPEG instruction. The minor opcode fields are described below.
Instruction word: minor opcode field

[0234]

[Table 18]

[0235] 3.17.5 Data Coding Instructions
The coprocessor 224 preferably provides functions that allow parts of the JPEG encoder of FIG. 2 to be used for other purposes. For example, Huffman coding is used not only in JPEG but also in other compression methods. It is also desirable to provide data coding instructions that control the Huffman coding unit on its own, for hierarchical image decoding. Furthermore, the run-length encoder/decoder and the predictive encoder can each be used independently through similar instructions.

[0236] 3.17.6 High-Speed DCT Apparatus
A conventional discrete cosine transform (DCT) apparatus such as that shown in FIG. 77 performs the two-dimensional transform of an 8×8 pixel block in two steps. It first performs a one-dimensional DCT along the columns of the block, then a further one-dimensional DCT along its rows. Such an apparatus generally comprises an input circuit 1096, an arithmetic circuit 1104, a control circuit 1098, a transpose memory circuit 1090 and an output circuit 1092.

[0237] The input circuit 1096 receives 8-bit pixels from the 8×8 block and is connected to the arithmetic circuit 1104 via intermediate multiplexers 1100 and 1102. The arithmetic circuit 1104 operates on a complete column or row of the 8×8 block. The control circuit 1098 controls all the other circuits and executes the DCT algorithm. The output of the arithmetic circuit is sent to the transpose memory 1090, the register 1095 and the output circuit 1092. The transpose memory is also connected to the multiplexer 1100, which feeds the next multiplexer 1102. The multiplexer 1102 also receives data from the register 1094. The transpose circuit 1090 takes in 8×8 block data in column format and outputs it in row format. The output circuit 1092 outputs the DCT coefficients for each 8×8 block of pixel data.

[0238] In a conventional DCT apparatus, the arithmetic circuit 1104 is the most complex part, so its speed determines the speed of the whole apparatus. As described with reference to FIG. 78, the arithmetic circuit 1104 of FIG. 77 generally divides the arithmetic into several processing stages. A single circuit performs each of the processing stages 1144, 1148, 1152 and 1156 using shared resources such as adders and multipliers. Because one common circuit executes all the stages, such an arithmetic circuit 1104 runs slower than the optimum. The circuit also includes storage for holding intermediate results. The clock cycle time must be at least as long as the slowest stage. The total processing time can therefore equal or exceed the sum of the times of the individual stages.

[0239] FIG. 78 shows a conventional arithmetic data path in the apparatus of FIG. 77, illustrating part of a DCT performed in four processing stages. The figure shows function rather than an actual implementation. The four processing stages 1144, 1148, 1152 and 1156 are built as a single reconfigurable circuit, which is reconfigured in each cycle to perform each stage of the one-dimensional DCT in turn. Because the four stages share a common pool of resources (such as adders and multipliers), the hardware size can be kept small.

[0240] The drawback of this circuit, however, is that its speed is not optimal. Each of the four processing stages 1144, 1148, 1152 and 1156 is built from the same pool of adders and multipliers. The clock period is therefore set by the slowest stage (20 ns for block 1144 in this example). Adding the delays of the input and output multiplexers 1146 and 1154 (2 ns each) and of the flip-flop 1150 (3 ns) gives a total of 27 ns. This DCT arrangement therefore cannot run with a clock period shorter than 27 ns.

[0241] Pipelined DCT arrangements are also well known; their drawback is that they require a large amount of hardware. The arrangement of the present invention does not match a pipelined arrangement in throughput. Compared with most current DCT arrangements, however, it offers very good performance/size and speed characteristics. FIG. 79 shows a preferred discrete cosine transform unit used in the JPEG encoder (FIG. 2). Pixel data is input to the input circuit 1126, which stores a column of 8-bit pixel data. The transpose memory 1118 converts column-format data to row-format data for the second pass of the two-dimensional discrete cosine transform. Data from the input circuit 1126 and from the transpose memory 1118 is multiplexed by the multiplexer 1124 and sent to the arithmetic circuit 1122. The result from the arithmetic circuit 1122 is sent to the output circuit 1120 after the second pass is complete. The control circuit 1116 controls the flow of data in the discrete cosine transform apparatus.

[0242] In the first pass of the discrete cosine transform, the input circuit 1126 receives either column data of the image to be transformed, or transformed image coefficients to be inverse-transformed into pixel data. In this pass, the control circuit 1116 sets the multiplexer 1124 to pass data from the input circuit 1126 to the arithmetic circuit 1122. FIG. 80 shows the arithmetic circuit 1122 in more detail. For a forward DCT, the multiplexer 1142 selects the result of the forward circuit 1138, as set by the control circuit 1116. For an inverse DCT, the multiplexer 1142 selects the output of the inverse circuit 1140, again according to the setting of the control circuit 1116. In the first pass, each column vector is processed by the arithmetic circuit 1122 (suitably configured by the control circuit 1116) and then written to the transpose memory 1118. When all eight column vectors of the 8×8 block have been processed and written to the transpose memory 1118, the second pass of the discrete cosine transform begins.

[0243] In the second pass of the forward or inverse DCT, row-format vectors are read from the transpose memory 1118 and sent to the arithmetic circuit 1122 through the multiplexer 1124. In this pass, the control circuit sets the multiplexer 1124 to ignore data from the input circuit 1126 and to pass row vector data from the transpose memory 1118 to the arithmetic circuit 1122. The multiplexer 1142 in the arithmetic circuit 1122 routes the result data from the inverse circuit 1140 to the output of the arithmetic circuit 1122. When the result from the arithmetic circuit 1122 is available, the output circuit 1120 captures it on command from the control circuit 1116 and outputs it later.
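The two-pass flow through a transpose memory can be illustrated with a direct (non-fast) software model. This Python sketch uses the textbook orthonormal 8-point DCT-II for each pass, which for 8×8 blocks matches the scaling of the JPEG forward DCT. It models only the data flow, not the combinational structure of the arithmetic circuit 1122. The function names are hypothetical.

```python
import math
from typing import List

N = 8


def dct_1d(v: List[float]) -> List[float]:
    """Orthonormal 8-point DCT-II of one column or row vector."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                               for n in range(N)))
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    # Pass 1: transform each column and write the result into the transpose memory.
    transpose_mem = [dct_1d([block[r][c] for r in range(N)]) for c in range(N)]
    # Pass 2: read the memory across (row order) and transform each row vector.
    return [dct_1d([transpose_mem[c][k] for c in range(N)]) for k in range(N)]
```

A constant block of value 128 gives a DC coefficient of 1024 and zeros elsewhere, as expected for the JPEG-scaled transform.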

[0244] The arithmetic circuit 1122 is a combinational circuit in that it has no storage for intermediate results. The control circuit 1116 knows how long data takes to travel from the input circuit 1126 through the multiplexer 1124 and the arithmetic circuit 1122. It can therefore indicate exactly when the output circuit 1120 should capture the result vector at the output of the arithmetic circuit 1122. Having no intermediate storage in the arithmetic circuit 1122 has two advantages. First, the time needed to transfer data to and from intermediate storage elements is saved. Second, the time for data to pass through the arithmetic circuit 1122 is the sum of the delays of all internal processing stages, rather than N times the delay of the slowest stage (as in a conventional DCT apparatus), where N is the number of processing stages in the arithmetic circuit.

[0245] FIG. 81 shows that the total delay is simply the sum of the four processing stages 1158, 1160, 1162 and 1164: 20 ns + 10 ns + 12 ns + 15 ns = 57 ns. This is faster than the circuit of FIG. 78, whose four stages clocked at 27 ns each take 4 × 27 ns = 108 ns. Such a circuit allows the overall system clock cycle to be shortened. If the circuit of FIG. 81 is allowed four clock cycles to produce a result, the minimum clock period for the whole DCT system is 57/4 ns (14.25 ns). This is a substantial improvement over FIG. 78, where the DCT clock cycle must be 27 ns.
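The timing argument can be reproduced with a small calculation. This Python sketch assumes that the FIG. 78 data path has the same four stage delays as FIG. 81 (only its 20 ns slowest stage is given). It uses the 2 ns multiplexer and 3 ns flip-flop delays quoted for FIG. 78. The function names are hypothetical.

```python
from typing import List, Tuple


def conventional_pass_ns(stage_ns: List[float], mux_ns: float = 2.0,
                         ff_ns: float = 3.0) -> Tuple[float, float]:
    """Shared-resource path (FIG. 78): every stage is clocked at the slowest stage's period."""
    period = max(stage_ns) + 2 * mux_ns + ff_ns  # input mux + output mux + flip-flop
    return period, period * len(stage_ns)        # (clock period, time for one 1-D pass)


def combinational_pass_ns(stage_ns: List[float], cycles: int) -> Tuple[float, float]:
    """Combinational path (FIG. 81): delays simply add; the result is sampled after `cycles` clocks."""
    total = sum(stage_ns)
    return total / cycles, total                 # (minimum clock period, time for one 1-D pass)


stages = [20, 10, 12, 15]
print(conventional_pass_ns(stages))      # (27.0, 108.0)
print(combinational_pass_ns(stages, 4))  # (14.25, 57)
```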

[0246] In a practical implementation of this DCT apparatus, the DCT algorithm described in "A Fast DCT-SQ Scheme for Images" can be used. That paper is by Yukihiro Arai, Takeshi Agui and Masayuki Nakajima, published in The Transactions of the IEICE, vol. E71, no. 11, p. 1095, November 1988. Implemented in hardware, this algorithm can easily be placed in the arithmetic circuit 1122 of the DCT apparatus. Other DCT algorithms can likewise be implemented in hardware in the arithmetic circuit 1122.
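As an illustration of how compact the cited Arai-Agui-Nakajima (AAN) scheme is, the following Python sketch follows the floating-point AAN flowgraph used in the Independent JPEG Group's `jfdctflt.c`. It needs only five multiplications per 8-point transform. Its outputs carry a per-coefficient scale factor, which JPEG implementations typically fold into quantization. It is a software reference, not the circuit placed in the arithmetic circuit 1122.

```python
from typing import List


def aan_fdct_1d(d: List[float]) -> List[float]:
    """8-point forward DCT via the AAN flowgraph (scaled outputs).

    out[k] == sqrt(8) * aan[k] * X[k], where X is the orthonormal DCT-II,
    aan[0] = 1 and aan[k] = sqrt(2) * cos(k * pi / 16).
    """
    tmp0, tmp7 = d[0] + d[7], d[0] - d[7]
    tmp1, tmp6 = d[1] + d[6], d[1] - d[6]
    tmp2, tmp5 = d[2] + d[5], d[2] - d[5]
    tmp3, tmp4 = d[3] + d[4], d[3] - d[4]
    out = [0.0] * 8
    # Even part: one multiplication
    tmp10, tmp13 = tmp0 + tmp3, tmp0 - tmp3
    tmp11, tmp12 = tmp1 + tmp2, tmp1 - tmp2
    out[0], out[4] = tmp10 + tmp11, tmp10 - tmp11
    z1 = (tmp12 + tmp13) * 0.707106781
    out[2], out[6] = tmp13 + z1, tmp13 - z1
    # Odd part: four multiplications
    tmp10, tmp11, tmp12 = tmp4 + tmp5, tmp5 + tmp6, tmp6 + tmp7
    z5 = (tmp10 - tmp12) * 0.382683433
    z2 = 0.541196100 * tmp10 + z5
    z4 = 1.306562965 * tmp12 + z5
    z3 = tmp11 * 0.707106781
    z11, z13 = tmp7 + z3, tmp7 - z3
    out[5], out[3] = z13 + z2, z13 - z2
    out[1], out[7] = z11 + z4, z11 - z4
    return out
```

Because the stages are just adds, subtracts and a few constant multiplies with no feedback, such a flowgraph maps naturally onto a purely combinational data path like the one described for the arithmetic circuit 1122.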

[0247] 3.17.7 Huffman Decoder
The following embodiment relates to a method and apparatus for decoding variable-length codes interleaved with bit fields of various lengths. In particular, embodiments of the invention provide efficient, high-speed, single-stage (single clock cycle) decoding of variable-length coded data. Byte-aligned data that is not variable-length coded is assumed to have already been removed from the coded data stream by a separate preprocessing block. Position information for the removed byte-aligned data is delivered to the decoder output together with the decoded data. The embodiment also provides fast detection and removal of non-variable-length-coded bit fields that precede byte-aligned data and remain in the preprocessed input data.

[0248] A preferred embodiment of the present invention desirably includes a high-speed Huffman decoder that decodes JPEG-coded data at one Huffman symbol per clock cycle between marker codes. This is achieved by separating and removing from the input data, in a separate preprocessing block, the byte-aligned, non-Huffman-coded marker headers, marker codes and insertion bytes. Once the byte-aligned data has been removed, the input data is sent to a combinational data shift block. This block continuously refills the data decode register, from which data is passed to the decoding unit. The positions of the markers removed from the original input data are sent to a marker shift block. There, the marker position bits are shifted in step with the input data in the data shift block.

[0249] The decoding unit uses a combinational circuit to decode the n-bit coded field taken from the data decode register. Its outputs are the decoded value (v) and the actual length (m) of the input code, where m ≤ n. It also outputs the length (a) of the variable-length bit field that follows the code, where a ≥ 0. This field is not Huffman coded and immediately follows the Huffman code. The n-bit field at the decoder input is at least as long as the actual code. The decoding unit determines the actual code length (m) and passes it, together with the length (a) of the other bits, to the control block. The control block determines the shift value (a + m). It then activates the data and marker shift blocks to shift the input data in preparation for the next decoding cycle.
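The per-cycle contract of the decoding unit can be sketched in software: it outputs v, m and a, and the control block shifts by a + m. The sketch searches prefixes sequentially, whereas the hardware decodes combinationally in one cycle. The toy code table and the function name are hypothetical. The sketch assumes JPEG-style values whose low four bits give a, which holds both for DC size categories and for AC run/size bytes.

```python
from typing import Dict, Tuple


def decode_cycle(register: str, codes: Dict[str, int]) -> Tuple[int, int, int, int]:
    """One decoding cycle on an n-bit window of '0'/'1' characters.

    Returns (v, m, a, shift): decoded value, code length, number of
    additional bits (low four bits of v, assumed), and the shift a + m.
    """
    for m in range(1, len(register) + 1):
        v = codes.get(register[:m])
        if v is not None:
            a = v & 0x0F
            return v, m, a, m + a
    raise ValueError("no code matches the register contents")


toy_codes = {"00": 0x01, "01": 0x02, "100": 0x03, "1010": 0x00, "1100": 0x11}
print(decode_cycle("0110110000000000", toy_codes))  # (2, 2, 2, 4)
```

Because the code set is prefix-free, the first matching prefix is the only match, which is why a combinational decoder can produce m directly.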

[0250] In the apparatus of the invention, any combinational decoder (ROM, RAM, PLA, etc.) can be used, provided that it outputs three things within the required time: the decoded value, the actual length of the input code and the length of the non-Huffman-coded bit field. In this embodiment, the decoding unit outputs predictively coded DC coefficient values and AC run-length values as specified in the JPEG standard. Also as specified in the JPEG standard, the non-Huffman-coded bit field removed from the input data together with the decoded value holds the additional bits that determine the values of the DC and AC coefficients. Another kind of non-Huffman-coded bit field removed from the data decode register is the padding bits. As specified in the JPEG standard, these precede a byte-aligned marker in the original input data stream. The control block detects these bits by checking the contents of the padding area of the data register. The padding area consists of the k most significant bits of the data register, and is indicated by a marker bit in the most significant bits of the marker register. If all bits in the padding area are identical (1 in the JPEG standard), they are judged to be padding bits and are removed from the data register without being decoded. The contents of the data and marker registers are then updated for the next decoding cycle.
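Two small helpers illustrate the two kinds of non-Huffman-coded fields. `extend` is the EXTEND procedure from the JPEG specification (ITU-T T.81, Annex F), which turns the additional bits into a signed DC difference or AC coefficient. `is_padding` is a hypothetical software stand-in for the control block's all-ones test on the k-bit padding area.

```python
def extend(v: int, t: int) -> int:
    """JPEG EXTEND: interpret t additional bits v as a signed value."""
    if t == 0:
        return 0
    # Values whose leading bit is 0 are negative: v - (2**t - 1)
    return v if v >= (1 << (t - 1)) else v - (1 << t) + 1


def is_padding(data_bits: str, k: int) -> bool:
    """Hypothetical check: the k most significant bits before a marker are all 1s."""
    return len(data_bits) >= k and all(b == "1" for b in data_bits[:k])
```

For example, the 3 additional bits `010` decode to -5 and `110` to 6, while a padding area of `1111` ahead of a marker is discarded without decoding.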

[0251] The apparatus embodiment includes an output block that formats the output data as required by the preferred embodiment of the present invention. The output block outputs each decoded value together with the corresponding non-variable-length-coded bit field (such as the additional bits in JPEG) and a signal indicating the position of byte-aligned input bytes or uncoded bit fields (such as the markers in JPEG).

[0252] The data decoded by the JPEG encoder 241 (FIG. 2) is JPEG-compatible. It consists of variable-length Huffman codes interleaved with variable-length uncoded bit fields called "additional bits", variable-length uncoded bit fields called "padding fields", and fixed-length, byte-aligned uncoded bit fields called "markers", "inserted bytes", and "stuffed bytes". FIG. 82 shows typical input data.

[0253] FIGS. 83 and 84 show the overall configuration and data flow of the Huffman decoder in the JPEG encoder 241, and FIG. 83 shows the configuration of the Huffman decoder for JPEG data in detail. The stripper 1171 removes marker codes (FFXXhex, where XX is nonzero), inserted bytes (FFhex), and stuffed bytes (a 00hex code following FFhex). These are all byte-aligned elements of the input data, which is sent to the stripper as 32-bit words. The most significant bit of the first word to be processed is the start of the input bit stream. The stripper 1171 removes these byte-aligned bit fields from the input data before Huffman decoding is actually performed in the downstream part of the decoder.

[0254] Input data enters the stripper 1171 as one 32-bit word per clock cycle. FIG. 85 shows the numbering of the input bytes 1211 from 0 to 3. If byte number (i) is removed because it is an inserted byte, a stuffed byte, or a marker, the remaining bytes numbered (i-1) down to 0 are shifted left by one byte position at the output of the stripper 1171, and byte 0 becomes a "don't care" byte. The validity of each byte output from the stripper 1171 is encoded by a separate output tag 1212, shown in FIG. 85. Bytes not removed by the stripper 1171 are output left-justified, and each output byte carries a tag indicating whether the corresponding byte is valid (passes through the stripper 1171), invalid (removed by the stripper 1171), or valid and following a marker. The tag 1212 controls the loading of data bytes into the data register 1182 through the data shifter, and the loading of marker positions into the marker register 1183 through the marker shifter. The same approach is used when more than one byte is removed from an input word: all remaining valid bytes are left-justified, and the corresponding output tags indicate which output bytes are valid. FIG. 85 shows examples 1213 of output bytes and output tags for various combinations of input bytes.
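A minimal software model of the stripper behaviour described above is sketched below. It assumes standard JPEG byte semantics (FF00 stuffing, FF fill bytes ahead of a marker, FFXX markers) and, unlike the hardware, scans the whole byte stream so that it can look one byte ahead across word boundaries. The tag values `INVALID`, `VALID`, and `VALID_AFTER_MARKER` are hypothetical; the text only requires the tag MSB to be set for a valid byte that follows a marker.

```python
# Hypothetical 2-bit tag codes; only "MSB set = valid and follows a marker" comes from the text.
INVALID, VALID, VALID_AFTER_MARKER = 0b00, 0b01, 0b10


def strip_words(stream):
    """Strip byte-aligned fields from a JPEG entropy-coded byte stream.

    Returns one (bytes, tags) pair per 4-byte input word; surviving bytes are
    left-justified and vacated positions are "don't care" (shown as 0).
    """
    out = []
    skip = None           # how to treat the next byte: None, "stuffed" or "marker"
    after_marker = False  # the next kept byte is the first one after a marker
    for start in range(0, len(stream), 4):
        kept, tags = [], []
        for k in range(start, min(start + 4, len(stream))):
            b = stream[k]
            nxt = stream[k + 1] if k + 1 < len(stream) else None
            if skip == "stuffed":          # the 00 of an FF00 pair: drop it
                skip = None
                continue
            if skip == "marker":           # the XX of an FFXX marker: drop it
                skip, after_marker = None, True
                continue
            if b == 0xFF:
                if nxt == 0x00:            # FF00: FF is real data, 00 is stuffing
                    skip = "stuffed"
                elif nxt == 0xFF:          # FF FF: the first FF is a fill byte
                    continue
                else:                      # FFXX with XX nonzero: a marker
                    skip = "marker"
                    continue
            kept.append(b)
            tags.append(VALID_AFTER_MARKER if after_marker else VALID)
            after_marker = False
        pad = 4 - len(kept)                # left-justify, pad with don't-care slots
        out.append((kept + [0] * pad, tags + [INVALID] * pad))
    return out
```

For example, a word containing `FF 00` yields the data byte FF with one don't-care slot at the end, and the first byte after `FF D0` carries the after-marker tag.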

[0255] In FIG. 83, the role of the pre-shifter and post-shifter blocks 1172, 1173, 1180, and 1181 is to load data continuously into the data register 1182 and the marker register 1183 whenever these registers have enough free space. The data shifter and the marker shifter each consist of a pre-shifter block and a post-shifter block; the two shifters are identical and are controlled in the same way. The difference is that the data shifter processes the data from the stripper 1171, whereas the marker shifter processes only the tags and outputs the marker positions to the decoder output together with the decoded Huffman values. The outputs of the post-shifters 1180 and 1181 are transferred directly to the corresponding registers 1182 and 1183, as shown in FIG. 83.

[0256] Data pre-shifter 1172, also shown in FIG. 86, appends 32 zeros 1251 to the least significant end of the data from the stripper 1171, extending it to 64 bits. The extended data is then shifted right in a 64-bit barrel shifter 1252 by the number of bits currently held in the data register 1182. This bit count is supplied by the control logic 1185, which always keeps track of how many valid bits are in the data register 1182 and the marker register 1183. The barrel shifter 1252 then passes the 64 bits to a multiplexer block 1253 made up of 64 elementary 2×1 multiplexers 1254. Each elementary multiplexer 1254 takes one bit from the barrel shifter 1252 and one bit from the data register 1182 as inputs; it outputs the data register bit when that bit is valid, and the barrel shifter 1252 bit otherwise. The control signals for all elementary multiplexers 1254 are decoded from the shift control 1 signal of the control block, shown as pre-shifter control bits 0...5 of register 1223 in FIGS. 86 and 87. The outputs of the elementary multiplexers 1254 are sent to barrel shifter 1255 and shifted left by the number of bits given by the 5-bit control signal shift control 2, as shown in FIG. 86. This number is the number of bits consumed by decoding the current data in the data register 1182: the length of the currently decoded Huffman code plus the number of additional bits that follow it, or the number of padding bits to be removed if padding bits have been detected, or 0 if the number of valid bits in the data register 1182 is less than or equal to the number of bits to be removed. The data output from barrel shifter 1255 is therefore the new data to be loaded into the data register 1182 after a single decoding cycle. In effect, the most significant bits that have been decoded are shifted out of the data register 1182, and 0, 8, 16, 24, or 32 bits from the stripper 1171 are appended to it. If the data register 1182 does not hold enough bits to decode, data from the stripper 1171 is loaded in the current cycle if it is available. If no data from the stripper 1171 is available in the current cycle, the decoded bits are removed from the data register 1182 if it holds enough bits; otherwise its contents are left unchanged.
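The data path of FIG. 86 can be approximated with Python integers, as in the sketch below. The 64-bit register holds its valid bits left-justified. The function name and the rule of loading a word only when `nob + 32 <= 64` are assumptions, chosen so that the 64-bit pre-shift never drops bits; this is stricter than the `new_nob` check performed by the control unit.

```python
MASK64 = (1 << 64) - 1


def register_cycle(reg, nob, word, consume):
    """One update of the 64-bit data register through pre-shifter, mux and post-shifter.

    reg     -- register contents, valid bits left-justified (bit 63 first)
    nob     -- number of valid bits currently held
    word    -- next 32-bit word from the stripper, or None if none is available
    consume -- bits the decoder asks to remove (code + additional bits, or padding)
    Returns (new_reg, new_nob).
    """
    if nob <= consume:                 # not enough valid bits: remove nothing
        consume = 0
    # Assumption: only load when the 64-bit pre-shift cannot push bits off the end.
    load = word is not None and nob + 32 <= 64
    # Pre-shifter 1172: append 32 zeros, then shift right past the bits already held.
    shifted = ((word << 32) >> nob) if load else 0
    # Mux block 1253: register bit where it is valid, shifted bit everywhere else.
    valid_mask = (MASK64 << (64 - nob)) & MASK64
    merged = (reg & valid_mask) | (shifted & ~valid_mask & MASK64)
    # Post-shifter 1180: shift the consumed MSBs out of the register.
    new_reg = (merged << consume) & MASK64
    return new_reg, nob + (32 if load else 0) - consume
```

Calling it once per cycle with the decoder's `consume` value reproduces the "shift out decoded bits, append 0 or 32 new bits" behaviour; the hardware's finer 8-bit granularity comes from the stripper's byte tags, which this sketch does not model.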

[0257] The marker pre-shifter 1173, marker post-shifter 1181, and marker register 1183 are identical to the data pre-shifter 1172, data post-shifter 1180, and data register 1182, respectively. The data flow within parts 1173, 1181, and 1183, and between them, is also identical to the data flow among parts 1172, 1180, and 1182, and the same control signals are sent from the control unit 1185 to both sets of parts. The two sets differ only in the type of input data supplied to the marker pre-shifter 1173 and the data pre-shifter 1172, and in how the contents of the marker register 1183 and the data register 1182 are used. As shown in FIG. 88, the tags 1261 from the stripper 1171 arrive as an 8-bit word, with two bits assigned to each data byte destined for the data register 1182. Under the encoding scheme shown in FIG. 85, the most significant bit of the 2-bit tag is 1 for a byte that is valid and follows a marker. Only the most significant bit of each of the four tags sent simultaneously from the stripper 1171 is passed as input 1262 to the marker pre-shifter 1173. The input to the marker pre-shifter therefore contains a bit set to 1 at the position of the first coded data bit following a marker, and these bits correspondingly mark, in the data register 1182, the position of the first coded data bit after a marker. Because the marker position bits in the marker register 1183 and the data bits in the data register 1182 move in lockstep, the control block 1185 can detect and remove padding bits and can send marker positions to the decoder output together with the decoded data. As noted above, the two pre-shifters (data 1172 and marker 1173), the two post-shifters (data 1180 and marker 1181), and the two registers (data 1182 and marker 1183) receive the same control signals, so they operate fully in parallel and synchronously.
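Because the marker path reuses the same shifters, only its input differs. The sketch below shows one way the marker pre-shifter input 1262 might be derived from the 8-bit tag word. Placing the first byte's tag in bits 7 and 6, and setting the marker bit on the first bit of the flagged byte, are illustrative assumptions.

```python
def marker_bits(tag_word):
    """Turn an 8-bit tag word (four 2-bit tags) into a 32-bit marker-position word."""
    word = 0
    for i in range(4):                       # i = 0 is the leftmost output byte
        tag = (tag_word >> (6 - 2 * i)) & 0b11
        if tag & 0b10:                       # tag MSB: valid byte following a marker
            word |= 1 << (31 - 8 * i)        # mark the first (most significant) bit of that byte
    return word
```

The resulting 32-bit word would then pass through the marker pre-shifter and post-shifter exactly as a data word passes through the data path.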

[0258] The decoding unit 1184, also shown in FIG. 89, takes the 16 most significant bits of the data register 1182 and, as a combinational circuit, extracts from them the decoded Huffman value, the length of the current input code being decoded, and the length of the additional bits that follow the input code (a function of the decoded value). The additional bit length becomes known only when the corresponding preceding Huffman symbol has been decoded, and it determines where the next Huffman symbol starts. Therefore, to sustain a rate of one decoded value per clock cycle, Huffman decoding must be performed in a combinational circuit block. As shown in FIG. 89, the decoding unit preferably comprises four PLA-style decoding tables, hardwired as a combinational circuit block, which take a 16-bit token from the data register 1182 and generate the Huffman value (8 bits), the length of the corresponding Huffman code (4 bits), and the number of additional bits (4 bits).
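In software, the combinational lookup performed by the decoding unit 1184 can be approximated with a table keyed by code length and code bits. The toy table below is hypothetical and is not a JPEG default table, and the loop over lengths stands in for what the PLA tables evaluate in parallel. The additional-bit length is taken from the low nibble of the decoded value, as for JPEG AC run/size symbols, and a code length of 16 is returned as the 4-bit field 0000.

```python
# Hypothetical toy AC table: (code length, code bits) -> run/size value.
AC_TABLE = {
    (2, 0b00): 0x01,     # run 0, size 1
    (2, 0b01): 0x02,     # run 0, size 2
    (3, 0b100): 0x03,    # run 0, size 3
    (4, 0b1010): 0x00,   # end of block
    (4, 0b1011): 0x11,   # run 1, size 1
}


def decode_token(token, table=AC_TABLE):
    """Decode a 16-bit token (MSB first) into (value, 4-bit length field, additional-bit length)."""
    for length in range(1, 17):
        prefix = token >> (16 - length)          # the first `length` bits of the token
        if (length, prefix) in table:
            value = table[(length, prefix)]
            return value, length & 0xF, value & 0xF   # a length of 16 wraps to 0b0000
    raise ValueError("no code matches the token")
```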

[0259] Padding bits are removed during the actual decoding process, when the padding bit decoder, which is part of the control unit 1185, detects a padding bit string in the data register 1182. FIG. 90 shows the padding bit decoder. The 8 most significant bits of the marker register 1183, 1242 are examined for a marker position bit. If a marker position bit is present, all bits in the data register 1182, 1241 that correspond to the bits preceding the marker bit in the marker register 1242 are treated as the current padding area. The padding bit detector 1243 checks whether the contents of the current padding area are all 1s. If they are, the bits are identified as padding bits and removed from the data register. The removal is performed by shifting the contents of the data register 1182, 1241 (and simultaneously the marker register 1183, 1242) left in one clock cycle using the corresponding shifters 1172, 1173, 1180, and 1181. This process is identical to the normal decoding mode except that no decoded value is output. If not all bits in the current padding area are 1, a normal decoding cycle is executed instead of a padding bit removal cycle. Padding bits are checked for in every cycle in this way, and any padding bits present in the data register 1182 are removed.
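The padding check can be sketched as follows, assuming both registers are modelled as 64-bit integers with bit 63 holding the oldest bit. The function name `padding_length` is hypothetical. It returns the number of bits to remove in a padding-removal cycle, or 0 when a normal decoding cycle should run instead.

```python
def padding_length(data_reg, marker_reg, nob):
    """Return the padding bits to remove (0 means run a normal decoding cycle)."""
    top = (marker_reg >> 56) & 0xFF          # only the 8 MSBs of the marker register are examined
    if top == 0:
        return 0                             # no marker position bit: no padding area
    k = 8 - top.bit_length()                 # position of the first marker bit, counted from the MSB
    if k == 0 or k > nob:
        return 0                             # nothing precedes the marker bit (or not yet loaded)
    area = (data_reg >> (64 - k)) & ((1 << k) - 1)   # the k data bits before the marker bit
    return k if area == (1 << k) - 1 else 0          # all 1s: padding; otherwise decode normally
```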

[0260] FIG. 87 shows the control unit 1185 in detail. At the core of the control unit is register 1223, which holds the current number of valid bits in the data register 1182. The number of valid bits in the marker register 1183 is always equal to the number of valid bits in the data register 1182. The control unit performs three functions. First, it calculates the new number of bits in the data register 1182, which is stored in register 1223. Second, it generates the control signals for the shifters 1172, 1173, 1180, 1181, 1186, and 1187, the decoding unit 1184, and the output format unit 1188. Third, it detects padding bits in the data register 1182, as described above.

[0261] The new number of bits in the data register 1182 (new_nob) is calculated by adding the number of bits currently in the data register 1182 (nob) to the number of bits that can be loaded from the stripper 1171 in the current cycle (nos), and subtracting the number of bits removed from the data register 1182 in the current cycle (nor). The current cycle is either a decoding cycle or a padding bit removal cycle. The new number of bits is therefore calculated as follows:

[0262] new_nob = nob + nos - nor

These operations are performed by adder 1221 and subtractor 1222. If no data is input from the stripper 1171 in the current cycle, nos is 0. Likewise, nor is 0 when no decoding is performed in the current cycle because the data register 1182 holds too few bits, that is, when the number of bits in the data register is less than or equal to the sum of the current code length and the following additional bit length supplied by the control unit 1185. The value new_nob can exceed 64, and block 1224 checks whether it does. In that case the stripper 1171 is stalled and no new data is loaded; multiplexer 1233 is used to force the number of bits loaded from the stripper 1171 to zero. (The signal that stalls the stripper 1171 is not shown.) The "padding cycle" signal from the decoding unit 1231 controls multiplexer 1234, which selects either the number of padding bits or the number of decoded bits (the combined length of the code bits and the additional bits) as the number of bits to remove (nor). If comparator 1228 determines that the number of decoded bits is greater than or equal to the number of bits in the data register (nob), the decoded bit count supplied to multiplexer 1234 is forced to zero by NAND gate 1230; nor is then zero and no bits are removed from the data register. The output of multiplexer 1234 is also used to control the post-shifters 1180 and 1181. The width of the data register 1182 is chosen to avoid deadlock: either the data register always has room for the maximum number of bits delivered by the stripper 1171, or each decoding or padding-removal cycle removes a sufficient number of valid bits.
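The bookkeeping of the control unit can be summarised in a few lines. This is a behavioural sketch with hypothetical parameter names, not the gate-level circuit; it follows the order described above: select nor, zero it when too few bits are held, then stall the stripper if new_nob would exceed the register width.

```python
def control_step(nob, word_ready, padding_cycle, padding_bits, code_len, add_len,
                 reg_bits=64, word_bits=32):
    """Return (nos, nor, new_nob, stall) for one decoding or padding-removal cycle."""
    decoded = code_len + add_len
    if padding_cycle:
        nor = padding_bits                   # multiplexer 1234: padding input selected
    else:
        # comparator 1228 / NAND gate 1230: remove nothing if too few bits are held
        nor = 0 if decoded >= nob else decoded
    nos = word_bits if word_ready else 0
    stall = nob + nos - nor > reg_bits       # block 1224: would new_nob exceed the width?
    if stall:
        nos = 0                              # multiplexer 1233: load nothing this cycle
    return nos, nor, nob + nos - nor, stall
```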

[0263] The number of bits removed in a decoding cycle is calculated by adder 1226, whose operands come from the combinational decoding unit 1184. Because the decoding unit encodes a code length of 16 bits as "0000", the "ou_reduce" logic 1225 converts "0000" to "10000" to obtain the actual unsigned operand. This operand and the output of subtractor 1227 provide the control signals for the output format shifters 1186 and 1187.
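The length correction performed by the "ou_reduce" logic and adder 1226 amounts to the following; both function names are illustrative.

```python
def ou_reduce(code_len_field):
    """Expand the 4-bit code-length field to the real 5-bit length (0b0000 stands for 16)."""
    return 16 if code_len_field == 0 else code_len_field


def bits_removed(code_len_field, add_len):
    """Adder 1226: actual code length plus additional-bit length."""
    return ou_reduce(code_len_field) + add_len
```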

[0264] Block 1229 is used to detect the position of the EOI (end of image) marker. The EOI marker itself is removed by the stripper 1171, but padding bits remain as the last bits of the data that preceded the EOI marker before the stripper 1171 removed it. Comparator 1229 checks whether the number of bits in the data register 1182, as stored in register 1223, is 8 or less. If it is, and no new data is arriving from the stripper 1171 (the data register 1182 holds the remaining bits of the data segment being decoded), the remaining bits give the size of the padding area before the removed EOI marker. This padding area is then processed, and its padding bits removed, by the same procedure described above for padding bits before an RST marker.

[0265] The barrel shifters 1186 and 1187 and the output format unit 1188 play a supporting role; various implementations are possible depending on the embodiment, and they may even be omitted altogether. Their control signals are supplied by the control unit 1185, as described above. The additional-bit pre-shifter 1186 takes 32 bits from the data register and shifts them left by the length of the Huffman code currently being decoded. All additional bits following the currently decoded code are thereby left-aligned at the output of barrel shifter 1186, which is passed as input to barrel shifter 1187. The additional-bit post-shifter 1187 moves the additional bits from left alignment to right alignment within the 11-bit field used in the output data format, also shown in FIG. 91. The additional-bits field occupies bits 8 to 18 of the output word format 1196; depending on the actual number of additional bits, some of its most significant bits may be invalid. That number is encoded in bits 0 to 3 of 1196, as specified in the JPEG standard. If a different output data format is used, the barrel shifters 1186 and 1187 and their functions are changed to suit that format.
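The two-stage extraction of the additional bits can be modelled on a 32-bit slice of the data register, as below. The sketch assumes the slice starts at the current Huffman code and returns the additional bits right-aligned, with the unused high bits of the 11-bit field cleared; that is one possible treatment of the invalid bits mentioned above.

```python
def extract_additional_bits(top32, code_len, add_len):
    """Model pre-shifter 1186 and post-shifter 1187 on a 32-bit slice."""
    # 1186: shift out the Huffman code so the additional bits are left-aligned.
    left_aligned = (top32 << code_len) & 0xFFFFFFFF
    # Take the 11-bit field, then 1187: right-align the add_len meaningful bits.
    field = left_aligned >> (32 - 11)
    return field >> (11 - add_len)
```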

[0266] The output format block 1188 packs the decoded values. For JPEG, it assembles a word in the format shown in FIG. 91 from the DC/AC coefficient (1196, bits 0 to 7) and the DC coefficient indicator bit (1196, bit 19) supplied by the control unit 1185, the additional bits (1196, bits 8 to 18) supplied by the additional-bit post-shifter 1187, and the marker position bit (1196, bit 23) supplied by the marker register 1183. The output format unit 1188 also handles the functional requirements of the decoder's output interface; if different functional requirements change that interface, the implementation of the output format unit usually changes accordingly. The Huffman decoder described above decodes efficiently and achieves high-speed decoding.
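Using the bit positions listed above, the packing step can be sketched as follows. The function name is hypothetical, and bits 20 to 22, which the text does not assign, are left at zero.

```python
def pack_output(coeff, extra_bits, is_dc, marker):
    """Pack one decoded value into the output word 1196."""
    word = coeff & 0xFF                  # bits 0-7: DC/AC coefficient value
    word |= (extra_bits & 0x7FF) << 8    # bits 8-18: additional bits, right-aligned
    word |= int(is_dc) << 19             # bit 19: DC coefficient indicator
    word |= int(marker) << 23            # bit 23: marker position bit
    return word
```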

[0267] 3.17.8 Image Transformation Instructions
These instructions perform a general affine transformation of a source image. Generating part of a transformed image divides broadly into two tasks: determining which part of the source image relates to the current output scan line, and performing the necessary subsampling and interpolation to generate the output image pixel by pixel.

[0268] FIG. 92 is a flowchart of the steps 720 needed to calculate a destination pixel value, assuming that the appropriate region of the source image has already been decoded. First, if subsampling is being performed, the subsamples are taken into account at 721. Next, two processes are normally implemented: interpolation 722 and subsampling. Interpolation and subsampling are usually separate steps, but they may also be performed together. In interpolation, the four surrounding pixels are first located, and it is determined whether premultiplication 723 is needed before bilinear interpolation 724 is performed. Bilinear interpolation 724 is generally very computation-intensive and therefore limits image transformation performance. The final step in calculating the destination pixel value is to sum the bilinearly interpolated subsamples from the source image. The summed pixel values are integrated 727 in various ways to produce the destination image pixel 728.
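The bilinear interpolation step, with optional premultiplication by alpha, can be sketched as follows. The RGBA tuple layout, the floating-point channel range [0, 1], and the fractional offsets `fx` and `fy` are assumptions for illustration; the result is in premultiplied form when `premultiply` is true.

```python
def bilinear(p00, p10, p01, p11, fx, fy, premultiply=True):
    """Bilinearly interpolate four RGBA pixels; fx, fy are offsets in [0, 1) from p00."""
    def pm(p):
        r, g, b, a = p
        # Premultiplication (step 723): scale colour by alpha before blending.
        return (r * a, g * a, b * a, a) if premultiply else p
    q = [pm(p) for p in (p00, p10, p01, p11)]
    # Weights of the four neighbours (step 724).
    w = ((1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy)
    return tuple(sum(wi * qi[c] for wi, qi in zip(w, q)) for c in range(4))
```

Premultiplying keeps a fully transparent neighbour from bleeding its colour into the result, which is why the decision at 723 precedes the interpolation.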

[0269] The instruction word encoding for the image transformation instruction is shown in FIG. 93, and the minor opcode fields are described in the following table.
Instruction word: minor opcode fields

[0270]

[Table 19]

[0271] The instruction operands and result fields are described below.
Instruction operands and result words

[0272]

[Table 20]

[0273] Operand A points to a data structure known as a "kernel descriptor", which describes all the information needed to define the actual transformation. This data structure has one of two formats, selected by the L bit of the A descriptor: FIG. 94 shows the long format of the kernel descriptor, and FIG. 95 shows the short format. The kernel descriptor describes the following information:
1. Source image start coordinates 730 (unsigned fixed-point, 24.24 resolution). Position (0, 0) is the top left of the image.
2. Horizontal 731 and vertical 732 (subsample) deltas (two's-complement fixed-point, 24.24 resolution).
3. A 3-bit bp field 733 indicating the position of the binary point in the fixed-point matrix coefficients described below.
4. Integration matrix coefficients 735 (if present). These are 20-bit two's-complement values with a "variable" point resolution: the binary point lies at the position implicitly specified by the bp field.
5. An rl field 736 indicating the number of words remaining in the kernel descriptor. This value is the number of columns multiplied by the number of rows, minus 1.
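To illustrate the 24.24 fixed-point format, the sketch below steps a source position through successive output pixels. It assumes, hypothetically, that the horizontal and vertical deltas are per-output-pixel increments of the source x and y coordinates. Python's arithmetic right shift gives the floor of negative positions, matching two's-complement hardware.

```python
FRAC = 24                                # 24.24 fixed point: 24 fractional bits
FRAC_MASK = (1 << FRAC) - 1


def to_fixed(x):
    """Convert a real number to 24.24 fixed point."""
    return round(x * (1 << FRAC))


def source_positions(start_x, start_y, dx, dy, count):
    """Yield (int x, int y, frac x, frac y) for successive output pixels (hypothetical stepping)."""
    x, y = start_x, start_y
    for _ in range(count):
        yield x >> FRAC, y >> FRAC, x & FRAC_MASK, y & FRAC_MASK
        x += dx
        y += dy
```

The integer parts select the four neighbouring source pixels, and the fractional parts would serve as the bilinear weights.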

[0274] The kernel coefficients in the descriptor are arranged column by column, with adjacent columns running in opposite directions so as to form a zigzag scan. As shown in FIG. 96, operand B consists of a pointer to an index table that points to the scan lines of the source image: operand B 740 points to the index table 741, and the index table points to the scan lines (e.g., 742) containing the required source image pixels. In general, the index table and the source image pixels are cacheable and reside in local memory.
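The zigzag coefficient order can be generated as follows. The sketch assumes that the first column runs top to bottom; the number of positions it returns equals the rl field value plus one.

```python
def serpentine_order(rows, cols):
    """(row, col) positions in descriptor order: down column 0, up column 1, and so on."""
    order = []
    for c in range(cols):
        rs = range(rows) if c % 2 == 0 else range(rows - 1, -1, -1)
        order.extend((r, c) for r in rs)
    return order
```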

[0275] Operand C holds the horizontal and vertical subsample rates. These rates are defined by the dimensions of the subsample weight matrix specified when the C descriptor is present. The matrix dimensions r and c are encoded in the data word of the image transformation instruction, as shown in FIG. 97. Channel N of the result pixel P[N] is calculated by the following equation:

[0276]

(Equation 4)

[0277] Internally, the accumulated value is held for each channel as a 36-bit fixed-point number. The position of the binary point within this field is specified by the BP field, which indicates how many bits of the accumulated result are discarded. The 36-bit accumulated value is represented in signed two's complement and is clamped or wrapped as specified. FIG. 98 shows an example of how the BP field is interpreted in the coefficient encoding.
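A possible reading of the clamp/wrap step is sketched below. It treats `bp` as the number of low-order fraction bits dropped from the 36-bit signed accumulator before the result is reduced to an 8-bit channel. That interpretation and the 8-bit output width are assumptions, since FIG. 98 is not reproduced here.

```python
ACC_BITS = 36


def to_signed36(value):
    """Interpret the low 36 bits of value as a two's-complement number."""
    value &= (1 << ACC_BITS) - 1
    return value - (1 << ACC_BITS) if value >> (ACC_BITS - 1) else value


def finish_channel(acc, bp, clamp=True):
    """Reduce a 36-bit accumulator to an 8-bit channel (bp = dropped fraction bits, assumed)."""
    v = to_signed36(acc) >> bp
    # Clamp saturates to 0..255; wrap keeps only the low 8 bits.
    return max(0, min(255, v)) if clamp else v & 0xFF
```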

[0278] 3.17.9 Convolution Instructions
Convolution, as applied to rendered images, applies a two-dimensional convolution kernel to a source image to generate a result image. It is typically used for edge sharpening and various image filters. Convolution is implemented in the coprocessor 224 and works in the same way as the image transformation, except that in the image transformation the kernel moves by the kernel width for each output pixel, whereas in convolution it moves by one source pixel for each output pixel.

[0279] If the source image has values S(x, y) and the n×m convolution kernel has values C(x, y), the n-th channel of the convolution H[n] of S and C is given by

[0280]

(Equation 5)

[0281] where i ∈ [0, c] and j ∈ [0, r]. The meaning of the offset values, the resolution of the intermediate result, and the meaning of the bp field are the same as for the image transformation instruction. FIG. 99 shows an example in which a convolution kernel 750 is applied to a source image 751 to generate a result image 752. Source image address generation and output pixel calculation are performed in the same way as for the image transformation instruction, and the instruction operands have the same format. FIG. 100 shows the instruction word encoding for the convolution instruction, and the following table describes its fields.

[0282] Instruction word

[0283]

[Table 21]

[0284] 3.17.10 Matrix Multiplication

Matrix multiplication is used for operations such as color space conversion, where the two color spaces are related by an affine transformation. Matrix multiplication is defined by the following equation.

[0285]

(Equation 6)
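Equation 6 is not reproduced here, so the following is only a generic sketch of an affine color transform, not the coprocessor's definition. Each output channel is a weighted sum of the input channels plus a constant offset; that extra matrix column is what makes the map affine rather than linear. The matrix shape, the floating-point coefficients, and the rounding and clamping to 0..255 are all assumptions. `affine_color` is a hypothetical name.

```python
def affine_color(pixel, matrix):
    """Affine color transform: out[i] = sum_j M[i][j] * p[j] + M[i][n].

    `matrix` has one row per output channel. Each row holds n weights for the
    n input channels followed by a constant offset. Results are rounded and
    clamped to 0..255 (an assumption).
    """
    n = len(pixel)
    out = []
    for row in matrix:
        if len(row) != n + 1:
            raise ValueError("each row needs n weights plus one offset")
        acc = sum(w * p for w, p in zip(row[:n], pixel)) + row[n]
        out.append(max(0, min(255, round(acc))))
    return out
```

For instance, the single row `[0.25, 0.5, 0.25, 0]` maps an RGB pixel to one luminance-like channel.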

[0286] The matrix multiplication instruction operands and result word have the following format.

Instruction operands and result words

[0287]

[Table 22]

[0288] FIG. 101 shows the instruction word encoding for the matrix multiplication instruction, and the following table shows the minor opcode field.

Instruction word

[0289]

[Table 23]

[0290] 3.17.11 Halftoning

The coprocessor 224 provides multilevel dithering for halftoning. Values from 2 to 255 are meaningful numbers of halftone levels. The data to be halftoned can be either bytes (one channel of unmeshed or meshed data) or pixels (meshed data), provided that the screen is correspondingly meshed or unmeshed. Up to four output channels (or four bytes from the same channel) can be generated. They are produced either as packed bits (for two-level halftones) or as codes (for two or more output levels), and can be packed together or unpacked to one code per byte.

[0291] The output halftone value is calculated using the following equation:

(p × (l − 1) + d) / 255

where p is the pixel value (0 ≤ p ≤ 255), l is the number of levels (2 ≤ l ≤ 255), and d is the dither matrix value (0 ≤ d ≤ 254). The operand encoding is as follows.

Instruction operands and result words

[0292]

[Table 24]
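Below is a minimal sketch of the halftone formula above, with a dither matrix tiled across the image. Integer (truncating) division is assumed. Under that reading, p = 255 and d = 254 give exactly l − 1, the top level, for every l in range, which is consistent with the stated bounds. The tiling of the dither matrix and the function names are illustrative assumptions.

```python
def halftone(p: int, levels: int, d: int) -> int:
    """Output level (p * (levels - 1) + d) // 255, in 0..levels-1."""
    if not (0 <= p <= 255 and 2 <= levels <= 255 and 0 <= d <= 254):
        raise ValueError("operand out of range")
    return (p * (levels - 1) + d) // 255


def halftone_image(image, levels, dither):
    """Halftone a 2-D list of bytes, tiling the dither matrix across it."""
    dh, dw = len(dither), len(dither[0])
    return [
        [halftone(v, levels, dither[y % dh][x % dw]) for x, v in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

With two levels, the formula reduces to a threshold: the output is 1 exactly when p + d ≥ 255.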

[0293] In the instruction word encoding, the minor opcode specifies the number of halftone levels. The operand B encoding is for the halftone screen and is encoded in the same way as for tile compositing.

3.17.12 Hierarchical Image Format Decoding

Hierarchical image format decoding involves several steps: horizontal interpolation, vertical interpolation, Huffman decoding and residual merging. Each step is performed by a separate instruction. In the Huffman decoding step, the residual values to be added to the interpolated values from the interpolation steps are Huffman coded. The JPEG decoder is therefore used for Huffman decoding.

[0294] FIG. 102 shows the horizontal interpolation process. The output stream 761 contains twice as much data as the input stream 672, and the last data value 763 is duplicated (764). FIG. 103 shows an example of fourfold horizontal interpolation. In the second step of hierarchical image format decoding, a row of pixels is upsampled vertically by a factor of two or four using linear interpolation. In this step, one pixel row is operand A and the other row is operand B.
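The horizontal step can be sketched as linear interpolation between neighbouring samples. The last sample is replicated so that the output is exactly two (or four) times the input length, as in FIGS. 102 and 103. Rounding down is an assumption, since the rounding rule is not given. `interp_horizontal` is a hypothetical name.

```python
def interp_horizontal(samples, factor):
    """Upsample one row by 2 or 4 with linear interpolation (rounding down).

    The last input value has no right-hand neighbour, so it is replicated.
    """
    if factor not in (2, 4):
        raise ValueError("factor must be 2 or 4")
    out = []
    for i, a in enumerate(samples):
        b = samples[i + 1] if i + 1 < len(samples) else a
        for k in range(factor):
            out.append(a + (b - a) * k // factor)
    return out
```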

[0295] For vertical interpolation, whether twofold or fourfold, the output data stream contains the same number of pixels as the input stream. FIG. 104 shows an example of vertical interpolation in which two input data streams 770 and 771 are used to generate an output stream 772 for twofold interpolation and an output stream 773 for fourfold interpolation. For pixel data, the interpolation is performed separately on each of the four channels of the four-channel pixels.
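Here is a sketch of the vertical step. It assumes the step produces one interpolated row at a fraction k/factor of the way from operand A to operand B. The text does not specify which intermediate rows the hardware emits in the fourfold case, so k is left to the caller. Each channel of a four-channel pixel is interpolated independently, and the output has as many pixels as each input. Rounding down and the function names are assumptions.

```python
def interp_vertical(row_a, row_b, factor, k):
    """Row at fraction k/factor of the way from row A to row B (rounding down)."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must be the same length")
    if factor not in (2, 4) or not 0 < k < factor:
        raise ValueError("need factor in (2, 4) and 0 < k < factor")
    return [a + (b - a) * k // factor for a, b in zip(row_a, row_b)]


def interp_vertical_pixels(pix_a, pix_b, factor, k):
    """Four-channel pixels: each channel is interpolated independently."""
    return [tuple(interp_vertical(pa, pb, factor, k)) for pa, pb in zip(pix_a, pix_b)]
```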

[0296] Residual merging consists of byte-by-byte addition of two data streams. The first stream (operand A) is the base value stream, and the second stream (operand B) is the residual value stream. FIG. 105 shows two input streams 780 and 781 and the corresponding output stream 782 for residual merging.
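Residual merging is a byte-wise addition. The sketch assumes modulo-256 (wrapping) addition, which the text does not state. One convenient consequence is that a residual stored as a two's-complement byte, such as 0xFF for −1, subtracts correctly. The function name is hypothetical.

```python
def residual_merge(base: bytes, residual: bytes) -> bytes:
    """Byte-wise sum of base and residual streams, wrapping modulo 256."""
    if len(base) != len(residual):
        raise ValueError("streams must be the same length")
    return bytes((a + r) & 0xFF for a, r in zip(base, residual))
```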

[0297] FIG. 106 shows the instruction word encoding for the hierarchical image format instructions, and the following table details the minor opcode field.

Instruction word - minor opcode field

[0298]

[Table 25]

[0299] 3.17.13 Copy Instructions

These instructions are divided into two distinct groups.

a. General-purpose data movement instructions. These instructions use the normal data flow path through the coprocessor 224, which consists of the input interface module, the input interface switch 252, the pixel organizer 246, the JPEG encoder 241, the result organizer 249 and the output interface module. In this case, the JPEG encoding module passes the data through directly without processing it.

[0300] Other data manipulation operations include:
- packing and unpacking of sub-byte values (bits, 2-bit values and 4-bit values) into bytes;
- packing and unpacking of bytes within words;
- alignment;
- byte lane swapping and replication;
- memory clearing;
- value replication.

The data manipulation operations are performed by a combination of the pixel organizer (input) and the result organizer (output). In many cases these instructions are used in combination with other instructions.

b. Local DMA instructions. No data manipulation is performed. As shown in FIG. 2, data is transferred (in either direction) between the local memory 236 and the peripheral interface 237. These are the only instructions whose execution can overlap with that of other instructions. At most one of them can execute concurrently with a "non-overlapping" instruction.
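Sub-byte packing, the first operation in the list above, can be sketched as follows. Two details are assumptions, since the text does not specify them: the bit order (first value in the most significant bits of each byte) and zero-filling of a partial final byte. The function names are hypothetical.

```python
def pack_subbytes(values, bits):
    """Pack `bits`-wide values (1, 2 or 4 bits) into bytes, first value in the MSBs."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2 or 4")
    per_byte = 8 // bits
    out = bytearray()
    for start in range(0, len(values), per_byte):
        byte = 0
        for i, v in enumerate(values[start:start + per_byte]):
            if not 0 <= v < (1 << bits):
                raise ValueError(f"value {v} does not fit in {bits} bits")
            byte |= v << (8 - bits * (i + 1))
        out.append(byte)  # a partial final byte keeps zeros in its low bits
    return bytes(out)


def unpack_subbytes(data, bits, count):
    """Inverse of pack_subbytes: return the first `count` values."""
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2 or 4")
    mask = (1 << bits) - 1
    out = []
    for byte in data:
        for i in range(8 // bits):
            out.append((byte >> (8 - bits * (i + 1))) & mask)
    return out[:count]
```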

[0301] In a memory copy operation, operand A indicates the data to be copied, and the result operand gives the destination address of the memory copy instruction. In a general-purpose memory copy instruction, operand B defines the data manipulation operation applied to the input, and operand C defines the operation applied to the output operand word.

3.17.14 Flow Control Instructions

Flow control instructions are a group of instructions for controlling various parts of the instruction execution model shown in FIG. 9. They include conditional and unconditional jumps, which allow execution to move from one virtual address to another while the instruction stream is being executed. Whether a conditional jump is taken is determined by masking the relevant field of a coprocessor register and comparing it with a given value, which keeps the instruction general. Flow control instructions also include a wait instruction. It is used to synchronize overlapping and non-overlapping instructions, or as part of microprogramming.
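The masked comparison that decides a conditional jump reduces to one bitwise test. XOR the register with the comparison value, keep only the bits selected by the mask, and jump if no bits remain. Below is a minimal sketch, assuming 32-bit operands and using hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def jump_taken(register_value: int, operand_b: int, operand_c: int) -> bool:
    """True when the register matches operand B on every bit selected by mask C."""
    differing = (register_value ^ operand_b) & MASK32  # bits where the two disagree
    return (differing & operand_c) == 0                # ignore bits outside the mask
```

With mask 0xFF000000, only the top byte of the register is compared; a zero mask makes the jump unconditional.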

[0302] FIG. 107 shows the encoding of the flow control instructions, and the following table describes the minor opcodes.

Instruction word - minor opcode field

[0303]

[Table 26]

[0304] In a jump instruction, the operand A word specifies the destination address of the jump. If the S bit of the minor opcode is set to 0, operand B specifies a coprocessor register that is used as the source of the condition. The operand B descriptor value gives the address of the register, and the operand B word gives the value to compare with the register contents. The operand C word specifies a bitwise mask applied to the result. That is, the jump condition is true if the following bitwise expression is satisfied:

[0305] ((register_value xor OperandB) and OperandC) = 0x00000000

This instruction is also used for register access, to allow full control at the microprogramming level.

3.18 Accelerator Card Modules

The various modules of FIG. 2 are described in more detail below.

[0306] 3.18.1 Pixel Organizer

The pixel organizer 246 addresses and buffers the data stream from the input interface switch 252. Input data is stored either in the internal memory of the pixel organizer or in the MUV buffer 250. Once all required data processing on the input stream is complete, the stream is passed to the main data path 242 or to the JPEG encoder 241, as appropriate. The operating mode of the pixel organizer can be configured through the normal CBus interface. The pixel organizer 246 operates in one of five modes, as specified by the PO_CFG control register:

(a) Idle mode: the pixel organizer 246 is inactive.
(b) Sequential mode: input data is stored in an internal FIFO, and the pixel organizer 246 generates 32-bit data addresses and requests the data from the input interface switch 252.
(c) Color space conversion mode: the pixel organizer buffers pixels for color space conversion, and also requests the interval and fraction values stored in the MUV buffer 250.
(d) JPEG compression mode: the pixel organizer 246 stores image data in the MUV buffer in MCU format.
(e) Convolution and image transformation mode: the pixel organizer 246 stores matrix coefficients in the MUV buffer 250 and, where necessary, passes them to the main data path 242.

[0307] The pixel organizer 246 uses the MUV buffer 250 for the operation of both the main data path 242 and the JPEG encoder 241. In color space conversion, the interval and fraction tables are stored in the MUV RAM 250. They are accessed as 36-bit data consisting of a 4-bit interval value and an 8-bit fraction value for each color channel. For image transformation and convolution, the MUV RAM 250 stores the matrix coefficients and the associated configuration data. The coefficient matrix is limited to 16 rows × 16 columns, and each coefficient is at most 20 bits wide. The MUV RAM 250 must supply one coefficient per clock cycle. In addition to the coefficient data, control information must also be passed to the main data path 242, such as the binary point, the source start coordinates and the subsample deltas. This control information is fetched by the pixel organizer 246 before the matrix coefficients.

[0308] In JPEG compression, the pixel organizer 246 uses the MUV buffer 250 to double-buffer MCUs; double buffering is preferred because it improves JPEG compression performance. One half of the MUV RAM 250 is written with data from the input interface switch 252. Meanwhile, the pixel organizer reads the other half to obtain the data to send to the JPEG encoder 241. The pixel organizer 246 also performs horizontal subsampling of color components where required. It pads the MCU when the size of the input image is not an integer multiple of the MCU size.
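The ping-pong use of the two MUV RAM halves can be modelled sequentially as below. Each incoming MCU fills one half while the MCU in the other half is handed to the encoder, and then the halves swap roles. In hardware the fill and drain happen concurrently; this single-threaded model only shows the ordering. `double_buffered` and its `encode` callback are hypothetical.

```python
def double_buffered(mcus, encode):
    """Model MCU double buffering; returns encode(mcu) for each MCU in order."""
    halves = [None, None]  # the two halves of the buffer
    write = 0              # index of the half currently being filled
    results = []
    for mcu in mcus:
        halves[write] = list(mcu)         # fill one half with the incoming MCU...
        read = write ^ 1
        if halves[read] is not None:      # ...while the other half is drained
            results.append(encode(halves[read]))
            halves[read] = None
        write = read                      # swap roles for the next MCU
    last = write ^ 1                      # the half written most recently
    if halves[last] is not None:
        results.append(encode(halves[last]))
    return results
```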

[0309] The pixel organizer 246 also formats the input data. It performs the byte lane swapping, normalization, byte replacement, byte packing and unpacking, and replication operations described earlier with reference to FIG. 32, as required by the settings of the pixel organizer registers. The pixel organizer 246 is shown in more detail in FIG. 108. It operates under the control of its own register set, contained in the CBus interface controller 801, which is connected to the instruction controller 235 via the global CBus. The pixel organizer 246 includes an operand fetch unit 802, which requests the operand data needed by the pixel organizer 246 from the input interface switch 252. The start address of the operand data is specified by the PO_SAID register, which is set immediately before execution. Depending on the L bit of the PO_DMR register, the PO_SAID register may instead hold immediate data. The current address pointer is stored in the PO_CDP register and is incremented by the burst length with each request to the input interface switch. When data is fetched into the MUV RAM 250, the current offset of the data is concatenated with the MUV RAM 250 base address specified by the PL_MUV register.

[0310] A FIFO 803 buffers the sequential input data fetched by the operand fetch unit 802. The data manipulation unit 804 performs the various operations described with reference to FIG. 32, and its output is passed to the MUV address generator 805. According to the configuration registers, the MUV address generator 805 passes the data to the MUV RAM 250, the main data path 242 or the JPEG encoder 241. The pixel organizer controller 806 is a state machine that generates the control signals needed by all submodules of the pixel organizer 246. These include the signals that control communication on the various bus interfaces. Depending on the setting of the status register, the pixel organizer controller also outputs diagnostic information required by the other module 239.

[0311] FIG. 109 shows the operand fetch unit 802 of FIG. 108 in more detail. The operand fetch unit 802 includes an instruction bus address generator (IAG) 810, which contains a state machine that generates requests to fetch operand data. These requests are sent to the request arbiter 811. The arbiter chooses between requests from the address generator 810 and requests from the MUV address generator 805 (FIG. 108), and sends the winning request to the input interface switch 252. The request arbiter 811 contains a state machine for handling requests. It monitors the state of the FIFO using the FIFO count unit 814 to decide when the next request should be dispatched. The byte enable generator 812 receives information from the IAG 810 and generates a byte enable pattern 816 that specifies the valid bytes in each operand returned by the input interface switch 252. The byte enable pattern is stored in the FIFO together with the associated operand data. When a MAG request and an IAG request arrive at the same time, the request arbiter 811 services the MAG request before the IAG request.
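Two decisions made in the operand fetch unit can be sketched in a few lines. The first is the fixed-priority arbitration between MAG and IAG requests. The second is a byte enable pattern marking which byte lanes of a fetched 32-bit word hold operand data. The sketch rests on three assumptions. First, only IAG requests are throttled by free FIFO space, because MAG data goes to the MUV RAM rather than the FIFO. Second, a request needs room for a full burst. Third, byte lane 0 is the lowest address. All names are hypothetical.

```python
def arbitrate(mag_request, iag_request, fifo_free_words, burst_words):
    """Pick the request to send to the input interface switch, or None.

    MAG requests win over IAG requests. An IAG request is only issued when the
    FIFO has room for a whole burst (assumed throttling rule).
    """
    if mag_request is not None:
        return ("MAG", mag_request)
    if iag_request is not None and fifo_free_words >= burst_words:
        return ("IAG", iag_request)
    return None


def byte_enable_pattern(start_addr, total_bytes, word_index):
    """4-bit mask of valid byte lanes in the word_index-th fetched 32-bit word.

    Words are fetched from the aligned address containing start_addr;
    bit n of the mask corresponds to byte lane n (lane 0 = lowest address).
    """
    word_base = (start_addr & ~3) + 4 * word_index
    mask = 0
    for lane in range(4):
        if start_addr <= word_base + lane < start_addr + total_bytes:
            mask |= 1 << lane
    return mask
```

For a 5-byte operand starting at 0x102, the first word has lanes 2 and 3 valid, and the second has lanes 0 to 2 valid.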

[0312] Referring again to FIG. 108, the MUV address generator 805 operates in several different modes. The first is the JPEG (compression) mode. In this mode, input data for JPEG compression is supplied by the data manipulation unit 804, and the MUV buffer 250 is used as a double buffer. The MUV RAM 250 address generator 805 generates MUV buffer addresses suitable for storing the input data processed by the data manipulation unit 804. The MAG 805 also generates read addresses for extracting color component data from the stored pixels, so as to form 8 × 8 blocks for JPEG compression. In addition, the MAG 805 handles MCUs that only partially overlap the image. FIG. 110 shows an example of the padding operation performed by the MAG 805.

[0313] For ordinary pixel data, the MAG 805 stores the four color components at the same address of the MUV RAM 250, which is made up of four 8-bit RAMs. So that data from the same color channel can be read out simultaneously, the MCU data is barrel-shifted left before being stored in the MUV RAM 250. The number of bytes by which the data is shifted left is determined by the two least significant bits of the write address. For example, FIG. 111 shows how 32-bit pixel data is arranged in the MUV RAM 250 when no subsampling is required. In the three-channel or four-channel interleaved JPEG modes, subsampling of the input data may be selected. In multichannel JPEG compression mode with subsampling, the MAG 805 (FIG. 108) performs the subsampling before the 32-bit data is stored in the MUV RAM 250, for optimal JPEG encoder performance. For the first four input pixels, only the first and fourth channels stored in the MUV RAM 250 contain useful data. The data of the second and third channels is subsampled and held in a register of the pixel organizer 246. For the next four input pixels, the second and third channels are filled with the subsampled data. FIG. 112 shows an example of the MCU data layout in multichannel subsampling mode. The MAG treats all single-channel unpacked data exactly like multichannel pixel data. FIG. 113 shows an example of single-channel packed data read from the MUV RAM.

[0314] While the write process is storing input MCUs in the MUV RAM, the read process reads 8 × 8 blocks from the MUV RAM. In general, the MAG 805 generates the blocks four coefficients at a time by reading the data for each channel in turn. For pixel data and unpacked input data, the stored data is organized as shown in FIG. 111. Therefore, to assemble an 8 × 8 block of non-subsampled pixel data, the read process reads the data from the MUV RAM in a skewed pattern. FIG. 114 shows an example of this process. It gives the read sequence for four-channel data, and shows how the storage format of the MUV RAM 250 makes it easy to read several values from the same channel simultaneously.
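The barrel shift ensures that four values from the same channel end up in four different 8-bit RAMs. They can therefore be fetched in one cycle, with a different address driven on each RAM; this is the skewed read of FIG. 114. The sketch models this with a left rotation by the low two bits of the write address. The rotation direction and bank numbering are assumptions, and the names are hypothetical.

```python
NUM_BANKS = 4  # four 8-bit RAMs side by side form one 32-bit MUV RAM word


def rotate_left(pixel, shift):
    """Barrel-shift a 4-byte pixel left by `shift` bytes: bank k gets channel (k + shift) % 4."""
    return [pixel[(k + shift) % NUM_BANKS] for k in range(NUM_BANKS)]


def store_pixels(pixels):
    """Write pixels to consecutive addresses, rotating by the low two address bits."""
    return [rotate_left(p, addr & 3) for addr, p in enumerate(pixels)]


def read_channel(ram, base, channel):
    """Read `channel` of the four pixels at base..base+3 in a single skewed access.

    Pixel a holds `channel` in bank (channel - a) % 4, so the four reads use four
    different banks, each driven with its own address.
    """
    picks = [((channel - (a & 3)) % NUM_BANKS, a) for a in range(base, base + NUM_BANKS)]
    assert len({bank for bank, _ in picks}) == NUM_BANKS, "bank conflict"
    return [ram[a][bank] for bank, a in picks]
```

Without the rotation, all four values of one channel would sit in the same RAM and need four separate cycles.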

[0315] In color conversion mode, the MUV RAM 250 is used as a cache that stores interval and fraction values, and the MAG 805 acts as the cache controller. The MUV RAM 250 caches values for three color channels. Each channel has 256 pairs of 4-bit interval values and fraction values. For each pixel output by the DMU, the MAG 805 is used to obtain these values from the MUV RAM 250. When a value is not available, the MAG 805 issues a memory read request to fetch the missing interval and fraction values. To use bandwidth efficiently, each request fetches several entries rather than just one.
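The caching behaviour described here can be sketched as a per-channel table of 256 entries with valid bits, filled in aligned bursts on a miss. The burst size of 8 entries is an assumption. So is the `backing` table, which stands in for external memory. The class and method names are hypothetical.

```python
class IntervalFractionCache:
    """Per-channel cache of 256 (interval, fraction) entries with valid bits."""

    def __init__(self, backing, channels=3, burst=8):
        self.backing = backing  # backing[ch][index] -> (interval, fraction); stands in for memory
        self.burst = burst      # entries fetched per miss (assumed)
        self.entries = [[None] * 256 for _ in range(channels)]
        self.valid = [[False] * 256 for _ in range(channels)]
        self.fetches = 0        # number of burst reads issued

    def lookup(self, channel, index):
        if not self.valid[channel][index]:
            start = index - index % self.burst  # aligned burst containing the miss
            for i in range(start, min(start + self.burst, 256)):
                self.entries[channel][i] = self.backing[channel][i]
                self.valid[channel][i] = True
            self.fetches += 1
        return self.entries[channel][index]

    def lookup_pixel(self, pixel):
        """Return the (interval, fraction) pair for each channel value of a pixel."""
        return [self.lookup(ch, value) for ch, value in enumerate(pixel)]
```

Neighbouring pixel values then hit entries that an earlier burst already loaded, so no further memory reads are needed.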

[0316] For image transformation and convolution, the MUV RAM 250 holds the matrix coefficients for the MDP, and the MAG scans all the matrix coefficients stored in the MUV RAM 250. At the start of an image transformation or convolution instruction, the MAG 805 issues a request to the operand fetch unit. In response, the operand fetch unit fetches the kernel description "header" (FIG. 94) and the first matrix coefficients of the burst request.

[0317] FIG. 115 shows the MUV address generator (MAG) 805 of FIG. 108 in more detail. The MAG 805 includes an IBus request module 820 that multiplexes the IBus requests generated by the image transformation controller (ITX) 821 and the color space conversion (CSC) controller 822. Each request is sent to the operand fetch unit, which executes it. Because the pixel organizer 246 operates in only one of the image transformation and color space conversion modes at a time, no arbitration is needed between the controllers 821 and 822. The IBus request module 820 derives the information needed to generate a request to the operand fetch unit from the associated pixel organizer, including the burst address and burst length.

[0318] The JPEG controller 824 is used in JPEG mode and comprises two state machines: a JPEG write controller and a JPEG read controller. The two controllers operate concurrently and synchronize with each other through internal registers. In a JPEG compression operation, the DMU outputs MCU data, which is stored in the MUV RAM. The JPEG write controller is responsible for horizontal padding and for controlling pixel subsampling. The JPEG read controller is responsible for vertical padding. Horizontal padding is performed by stalling the DMU output, and vertical padding by re-reading 8 × 8 block data that has already been read.
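The net effect of the two padding mechanisms on a partial block can be sketched as edge replication. Horizontal padding repeats the last sample of a row, and vertical padding repeats the previous 8-sample row by re-reading data already read. Treating a stall of the DMU output as a repeat of the last sample is an assumption about the outcome, not a description of the hardware. `pad_block` is a hypothetical name.

```python
BLOCK = 8


def pad_block(rows):
    """Pad a partial block (at most 8 rows of at most 8 samples) to 8 x 8.

    Short rows repeat their last sample (horizontal padding); missing rows
    repeat the previous 8-sample row (vertical padding).
    """
    if not rows or len(rows) > BLOCK:
        raise ValueError("need 1 to 8 rows")
    padded = []
    for row in rows:
        if not row or len(row) > BLOCK:
            raise ValueError("need 1 to 8 samples per row")
        padded.append(list(row) + [row[-1]] * (BLOCK - len(row)))
    while len(padded) < BLOCK:
        padded.append(list(padded[-1]))
    return padded
```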

[0319] The JPEG write controller tracks the current positions of the DCU and DMU output pixels in the source image. It uses this information to decide when to stall the DMU for horizontal padding. When an MCU is written to the MUV RAM 250, the JPEG write controller sets or resets an internal register to indicate whether the MCU lies on the right edge or the bottom edge of the image. Based on the contents of this register, the JPEG read controller determines whether vertical padding is needed and whether the last MCU of the image has been read.

[0320] The JPEG write controller tracks the DMU output data and stores it in the MUV RAM 250. It uses a set of registers to record the current position of the input pixel, and this information is used when the DMU output is stalled for horizontal padding. When a complete MCU has been written to the MUV RAM 250, the controller writes the MCU information to the JPEG-RW-IPC register. The JPEG read controller can then use it.

[0321] After the last MCU has been written to the MUV RAM 250, the write controller enters the SLEEP state and remains there until the current instruction finishes. The JPEG read controller reads 8 × 8 blocks from the MCUs stored in the MUV RAM 250. For multichannel pixels, the controller reads each MCU several times, extracting a different byte from each stored pixel on each pass.

[0322] The read controller uses the information provided in JPEG-RW-IPC to detect whether vertical padding is required. Vertical padding is performed by reading again the 8 bytes most recently read from the MUV RAM 250. The image transformation controller 821 reads the kernel descriptor from the IBus and passes the kernel header to the MDP 242. It then scans the matrix coefficients the number of times specified by the po.len register. For image transformation and convolution instructions, all data output by the PO 246 is fetched directly from the IBus and is not passed to the DMU.

[0323] The first matrix coefficient is fetched immediately after the kernel header, and its first 8 bits give the number of remaining matrix coefficients to fetch. The kernel header is passed directly to the MDP without modification, whereas the matrix coefficients are sign-extended before being passed to the MDP. The pixel subsampler 825 comprises two identical channel subsamplers, each operating on one byte of the input word. When the associated configuration register is not enabled, the pixel subsampler copies its input directly to its output. When it is enabled, the subsampler subsamples the input data by averaging or by decimation.
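This sketch covers two steps described here: reading the coefficient count and sign-extending 20-bit coefficients, and the two subsampling options. Several details are assumptions: the count sits in the top 8 bits of the first 32-bit word, each coefficient occupies the low 20 bits of its word, the subsampling ratio is 2:1, and averaging rounds half up. The function names are hypothetical.

```python
COEFF_BITS = 20  # maximum coefficient width


def sign_extend(value, bits=COEFF_BITS):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def parse_coefficients(words):
    """Extract sign-extended coefficients from 32-bit coefficient words.

    Assumes the remaining-coefficient count sits in bits 31..24 of the first
    word and each coefficient in bits 19..0 of its word.
    """
    remaining = (words[0] >> 24) & 0xFF
    if len(words) < 1 + remaining:
        raise ValueError("not enough coefficient words")
    return [sign_extend(w) for w in words[:1 + remaining]]


def subsample(samples, mode):
    """2:1 subsampling of one channel: 'copy', 'average' or 'decimate'."""
    if mode == "copy":  # configuration register not enabled: pass through
        return list(samples)
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    pairs = list(zip(samples[0::2], samples[1::2]))
    if mode == "average":
        return [(a + b + 1) // 2 for a, b in pairs]  # round half up (assumed)
    if mode == "decimate":
        return [a for a, _ in pairs]                 # keep the first of each pair
    raise ValueError(f"unknown mode {mode!r}")
```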

[0324] The MUV multiplexing module 826 selects the MUV read and write signals from the currently active controller. An internal multiplexer selects the read address output from among the various controllers that use the MUV RAM 250. The MUV RAM write address is held in an 8-bit register in the MUV multiplexing module. The controller using the MUV RAM 250 determines the next MUV RAM address and loads the write address register.

[0325] The MUV valid access module 827 is used by the color space conversion control unit. It determines whether the interval and fraction values for the pixel currently being output by the data manipulation unit are available in the MUV RAM 250. When the values for one or more color channels are missing, the MUV valid access module 827 passes the associated address to the IBus request module 820, and the interval and fraction values are loaded in burst mode. Once the cache miss has been serviced, the MUV valid access module 827 sets an internal valid bit recording which sets of interval and fraction values have been fetched so far.
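The behavior described above resembles a cache with one valid bit per fetched line. The sketch below models it under several assumptions:

- Each color channel has its own table, divided into lines of `burst_len` entries.
- A callback stands in for the IBus request module 820.
- Each entry is a single integer rather than a separate interval and fraction pair.

The class and parameter names are hypothetical.

```python
from typing import Callable, List, Tuple


class ValidBitCache:
    """Hypothetical per-channel lookup-table cache with one valid bit per burst line."""

    def __init__(self, num_channels: int, table_size: int, burst_len: int,
                 fetch_burst: Callable[[int, int, int], List[int]]):
        self.burst_len = burst_len
        self.fetch_burst = fetch_burst  # stands in for the IBus request module
        lines = table_size // burst_len
        self.valid = [[False] * lines for _ in range(num_channels)]
        self.data = [[0] * table_size for _ in range(num_channels)]
        self.bursts_issued = 0

    def lookup(self, indices: Tuple[int, ...]) -> Tuple[int, ...]:
        """Return one entry per channel, burst-loading any line that is missing."""
        for ch, idx in enumerate(indices):
            line = idx // self.burst_len
            if not self.valid[ch][line]:  # this channel's values are missing
                start = line * self.burst_len
                values = self.fetch_burst(ch, start, self.burst_len)
                self.data[ch][start:start + self.burst_len] = values
                self.valid[ch][line] = True  # record that this set has been fetched
                self.bursts_issued += 1
        return tuple(self.data[ch][idx] for ch, idx in enumerate(indices))
```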

[0326] The duplication module 829 replicates the input data the number of times specified by an internal pixel register. The input stream is stalled while the duplication module is replicating the current input word. The PBus interface module 830 is used to retime the pixel organizer 246 to the main data path 242 and the JPEG encoder 241, and vice versa. Finally, the MAG control unit 831 generates the signals that initiate and shut down the various sub-modules. The MAG control unit 831 also multiplexes the incoming PBus signals from the main data path 242 and the JPEG encoder 241.
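A generator gives a compact software analogue of the duplication module. The next input word is only pulled from the source after the current word has been emitted the required number of times, which mirrors the input stall. The function name `duplicate` and the `copies` argument, which stands in for the internal pixel register, are hypothetical.

```python
from typing import Iterable, Iterator


def duplicate(words: Iterable[int], copies: int) -> Iterator[int]:
    """Emit each input word `copies` times; the input advances only after the last copy."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    for word in words:  # the next input word is consumed only here (input "stall")
        for _ in range(copies):
            yield word
```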

[0327] 3.18.2 MUV Buffer
As is apparent from the description above, the pixel organizer 246 in FIG. 2 interacts with the MUV buffer 250. The reconfigurable MUV buffer 250 supports several processing modes, including a single lookup table mode (mode 0), a multiple lookup table mode (mode 1), and a JPEG mode (mode 2). In each mode the buffer stores a different type of data object. For example, data words stored in the buffer, the values of various lookup tables, single-channel data, and multi-channel data are all data objects. In general, data objects differ in size. Moreover, the data objects stored in the reconfigurable MUV buffer 250 are accessed in different ways depending on the operating mode of the buffer.

[0328] Data objects are often encoded before they are stored, to accommodate the various ways in which different types of data must be stored and read back. The coding method used for a data object depends on the following:

- the size of the data object;
- the format in which it is represented;
- how it is read back from the buffer;
- the configuration of the memory modules that make up the buffer.

[0329] FIG. 116 is a block diagram of the components used to implement the reconfigurable MUV buffer 250. The reconfigurable MUV buffer 250 comprises an encoder 1290, a storage device 1293, a decoder 1291, and a read address and rotation signal generator 1292. When a data object arrives on the input data stream 1295, the encoder 1290 encodes it into internal form and places it on the internal data stream 1296. The encoded data object is then stored in the storage device 1293.

[0330] To decode a stored data object, the encoded data is retrieved from the storage device on the encoded output data stream 1297. The decoder 1291 decodes the data on the encoded output data stream 1297, and the decoded data object appears on the output data stream 1298.

[0331] The write address 1305 for the storage device 1293 is supplied by the MAG 805 (FIG. 108). The read addresses 1299, 1300, and 1301 are likewise supplied by the MAG 805 (FIG. 108) and are distributed to the storage device 1293 by the read address and rotation signal generator 1292. The read address and rotation signal generator 1292 also generates the input rotation signal 1303 for the encoder and the output rotation signal 1304 for the decoder. The write enable signals 1306 and 1307 are supplied from an external source. The processing mode signal 1302, supplied by the controller 801 (FIG. 108), is connected to the encoder 1290, the decoder 1291, the read address and rotation signal generator 1292, and the storage device 1293. The increment signal 1308 increments an internal counter in the read address and rotation signal generator and may be used in the JPEG mode (mode 2).

[0332] When the reconfigurable MUV buffer 250 is in the single lookup table mode (mode 0), the buffer 250 behaves essentially like a single memory module. Data objects are stored in and retrieved from the buffer in essentially the same way as a memory module is accessed.

[0333] When the reconfigurable MUV buffer 250 operates in the multiple lookup table mode (mode 1), the buffer 250 is divided into as many as three lookup tables stored in the storage device 1293. The lookup tables can be accessed simultaneously and independently. For example, interval and fraction values are stored in the storage device 1293 in the multiple lookup table mode, and the tables are indexed using the lower three bytes of the input data stream 1295. Each of the three bytes is issued to an independent lookup table stored in the storage device 1293.
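In the example above, the low three bytes of an input word each index their own table. The sketch below assumes that byte k indexes table k and that each table has 256 entries. The byte-to-table assignment and the function name are assumptions.

```python
from typing import Sequence, Tuple


def multi_lut_lookup(word: int, tables: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    """Index three independent 256-entry tables with the low three bytes of `word`."""
    if len(tables) != 3:
        raise ValueError("this sketch expects exactly three tables")
    indices = [(word >> (8 * k)) & 0xFF for k in range(3)]  # bytes 0, 1, 2; byte 3 ignored
    return tuple(tables[k][indices[k]] for k in range(3))
```

Because each table has its own index, all three lookups can be served in the same access.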

[0334] When an image is JPEG compressed, it is converted into an encoded data stream. Pixels are taken from the source image in minimum coded units (MCUs), which are read from the image left to right and top to bottom. Each MCU is decomposed into a number of 8×8 blocks. The number of 8×8 blocks extracted from an MCU depends on several factors, such as the color components of the source image, the multi-channel JPEG mode, and whether subsampling is required. Each 8×8 block then undergoes a forward DCT (FDCT), quantization, and entropy coding.

For JPEG decompression, the encoded data is read sequentially from the data stream and undergoes entropy decoding, inverse quantization, and an inverse DCT (IDCT). The output of the IDCT is an 8×8 block, and a number of 8×8 blocks are combined to reconstruct an MCU. Here too, the number of blocks depends on the factors mentioned above. The reconfigurable MUV buffer 250 is used both to decompose an MCU into 8×8 blocks and to reconstruct an MCU from 8×8 blocks.

[0335] When the reconfigurable MUV buffer 250 is operating in JPEG mode, its input data stream 1295 contains either pixels for JPEG compression or single-component data for JPEG decompression. Its output data stream contains either single-channel data blocks for JPEG compression or pixel data for JPEG decompression.

In this JPEG compression example, input pixels can have up to four channels: Y, U, V, and O. Once a specified number of pixels has been buffered as a complete pixel block, extraction of single-component data blocks can begin. Each single-component data block consists of the data for one channel of the pixels stored in the buffer, so in this example up to four single-component data blocks can be extracted from one pixel data block. In this specific example, the reconfigurable MUV buffer 250 operates in JPEG mode (mode 2) for JPEG compression. The buffer can then store a number of MCUs, each of 64 single- or multi-channel pixels, and extract a number of 64-byte single-channel component data blocks from each buffered MCU.

Conversely, while the buffer 1289 is in JPEG mode (mode 2) for JPEG decompression, the output data stream consists of output pixels with up to four components: Y, U, V, and O. Once the required number of complete single-component data blocks has been written to the buffer, pixel data can be extracted. Bytes from the four single-component data blocks, one per color component, are retrieved together as output pixels.
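At the level of data, the two JPEG-mode directions amount to de-interleaving and re-interleaving 64-pixel blocks. The sketch below shows only that data transformation. It ignores MCU geometry and subsampling, and it represents a pixel as a tuple of up to four channel bytes (for example Y, U, V, O) rather than a packed 32-bit word. The function names are hypothetical.

```python
from typing import List, Sequence

BLOCK_PIXELS = 64  # one 8x8 block


def split_components(pixels: Sequence[Sequence[int]]) -> List[bytes]:
    """Compression direction: 64 multi-channel pixels -> one 64-byte block per channel."""
    if len(pixels) != BLOCK_PIXELS:
        raise ValueError("expected 64 pixels")
    channels = len(pixels[0])
    return [bytes(p[c] for p in pixels) for c in range(channels)]


def merge_components(blocks: Sequence[bytes]) -> List[tuple]:
    """Decompression direction: one 64-byte block per channel -> 64 pixels."""
    if any(len(b) != BLOCK_PIXELS for b in blocks):
        raise ValueError("each component block must be 64 bytes")
    return [tuple(b[i] for b in blocks) for i in range(BLOCK_PIXELS)]
```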

[0336] FIG. 117 is a detailed diagram of the encoder 1290 of FIG. 116. When pixel blocks are processed, each input data object is encoded by a byte-wise rotation before being stored in the storage device 1293 (FIG. 129). The amount of rotation is determined by the input rotation control signal 1303. In this example, pixel data is at most four bytes, and the 32-bit 4-input, 1-output multiplexers 1320 and 1325 select one of four possible rotations of the input pixel. For example, if the four bytes of a pixel are labeled (3, 2, 1, 0), the possible rotations of this pixel are (3, 2, 1, 0), (0, 3, 2, 1), (1, 0, 3, 2), and (2, 1, 0, 3). The four encoded bytes are output to the storage device 1293.
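If the byte labels are read most significant byte first (an assumption), the four orderings listed above are the 32-bit word rotated right by 0, 1, 2, and 3 whole bytes. The sketch below models the multiplexer's choice as that rotation. Using byte values equal to their labels makes the correspondence easy to check.

```python
def rotate_bytes_right(word: int, r: int) -> int:
    """Rotate a 32-bit word right by r whole bytes (r taken modulo 4)."""
    word &= 0xFFFFFFFF
    shift = 8 * (r & 3)
    return ((word >> shift) | (word << (32 - shift))) & 0xFFFFFFFF


def byte_labels(word: int) -> tuple:
    """List the bytes most significant first, e.g. 0x03020100 -> (3, 2, 1, 0)."""
    return tuple((word >> (8 * k)) & 0xFF for k in (3, 2, 1, 0))
```

With `word = 0x03020100`, rotations 0 to 3 produce the label tuples (3, 2, 1, 0), (0, 3, 2, 1), (1, 0, 3, 2), and (2, 1, 0, 3).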

[0337] When the buffer is in a mode other than JPEG mode (mode 2), such as the single lookup table mode (mode 0) or the multiple lookup table mode (mode 1), byte-wise rotation is unnecessary and must not be applied to input data objects. In those modes, the input data object is kept from being rotated by overriding the input rotation control signal with a no-rotation value 1323, which is 0. The 2-input, 1-output multiplexer 1321 generates the control signal 1326 by selecting either the input rotation control signal 1303 or the no-operation value 1323. Its select signal is generated by comparing the current processing mode 1302 with the value of the pixel block decomposition mode. The 4-input, 1-output multiplexer 1320, controlled by signal 1326, selects one of the four rotations of the input data object and produces the encoded input data object on the encoded internal data stream 1296.

[0338] FIG. 118 is a circuit diagram of a combinational circuit implementing the decoder 1291, which decodes the encoded output data stream 1297. The decoder 1291 operates in essentially the same way as the encoder. It manipulates data only when the data buffer is in JPEG mode (mode 2). The lower 32 bits of each encoded output data object on the encoded output data stream 1297 are passed to the decoder. The decoder decodes them by byte-wise rotation in the opposite direction to the rotation applied by the encoder 1290.

A 32-bit 4-input, 1-output multiplexer selects one of the four possible rotations of the encoded data. For example, if a 4-byte input pixel is labeled (3, 2, 1, 0), the possible rotations are (3, 2, 1, 0), (2, 1, 0, 3), (1, 0, 3, 2), and (0, 3, 2, 1). The output rotation control signal 1304 is used when the buffer is in pixel block decomposition mode. In the other operating modes it is overridden by a no-operation value 1333, which is 0. The 2-input, 1-output multiplexer 1331 generates the signal 1334 by selecting either the output rotation control signal 1304 or the no-operation value 1333. Its select signal 1332 is generated by comparing the current processing mode 1302 with the value of the pixel block decomposition mode. The 4-input, 1-output multiplexer 1330, controlled by signal 1334, selects one of the four rotations of the encoded output data object on the encoded output data stream 1297 and produces the output data on the output data stream 1298.
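The sketch below puts the encoder and decoder paths together, including the 2-to-1 multiplexers that force a no-rotation value of 0 outside JPEG mode. It assumes two things:

- The "pixel block decomposition mode" comparison is equivalent to testing for mode 2.
- The encoder rotates right and the decoder rotates left by the same number of bytes, which matches the two orderings listed in the text when labels are read most significant byte first.

```python
MODE_SINGLE_LUT, MODE_MULTI_LUT, MODE_JPEG = 0, 1, 2
NO_ROTATION = 0  # no-operation value (cf. 1323 and 1333)
MASK32 = 0xFFFFFFFF


def rotr8(word: int, r: int) -> int:
    """Rotate a 32-bit word right by r bytes."""
    word &= MASK32
    s = 8 * (r & 3)
    return ((word >> s) | (word << (32 - s))) & MASK32


def rotl8(word: int, r: int) -> int:
    """Rotate a 32-bit word left by r bytes (the inverse of rotr8)."""
    word &= MASK32
    s = 8 * (r & 3)
    return ((word << s) | (word >> (32 - s))) & MASK32


def effective_rotation(mode: int, rotation: int) -> int:
    """2-to-1 mux: pass the rotation control only in JPEG mode, else force no rotation."""
    return rotation & 3 if mode == MODE_JPEG else NO_ROTATION


def encode(word: int, mode: int, in_rot: int) -> int:
    return rotr8(word, effective_rotation(mode, in_rot))


def decode(word: int, mode: int, out_rot: int) -> int:
    return rotl8(word, effective_rotation(mode, out_rot))
```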

[0339] Referring to FIG. 116, the processing mode 1302 of the reconfigurable MUV buffer 250 selects the method used to generate the internal read addresses. In the single lookup table mode (mode 0) and the multiple lookup table mode (mode 1), the read addresses are generated by the MAG 805 (FIG. 108) as the external read addresses 1299, 1300, and 1301.

In the single lookup table mode (mode 0), the memory modules 1380, 1381, 1382, 1383, 1384, and 1385 (FIG. 121) of the storage device 1293 operate together. The write and read addresses supplied to the memory modules 1380 to 1385 (FIG. 121) are essentially the same. That is, the storage device 1293 needs only one read address and one write address from external circuitry, and it uses internal logic to distribute these addresses to the memory modules 1380 to 1385 (FIG. 121). In mode 0, the read address is given by the external address 1299 (FIG. 116) and is distributed essentially unchanged to the internal address 1348 (FIG. 121). The read addresses 1349, 1350, and 1351 (FIG. 121) are not used in mode 0. The write address is given by the external write address 1305 (FIG. 116) and is connected essentially unchanged to the write address of each of the memory modules 1380 to 1385 (FIG. 121).

[0340] The following describes the three-lookup-table configuration in the multiple lookup table mode (mode 1). When the three tables are accessed independently, the encoded input data is written simultaneously to all of the memory modules 1380 to 1385 (FIG. 121), so each of the three tables needs its own index. The three indices, that is, the read addresses for the memory modules 1380 to 1385 (FIG. 121), are supplied to the storage device 1293. Internal logic distributes them to the appropriate memory modules among 1380 to 1385. As in the single lookup table mode, the externally supplied write address is connected essentially unchanged to the address input of each of the memory modules 1380 to 1385. As a result, in the multiple lookup table mode (mode 1), the external read addresses 1299, 1300, and 1301 are distributed to the internal read addresses 1348, 1349, and 1350, respectively. The internal read address 1351 is not used in mode 1. The internal address generation method used in JPEG mode (mode 2) differs from the methods described above.
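The mode 0 and mode 1 read-address distribution can be sketched as a small mapping from external addresses to per-module addresses. For mode 1, the sketch takes the pairing of each table with one 8-bit and one 4-bit module (1380 with 1383, 1381 with 1384, and 1382 with 1385) from the FIG. 124 description. The function name is hypothetical. Mode 2 is deliberately left out, since its addresses come from the JPEG counters instead.

```python
from typing import Dict, Sequence

MODULES = (1380, 1381, 1382, 1383, 1384, 1385)  # three 8-bit, then three 4-bit modules


def distribute_read_addresses(mode: int, ext: Sequence[int]) -> Dict[int, int]:
    """Map external read addresses (ext[0..2] ~ 1299, 1300, 1301) to per-module addresses."""
    if mode == 0:
        # Single lookup table: every module sees the same address (via 1348).
        return {m: ext[0] for m in MODULES}
    if mode == 1:
        # Multiple lookup tables: table t lives in 8-bit module 1380+t and
        # 4-bit module 1383+t, both indexed by ext[t] (via 1348..1350).
        out: Dict[int, int] = {}
        for t in range(3):
            out[1380 + t] = ext[t]
            out[1383 + t] = ext[t]
        return out
    raise NotImplementedError("mode 2 addresses come from the JPEG counters, not modelled here")
```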

[0341] FIG. 119 is a circuit diagram of a combinational circuit implementing the read address and rotation signal generator 1292 for the reconfigurable data buffer in JPEG mode (mode 2) for JPEG compression. In JPEG mode (mode 2), the signal generator 1292 uses the outputs of the component block counter 1340 and the data byte counter 1341 to compute the internal read addresses of the memory modules that make up the storage device 1293. The component block counter 1340 counts the component blocks that have been extracted from the pixel data block stored in the storage device. This block count is given by multiplying the output of the data byte counter 1341 by four.

Specifically, the internal read addresses 1348, 1349, 1350, and 1351 in pixel block decomposition mode are computed as follows. The component block counter is used to compute the offset values 1343, 1344, 1345, and 1347, and the output data byte counter 1341 is used to generate the base read address 1354. The offset value 1343 is added, at 1358, to the base read address 1354, and the sum is the internal read address 1348 (likewise for 1349, 1350, and 1351). The offset values generally differ between the memory modules that are read simultaneously, but they remain essentially the same throughout the extraction of a component block. The same holds for the base address 1354 used to compute the four internal read addresses in pixel data block decomposition mode.

The increment signal 1308 is used as the increment signal of the component byte counter, which is incremented after each successful read. The component block counter increment signal 1356 is used to increment the component block counter 1340 after a single-component data block has been successfully retrieved from the buffer.

[0342] The output rotation control signal 1304 (FIG. 116) is derived from the outputs of the component block counter and the output data byte counter, in essentially the same way as the internal addresses are generated. The output of the component block counter is used to compute a rotation offset 1347. The output rotation control signal 1304 is given by the two least significant bits of the sum of the rotation offset 1355 and the base read address 1354. In this example of the address and rotation control signal generator, the input rotation control signal is given by the two least significant bits of the external write address 1305.
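A small simulation shows why each memory module needs its own read address and why an output rotation is needed. It stores a 64-pixel block with the input rotation taken from the two least significant bits of the write address, as described above. It then reads one component four bytes at a time, so each of the four byte-wide modules is read at a different address in the same cycle. The following details are assumptions chosen to make the scheme self-consistent; they are not taken from FIG. 119:

- the rotation direction;
- the per-module offset `(module - component) mod 4`;
- the base address `4 * byte_counter`;
- the output rotation `(component + base) & 3`.

```python
from typing import List, Sequence

LANES = 4  # four byte lanes, one per memory module in this sketch


def store_pixels(pixels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Write 4-byte pixels into four byte-wide modules, rotating each pixel by
    the two least significant bits of its write address (input rotation)."""
    modules = [[0] * len(pixels) for _ in range(LANES)]
    for addr, px in enumerate(pixels):
        rot = addr & 3  # input rotation control
        for c in range(LANES):
            modules[(c + rot) % LANES][addr] = px[c]  # byte c lands in lane c + rot
    return modules


def read_four(modules: List[List[int]], component: int, byte_counter: int) -> List[int]:
    """One read cycle: every module gets its own address, and the result is
    `component` of four consecutive pixels, restored to pixel order."""
    base = 4 * byte_counter  # base read address (assumed scaling)
    lanes = []
    for m in range(LANES):
        offset = (m - component) % LANES  # per-module offset
        lanes.append(modules[m][base + offset])  # a distinct address per module
    rot = (component + base) & 3  # output rotation: low 2 bits of offset + base
    return [lanes[(j + rot) % LANES] for j in range(LANES)]


def extract_component_block(modules: List[List[int]], component: int,
                            n_pixels: int = 64) -> List[int]:
    """Read one whole single-component block, four bytes per cycle."""
    out: List[int] = []
    for counter in range(n_pixels // LANES):
        out.extend(read_four(modules, component, counter))
    return out
```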

[0343] FIG. 120 shows another address generator 1292, used to reconstruct multi-channel pixel data from single-component data stored in the reconfigurable MUV buffer 250. In this case the buffer is in JPEG mode (mode 2) for JPEG decompression: single-component data blocks are stored in the buffer, and pixel data blocks are retrieved from it. In this example, the write address to the memory modules is given essentially unchanged by the external write address 1305, and the single-component blocks are stored in contiguous memory. The input rotation control signal 1303 in this example is simply set from the two least significant bits of the write address.

The pixel counter 1360 keeps track of the number of pixels extracted from the single-component blocks stored in the buffer. Its output is used to generate the read addresses 1348, 1349, 1350, and 1351 and the output rotation control signal 1304. In general, the read address differs for each module that makes up the storage device 1293. In this example, each read address consists of two parts: a single-component block index (1362, 1363, 1364, or 1365) and a byte index 1361. To compute the single-component block index for a particular block, an offset is added to bits 3 and 4 of the output pixel counter. In general, the offsets 1366, 1367, 1368, and 1369 differ for each read address. Bits 2 to 0 of the pixel counter form the byte index 1361 of the read address. As shown in FIG. 120, the read address is the concatenation of the single-component block index (1362, 1363, 1364, or 1365) and the byte index 1361. In this example, the output rotation control signal 1304 is taken essentially unchanged from bits 4 and 3 of the pixel counter output. The increment signal 1308 serves as the increment signal for the pixel counter 1360, which is incremented each time a pixel is successfully retrieved from the buffer.
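The read address arithmetic described above can be transcribed directly into code:

- The byte index comes from pixel counter bits 2 to 0.
- The block index is pixel counter bits 4 and 3 plus a per-address offset.
- The read address is the block index concatenated above the byte index.
- The output rotation is bits 4 and 3 unchanged.

The 2-bit wrap-around of the block index is an assumption. The default offsets `(0, 1, 2, 3)` are placeholders for the offsets 1366 to 1369, whose actual values the text does not give.

```python
from typing import List, Sequence, Tuple


def reconstruct_addresses(pixel_counter: int,
                          offsets: Sequence[int] = (0, 1, 2, 3)) -> Tuple[List[int], int]:
    """Read addresses and output rotation for one pixel, following the FIG. 120 description."""
    byte_index = pixel_counter & 0b111         # pixel counter bits 2..0
    block_bits = (pixel_counter >> 3) & 0b11   # pixel counter bits 4..3
    addresses = [(((block_bits + off) & 0b11) << 3) | byte_index  # block index ++ byte index
                 for off in offsets]
    return addresses, block_bits               # output rotation = bits 4..3 unchanged
```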

[0344] FIG. 121 shows the structure of the storage device 1293. The storage device 1293 can have three 4-bit-wide memory modules, 1383, 1384, and 1385, and three 8-bit-wide memory modules, 1380, 1381, and 1382. The memory modules can be combined to store:

- 36-bit words in the single lookup table mode (mode 0);
- three 12-bit words in the multiple lookup table mode (mode 1);
- 32-bit pixels or 4 × 8-bit single-component data in JPEG mode (mode 2).

Normally each memory module is associated with a different part of the encoded input and output data streams (1296 and 1297). For example, the memory module 1380 has a data input port connected to bits 0 to 7 of the encoded input data stream 1296 and a data output port connected to bits 0 to 7 of the encoded output data stream 1297. In this example, the write addresses of all memory modules are tied together and always share the same value. In contrast, the read addresses 1386 to 1391 of the memory modules shown in FIG. 121 are supplied by the read address generator 1292 and generally take different values. In the example, one common write enable signal drives the write enables of all the 8-bit memory modules, and a second common write enable signal drives the write enables of all the 4-bit memory modules.

[0345] FIG. 122 is a circuit diagram of a combinational circuit that generates the read addresses 1386 to 1391 used to access the memory modules in the storage device 1293. Each encoded input data object is split into parts, and each part is stored in a separate memory module of the storage device. Consequently, the write addresses of all memory modules are normally essentially the same in every processing mode, and practically no logic is needed to compute them. Read addresses, by contrast, normally differ from operation to operation and between memory modules in each processing mode.

In JPEG mode (mode 2), every byte in the output data stream 1298 of the reconfigurable MUV buffer 250 is one of two kinds. For JPEG compression, it is single-component data extracted from pixel data stored in the buffer. For JPEG decompression, it is pixel data extracted from single-component data stored in the buffer. A request for output data is satisfied by generating the four read addresses 1348, 1349, 1350, and 1351 to the buffer. In the multiple lookup table mode (mode 1), up to three lookup tables are stored in the buffer, so up to three read addresses, 1348, 1349, and 1350, are needed to index them. In the single lookup table mode (mode 0), all memory modules share the same read address, and only the read address 1348 is used.

The example control circuit shown in FIG. 122 uses the buffer's processing mode signal and up to four read addresses to compute the read addresses 1386 to 1391 for each of the six memory modules that make up the storage device 1293. The read address generator 1292 takes as input the external read signals on the external address buses 1348, 1349, 1350, and 1351, and from them generates the internal read addresses for the memory modules that make up the storage device 1293.

[0346] FIG. 123 shows how 20-bit matrix coefficients are stored in the buffer 250 when it is in single lookup table mode. In this case, no encoding is normally applied to the data objects when they are written to the reconfigurable MUV buffer. The matrix coefficients are stored in the 8-bit memory modules 1380, 1381 and 1382: bits 7 to 0 of each coefficient are stored in memory module 1380, bits 15 to 8 in memory module 1381, and bits 19 to 16 in the lower 4 bits of memory module 1382. The data objects stored in the buffer are retrieved as many times as needed for the rest of the instruction. In single lookup table mode, the read and write addresses of all memory modules are essentially the same.
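The bit slicing described for FIG. 123 can be illustrated with a short Python sketch. The function names are hypothetical; the sketch only shows how a 20-bit coefficient splits across the three modules and how one common read address reassembles it.

```python
def split_coefficient(coef: int) -> tuple[int, int, int]:
    """Split a 20-bit matrix coefficient across modules 1380, 1381, 1382."""
    if not 0 <= coef < 1 << 20:
        raise ValueError("coefficient must fit in 20 bits")
    m1380 = coef & 0xFF          # bits 7..0
    m1381 = (coef >> 8) & 0xFF   # bits 15..8
    m1382 = (coef >> 16) & 0x0F  # bits 19..16, lower nibble of module 1382
    return m1380, m1381, m1382


def join_coefficient(m1380: int, m1381: int, m1382: int) -> int:
    """Reassemble the coefficient from three reads at the same address."""
    return ((m1382 & 0x0F) << 16) | (m1381 << 8) | m1380
```

For example, `split_coefficient(0xABCDE)` gives `(0xDE, 0xBC, 0x0A)`, and `join_coefficient` restores `0xABCDE`.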

[0347] FIG. 124 shows how table entries are stored in the buffer in multiple lookup table mode (mode 1). In this case, three lookup tables are stored in the buffer, and each table entry has a 4-bit interval value and an 8-bit fraction value. The interval values are normally stored in 4-bit memory modules and the fraction values in 8-bit memory modules. The three lookup tables 1410, 1411 and 1412 are stored in memory modules 1380 and 1383, 1381 and 1384, and 1382 and 1385, respectively. The separate write enable control signals 1306 and 1307 (FIG. 121) allow interval values to be written to storage device 1293 without affecting the fraction values already stored there; fraction values can be written without affecting the interval values in essentially the same way.
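The effect of the separate write enables can be sketched in Python. This is a hypothetical model rather than the hardware: the 12-bit entry layout (interval in bits 11 to 8, fraction in bits 7 to 0) and the use of the 4-bit modules 1383 to 1385 for interval values are assumptions, and the flags `we_interval` and `we_fraction` merely stand in for the write enable signals 1306 and 1307.

```python
class LutTable:
    """One lookup table held in a 4-bit interval module and an 8-bit fraction module."""

    def __init__(self, depth: int = 256) -> None:
        self.interval = [0] * depth  # 4-bit module (assumed, e.g. 1383)
        self.fraction = [0] * depth  # 8-bit module (e.g. 1380)

    def write(self, addr: int, entry: int, we_interval: bool, we_fraction: bool) -> None:
        # Assumed entry layout: bits 11..8 interval, bits 7..0 fraction.
        # Each field is updated only when its own write enable is set.
        if we_interval:
            self.interval[addr] = (entry >> 8) & 0x0F
        if we_fraction:
            self.fraction[addr] = entry & 0xFF

    def read(self, addr: int) -> tuple[int, int]:
        return self.interval[addr], self.fraction[addr]


tables = [LutTable() for _ in range(3)]  # stand-ins for tables 1410, 1411, 1412
```

Writing an interval with only `we_interval` set leaves the fraction at that address untouched, and vice versa.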

[0348] FIG. 125 shows how pixel data is written to the reconfigurable MUV buffer 250 when it is in JPEG mode (mode 2), in which blocks of pixel data are decomposed into single-component data blocks. Storage device 1293 is organized as four 8-bit memory banks made up of memory modules 1380, 1381, 1382, 1383 and 1384, in which memory modules 1381 and 1384 are combined and handled in the same way as an 8-bit memory module. Memory module 1385 is not used in JPEG mode (mode 2). Each 32-bit encoded pixel is split into four bytes, and each byte is stored in a different 8-bit memory bank.

[0349] FIG. 126 shows how single-component data blocks are stored in storage device 1293 in single-component mode. As in FIG. 125, storage device 1293 is organized as four 8-bit memory banks made up of memory modules 1380, 1381, 1382, 1383 and 1384, with memory modules 1381 and 1384 combined and handled in the same way as an 8-bit memory module, and memory module 1385 is not used in JPEG mode (mode 2). Each 32-bit encoded pixel is split into four bytes, each stored in a different 8-bit memory bank. Here, a single-component block consists of 64 bytes. When single-component blocks are written to the buffer, a different amount of byte rotation is applied to each block. A 32-bit encoded pixel is then retrieved by reading from the different single-component data blocks in the buffer.
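The source does not give the rotation amounts, so the following Python sketch assumes one simple scheme to show why rotation helps: block c (0 to 3) is stored in rows 16c to 16c + 15 of every bank, and its byte lanes are rotated by c positions. Under that assumption, each 32-bit word of a block is written at one common address in all four banks, while the four bytes of pixel i (byte i of each block) always fall in four different banks and can be read in a single access using different per-bank read addresses. The class and constant names are hypothetical.

```python
BANKS = 4             # four 8-bit memory banks
WORDS_PER_BLOCK = 16  # a 64-byte single-component block = 16 four-byte words


class RotatingBuffer:
    def __init__(self) -> None:
        # Each bank holds one byte lane for four blocks (64 bytes per bank).
        self.banks = [[0] * (BANKS * WORDS_PER_BLOCK) for _ in range(BANKS)]

    def write_block(self, c: int, block: bytes) -> None:
        """Write single-component block c (0..3), rotating its bytes by c lanes."""
        if len(block) != BANKS * WORDS_PER_BLOCK:
            raise ValueError("block must be 64 bytes")
        for k in range(WORDS_PER_BLOCK):
            addr = c * WORDS_PER_BLOCK + k  # same write address in every bank
            for j in range(BANKS):
                self.banks[(j + c) % BANKS][addr] = block[4 * k + j]

    def read_pixel(self, i: int) -> int:
        """Assemble 32-bit pixel i from byte i of each block, one byte per bank."""
        pixel = 0
        for c in range(BANKS):
            bank = (i + c) % BANKS                   # a different bank for every c
            addr = c * WORDS_PER_BLOCK + i // BANKS  # per-bank read address
            pixel |= self.banks[bank][addr] << (8 * c)
        return pixel
```

Placing component c in byte c of the assembled pixel is also an assumption of this sketch.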

[0350] For more detail on how the reconfigurable data buffer 250 is organized, see the section on the pixel organizer. The example above shows a reconfigurable data buffer used to process data associated with different instructions. The buffer has three processing modes, and each mode requires a different address generation technique. Single lookup table mode (mode 0) is used to store matrix coefficients in the buffer for image transformation. Multiple lookup table mode (mode 1) is used to store multiple interval and fraction lookup tables in the buffer for multi-channel color space conversion (CSC). JPEG mode (mode 2) is used to decompose MCU data into 8×8 single-component blocks during JPEG compression, and to recombine 8×8 single-component blocks into MCUs during JPEG decompression.

[0351] 3.18.3 Result Organizer
The MUV buffer 250 is also used in the result organizer 249. The result organizer 249 buffers and formats the output stream of the main data path 242 or the JPEG coder 241. It also performs the compression, decompression, denormalization, byte lane swapping and reorganization of result data described with reference to FIG. 42. In addition, the result organizer 249 transfers its results in response to requests from the external interface controller 238, the local memory controller 236 and the peripheral interface controller 237.

[0352] In JPEG decompression mode, the result organizer 249 uses MUV RAM 250 to double-buffer image data from the JPEG coder 241. Double buffering improves performance: while the JPEG coder 241 writes decompressed data into one half of MUV RAM 250, the image data previously written into the other half is output to its designated storage location at the same time.
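The double-buffering scheme can be modelled in Python as a ping-pong buffer. This is a sequential sketch with hypothetical names: each loop iteration stands for one block period, and the write into one half and the output of the other half, which the hardware performs concurrently, are simply done one after the other.

```python
def double_buffer(blocks):
    """Model a two-half (ping-pong) buffer: fill one half while draining the other."""
    halves = [None, None]
    out = []
    write = 0  # index of the half currently being filled by the coder
    for blk in list(blocks) + [None]:  # one extra period drains the last half
        pending = halves[write ^ 1]    # data written during the previous period
        if blk is not None:
            halves[write] = blk        # the coder fills this half ...
        if pending is not None:
            out.append(pending)        # ... while the other half is output
            halves[write ^ 1] = None
        write ^= 1                     # swap the roles of the two halves
    return out
```

Blocks leave in the order they arrive, one period behind the coder.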

[0353] During JPEG decompression, 1-, 3- and 4-channel image data is passed to the result organizer 249 as 8×8 blocks, each containing 8-bit components from a single channel. The result organizer stores these blocks in MUV RAM 250 in a specified order and, for multi-channel interleaved images, meshes the channels as the data is read back out of MUV RAM 250. For example, for 3-channel YUV JPEG data, the JPEG coder 241 outputs three 8×8 blocks: first Y, then U and finally V. Meshing is performed by taking one component from each block and assembling pixels of the form (YUVX), where X is an unused channel. Byte swapping is performed when the output channels need to be swapped. The result organizer must also perform the processing needed to reconstruct subsampled chroma data in the decompressed output, which means repeating the data of each programmed channel to generate the output.
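The meshing step, and the channel repetition used to rebuild subsampled chroma, can be sketched in Python. The function names are hypothetical, and 2× horizontal and vertical replication (as used for 4:2:0 chroma) is an assumed example; the source only states that channel data is repeated.

```python
def mesh_yuv(y, u, v, x_fill=0):
    """Interleave three single-channel blocks into (Y, U, V, X) pixels."""
    if not len(y) == len(u) == len(v):
        raise ValueError("blocks must be the same size")
    return [(ys, us, vs, x_fill) for ys, us, vs in zip(y, u, v)]


def upsample_repeat(block, n=8, fh=2, fv=2):
    """Repeat each sample of an n-by-n block fh times across and fv times down."""
    out = []
    for r in range(n * fv):
        src = (r // fv) * n  # start of the source row for output row r
        for s in block[src:src + n]:
            out.extend([s] * fh)
    return out
```

An 8×8 chroma block replicated this way covers 16×16 luma samples, i.e. four Y blocks.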

[0354] FIG. 127 shows the result organizer 249 of FIG. 2 in more detail. The result organizer 249 is built around a standard CBus interface 840, which includes a register file of registers that configure its operation. The result organizer 249 operates like the pixel organizer 246, except that the data manipulations are performed in reverse. The data manipulation unit 842 performs byte lane swapping, component substitution, component release and denormalization on the data produced by the MUV address generator (MAG) 805. These operations were described above with reference to FIG. 42 and are performed according to various fields set in the internal registers. The FIFO queue 843 buffers the output data before it is output by the RBus control unit 844. The RBus control unit 844 consists of an address decoder and an address generator. The address of the storage module, together with the required number of output bytes, is stored in internal registers. The internal RO_CUT register determines how many output bytes are discarded before data is sent onto the output bus byte stream. The RO_LMT register then determines the maximum number of data items output after the cut has finished. The MAG 805 generates the addresses of MUV RAM 250 during JPEG decompression, when MUV RAM 250 is used to double-buffer the output of the JPEG coder. Depending on an internal configuration register, the MAG 805 meshes the components held in MUV RAM 250 to output single-channel, three-channel or four-channel pixels. Because byte lane swapping is required before the pixel data is stored in its proper location, data read from MUV RAM 250 is passed through the data manipulation unit. When the result organizer 249 is not in JPEG mode, the MAG 805 simply passes data from the PBus receiver 845 directly to the data manipulation unit 842.
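Assuming RO_CUT and RO_LMT act as a skip count and a maximum output count on the output stream, their combined effect is a slice of that stream, as in this Python sketch (the function name is hypothetical).

```python
from itertools import islice


def cut_and_limit(stream, ro_cut: int, ro_lmt: int) -> list:
    """Discard the first ro_cut items of stream, then pass at most ro_lmt items."""
    return list(islice(stream, ro_cut, ro_cut + ro_lmt))
```

Using `islice` keeps the sketch lazy, so it also works on a generator that never ends.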

[0355] 3.18.4 Operand Organizers B and C
Returning again to FIG. 2, the two independent operand organizers 247 and 248 buffer data from the data cache controller 240 and transfer data to the JPEG coder 241 or the main data path 242. Operand organizers 247 and 248 operate in several modes:
(a) idle mode, in which the operand organizer responds only to CBus requests;
(b) direct mode, in which the data for the current instruction is held in an internal register of the operand organizer;
(c) sequential mode, in which the operand organizer generates sequential addresses, and generates data when the buffer of the data cache controller 240 is full.

[0356] Many processing modes of the main data path 242 require at least one of the operand organizers to be in sequential mode. In operand organizer B 247, these modes, including compositing, are needed to buffer pixels that are composited with another image. Operand organizer C 248 is used in compositing operations that attenuate the value of each data channel. In halftone mode, operand organizer B 247 buffers 8-bit matrix coefficients, and in hierarchical image format decompression mode it buffers the data for both the vertical interpolation and residual merge instructions. The remaining modes are:
(d) constant mode, in which operand organizer B assembles a single internal data word and repeats it the number of times specified by an internal register;
(e) tile mode, in which operand organizer B buffers the data that make up a pixel tile;
(f) random mode, in which the operand organizer passes addresses from the MDP 242 or the JPEG coder 241 directly to the data cache controller.

[0357] An internal length register determines the number of items each operand organizer 247, 248 generates in sequential, tile and constant modes. Each operand organizer keeps count of the data items processed so far and stops when the count reaches the value set in the internal register. Each operand organizer is also responsible for formatting input data, using byte lane swapping, component substitution, and compression, decompression and normalization functions. The required processing is configured through internal registers. In addition, each of the operand organizers 247 and 248 can be configured to limit the number of data items.
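The role of the length register in the sequential, constant and tile modes can be sketched in Python as three functions that each produce exactly `length` items. The function names, the address stride, and the assumption that tile mode replays the buffered tile are illustrative only.

```python
from itertools import cycle, islice, repeat


def sequential_addresses(base: int, stride: int, length: int) -> list[int]:
    """Sequential mode: `length` addresses starting at base."""
    return [base + n * stride for n in range(length)]


def constant_mode(word: int, length: int) -> list[int]:
    """Constant mode: one assembled word repeated `length` times."""
    return list(repeat(word, length))


def tile_mode(tile: list[int], length: int) -> list[int]:
    """Tile mode (assumed behaviour): replay the buffered tile until `length` items."""
    return list(islice(cycle(tile), length))
```

In each case the item count, not the source data, decides when the organizer stops.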

[0358] FIG. 128 shows the operand organizers 247, 248 in more detail. Each operand organizer includes a standard CBus interface and registers 850 that control the operand organizer as a whole. An OBus control unit 851 is connected to the data cache controller. It generates addresses in sequential, tile and constant modes, generates the control signals that allow the OBus interface of the operand organizers 247 and 248 to communicate, and controls the data manipulation unit for operations such as normalization and repetition, which require state saved from earlier clock cycles of the input stream. When operand organizer 247 or 248 is in sequential or tile mode, the OBus control unit 851 sends requests for data to the data cache controller, at addresses determined by internal registers.

[0359] Each operand organizer also includes a 36-bit-wide FIFO buffer 852, which is used to buffer data from the data cache controller 240 in the various processing modes. The data manipulation unit 853 performs the same functions as the corresponding data manipulation unit 804 of the pixel organizer 246.

[0360] In normal processing modes, the main data path/JPEG coder interface 854 distributes the data and addresses exchanged with the main data path 242 and the JPEG coder 241. The MDP/JC interface 854 sends data from the data manipulation unit 853 to the main data path, and to processes configured to repeat that data. In color conversion mode, units 851 and 854 are bypassed to give the data cache controller 240 fast access to the color conversion tables.

[0361] 3.18.5 Main Data Path Unit
The following embodiment relates to an image processor that provides a low-cost computer architecture capable of performing a number of image processing operations at high speed. The image processor also aims to provide a flexible computer architecture that can be configured to perform image processing operations that were not originally defined, and a computer architecture that reuses much of the same logic, making the design process simple and inexpensive.

[0362] The computer architecture includes a control register block, a decoding block, a data object processor and flow control logic. The control register block stores all information relating to the image processing operation. The decoding block decodes this information into configuration signals, which configure the input data object interface. The input data object interface receives data objects from outside, stores them, and distributes them to the data object processor. In some image processing operations, the input data object interface also generates the addresses of the data objects, so that the source of those data objects can supply the correct ones. The data object processor performs arithmetic operations on the data objects it receives, and the flow control logic controls the flow of data objects through the data object processing logic.

[0363] In particular, the data object processor can comprise several identical data object sub-processors, each of which processes part of an input data object. Each data object sub-processor has several identical multi-function arithmetic units that perform arithmetic operations on its part of the data object, post-processing logic that processes the output data objects, and multiplexing logic that connects the multi-function arithmetic units to the post-processing logic. Each multi-function arithmetic unit has storage for the data objects it computes, and this storage is enabled or disabled by the flow control logic. The multi-function arithmetic units and the multiplexing logic are configured by configuration signals generated by the decoding logic.

[0364] The configuration signals from the decoding logic can also be changed by an external programming agent. Through this mechanism, any multi-function block and any multiplexing logic can be configured individually by the external programming agent, which makes it possible to configure the image processor to perform image processing operations that were not defined in advance. These and other features of the embodiments of the present invention are described in detail below.

[0365] In FIG. 2, as described above, the main data path unit 242 performs all data manipulation operations and instructions other than JPEG data coding. These instructions include compositing, color space conversion, image transformation, convolution, matrix multiplication, halftoning, memory copying and hierarchical image format decompression. The main data path 242 receives pixel and operand data from the pixel organizer 246 and the operand organizers 247 and 248, and sends its result output to the result organizer 249.

[0366] FIG. 129 is a block diagram of the main data path unit 242. The main data path unit 242 is a general-purpose image processor and includes an input interface 1460, an image data processor 1462, an instruction word register 1464, an instruction word decoder 1468, a control signal register 1470, a register file 1472 and a ROM 1475.

[0367] The instruction control unit 235 transfers instruction words to the instruction word register 1464 over bus 1454. Each instruction word contains information such as the type of image processing operation to perform and flags that select various options of that operation. The instruction word is carried over bus 1465 to the instruction word decoder 1468, and the instruction control unit 235 can then instruct the instruction word decoder 1468 to decode it. On receiving this instruction, the instruction decoder 1468 decodes the instruction word into control signals, which are carried over bus 1469 to the control signal register 1470. The outputs of the control signal register are connected to the input interface 1460 and the image data processor 1462 over bus 1471.

[0368] To make the main data path unit 242 more flexible, the instruction control unit 235 can also write directly to the control signal register 1470. This allows anyone familiar with the structure of the main data path unit 242 to configure it in detail, so that it can perform image processing operations that are not described by any instruction word.

[0369] If not all the information needed to perform a desired image processing operation fits in the instruction word, the instruction control unit 235 can write the remaining information into selected registers of the register file 1472. This information is conveyed to the input interface 1460 and the image data processor 1462 over bus 1473. In some image processing operations, the input interface 1460 may also update the contents of selected registers of the register file 1472 to reflect the current state of the main data path unit 242. When a problem occurs while an image processing operation is being performed, this feature lets the instruction control unit 235 locate the problem easily.

[0370] When decoding of the instruction word has finished and the desired control signals have been loaded into the control signal register, the instruction control unit 235 can instruct the main data path unit 242 to start performing the desired image processing operation. On receiving this instruction, the input interface 1460 starts receiving data objects from bus 1451. Depending on the type of operation, the input interface 1460 also either starts receiving operand data from operand bus 1452 or operand bus 1453, or generates the addresses of the operand data and then starts receiving it from operand bus 1452 or operand bus 1453. The input interface 1460 stores and rearranges the input data according to the output of the control signal register 1470. For computations such as affine image transformation and convolution, the input interface 1460 also generates the coordinates to be fetched over buses 1452 and 1453.

[0371] The image data processor 1462 performs the main arithmetic operations on the data objects rearranged by the input interface 1460. It can perform operations such as interpolation between two data objects with a given interpolation factor; multiplication of two data objects followed by division of the result by 255; ordinary multiplication and addition of two data objects; truncation of the fractional part of a data object to various precisions; clamping, which limits data object overflow to a given maximum value and underflow to a given minimum value; and scaling and clamping of data objects. The control signals on bus 1471 determine which of these arithmetic operations are applied to the data objects and in what order.
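Several of these operations have well-known integer formulations for 8-bit data. The Python sketch below uses the shift-and-add replacement for division by 255 (which gives the correctly rounded result for all 8-bit inputs) and a simple rounded interpolation. The source does not specify the rounding used by the image data processor 1462, so these choices are assumptions, and the function names are hypothetical.

```python
def mul_div255(a: int, b: int) -> int:
    """Rounded a*b/255 for 8-bit a, b, using shifts and adds instead of a divider."""
    t = a * b + 128
    return (t + (t >> 8)) >> 8


def interpolate(a: int, b: int, f: int) -> int:
    """Blend from a (f = 0) to b (f = 255) with an 8-bit interpolation factor."""
    return ((255 - f) * a + f * b + 127) // 255


def clamp(x: int, lo: int = 0, hi: int = 255) -> int:
    """Limit overflow to hi and underflow to lo."""
    return max(lo, min(hi, x))


def truncate(x: int, frac_bits: int) -> int:
    """Drop frac_bits bits of fixed-point fraction (rounds toward minus infinity)."""
    return x >> frac_bits
```

For example, `mul_div255(255, 255)` is 255 and `clamp(300)` is 255.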

[0372] ROM 1475 holds the values of 255/x truncated to 8.8 format, where x is a number from 0 to 255. ROM 1475 is connected to the input interface 1460 and the image data processor 1462 over bus 1476. It is used for operations such as generating short blends, and multiplying a data object by 255 and dividing the result by another data object.
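The contents of ROM 1475, and one way of using them, can be sketched in Python. Each entry x holds floor(256 × 255 / x), that is, 255/x truncated to 8.8 format. The value stored for x = 0 (saturated to 0xFFFF here), the ramp construction in `blend_ramp`, and all names are assumptions.

```python
# Entry x holds 255/x truncated to 8.8 fixed point; x = 0 saturates (assumption).
ROM_255_OVER_X = [0xFFFF if x == 0 else (255 << 8) // x for x in range(256)]


def scale_255_div(a: int, b: int) -> int:
    """Approximate a*255/b with one table lookup, one multiply and one shift."""
    return (a * ROM_255_OVER_X[b]) >> 8


def blend_ramp(n: int) -> list[int]:
    """n-step ramp from 0 towards 255 using the 8.8 step 255/n from the table."""
    step = ROM_255_OVER_X[n]
    return [(k * step) >> 8 for k in range(n + 1)]
```

Because the table entries are truncated, results can fall slightly below the exact quotient: `scale_255_div(100, 200)` gives 127 rather than 127.5.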

[0373] The number of operand buses, such as 1452, is limited to two, which is sufficient for most image processing operations. FIG. 130 shows the input interface 1460 in more detail. The input interface 1460 includes a data object interface unit 1480, operand interface units 1482 and 1484, an address generation state machine 1486, a blend generation state machine 1488, a matrix multiplication state machine 1490, an interpolation state machine 1494, a data synchronization unit 1500, an arithmetic unit 1496, miscellaneous registers 1498 and data distribution logic 1505.

[0374] The data object interface unit 1480 and the operand interface units 1482 and 1484 receive data objects and operands from outside. Interface units 1482 and 1484 are both configured by control signals from control bus 1515. Each contains a data register holding the data object or operand just received, and each outputs a VALID signal when its data register is valid. The outputs of the data registers of interface units 1482 and 1484 are connected to data bus 1505, and their VALID signals are connected to flow bus 1510. When configured to fetch operands, operand interface units 1482 and 1484 receive addresses from the arithmetic unit 1496, the matrix multiplication state machine 1490 and the output of the data register of the data object interface unit 1480, and select the required address according to control signals from control bus 1515. In some cases, particularly when no data needs to be received from outside and stored, the data registers of operand interface units 1482 and 1484 may be configured to store data from the output of the data register of the data object interface unit 1480 or of the arithmetic unit 1496.

[0375] During affine image transformation and convolution operations, address generation state machine 1486 controls arithmetic unit 1496 to calculate the next coordinate of the source image to be accessed. Address generation state machine 1486 waits for the START signal on control bus 1515 to be set. When START is set, the state machine releases the STALL signal to data object interface unit 1480 and waits for data objects to arrive. It also sets a counter equal to the number of kernel descriptor data objects that it needs to fetch. The counter output is decoded and serves as the enable signals for the data registers of operand interface units 1482 and 1484 and for other registers 1498. Each time data object interface unit 1480 asserts its VALID signal, address generation state machine 1486 decrements the counter, so the next portion of the data object is latched into a different register.
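The descriptor load in [0375] can be modelled in software as a down-counter whose decoded value picks the destination register for each word accepted on a VALID pulse. This is a minimal sketch under stated assumptions. The register names and the mapping from count value to register are hypothetical, because the text only says that the decoded counter output drives the register enables.

```python
from typing import Dict, List


def load_kernel_descriptor(words: List[int], register_names: List[str]) -> Dict[str, int]:
    """Latch descriptor words into registers using a decoded down-counter.

    Hypothetical model: the counter is preset to the number of descriptor
    objects to fetch. Each VALID word is latched into the register selected
    by decoding the current count, and then the counter is decremented.
    """
    count = len(register_names)  # counter preset to the number of objects
    registers: Dict[str, int] = {}
    for word in words:  # each iteration stands for one asserted VALID
        if count == 0:
            break  # counter reached zero: descriptor fully loaded
        slot = len(register_names) - count  # decode: count -> register enable
        registers[register_names[slot]] = word
        count -= 1
    return registers
```

For example, `load_kernel_descriptor([10, 20, 30], ["x0", "y0", "dx"])` fills the three hypothetical registers in order. Once the counter reaches zero, any further words are ignored.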

[0376] When the counter reaches zero, address generation state machine 1486 instructs operand interface unit 1482 to start fetching index table values and pixels from operand interface unit 1484. Address generation state machine 1486 also loads two counters, one with the number of rows and one with the number of columns. At every clock edge that is not stalled by a STALL signal (for example, from operand interface unit 1482), the counters are decremented and output the remaining rows and columns, and arithmetic unit 1496 calculates the next coordinate to be fetched. When both counters reach zero, they are reloaded with the numbers of rows and columns, and arithmetic unit 1496 is configured to locate the top-left corner of the next matrix.

[0377] When interpolation is used to determine the true value of a pixel, address generation state machine 1486 decrements the row and column counts once every two clock cycles. This is done with a 1-bit counter whose output is used as the enable for the row and column counters. Each time the matrix has been scanned once, the state machine sends a signal that decrements the length counter. When the length counter has reached 1 and the last index table address has been sent to operand interface unit 1482, the state machine issues a final signal and resets the start bit.
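The counter behaviour in [0376] and [0377] amounts to a nested countdown over the kernel, repeated once per output pixel. The sketch below is a hypothetical software model, not the circuit. It ignores STALL, and it represents the 1-bit enable counter used for interpolation by holding each kernel position for two cycles.

```python
from typing import Iterator, Tuple


def kernel_scan(rows: int, cols: int, length: int,
                interpolate: bool = False) -> Iterator[Tuple[int, int, int]]:
    """Yield (length_count, row_count, col_count) once per unstalled clock.

    The row and column counters count down from the kernel size. After a whole
    matrix has been scanned, the length counter is decremented. With
    interpolation, the counters advance only every second clock, so each
    kernel position is held for two cycles.
    """
    cycles_per_position = 2 if interpolate else 1
    for length_count in range(length, 0, -1):
        for row_count in range(rows, 0, -1):
            for col_count in range(cols, 0, -1):
                for _ in range(cycles_per_position):
                    yield (length_count, row_count, col_count)
```

A 2 × 3 kernel applied to 4 output pixels takes 24 cycles, or 48 cycles with interpolation enabled.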

[0378] Blend generation state machine 1488 controls arithmetic unit 1496 to generate a sequence of values from 0 to 255 over the blend length. This sequence is used as the interpolation factor for interpolating between the blend start value and the blend end value. Blend generation state machine 1488 determines which of two modes, jump mode or step mode, to run in. Jump mode is used when the blend length is 256 or less, and step mode is used otherwise.

[0379] Blend generation state machine 1488 performs the following calculations and stores the results in registers reg0, reg1, and reg2. When the blend ramp has a predetermined length and is in step mode, it latches 511 - length into reg0 (24 bits), 512 - 2 × length into reg1 (24 bits), and end - start into reg2 (4 × 9 bits). When the ramp is in jump mode, it latches 0 into reg0 (24 bits), 255 / (length - 1) into reg1 (24 bits), and end - start into reg2 (4 × 9 bits).
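The mode choice and register initialisation in [0378] and [0379] can be written out as follows. This sketch assumes four-channel start and end values. Truncating 255 / (length - 1) to 16.8 fixed point follows the statement in [0390] that fractions are truncated in jump mode. The 24-bit and 9-bit register widths are not enforced.

```python
from typing import Tuple

Channels = Tuple[int, int, int, int]


def init_blend_registers(length: int, start: Channels,
                         end: Channels) -> Tuple[str, int, int, Tuple[int, ...]]:
    """Return (mode, reg0, reg1, reg2) for a blend of the given length.

    Jump mode is used for lengths of 256 or less, and step mode otherwise.
    In jump mode, reg1 holds 255 / (length - 1) as a 16.8 fixed-point value.
    """
    reg2 = tuple(e - s for s, e in zip(start, end))  # end - start, per channel
    if length <= 256:
        if length < 2:
            raise ValueError("jump mode needs length >= 2")
        reg1 = (255 << 8) // (length - 1)  # 16.8 fixed point, fraction truncated
        return "jump", 0, reg1, reg2
    return "step", 511 - length, 512 - 2 * length, reg2
```

A 4-step blend gives reg1 = 21760, which is 85.0 in 16.8 format.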

[0380] In step mode, the following is executed each cycle. When reg0 > 0, reg1 is added to reg0 and the result is stored in reg0. A separate incrementer is also enabled, so its output increases by 1. When reg0 ≤ 0, 510 is added to reg0 and the result is stored in reg0, and the incrementer is not incremented. The output of the incrementer is the ramp value.
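With the initial values from [0379], the step-mode update in [0380] is the integer error-term update of Bresenham's line algorithm for a line from (0, 0) to (length - 1, 255). The constants 511 - length, 512 - 2 × length and 510 are 2dy - dx, 2(dy - dx) and 2dy, with dy = 255 and dx = length - 1. The following is a minimal sketch, assuming the ramp value is emitted before each update.

```python
from typing import List


def step_mode_ramp(length: int) -> List[int]:
    """Generate `length` ramp values running from 0 to 255 in step mode."""
    if length <= 256:
        raise ValueError("step mode is used only for lengths above 256")
    reg0 = 511 - length       # Bresenham error term, 2dy - dx
    reg1 = 512 - 2 * length   # 2(dy - dx), added when the ramp steps up
    ramp = 0                  # incrementer output
    values = []
    for _ in range(length):
        values.append(ramp)
        if reg0 > 0:
            reg0 += reg1
            ramp += 1         # incrementer enabled
        else:
            reg0 += 510       # 2dy; incrementer not enabled
    return values
```

Because length exceeds 256 in step mode, the slope is below 1. The ramp therefore rises by at most 1 per cycle and ends exactly at 255.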

[0381] In jump mode, the following is executed each cycle. reg1 is added to reg0. The result of the addition is 24 bits wide, in 16.8 fixed-point format, and is stored back into reg0. If the first bit after the binary point of the result is 1, the integer part is incremented. The lower 8 bits of the integer part output by the incrementer are the ramp value. This ramp value, the output of reg2, and the blend start value are sent to image data processor 1462 to generate the ramp.
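A corresponding sketch of jump mode ([0381]) follows. It makes three assumptions:

- reg1 holds 255 / (length - 1) truncated to 16.8 fixed point.
- Each output is rounded using the first bit after the binary point.
- The value is emitted before reg1 is added.

The text does not pin down that ordering.

```python
from typing import List


def jump_mode_ramp(length: int) -> List[int]:
    """Generate `length` ramp values in jump mode (lengths 2..256)."""
    if not 2 <= length <= 256:
        raise ValueError("jump mode is used for lengths 2..256")
    reg0 = 0                                # 16.8 accumulator
    reg1 = (255 << 8) // (length - 1)       # jump value, fraction truncated
    values = []
    for _ in range(length):
        integer = reg0 >> 8
        if reg0 & 0x80:                     # first bit after the binary point
            integer += 1
        values.append(integer & 0xFF)       # low 8 bits are the ramp value
        reg0 = (reg0 + reg1) & 0xFFFFFF     # 24-bit register
    return values
```

Because reg1 is truncated, this model can end one short of 255 for some lengths. For a length of 168, for example, the last value is 254.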

[0382] Matrix multiplication state machine 1490 performs a linear color space conversion on input data objects using a conversion matrix. The conversion matrix has 4 × 5 dimensions. The first to fourth columns are multiplied by the four channels of the data object, and the last column contains the constant coefficients to be added to the sum of the products. When the START signal from control bus 1515 is activated, the matrix multiplication state machine operates as follows.

[0383] 1) It generates the line number from which the constant coefficients of the conversion matrix are to be fetched via buses 1482 and 1484, and it enables other registers 1498 so that the constant coefficients can be stored. 2) Using a 1-bit flip-flop, it generates the line numbers that serve as addresses when fetching each half of the matrix via buses 1482 and 1484. It also generates a MAT_SEL signal that selects which half of the data object is to be multiplied by that half of the matrix.

[0384] 3) The operation ends when no more data objects are input from data object interface unit 1480. Interpolation state machine 1494 performs horizontal interpolation of data objects. In horizontal interpolation, main data path unit 242 receives a data object stream from bus 1451 and interpolates between adjacent data objects. It then outputs a stream of data objects two or four times as long as the original stream. Because data objects may be packed as bytes or as pixels, interpolation state machine 1494 operates differently in each case to maximize throughput. Interpolation state machine 1494 operates as follows.

[0385] 1) It generates the INT_SEL signal, which causes data distribution logic 1503 to reorder the input data objects so that interpolation is performed on the correct pairs of data objects. 2) It generates the interpolation factors for interpolating between adjacent pairs of data objects.

[0386] 3) It generates a STALL signal that stops data object interface unit 1480 from accepting further data objects. This is necessary because the output stream is longer than the input stream. The STALL signal is sent to flow bus 1510. Arithmetic unit 1496 contains circuits for performing arithmetic calculations and is configured by control signals on control bus 1515. It is used by only two instructions: affine image transformation and convolution, and blend generation in compositing.
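The horizontal interpolation of [0384] to [0386] can be sketched on a plain list of 8-bit samples. The sketch makes three assumptions:

- The object after the last one is taken to be the last object itself.
- Interpolation uses integer weights k/factor.
- Byte versus pixel packing and the INT_SEL reordering are not modelled.

```python
from typing import List


def horizontal_interpolate(samples: List[int], factor: int) -> List[int]:
    """Upsample a stream by 2 or 4 using linear interpolation.

    For each input object, `factor` outputs are produced by moving towards
    the next object with weights k/factor, for k = 0..factor-1.
    """
    if factor not in (2, 4):
        raise ValueError("factor must be 2 or 4")
    out = []
    for i, a in enumerate(samples):
        b = samples[i + 1] if i + 1 < len(samples) else a  # assumed edge handling
        for k in range(factor):  # interpolation factor for this output
            out.append((a * (factor - k) + b * k) // factor)
    return out
```

Each input produces `factor` outputs. In this model, the input side must therefore be stalled for factor - 1 cycles out of every factor, which matches the purpose of the STALL signal in step 3.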

[0387] In affine image transformation and convolution, arithmetic unit 1496 performs the following operations. 1) It calculates the next x and y coordinates. To calculate the x coordinate, arithmetic unit 1496 either uses an adder to add the x component of the horizontal or vertical delta to the current x coordinate, or uses a subtractor to subtract that component from the current x coordinate. To calculate the y coordinate, arithmetic unit 1496 likewise uses an adder to add the y component of the horizontal or vertical delta to the current y coordinate, or uses a subtractor to subtract it.

[0388] 2) It adds the y coordinate to the index table offset to calculate the index table address. When interpolation is used to find the original value of a pixel, this sum is further increased by 4 to find the index entry. 3) It adds the x coordinate to the index table entry to find the address of the pixel.

[0389] 4) It subtracts 1 from the length count. In blend generation, arithmetic unit 1496 operates as follows. 1) In step mode, one ramp adder calculates the internal variable of the ramp generation algorithm, while another adder increments the ramp value when the interval variable is greater than zero.

[0390] 2) In jump mode, only one adder is needed, to add the jump value to the current ramp value. 3) In jump mode, fractions are truncated. 4) At the start of ramp generation, the blend start value is subtracted from the blend end value.

[0391] 5) It subtracts 1 from the length count. Other registers 1498 provide extra storage beyond the data registers in data object interface unit 1480 and operand interface units 1482 and 1484. Other registers 1498 are typically used to store internal variables or to buffer past data objects from data object interface unit 1480. Registers 1498 are configured by control signals on control bus 1515.

[0392] Data synchronization unit 1500 is configured by control signals on control bus 1515. Data synchronization unit 1500 provides STALL signals to data object interface unit 1480 and to operand interface units 1482 and 1484. When one interface unit has received a data object that the other interfaces have not yet received, that interface unit is stalled until all the other interfaces have received their data.
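The synchronization rule in [0392] reduces to a simple combinational function of the interfaces' VALID flags. The following is a minimal sketch, assuming that True means "this interface holds valid data" on input and "stall" on output. The function name is hypothetical.

```python
from typing import List


def sync_stalls(valid: List[bool]) -> List[bool]:
    """Return the STALL signal for each interface (True means stall).

    An interface that already holds valid data is stalled until every other
    interface also holds valid data. Once all are valid, none is stalled and
    the combined data can move on.
    """
    all_valid = all(valid)
    return [v and not all_valid for v in valid]
```

Interfaces that are still waiting are never stalled, so they remain free to accept the data they are missing.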

[0393] Data distribution logic 1505 reorders data objects from data bus 1510 and register file 1472 via bus 1530. It does so according to control signals on control bus 1515, including the MAT_SEL signal from matrix multiplication state machine 1490 and the INT_SEL signal from interpolation state machine 1494. The reordered data is output to bus 1461.

[0394] FIG. 131 shows image data processor 1462 of FIG. 129 in more detail. Image data processor 1462 includes pipeline control unit 1540 and a number of color channel processors 1545, 1550, 1555, and 1560. All the color channel processors receive input from bus 1565, which is driven by input interface 1460 (FIG. 131). All the channel processors and pipeline control unit 1540 are configured by control signals from control signal register 1470 via bus 1472. All the color channel processors may also receive input from register file 1472 and ROM 1475 of FIG. 129 via bus 1580. The outputs of all the color channel processors and the pipeline control unit are grouped into bus 1570, forming output 1455 of image data processor 1462.

[0395] Pipeline control unit 1540 controls the flow of data objects through all the color channel processors by enabling or disabling their registers. Pipeline control unit 1540 contains a register pipeline. The form and length of the pipeline are configured by control signals from bus 1471, and the pipeline in pipeline control unit 1540 has the same form as the pipelines in the color channel processors. The pipeline control unit receives the VALID signal from bus 1565. At each pipeline stage of pipeline control unit 1540, if the input VALID signal is asserted and the stage is not stalled, the stage asserts the register enable signal to all the color channel processors and latches the input VALID signal. The latch output, that is, the VALID signal, then moves on to the next pipeline stage. In this way, the movement of data objects through the pipeline is simulated and controlled without using data storage.
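The idea in [0395] is that only VALID bits travel through pipeline control unit 1540, while the data registers in the channel processors just follow the enables. The sketch below is a hypothetical model under one simplifying assumption: a stall freezes the whole pipeline at once rather than stage by stage.

```python
from typing import List, Tuple


def clock_pipeline(valid: List[bool], input_valid: bool,
                   stall: bool) -> Tuple[List[bool], List[bool]]:
    """Advance the VALID-bit pipeline by one clock edge.

    Returns (new_valid, enables). enables[i] is the register enable that
    stage i would broadcast to every color channel processor. Only VALID
    bits are stored here, never the pixel data itself.
    """
    if stall:
        # A stalled pipeline keeps its VALID bits and enables nothing.
        return list(valid), [False] * len(valid)
    # Each stage sees the VALID bit of the stage before it (stage 0 sees the input).
    incoming = [input_valid] + valid[:-1]
    # A stage enables the data registers only when its incoming VALID is set,
    # and it latches that VALID bit so that the bit moves on next cycle.
    return incoming, list(incoming)
```

A single valid object entering a three-stage pipeline shows up as one True bit shifting along the list on every unstalled clock.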

[0396] Color channel processors 1545, 1550, 1555, and 1560 perform the main arithmetic operations on the input data objects, and each processor is responsible for one channel of the output data object. In the preferred embodiment, the number of color channel processors is limited to four because most pixel data objects have at most four channels.

[0397] Part of the color channel processing handles the opacity channel of the pixel. Additional circuitry, not shown in FIG. 131, is connected to control bus 1471. It converts the control signals from control bus 1471 so that the color channel processor handles the opacity channel correctly. This is necessary because, in some image processing operations, the operation on the opacity channel differs slightly from the operation on the color channels.

[0398] FIG. 132 shows color channel processors 1545, 1550, 1555, and 1560 (indicated generally as 1600 in FIG. 132) in more detail. Each color channel processor includes processing block A 1610, processing block B 1615, big adder 1620, fraction truncation unit 1625, clamp or wrapper 1630, and output multiplexing unit 1635. Color channel processor 1600 receives the following inputs:

- control signals from control signal register 1470 via bus 1602;
- enable signals from pipeline control unit 1540 via bus 1604;
- information from register file 1472 via bus 1605;
- data objects from the other color channel processors via bus 1603;
- data objects from input interface 1460 via bus 1601.

[0399] Processing block A 1610 performs several arithmetic operations on the data objects from bus 1601 and outputs partially calculated data objects to bus 1611. The processing that block A 1610 performs for each image processing operation is described below. In compositing, processing block A 1610 multiplies the data object from data object bus 1451 by its opacity. It interpolates between the blend start value and the blend end value using the interpolation factor from input interface 1460 of FIG. 129. It also either pre-multiplies the operand from operand bus 1452 of FIG. 129 or multiplies the blend color by opacity. Finally, it applies an attenuation multiplication to the pre-multiplied operand or blend color data.

[0400] In general color space conversion, processing block A 1610 interpolates between four color table values using two fractional values from bus 1451 of FIG. 129. In affine image transformation and convolution, processing block A 1610 pre-multiplies the color of the source pixel by its opacity and interpolates between pixels in the same row using the fractional part of the current x coordinate.

[0401] In linear color space conversion, processing block A 1610 pre-multiplies the color of the source pixel by its opacity and multiplies the pre-multiplied color data by the conversion matrix coefficients. In horizontal and vertical interpolation, processing block A 1610 interpolates between two data objects.
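The 4 × 5 conversion of [0382] and [0383] can be sketched as below, accumulated in two cycles as described for the big adder in [0412]. It is a hypothetical model with several simplifications:

- It assumes one matrix row per output channel.
- It assumes the first half of the matrix (columns 0 and 1, plus the constant) is used in the first cycle.
- It omits opacity pre-multiplication, fixed-point scaling and clamping.

```python
from typing import List, Sequence


def linear_color_convert(matrix: Sequence[Sequence[int]],
                         pixel: Sequence[int]) -> List[int]:
    """Apply a 4x5 conversion matrix to a 4-channel pixel.

    Columns 0-3 multiply the four channels, and column 4 is the constant.
    Each output channel is accumulated in two cycles: two products plus the
    constant first, then the remaining two products.
    """
    out = []
    for row in matrix:  # one row per output channel
        acc = row[0] * pixel[0] + row[1] * pixel[1] + row[4]  # first cycle
        acc += row[2] * pixel[2] + row[3] * pixel[3]          # second cycle
        out.append(acc)
    return out
```

An identity matrix with zero constants returns the pixel unchanged.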

[0402] In residual merging, processing block A 1610 adds two data objects. Processing block A 1610 includes a number of multi-function blocks 1640 and processing block A glue logic 1645. Each multi-function block 1640 is configured by control signals and can perform any one of the following functions.

[0403] Add or subtract two data objects. Pass one data object through. Interpolate between two data objects using an interpolation factor. Pre-multiply a color by opacity. Multiply two data objects and multiply the product by a third data object.

[0404] Add or subtract two data objects and pre-multiply the result by opacity. The registers of multi-function blocks 1640 are enabled or disabled by enable signals on bus 1604, which are generated by pipeline control unit 1540 of FIG. 131. Processing block A glue logic 1645 receives data objects from buses 1601 and 1603 and the outputs of some multi-function blocks 1640, and routes them to the inputs of other selected multi-function blocks 1640. Processing block A glue logic 1645 is also configured by control signals from bus 1602.

[0405] Processing block B 1615 performs arithmetic operations on the data objects from bus 1601 and the partially calculated data objects from bus 1611, and outputs partially calculated data objects to bus 1616. The processing that block B 1615 performs for each image processing operation is described below. In compositing with a non-plus operator, processing block B 1615 multiplies the pre-processed data object from data object bus 1451 and the operand from operand bus 1452 by the compositing multiplicands from bus 1603. It also multiplies the clamped or wrapped data object by the output of the ROM, which gives 255/opacity in 8.8 format.
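The ROM step in [0405] undoes pre-multiplication by multiplying by 255/opacity instead of dividing. The sketch below assumes the ROM is indexed by an 8-bit opacity and rounds each 8.8 entry to nearest. It also assumes entry 0 holds 0 and that the result is clamped to 255. None of these details are given in the text.

```python
from typing import List


def build_reciprocal_rom() -> List[int]:
    """ROM of 255/opacity in 8.8 fixed point, rounded; entry 0 is set to 0."""
    rom = [0]
    for alpha in range(1, 256):
        rom.append((255 * 256 + alpha // 2) // alpha)
    return rom


def unpremultiply(color: int, alpha: int, rom: List[int]) -> int:
    """Undo pre-multiplication as color * (255/alpha), clamped to 255."""
    value = (color * rom[alpha]) >> 8  # drop the 8 fractional bits
    return min(value, 255)
```

With this table, rom[255] is exactly 256 (1.0 in 8.8), so fully opaque pixels pass through unchanged.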

[0406] In compositing with a plus operator, processing block B 1615 adds the two pre-processed data objects. For the opacity channel, it also subtracts 255 from the sum, multiplies the difference by the offset, and divides the product by 255. In general color space conversion, processing block B 1615 interpolates between four color table values using the two fractional values from bus 1451. It then uses the remaining fractional value to interpolate between the partially interpolated color value from processing block A 1610 and the previous interpolation result.
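One plausible reading of [0400] and [0406] is ordinary trilinear interpolation of a three-dimensional color lookup table. Four table values are combined with two fractions, another four with the same two fractions, and the two results are blended with the third fraction. The sketch below shows that reading. It uses floats for clarity rather than the fixed-point arithmetic of the hardware, and the cube[z][y][x] layout is an assumption.

```python
from typing import Sequence


def lerp(a: float, b: float, f: float) -> float:
    """Linear interpolation from a (f = 0) to b (f = 1)."""
    return a + (b - a) * f


def bilinear(face: Sequence[Sequence[float]], fx: float, fy: float) -> float:
    """face[y][x]: interpolate along x in both rows, then along y."""
    top = lerp(face[0][0], face[0][1], fx)
    bottom = lerp(face[1][0], face[1][1], fx)
    return lerp(top, bottom, fy)


def trilinear(cube: Sequence[Sequence[Sequence[float]]],
              fx: float, fy: float, fz: float) -> float:
    """cube[z][y][x] holds the eight color table values around the input."""
    face0 = bilinear(cube[0], fx, fy)  # four table values, two fractions
    face1 = bilinear(cube[1], fx, fy)  # the other four table values
    return lerp(face0, face1, fz)      # remaining fraction combines the two
```

Trilinear interpolation reproduces any linear function of the coordinates exactly, which makes it easy to check.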

[0407] In affine image transformation and convolution, processing block B 1615 interpolates between the partially interpolated pixels using the fractional part of the current y coordinate, and multiplies the interpolated pixel by the corresponding coefficient of the subsample weight matrix. In linear color space conversion, processing block B 1615 pre-multiplies the color of the source pixel by its opacity and multiplies the pre-multiplied color by the conversion matrix coefficients.

[0408] Processing block B 1615 includes a number of multi-function blocks and processing block B glue logic 1650. The multi-function blocks are the same as those in processing block A 1610. Processing block B glue logic 1650 accepts data objects from buses 1601, 1603, 1611, and 1631 and the outputs of selected multi-function blocks, and routes them to the inputs of selected multi-function blocks. Processing block B glue logic 1650 is also configured by control signals from bus 1602.

[0409] Big adder 1620 combines some of the partial results from processing block A 1610 and processing block B 1615. It receives inputs from input interface 1460 via bus 1601, from processing block A 1610 via bus 1611, from processing block B 1615 via bus 1616, and from register file 1472 via bus 1605, and outputs the combined result to bus 1621. Big adder 1620 is also configured by control signals on bus 1602.

[0410] Big adder 1620 can be configured differently for different image processing operations. Its operation in particular image processing operations is described below. In compositing with a non-plus operator, big adder 1620 sums the two partial products from processing block B 1615.

[0411] In compositing with a plus operator, when offset enable is active, big adder 1620 subtracts the sum of the pre-processed data objects, with the offset applied, from the opacity channel. In affine image transformation and convolution, big adder 1620 accumulates the products from processing block B 1615.
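Taken together, [0387], [0400], [0407] and [0411] describe a chain of five steps:

1. Step the source coordinate.
2. Interpolate along x with the x fraction.
3. Interpolate along y with the y fraction.
4. Weight the result by the kernel coefficient.
5. Accumulate.

The sketch below models that chain in floating point under several assumptions. The horizontal delta moves along a kernel row and the vertical delta moves to the next row. Coordinates are non-negative and the kernel footprint lies inside the image. Index-table addressing and opacity pre-multiplication are omitted.

```python
from typing import Sequence, Tuple


def sample_bilinear(image: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Bilinear sample at a non-negative source coordinate (x, y)."""
    x0, y0 = int(x), int(y)  # integer parts select the pixels
    fx, fy = x - x0, y - y0  # fractional parts drive the interpolation
    top = image[y0][x0] + (image[y0][x0 + 1] - image[y0][x0]) * fx
    bottom = image[y0 + 1][x0] + (image[y0 + 1][x0 + 1] - image[y0 + 1][x0]) * fx
    return top + (bottom - top) * fy  # interpolate between rows with y fraction


def convolve_at(image: Sequence[Sequence[float]], x_start: float, y_start: float,
                h_delta: Tuple[float, float], v_delta: Tuple[float, float],
                weights: Sequence[Sequence[float]]) -> float:
    """Accumulate weighted bilinear samples over a kernel footprint."""
    acc = 0.0
    row_x, row_y = x_start, y_start
    for weight_row in weights:
        x, y = row_x, row_y
        for w in weight_row:
            acc += sample_bilinear(image, x, y) * w  # weight, then accumulate
            x, y = x + h_delta[0], y + h_delta[1]    # step along the row
        row_x, row_y = row_x + v_delta[0], row_y + v_delta[1]  # next row
    return acc
```

On a flat image, any kernel whose weights sum to 1 returns the flat value.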

[0412] In linear color space conversion, in the first cycle the big adder sums two matrix coefficient/data object products and the constant coefficient. In the second cycle, it adds the other two matrix coefficient/data object products to the sum from the previous cycle. Fraction truncation (rounding) unit 1625 receives input from big adder 1620 via bus 1621 and truncates the fractional part of the result. The number of bits representing the fractional part is indicated by the BP signal on bus 1605 from register file 1472. The following table shows how the BP signal is interpreted. The truncated output is provided to bus 1626.

[0413] Fraction table

[0414]

[Table 27]

[0415] Besides truncating the fraction, fraction truncation unit 1625 performs two other tasks: 1) it determines whether the truncated result is negative; and 2) it determines whether the absolute value of the truncated result is greater than 255. Clamp or wrapper 1630 receives input from fraction truncation unit 1625 via bus 1626 and performs the following operations in order.

[0416] If the option to take the absolute value of the truncated result is enabled, it takes the absolute value. It clamps data object underflow to a given minimum value and data object overflow to a given maximum value. Output multiplexing unit 1635 selects the final output from the output of processing block B on bus 1616 and the output of the clamp or wrapper on bus 1631. It also performs some final processing on the data object. The operations performed for particular image processing operations are described below.

[0417] In compositing with a non-plus operator and without pre-multiplication, multiplexing unit 1635 combines several outputs of processing block B 1615 to form a non-pre-multiplied data object. In compositing with a non-plus operator and with pre-multiplication, multiplexing unit 1635 passes the output of clamp or wrapper 1630.

[0418] In compositing with a plus operator, multiplexing unit 1635 combines several outputs of processing block B 1615 to form the resulting data object. In general color space conversion, multiplexing unit 1635 applies a translate-and-clamp function to the output data object. In other operations, multiplexing unit 1635 passes the output of clamp or wrapper 1630.
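The path from fraction truncation unit 1625 through clamp or wrapper 1630 ([0412], [0415] and [0416]) can be sketched as two small functions. The sketch makes three assumptions:

- Truncation is an arithmetic right shift by the number of fraction bits selected by BP.
- The clamp limits default to 0 and 255.
- The wrap variant, which the name "wrapper" suggests but the text does not describe, keeps the low 8 bits.

```python
from typing import Tuple


def truncate_fraction(value: int, frac_bits: int) -> Tuple[int, bool, bool]:
    """Drop frac_bits fractional bits and report (result, negative, over_255).

    Python's >> on a negative int rounds toward minus infinity, like an
    arithmetic right shift in hardware.
    """
    result = value >> frac_bits
    return result, result < 0, abs(result) > 255


def clamp_or_wrap(value: int, wrap: bool = False, take_abs: bool = False,
                  minimum: int = 0, maximum: int = 255) -> int:
    """Take the absolute value if requested, then clamp (or wrap) into range."""
    if take_abs:
        value = abs(value)
    if wrap:
        return value & 0xFF  # hypothetical wrap mode: keep the low 8 bits
    return max(minimum, min(maximum, value))
```

The two flags computed by `truncate_fraction` are exactly the information a clamp needs to decide between underflow, overflow and pass-through.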

[0419] FIG. 133 shows one multi-function block, such as 1640, in more detail. Multi-function block 1640 includes the following components:

- mode detection unit 1710;
- two addition operand logic units 1660 and 1670;
- three multiplexing logic units 1680, 1685, and 1690;
- two-input adder 1675;
- two-input multiplier 1695 with two addends;
- register 1705.

[0420] Mode detection unit 1710 receives MODE signal 1711 from control signal register 1470 of FIG. 129. It also receives two signals, SUB signal 1712 and SWAP signal 1713, from input interface 1460 of FIG. 129. Mode detection unit 1710 decodes these signals to generate the control signals passed to addition operand logic units 1660 and 1670 and to multiplexing logic units 1680, 1685, and 1690. These control signals configure multi-function block 1640 to perform its various operations. Multi-function block 1640 has eight modes.

[0421] 1) Add/subtract mode: according to SUB signal 1712, input 1655 is added to input 1665 or subtracted from input 1665. The inputs can also be swapped according to SWAP signal 1713. 2) Bypass mode: input 1655 is passed through to the output. 3) Interpolation mode: the block interpolates between inputs 1655 and 1665 using input 1675 as the interpolation factor. Inputs 1655 and 1665 can be swapped according to SWAP signal 1713.

[0422] 4) Pre-multiply mode: input 1655 is multiplied by input 1675 and the result is divided by 255. The output of the INC register 1708 tells the next stage whether the result of this stage on bus 1707 must be incremented to obtain the correct result. 5) Multiply mode: input 1655 is multiplied by input 1675.

[0423] 6) Add/subtract and pre-multiply mode: input 1665 is added to, or subtracted from, input 1655; the result is multiplied by input 1675, and the product is divided by 255. The output of the INC register 1708 tells the next stage whether the result of this stage on bus 1707 must be incremented to obtain the correct result.
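The six modes listed above can be modelled behaviourally as follows. This is a minimal sketch, not the hardware datapath: it assumes 8-bit unsigned operands, uses `a`, `b` and `f` for inputs 1655, 1665 and 1675, takes interpolation to mean a + (b - a) * f / 255, and performs each division by 255 exactly instead of through the INC mechanism. The function name and mode strings are hypothetical.

```python
def multifunction(mode: str, a: int, b: int, f: int = 0,
                  sub: bool = False, swap: bool = False) -> int:
    """Behavioural model of one multi-function block (hypothetical interface).

    a, b and f stand for inputs 1655, 1665 and 1675 (8-bit unsigned values).
    """
    if swap and mode in ("addsub", "interp"):
        a, b = b, a                        # SWAP signal exchanges the two operands
    if mode == "addsub":                   # mode 1: b + a, or b - a when SUB is set
        return b - a if sub else b + a
    if mode == "bypass":                   # mode 2: input 1655 passed straight through
        return a
    if mode == "interp":                   # mode 3: f = 0 gives a, f = 255 gives b
        return a + (b - a) * f // 255
    if mode == "premul":                   # mode 4: a * f / 255
        return a * f // 255
    if mode == "mul":                      # mode 5: full product, no scaling
        return a * f
    if mode == "addsub_premul":            # mode 6: (a + b or a - b) * f / 255
        s = a - b if sub else a + b
        return s * f // 255
    raise ValueError(f"unknown mode: {mode}")
```

For example, `multifunction("interp", 0, 255, f=255)` returns 255, the far end of the interpolation range.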

[0424] The addition operand logic units 1660 and 1670 take the one's complement of their inputs when necessary, so that the adder can also perform subtraction. The adder 1675 sums the outputs of the addition operand logic units 1660 and 1670 on buses 1662 and 1672 and outputs the sum on bus 1677. The multiplexing logic units 1680, 1685 and 1690 select the multiplicand and addends appropriate to the desired function. All of these units are configured by the control signals on bus 1714 from the mode detection unit 1710.

[0425] The multiplier 1695 with two addends multiplies the input from bus 1682 by the input from bus 1677, and then adds the sum of the inputs from buses 1687 and 1692 to the product. The adder 1700 adds the upper 8 bits of the output of the multiplier 1695 to the lower 8 bits of that output. The carry of the adder 1700 is latched in the INC register 1701, which is enabled by signal 1702. The register 1705 stores the product from the multiplier 1695; it is also enabled by signal 1702.
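Splitting the 16-bit product into its upper and lower bytes, as adder 1700 does, is a well-known way to divide by 255 without a divider. Writing the product as t = 256h + l = 255h + (h + l) shows that floor(t / 255) is h, plus 1 exactly when h + l >= 255. The sketch below assumes that threshold; whether the hardware reaches it through a carry-in or some other arrangement is not stated here, and the function names are hypothetical.

```python
def premul_div255(x: int, y: int) -> tuple[int, int]:
    """Return (upper_byte, inc) such that upper_byte + inc == (x * y) // 255.

    Assumes 8-bit unsigned x and y. upper_byte plays the role of this stage's
    result on the bus; inc plays the role of the INC flag for the next stage.
    """
    t = x * y                          # 16-bit product from the multiplier
    h, l = t >> 8, t & 0xFF            # upper and lower bytes of the product
    inc = 1 if h + l >= 255 else 0     # assumed increment condition
    return h, inc


def check_all() -> bool:
    """Exhaustively compare against exact integer division for 8-bit inputs."""
    return all(
        sum(premul_div255(x, y)) == (x * y) // 255
        for x in range(256)
        for y in range(256)
    )
```

Because h is at most 254 and l at most 255, h + l never reaches 510, so a single increment is always sufficient.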

[0426] FIG. 134 is a block diagram of the compositing operation. The compositing operation receives three input data streams: 1) Accumulated pixel data: in this accumulator model, the data is read from the same location in which the result is stored. 2) Compositing operand: consists of color and opacity. Both the color and the opacity can be flat, blended, pixel data or tiled.

[0427] 3) Attenuation: attenuates the operand data. The attenuation can be flat, a bitmap or a bytemap. Pixel data typically consists of four channels: three channels form the color of the pixel and the remaining channel is its opacity. Pixel data may or may not be pre-multiplied. When pixel data is pre-multiplied, each color channel is multiplied by the opacity. Because pre-multiplication simplifies the compositing equations, pixel data is usually pre-multiplied before it is composited with other pixels.
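Pre-multiplying an 8-bit pixel as described above can be sketched as follows. The (r, g, b, a) channel order and the round-to-nearest rule `(c * a + 127) // 255` are assumptions; the text only says that each color channel is multiplied by the opacity.

```python
def premultiply(pixel: tuple[int, int, int, int]) -> tuple[int, ...]:
    """Pre-multiply an 8-bit (r, g, b, a) pixel: each color channel times a / 255."""
    r, g, b, a = pixel
    # (c * a + 127) // 255 rounds c * a / 255 to the nearest integer (assumed rule)
    return tuple((c * a + 127) // 255 for c in (r, g, b)) + (a,)
```

An opaque pixel (a = 255) is unchanged and a fully transparent one (a = 0) becomes all zeros, which is what lets compositing formulas such as "over" drop their per-channel opacity multiplications.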

[0428] Table 1 shows the compositing instructions executed in the preferred embodiment. Each instruction operates on pre-multiplied data. (ac0, a0) denotes a pre-multiplied pixel with color ac and opacity a0, r denotes an "offset" value, and wc() denotes the wrap/clamp operator. The reverse operators of the over, in, out and atop operators in Table 1 are also implemented. The compositing model has the accumulator on the left-hand side.

[0429] The compositing block 1760 in FIG. 134 comprises three color sub-blocks and an opacity sub-block. Each color sub-block operates on one color channel and on the opacity channel of the input pixels to obtain the color of the output pixel. This operation is shown below in pseudo-code:

PIXEL Composite(IN colorA, colorB: PIXEL; IN opacityA, opacityB: PIXEL; IN comp_op: COMPOSITE_OPERATOR)
{
    PIXEL result;
    IF comp_op is rover, rin, rout, or ratop THEN
        swap colorA and colorB;
        swap opacityA and opacityB;
    END IF;
    IF comp_op is over, rover, loado, or plus THEN
        X = 1;
    ELSE IF comp_op is in, rin, atop, or ratop THEN
        X = opacityB;
    ELSE IF comp_op is out, rout, or xor THEN
        X = not(opacityB);
    ELSE IF comp_op is loadzero, loadc, or loadco THEN
        X = 0;
    END IF;
    IF comp_op is over, rover, atop, ratop, or xor THEN
        Y = not(opacityA);
    ELSE IF comp_op is plus, loadc, or loadco THEN
        Y = 1;
    ELSE IF comp_op is in, rin, out, rout, loadzero, or loado THEN
        Y = 0;
    END IF;
    result = colorA * X + colorB * Y;
    RETURN result;
}

Because the instructions 'load' and 'loado' have a different meaning for the opacity channel, the above code differs in the opacity sub-block.
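A runnable rendering of the color sub-block pseudo-code can be sketched as follows. It assumes values normalized to [0, 1], so that not(opacity) becomes 1 - opacity (the hardware works on 8-bit values), and it models only a color channel, not the opacity sub-block's different handling of 'load' and 'loado'. The function name is hypothetical.

```python
REVERSED = {"rover", "rin", "rout", "ratop"}


def composite_color(color_a: float, color_b: float,
                    opacity_a: float, opacity_b: float, op: str) -> float:
    """One pre-multiplied color channel of a compositing operation, values in [0, 1]."""
    if op in REVERSED:                        # reverse operators swap the operands
        color_a, color_b = color_b, color_a
        opacity_a, opacity_b = opacity_b, opacity_a

    if op in ("over", "rover", "loado", "plus"):
        x = 1.0
    elif op in ("in", "rin", "atop", "ratop"):
        x = opacity_b
    elif op in ("out", "rout", "xor"):
        x = 1.0 - opacity_b                   # not(opacityB)
    elif op in ("loadzero", "loadc", "loadco"):
        x = 0.0
    else:
        raise ValueError(f"unknown operator: {op}")

    if op in ("over", "rover", "atop", "ratop", "xor"):
        y = 1.0 - opacity_a                   # not(opacityA)
    elif op in ("plus", "loadc", "loadco"):
        y = 1.0
    else:                                     # in, rin, out, rout, loadzero, loado
        y = 0.0

    return color_a * x + color_b * y
```

With a half-opaque A over an opaque B, `composite_color(0.5, 0.25, 0.5, 1.0, "over")` gives 0.5 + 0.25 * 0.5 = 0.625, the usual pre-multiplied "over" result.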

[0430] Block 1765 in FIG. 134 clamps or wraps the output of block 1760. When block 1765 is configured to clamp, all values below the minimum allowed value are set to the minimum, and all values above the maximum allowed value are set to the maximum. When block 1765 is configured to wrap, it computes the following expression:

[0431] ((x - min) mod (max - min)) + min, where min and max are the minimum and maximum values allowed for a color. Preferably, the minimum and maximum values are 0 and 255. Block 1770 in FIG. 134 reverses the pre-multiplication of the result from block 1765: it multiplies the pre-multiplied color value by 255/o, where o is the opacity after compositing. The value of 255/o is obtained from a ROM in the compositing engine. The ROM values are stored in 8.8 format, with the fractional part rounded. The result of the multiplication is held in 16.8 format and is rounded to 8 bits to produce the un-pre-multiplied pixel.
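The clamp, the wrap expression, and the reversal of pre-multiplication described above can be sketched as follows. `wrap` implements the expression exactly as written, so with min = 0 and max = 255 a value of 255 wraps to 0. The ROM contents (255/o rounded to nearest in 8.8), the output rounding, and returning 0 for o = 0 are assumptions, and all names are hypothetical.

```python
def clamp(x: int, lo: int = 0, hi: int = 255) -> int:
    """Limit x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def wrap(x: int, lo: int = 0, hi: int = 255) -> int:
    """((x - min) mod (max - min)) + min, as written in the text."""
    return (x - lo) % (hi - lo) + lo


def recip_rom(o: int) -> int:
    """255 / o in 8.8 fixed point, rounded to nearest (assumed ROM contents)."""
    return (255 * 256 + o // 2) // o


def unpremultiply(c: int, o: int) -> int:
    """Multiply a pre-multiplied 8-bit color c by 255 / o and round to 8 bits."""
    if o == 0:
        return 0                       # assumption: fully transparent pixels map to 0
    product = c * recip_rom(o)         # 8.0 x 8.8 gives a 16.8 fixed-point value
    return (product + 128) >> 8        # round the 16.8 value to an integer
```

For example, `unpremultiply(64, 128)` multiplies 64 by the ROM entry 510 (255/128 = 1.992 in 8.8) and rounds to 128.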

[0432] The blend generation unit 1721 generates a blend of a specified length with specified start and end values. This is done in two stages: 1) ramp generation and 2) interpolation. In ramp generation, the compositing engine generates a sequence that increases linearly from 0 to 255 over the length of the instruction. Ramp generation has two modes: a "jump" mode for lengths of 255 or less, and a "step" mode for lengths greater than 255. The mode is determined by the upper 24 bits of the length. In jump mode, the ramp value increases by at least 1 per clock cycle; in step mode, it increases by at most 1 per clock cycle.

[0433] In jump mode, the compositing engine uses an 8.8-format ROM to obtain the step value 255/(length - 1). This value is added to a 16-bit accumulator, and the accumulator output is truncated to 8 bits to form the sequence. In step mode, the compositing engine uses an algorithm similar to Bresenham's line-drawing algorithm, shown below.

[0434]
void linedraw(length: INTEGER)
{
    d = 511 - length;
    incrE = 510;
    incrNE = 512 - 2 * length;
    ramp = 0;
    for (i = 0; i < length; i++) {
        if (d <= 0) then d += incrE;
        else { d += incrNE; ramp++; }
    }
}

The blend is then generated from the ramp using the following equation:
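Both ramp modes can be sketched as below. The jump-mode ROM value is assumed to be 255/(length - 1) rounded to nearest in 8.8, the accumulator output is taken as its integer byte, and in step mode each ramp value is assumed to be emitted before the Bresenham-style update. Function names are hypothetical.

```python
def jump_ramp(length: int) -> list[int]:
    """Jump mode (2 <= length <= 255): an 8.8 step added to a 16-bit accumulator."""
    if not 2 <= length <= 255:
        raise ValueError("jump mode expects 2 <= length <= 255")
    step = (255 * 256 + (length - 1) // 2) // (length - 1)   # 255/(length-1) in 8.8
    acc, out = 0, []
    for _ in range(length):
        out.append(acc >> 8)               # truncate accumulator to its integer byte
        acc = (acc + step) & 0xFFFF        # 16-bit accumulator
    return out


def step_ramp(length: int) -> list[int]:
    """Step mode (length > 255): the ramp rises by at most 1 per step."""
    d = 511 - length
    incr_e, incr_ne = 510, 512 - 2 * length
    ramp, out = 0, []
    for _ in range(length):
        out.append(ramp)                   # assumed: value emitted before the update
        if d <= 0:
            d += incr_e
        else:
            d += incr_ne
            ramp += 1
    return out
```

`jump_ramp(3)` gives [0, 127, 255]; because of truncation, longer jump ramps may stop at 254 rather than 255.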

[0435] Blend = ((end - start) × ramp / 255) + start

The division by 255 is truncated. This equation requires two adders and a block that "pre-multiplies" (end - start) by the ramp for each channel. Another image processing operation that the main data path unit 242 can perform is generalized color space conversion. Generalized color space conversion (GCSC) uses piecewise trilinear interpolation to determine output color values. Preferably, the conversion is from a three-dimensional input space to a one-dimensional or four-dimensional output space.
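The blend equation Blend = ((end - start) × ramp / 255) + start can be sketched for one channel as follows. "Truncated" is read here as rounding toward zero, which matters when end < start; that reading and the function name are assumptions.

```python
def blend(start: int, end: int, ramp: list[int]) -> list[int]:
    """One channel of Blend = ((end - start) * ramp / 255) + start, division truncated."""
    out = []
    for r in ramp:
        p = (end - start) * r                          # "pre-multiply" (end - start) by the ramp
        q = p // 255 if p >= 0 else -((-p) // 255)     # truncate toward zero (assumed)
        out.append(q + start)
    return out
```

A multi-channel blend would call this once per channel with the same ramp, which matches the per-channel "pre-multiply" block the text calls for.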

[0436] In some cases the accuracy of trilinear interpolation at the edges of the color gamut becomes a problem. The problem is most noticeable in printing devices that are sensitive near the gamut edges. To avoid it, GCSC can optionally be computed in an extended output color space and then scaled and clamped into the appropriate range using the following equation:
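Independent of that scaling step, the piecewise trilinear interpolation that GCSC relies on can be sketched as a lookup in a regular lattice. The n × n × n lattice spanning 0..255 on each input axis, its nested-list layout, and the function names are assumptions; a four-channel output would repeat the lookup with one lattice per output channel.

```python
def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def trilinear(cube, fx: float, fy: float, fz: float) -> float:
    """Interpolate inside one lattice cell; cube[i][j][k] is corner (x=i, y=j, z=k)."""
    c00 = lerp(cube[0][0][0], cube[1][0][0], fx)
    c10 = lerp(cube[0][1][0], cube[1][1][0], fx)
    c01 = lerp(cube[0][0][1], cube[1][0][1], fx)
    c11 = lerp(cube[0][1][1], cube[1][1][1], fx)
    c0 = lerp(c00, c10, fy)                 # z = 0 face
    c1 = lerp(c01, c11, fy)                 # z = 1 face
    return lerp(c0, c1, fz)


def gcsc_lookup(lut, n: int, r: int, g: int, b: int) -> float:
    """One output channel from an n x n x n lattice over 0..255 on each axis."""
    cells = []
    for v in (r, g, b):
        i, rem = divmod(v * (n - 1), 255)   # cell index and remainder, exact integers
        if i == n - 1:                      # v == 255: far edge of the last cell
            i, rem = n - 2, 255
        cells.append((i, rem / 255))
    (i, fx), (j, fy), (k, fz) = cells
    cube = [[[lut[i + di][j + dj][k + dk] for dk in (0, 1)]
             for dj in (0, 1)] for di in (0, 1)]
    return trilinear(cube, fx, fy, fz)
```

Because trilinear interpolation reproduces any function that is linear in each axis, a lattice filled from such a function makes a convenient sanity check.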

[0437] Other image processing operations that the preferred embodiment can perform are image transformation and convolution. In image transformation, the source image is scaled, rotated or skewed. In convolution, pixels of the source image are sampled with a convolution matrix to generate the destination image. Generating a scan line of the destination image requires the following steps.

[0438] 1) Inverse-transform the destination scan line, as shown in FIG. 135. This identifies the source image pixels required to generate that scan line. 2) Decompress the required portion of the source image. 3) Inverse-transform the horizontal and vertical subsampling distances and the starting x, y coordinates of the destination image into the source image.

[0439] 4) Send this information to the processing unit, which performs the necessary subsampling and interpolation to obtain the pixels of the output image. Subsampling, interpolation, writing of destination pixels and similar tasks are performed by the preferred embodiment, while calculations such as which portion of the source image is relevant and which subsampling frequency to use are performed by the host application.

[0440] FIG. 136 is a block diagram of the steps required to calculate a destination pixel value. FIG. 136 assumes that the required source image pixels are available. The final step in calculating a destination pixel is to sum all the bilinearly interpolated subsamples taken from the source image. FIG. 137 is a block diagram of the image transformation engine obtained by configuring the main data path unit 242 appropriately. The image transformation engine 1830 consists of an address generation unit 1831, a pre-multiplication unit 1832, an interpolation unit 1833, an accumulation unit 1834, and a logic unit 1835 that truncates, clamps and takes absolute values.

[0441] The address generation unit 1831 generates the source image x, y coordinates needed to form the result pixel. It also generates the addresses used to obtain index offsets from the input index table 1815 and pixels from the image 1810. Before generating the source image x, y coordinates, the address generation unit 1831 reads the kernel descriptor. There are two kernel descriptor formats, shown in FIG. 138. The kernel descriptor contains: 1) the starting coordinates in the source image (unsigned fixed point, 24.24 precision); position (0, 0) is the top-left corner of the image.

[0442] 2) The horizontal and vertical subsample deltas (two's complement, 24.24 precision). 3) A 3-bit bp field indicating the position of the binary point in the fixed-point matrix coefficients; FIG. 150 shows the definition and description of the bp field. 4) The accumulation matrix coefficients. These are 20-bit two's-complement values of "variable" point precision, the position of the binary point being defined implicitly by the bp field.

[0443] 5) An rl field indicating the number of words remaining in the kernel descriptor. This value equals the number of rows multiplied by (the number of columns - 1). In the short kernel descriptor, all parameters other than the integer part of the x starting coordinate take the following fixed values: fractional part of the x start ← 0, y start ← 0, horizontal delta ← 1.0, vertical delta ← 1.0. Once the address generation unit 1831 has been configured, it calculates the current coordinates. There are two ways of doing this, depending on the dimensions of the subsample matrix. If the subsample matrix is 1 × 1, the address generation unit 1831 adds the horizontal delta to the current coordinates until enough coordinates have been obtained.

[0444] If the subsample matrix is not 1 × 1, the address generation unit 1831 adds the horizontal delta to the current coordinates until one row of the matrix is complete. It then adds the vertical delta to the current coordinates to obtain the coordinates of the next row. To obtain the following coordinates, it subtracts the horizontal delta from the current coordinates until that row is complete, then adds the vertical delta again, and repeats this process. The diagram at the top of FIG. 150 shows how the matrix is accessed. With this arrangement the matrix is scanned in a zigzag manner, and because the current x, y coordinates are calculated this way, fewer registers are required. The accumulation matrix coefficients must be arranged in the same order in the kernel descriptor.
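The zigzag (serpentine) scan described above keeps a single running coordinate and only ever adds or subtracts a delta. The sketch below generates the visit order for a rows × cols subsample matrix. It assumes each delta is an (x, y) vector, which also covers rotated or skewed sampling, and it uses plain numbers instead of 24.24 fixed point; the function name is hypothetical.

```python
def kernel_coords(x0, y0, h_delta, v_delta, rows, cols):
    """Serpentine visit order of a rows x cols subsample matrix.

    h_delta and v_delta are (dx, dy) vectors (an assumption); only one running
    coordinate is kept, mirroring the register saving described in the text.
    """
    x, y = x0, y0
    coords = [(x, y)]
    direction = 1                          # +1: add horizontal delta, -1: subtract it
    for r in range(rows):
        for _ in range(cols - 1):          # walk along the current row
            x += direction * h_delta[0]
            y += direction * h_delta[1]
            coords.append((x, y))
        if r < rows - 1:                   # step to the next row
            x += v_delta[0]
            y += v_delta[1]
            coords.append((x, y))
            direction = -direction         # reverse: zigzag scan
    return coords
```

For a 2 × 3 matrix starting at (0, 0) with unit deltas, the visit order is (0,0), (1,0), (2,0), (2,1), (1,1), (0,1), which is also the order in which the coefficients would have to be stored.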

[0445] After generating the current coordinates, the address generation unit 1831 adds the y coordinate to the index table base address to obtain the address of the index table entry (if the source pixels are being interpolated, it must also obtain the next index table entry). The index table base address points to the index table entry for (y + 0). After reading the index offset from the index table, the address generation unit 1831 adds it to the x coordinate. This sum is used to fetch one pixel from the source image (two pixels if the source pixels are being interpolated). If the source pixels are being interpolated, the address generation unit 1831 also adds the x coordinate to the next index offset to obtain two more pixels.

[0446] A similar technique is used to obtain coordinates for the convolution operation. The only difference is where the matrix for the next output pixel starts. In convolution, the starting coordinate of the matrix for the next output pixel is one horizontal delta away from the starting coordinate of the matrix for the previous pixel. In image transformation, the starting coordinate of the matrix for the next pixel is one horizontal delta away from the coordinate of the top-right pixel of the matrix for the previous output pixel.

[0447] The middle diagram in FIG. 139 shows this difference. The pre-multiplication unit 1832 multiplies the pixel's color channels by its opacity channel if necessary. The interpolation unit 1833 interpolates the source pixels to determine the true color of the required pixel. It takes two pixels from the source image memory, interpolates between them using the fractional part of the current x coordinate, and stores the result in a register. It then takes two pixels from the next row of the source image memory and interpolates them in the same way using the fractional part of x. Finally, the interpolation unit 1833 interpolates between this value and the previous interpolated value using the fractional part of the current y coordinate.
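The interpolation unit's sequence (two interpolations in x, then one in y) is ordinary bilinear interpolation. The sketch below uses floating-point fractions for readability; the hardware works on fixed-point fractional parts of the coordinates, and the function names are hypothetical.

```python
def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def bilinear(p00, p10, p01, p11, fx: float, fy: float) -> float:
    """p00, p10: adjacent pixels on row y; p01, p11: the same columns on row y + 1."""
    top = lerp(p00, p10, fx)       # first pair, fractional part of x
    bottom = lerp(p01, p11, fx)    # next row, same fractional part of x
    return lerp(top, bottom, fy)   # combine the two using the fractional part of y
```

For pixels 10, 20 (row y) and 30, 40 (row y + 1) sampled at fx = fy = 0.5, the row results are 15 and 35 and the final value is 25.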

[0448] The accumulation unit 1834 performs two tasks: 1) it multiplies each pixel by its matrix coefficient, and 2) it outputs to the next stage the sum of these products accumulated over the whole matrix. Depending on the channel, the accumulation unit 1834 is initialized to 0 or to a specific value.

[0449] Block 1835 truncates the output of the accumulation unit 1834 and, if necessary, limits values that have underflowed or overflowed to the minimum or maximum value. If necessary, it can also take the absolute value of the output. The position of the binary point in the accumulator output is specified by the bp field of the kernel descriptor; the bp field indicates the number of bits to discard from the accumulated result. This is shown in the bottom diagram of FIG. 139. The accumulated value is treated as a signed two's-complement number.
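The accumulation and output stages can be sketched as follows. The sketch treats bp directly as the number of fractional bits to discard, ignores the 20-bit coefficient range, and uses Python's arithmetic right shift to keep the two's-complement sign; the FIG. 150 encoding of bp is not reproduced, and the function names are hypothetical.

```python
def accumulate(coeffs, pixels, init: int = 0) -> int:
    """Sum of coefficient * pixel over the whole matrix, starting from init."""
    acc = init
    for c, p in zip(coeffs, pixels):
        acc += c * p
    return acc


def ac_output(acc: int, bp: int, take_abs: bool = False,
              lo: int = 0, hi: int = 255) -> int:
    """Discard bp fractional bits, optionally take |value|, then clamp."""
    v = acc >> bp                  # arithmetic shift: floor division by 2**bp, sign kept
    if take_abs:
        v = abs(v)
    return max(lo, min(hi, v))
```

With bp = 8, a coefficient of 64 stands for 0.25, so four such coefficients average four pixels: `ac_output(accumulate([64] * 4, [100, 100, 200, 200]), 8)` returns 150.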

[0450] Another image processing operation that the main data path unit 242 can perform is matrix multiplication. Matrix multiplication is used for color space conversion when the two color spaces are related by an affine transformation. This is what distinguishes it from generalized color space conversion, which is based on trilinear interpolation. The result of the matrix multiplication is defined by the following equation:

[0451]

(Equation 7)

[0452] Here, ri is the result pixel and ai is the A operand pixel. The matrix must have 5 columns and 4 rows. FIG. 140 is a block diagram of the multiply-adder that performs matrix multiplication in the main data path unit 242. It comprises multipliers that multiply the pixel channels by the matrix coefficients, adders that sum the results, and logic that clamps the output value and takes its absolute value as necessary.

[0453] Matrix multiplication takes two clock cycles to complete. In each cycle the multiplexers are set so that the correct data is selected for the multipliers and adders. In cycle 0, the two least significant bytes of the pixel are selected by the multiplexers 1851 and 1852 and multiplied by the matrix coefficients in the two left-hand columns of the matrix, that is, line 0 of the cache.

[0454] In cycle 1, the two most significant bytes of the pixel are selected by the top multiplexers and multiplied by the two right-hand columns of the matrix. The result of the multiplication is added to the result of the previous cycle (1854), and the sum in the adder is truncated to 8 bits (1855). The "operand logic" 1856 rearranges the multiplier outputs to form the four inputs of the adder 1854. This rearrangement allows the multiplier results to be added so that the correct product of the 24-bit coefficients and the 8-bit pixel components is output.
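The affine conversion and its two-cycle split can be sketched as below. The layout is an assumption, since Equation 7 is not reproduced in this text: columns 0 to 3 of each row multiply channels a[0] to a[3], with a[0] and a[1] taken as the two low bytes, and column 4 is taken as a constant term added in the second cycle. Fixed-point scaling is left out, and the function names are hypothetical.

```python
def affine_color(m, a):
    """r_i = m[i][0]*a[0] + ... + m[i][3]*a[3] + m[i][4] for a 4-row, 5-column matrix."""
    if len(m) != 4 or any(len(row) != 5 for row in m):
        raise ValueError("matrix must have 4 rows and 5 columns")
    return [sum(row[j] * a[j] for j in range(4)) + row[4] for row in m]


def affine_color_two_cycles(m, a):
    """Same result, split the way the two clock cycles share the work."""
    out = []
    for row in m:
        acc = row[0] * a[0] + row[1] * a[1]     # cycle 0: two low bytes, two left columns
        acc += row[2] * a[2] + row[3] * a[3]    # cycle 1: two high bytes, two right columns
        acc += row[4]                           # constant column (placement assumed)
        out.append(acc)
    return out
```

Splitting the sum across two cycles lets the same two-multiplier hardware produce a full 4 × 5 result per pixel in two clocks.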

[0455] The "AC logic" 1855 truncates the 12 least significant bits of the adder output and, depending on its configuration, takes the absolute value of the truncated result. It then clamps or wraps the result, again depending on its configuration. When the AC logic is configured to clamp, all values below 0 are set to 0 and all values above 255 are set to 255. When it is configured to wrap, the lower 8 bits of the integer part are output.
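The AC logic stage can be sketched as follows. It assumes that the 12 discarded bits are fractional bits of a signed accumulator and that the absolute value is taken before clamping or wrapping, as the order of the description suggests; the function name and mode strings are hypothetical.

```python
def ac_logic(acc: int, mode: str = "clamp", take_abs: bool = False) -> int:
    """Truncate 12 fractional bits, optionally take |value|, then clamp or wrap."""
    v = acc >> 12                  # drop the 12 least significant bits (sign kept)
    if take_abs:
        v = abs(v)
    if mode == "clamp":
        return max(0, min(255, v))
    if mode == "wrap":
        return v & 0xFF            # lower 8 bits of the integer part
    raise ValueError(f"unknown mode: {mode}")
```

An accumulator holding 300.0 (that is, 300 << 12) clamps to 255 but wraps to 300 - 256 = 44.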

[0456] The main data path unit 242 can also be configured to perform image processing operations other than those described above. The following describes a computer architecture that reduces cost through design reuse and can perform various image processing operations quickly. Because the architecture is flexible, an external programming agent familiar with it can configure the computer to perform image processing operations that were not originally anticipated. In addition, because the core of the design consists mainly of a few multi-function blocks, the design effort is significantly reduced.

[0457] 3.18.6 Data Cache Control Unit and Cache

The data cache control unit 240 contains the 4-kilobyte read data cache 230 of the coprocessor 224. The data cache 230 is organized as a direct-mapped RAM cache: each line of external memory maps directly to one fixed line of the same length in the cache memory 230 (FIG. 2). Such a line in the cache memory is usually called a cache line, and the cache memory consists of many such cache lines.

[0458] The data cache control unit 240 services data requests from the two operand organizers 247 and 248. It first checks whether the data is present in the cache 230; if it is not, the data is fetched from external memory. The data cache control unit 240 has a programmable address generator that allows it to operate in several different addressing modes. There are also special addressing modes in which the addresses of the requested data are generated by the data cache control unit 240 itself. In these modes, up to eight words (256 bits) of data can be sent to the operand organizers 247 and 248 at the same time.

[0459] The cache RAM consists of eight independently addressable memory banks, each addressed by its own line address. This is required by some special addressing modes in which the data from the eight banks is assembled into a 256-bit unit. The arrangement can service up to eight 32-bit requests simultaneously, provided that they come from different banks.
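Whether a set of 32-bit requests can be serviced in the same cycle depends only on their bank fields. The sketch below takes the bank from bits 2 to 4 of the address, following the tag/line/bank/byte address layout given for FIG. 143, and treats any two requests to the same bank as a conflict; that simplification and the function names are assumptions.

```python
def bank_of(addr: int) -> int:
    """3-bit bank field sitting just above the 2-bit byte offset."""
    return (addr >> 2) & 0x7


def can_serve_together(addrs) -> bool:
    """True if at most eight requests fall into pairwise different banks."""
    banks = [bank_of(a) for a in addrs]
    return len(addrs) <= 8 and len(set(banks)) == len(banks)
```

Eight consecutive words, such as addresses 0, 4, ..., 28, touch every bank once and can be served together; addresses 0 and 32 both fall in bank 0 and cannot.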

[0460] The cache operates in the following modes, which are described in detail below. If necessary, the entire cache can also be filled automatically.
1. Normal mode
2. Single-output general color space conversion mode
3. Multiple-output general color space conversion mode
4. JPEG encoding mode
5. Slow JPEG decoding mode
6. Matrix multiplication mode
7. Disable mode
8. Invalidation mode
FIG. 141 shows the address, data and control flow of the data cache control unit 240 of FIG. 2 and the data cache 230.

[0461] The data cache 230 is the direct-mapped cache described above. The data cache control unit 240 includes a tag memory 1872 with one tag entry per cache line; each tag entry holds the most significant part of the external memory address to which the cache line is currently mapped. It also includes a line valid status memory 1873 indicating whether each cache line is currently valid. All cache lines are initially invalid.

[0462] The data cache control unit 240 can service data requests from the operand organizers 247 and 248 (FIG. 2) simultaneously through the operand bus interfaces. In operation, either or both of the operand organizers 247 and 248 (FIG. 2) provide an index 1874 and assert a data request signal 1876. The address generator 1881 generates one or more complete external addresses 1877 from the index 1874. The cache control unit 1878 determines whether the requested data is in the cache 230 by comparing the tag part of the generated address 1877 with the tag memory 1872, and by checking the line valid status memory 1873 to see whether the associated cache line is valid. If the requested data is present in the cache memory 230, an acknowledge signal 1879 is sent, together with the requested data 1880, to the relevant operand organizer 247 or 248. If the requested data is not present in the cache memory 230, the requested data 1870 is fetched from external memory through the input bus interface 1871 and the input interface switch 252 (FIG. 2). The fetch is performed by asserting a request signal 1882 and supplying the generated address 1877. The acknowledge signal 1883 and the requested data 1870 are then sent to the cache control unit 1878 and the cache memory 230 respectively, and the associated cache line in the cache memory 230 is updated with the new data 1870. The tag address of the new cache line is written to the tag memory 1872, and the line valid status 1873 of the new cache line is set. The acknowledge signal 1879 is then sent, together with the data 1870, to the relevant operand organizer 247 or 248 (FIG. 2).
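The hit/miss handling above (tag compare, valid bit, line fill on a miss) can be sketched as a small direct-mapped cache model. It uses the 128 lines of 32 bytes described for this cache and the tag/line/offset split of the 32-bit address; the class name, the byte-level read interface and the hit/miss counters are hypothetical, and banking and the two operand ports are not modelled.

```python
LINE_BYTES = 32
NUM_LINES = 128


class DirectMappedCache:
    """Read-only direct-mapped cache: 128 lines x 32 bytes = 4 KB."""

    def __init__(self, memory: bytes):
        self.memory = memory                       # stands in for external memory
        self.tags = [0] * NUM_LINES                # tag memory
        self.valid = [False] * NUM_LINES           # all lines start invalid
        self.data = [bytes(LINE_BYTES)] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        tag = addr >> 12                           # upper 20 bits
        line = (addr >> 5) & 0x7F                  # 7-bit line address
        offset = addr & 0x1F                       # bank and byte fields together
        if self.valid[line] and self.tags[line] == tag:
            self.hits += 1
        else:                                      # miss: fetch the whole line
            self.misses += 1
            base = addr & ~0x1F
            self.data[line] = self.memory[base:base + LINE_BYTES]
            self.tags[line] = tag                  # update tag memory
            self.valid[line] = True                # mark the line valid
        return self.data[line][offset]
```

Two addresses 4 KB apart, such as 5 and 4101, share line 0, so alternating between them misses every time; this is the usual cost of direct mapping.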

【０４６３】図１４２において、データキャッシュ２３
０のメモリ構成を示す。データキャッシュ２３０は、キ
ャッシュライン長が３２である１２８個のキャッシュラ
インＣ０，．．．，Ｃ１２７をもつダイレクトマップキ
ャッシュとして整理される。キャッシュＲＡＭは別々の
アドレス指定のできるメモリバンクＢ０，．．．，Ｂ７
を具備しており、各メモリバンクは３２ビットのバンク
ライン１２８個のを持ち、各キャッシュラインＣｉは８
つのメモリバンクＢ０，．．．Ｂ７において相当する８
つのバンクラインＢ０ｉ，．．．，Ｂ７ｉを有する。FIG. 142 shows the memory configuration of the data cache 230. The data cache 230 is organized as a direct-mapped cache with 128 cache lines C0, ..., C127, each 32 bytes long. The cache RAM comprises eight separately addressable memory banks B0, ..., B7, each having 128 32-bit bank lines, and each cache line Ci consists of the eight corresponding bank lines B0i, ..., B7i in the eight memory banks B0, ..., B7.

【０４６４】生成された外部メモリアドレスの構成を図
１４３に示す。生成されたアドレスは２０ビットタグア
ドレス、７ビットラインアドレス、３ビットバンクアド
レス、２ビットバイトアドレスからなる３２ビットのワ
ードである。２０ビットタグアドレスはタグアドレスと
タグメモリ１８７２に記憶されているタグと比較するの
に使われる。７ビットラインアドレスはキャッシュメモ
リ１８７０にある関連するキャッシュラインのアドレス
に使われる。３ビットバンクアドレスはキャッシュメモ
リ１８７０のメ関連するモリバンクのアドレスに使われ
る。２ビットバイトアドレスは３２ビットバンクライン
の関連するバイトのアドレスに使われる。FIG. 143 shows the format of a generated external memory address. The generated address is a 32-bit word consisting of a 20-bit tag address, a 7-bit line address, a 3-bit bank address and a 2-bit byte address. The 20-bit tag address is compared with the tag stored in the tag memory 1872. The 7-bit line address addresses the associated cache line in the cache memory 230. The 3-bit bank address addresses the associated memory bank of the cache memory 230. The 2-bit byte address addresses the associated byte of the 32-bit bank line.
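As a minimal sketch of the address format of FIG. 143, the Python below splits a 32-bit external memory address into its 20-bit tag, 7-bit line, 3-bit bank and 2-bit byte fields. It assumes the fields are packed from the most significant bit downward in the order the text lists them. FIG. 143 is not reproduced here, so that bit order, along with the names `CacheAddress` and `split_address`, is an illustrative assumption.

```python
from typing import NamedTuple


class CacheAddress(NamedTuple):
    tag: int   # bits 31..12 (20 bits), compared against tag memory 1872
    line: int  # bits 11..5  (7 bits), selects one of 128 cache lines
    bank: int  # bits 4..2   (3 bits), selects one of 8 memory banks
    byte: int  # bits 1..0   (2 bits), selects a byte in the 32-bit bank line


def split_address(addr: int) -> CacheAddress:
    """Split a 32-bit external memory address into cache fields (assumed bit order)."""
    if not 0 <= addr < 1 << 32:
        raise ValueError("address must fit in 32 bits")
    return CacheAddress(
        tag=addr >> 12,
        line=(addr >> 5) & 0x7F,
        bank=(addr >> 2) & 0x7,
        byte=addr & 0x3,
    )
```

For example, `split_address(0x12345678)` yields tag 0x12345, line 51, bank 6 and byte 0. Together, the line and bank fields address the 128 × 8 words held by the eight banks.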

【０４６５】図１４４は、データキャッシュ制御部２４
０とデータキャッシュ２３０の構造のブロック図を示
す。ここで、１２８×２５６ビットＲＡＭはキャッシュ
メモリ２３０を構成し、これは８つの１２８×３２ビッ
トの分離住所付けが可能なメモリバンクからなる。この
ＲＡＭは書き込み可能ポート（ｗｒｉｔｅ）、書き込み
アドレスポート（ｗｒｉｔｅ＿ａｄｄｒ）、書き込みデ
ータポート（ｗｒｉｔｅ＿ｄａｔａ）を持つ。また、読
み可能ポート（ｒｅａｄ）、８つの読みアドレスポート
（ｒｅａｄ＿ａｄｄｒ）、８つの読みデータ出力ポート
（ｒｅａｄ＿ｄａｔａ）を持つ。キャッシュメモリ２３
０の全てのメモリバンクへの同時書き込みを可能にさせ
るためキャッシュ制御ブロック１８７８から書き込み可
能信号が生成される。必要によって、データキャッシュ
２３０は書き込みデータポート（ｗｒｉｔｅ＿ｄａｔ
ａ）を通じて外部メモリからの１もしくはそれ以上のラ
インのデータに更新される。書き込みアドレスポート
（ｗｒｉｔｅ＿ａｄｄｒ）にラインアドレスを提供し、
８：１多重化器ＭＵＸを利用することによって１ライン
のデータが書き込まれる。８：１多重化器ＭＵＸはデー
タキャッシュ制御部（ａｄｄｒ＿ｓｅｌｅｃｔ）の制御
の下で生成された外部アドレスからラインアドレスを選
択する。キャッシュメモリ２３０の全てのメモリバンク
への同時読み込みを可能にさせるため、キャッシュ制御
ブロック１８７８から読み可能信号が生成される。この
方法で、キャッシュメモリ２３０のメモリバンクの８つ
の書きアドレスポート（ｒｅａｄ＿ａｄｄｒ）に提供さ
れる各々のラインアドレスに応じて、８つの読みデータ
ポート（ｒｅａｄ＿ｄａｔａ）から８つの異なるバンク
ラインのデータを同時に読み込むことができる。FIG. 144 is a block diagram showing the structure of the data cache control unit 240 and the data cache 230. Here, a 128 × 256-bit RAM constitutes the cache memory 230. It comprises eight separately addressable 128 × 32-bit memory banks. This RAM has a write enable port (write), a write address port (write_addr) and a write data port (write_data). It also has a read enable port (read), eight read address ports (read_addr) and eight read data output ports (read_data). The cache control block 1878 generates a write enable signal to allow simultaneous writing to all memory banks of the cache memory 230. As required, the data cache 230 is updated with one or more lines of data from external memory through the write data port (write_data). One line of data is written by providing a line address to the write address port (write_addr) through the 8:1 multiplexer MUX, which selects that line address from the generated external memory addresses under the control of the data cache control unit (addr_select). The cache control block 1878 generates a read enable signal to allow simultaneous reading from all memory banks of the cache memory 230. In this way, data from eight different bank lines can be read simultaneously from the eight read data ports (read_data), according to the respective line addresses supplied to the eight read address ports (read_addr) of the memory banks of the cache memory 230.
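The bank organization of FIG. 144 can be modelled as eight independent 128-entry arrays of 32-bit words. A line fill writes the same line address in all eight banks. A read supplies a separate line address to each bank, so eight different bank lines are returned at once. This is a behavioural sketch only. `BankedCacheRAM` and its methods are hypothetical names, and the single shared write address stands in for the `write_addr` port fed by the 8:1 multiplexer.

```python
from __future__ import annotations


class BankedCacheRAM:
    """128 x 256-bit cache RAM modelled as eight 128 x 32-bit banks (hypothetical model)."""

    NUM_BANKS = 8
    NUM_LINES = 128

    def __init__(self) -> None:
        # banks[b][line] holds the 32-bit word of bank b at that line address.
        self.banks = [[0] * self.NUM_LINES for _ in range(self.NUM_BANKS)]

    def write_line(self, line: int, words: list[int]) -> None:
        """Write a whole cache line: the same line address in all eight banks at once."""
        if len(words) != self.NUM_BANKS:
            raise ValueError("a cache line is eight 32-bit words")
        for bank, word in enumerate(words):
            self.banks[bank][line] = word & 0xFFFFFFFF

    def read(self, line_addrs: list[int]) -> list[int]:
        """Read one word from every bank, each bank using its own read address."""
        if len(line_addrs) != self.NUM_BANKS:
            raise ValueError("one read address is needed per bank")
        return [self.banks[bank][line] for bank, line in enumerate(line_addrs)]
```

Reading with alternating line addresses, for example `[3, 5, 3, 5, 3, 5, 3, 5]`, returns words from two different cache lines in a single access. This is the property that the per-bank address generators exploit.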

【０４６６】各々のキャッシュメモリ２３０のバンクは
プログラム可能アドレス生成器１８８１を持っている。
これは違う８つの位置への、関連する８つのメモリバン
クからの同時アクセスを可能にする。各々のアドレス生
成器１８８１はアドレス生成器１８８１の作動モード設
定のためのｄｃｃモード入力、インデックスパケット入
力、ベースアドレス入力、アドレス出力を持つ。プログ
ラム可能アドレス生成器１８８１の作動モードは、（ａ）ｄｃｃモード入力への信号が各々のアドレス生成
器１８８１をランダムアクセスモードにし、外部メモリ
アドレスがインデックスパケット入力へ提供され、一つ
もしくはそれ以上のアドレス生成器１８８１のアドレス
出力に出力されるランダムアクセスモード；（ｂ）ｄｃｃモード入力への信号が各々のアドレス生成
器１８８１を適切なモードにするＪＰＥＧエンコーディ
ングと復号、色空間変換、行列乗算モード。このモード
では、各々のアドレス生成器１８８１にはインデックス
パケット入力へのインデックスが入力され、インデック
スアドレスを生成する。作動モードによって、アドレス
生成部は最大８つの異なる外部メモリアドレスを生成さ
せることができる。Each bank of the cache memory 230 has a programmable address generator 1881, which allows the eight associated memory banks to be accessed simultaneously at eight different locations. Each address generator 1881 has a dcc mode input for setting its operating mode, an index packet input, a base address input and an address output. The programmable address generators 1881 have the following operating modes. (a) Random access mode: a signal on the dcc mode input places each address generator 1881 in random access mode, and an external memory address supplied to the index packet input is output at the address output of one or more of the address generators 1881. (b) JPEG encoding and decoding, color space conversion and matrix multiplication modes: a signal on the dcc mode input places each address generator 1881 in the appropriate mode. In these modes, an index is supplied to the index packet input of each address generator 1881, which generates an indexed address. Depending on the operating mode, the address generators can generate up to eight different external memory addresses.

【０４６７】８つのアドレス生成部１８８１は８つの異
なる論理回路からなっており、各々は入力としてベース
アドレス、出力として外部メモリアドレスを持つｄｃｃ
モードとインデックスからなる。ベースアドレスレジス
タ１８８５はインデックスパケットの組合せである現在
のベースアドレスを記憶し、ｄｃｃモードレジスタ１８
８８はデータキャッシュ制御部２４０の現在の作動モー
ド（ｄｃｃモード）を記憶する。The eight address generators 1881 consist of eight different logic circuits. Each has a dcc mode input, an index input and a base address input, and an external memory address output. The base address register 1885 stores the current base address, which is combined with the index packets, and the dcc mode register 1888 stores the current operating mode (dcc mode) of the data cache control unit 240.

【０４６８】タグメモリ１８７２は１ブロック、１２８
×２０ビットのマルチポートＲＡＭで構成される。この
ＲＡＭは１つの書きポート（ｕｐｄａｔｅ−ｌｉｎｅ−
ａｄｄｒ）、１つの書き可能ポート（ｗｒｉｔｅ）、８
つの読みポート（ｔａｇ０＿ｄａｔａ，．．．，ｔａｇ
７＿ｄａｔａ）を持っている。これは、８つのアドレス
生成器１８８１が現在記憶されている、１つもしくはそ
れ以上に生成されたメモリアドレスの、ラインのタグア
ドレスを決定することによりポート（ｒｅａｄ０ｌｉｎ
ｅ−ａｄｄｒ，．．．，ｒｅａｄ７ｌｉｎｅ−ａｄｄ
ｒ）において８つの同時のルックアップを可能にする。
これらラインの現在のタグアドレスはポート（ｔａｇ０
−ｄａｔａ，．．．，ｔａｇ７−ｄａｔａ）からタグ比
較部１８８６に出力される。ポート（ｕｐｄａｔｅ−ｌ
ｉｎｅ−ａｄｄｒ）のタグメモリ１８７２への書き込み
を可能にするため、必要によって、キャッシュ制御ブロ
ック１８７２によりタグ書き信号は生成される。The tag memory 1872 consists of one block of 128 × 20-bit multiport RAM. This RAM has one write address port (update-line-addr), one write enable port (write) and eight read ports (tag0_data, ..., tag7_data). It permits eight simultaneous lookups at the ports (read0line-addr, ..., read7line-addr), so that the tag addresses currently stored for the lines of one or more memory addresses generated by the eight address generators 1881 can be determined. The current tag addresses of these lines are output from the ports (tag0-data, ..., tag7-data) to the tag comparator 1886. When necessary, the cache control block 1878 generates a tag write signal to allow writing to the tag memory 1872 at the port (update-line-addr).

【０４６９】１２８ビットのラインｖａｌｉｄメモリ１
８７３は、キャッシュメモリ２３０の各キャッシュライ
ンのｖａｌｉｄ状態を保っている。これは１つの書きポ
ート（ｕｐｄａｔｅ−ｌｉｎｅ−ａｄｄｒ）、１つの書
き可能ポート（ｕｐｄａｔｅ）、８つの読み込みポート
（ｒｅａｄ０ｌｉｎｅ−ａｄｄｒ，．．．，ｒｅａｄ７
ｌｉｎｅ−ａｄｄｒ）、８つの読み可能ポート（ｌｉｎ
ｅｖａｌｉｄ０，．．．，ｌｉｎｅｖａｌｉｄ７）から
なる１２８×１ビットのメモリである。タグメモリと同
じように、これは８つのアドレス生成部１８８１に、１
つ若しくはそれ以上に生成されたメモリアドレスの個々
のラインアドレスに対して、現在のラインにセーブされ
ているラインｖａｌｉｄ状態を決定させることにより、
ポート（ｒｅａｄ０ｌｉｎｅ−ａｄｄｒ，．．．，ｒｅ
ａｄ７ｌｉｎｅ−ａｄｄｒ）に対しての８つの同時ルッ
クアップを可能にする。このラインの現ラインｖａｌｉ
ｄｅビットはポート（ｌｉｎｅｖａｌｉｄ０，．．．，
ｌｉｎｅｖａｌｉｄ７）からタグ比較部１８８６に出力
される。必要によっては、ラインｖａｌｉｄ状態メモリ
１８７３の書きポートに、ポート（ｕｐｄａｔｅ−ｌｉ
ｎｅ−ａｄｄｒ）からラインｖａｌｉｄ状態メモリ１８
７３への書き込みを可能にするための書き信号がキャッ
シュ制御ブロック１８７８から生成する。A 128-bit line valid memory 1873 holds the valid state of each cache line of the cache memory 230. It is a 128 × 1-bit memory with one write address port (update-line-addr), one write enable port (update), eight read address ports (read0line-addr, ..., read7line-addr) and eight read data ports (linevalid0, ..., linevalid7). Like the tag memory, it permits eight simultaneous lookups at the ports (read0line-addr, ..., read7line-addr), so that the line valid state currently saved for the individual line addresses of one or more memory addresses generated by the eight address generators 1881 can be determined. The current line valid bits of these lines are output from the ports (linevalid0, ..., linevalid7) to the tag comparator 1886. When necessary, the cache control block 1878 generates a write signal on the write port of the line valid state memory 1873 to allow writing to that memory at the port (update-line-addr).

【０４７０】タグ比較部１８８６は８つのタグ比較器か
らなっており、現在生成された外部アドレスのラインア
ドレスによってアクセスされるラインのタグメモリ１８
７２に現在セーブされているタグアドレスを受け取るた
めのｔａｇ＿ｄａｔａ入力、現在生成された外部メモリ
アドレスのタグアドレス受け取るためのｔａｇ＿ａｄｄ
ｒ入力、比較されるタグアドレス部を設定するための現
動作モード信号（ｄｃｃ＿ｍｏｄｅ）を受け取るための
ｄｃｃ＿ｉｎｐｕｔ、現在生成された外部アドレスのラ
インアドレスによってアクセスされるラインにあるライ
ンｖａｌｉｄ状態メモリ１８７３に現在セーブされてい
るラインｖａｌｉｄ状態を受け取るためのｌｉｎｅ＿ｖ
ａｌｉｄ入力を持っている。比較部１８８６は８つのア
ドレス生成部１８８１それぞれに対して８つのｈｉｔ出
力を持つ。生成された外部メモリアドレスのタグアドレ
スと、生成された外部メモリのラインアドレスによって
アクセスされる位置にあるタグメモリ１８７２の内容と
が一致する時、ｈｉｔ信号とそのラインへのラインｖａ
ｌｉｄ状態ビット１８７３が出力される。この実施例で
は、外部メモリにセーブされているデータ構造は小さく
なり、タグアドレスの最上位ビットが全て同じである。
従って、タグアドレスの変化する最下位ビットだけを比
較すれば良い。これはタグ比較部１８６６がタグアドレ
スの変化する最下位ビットを比較するよう現作動モード
信号（ｄｃｃ＿ｍｏｄｅ）を設定することで可能にな
る。The tag comparator 1886 consists of eight tag comparators. Each comparator has four inputs:
- a tag_data input, which receives the tag address currently saved in the tag memory 1872 for the line accessed by the line address of the currently generated external address;
- a tag_addr input, which receives the tag address of the currently generated external memory address;
- a dcc_input, which receives the current operating mode signal (dcc_mode) that sets which part of the tag address is compared;
- a line_valid input, which receives the line valid state currently saved in the line valid state memory 1873 for the line accessed by the line address of the currently generated external address.

The comparator 1886 has eight hit outputs, one for each of the eight address generators 1881. When the tag address of a generated external memory address matches the contents of the tag memory 1872 at the location accessed by the line address of that generated address, a hit signal is output together with the line valid state bit 1873 for that line. In this embodiment, the data structures stored in external memory are small, so the most significant bits of the tag address are all the same. Therefore only the least significant bits of the tag address, which are the bits that change, need to be compared. This is done by setting the current operating mode signal (dcc_mode) so that the tag comparator 1886 compares only those least significant bits.
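A minimal sketch of one of the eight comparators in the tag comparator 1886 follows. The mode-dependent comparison is modelled as a mask over the low `compare_bits` bits of the tag. The mapping from each dcc_mode value to a bit count is not given in this section, so `compare_bits` is a hypothetical parameter. The sketch also folds the line valid bit into the hit result. This matches step 4) of the request sequence, where data counts as present only if the tags match and the line valid state is set.

```python
from __future__ import annotations


def tag_hit(stored_tag: int, line_valid: bool, addr_tag: int, compare_bits: int = 20) -> bool:
    """One comparator: hit when the valid bit is set and the low compare_bits tag bits match."""
    mask = (1 << compare_bits) - 1
    return line_valid and ((stored_tag ^ addr_tag) & mask) == 0


def compare_all(stored_tags: list[int], valid_bits: list[bool],
                addr_tags: list[int], compare_bits: int = 20) -> list[bool]:
    """Eight comparators in parallel: one hit output per address generator."""
    return [tag_hit(s, v, a, compare_bits)
            for s, v, a in zip(stored_tags, valid_bits, addr_tags)]
```

Narrowing the mask is safe only under the condition the text states: all data in use shares the same upper tag bits. Otherwise two addresses that differ only in the masked-off bits would produce a false hit.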

【０４７１】キャッシュ制御部１８７８はキャッシュメ
モリ２３０にあるデータへのアクセスが可能なとき、オ
ペランドＢ２４７、オペランドＣ２４８からの要求（ｐ
ｒｏｃ＿ｒｅｑ）と通知（ｐｒｏｃ＿ａｃｋ）を受け取
る。動作モードによっては、キャッシュメモリ２３０の
８つまでのバンクから異なるアドレスのデータが要求さ
れる。要求データがキャッシュメモリ２３０からアクセ
スできる時、タグ比較部１８８６からそのメモリのライ
ンにヒットを出す。出されたヒット信号（ｈｉｔ
０，．．．，ｈｉｔ７）に対して、キャッシュ制御部１
８７８はポート（ｃａｃｈｅ＿ｒｅａｄ）に読み込み可
能信号を生成し、ヒット信号が出されたキャッシュライ
ンへの読み込みを可能にする。ヒット信号（ｈｉｔ
０，．．．，ｈｉｔ７）ではなく要求（ｐｒｏｃ＿ｒｅ
ｑ）１８７６が出された時には、生成された要求（ｅｘ
ｔ＿ｒｅｑ）と供にデータのキャッシュラインの外部メ
モリアドレスが外部メモリに送られる。このキャッシュ
ラインは入力（ｅｘｔ＿ｄａｔａ）が可能な時、それを
通じてキャッシュメモリ２３０の８つのバンクに書き込
まれる。この場合、タグ情報もラインアドレスのタグメ
モリ１８８６に書き込まれ、そのラインのライン状態ビ
ット１８７３が出力される。When the data in the cache memory 230 can be accessed, the cache control unit 1878 handles the requests (proc_req) and acknowledgements (proc_ack) of the operand organizers B 247 and C 248. Depending on the operating mode, data at different addresses is requested from up to eight banks of the cache memory 230. When the requested data can be accessed from the cache memory 230, the tag comparator 1886 issues a hit for that line of the memory. For each hit signal issued (hit0, ..., hit7), the cache control unit 1878 generates a read enable signal at the port (cache_read), enabling reading from the cache line for which the hit was issued. When a request (proc_req) 1876 is issued without a hit signal (hit0, ..., hit7), the external memory address of the cache line containing the data is sent to external memory together with a generated request (ext_req). When this cache line becomes available at the input (ext_data), it is written through that input into the eight banks of the cache memory 230. In this case, the tag information is also written into the tag memory 1872 at the line address, and the line valid state bit 1873 for that line is set.

【０４７２】キャッシュメモリ２３０の８つのバンクか
らのデータは、データオーガナイザ１８９２にあるいく
つかの多重化器を通じて出力され、所定の方法で出力デ
ータパケット１８９４に位置付けられる。ある動作モー
ドでデータオーガナイザ１８９２は、現動作モード信号
（ｄｃｃ＿ｍｏｄｅ）と生成された外部メモリアドレス
のバイトアドレス（ｂｙｔｅ＿ａｄｄｒ）を用いる事に
よって、８つのメモリバンクから出力された８つの３２
ビットワードから８ビットワードを選択、出力すること
ができる。他のモードでデータオーガナイザ１８９２
は、８つのメモリバンクから出力された８つの３２ビッ
トワードを直接出力する。前述した通り、データオーガ
ナイザはこのデータを決められた方式に整列し出力す
る。Data from the eight banks of the cache memory 230 is output through several multiplexers in the data organizer 1892 and placed in a predetermined arrangement in the output data packets 1894. In one operating mode, the data organizer 1892 uses the current operating mode signal (dcc_mode) and the byte address (byte_addr) of the generated external memory address to select and output 8-bit words from the eight 32-bit words output by the eight memory banks. In other modes, the data organizer 1892 directly outputs the eight 32-bit words from the eight memory banks. As described above, the data organizer arranges this data in a predetermined format and outputs it.
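The byte-selecting mode of the data organizer 1892 can be sketched as picking one byte out of each 32-bit bank word, using that bank's 2-bit byte address. The byte numbering (byte 0 = bits 7 to 0) and the function names are assumptions. The packet layout of the output data packets 1894 is not modelled.

```python
from __future__ import annotations


def select_bytes(bank_words: list[int], byte_addrs: list[int]) -> list[int]:
    """Byte mode: take one byte from each bank word (byte 0 = bits 7..0, assumed)."""
    return [(word >> (8 * byte)) & 0xFF for word, byte in zip(bank_words, byte_addrs)]


def organize(bank_words: list[int], byte_addrs: list[int], byte_mode: bool) -> list[int]:
    """Select bytes in byte mode; otherwise pass the eight 32-bit words through."""
    if byte_mode:
        return select_bytes(bank_words, byte_addrs)
    return [word & 0xFFFFFFFF for word in bank_words]
```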

【０４７３】要求は次の段階で行われる。１）プロセッシングユニットはキャッシュ制御部１８７
８にあるプロセッシングユニットインターフェースにア
ドレスを送りパケットデータを要求する。２）８つのアドレス生成ユニット１８８１は動作モード
に従い、キャッシュメモリの各ブロックのアドレスを生
成する。A request proceeds in the following steps. 1) The processing unit sends an address to the processing unit interface in the cache control unit 1878 to request packet data. 2) The eight address generation units 1881 generate the addresses for each block of the cache memory according to the operating mode.

【０４７４】３）生成されたアドレスのタグ位置は３ポ
ートのタグメモリ１８８６の４ブロックにセーブされて
いるタグアドレスと比較され、８つの生成されたアドレ
スに相当するライン部によって位置づけられる。４）それらが一致し、そのラインのラインｖａｌｉｄ状
態１８７３が出されたら、要求されたデータはキャッシ
ュメモリ２３０に存在するとみなされる。3) The tag portion of each generated address is compared with the tag addresses saved in four blocks of the three-port tag memory 1872, at the locations given by the line portions of the eight generated addresses. 4) If they match and the line valid state 1873 for that line is set, the requested data is considered to be present in the cache memory 230.

【０４７５】５）存在しないデータは外部バス１８９０
を介してフェッチされ、キャッシュメモリ２３０の８つ
のブロックはその外部メモリからのデータラインの内容
に更新される。新しいデータのタグアドレスはタグメモ
リ１８８６に書き込まれ、そのラインのラインｖａｌｉ
ｄ状態１８７３が出される。６）全ての要求データがキャッシュメモリ２３０に存在
すれば、それは決められたパケット形式でプロセッシン
グユニットに現れる。5) Data that is not present is fetched via the external bus 1890, and the eight blocks of the cache memory 230 are updated with the contents of the data line from external memory. The tag address of the new data is written into the tag memory 1872, and the line valid state 1873 of that line is set. 6) Once all the requested data is present in the cache memory 230, it is presented to the processing unit in a predetermined packet format.
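Steps 1) to 6) can be sketched as a direct-mapped cache model with 128 lines of eight 32-bit words, using the 20/7/3/2-bit address split of FIG. 143 (bit order assumed). `fetch_line` is a hypothetical stand-in for the external bus 1890 and returns the eight words of a 32-byte-aligned line. For simplicity, the sketch serves each requested address in turn rather than modelling the eight parallel address generators.

```python
from __future__ import annotations

from typing import Callable


class DataCacheModel:
    """Direct-mapped cache: 128 lines x eight 32-bit words (behavioural sketch)."""

    LINES = 128

    def __init__(self, fetch_line: Callable[[int], list[int]]) -> None:
        self.fetch_line = fetch_line              # hypothetical external-bus read
        self.tags = [0] * self.LINES              # stands in for tag memory 1872
        self.valid = [False] * self.LINES         # stands in for line valid memory 1873
        self.data = [[0] * 8 for _ in range(self.LINES)]
        self.fills = 0                            # number of line fills performed

    def read_word(self, addr: int) -> int:
        tag, line, bank = addr >> 12, (addr >> 5) & 0x7F, (addr >> 2) & 0x7
        # Steps 3) and 4): present only if the tags match and the line is valid.
        if not (self.valid[line] and self.tags[line] == tag):
            # Step 5): fetch the whole 32-byte line, then update tag and valid bit.
            self.data[line] = list(self.fetch_line(addr & ~0x1F))
            self.tags[line] = tag
            self.valid[line] = True
            self.fills += 1
        return self.data[line][bank]

    def request(self, addrs: list[int]) -> list[int]:
        """Steps 2) to 6): serve every address, then return the words as one packet."""
        return [self.read_word(a) for a in addrs]
```

Requesting 0x100 and then 0x1100 evicts and refills line 8 each time. This is the usual conflict-miss behaviour of a direct-mapped cache.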

【０４７６】前述した通り、コプロセッサ２２４の全て
の部分（図２）は標準ＣＢｕｓインターフェース３０３
（図２０）を含めている。データキャッシュ制御部２４
０とキャッシュ２３０の標準ＣＢｕｓインターフェース
レジスタの詳細は、付録ＢのＢ４２からＢ４６までに記
載されている。このレジスタの設定はデータ制御部２４
０の作動を制御する。簡単のため、２つのレジスタ（ｂ
ａｓｅ＿ａｄｄｒｅｓｓとｂｃｃ＿ｍｏｄｅ）だけを図
１５３に示す。As described above, every part of the coprocessor 224 (FIG. 2) includes the standard CBus interface 303 (FIG. 20). Details of the standard CBus interface registers of the data cache control unit 240 and the cache 230 are given in Appendix B, B42 to B46. The settings of these registers control the operation of the data cache control unit 240. For simplicity, only two registers (base_address and bcc_mode) are shown in FIG. 153.

【０４７７】データキャッシュ制御部２４０とデータキ
ャッシュ２３０が有効ならば、データキャッシュ制御部
は最初全てのキャッシュラインを無効にして標準モード
で動作する。ある命令の終わりには、データキャッシュ
制御部２４０とキャッシュ２３０はいつも標準動作モー
ドに切り替わる。”Ｉｎｖａｌｉｄａｔｅ”モードを除
いた全てのモードには”Ａｕｔｏ−ｆｉｌｌａｎｄ
ｖａｌｉｄａｔｅ”と言うオプションがある。ｄｃｃ＿
ｃｆｇ２レジスタに１ビットをセットすることにより、
全てのキャッシュをｂａｓｅ＿ａｄｄｒｅｓｓレジスタ
にセーブされているアドレスから始めることができる。
この動作の間、オペランドオーガナイザＢ、Ｃ２４７，
２４８からのデータ要求は中止される。キャッシュはこ
の動作が終わった後に有効になる。ａ．標準キャッシュモードこのモードでは、２つのオペランドオーガナイザにより
要求データの外部メモリアドレスが提供される。アドレ
ス生成部１８８１が外部メモリアドレスを出力し、内部
タグメモリを用いてそれがメモリキャッシュ２３０に存
在するのかを確かめる。両方の要求データがキャッシュ
２３０に存在しない場合、入力インターフェーススイッ
チ２５２からデータが要求される。持続的かつ同時的要
求に構えてラウンド・ロビンスケジューリングが採用さ
れる。When the data cache control unit 240 and the data cache 230 are enabled, the data cache control unit first invalidates all cache lines and then operates in the standard mode. At the end of each instruction, the data cache control unit 240 and the cache 230 always return to the standard operating mode. All modes except the "Invalidate" mode have an "Auto-fill and validate" option. Setting one bit in the dcc_cfg2 register fills the whole cache, starting from the address saved in the base_address register. During this operation, data requests from the operand organizers B and C 247, 248 are suspended. The cache becomes valid after this operation has completed.
a. Standard cache mode: In this mode, the external memory addresses of the requested data are provided by the two operand organizers. The address generator 1881 outputs each external memory address and uses the internal tag memory to check whether that address is present in the cache memory 230. If neither requested data item is present in the cache 230, the data is requested from the input interface switch 252. Round-robin scheduling is used for persistent and simultaneous requests.

【０４７８】同時的な要求に対し、１つのデータアイテ
ムがキャッシュに存在すれば、それは要求したデータバ
スの後ろの３２ビットに位置するようになる。他のデー
タは入力インターフェーススイッチを通じて外部に要求
される。ｂ．シングル出力一般色空間変換モードこのモードでは、要求はオペランドオーガナイザ部Ｂか
ら１２ビットバイトのアドレス形式で出される。図６０
に示されている様に、要求データアイテムは８ビットカ
ラー出力値である。１２ビットアドレスはアドレス生成
部１８８１のｉｎｄｅｘ＿ｐａｃｋｅｔ入力に入力さ
れ、８つのアドレス生成部１８８１は図９６に示される
形式の３２ビット外部メモリアドレスを生成する。この
生成されたアドレスのバンク、ライン、バイトアドレス
は表１２と図６１によって決められる。外部メモリアド
レスは、８つの９ビットラインとバイトアドレスとして
解釈され、それはＲＡＭの８つのバンクのバイトを指す
ために使われる。キャッシュは補間のため主データパス
２４２によりオペランドオーガナイザ部に、図６０に示
された前述の原理で戻されたバンクの８バイト値を求め
るためにアクセスされる。全てのシングル出力一般カラ
ー値テーブルはキャッシュメモリ２３０に収まるため、
シングルカラー変換モードを適用する前にシングル出力
カラー値テーブルをキャッシュメモリ２３０にロードす
るのが望ましい。ｃ．マルチ出力一般色空間変換モードこのモードでは、１２ビットワードアドレスがオペラン
ドオーガナイザ部Ｂ２４７から受けられる。要求データ
アイテムは図６２を参照して前述した３２ビットカラー
出力値である。１２ビットアドレスはアドレス生成部１
８８１のｉｎｄｅｘ＿ｐａｃｋｅｔ入力に入力され、８
つのアドレス生成部１８８１は、図９６に示される形式
の８つの異なる３２ビット外部メモリアドレスを作る。
外部メモリアドレスのラインとタグアドレスは、表１２
と図６３によって決定される。外部メモリアドレスは、
図６３を参照して前述したように、７ビットラインアド
レスと２ビットタグアドレスに分けられる９ビットアド
レスを有する８個の９ビットアドレスとして解釈され
る。タグアドレスが発見されなかった場合、入力インタ
ーフェーススイッチ２５２（図２）から適切なデータが
ロードされるまでキャッシュは停止する。データが利用
可能な場合、出力データはオペランドオーガナイザ部に
出力される。ｄ．ＪＰＥＧ符号化モードこのモードでは、ＪＰＥＧ符号化モードに必要なテーブ
ルなどがキャッシュＲＡＭのバンクにセーブされる。テ
ーブルの記憶についてはＪＰＥＧ符号化モード（表１
４、１６）のところに述べられている。ｅ．低速ＪＰＥＧ復号モードこのモードでは、データは表１７に従って生成される。ｆ．行列乗算モードこのモードでは、キャッシュは２５６バイトラインのデ
ータにアクセスするために使われる。ｇ．Ｄｉｓａｂｌｅｄモードこのモードでは、全ての要求は入力インターフェースス
イッチ２５２にパスされる。ｈ．Ｉｎｖａｌｉｄａｔｅ（無効化）モードこのモードでは、ラインｖａｌｉｄ状態ビットをクリア
することにより、全てのキャッシュの内容が無効にされ
る。For simultaneous requests, if one of the data items is present in the cache, it is placed in the last 32 bits of the requested data bus, and the other data item is requested externally through the input interface switch.
b. Single-output general color space conversion mode: In this mode, requests are issued by the operand organizer B in the form of 12-bit byte addresses. As shown in FIG. 60, the requested data item is an 8-bit color output value. The 12-bit address is input to the index_packet inputs of the address generators 1881, and the eight address generators 1881 generate 32-bit external memory addresses in the format shown in FIG. 96. The bank, line and byte addresses of the generated addresses are determined by Table 12 and FIG. 61. The external memory addresses are interpreted as eight 9-bit line-and-byte addresses, which are used to address bytes in the eight banks of the RAM. The cache is accessed to obtain the eight byte values from the banks. These values are returned to the operand organizer for interpolation by the main data path 242, following the principle described above with reference to FIG. 60. Because every single-output general color value table fits in the cache memory 230, the single-output color value table is preferably loaded into the cache memory 230 before the single color conversion mode is applied.
c. Multi-output general color space conversion mode: In this mode, 12-bit word addresses are received from the operand organizer B 247. The requested data item is a 32-bit color output value, as described above with reference to FIG. 62. The 12-bit address is input to the index_packet inputs of the address generators 1881, and the eight address generators 1881 produce eight different 32-bit external memory addresses in the format shown in FIG. 96. The line and tag addresses of the external memory addresses are determined by Table 12 and FIG. 63. As described above with reference to FIG. 63, the external memory addresses are interpreted as eight 9-bit addresses, each divided into a 7-bit line address and a 2-bit tag address. If a tag address is not found, the cache stalls until the appropriate data has been loaded from the input interface switch 252 (FIG. 2). When the data is available, it is output to the operand organizer.
d. JPEG encoding mode: In this mode, the tables and other data needed for JPEG encoding are stored in the banks of the cache RAM. The storage of these tables is described in connection with the JPEG encoding mode (Tables 14 and 16).
e. Slow JPEG decoding mode: In this mode, data is generated according to Table 17.
f. Matrix multiplication mode: In this mode, the cache is used to access data in 256-byte lines.
g. Disabled mode: In this mode, all requests are passed to the input interface switch 252.
h. Invalidate mode: In this mode, the entire contents of the cache are invalidated by clearing the line valid state bits.
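In the multi-output mode, each 12-bit word address becomes a bank select plus a 9-bit per-bank address, which is split into a 7-bit line and a 2-bit tag. Each bank holds 128 words, so covering 512 words per bank requires exactly 2 tag bits. The sketch below assumes that the low three bits select the bank and that the top two of the nine remaining bits form the tag. Table 12 and FIG. 63 are not reproduced here, so that layout and the function names are hypothetical. `missing_lines` lists the lines the cache would have to load from the input interface switch 252 before it stops stalling.

```python
from __future__ import annotations


def split_word_index(index: int) -> tuple[int, int, int]:
    """Split a 12-bit word index into (bank, 7-bit line, 2-bit tag); layout assumed."""
    if not 0 <= index < 1 << 12:
        raise ValueError("index must fit in 12 bits")
    bank = index & 0x7        # assumed: low 3 bits pick one of the eight banks
    nine_bit = index >> 3     # the 9-bit per-bank address
    line = nine_bit & 0x7F    # 7-bit line address
    tag = nine_bit >> 7       # 2-bit tag address
    return bank, line, tag


def missing_lines(indices: list[int], tags: list[int],
                  valid: list[bool]) -> list[tuple[int, int]]:
    """(line, tag) pairs that must be loaded before the cache can stop stalling."""
    missing = set()
    for index in indices:
        _, line, tag = split_word_index(index)
        if not (valid[line] and tags[line] == tag):
            missing.add((line, tag))
    return sorted(missing)
```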

【０４７９】３．１８．７入力インターフェーススイ
ッチ図２で、入力インターフェーススイッチはピクセルオー
ガナイザ部２４６、データキャッシュ制御部２４０、命
令制御部２３５からの要求データを調節する投割を果た
す。またこれは外部インターフェース制御部２３８とロ
ーカルメモリ制御部２３６に必要なアドレスとデータを
伝送する。3.18.7 Input Interface Switch
Referring to FIG. 2, the input interface switch 252 arbitrates the data requests from the pixel organizer 246, the data cache control unit 240 and the instruction control unit 235. It also transmits the necessary addresses and data to the external interface controller 238 and the local memory controller 236.

【０４８０】入力インターフェーススイッチ２５２はベ
ースアドレス若しくはホストメモリマップにあるメモリ
オブジェクトのいずれかのレジスタにその設定を保存す
る。２０個のアドレスビットが必要なため、これはペー
ジ境界に整列されるバーチュアルアドレスである。ピク
セルオーガナイザ部、データキャッシュ制御部、命令制
御部からの要求に対して、入力インターフェーススイッ
チ２５２は、まずデータの開始アドレスの上位６ビット
からコプロセッサのベースアドレスビットを減じる。こ
の結果が負であるか、この結果の上位６ビットが０では
ない場合はＰＣＩバスが望ましい伝送先であることを意
味する。The input interface switch 252 stores, in one of its registers, the base address of the coprocessor's memory object in the host memory map. Because 20 address bits are required, this is a virtual address aligned on a page boundary. In response to a request from the pixel organizer, the data cache control unit or the instruction control unit, the input interface switch 252 first subtracts the coprocessor base address bits from the upper six bits of the start address of the data. If the result is negative, or if the upper 6 bits of the result are not 0, the PCI bus is the intended destination.

【０４８１】結果の上位６ビットが０である場合は、デ
ータマップがコプロセッサのメモリ位置を現すことを意
味する。その後、入力インターフェーススイッチはコプ
ロセッサの位置が正しいか否かを判別するため次の３ビ
ットを検査する。コプロセッサの正当な位置は、１）コプロセッサのベースアドレスからオフセット０ｘ
０１００００００から始まる一般インターフェースが占
める１６メガバイト。If the upper 6 bits of the result are 0, the address maps to a memory location of the coprocessor. The input interface switch then examines the next three bits to determine whether the coprocessor location is valid. The valid coprocessor locations are: 1) the 16 megabytes occupied by the general interface, starting at offset 0x01000000 from the coprocessor base address.

【０４８２】２）コプロセッサのメモリオブジェクトの
ベースアドレスからオフセット０ｘ０２００００００か
ら始まるローカルメモリ制御部（ＬＭＣ）が占める３２
メガバイト。不当なコプロセッサの位置を指す要求は、
入力インターフェーススイッチによりエラーと見なされ
る。ＰＣＩバスはコプロセッサのメモリオブジェクトが
占める領域以外のアドレスのデータソースとなる。入力
インターフェーススイッチは要求データがＰＣＩバスか
らのものなのか、それとも一般インターフェースからの
ものかをＥＩＣに知らせるためｉソース信号を用いる。2) the 32 megabytes occupied by the local memory controller (LMC), starting at offset 0x02000000 from the base address of the coprocessor's memory object. The input interface switch treats requests that point to invalid coprocessor locations as errors. The PCI bus is the data source for all addresses outside the region occupied by the coprocessor's memory object. The input interface switch uses the i-source signal to tell the EIC whether the requested data is to come from the PCI bus or from the general interface.
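The decode described in paragraphs [0480] to [0482] can be sketched as follows. The sketch assumes a 32-bit address whose upper six bits are checked against a 6-bit coprocessor base, and whose next three bits (bits 25 to 23) select 8-megabyte units. Under that reading, the two legal regions fill the 64-megabyte coprocessor window exactly, and offsets below 0x01000000 decode as errors. The bit positions, the return labels and the function name are assumptions, not taken from the patent's figures.

```python
def route_request(addr: int, base6: int) -> str:
    """Classify a 32-bit request address (one reading of paragraphs [0480]-[0482])."""
    offset = addr - (base6 << 26)           # subtract the 6-bit coprocessor base
    if offset < 0 or (offset >> 26) != 0:
        return "PCI"                        # outside the coprocessor window
    unit = (offset >> 23) & 0x7             # the next three bits: 8 MB units
    if unit in (2, 3):
        return "GENERAL"                    # 16 MB general interface from 0x01000000
    if unit >= 4:
        return "LMC"                        # 32 MB local memory from 0x02000000
    return "ERROR"                          # illegal coprocessor location
```

With `base6 = 0x10` the window starts at 0x40000000. Under this reading, 0x41000000 routes to the general interface, 0x42000000 to the LMC, 0x40000100 is an error, and 0x3FFFFFFF goes to PCI.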

【０４８３】アドレス復号処理の後、正当な要求は適切
なＩＢｕｓインターフェースに伝送される。ＥＩＣとＬ
ＭＣはｉ−ａｃｋ信号が出された時、入力インターフェ
ーススイッチにデータを伝送する。しかし入力インター
フェーススイッチは入力されるワード数をカウントしな
いので、現在のデータ伝送がいつ終わるのかを、ピクセ
ルオーガナイザ部により制御されるｉ−ｏｅ信号、命令
制御部、データキャッシュ制御部が監視すなければなら
ない。After address decoding, a valid request is passed to the appropriate IBus interface. The EIC and the LMC transmit data to the input interface switch when the i-ack signal is asserted. However, because the input interface switch does not count the number of words received, the pixel organizer, the instruction control unit and the data cache control unit must monitor the i-oe signal to determine when the current data transfer ends.

【０４８４】入力インターフェーススイッチ２５２はピ
クセルオーガナイザ部、データキャッシュ制御部、命令
制御部の３つのモジュールを調節する。これらはデータ
を同時に要求することができるが、物理的な資源は２つ
しかないため、その要求は直に処理されない。入力イン
ターフェーススイッチに使われる調節技術は優先権をベ
ースにし、またプログラムも可能である。入力インター
フェーススイッチの設定レジスタにある制御ビットは、
命令制御部、データキャッシュ制御部、ピクセルオーガ
ナイザ部の相対的優先権を指定する。優先権が低いモジ
ュールからの要求は、その他の２つのモジュールからの
同じ資源へのアクセス要求がない場合に受け入れられ
る。少なくとも２つの要求発行元に同じ優先順位が与え
られると、要求が受付けられる発行元を決定するために
ラウンドロビン技術を用いる必要が生じる。The input interface switch 252 arbitrates among three modules: the pixel organizer, the data cache control unit and the instruction control unit. These modules can request data at the same time, but because there are only two physical resources, a request may not be serviced immediately. The arbitration scheme used by the input interface switch is priority-based and programmable. Control bits in the configuration register of the input interface switch specify the relative priorities of the instruction control unit, the data cache control unit and the pixel organizer. A request from a lower-priority module is accepted when neither of the other two modules is requesting access to the same resource. When two or more requesters are given the same priority, a round-robin scheme is used to decide whose request is accepted.
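The arbitration rule of paragraph [0484] can be sketched as follows. Among the modules currently requesting a given resource, the highest programmed priority wins, and ties are broken round-robin by rotating past the previous winner. The numeric priority encoding (larger wins), the requester names and the `PriorityArbiter` class are hypothetical. Because the switch has two physical resources, the sketch assumes one such arbiter per resource.

```python
from __future__ import annotations

from typing import Dict, List, Optional


class PriorityArbiter:
    """Highest priority wins; equal priorities are served round-robin (sketch)."""

    def __init__(self, priorities: Dict[str, int]) -> None:
        self.priorities = priorities          # larger value = higher priority (assumed)
        self.order = list(priorities)         # fixed order used for round-robin rotation
        self.last: Optional[str] = None       # previous winner

    def grant(self, requesting: List[str]) -> Optional[str]:
        if not requesting:
            return None
        top = max(self.priorities[r] for r in requesting)
        tied = [r for r in self.order if r in requesting and self.priorities[r] == top]
        if self.last in self.order:
            # Rotate so the search starts just after the previous winner.
            start = self.order.index(self.last) + 1
            rotated = self.order[start:] + self.order[:start]
            tied = [r for r in rotated if r in tied]
        winner = tied[0]
        self.last = winner
        return winner
```

With the instruction and data cache units both at priority 1, they alternate, while the pixel organizer at priority 0 is served only when neither of them is requesting.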

【０４８５】１つのソースに直ちにアクセスするのが不
可能であるため、入力インターフェーススイッチは要求
されたデータのアドレスとバースト長を記憶し、要求元
から提供されたデータをプリフェッチするかどうかをみ
る必要がある。あるソースに対する処理の中で、ＩＢｕ
ｓ処理がない場合には優先権を決める調整処理が必要に
なる。Because a source cannot always be accessed immediately, the input interface switch must store the address and burst length of the requested data and determine whether to prefetch the data supplied by the requester. During processing for a given source, if no IBus transaction is in progress, an arbitration process to determine priority is required.

【０４８６】図１４５に命令インターフェーススイッチ
２５２の詳細を示す。スイッチ２５２は標準ＣＢｕｓイ
ンターフェースとレジスタファイル８６０以外にアドレ
ス復号器８６３と調節部８６４の間に２つのＩＢｕｓト
ランシーバ６６１を持つ。アドレス復号器８６３はピク
セルオーガナイザ部、データキャッシュ制御部、命令制
御部から受けた要求に対するアドレス復号をする。アド
レス復号器８６３は、アドレスが正当なのかを検査する
他、必要によってアドレスを再マッピングする。調節部
８６４はどの要求をＩＢｕｓトランシーバ６６１からＩ
Ｂｕｓトランシーバ６６２に伝送するのかを決める。優
先権はプログラム可能である。FIG. 145 shows the input interface switch 252 in detail. In addition to the standard CBus interface and the register file 860, the switch 252 has two IBus transceivers 861, 862 between the address decoder 863 and the arbiter 864. The address decoder 863 decodes the addresses of requests received from the pixel organizer, the data cache control unit and the instruction control unit. Besides checking whether an address is valid, the address decoder 863 remaps the address when necessary. The arbiter 864 decides which request is passed from the IBus transceiver 861 to the IBus transceiver 862. The priorities are programmable.

【０４８７】ＩＢｕｓトランシーバ８６１、８６２は、
マルチプレクシングとデマルチプレクシング機能と、他
のインターフェースから入力インターフェーススイッチ
への通信を可能にするためのトライステートのバッファ
ーリング機能を有している。３．１８．８ローカルメモリ制御部図２において、ローカルメモリ制御部２３６は、ローカ
ルメモリの制御及びローカルメモリとコプロセッサ内の
モジュールとの間におけるアクセス要求の処理の全てを
担当する。ローカルメモリ制御部２３６は、結果オーガ
ナイザ２４９からの書き込み要求と入力インターフェー
ススイッチ２５２からの読み出し要求に応答する。更
に、周辺インターフェース制御部２３７と通常の一般Ｃ
Ｂｕｓ入力からの読み出しと書き込み要求に対しても応
答する。ローカルメモリ制御部はプログラム可能なプラ
イオリティシステムを用いており、更にスループットを
最大化するためにＦＩＦＯバッファを採用している。The IBus transceivers 861 and 862 provide multiplexing and demultiplexing functions, together with tri-state buffering that enables communication from the other interfaces to the input interface switch.
3.18.8 Local Memory Controller
Referring to FIG. 2, the local memory controller 236 is responsible for all control of the local memory and for handling access requests between the local memory and the modules within the coprocessor. The local memory controller 236 responds to write requests from the result organizer 249 and to read requests from the input interface switch 252. It also responds to read and write requests from the peripheral interface controller 237 and the general CBus input. The local memory controller uses a programmable priority system and employs FIFO buffers to maximize throughput.

【０４８８】本発明においては、ファーストイン・ファ
ーストアウト（ＦＩＦＯ）バッファの他に、メモリアレ
イからポートをデカップルするためにマルチポートバー
ストダイナミックメモリ制御部が用いられている。図１
４６は、本発明の第１の実施例に従い、４ポートバース
トダイナミックメモリ制御部のブロック図を示してい
る。この回路には、メモリアレイ１９１０へのアクセス
を必要とする２つの書き込みポート（Ａ１９４４とＢ１
９４６）と２つの読み出しポート（Ｃ１９４８とＤ１９
５０）が含まれている。読み出しポート１９４８、１９
５０のデータパスは別個のＦＩＦＯ１９３６、１９３８
経由でメモリアレイ１９１０から出てくるのに対し、２
つの書き込みポートからのデータパスは別個のＦＩＦＯ
１９２０、１９２２を通り、多重化部１９１２経由でメ
モリアレイ１９１０に向かう。中央制御部１９３２は、
ダイナミックメモリ１９１０へのインターフェースに必
要な全てのコントロール信号を駆動すると共に全体のポ
ートアクセスを調整する。リフレッシュカウンタ１９３
４は、メモリアレイ１９１０のためにダイナミックメモ
リのリフレッシュサイクルの必要時期を決め、制御部１
９３２と共にこれらを調整する。In the present invention, in addition to first-in first-out (FIFO) buffers, a multiport burst dynamic memory controller is used to decouple the ports from the memory array. FIG. 146 is a block diagram of a four-port burst dynamic memory controller according to the first embodiment of the present invention. The circuit includes two write ports (A 1944 and B 1946) and two read ports (C 1948 and D 1950) that require access to the memory array 1910. The data paths of the read ports 1948, 1950 leave the memory array 1910 through separate FIFOs 1936, 1938. The data paths from the two write ports pass through separate FIFOs 1920, 1922 and reach the memory array 1910 through the multiplexer 1912. The central controller 1932 drives all the control signals needed to interface to the dynamic memory 1910 and coordinates overall port access. The refresh counter 1934 determines when dynamic memory refresh cycles are required for the memory array 1910 and coordinates them with the controller 1932.

【０４８９】好ましくは、メモリアレイ１９１０に対す
るデータの読み出しと書き込みは、書き込みポート１９
４４、１９４６からＦＩＦＯ１９２０、１９２２へ、或
はＦＩＦＯ１９３６、１９３８から読み出しポート１９
４８、１９５０への転送の２倍のレートで行われる。こ
の結果、書き込みと読み出しポート１９４４、１９４
６、１９４８、１９５０を通してデータを転送するのに
要する時間に対し、メモリアレイ１９１０からの転送、
又はメモリアレイ１９１０への転送に要する時間（いか
なるメモリシステムのボトルネックである）を可能な限
り短くするのである。Preferably, data is read from and written to the memory array 1910 at twice the rate at which it is transferred from the write ports 1944, 1946 to the FIFOs 1920, 1922, or from the FIFOs 1936, 1938 to the read ports 1948, 1950. As a result, the time spent transferring data to or from the memory array 1910, which is the bottleneck of any memory system, is made as short as possible relative to the time required to transfer data through the write and read ports 1944, 1946, 1948, 1950.

[0490] Data is written to the memory array 1910 via either of the write ports 1944, 1946. Circuits connected to the write ports 1944, 1946 see only the FIFOs 1920, 1922, which are initially empty. Data transfer through a write port 1944, 1946 proceeds smoothly until the FIFO 1920, 1922 is full or the burst ends. When data is first written into a FIFO 1920, 1922, the control unit 1932 arbitrates with the other ports for access to the DRAM. Once access is granted, data is read from the FIFO 1920, 1922 at the maximum rate and written into the memory array 1910. A burst write cycle to the DRAM 1910 starts only when a preset number of data words has accumulated in the FIFO 1920, 1922, or when the burst from the write port has ended. In either case, the burst to the DRAM 1910 proceeds from the moment it is granted and continues until the FIFO 1920, 1922 is empty or a higher-priority port requests a cycle. Meanwhile, data continues to be written from the write port into the FIFO 1920, 1922 without interruption until the FIFO is full, or until the current burst ends and a new burst begins. In the latter case, the new burst does not proceed until the previous burst has been emptied from the FIFO 1920, 1922 and written into the DRAM 1910. In the former case, transfer resumes as soon as the first word has been read from the FIFO 1920, 1922 and written into the DRAM 1910. Because data leaves the FIFOs 1920, 1922 at the maximum rate, the write ports 1944, 1946 can stall only when the control unit 1932 is interrupted by cycle requests from other ports. Any interruption of data transfer from the write ports 1944, 1946 into the FIFOs 1920, 1922 should be kept to a minimum.
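The start, stop and stall conditions of the write path described above can be written as three predicates. This is a minimal Python sketch, not the control unit's actual logic; the function and parameter names are hypothetical, `preset` stands for the unspecified preset word count, and the 32-word depth is taken from the embodiment's FIFO size.

```python
def should_start_dram_write(fifo_words: int, preset: int, port_burst_ended: bool) -> bool:
    """A DRAM write burst starts once `preset` words are queued or the port burst is over."""
    # Nothing to write if the FIFO is empty, even if the port burst has ended.
    return fifo_words > 0 and (fifo_words >= preset or port_burst_ended)


def should_continue_dram_write(fifo_words: int, higher_priority_request: bool) -> bool:
    """A granted burst drains the FIFO until it is empty or a higher-priority port asks."""
    return fifo_words > 0 and not higher_priority_request


def write_port_may_accept(fifo_words: int, depth: int = 32) -> bool:
    """The write port keeps streaming into its FIFO unless the FIFO is full."""
    return fifo_words < depth
```

Because the DRAM side drains at twice the port rate, `write_port_may_accept` only becomes false when a drain is held off by another port's cycle.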

[0491] The read ports 1948, 1950 operate in the reverse order. As soon as a read port 1948, 1950 issues a read request, a DRAM cycle is requested. When the request is granted, the memory array 1910 is read and the data is written into the corresponding FIFO 1936, 1938. As soon as the first data word has been written into the FIFO 1936, 1938, it can be read out through the read port 1948, 1950. There is therefore an initial delay in obtaining the first data word, but probably no further delay in obtaining subsequent words. The DRAM read terminates when there is a higher-priority DRAM request, when the read FIFO 1936, 1938 becomes full, or when the read port 1948, 1950 no longer requests data. Once a read has terminated in this way, it does not restart until the FIFO 1936, 1938 has room for the preset number of data words. When the read port completes its cycle, any data remaining in the FIFO 1936, 1938 is discarded.

[0492] To guarantee that each DRAM access always transfers more than a minimum amount of data, re-arbitration for DRAM access is restricted so that a burst cannot be interrupted until the preset number of data words has been transferred (or until the corresponding FIFO 1920, 1922 is empty, or the read FIFO 1936, 1938 is full). Each access port 1944, 1946, 1948, 1950 has its own burst start address, which is latched into counter 1942 at the start of a burst. This counter holds the current address of the transaction with the port, so that even if a transfer is interrupted it can always be resumed at the correct memory address. Only the address of the currently active DRAM cycle is selected by the multiplexer 1940 and sent to the row address counter 1916 and the column address counter 1918. The low-order N bits of the address go to the column counter 1918, and the upper address bits go to the row counter 1916. During the DRAM row address time the multiplexer 1914 outputs the row address from the row counter 1916 to the memory array 1910, and during the DRAM column address time it sends the column address from the column counter 1918. The row address counter 1916 and the column address counter 1918 are loaded at the start of every burst to the memory array DRAM 1910, both at the start of a port cycle and when an interrupted burst is resumed. The column address counter 1918 is incremented after each memory transfer, and the row address counter 1916 is incremented when the column address counter 1918 wraps to zero. In the latter case the burst is terminated and must be restarted with the new row address.
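The row/column counter behaviour can be sketched as follows. This is a minimal Python model, assuming a hypothetical column width of N = 8 bits (the text leaves N open); `BurstAddressCounter` is an illustrative name, not a component of the design.

```python
class BurstAddressCounter:
    """Sketch of the row address counter 1916 and column address counter 1918."""

    def __init__(self, start_address: int, col_bits: int = 8):
        self.col_bits = col_bits            # N, assumed for illustration
        self.col_mask = (1 << col_bits) - 1
        self.load(start_address)

    def load(self, address: int) -> None:
        # Done at the start of every burst, including a resumed one:
        # the low N bits go to the column counter, the rest to the row counter.
        self.col = address & self.col_mask
        self.row = address >> self.col_bits

    @property
    def address(self) -> int:
        return (self.row << self.col_bits) | self.col

    def step(self) -> bool:
        """Advance after one transfer; return True if the burst must end (row changed)."""
        self.col = (self.col + 1) & self.col_mask
        if self.col == 0:                   # column wrapped to zero
            self.row += 1
            return True
        return False
```

Starting at address 0x1FE with N = 8, the first `step()` returns `False`, the second returns `True` and leaves the counters at address 0x200, so the burst is restarted with the new row.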

[0493] In this embodiment, the memory array 1910 is assumed to consist of four 8-bit byte lanes, giving 32 bits per word. In addition, each write port 1944, 1946 has a corresponding set of four byte write-enable signals 1950, 1952, which allow data to be written individually to each 8-bit portion of each 32-bit data word in the memory array 1910. Because writes to any byte of each word written to the memory array 1910 can be masked arbitrarily, the write-enable information must be stored with each data word in the corresponding FIFO 1926, 1928. These FIFOs 1926, 1928 are controlled by the same signals that control the write FIFOs 1920, 1922, but use only 4 bits instead of the 32 bits required to write data into the FIFOs 1920, 1922. Similarly, the multiplexer 1930 is controlled in the same way as the multiplexer 1912. The selected write enables are input to the control unit 1932, which uses them to selectively enable or disable writing to the addressed word in the memory array 1910, in synchronism with the write data supplied to the memory array 1910 by the multiplexer 1912.
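The effect of the four byte write enables on a 32-bit word can be sketched as a read-modify-write merge. This is a minimal Python illustration, assuming that enable bit i controls byte lane i with lane 0 as the least significant byte; the text does not specify the bit-to-lane ordering, and the real DRAM simply does not write the disabled lanes.

```python
def masked_write(old_word: int, new_word: int, byte_enables: int) -> int:
    """Return the stored word after writing only the enabled byte lanes.

    byte_enables is a 4-bit value; bit i enables bits 8*i .. 8*i+7 (assumed ordering).
    """
    result = old_word
    for lane in range(4):
        if byte_enables & (1 << lane):
            lane_mask = 0xFF << (8 * lane)
            # Replace this lane with the new data, keep the other lanes.
            result = (result & ~lane_mask) | (new_word & lane_mask)
    return result & 0xFFFFFFFF
```

For example, `masked_write(0x11223344, 0xAABBCCDD, 0b0101)` updates lanes 0 and 2 only, giving 0x11BB33DD.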

[0494] The configuration of FIG. 146 operates under the control of the control unit 1932. FIG. 147 is a state diagram showing the operation of the control unit 1932 of FIG. 146 in detail. After power-up and on completion of reset, the state machine is forced into the IDLE 1960 state, in which all DRAM control signals are inactive (high) and the multiplexer 1914 sends the row address to the DRAM array 1910. When a refresh or cycle request is detected, the state machine moves to the RASDEL1 1962 state. If there is no longer a cycle request or refresh at the next clock edge, it returns to IDLE 1960. Otherwise, once the DRAM tRP (RAS precharge time) constraint is satisfied, it moves to the RASON 1966 state, and the row address strobe RAS goes low. After tRCD (RAS-to-CAS delay) is satisfied, it moves to the COL 1968 state, and the multiplexer 1914 is switched to select the column address for the DRAM array 1910. At the next clock edge it moves to the CASON 1970 state, and the DRAM column address strobe (CAS) goes active low. Once tCAS (CAS active time) is satisfied, it moves to the CASOFF 1972 state, in which CAS goes inactive high again. If more data words are to be transferred and either no higher-priority cycle request or refresh is pending or it is still too early to re-arbitrate, the state machine returns to CASON 1970 once tCP (CAS precharge time) is satisfied, and CAS goes active low again. If instead there are no more data words to transfer, or re-arbitration is allowed and a higher-priority cycle request or refresh is pending, the state machine moves to the RASOFF 1974 state once both tRAS (RAS active time) and tCP are satisfied. In this state the DRAM row address strobe (RAS) goes inactive high. At the next clock edge the state machine returns to IDLE 1960, ready to start the next cycle.
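The normal (non-refresh) path through the FIG. 147 state machine can be sketched as a next-state function. This is a hypothetical Python reconstruction from the prose, not the actual controller: each `t_*` flag is assumed to be supplied by an external timer and means "this timing constraint is now satisfied", `preempt` means a higher-priority request or refresh is pending, and `too_early` means re-arbitration is not yet allowed. The refresh branch through RASDEL2 is not modeled, and the enum values merely reuse the reference numerals.

```python
from enum import Enum


class S(Enum):
    IDLE = 1960
    RASDEL1 = 1962
    RASON = 1966
    COL = 1968
    CASON = 1970
    CASOFF = 1972
    RASOFF = 1974


def next_state(s: S, *, request: bool = False, t_rp: bool = False,
               t_rcd: bool = False, t_cas: bool = False, t_cp: bool = False,
               t_ras: bool = False, more_words: bool = False,
               preempt: bool = False, too_early: bool = False) -> S:
    """Return the state after one clock edge of a normal DRAM cycle."""
    if s is S.IDLE:
        return S.RASDEL1 if request else S.IDLE
    if s is S.RASDEL1:
        if not request:
            return S.IDLE
        return S.RASON if t_rp else S.RASDEL1
    if s is S.RASON:
        return S.COL if t_rcd else S.RASON
    if s is S.COL:
        return S.CASON                      # unconditionally on the next edge
    if s is S.CASON:
        return S.CASOFF if t_cas else S.CASON
    if s is S.CASOFF:
        stay_in_page = more_words and (not preempt or too_early)
        if stay_in_page:
            return S.CASON if t_cp else S.CASOFF
        return S.RASOFF if (t_ras and t_cp) else S.CASOFF
    return S.IDLE                           # RASOFF -> IDLE on the next edge


def strobes(s: S) -> tuple:
    """Return (RAS#, CAS#) levels for a state; 0 means active low."""
    ras_n = 0 if s in (S.RASON, S.COL, S.CASON, S.CASOFF) else 1
    cas_n = 0 if s is S.CASON else 1
    return ras_n, cas_n
```

Looping CASOFF back to CASON without leaving RASON territory is what lets a burst stay in the same DRAM row (page mode) until the data runs out or a higher-priority requester is allowed in.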

[0495] If a refresh request is detected in the RASDEL2 1964 state, the state machine moves to the RCASON 1980 state once tRP (RAS precharge time) is satisfied. In this state the DRAM column address strobe goes active low, starting a CAS-before-RAS refresh cycle. At the next clock edge the state machine moves to RRASON 1978, and the DRAM row address strobe (RAS) goes active low. When tCAS (CAS active time) is satisfied, it moves to RCASOFF 1976, and CAS goes inactive high. Once tRAS (RAS active time) is satisfied, it moves to RASOFF 1974, where RAS goes inactive high, effectively ending the refresh cycle. From there the state machine behaves as described above for a normal DRAM cycle and returns to IDLE 1960.

[0496] The refresh counter 1934 of FIG. 146 is simply a counter that generates a refresh request signal at a fixed rate of once every 15 microseconds, or at a rate determined by the requirements of the particular DRAM vendor. Once issued, a refresh request remains asserted until it is acknowledged by the state machine of FIG. 147. The acknowledgment is given when the state machine enters the RCASON 1980 state, and it remains asserted until the state machine detects that the refresh request has been withdrawn.
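A refresh counter of this kind reduces to a down-counter whose period is the refresh interval expressed in clock cycles. This Python sketch assumes a hypothetical 50 MHz controller clock (no clock rate is given here) and models the acknowledgment as a method call; the names are illustrative only.

```python
def refresh_period_cycles(clock_hz: int, interval_ns: int = 15_000) -> int:
    """Clock cycles between refresh requests, rounded down so refresh is never late."""
    return clock_hz * interval_ns // 1_000_000_000


class RefreshCounter:
    """Raises a refresh request every `period` ticks and holds it until acknowledged."""

    def __init__(self, period: int):
        self.period = period
        self.remaining = period
        self.request = False

    def tick(self) -> None:
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.period
            self.request = True             # stays asserted until acknowledge()

    def acknowledge(self) -> None:
        # Corresponds to the state machine entering RCASON 1980.
        self.request = False
```

At the assumed 50 MHz, `refresh_period_cycles(50_000_000)` gives 750 cycles per 15 µs. Integer arithmetic is used so the result does not depend on floating-point rounding.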

[0497] FIG. 148 shows the operation of the arbiter 1924 of FIG. 146 in pseudo-code form. It describes how the arbiter decides which of the four cycle requesters is granted access to the memory array 1910, and the mechanism that adjusts the requesters' priorities to keep access fair. The symbols used in this code are explained in FIG. 149.

[0498] Each requester has 4 bits representing the priority of its request. The upper 2 bits are preset to an overall priority by a configuration value held in a general configuration register. The lower 2 bits of the priority are held in a 2-bit counter that is updated by the arbiter 1924. To determine the winner of an arbitration, the arbiter 1924 simply compares the 4-bit values of the requesters and grants access to the requester with the highest value. When a requester is granted a cycle, its lower 2-bit priority counter is reset to zero, and the lower 2-bit counts of all other requesters that have the same upper 2-bit priority as the winner and a lower 2-bit value below the winner's are incremented by one. As a result, the requester that has just been granted access to the memory array 1910 now has the lowest priority among the requesters with the same upper 2-bit value. The lower 2-bit values of requesters whose upper 2 bits differ from the winner's are not affected. The upper 2 bits thus determine a requester's overall priority, while the lower 2 bits implement fair arbitration among requesters of the same upper priority. With this scheme, a range of arbitration policies can be realized: hard-wired fixed priority (each requester has unique upper 2 bits), partially rotating and partially hard-wired priority (some, but not all, upper 2-bit priorities differ), and strictly fair rotation (all upper 2-bit priorities are equal).
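The FIG. 148 pseudo-code is not reproduced in this text, so the following is a hypothetical Python reconstruction from the prose above. It assumes ties between equal 4-bit values go to the lowest-numbered requester (tie-breaking is not described) and that the counters of non-requesting peers are updated too; `arbitrate` is an illustrative name.

```python
from typing import List, Optional


def arbitrate(requesting: List[bool], upper: List[int], lower: List[int]) -> Optional[int]:
    """Grant one requester and update the 2-bit rotating counters in `lower` in place.

    upper: fixed 2-bit priorities from the configuration register.
    lower: 2-bit counters maintained by the arbiter.
    """
    candidates = [i for i, r in enumerate(requesting) if r]
    if not candidates:
        return None
    # Highest 4-bit value wins; -i makes ties go to the lowest index (assumption).
    winner = max(candidates, key=lambda i: ((upper[i] << 2) | lower[i], -i))
    w_lower = lower[winner]
    for i in range(len(lower)):
        # Only peers with the same upper bits and a lower count below the winner's move up.
        if i != winner and upper[i] == upper[winner] and lower[i] < w_lower:
            lower[i] += 1
    lower[winner] = 0                      # the winner drops to the bottom of its group
    return winner
```

With all upper bits equal and counters starting at `[0, 1, 2, 3]`, four back-to-back arbitrations grant requesters 3, 2, 1, 0 and leave the counters back at `[0, 1, 2, 3]`, which is the strictly fair rotation case. With upper bits `[3, 2, 1, 0]` the result is a fixed priority.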

[0499] FIG. 149 shows the structure of the priority bits for each requester and how the bits are used; it also defines the symbols used in FIG. 148. In the embodiment above, the FIFOs 1920, 1922, 1936 and 1938 are 32 bits wide and 32 words deep. This depth is a good compromise between efficiency and the circuit area consumed. However, the depth can be changed to suit the needs of a particular application, with a corresponding change in performance.

[0500] The four-port configuration shown here is only one embodiment. A benefit can be obtained even by providing a single FIFO buffer between the memory array and either a read port or a write port. However, the greatest speed improvement is obtained by using multiple read and write ports.

3.18.9 Other Modules

The other modules 239 generate and select the clocks for operation of the coprocessor 224, and handle reset synchronization, multiplexing of error and interrupt signals (routing internal diagnostic signals to external pins as required), interfacing between the internal and external forms of the CBus, and multiplexing of internal and General Bus signals onto the General/external CBus output pins. The exact operation of the other modules 239 depends on the clocking requirements and implementation details of the ASIC technology used.

[0501] 3.18.10 External Interface Controller

The aspects of the invention described next relate to a method and apparatus for providing virtual memory in a host computer having a coprocessor that shares the virtual memory. Embodiments of the invention seek to enable the coprocessor to operate in a virtual memory mode in conjunction with the host processor.

[0502] In particular, the coprocessor can operate in the virtual memory mode of the host processor. The coprocessor includes a virtual-to-physical memory mapping device that can refer to the host processor's virtual memory tables and maps instruction addresses generated by the coprocessor to the corresponding physical addresses in the host processor's memory. Preferably, the virtual-to-physical memory mapping device forms part of a computer graphics coprocessor for generating graphic images. The coprocessor includes a number of modules that can perform various complex operations on images. The mapping device is responsible for the interaction between the coprocessor and the host processor.

[0503] The external interface controller (EIC) 238 provides the coprocessor's interfaces to the PCI Bus and the General Bus. It also provides the memory management that links the coprocessor's internal virtual address space to the host system's physical address space. The EIC 238 acts as a master on the PCI Bus when reading data from host memory at the request of the input interface switch 252, or when writing data to host memory at the request of the result organizer 249. PCI Bus access is implemented according to the "PCI Local Bus Specification, Draft 2.1", PCI Special Interest Group, 1994.

[0504] The EIC 238 arbitrates between simultaneous requests for PCI transactions from the input interface switch 252 and the result organizer 249; the arbitration is preferably configurable. The requests received may be reads of one cache line or less of host memory at a time, reads of between one and two cache lines, or reads of two or more cache lines. Writes of unlimited length are also supported by the EIC 238, which can optionally prefetch data as well.

[0505] The EIC 238 includes memory management that provides virtual-to-host-physical address mapping for all of the coprocessor's internal modules. This mapping is completely transparent to the module requesting access. When the EIC 238 receives a request to access host memory, it initiates the memory management unit to translate the requested address. If the memory management unit cannot translate the address, one or more PCI Bus transactions may be needed to complete the translation, which means the memory management unit itself can be a further source of PCI Bus transaction requests. When a burst requested by the input interface switch 252 or the result organizer 249 crosses a virtual page boundary, the EIC 238 automatically invokes the memory management unit so that all the virtual addresses are remapped correctly.

[0506] The memory management unit (MMU) (915 in FIG. 150) is based on a 16-entry translation lookaside buffer (TLB), which acts as a cache of virtual-to-physical address mappings. The following operations can be performed on the TLB:
1) Compare: given a virtual address, the TLB returns either the corresponding physical address or a TLB miss signal (if no valid entry matches the address).

[0507] 2) Replace: a new virtual-to-physical mapping is written into the TLB in place of an existing entry or an invalid entry.
3) Invalidate: given a virtual address, any TLB entry that matches it is invalidated.
4) Invalidate all: all TLB entries are invalidated.

[0508] 5) Read: the virtual or physical address of a TLB entry is read, addressed by a 4-bit entry index. Used for testing only.
6) Write: the virtual or physical address of a TLB entry is written, addressed by a 4-bit entry index.
Entries in the TLB have the format shown in FIG. 151. Each valid entry consists of a 20-bit virtual address 670, a 20-bit physical address 671, and a flag indicating whether the corresponding physical page is writable. The basic page size of an entry is 4 Kbytes. A register in the MMU can mask up to 10 bits of the address used in the comparison, which allows TLB pages of up to 4 Mbytes. Because there is only one mask register, all TLB entries refer to pages of the same size.

[0509] The TLB uses a least-recently-used (LRU) replacement algorithm: a new entry overwrites the entry that has gone longest since it was last written or matched in a compare operation. This applies only when there are no invalid entries; if there is an invalid entry, it is filled before any valid entry is overwritten.
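The compare, replace and invalidate operations together with the invalid-first, then-LRU replacement rule can be sketched as a small fully associative cache. This Python model is illustrative only: it compares whole 20-bit page numbers (the page-size mask is left out), tracks recency with a list instead of the LRU buffer 876, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    vpn: int        # 20-bit virtual page number
    ppn: int        # 20-bit physical page number
    writable: bool


class Tlb:
    """Fully associative TLB: fill invalid slots first, otherwise evict the LRU slot."""

    def __init__(self, size: int = 16):
        self.slots: List[Optional[TlbEntry]] = [None] * size  # None = invalid entry
        self.lru: List[int] = list(range(size))               # front = least recently used

    def _touch(self, i: int) -> None:
        # Writing or matching an entry makes it the most recently used.
        self.lru.remove(i)
        self.lru.append(i)

    def compare(self, vpn: int) -> Optional[TlbEntry]:
        for i, entry in enumerate(self.slots):
            if entry is not None and entry.vpn == vpn:
                self._touch(i)
                return entry
        return None                                           # TLB miss

    def replace(self, entry: TlbEntry) -> int:
        invalid = [i for i, e in enumerate(self.slots) if e is None]
        i = invalid[0] if invalid else self.lru[0]
        self.slots[i] = entry
        self._touch(i)
        return i

    def invalidate(self, vpn: int) -> None:
        for i, entry in enumerate(self.slots):
            if entry is not None and entry.vpn == vpn:
                self.slots[i] = None

    def invalidate_all(self) -> None:
        self.slots = [None] * len(self.slots)
```

In a two-entry instance, matching the older entry before a third replace causes the other entry to be evicted, and an invalidated slot is always reused before any valid one.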

[0510] FIG. 152 shows the flow of the TLB compare operation. The received virtual address 880 is divided into three parts, 881 to 883. The low 12 bits 881 are always part of the offset within the page, so they are passed directly to the corresponding physical address bits 885. The next 10 bits 882 are either part of the offset or part of the page number, depending on the page size set by the mask bits. A zero in the mask register 887 indicates that the bit is part of the page offset and must not be used in the TLB comparison. These 10 address bits are logically ANDed with the 10 mask bits to give the low 10 bits of the virtual page number 889 for the TLB lookup. The upper 10 bits 883 of the virtual address are used directly as the upper 10 bits of the virtual page number 889.

[0511] The 20-bit virtual page number generated in this way is sent to the TLB. If it matches one of the entries, the TLB returns the corresponding physical page number 872 together with the index of the matching location. The physical address 873 is generated from the physical page number, again using the mask register 887. The upper 10 bits of the physical page number 872 are used directly as the upper 10 bits of the physical address 873. Each of the next 10 bits of the physical address 873 is selected (875) from either the physical page number (when the corresponding mask bit is 1) or the virtual address (when the mask bit is 0). The low 12 bits 885 of the physical address are taken directly from the virtual address.
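The address split and recombination of FIG. 152 can be written as two bit-manipulation functions. This is a minimal Python sketch assuming 32-bit addresses (12 offset bits, 10 maskable bits, 10 upper bits); a mask of 0x3FF corresponds to 4-Kbyte pages and a mask of 0 to 4-Mbyte pages. The function names are illustrative.

```python
PAGE_OFFSET_BITS = 12
FIELD_MASK = 0x3FF  # the 10 maskable address bits (bits 12..21)


def virtual_page_number(va: int, mask: int) -> int:
    """Form the 20-bit VPN 889 from a 32-bit virtual address 880."""
    upper = (va >> 22) & FIELD_MASK                           # bits 883, used directly
    middle = (va >> PAGE_OFFSET_BITS) & FIELD_MASK & mask     # bits 882 ANDed with mask 887
    return (upper << 10) | middle


def physical_address(ppn: int, va: int, mask: int) -> int:
    """Form the 32-bit physical address 873 from the PPN 872 returned by the TLB."""
    upper = (ppn >> 10) & FIELD_MASK
    # Bit by bit: the PPN bit where the mask is 1, the VA bit where it is 0.
    middle = ((ppn & mask) | ((va >> PAGE_OFFSET_BITS) & ~mask)) & FIELD_MASK
    offset = va & 0xFFF                                       # bits 881 -> 885
    return (upper << 22) | (middle << PAGE_OFFSET_BITS) | offset
```

With 4-Kbyte pages, `virtual_page_number(0xABCDE678, 0x3FF)` is 0xABCDE and a PPN of 0x12345 yields 0x12345678. With 4-Mbyte pages (mask 0), the same inputs give a VPN of 0xABC00 and a physical address of 0x120DE678, because bits 12 to 21 now come from the virtual address.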

[0512] Finally, the LRU buffer 876 is updated according to the match, to record the use of the matched entry. A TLB miss occurs when the input interface switch 252 or the result organizer 249 requests access to a virtual address that is not present in the TLB 872. In this case, before the MMU can proceed with the requested access, it must fetch the required virtual-to-physical translation from the page table in host memory 203 and write it into the TLB.

[0513] The page table is a hash table in host main memory. Each page table entry consists of two 32-bit words in the format shown in FIG. 153. The first word holds the upper 20 bits of the corresponding virtual address; its low 12 bits contain a valid (V) bit and a writable (W), or "read-only", bit, and the remaining 10 bits are reserved. The second word holds the upper 20 bits of the physical address, and its low 12 bits are reserved.
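Decoding one 8-byte page table entry is straightforward once the bit positions are fixed. The text does not say where V and W sit within the low 12 bits, nor the byte order, so this Python sketch assumes V is bit 0, W is bit 1 and the words are little-endian; all names are illustrative.

```python
import struct
from typing import NamedTuple

# Assumed positions of the flags in the low 12 bits of word 0 (not given in the text).
V_BIT = 1 << 0
W_BIT = 1 << 1


class PageTableEntry(NamedTuple):
    vpn: int        # upper 20 bits of word 0
    valid: bool     # V flag
    writable: bool  # W flag (clear means read-only)
    ppn: int        # upper 20 bits of word 1


def decode_pte(raw: bytes) -> PageTableEntry:
    """Decode one 8-byte entry read from host memory (little-endian assumed)."""
    word0, word1 = struct.unpack("<II", raw)
    return PageTableEntry(
        vpn=word0 >> 12,
        valid=bool(word0 & V_BIT),
        writable=bool(word0 & W_BIT),
        ppn=word1 >> 12,
    )
```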

[0514] A page table entry contains essentially the same information as a TLB entry; the extra flag bits in the page table entry are reserved. The page table itself is normally spread over several pages of main memory 203, and is generally contiguous in virtual space but not in physical space. The MMU contains a set of 16 page table pointers set by software, each a 20-bit pointer to a 4-Kbyte memory region containing part of the page table. This means that the coprocessor 224 supports a page table of 64 Kbytes, holding 8K page mappings. In a system with a 4-Kbyte page size, this gives up to 32 Mbytes of mapped virtual address space. Preferably, the page table pointers always refer to 4-Kbyte memory regions, regardless of the page size used by the TLB.
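The sizes quoted above follow from the 8-byte entry size. The entry count also matches the 13-bit page table index used after a TLB miss, since 2^13 = 8192:

```latex
\begin{aligned}
\text{page table size} &= 16 \times 4\ \text{KB} = 64\ \text{KB}\\
\text{entries} &= 64\ \text{KB} \div 8\ \text{B} = 8192 = 8\text{K}\\
\text{mapped virtual space} &= 8192 \times 4\ \text{KB} = 32\ \text{MB}
\end{aligned}
```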

[0515] The MMU operation after a TLB miss is shown at 690 in FIG. 154 and proceeds as follows.
1. Apply the hash function 892 to the virtual page number 891 that missed in the TLB, generating a 13-bit index into the page table.
2. Use the upper 4 bits 894 of the page table index (894, 896) to select a page table pointer 895.

[0516] 3. Generate the physical address 890 of the required page table entry by concatenating the 20-bit page table pointer 895 with the low 9 bits 896 of the page table index and setting the 3 least significant bits to 000 (each page table entry occupies 8 bytes of host memory).
4. Read 8 bytes from host memory, starting at the physical address 898 of the page table entry.

[0517] 5. When the 8-byte page table entry 900 is returned over the PCI bus and its VALID bit is set to 1, its virtual page number is compared with the original virtual page number that caused the TLB miss. If they do not match, the next page table entry is fetched by the same process (the physical address is incremented by 8 bytes). This continues until a page table entry with a matching virtual page number is found or an invalid page table entry is encountered. If an invalid entry is encountered, a page fault error is signaled and processing stops.

[0518] 6. When a page table entry with a matching virtual page number is found, the complete entry is written into the TLB by a replace operation. The new entry is placed in the TLB location pointed to by the LRU buffer 876, and the LRU buffer 876 is updated as the entry is written. The TLB compare is then performed again and now succeeds, so the originally requested host memory access can proceed.

[0519] The hash function 892 implemented in the EIC 238 computes the page table index from the 20-bit virtual page number (vpn) as follows:

    index = ((vpn >> S1) XOR (vpn >> S2) XOR (vpn >> S3)) & 0x1fff;

where S1, S2 and S3 are independently programmable shift amounts (positive or negative), each of which can take one of four values. The mask 0x1fff keeps the 13-bit page table index used in step 1.
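The hash can be modelled in a few lines of Python. The text does not say what a negative shift amount means or which four values each shift may take. This sketch therefore assumes that a negative amount shifts left and accepts any integer. The function names are hypothetical.

```python
def shift(value: int, amount: int) -> int:
    """Right shift for amount >= 0; left shift for a negative amount (assumed convention)."""
    return value >> amount if amount >= 0 else value << -amount


def page_table_index(vpn: int, s1: int, s2: int, s3: int) -> int:
    """13-bit page table index from a 20-bit virtual page number."""
    vpn &= 0xFFFFF  # 20-bit virtual page number
    return (shift(vpn, s1) ^ shift(vpn, s2) ^ shift(vpn, s3)) & 0x1FFF
```

With S1 = S2 = S3 = 0, the three XOR terms reduce to vpn itself, so the index is just the low 13 bits of the page number. Distinct shifts fold higher page-number bits into the index.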

[0520] When the linear search of the page table crosses a 4 KB boundary, the MMU automatically selects the next page table pointer and continues the search at the correct physical memory location. This includes wrapping from the end of the page table back to its beginning. The page table always contains at least one invalid (null) entry, so the search is guaranteed to terminate.
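The search in steps 5 and 6 and the wrap-around rule can be sketched behaviourally. The sketch models the page table as a flat Python list indexed by the 13-bit page table index rather than as host memory, so crossing a 4 KB boundary is simply the step from index 511 to 512, and so on. The entry is reduced to the fields the search needs. `PageTableEntry`, `PageFault` and `walk_page_table` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence

TABLE_ENTRIES = 1 << 13  # 16 pointers x 512 eight-byte entries per 4 KB page


@dataclass(frozen=True)
class PageTableEntry:
    valid: bool
    vpn: int = 0
    ppn: int = 0


class PageFault(Exception):
    """Raised when the search reaches an invalid (null) entry."""


def walk_page_table(table: Sequence[PageTableEntry], start: int, vpn: int) -> PageTableEntry:
    """Linear search from the hashed index, wrapping from the last entry to the first."""
    index = start
    for _ in range(TABLE_ENTRIES):
        entry = table[index]
        if not entry.valid:
            raise PageFault(f"no mapping for vpn {vpn:#x}")
        if entry.vpn == vpn:
            return entry
        # Moving from index 511 to 512 (etc.) is the 4 KB boundary crossing at
        # which the hardware switches to the next page table pointer.
        index = (index + 1) % TABLE_ENTRIES
    # Unreachable while the table keeps at least one null entry.
    raise PageFault("page table has no null entry")
```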

[0521] Each time software replaces a page in host memory, it must add a page table entry for the new virtual page and delete the entry corresponding to the replaced page. It must also ensure that the old page table entry is not left cached in the TLB of the coprocessor 224. This is done by performing a TLB invalidation cycle in the MMU.

[0522] An invalidation cycle is performed by a register write to the MMU that supplies the virtual page number to be invalidated, together with a bit that triggers the invalidation operation. This register write is performed either directly by software or through an instruction processed by the instruction decoder. The invalidation operation is then performed on the TLB for the supplied virtual page number. If a TLB entry matches, the entry is marked invalid and the LRU table is updated so that the invalidated location is used for the next replacement operation.
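The TLB behaviour described in step 6 and in the invalidation cycle can be sketched with an `OrderedDict`. The sketch covers compare, LRU replacement, and invalidation that frees a slot for the next replacement. It models behaviour only, not the 16-entry associative hardware. The class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class Tlb:
    """Fully associative TLB with least-recently-used replacement."""

    def __init__(self, size: int = 16) -> None:
        self.size = size
        self._entries: "OrderedDict[int, int]" = OrderedDict()  # vpn -> ppn, LRU first

    def compare(self, vpn: int) -> Optional[int]:
        """Return the physical page number on a hit, or None on a miss."""
        ppn = self._entries.get(vpn)
        if ppn is not None:
            self._entries.move_to_end(vpn)  # now the most recently used entry
        return ppn

    def replace(self, vpn: int, ppn: int) -> None:
        """Write an entry, evicting the LRU entry only when no slot is free."""
        if vpn not in self._entries and len(self._entries) >= self.size:
            self._entries.popitem(last=False)
        self._entries[vpn] = ppn
        self._entries.move_to_end(vpn)

    def invalidate(self, vpn: int) -> bool:
        """Drop a matching entry; the freed slot is filled by the next replace()
        without evicting anything."""
        return self._entries.pop(vpn, None) is not None
```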

[0523] A pending invalidation operation has higher priority than any pending TLB compare. When the invalidation operation completes, the MMU clears the invalidation bit to indicate that it can accept the next invalidation. If the MMU cannot find a valid page table entry for a requested virtual address, a page fault occurs. The MMU asserts an error signal and stores the faulting virtual address in a software-accessible register. It then enters an idle state and waits until the error is resolved. When the interrupt is cleared, the MMU resumes work with the next requested transaction.

[0524] A page fault is also raised when a write is performed to a page marked read-only (that is, not marked writable). The external interface controller (EIC) 238 services transaction requests addressed to the general bus from the input interface switch 252 and the result organizer 249. Each requesting module indicates whether the current request is for the general bus or for the PCI bus. A common bus is used for communication with the input interface switch 252 and the result organizer 249. The EIC nevertheless handles general bus requests entirely separately from PCI requests. In addition, the EIC 238 services a CBus transaction type that addresses the general bus space directly.

[0525] FIG. 150 shows the structure of the external interface controller 238. IBus requests pass through a multiplexer 910, which routes each request to the appropriate internal module according to its destination (PCI or general bus). Requests for the general bus are sent to the general bus controller 911, which also has RBus and CBus connections. No multiplexer is needed on the RBus, because general bus and PCI bus requests on that bus use different control signals.

[0526] IBus requests directed to the PCI bus are handled by the IBus driver (IBD) 912. Similarly, RBus requests to the PCI bus are processed by the RBus receiver (RBR) 914. The IBD 912 and RBR 914 send virtual addresses to the memory management unit (MMU) 915, which returns physical addresses. The IBD, the RBR and the MMU can each request PCI transactions, which are generated and controlled by the PCI master mode controller (PMC) 917. The IBD and the MMU request only PCI reads, and the RBR requests only PCI writes.

[0527] A separate PCI target mode controller (PTC) 918 handles all PCI transactions addressed to the coprocessor as a target. It sends CBus master mode signals to the instruction controller, allowing access to all other modules. The PTC sends the returned CBus data to the PCI bus via the PMC. As a result, the PCI data bus pins are controlled from a single source.

[0528] CBus transactions addressed to the EIC registers and module memory are handled by the standard CBus interface 7. All submodules receive bits from the control registers and return bits to the status registers. These registers are located inside the standard CBus interface. Parity generation and checking for PCI bus transactions are handled by the parity generation and checking (PGC) module 921, which operates under the control of the PMC and the PTC. The generated parity is sent to the PCI bus, as is the parity error signal. The result of the parity check is also sent to the PTC configuration registers for error reporting.
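The text does not describe how the PGC module 921 computes parity. For reference, the PCI specification defines PAR as even parity across AD[31:0] and C/BE#[3:0]. A behavioural sketch follows. It ignores timing: on the bus, PAR is driven one clock after the phase it covers. The function names are hypothetical.

```python
def pci_par(ad: int, cbe_n: int) -> int:
    """PAR for one PCI address or data phase: even parity over AD[31:0] and C/BE#[3:0]."""
    ones = bin(ad & 0xFFFFFFFF).count("1") + bin(cbe_n & 0xF).count("1")
    return ones & 1  # makes the total number of ones, including PAR, even


def parity_error(ad: int, cbe_n: int, par: int) -> bool:
    """Receiver-side check: the sampled PAR disagrees with the recomputed value."""
    return pci_par(ad, cbe_n) != (par & 1)
```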

[0529] FIG. 155 shows the structure of the IBus driver 912 of FIG. 150. The incoming IBus address and control signals are latched (930) at the start of the cycle. An OR gate 931 detects the start of a cycle and generates a start signal for the control logic 932. The upper address bits in latch 930, which form the virtual page number, are loaded into counter 935. The virtual page number is sent to the MMU 915 (FIG. 150), which returns the physical page number; this is latched at 936.

[0530] The physical page number and the lower virtual address bits are recombined through mask 937 to form the address 938 for the PCI request to the PMC 717 (FIG. 102). The burst count for the cycle is also loaded into counter 939. Prefetch operation uses a separate counter 941 and an address latch and comparator 943. Data returned from the PMC is loaded into FIFO 944 together with a marker indicating whether the data is part of a prefetch. As data becomes available at the head of FIFO 944, it is read out through latches 945, 946 and clocked out by the read logic. The read logic 946 also generates the IBus acknowledge signal.

[0531] The central control block 932, which includes a state machine, controls the sequencing of all address and data elements as well as the interface to the PMC. The virtual page number counter 935 is loaded with the page number bits of the IBus address at the start of an IBus transaction. The upper 10 bits of this 20-bit counter always come from the incoming address. Each of the lower 10 bits is loaded from the incoming address if the corresponding mask bit 937 is set to 1; otherwise that counter bit is set to 1. The 20-bit value is sent to the MMU interface.

[0532] In normal operation, the virtual page number is not used after the initial address translation. However, if the IBD detects that a burst crosses a page boundary, the virtual page counter is incremented and another translation is performed. When the counter is loaded, the low-order bits that are not part of the virtual page number are set to 1. A simple increment of the 20-bit value therefore carries through them and increments the actual page number field. After the increment, the mask bits 937 are applied again to set up the counter for the next increment.
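The counter trick in the last two paragraphs can be checked with a short Python sketch. It follows the text in treating a mask bit of 1 as marking a counter bit that belongs to the page number. The mask value 0x3FE (bit 0 not part of the page number) is only an illustrative example, and the function names are hypothetical.

```python
VPN_MASK = 0xFFFFF  # 20-bit virtual page number counter (935)
LOW_BITS = 0x3FF    # the lower 10 bits are governed by the mask bits (937)


def load_vpn_counter(vpn20: int, mask10: int) -> int:
    """Keep low bits whose mask bit is 1; force the others to 1."""
    return (vpn20 | (~mask10 & LOW_BITS)) & VPN_MASK


def next_page(counter: int, mask10: int) -> int:
    """A plain +1 carries through the forced 1s into the real page number
    field; the mask is then re-applied to prepare for the next increment."""
    return load_vpn_counter(counter + 1, mask10)
```

With mask 0x3FE, loading 0x00004 gives 0x00005, and `next_page` then yields 0x00007. The page number field (bits 19 to 1) advances from 2 to 3, and bit 0 is forced back to 1.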

[0533] After translation, the physical address is latched (936) each time the MMU returns a valid physical page number. The mask bits are used to combine the returned physical page number correctly with the original virtual address bits. The physical address counter 938 is loaded from the physical address latch 936. It is incremented each time a word is returned from the PMC. After each increment, the counter is checked to see whether the transaction is about to cross a page boundary. The mask bits determine which counter bits take part in this comparison. When the counter detects that two or fewer words remain in the page, it signals the control logic 932. The control logic then terminates the current PCI request after two more data transfers and, if required, requests a new address translation. The counter is reloaded after the new translation, and the PCI request resumes.
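A sketch of the page-boundary test. It assumes 32-bit (4-byte) PCI data words and takes a power-of-two page size directly as a parameter. The hardware instead derives the page size from the mask bits. The names are illustrative.

```python
WORD_BYTES = 4  # assumption: 32-bit PCI data words


def words_left_in_page(addr: int, page_bytes: int) -> int:
    """Words from a word-aligned byte address to the end of its page."""
    return (page_bytes - (addr & (page_bytes - 1))) // WORD_BYTES


def must_end_request(addr: int, page_bytes: int) -> bool:
    """True when two or fewer words remain, so the PCI request ends after two transfers."""
    return words_left_in_page(addr, page_bytes) <= 2
```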

[0534] The burst counter 939 is a 6-bit down counter that is loaded with the IBus burst value at the start of a transaction. It is decremented each time a word is returned from the PMC. When the counter value reaches two or fewer, it signals the control logic 932, so that the PCI transaction can be terminated after two more data transfers (unless prefetching is enabled).
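The burst counter's end-of-transaction signal can be modelled in Python as follows. The class and method names are hypothetical, and prefetching is reduced to a single flag that suppresses the signal.

```python
class BurstCounter:
    """Behavioural sketch of the 6-bit burst down counter 939."""

    def __init__(self, burst: int, prefetch_enabled: bool = False) -> None:
        if not 0 <= burst < 64:
            raise ValueError("burst length must fit in 6 bits")
        self.remaining = burst
        self.prefetch_enabled = prefetch_enabled

    def word_returned(self) -> bool:
        """Count one word from the PMC. True means the PCI transaction may end
        after two more data transfers."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining <= 2 and not self.prefetch_enabled
```

With a burst of 5, the signal is raised after the third word. Exactly two transfers then remain to complete the burst.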

[0535] The prefetch address register 943 is loaded with the physical address of the first word of any prefetch. When the next IBus transaction starts, the prefetch counter is checked. If it indicates that at least one word was successfully prefetched, the first physical address of the transaction is compared with the prefetch address. If they match, the prefetched data is used to satisfy the IBus transaction. A PCI transaction request is then started at the address following the last prefetched word.

[0536] The prefetch counter 941 is a 4-bit counter. It is incremented each time the PMC returns a word during a prefetch operation, up to a count equal to the maximum depth of the input FIFO. When a subsequent IBus transaction matches the prefetch address, the prefetch count is added to the address counter and subtracted from the burst counter. This allows the PCI request to start at the required location. Alternatively, if the IBus transaction needs only part of the prefetched data, the requested burst length is subtracted from the prefetch count and added to the latched prefetch address. The remaining prefetched data is held to satisfy further requests.
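The prefetch reuse rules can be expressed as a small function. This sketch assumes word-granular addresses. It does not model what happens to stale prefetched data when the addresses do not match, which the text leaves unspecified. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prefetch:
    addr: int   # word address of the oldest prefetched word still held
    count: int  # prefetched words available (at most the FIFO depth)


@dataclass
class Plan:
    from_prefetch: int        # words served from the FIFO
    pci_start: Optional[int]  # word address where a new PCI read begins, if any
    pci_words: int            # words still to be read over PCI


def serve_burst(pf: Prefetch, start: int, burst: int) -> Plan:
    if pf.count == 0 or start != pf.addr:
        # No usable prefetch (handling of stale FIFO contents is not modelled).
        return Plan(0, start, burst)
    if burst > pf.count:
        # Use every prefetched word; the PCI request resumes after the last one.
        used = pf.count
        pf.addr, pf.count = pf.addr + used, 0
        return Plan(used, start + used, burst - used)
    # The burst fits inside the prefetch; keep the remainder for later requests.
    pf.addr, pf.count = pf.addr + burst, pf.count - burst
    return Plan(burst, None, 0)
```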

[0537] The data FIFO 944 is an 8-word × 33-bit asynchronous fall-through FIFO. Data from the PMC is written into the FIFO together with a bit indicating whether the data is part of a prefetch. Data at the head of the FIFO is read out and sent to the IBus as soon as it becomes available. The logic that generates the data read signal operates synchronously with clk and generates the IBus acknowledge output. When a transaction is satisfied from prefetched data, a signal from the control logic tells the read logic how many prefetched words to read from the FIFO.

[0538] FIG. 156 shows the structure of the RBus receiver 914 of FIG. 150. Control is split between two state machines 950, 951. The write state machine 951 controls the interface to the RBus. The input address 752 is latched at the start of an RBus burst. Each data word of the burst is written into FIFO 954 together with its byte enables. When FIFO 954 becomes full, the write logic 951 deasserts r-ready, which prevents the organizer from writing any more words.

[0539] The write logic 951 signals the start of an RBus burst to the main state machine 950 via a resynchronized start signal. The upper address bits that form the virtual page number are loaded into counter 957. The virtual page number is sent to the MMU, which returns the physical page number 958. The physical page number and the lower bits of the virtual address are recombined according to the mask and loaded into counter 960, which provides the address for the PCI request to the PMC. The main control logic 950 clocks the data and byte enables for each word of the PCI request out of FIFO 954. The same logic also handles all PMC interface control signals. The main state machine indicates that it is active through a busy signal, which is resynchronized and returned to the write state machine.

[0540] The write state machine 951 detects the end of the RBus burst using r-final. It then stops loading data into FIFO 954 and notifies the main state machine that the RBus burst has ended. The main state machine continues the PCI request until the data FIFO is empty. It then deasserts busy, which allows the write state machine to start the next RBus burst.

[0541] Returning to FIG. 150, the memory management unit 915 translates virtual page numbers into physical page numbers for the IBus driver (IBD) 912 and the RBus receiver (RBR) 914. FIG. 157 shows the memory management unit in detail. A 16-entry translation lookaside buffer (TLB) 970 receives its input data from the TLB address logic 971 and sends its output back to it. The TLB control logic 972, which includes a state machine, receives requests from the RBR or IBD that are buffered in the TLB address logic. On receiving a request, it selects the input source and the operation the TLB is to perform. Valid TLB operations are compare, invalidate, invalidate all, write and read. The TLB input address can come from the IBD and RBR interfaces (for compare operations), from the page table entry buffer 974 (for servicing TLB misses), or from registers in the TLB address logic. The TLB returns the status of each operation to the TLB control logic. The physical page number from a successful compare operation is sent back to the IBD and RBR. The TLB keeps a record of the least recently used (LRU) location. The TLB address logic uses this location as the target for write operations.

[0542] If a compare operation fails, the TLB control logic 972 signals the page table access control logic 976 to start a PCI request. The page table address generator 977 uses the internal page table pointer registers to generate a PCI address from the virtual page number. Data returned by the PCI request is latched into the page table entry buffer 974. When a page table entry matching the requested virtual address is found, the physical page number is sent to the TLB address logic 971. The page table access control logic 976 then signals that the page table access is complete. The TLB control logic 972 writes the new entry into the TLB and starts the compare operation again.

[0543] Register signals to and from the SCI are resynchronized (980) in both directions. These signals go to and come from all submodules. The module memory interface 981 decodes accesses from the standard CBus interface to the TLB and to the page table pointer memory elements. TLB accesses are read-only and use the TLB control logic to obtain the data. The page table pointers are both readable and writable and are accessed directly by the module memory interface. These paths also include synchronization circuits.

[0544] 3.18.11 Peripheral Interface Controller
FIG. 158 shows an example of the peripheral interface controller (PIC) of FIG. 2 in detail. The PIC 237 operates in one of several modes to transfer data to or from external peripheral devices. The basic modes are:
1) Video output mode: data is transferred to the peripheral under the control of an external video clock and clock/data enable. The PIC 237 sends an output clock and a clock enable signal with the timing required for the output data.

[0545] 2) Video input mode: data is transferred from the peripheral under the control of an external video clock and clock/data enable.
3) Centronics mode: data is transferred to and from the peripheral according to the standard protocol defined in IEEE 1284.
Where necessary, the PIC 237 isolates the protocol of the external interface from the internal data sources and destinations. Internal data sources write data into a single output data stream, which is transferred to the external peripheral according to the selected mode. Similarly, all data from the external peripheral is written into a single input data stream. This stream is used to satisfy the requested transaction for one of the possible internal data destinations.

[0546] There are three possible sources of output data: the LMC 236 (using the ABus), the RO 249 (using the RBus), and the general CBus. The PIC 237 responds to transactions from only one of these data sources at a time. A transaction from one source is completed before the next source is considered. In general, only one data source should be active at any given time. If two or more sources become active, they are serviced in priority order: CBus, then ABus, then RBus.
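A sketch of the one-source-at-a-time policy, using the CBus, ABus, RBus priority order from the text. A transaction already in progress is not preempted. The class is hypothetical and ignores the byte-level FIFO mechanics.

```python
from typing import Optional, Set

PRIORITY = ("CBus", "ABus", "RBus")  # highest priority first


class OutputArbiter:
    """One owner at a time; a transfer runs to completion before re-arbitration."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def grant(self, requesting: Set[str]) -> Optional[str]:
        if self.owner is None:
            self.owner = next((s for s in PRIORITY if s in requesting), None)
        return self.owner

    def complete(self) -> None:
        self.owner = None
```

Once the ABus owns the output stream, a later CBus request waits until `complete()` is called, even though the CBus has higher priority.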

[0547] As usual, the module operates under the control of the standard CBus interface 990, which contains the internal registers of the PIC. In addition, the CBus interface 992 can access and control peripheral devices through the coprocessor 224. The ABus interface 991 can also handle memory interactions with the local memory controller. The ABus interface 991, the CBus interface 992 and the result organizer 249 all send data to the output data path 993, which contains a byte-wide FIFO. Access to the output data path is controlled by an arbiter that continually checks which source has priority over, or ownership of, the output stream. Depending on which is enabled, the output data path interfaces to the video output controller 994 or the Centronics controller 997. Each of these modules 994, 997 reads one byte at a time from the internal FIFO of the output data path. The Centronics controller 997 implements a standard Centronics data interface for controlling peripheral devices. The video output controller contains logic that controls the output pads according to the required video output protocol. Similarly, the video input controller 998 contains logic for controlling whatever video input standard is in use. The video input controller 998 sends its output to the input data path unit 999, which likewise consists of a byte-wide input FIFO. Either the video input controller 998 or the Centronics controller 997 writes data into this FIFO asynchronously, one byte at a time.

[0548] The data timer 996 contains various counters. It is used to monitor the current state of the FIFOs in the output data path 993 and the input data path 999.
From the foregoing, it can be seen that the coprocessor can execute two instruction streams to generate multiple images, or multiple parts of a single image, simultaneously. The primary instruction stream is used to produce the output image of the current page. The secondary instruction stream can be used to begin rendering the next page while the primary instruction stream is idle. In the standard mode of operation, the image of the current page is rendered and then compressed using the JPEG coder 241. When the image needs to be printed, the coprocessor 224 uses the JPEG coder 241 again to decompress the JPEG-encoded image. During idle time, when the output device needs no further portions of the decoded JPEG image, instructions for building the next page or band can be executed. In general, this overlap of coprocessor operations increases the rate at which images are produced. In particular, when the coprocessor 224 is used with a printer attached to it, the effective rendering speed increases, which speeds up image processing operations.

[0549] It will be apparent from the foregoing that the preferred embodiment described above is only one embodiment of the present invention. Modifications obvious to those skilled in the art can be made without departing from the scope of the present invention.

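[0550] Appendix A: Coprocessor Microprogramming
This appendix describes in detail the operations performed inside the coprocessor each time a new instruction is executed. All self-configuration performed by the coprocessor during instruction execution is done by reading and writing internal registers. The coprocessor is therefore fully microprogrammable, either through the external CBus interface or by the host through the PCI bus interface. However, microprogramming from the host is generally expected to be difficult because of host synchronization problems. This chapter assumes that the reader is already familiar with the following aspects of the coprocessor:
1. the execution model;
2. the instruction set and instruction coding;
3. the register set;
4. the internal structure.

A.1 General

A.1.1 General points on coprocessor setup
For all instructions other than control instructions and local DMA instructions, the flow of data within the coprocessor is essentially controlled by the pixel organizer. The pixel organizer is responsible for fetching the head of the input data stream, counting the data, and deciding when the last data has been fetched. The other modules in the coprocessor essentially just respond to the data sent to them.

A.1.2 Module configuration order
Not every module is set up for every instruction, and some modules are not configured at all during instruction decoding. Modules are always configured in the order PO, DCC, OOB, OOC, MDP, JC, RO, PIC.

A.1.3 Setting other registers
When an instruction is encoded to include the setting of a register value, microprogramming sets that register according to the following rules:
1. If no other registers of the module that owns the register are being set, the register is set before any other register.
2. If other registers of that module are also being set, the register is set after those other registers, immediately before the module's _cfg register.

A.1.4 Coding of inconsistent instruction operands
Many instructions define the data types of their operands and result, and they return meaningless results if other data types are specified. For each operand, the coprocessor determines the intended operand format as follows:
1. If the internal format of the operand is specified as a single pixel (packed or unpacked bytes), the corresponding operand organizer is configured accordingly. The data cache controller is not configured, so operation continues in normal mode.
2. If the internal format of the operand is specified as "other", the coprocessor derives the operand format from the instruction. This is straightforward for operands B and C. For operand A, "other" is not defined, and the behavior of the coprocessor is undefined. The corresponding operand organizer is placed in bypass mode, and the data cache controller is configured to manage operand data in the derived format.
Microprogramming is reasonably independent from one module to another.

A.1.5 Pseudo-instruction syntax
- The execution order is given by the number at the left.
- Register names are set in Helvetica Bold.
- A register field is written register.field.
- I and D denote the instruction word and the data word currently being decoded, respectively.
- A, B and C denote operand words A, B and C currently being decoded.
- A_deskriptor, B_deskriptor and C_deskriptor denote the descriptors of the data words of the instruction currently being decoded.
- R denotes the result word of the instruction currently being decoded.
- "X:Y" denotes the concatenation of X and Y.
- "@X" denotes coprocessor register number X.
- "Cbus(X)" denotes execution of CBus operation X.
- "^*Cbus(X)" denotes the data received by CBus operation X.
- "^*X" denotes virtual memory address X.
- "??" denotes an unknown or undefined value.
- "set" denotes setting of a data manipulation register.

A.2 Compositing operators
Notes:
1. The major opcodes are 0xC and 0xD.
2. Opacity is assumed to be in the byte at the highest address (that is, the most significant byte).
3. The accumulator or the operands may be premultiplied.
4. The result may be non-premultiplied.
5. The instruction length is defined by the number of input pixels.

A.3 Color space conversion
Notes:
1. The input space is always three-dimensional. By default it consists of the three least significant channels of the pixel; opacity is excluded.
2. The color table format contains either one output channel or four output channels.

A.4 JPEG instructions
Notes:
1. The opcode is 0x2.
2. Operand C may be a register to be set.
3. Many options are available:
- with or without subsampling;
- with or without filtering;
- 1, 3 or 4 scans.
4. These instructions depend on several registers that are set before the instruction executes.

A.4.1 Decompression
Notes:
1. The following registers must be set before the instruction executes:
- ro_idr: output image dimensions register
- ro_cut: output cut register
- ro_lmt: output limit register

A.4.2 Compression
Notes:
1. The following registers must be set before the instruction executes:
- po_idr: output image dimensions register
- jc_rml: restart marker interval
- ro_cut: output cut register
- ro_lmt: output limit register

A.5 Data coding
Notes:
1. All data coding operations are handled in the same way for compression and for decompression. Their setup is almost the same as for JPEG.
2. Possible encoding operations:
- Huffman coding
- predictive coding
3. Possible decoding operations:
- fast Huffman decoding
- slow Huffman decoding
- packbits decoding (version A)
- packbits decoding (version B)
- predictive decoding
4. Operand C may be a register to be set.
5. The following registers must be set before the instruction executes:
- ro_cut: output cut register
- ro_lmt: output limit register

A.6 Transformation and convolution
Notes:
1. The opcodes are 0x4 (convolution) and 0x5 (transformation).
2. The coprocessor performs a superset of the operations required for image transformation and for image convolution.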
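As far as the coprocessor is concerned, the only difference between image transformation and image convolution is that for transformation the kernel step size equals the kernel size (horizontal, vertical), whereas for convolution the step size is one source pixel.
3. Options:
• snapping to the nearest pixel, or interpolation
• whether or not pixels (kernel) are accumulated
• whether or not source pixels are premultiplied
• clamping, wrapping or absolute value of the final result
4. Note: transformation and convolution cannot be performed in place. That is, if the source pointer and the destination pointer are the same, the contents are destroyed.

A.7 Matrix multiplication
Notes:
1. The opcode is 0x3.
2. Options:
• whether or not source pixels are premultiplied
• clamping, wrapping or absolute value of the final result
• operand C may be written to a register

A.8 Halftoning
Notes:
1. The opcode is 0x7.
2. The only option is the number of halftone levels.
3. Halftoning can be applied to pixels or to bytes, as long as the halftone screen is appropriately meshed or unmeshed.

A.9 Memory copy
Notes:
1. The opcode is 0x9.
2. This instruction uses two entirely separate mechanisms to carry out a memory copy:
• General-purpose data transfer instructions use the normal data flow through the coprocessor and can make use of the various functions of the data manipulation units in the PO and RO.
• Peripheral DMA instructions use the direct connection between the PIC and the LMC. This means that no data manipulation is possible, but the instruction can execute concurrently with subsequent instructions.

A.9.1 General-purpose data transfer

A.9.2 Peripheral DMA transfer
Notes:
1. The transfer may or may not execute concurrently; this is handled by the IC.
2. Operand C may be a register to be set.
3. Because the PIC is a module that handles data, this instruction differs from the other "active" instructions.

A.10 Photo CD decompression
This group of instructions consists of three different operations: horizontal interpolation, vertical interpolation and residual merging. Vertical interpolation and residual merging are set up in the same way. The opcode of all of these instructions is 0x9.

A.10.1 Horizontal interpolation
Notes:
1. Can be performed on pixels or bytes.
2. This is a one-operand instruction; operand C may be a register to be set.

A.10.2 Vertical interpolation and residual merging
Notes:
1. Vertical interpolation and residual merging are set up identically.
2. Can be performed on both pixels and bytes.
3. This is a two-operand instruction; operand C may be a register set.

A.11 Control instructions
Notes:
1. Control instructions comprise two kinds of operation: flow control instructions and internal access instructions.

A.11.1 Flow control
Notes:
1. The opcode is 0xB.
2. The flow control instructions currently consist of various jump instructions and various wait instructions.
3. No explicit setup is performed inside the coprocessor, and this is not an "active" instruction; that is, unlike other instructions, no submodule in the coprocessor actually does anything.
4. Operand C may be a register to be set.

A.11.2 Internal access (read)
Notes:
1. The opcode is 0xA.
2. A read instruction transfers data out of the coprocessor.
3. The RO is the only module in the coprocessor that actually does anything.

A.11.3 Internal access (write)
Notes:
1. The opcode is 0xA.
2. A write instruction transfers data into the coprocessor.
3. Because this is not an "active" instruction, no module other than the IC actually does anything.

A.12 Reserved instructions
Notes:
1. Opcodes 0x0 and 0xF are reserved.
2. Reserved instructions raise a maskable error.
3. These reserved instructions are intended to be used for other instructions in future revisions of the coprocessor.

Appendix B: Registers

1.1 Registers and tables
This section describes the coprocessor registers. These registers can be modified in three ways:
1. A specific group of coprocessor instructions is provided for reading and writing registers. Using these instructions, registers are read from or written to memory associated with the local memory interface, by starting an initiator PIC bus cycle or by using a general-purpose interface transaction.
2. Many registers change as a side effect of instruction execution. The main mechanism by which the coprocessor configures itself to execute an instruction is to set various registers to reflect the current state. After an instruction has executed, the registers reflect the state of the coprocessor. Many typical operations are completely specified and set up by an instruction. Some registers must be set immediately before an instruction is executed.

Meaning of "reserved" register bits
"Reserved" has the following meaning for any register or register component:
• Writes to a reserved location are allowed, but the data is discarded.
• Reads from a reserved location are allowed, but the data is undefined.
All registers and register fields that are not specified are "reserved".

1.1.1 Register classification
The registers in the coprocessor are classified according to the behavior described in this section, using the following terms:
• External: access from outside the module through the CBus interface, that is, by the instruction controller or by PCI in target mode through the external CBus interface. Note that registers cannot be set from the PCI bus through bit-set mode.
• Internal: access from inside the module.

Status registers
Status registers are read-only externally and read/write internally.

Config1 registers
Config1 registers are read/write externally and read-only internally. They do not support Type C CBus operations (that is, they do not support bit-set mode) and are used for registers that hold byte-sized (or larger) configuration information, such as address values.

Config2 registers
Config2 registers are also read/write externally and read-only internally. They support Type C CBus operations (that is, bit-set mode) and are used for registers that hold configuration information that must be set bit by bit.

Control1 registers
Control1 registers are read/write both externally and internally. They do not support Type C CBus operations (that is, they do not support bit-set mode) and are used for registers that hold byte-sized (or larger) control information, such as address values.

Control2 registers
Control2 registers are read/write both externally and internally. They support Type C CBus operations (that is, bit-set mode) and are used for registers that hold control information that must be set bit by bit.

Interrupt registers
Bits in an interrupt register can be set to 1 internally and reset to 0 externally by writing 1 to them. The module interrupt/error registers are of this type. A module interrupt/error register consists of three fields:
[7:0] any error conditions (status) generated by the module
[23:8] any exception conditions generated by the module
[31:24] any interrupt conditions generated by the module

1.1.2 Register map
Table 1.1 lists the coprocessor registers. The numbers are register numbers, not addresses.
Table 1.1 Coprocessor registers

1.1.3 Register definitions
General-purpose module registers
Instruction controller registers
l. ic_cfg
The ic_cfg register is divided into three parts. The least significant byte contains global configuration information.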
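The third byte from the least significant end contains the configuration information for stream A, and the most significant byte contains the configuration information for stream B. The reset value of this register is 0x00000000.
m. ic_stat
This register is divided into four sections. The least significant byte holds the internal state of the IC. The second least significant byte holds the decoded result of the current instruction and the current and prefetched instruction streams. The second most significant byte holds all status information for stream A. The most significant byte holds the information for stream B. The reset value of this register is 0x00000000.
n. ic_err_int
This register contains active-high flags indicating whether an interrupt or error has occurred inside the IC. Each bit is cleared by writing 1 to it.
o. ic_err_int_en
This register contains the enable mask for the various errors and interrupts. Its reset value is 0x00000000.
p. ic_ipa
This register holds the most significant 30 bits of the virtual address used to fetch stream A instructions. The two least significant bits are assumed to be 0, because instructions are expected to be aligned. The reset value of this register is 0x00000000.
q. ic_tda
This register holds the "todo" value for stream A: a 32-bit (wrapping) sequence number up to which valid instructions exist. The reset value of this register is 0x00000000.
r. ic_fna
This register holds the "finished" value for stream A.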
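It is a 32-bit (wrapping) sequence number identifying the last completed instruction. The reset value of this register is 0x00000000.
s. ic_inta
This register holds the "interrupt" number for stream A: a 32-bit (wrapping) sequence number at which an interrupt is raised when that mechanism is enabled and armed. The reset value of this register is 0x00000000.
t. ic_loa
This register holds the 32-bit (wrapping) sequence number of the last overlapped instruction executed on stream A. The reset value of this register is 0x00000000.
u. ic_ipb
This register holds the most significant 30 bits of the virtual address used to fetch stream B instructions. The two least significant bits are assumed to be 0, because instructions are expected to be aligned. The reset value of this register is 0x00000000.
v. ic_tdb
This register holds the "todo" value for stream B: a 32-bit (wrapping) number up to which valid instructions exist. The reset value of this register is 0x00000000.
w. ic_fnb
This register holds the "finished" value for stream B.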
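It is a 32-bit (wrapping) sequence number identifying the last completed instruction. The reset value of this register is 0x00000000.
x. ic_intb
This register holds the "interrupt" number for stream B: a 32-bit (wrapping) sequence number at which an interrupt is raised when that mechanism is enabled and armed. The reset value of this register is 0x00000000.
y. ic_lob
This register holds the 32-bit (wrapping) sequence number of the last overlapped instruction executed on stream B. The reset value of this register is 0x00000000.
z. ic_sema
This register is a side-effecting alias of the ic_stat register: reading it has the side effect of requesting the register semaphore for stream A.
aa. ic_semb
This register is a side-effecting alias of the ic_stat register: reading it has the side effect of requesting the register semaphore for stream B.

Input interface registers
ab. iis_cfg
ac. iis_stat
ad. iis_err_int
ae. iis_err_int_en
af. iis_ic_addr
ag. iis_dcc_addr
ah. iis_po_addr
ai. iis_burst
aj. iis_base_addr
ak. iis_test

External interface controller registers
al. eic_cfg
am. eic_stat
an. eic_err_int
The error and interrupt bits of the eic_err_int register can be set only by the EIC and reset only by software. Normal error and interrupt bits are reset by writing 1 to the bit. Error bits that are copies of PCI configuration register bits must be cleared by writing to the PCI configuration register; writing to the copy in eic_err_int has no effect.
ao. eic_err_int_en
ap. eic_test
aq. eic_pob
ar. eic_high_addr
as. eic_wtlb_v
at. eic_wtlb_p
au. eic_mmu_v
Note: unless the MMU has been disabled by a page fault error or by an MMU-to-PCI-bus error, the value of this register may change at any time.
av. eic_mmu_p
Note: unless the MMU has been disabled by a page fault error or by an MMU-to-PCI-bus error, the value of this register may change at any time.
aw. eic_ip_addr
Note: unless the IBD has been disabled by an IBus-to-PCI-bus error, the value of this register may change at any time.
ax. eic_rp_addr
Note: unless the RBR has been disabled by an RBus-to-PCI-bus error, the value of this register may change at any time.
ay. eic_ig_addr
Note: unless the GBC has been disabled by a general-purpose bus error, the value of this register may change at any time.
az. eic_rg_addr
Note: unless the GBC has been disabled by a general-purpose bus error, the value of this register may change at any time.

Alias of the PCI bus configuration space
The 16-word PCI bus configuration space is aliased to the registers at addresses 0xc0 to 0xcf.

Local memory controller registers
ba. lmi_cfg
This register contains many configuration and control bits used to determine the processing mode and parameters of the LMC. When the sdram_1 pin is high, bits that refer specifically to SDRAM operation have no effect. The reset value of this register is 0x20000100, which gives a refresh interval of 3.2 microseconds when the clkin frequency is 80 MHz. All special modes and features are disabled at power-up, and all access rights are set equally to 0. Refresh is enabled at reset, but the other modules are disabled (E = 0). Refresh is not affected by the E bit.
bb. lmi_stat
The status register consists of the module's active and pending bits, together with internal state machine information. The state machines are clocked at twice the CBus interface clock, so two fields are needed to hold the state information for each of the two most recent 80 MHz clock cycles.
bc. lmi_err_int
The error and interrupt status register holds information on interrupt, exception and error conditions. The register is read/write: a read returns the status information, and writing 1 to a particular bit resets that bit. Writing 0 has no effect on the bit. The reset value of this register must be 0x00000000, indicating that no interrupts or errors have occurred. Reserved bits are always 0 and can never change state.
bd. lmi_err_int_en
The error, exception and interrupt enable register is used to enable or disable the error, exception and interrupt signals. The register is read/write. It enables, bit by bit, each error, exception and interrupt in the lmi_err_int register; there is a one-to-one correspondence between the bits of this register and those of lmi_err_int. If a bit in lmi_err_int_en is high, the corresponding bit in lmi_err_int is enabled, and when that bit is high the LMC module error, exception or interrupt signal (c_err, c_exp or c_int) can be generated. If a bit in lmi_err_int_en is cleared, the corresponding bit in lmi_err_int is disabled and cannot generate c_err, c_exp or c_int.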
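The write-1-to-clear status bits and the one-to-one enable mask described above can be modelled in a few lines of Python. This is a hypothetical software sketch, not the LMC logic: the class and method names are invented for illustration, and it assumes the generic module interrupt/error field layout from 1.1.1 ([7:0] errors, [23:8] exceptions, [31:24] interrupts) to decide which of c_err, c_exp or c_int a pending, enabled bit would drive.

```python
# Field masks assumed from the module interrupt/error register layout (1.1.1).
ERR_MASK = 0x000000FF  # bits [7:0]: error conditions
EXP_MASK = 0x00FFFF00  # bits [23:8]: exception conditions
INT_MASK = 0xFF000000  # bits [31:24]: interrupt conditions


class ErrIntRegisterPair:
    """Hypothetical model of an lmi_err_int / lmi_err_int_en register pair."""

    def __init__(self) -> None:
        self.err_int = 0x00000000     # reset value: nothing pending
        self.err_int_en = 0x00000000  # reset value: every source disabled

    def raise_condition(self, bit: int) -> None:
        """Internal side: the module sets a status bit to 1."""
        self.err_int |= 1 << bit

    def write_err_int(self, value: int) -> None:
        """External side: each 1 written clears that bit; 0s leave bits unchanged."""
        self.err_int &= ~value & 0xFFFFFFFF

    def signals(self) -> dict:
        """Derive c_err, c_exp and c_int from bits that are both pending and enabled."""
        active = self.err_int & self.err_int_en
        return {
            "c_err": bool(active & ERR_MASK),
            "c_exp": bool(active & EXP_MASK),
            "c_int": bool(active & INT_MASK),
        }
```

For example, after raising bits 2 and 25 with only bit 25 enabled, `signals()` reports `c_int` alone; writing `1 << 25` to the status register then clears that bit, while writing 0 leaves every bit untouched.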
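Because the LMC has no exceptions, the exp_mask bits of this register have no effect and are all reserved. The reset value of this register is 0x00000000, which disables all error and interrupt sources. Unused bits are always 0 and cannot be set high.
be. lmi_dcfg
This configuration register holds the design parameters that determine the size and configuration of the DRAM chips in use. Its reset value, 0x0007ff80, sets all timing limits to their maximum values.
bf. lmi_mode
This configuration register holds the information written to the SDRAM mode register as part of the initialization process. The register is always readable and writable, and its contents may be written to the SDRAM by setting the initialization bit. Its reset value is 0x0037, a useful default that is required immediately after the power-up precharge or after a level 1 reset. This value sets the read latency to 3 clocks and the burst length to a full page with sequential wrap.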
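After any reset, if the sdram_1 pin is low, the initialization bit is set so that the SDRAM mode register is initially programmed. After the mode register has been written, this bit is automatically cleared to zero.

Peripheral interface registers
bg. pic_cfg
bh. pic_stat
bi. pic_err_int
The error and interrupt bits of the pic_err_int register are set only by the PIC and reset only by software. Each bit is reset by writing 1 to it.
bj. pic_err_int_en
bk. pic_abus_cfg
bl. pic_abus_addr
bm. pic_cent_cfg
When Centronics mode is enabled, the pic_cent_cfg register contains the read/write signals and read-only status signals that control all aspects of the interface.
bn. pic_cent_dir
bo. pic_reverse_cfg
bp. pic_timer0
bq. pic_timer1

Data cache controller registers
br. dcc_cfg1
bs. dcc_cfg2
bt. dcc_stat
bu. dcc_err_int
bv. dcc_err_int_en
bw. dcc_lv0
bx. dcc_lv1
by. dcc_lv2
bz. dcc_lv3
ca. dcc_addr
cb. dcc_raddrb
cc. dcc_raddrc
cd. dcc_test

Operand organizer registers
There are two similar operand organizers: operand organizer B and operand organizer C. The registers for both are described here.
ce. oon_cfg (oob_cfg = 0x70, ooc_cfg = 0x80)
cf. oon_stat (oob_stat = 0x71, ooc_stat = 0x81)
cg. oon_err_int (oob_err_int = 0x72, ooc_err_int = 0x82)
ch. oon_err_int_en (oob_err_int_en = 0x73, ooc_err_int_en = 0x83)
ci. oon_dmr (oob_dmr = 0x74, ooc_dmr = 0x84)
cj. oon_subst (oob_subst = 0x75, ooc_subst = 0x85)
ck. oon_cdp (oob_cdp = 0x76, ooc_cdp = 0x86)
cl. oon_len (oob_len = 0x77, ooc_len = 0x8
cm. oon_said (oob_said = 0x78, ooc_said = 0x88)
cn. oon_tile (oob_tile = 0x79, ooc_tile = 0x89)

Pixel organizer registers
co. po_cfg
cp. po_stat
cq. po_err_int
cr. po_err_int_en
cs. po_dmr
ct. po_subst
cu. po_cdp
cv. po_len
cw. po_said
cx. po_idr
cy. po_muv_valid
cz. po_muv

Main data path registers
da. mdp_cfg: all bits are reset to 0.
db. mdp_stat: all bits are reset to 0.
dc. mdp_err_int: all bits are reset to 0.
dd. mdp_err_int_en: all bits are reset to 0.
de. mdp_test: all bits are reset to 0.
df. mdp_op1: all bits are reset to 0.
dg. mdp_op2: all bits are reset to 0.
dh. mdp_por: all bits are reset to 0.
di. mdp_bi: all bits are reset to 0. The mdp_bi register is used for various purposes in different modes.
dj. mdp_bm: all bits are reset to 0. The mdp_bm register is used for different purposes in different modes.
dk. mdp_len: all bits are reset to 0.

JPEG coder registers
dl. jc_cfg
dm. jc_stat
dn. jc_err_int
do. jc_err_int_en
dp. jc_rsi
dq. jc_decode
dr. jc_res
ds. jc_table_sel

Result organizer registers
dt. ro_cfg
du. ro_stat
dv. ro_err_int
dw. ro_err_int_en
dx. ro_dmr
dy. ro_subst
dz. ro_cdp
ea. ro_len
eb. ro_sa
ec. ro_idr
ed. ro_vbase
ee. ro_cut
ef. ro_lmt

Alias of the PCI configuration space
The PCI configuration space is a 256-byte block of registers defined by PCI that allows the host to configure a PCI device and read its status. It is accessed using PCI configuration cycles. The registers are also mirrored in a read-only area of the coprocessor's internal memory, so they can also be read using normal PCI memory cycles. The format of the configuration space implemented in the EIC is shown in Table 1.141.1.
Table 1.141.1 Coprocessor PCI configuration space layout
Reserved registers, and reserved bits in implemented registers, return 0 on reads and are unaffected by writes. Configuration space addresses in the range 0x40 to 0xff are also reserved. No vendor-specific configuration registers are defined.
eg. Vendor ID
This register is read-only. The CISRA vendor ID is 0x11AC.
eh. Device ID
This register is read-only. The coprocessor's device ID is 0x0001. The device ID field is split into two 8-bit fields: the most significant 8 bits are a number identifying the device type (0x0 for the coprocessor), and the least significant 8 bits give the version number of that device (0x1 for this version of the coprocessor).
ei. Command register
The fields of the command register are defined in Table 1.142. All non-reserved bits of this register are readable and writable. After reset this register is set to 0x0000.
ej. Status register
The fields of the status register are defined in Table 1.143. Reads of this register behave normally. Some bits of this register are read-only. The other bits are set to 1 only by the coprocessor and reset to 0 only by the host (except in test mode). The host resets a bit by writing 1 to it; writing 0 has no effect. After reset this register is set to 0x0280.
ek. Revision ID
This is a read-only register. The coprocessor's initial revision ID is 0x01.
el. Class code
This is a read-only register. Because the coprocessor does not fit any class code defined by the PCI SIG, this register is set to 0xFF0000.
em. Cache line size
This is a read/write register that specifies the system cache line size in units of 32-bit words.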
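The coprocessor uses it to determine when to use the Memory Read Line and Memory Read Multiple commands. The coprocessor supports values from 0 to 255 in this register. A value of 0 disables the Memory Read Line and Memory Read Multiple command forms. This register is set to 0x00 at reset.
en. Latency timer
This read/write register specifies the maximum number of clocks the CPU may use for any PCI transaction. The coprocessor supports values from 0 to 255 in this register. This register is set to 0x00 at reset.
eo. Header type
This read-only register is set to 0x00, which means that the coprocessor uses a type 0 configuration space layout.
ep. Base address
This read/write register is used to place the coprocessor's internal registers, internal memory, local memory and general-purpose interface in the host's memory map. The coprocessor's resources occupy 64 MB (not all of which is used), so only the top 6 bits of this register are writable. All the remaining bits are hardwired to 0.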
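Because only bits [31:26] of the base address register can be written, host software can discover the 64 MB window with the usual PCI sizing probe: write all ones, read the value back, mask off the four low control bits, invert and add one. The Python sketch below simulates that probe against a hypothetical stand-in for the register; the names are illustrative, and the model covers only the behaviour stated here (writable top 6 bits, every other bit reading as 0).

```python
WRITABLE_MASK = 0xFC000000  # only the top 6 bits [31:26] are writable


class BaseAddressRegister:
    """Hypothetical software stand-in for the coprocessor's base address register."""

    def __init__(self) -> None:
        self.value = 0x00000000

    def write(self, data: int) -> None:
        # Bits outside [31:26] are hardwired to 0, so they ignore writes.
        self.value = data & WRITABLE_MASK

    def read(self) -> int:
        return self.value


def probe_window_size(bar: BaseAddressRegister) -> int:
    """PCI sizing probe: write all ones, read back, invert the address bits, add one."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    bar.write(saved)  # restore the original mapping
    address_bits = readback & 0xFFFFFFF0  # drop the 4 low read-only control bits
    return ((~address_bits) & 0xFFFFFFFF) + 1
```

Writing an unaligned address such as 0xE4123456 leaves 0xE4000000 in the register, so the coprocessor's window always starts on a 64 MB boundary, and `probe_window_size` returns 0x04000000 (64 MB).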
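The low 4 bits of this register are read-only control bits and are also hardwired to 0. This means that the register refers to memory space, that the coprocessor can be mapped anywhere in the host's 32-bit address space, and that the coprocessor's resources are not prefetchable when they are the target.
eq. Subsystem vendor ID
This read-only register allows the host to identify the vendor of the PCI board installed in the system (as opposed to the vendor of the component that implements the PCI interface on the board). Its contents are loaded through the EIC configuration serial port at reset.
er. Subsystem ID
This read-only register allows the host to identify the PCI board installed in the system. Its contents are loaded through the EIC configuration serial port at reset. This mechanism allows information needed about the board's function or configuration to be encoded externally and read by the host.
es. Interrupt line
This read/write register allows system software to record interrupt line routing information, which can then be accessed by interrupt service software. It has no effect on operation within the coprocessor. This register is set to 0x00 at reset.
et. Interrupt pin
This read-only register is hardwired to 0x01, indicating that the coprocessor uses the PCI inta_1 interrupt pin.
eu. Min_Gnt
This read-only register tells the host the burst period length that the coprocessor requires, in units of 1/4 microsecond.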
The optimal value of this register has not yet been determined.
ev. Max_Lat
This read-only register tells the host the maximum latency for gaining control of the PCI bus that the coprocessor requires, in units of 1/4 microsecond. The optimal value of this register has not yet been determined.

1.1.4 Internal memory map
This section describes in detail the objects that occur in the per-module data areas of the coprocessor's internal memory map.

1.1.5 Memory word fields
a. eic_ptp

Appendix A Coprocessor microprogramming
This section describes in detail the operations performed in the coprocessor each time a new instruction is executed. All self-configuration that the coprocessor performs during instruction execution is implemented by reading and writing internal registers.
The coprocessor is therefore fully microprogrammable by a host through the external CBus interface or the PCI bus interface. However, microprogramming from the host is expected to be difficult in some cases because of host synchronization issues. This chapter assumes that the reader already has sufficient knowledge of the following:
1. Execution model
2. Instruction set and coding
3. Register set
4. Internal structure

A.1 General

A1.1 General points about coprocessor setup
For all instructions except the control instructions and the local DMA instructions, the data flow in the coprocessor is essentially under the control of the pixel organizer. The pixel organizer is responsible for starting the first data transfer of the input data stream, counting the data, and determining when the last data has been fetched. The other modules in the coprocessor essentially just respond to the data sent to them.

A1.2 Module configuration order
Not every module is set up for every instruction; depending on the instruction being decoded, some modules are not configured at all. Modules are always configured in the order PO, DCC, OOB, OOC, MDP, JC, RO, PIC.

A1.3 Other register settings
If an instruction is coded to include the setting of particular register values, the microprogram sets those registers in the following order:
1. If the module that owns the register has no other register settings, the register is set before any other register setting.
2. If the module that owns the register also has other register settings, the register is set after those settings, immediately before the module's _cfg register.

A1.4 Coding of inconsistent instruction operands
Many instructions specify the data types of their operands and results, and return meaningless results if another data type is given. For each operand, the coprocessor uses the following procedure to determine the required operand format:
1. If the internal format of the operand is pixels (compressed or uncompressed bytes), the corresponding operand organizer is set to reflect this. The data cache controller is not configured and therefore continues to operate in normal mode.
2. If the internal format of the operand is "other format", the coprocessor generates the operand format specialized for it. This applies to operands B and C; other formats are not originally provided for operand A, and the coprocessor's behavior in that case is undefined. The corresponding operand organizer enters bypass mode, and the data cache controller is set up to manage the operand data in the resulting format.
Microprogramming of the various modules is reasonably independent of one another.
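The ordering rules in A1.2 and A1.3 can be sketched as a small scheduling function. This is a hypothetical illustration, not the actual microprogram: it assumes each configured module is described by the list of registers it writes for the instruction, that register names begin with the module prefix (po_, dcc_, oob_, ooc_, mdp_, jc_, ro_, pic_), that a module's configuration register is named `<prefix>_cfg`, and it reads rule 1 as placing such writes ahead of all module configuration.

```python
# A1.2: modules are always configured in this fixed order.
MODULE_ORDER = ["po", "dcc", "oob", "ooc", "mdp", "jc", "ro", "pic"]


def schedule_register_writes(module_setup, extra_writes):
    """Return register names in the order a microprogram would write them.

    module_setup: hypothetical dict mapping a module prefix to the registers it
        configures for this instruction (including its "<prefix>_cfg" register).
    extra_writes: register names coded into the instruction, e.g. "ro_cut".
    """
    early = []  # A1.3 rule 1: owning module is not otherwise configured
    late = {}   # A1.3 rule 2: written just before the owning module's _cfg
    for reg in extra_writes:
        module = reg.split("_", 1)[0]
        if module_setup.get(module):
            late.setdefault(module, []).append(reg)
        else:
            early.append(reg)

    order = list(early)
    for module in MODULE_ORDER:
        regs = module_setup.get(module, [])
        if not regs:
            continue  # module not configured for this instruction
        cfg = f"{module}_cfg"
        order += [r for r in regs if r != cfg]  # the module's other registers
        order += late.get(module, [])           # instruction-coded registers
        if cfg in regs:
            order.append(cfg)                   # _cfg register always last
    return order
```

With PO and RO configured and the extra writes ro_cut and jc_rml, the sketch yields jc_rml first (JC is not otherwise configured), then po_dmr, po_cfg, ro_dmr, ro_cut and finally ro_cfg.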
A1.5 Pseudo-instruction syntax
• The order of execution of the instructions is determined by the number at the far left.
• Register names are written in Helvetica Bold.
• A register field is denoted register.field.
• I and D denote the instruction word and the data word, respectively, of the instruction currently being decoded.
• A, B and C denote operand word A, operand word B and operand word C of the instruction currently being decoded.
• A_descriptor, B_descriptor and C_descriptor denote the descriptors of the data words of the instruction currently being decoded.
• R denotes the result word of the instruction currently being decoded.
• "X:Y" denotes the concatenation of X and Y.
• "@X" denotes coprocessor register number X.
• "Cbus(X)" denotes execution of C-bus operation X.
• "^*Cbus(X)" denotes the data received by C-bus operation X.
• "^*X" denotes virtual memory address X.
• "??" denotes an unknown or undetermined value.
• "set" denotes the setting of a data manipulation register.
A.2 Compositing operators
Notes:
1. The main opcodes are 0xC and 0xD.
2. Where ambiguous, the byte at the highest address (that is, the most significant byte) is assumed.
3. The accumulator or the operands may be premultiplied.
4. The result may be non-premultiplied.
5. The instruction length is defined by the number of input pixels.

A.3 Color space conversion
Notes:
1. The input space is always three-dimensional. By default it consists of the three least significant channels of the pixel, so there is no ambiguity.
2. The color table has one of two formats: it contains either one output channel or four output channels.

A.4 JPEG instructions
Notes:
1. The opcode is 0x2.
2. Operand C may be a register to be set.
3. There are many options:
• with or without subsampling
• with or without filtering
• 1, 3 or 4 scans
4. These instructions depend on several registers that are set before the instruction is executed.

A.4.1 Decompression
Notes:
1. The following registers must be set before the instruction is executed:
• ro_idr: output image dimensions register
• ro_cut: output cut register
• ro_lmt: output limit register

A.4.2 Compression
Notes:
1. The following registers must be set before the instruction is executed:
• po_idr: output image dimensions register
• jc_rml: restart marker interval
• ro_cut: output cut register
• ro_lmt: output limit register

A.5 Data coding
Notes:
1. All data coding operations are handled in the same way for compression and decompression. Their setup is almost the same as for JPEG.
2. Possible encoding operations:
• Huffman coding
• predictive coding
3. Possible decoding operations:
• fast Huffman decoding
• slow Huffman decoding
• packbits decoding (version A)
• packbits decoding (version B)
• predictive decoding
4. Operand C may be a register to be set.
5. The following registers must be set before the instruction is executed:
• ro_cut: output cut register
• ro_lmt: output limit register

A.6 Transformation and convolution
1. The opcodes are 0x4 (convolution) and 0x5 (transformation).
2. The coprocessor performs a superset of the operations required for image transformation and for image convolution. As far as the coprocessor is concerned, the only difference between image transformation and image convolution is that for transformation the kernel step size equals the kernel size (horizontal, vertical), whereas for convolution the step size is one source pixel.
3. Options:
• snapping to the nearest pixel, or interpolation
• whether or not pixels (kernel) are accumulated
• whether or not source pixels are premultiplied
• clamping, wrapping or absolute value of the final result
4. Note: transformation and convolution cannot be performed in place. That is, if the source pointer and the destination pointer are the same, the contents are destroyed.

A.7 Matrix multiplication
Notes:
1. The opcode is 0x3.
2. Options:
• whether or not source pixels are premultiplied
• clamping, wrapping or absolute value of the final result
• operand C may be written to a register

A.8 Halftoning
Notes:
1. The opcode is 0x7.
2. The only option is the number of halftone levels.
3. Halftoning can be applied to pixels or to bytes, as long as the halftone screen is appropriately meshed or unmeshed.

A.9 Memory copy
Notes:
1. The opcode is 0x9.
2. This instruction uses two entirely separate mechanisms to carry out a memory copy:
• General-purpose data transfer instructions use the normal data flow through the coprocessor and can make use of the various functions of the data manipulation units in the PO and RO.
• Peripheral DMA instructions use the direct connection between the PIC and the LMC. This means that no data manipulation is possible, but the instruction can execute concurrently with subsequent instructions.

A.9.1 General-purpose data transfer

A.9.2 Peripheral DMA transfer
Notes:
1. The transfer may or may not execute concurrently; this is handled by the IC.
2. Operand C may be a register to be set.
3. Because the PIC is a module that handles data, this instruction differs from the other "active" instructions.

A.10 Photo CD decompression
This group of instructions consists of three different operations: horizontal interpolation, vertical interpolation and residual merging. Vertical interpolation and residual merging are set up in the same way. The opcode of all of these instructions is 0x9.

A.10.1 Horizontal interpolation
Notes:
1. Can be performed on pixels or bytes.
2. This is a one-operand instruction; operand C may be a register to be set.

A.10.2 Vertical interpolation and residual merging
Notes:
1. Vertical interpolation and residual merging are set up identically.
2. Can be performed on both pixels and bytes.
3. This is a two-operand instruction; operand C may be a register set.

A.11 Control instructions
Notes:
1. Control instructions comprise two kinds of operation: flow control instructions and internal access instructions.

A.11.1 Flow control
Notes:
1. The opcode is 0xB.
2. The flow control instructions currently consist of various jump instructions and various wait instructions.
3. No explicit setup is performed inside the coprocessor, and this is not an "active" instruction; that is, unlike other instructions, no submodule in the coprocessor actually does anything.
4. Operand C may be a register to be set.

A.11.2 Internal access (read)
Notes:
1. The opcode is 0xA.
2. A read instruction transfers data out of the coprocessor.
3. The RO is the only module in the coprocessor that actually does anything.

A.11.3 Internal access (write)
Notes:
1. The opcode is 0xA.
2. A write instruction transfers data into the coprocessor.
3. Because this is not an "active" instruction, no module other than the IC actually does anything.

A.12 Reserved instructions
Notes:
1. Opcodes 0x0 and 0xF are reserved.
2. Reserved instructions raise a maskable error.
3. These reserved instructions are intended to be used for other instructions in future revisions of the coprocessor.

Appendix B: Registers

1.1 Registers and tables
This section describes the coprocessor registers.
These registers can be modified in three ways:
1. A specific group of coprocessor instructions is provided for reading and writing registers. Using these instructions, registers are read from or written to memory associated with the local memory interface, by starting an initiator PIC bus cycle or by using a general-purpose interface transaction.
2. Many registers change as a side effect of instruction execution. The main mechanism by which the coprocessor configures itself to execute an instruction is to set various registers to reflect the current state. After an instruction has executed, the registers reflect the state of the coprocessor. Many typical operations are completely specified and set up by an instruction. Some registers must be set immediately before an instruction is executed.

Meaning of "reserved" register bits
"Reserved" has the following meaning for any register or register component:
• Writes to a reserved location are allowed, but the data is discarded.
• Reads from a reserved location are allowed, but the data is undefined.
All registers and register fields that are not specified are "reserved".

1.1.1 Register classification
The registers in the coprocessor are classified according to the behavior described in this section, using the following terms:
• External: access from outside the module through the CBus interface, that is, by the instruction controller or by PCI in target mode through the external CBus interface. Note that registers cannot be set from the PCI bus through bit-set mode.
• Internal: access from inside the module.

Status registers
Status registers are read-only externally and read/write internally.

Config1 registers
Config1 registers are read/write externally and read-only internally. They do not support Type C CBus operations (that is, they do not support bit-set mode) and are used for registers that hold byte-sized (or larger) configuration information, such as address values.

Config2 registers
Config2 registers are also read/write externally and read-only internally. They support Type C CBus operations (that is, bit-set mode) and are used for registers that hold configuration information that must be set bit by bit.

Control1 registers
Control1 registers are read/write both externally and internally. They do not support Type C CBus operations (that is, they do not support bit-set mode) and are used for registers that hold byte-sized (or larger) control information, such as address values.

Control2 registers
Control2 registers are read/write both externally and internally. They support Type C CBus operations (that is, bit-set mode) and are used for registers that hold control information that must be set bit by bit.

Interrupt registers
Bits in an interrupt register can be set to 1 internally and reset to 0 externally by writing 1 to them. The module interrupt/error registers are of this type. A module interrupt/error register consists of three fields:
[7:0] any error conditions (status) generated by the module
[23:8] any exception conditions generated by the module
[31:24] any interrupt conditions generated by the module

1.1.2 Register map
Table 1.1 lists the coprocessor registers. The numbers are register numbers, not addresses.
Table 1.1 Coprocessor registers

1.1.3 Register definitions
General-purpose module registers
Instruction controller registers
l. ic_cfg
The ic_cfg register is divided into three parts. The least significant byte contains global configuration information.
The third byte from the least significant end contains the configuration information for stream A, and the most significant byte contains the configuration information for stream B. The reset value of this register is 0x00000000.
m. ic_stat
This register is divided into four sections. The least significant byte holds the internal state of the IC. The second least significant byte holds the decoded result of the current instruction and the current and prefetched instruction streams. The second most significant byte holds all status information for stream A. The most significant byte holds the information for stream B. The reset value of this register is 0x00000000.
n. ic_err_int
This register contains active-high flags indicating whether an interrupt or error has occurred inside the IC. Each bit is cleared by writing 1 to it.
o. ic_err_int_en
This register contains the enable mask for the various errors and interrupts. Its reset value is 0x00000000.
p. ic_ipa
This register holds the most significant 30 bits of the virtual address used to fetch stream A instructions. The two least significant bits are assumed to be 0, because instructions are expected to be aligned. The reset value of this register is 0x00000000.
q. ic_tda
This register holds the "todo" value for stream A: a 32-bit (wrapping) sequence number up to which valid instructions exist. The reset value of this register is 0x00000000.
r. ic_fna
This register holds the "finished" value for stream A.
It is a 32-bit (wrapping) sequence number identifying the last completed instruction. The reset value of this register is 0x00000000.
s. ic_inta
This register holds the "interrupt" number for stream A: a 32-bit (wrapping) sequence number at which an interrupt is raised when that mechanism is enabled and armed. The reset value of this register is 0x00000000.
t. ic_loa
This register holds the 32-bit (wrapping) sequence number of the last overlapped instruction executed on stream A. The reset value of this register is 0x00000000.
u. ic_ipb
This register holds the most significant 30 bits of the virtual address used to fetch stream B instructions. The two least significant bits are assumed to be 0, because instructions are expected to be aligned. The reset value of this register is 0x00000000.
v. ic_tdb
This register holds the "todo" value for stream B: a 32-bit (wrapping) number up to which valid instructions exist. The reset value of this register is 0x00000000.
w. ic_fnb
This register holds the "finished" value for stream B.
It is a 32-bit (wrapping) sequence number identifying the last completed instruction. The reset value of this register is 0x00000000.
x. ic_intb
This register holds the "interrupt" number for stream B: a 32-bit (wrapping) sequence number at which an interrupt is raised when that mechanism is enabled and armed. The reset value of this register is 0x00000000.
y. ic_lob
This register holds the 32-bit (wrapping) sequence number of the last overlapped instruction executed on stream B. The reset value of this register is 0x00000000.
z. ic_sema
This register is a side-effecting alias of the ic_stat register: reading it has the side effect of requesting the register semaphore for stream A.
aa. ic_semb
This register is a side-effecting alias of the ic_stat register: reading it has the side effect of requesting the register semaphore for stream B.

Input interface registers
ab. iis_cfg
ac. iis_stat
ad. iis_err_int
ae. iis_err_int_en
af. iis_ic_addr
ag. iis_dcc_addr
ah. iis_po_addr
ai. iis_burst
aj. iis_base_addr
ak. iis_test

External interface controller registers
al. eic_cfg
am. eic_stat
an. eic_err_int
The error and interrupt bits of the eic_err_int register can be set only by the EIC and reset only by software. Normal error and interrupt bits are reset by writing 1 to the bit. Error bits that are copies of PCI configuration register bits must be cleared by writing to the PCI configuration register; writing to the copy in eic_err_int has no effect.
ao. eic_err_int_en
ap. eic_test
aq. eic_pob
ar. eic_high_addr
as. eic_wtlb_v
at. eic_wtlb_p
au. eic_mmu_v
Note: unless the MMU has been disabled by a page fault error or by an MMU-to-PCI-bus error, the value of this register may change at any time.
av. eic_mmu_p
Note: unless the MMU has been disabled by a page fault error or by an MMU-to-PCI-bus error, the value of this register may change at any time.
aw. eic_ip_addr
Note: unless the IBD has been disabled by an IBus-to-PCI-bus error, the value of this register may change at any time.
ax. eic_rp_addr
Note: unless the RBR has been disabled by an RBus-to-PCI-bus error, the value of this register may change at any time.
ay. eic_ig_addr
Note: unless the GBC has been disabled by a general-purpose bus error, the value of this register may change at any time.
az. eic_rg_addr
Note: unless the GBC has been disabled by a general-purpose bus error, the value of this register may change at any time.

Alias of the PCI bus configuration space
The 16-word PCI bus configuration space is aliased to the registers at addresses 0xc0 to 0xcf.

Local memory controller registers
ba. lmi_cfg
This register contains many configuration and control bits used to determine the processing mode and parameters of the LMC. When the sdram_1 pin is high, bits that refer specifically to SDRAM operation have no effect. The reset value of this register is 0x20000100, which gives a refresh interval of 3.2 microseconds when the clkin frequency is 80 MHz. All special modes and features are disabled at power-up, and all access rights are set equally to 0. Refresh is enabled at reset, but the other modules are disabled (E = 0). Refresh is not affected by the E bit.
bb. lmi_stat
The status register consists of the module's active and pending bits, together with internal state machine information. The state machines are clocked at twice the CBus interface clock, so two fields are needed to hold the state information for each of the two most recent 80 MHz clock cycles.
bc. lmi_err_int
The error and interrupt status register holds information on interrupt, exception and error conditions. The register is read/write: a read returns the status information, and writing 1 to a particular bit resets that bit. Writing 0 has no effect on the bit. The reset value of this register must be 0x00000000, indicating that no interrupts or errors have occurred. Reserved bits are always 0 and can never change state.
bd. lmi_err_int_en
The error, exception and interrupt enable register is used to enable or disable the error, exception and interrupt signals. The register is read/write. It enables, bit by bit, each error, exception and interrupt in the lmi_err_int register; there is a one-to-one correspondence between the bits of this register and those of lmi_err_int. If a bit in lmi_err_int_en is high, the corresponding bit in lmi_err_int is enabled, and when that bit is high the LMC module error, exception or interrupt signal (c_err, c_exp or c_int) can be generated. If a bit in lmi_err_int_en is cleared, the corresponding bit in lmi_err_int is disabled and cannot generate c_err, c_exp or c_int.
Since there is no exception in LMC, the exp_m
The ask bits have no effect and are all reserved. this
Register reset values are used for all error and interrupt sources.
Is 0x0000000000 to invalidate. Not used
Bit is always 0 and cannot be set high.
Absent. be. lmi_dcfg This configuration register is a DRAM chip
Determine the size and configuration when using
Holds design parameters to be specified. This register is
Reset to maximize all timing limit values.
Hold the default value 0x0007ff80. bf. lmi_mode register This configuration register is part of the initialization process.
Information written to the SDRAM mode register as a ring
Hold. This register is always readable and writable,
Write to SDRAM by setting
It's fine. This register stores the reset value 0x0037
Have. This useful default value is power-up precharge
It is required immediately after the reset of level 1 or after. This
This sets the read delay to 3 clocks and the burst length
Set to a full page using sequential wrap.
After any reset, if the sdram_1 pin is low, the SDRAM mode register is programmed initially: the initialization bit is set to start the programming, and after the mode register has been written this bit is automatically cleared to zero.
Peripheral interface registers
bg. pic_cfg
bh. pic_stat
bi. pic_err_int The error and interrupt bits of the pic_err_int register are set only by the PIC. Each bit can be reset only by writing 1 to it.
bj. pic_err_int_en
bk. pic_abus_cfg
bl. pic_abus_addr
bm. pic_cent_cfg When Centronics mode is enabled, the pic_cent_cfg register contains the read/write control signals and read-only status signals for all aspects of the interface.
bn. pic_cent_dir
bo. pic_reverse_cfg
bp. pic_timer0
bq. pic_timer1
Data cache controller registers
br. dcc_cfg1
bs. dcc_cfg2
bt. dcc_stat
bu. dcc_err_int
bv. dcc_err_int_en
bw. dcc_lv0
bx. dcc_lv1
by. dcc_lv2
bz. dcc_lv3
ca. dcc_addr
cb. dcc_raddrb
cc. dcc_raddrc
cd. dcc_test
Operand organizer registers
There are two identical operand organizers: operand organizer B and operand organizer C. The registers of both are described here.
ce. oon_cfg (oob_cfg = 0x70, ooc_cfg = 0x80)
cf. oon_stat (oob_stat = 0x71, ooc_stat = 0x81)
cg. oon_err_int (oob_err_int = 0x72, err_int = 0x82)
ch. oon_err_int_en (oob_err_int_en = 0x73, err_int_en = 0x83)
ci. oon_dmr (oob_dmr = 0x74, ooc_dmr = 0x84)
cj. oon_subst (oob_subst = 0x75, ooc_subst = 0x85)
ck. oon_cdp (oob_cdp = 0x76, ooc_cdp = 0x86)
cl. oon_len (oob_len = 0x77, ooc_len = 0x8
cm. oon_said (oob_said = 0x78, ooc_said = 0x88)
cn. oon_tile (oob_tile = 0x79, ooc_tile = 0x89)
Pixel organizer registers
co. po_cfg
cp. po_stat
cq. po_err_int
cr. po_err_int_en
cs. po_dmr
ct. po_subst
cu. po_cdp
cv. po_len
cw. po_said
cx. po_idr
cy. po_muv_valid
cz. po_muv
Main data path registers
da. mdp_cfg All bits are reset to 0.
db. mdp_stat All bits are reset to 0.
dc. mdp_err_int All bits are reset to 0.
dd. mdp_err_int_en All bits are reset to 0.
de. mdp_test All bits are reset to 0.
df. mdp_op1 All bits are reset to 0.
dg. mdp_op2 All bits are reset to 0.
dh. mdp_por All bits are reset to 0.
di. mdp_bi All bits are reset to 0. The mdp_bi register is used for different purposes in different modes.
dj. mdp_bm All bits are reset to 0. The mdp_bm register is used for different purposes in different modes.
dk. mdp_len All bits are reset to 0.
JPEG coder registers
dl. jc_cfg
dm. jc_stat
dn. jc_err_int
do. jc_err_int_en
dp. jc_rsi
dq. jc_decode
dr. jc_res
ds. jc_table_sel
Result organizer registers
dt. ro_cfg
du. ro_stat
dv. ro_err_int
dw. ro_err_int_en
dx. ro_dmr
dy. ro_subst
dz. ro_cdp
ea. ro_len
eb. ro_sa
ec. ro_idr
ed. ro_vbase
ee. ro_cut
ef. ro_lmt
Alias of PCI configuration space
The PCI configuration space is a 256-byte block of registers, defined by the PCI specification, used to configure PCI devices and read their status. It is accessed using PCI configuration cycles. The registers are also mirrored in a dedicated area of the coprocessor's internal memory map, so they can also be read using normal memory cycles. The configuration space format implemented in the EIC is shown in Table 1.141.1.
Table 1.141.1 Coprocessor PCI configuration space layout
Reserved registers, and reserved bits within implemented registers, return 0 when read; writing to them has no effect. Configuration space addresses in the range 0x40-0xff are also reserved, and there are no vendor-specific configuration registers.
eg. Vendor ID This register is read-only. CISRA's vendor ID is 0x11AC.
eh. Device ID This register is read-only. The coprocessor's device ID is 0x0001. The device ID field is divided into two 8-bit fields. The most significant 8 bits are a number indicating the type of device (0x0 denotes the coprocessor). The least significant 8 bits are the version number (0x1 is the coprocessor version).
ei. Command register Table 1.142 defines the fields of the command register. All non-reserved bits in this register are readable and writable. After reset, the register is set to 0x0000.
ej. Status register Table 1.143 defines the fields of the status register. The register is read and written in the usual way. Some of its bits are read-only. The other bits are set to 1 only by the coprocessor and reset to 0 only by the host (except in test mode). The host resets a bit by writing 1 to it; writing 0 has no effect. After reset, this register is set to 0x0280.
ek. Revision ID This is a read-only register. The initial revision ID of the coprocessor is 0x01.
el. Class code This is a read-only register. Because the coprocessor does not belong to any of the classes defined by the PCI SIG, this register is set to 0xFF0000.
em. Cache line size This readable and writable register specifies the system cache line size in 32-bit words. The coprocessor uses this value to decide when to use the Memory Read Line and Memory Read Multiple commands. The coprocessor supports values from 0 to 255 in this register. A value of 0 disables the Memory Read Line and Memory Read Multiple forms of read. The register is set to 0x00 at reset.
en. Latency timer This readable and writable register specifies the maximum number of clocks for all PCI transactions. The coprocessor supports values from 0 to 255 in this register. It is set to 0x00 at reset.
eo. Header type This read-only register is set to 0x00, which means that the coprocessor uses the type 0 configuration space layout.
ep. Base address This readable and writable register places the coprocessor's internal registers, internal memory, local memory, and general-purpose interface in the host memory map. The coprocessor resources occupy 64 MB (not all of which is used), so only the most significant 6 bits of the register can be written. The remaining bits are all hardwired to zero.
The lower 4 bits of this register are read-only control bits and are also hardwired to zero. This means that the register refers to memory space, that the coprocessor can be mapped anywhere in the host's 32-bit address space, and that the coprocessor resources are not prefetchable.
eq. Subsystem vendor ID This read-only register identifies the vendor of the PCI board on which the coprocessor is implemented in the system (the vendor of the board carrying the component on its PCI interface). The contents of this register are loaded at reset through the EIC configuration serial port.
er. Subsystem ID This read-only register identifies the PCI board on which the coprocessor is implemented in the system. It is loaded at reset through the configuration serial port. This mechanism lets the host read external information required for board function or configuration.
es. Interrupt line This register is readable and writable by system software. It records interrupt line routing information, which interrupt service software can then access. It has no effect on processing in the coprocessor. This register is set to 0x00 at reset.
et. Interrupt pin This read-only register is hardwired to 0x01, which indicates that the coprocessor uses the a_1 interrupt pin.
eu. Min_Gnt This read-only register indicates to the host the burst period length, in units of 1/4 microsecond, that the coprocessor requires. The optimal value for this register has not yet been determined.
ev. Max_Lat This read-only register indicates to the host, in units of 1/4 microsecond, the maximum latency for PCI bus access that the coprocessor requires. The optimal value for this register has not yet been determined.
1.1.4 Internal memory map
This section describes in detail the objects that occur in the pre-rule data area.
1.1.5 Memory word fields
a. eic_ptp

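Several of the register rules described above can be checked in a few lines of software. These are the write-1-to-clear behaviour of lmi_err_int and the PCI status register, the lmi_err_int_en masking, the lmi_mode reset value, the split device ID, the 64 MB base address register and the 1/4-microsecond units of Min_Gnt and Max_Lat. The Python below is a hypothetical model, not part of the specification, and all class and function names are invented. The `group` masks that would select c_err, c_exp or c_int are assumptions, since the text gives no bit assignments. `decode_sdram_mode` assumes the standard JEDEC SDR SDRAM mode-register layout (bits 2:0 burst length, bit 3 burst type, bits 6:4 CAS latency), which the text does not state explicitly.

```python
class StatusWithEnable:
    """Hypothetical write-1-to-clear status register paired with a per-bit enable register."""

    def __init__(self):
        self.status = 0x00000000  # reset value: no errors or interrupts pending
        self.enable = 0x00000000  # reset value: every source disabled

    def hw_set(self, bits):
        # The module (hardware) raises status bits.
        self.status |= bits & 0xFFFFFFFF

    def host_write(self, value):
        # Writing 1 clears the corresponding bit; writing 0 leaves it unchanged.
        self.status &= ~value & 0xFFFFFFFF

    def output(self, group):
        # An output (e.g. c_err or c_int) fires only for bits that are set AND enabled.
        # `group` is an assumed mask selecting which bits drive that output.
        return (self.status & self.enable & group) != 0


def decode_sdram_mode(value):
    """Decode a mode-register value, assuming the JEDEC SDR SDRAM layout."""
    lengths = {0b000: 1, 0b001: 2, 0b010: 4, 0b011: 8, 0b111: "full page"}
    return {
        "burst_length": lengths.get(value & 0b111, "reserved"),
        "burst_type": "interleaved" if value & 0b1000 else "sequential",
        "cas_latency": (value >> 4) & 0b111,
    }


def split_device_id(device_id):
    """Upper byte: device type; lower byte: version number."""
    return (device_id >> 8) & 0xFF, device_id & 0xFF


def bar_readback(written, writable_bits=6):
    """Only the top `writable_bits` bits keep what was written; the rest read as 0."""
    return written & (((1 << writable_bits) - 1) << (32 - writable_bits))


def bar_size(readback):
    """Conventional PCI sizing: drop the 4 low control bits, invert, add one."""
    return ((~(readback & 0xFFFFFFF0)) & 0xFFFFFFFF) + 1


def quarter_us_to_ns(units):
    """Min_Gnt and Max_Lat count in 1/4-microsecond (250 ns) units."""
    return units * 250
```

The JEDEC-layout assumption can be checked against the lmi_mode description. Under it, 0x0037 decodes to a CAS (read) latency of 3, sequential bursts and a full-page burst length, which matches that description.

For the base address register, writing 0xFFFFFFFF reads back as 0xFC000000. The usual PCI sizing rule turns that readback into 64 MB.

Separately, 3.2 microseconds at 80 MHz is 256 clocks (0x100), which coincides with the low bits of the lmi_cfg reset value 0x20000100. The text does not say whether those bits form the refresh-count field.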
[Brief description of the drawings]

FIG. 1 illustrates the operation of a raster image coprocessor in a host computer environment.

FIG. 2 shows the raster image coprocessor of FIG. 1 in more detail.

FIG. 3 is a diagram showing a memory map of the raster image coprocessor.

FIG. 4 is a diagram showing the relationship among the CPU, the instruction queue, instruction operands, results in shared memory, and the coprocessor.

FIG. 5 is a diagram showing the relationship among an instruction generation unit, a memory management unit, a queue management unit, and the coprocessor.

FIG. 6 is a diagram illustrating the operation of a graphics coprocessor that reads instructions from a pending instruction queue and places them in a completed instruction queue.

FIG. 7 illustrates a fixed-length circular buffer implementation of an instruction queue and the need to wait when the buffer is full.

FIG. 8 is a diagram showing an instruction execution stream used in the coprocessor.

FIG. 9 is an instruction execution flowchart.

FIG. 10 is a diagram showing the standard instruction word format used in the coprocessor.

FIG. 11 is a diagram showing the instruction word fields of a standard instruction.

FIG. 12 is a diagram showing the data word fields of a standard instruction.

FIG. 13 is a diagram schematically showing the instruction control unit of FIG. 2.

FIG. 14 is a diagram showing the execution control unit of FIG. 13 in more detail.

FIG. 15 is a state transition diagram of the instruction control unit.

FIG. 16 is a diagram illustrating the instruction decoding unit of FIG. 13.

FIG. 17 is a diagram showing the instruction sequencer of FIG. 16 in more detail.

FIG. 18 is a state transition diagram of the ID sequencer of FIG. 16.

FIG. 19 is a diagram showing the prefetch buffer control unit of FIG. 13 in more detail.

FIG. 20 is a diagram showing the standard form of register storage and inter-module connections used in the coprocessor.

FIG. 21 is a diagram showing the format of control bus transactions used in the coprocessor.

FIG. 22 is a diagram showing the data flow within part of the coprocessor.

FIG. 23 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 24 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 25 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 26 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 27 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 28 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 29 is a diagram showing examples of the various data reformatting operations used in the coprocessor.

FIG. 30 is a diagram showing a format conversion performed in the coprocessor.

FIG. 31 is a diagram showing a format conversion performed in the coprocessor.

FIG. 32 is a diagram showing the input data conversion processing performed in the coprocessor.

FIG. 33 is a diagram showing various data conversions performed in the coprocessor.

FIG. 34 is a diagram showing various data conversions performed in the coprocessor.

FIG. 35 is a diagram showing various data conversions performed in the coprocessor.

FIG. 36 is a diagram showing various data conversions performed in the coprocessor.

FIG. 37 is a diagram showing various data conversions performed in the coprocessor.

FIG. 38 is a diagram showing various data conversions performed in the coprocessor.

FIG. 39 is a diagram showing various data conversions performed in the coprocessor.

FIG. 40 is a diagram showing various data conversions performed in the coprocessor.

FIG. 41 is a diagram showing various data conversions performed in the coprocessor.

FIG. 42 is a diagram showing various internal-to-output data conversions performed in the coprocessor.

FIG. 43 is a diagram showing examples of various data conversions performed in the coprocessor.

FIG. 44 is a diagram showing examples of various data conversions performed in the coprocessor.

FIG. 45 is a diagram showing examples of various data conversions performed in the coprocessor.

FIG. 46 is a diagram showing examples of various data conversions performed in the coprocessor.

FIG. 47 is a diagram showing examples of various data conversions performed in the coprocessor.

FIG. 48 is a diagram showing the fields used in internal registers to determine which data conversion is to be used.

FIG. 49 is a block diagram of a graphics subsystem that uses data normalization.

FIG. 50 is a circuit diagram of a data normalization device.

FIG. 51 is a diagram showing the pixel processing performed in compositing.

FIG. 52 is a diagram showing the instruction word format for compositing.

FIG. 53 is a diagram showing the data word format for compositing.

FIG. 54 is a diagram showing the instruction word format for tiling.

FIG. 55 is a diagram showing the operation of a tile instruction on an image.

FIG. 56 is a diagram showing the use of a color interval/position-within-interval table to remap color values.

FIG. 57 is a diagram showing the storage format of the interval/position-within-interval table in the MUV buffer of the coprocessor.

FIG. 58 is a diagram showing a color conversion process using interpolation, performed in the coprocessor.

FIG. 59 is a diagram showing an improvement, performed in the coprocessor, to the color conversion process at edges.

FIG. 60 is a diagram showing a color space conversion process for one output color, performed in the coprocessor.

FIG. 61 illustrates how data is stored in the coprocessor's cache when single-output-color color space conversion is used.

FIG. 62 is a diagram showing the technique used in multiple color space conversion.

FIG. 63 is a diagram showing the address remapping for the cache used in multiple color space conversion.

FIG. 64 is a diagram showing the instruction word format of a color space conversion instruction.

FIG. 65 is a diagram showing a multiple color conversion method.

FIG. 66 is a diagram explaining the generation of an MCU in the JPEG conversion processing performed by the coprocessor.

FIG. 67 is a diagram explaining the generation of an MCU in the JPEG conversion processing performed by the coprocessor.

FIG. 68 is a diagram showing the structure of the JPEG encoding unit of the coprocessor.

FIG. 69 is a diagram showing the quantization unit of FIG. 68 in more detail.

FIG. 70 is a diagram showing the Huffman encoding unit of FIG. 68 in more detail.

FIG. 71 is a diagram showing a Huffman encoding unit and a decoding unit.

FIG. 72 is a diagram showing a Huffman encoding unit and a decoding unit.

FIG. 73 is a diagram explaining the JPEG data deletion/constraint processing used in the coprocessor.

FIG. 74 is a diagram explaining the JPEG data deletion/constraint processing used in the coprocessor.

FIG. 75 is a diagram explaining the JPEG data deletion/constraint processing used in the coprocessor.

FIG. 76 is a diagram showing the instruction word format of a JPEG instruction.

FIG. 77 is a block diagram of a general discrete cosine transform device (conventional example).

FIG. 78 is a diagram showing the arithmetic data path of the conventional DCT device.

FIG. 79 is a block diagram of the DCT device used in the coprocessor.

FIG. 80 is a block diagram showing the arithmetic circuit of FIG. 79 in more detail.

FIG. 81 is a diagram showing the arithmetic data path of the DCT device of FIG. 79.

FIG. 82 is a diagram showing typical Huffman-encoded data, as in the JPEG format, interleaved with uncoded bit fields (both byte-aligned and non-byte-aligned).

FIG. 83 is a diagram showing the overall structure of the Huffman decoding unit for JPEG data of FIG. 84 in more detail.

FIG. 84 is a diagram showing the overall structure of a Huffman decoding unit for JPEG data.

FIG. 85 shows the data processing in a stripper block that removes byte-aligned uncoded bit fields from the input data, together with an example of the tag codes corresponding to the data output from the stripper.

FIG. 86 is a diagram showing the configuration and data flow of a data pre-shifter.

FIG. 87 is a diagram showing the control logic of the decoding unit of FIG. 81.

FIG. 88 is a diagram showing the configuration and data flow of a marker pre-shifter.

FIG. 89 is a block diagram of a combinational circuit that decodes Huffman code values in JPEG coding.

FIG. 90 shows the concept of a padding area and a block diagram of the padding bit decoding unit.

FIG. 91 is a diagram showing an example of a data format output from the decoding unit and used in the coprocessor.

FIG. 92 is a diagram showing the technique used in an image transformation instruction.

FIG. 93 is a diagram showing the instruction word format of an image transformation instruction.

FIG. 94 is a diagram showing a format of an image transformation kernel used in the coprocessor.

FIG. 95 is a diagram showing a format of an image transformation kernel used in the coprocessor.

FIG. 96 is a diagram showing the use of an index table for image transformation in the coprocessor.

FIG. 97 is a diagram showing the data field format for instructions used in transformation and convolution.

FIG. 98 is an explanatory diagram of the bp field of an instruction word.

FIG. 99 is a diagram showing the convolution process used in the coprocessor.

FIG. 100 is a diagram showing the instruction word format of a convolution instruction used in the coprocessor.

FIG. 101 is a diagram showing the instruction word format for matrix multiplication used in the coprocessor.

FIG. 102 is a diagram showing a hierarchical image manipulation process used in the coprocessor.

FIG. 103 is a diagram showing a hierarchical image manipulation process used in the coprocessor.

FIG. 104 is a diagram showing a hierarchical image manipulation process used in the coprocessor.

FIG. 105 is a diagram showing a hierarchical image manipulation process used in the coprocessor.

FIG. 106 is a diagram showing the instruction word coding of a hierarchical image instruction.

FIG. 107 is a diagram showing the instruction word coding of a flow control instruction used in the coprocessor.

FIG. 108 is a diagram showing the pixel organizer in more detail.

FIG. 109 is a diagram showing the operand fetch unit in the pixel organizer in more detail.

FIG. 110 is a diagram showing various storage formats used in the coprocessor.

FIG. 111 is a diagram showing various storage formats used in the coprocessor.

FIG. 112 is a diagram showing various storage formats used in the coprocessor.

FIG. 113 is a diagram showing various storage formats used in the coprocessor.

FIG. 114 is a diagram showing various storage formats used in the coprocessor.

FIG. 115 is a diagram showing the MUV address generation unit in the pixel organizer of the coprocessor in more detail.

FIG. 116 is a block diagram of the multiple value (MUV) buffer used in the coprocessor.

FIG. 117 is a diagram showing the structure of the encoder of FIG. 116.

FIG. 118 is a diagram showing the structure of the decoder of FIG. 116.

FIG. 119 is a diagram showing the structure of the address generation unit of FIG. 116 that generates read addresses in JPEG mode (pixel decomposition).

FIG. 120 is a diagram showing the structure of the address generation unit of FIG. 116 that generates read addresses in JPEG mode (pixel restoration).

FIG. 121 is a diagram showing the configuration of a memory module that includes the storage device of FIG. 116.

FIG. 122 is a diagram showing the structure of a circuit that multiplexes read addresses to the memory modules.

FIG. 123 shows how lookup table entries are stored in a buffer operating in single lookup table mode.

FIG. 124 shows how lookup table entries are stored in a buffer operating in multiple lookup table mode.

FIG. 125 shows how pixels are stored in a buffer operating in JPEG mode (pixel decomposition).

FIG. 126 shows how a single color is stored from a buffer operating in JPEG mode (pixel restoration).

FIG. 127 is a diagram showing the structure of the result organizer of the coprocessor in more detail.

FIG. 128 is a diagram showing the structure of the operand organizer of the coprocessor in more detail.

FIG. 129 is a block diagram of a computer architecture for the main data path unit used in the coprocessor.

FIG. 130 is a block diagram of an input interface for receiving, storing, and rearranging input data objects for further processing.

FIG. 131 is a block diagram of an image data processor for performing arithmetic operations on input data objects.

FIG. 132 is a block diagram of a color channel processor for performing arithmetic operations on one channel of an input data object.

FIG. 133 is a block diagram of a multi-function block in the color channel processor.

FIG. 134 is a block diagram for a compositing operation.

FIG. 135 is a diagram showing the inverse transformation of a scan line.

FIG. 136 is a block diagram of the steps required to calculate the value at a specified pixel.

FIG. 137 is a block diagram of an image transformation engine.

FIG. 138 is a diagram showing the two formats of a kernel description.

FIG. 139 is a diagram showing the definition and interpretation of the bp field.

FIG. 140 is a block diagram of a multiply-add unit that performs matrix multiplication.

FIG. 141 is a diagram showing the control, address, and data flow in the cache and cache control unit of the coprocessor.

FIG. 142 is a diagram showing the memory configuration of the cache.

FIG. 143 is a diagram showing the address format for the cache control unit in the coprocessor.

FIG. 144 is a block diagram of a multi-function block in the color channel processor.

FIG. 145 is a diagram showing the coprocessor input interface switch of the cache and cache controller of FIG. 144.

FIG. 146 is a diagram showing the four-port dynamic local memory control unit of the coprocessor, including the main address and data paths.

FIG. 147 is a state machine diagram of the control unit of FIG. 146.

FIG. 148 is a pseudocode listing that details the functions of the arbitration unit of FIG. 146.

FIG. 149 is a diagram showing the structure and terminology of the requester priority bits used in FIG. 146.

FIG. 150 is a diagram showing the external interface control unit in the coprocessor in more detail.

FIG. 151 is a diagram showing the mapping to or from physical addresses used in the coprocessor.

FIG. 152 is a diagram showing the mapping to or from physical addresses used in the coprocessor.

FIG. 153 is a diagram showing the mapping to or from physical addresses used in the coprocessor.

FIG. 154 is a diagram showing the mapping to or from physical addresses used in the coprocessor.

FIG. 155 is a diagram showing the IBus receiving unit of FIG. 150 in more detail.

FIG. 156 is a diagram showing the RBus receiving unit of FIG. 2 in more detail.

FIG. 157 is a diagram showing the memory management unit of FIG. 150 in more detail.

FIG. 158 is a diagram showing the peripheral interface control unit of FIG. 2 in more detail.

Continued from the front page
(71) Applicant: 000001007, Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo
(72) Inventor: Tomasz Thomas Prokop, 13D/15 Campbell Street, Parramatta, New South Wales 2150, Australia
(72) Inventor: Trevor Robert Elbourne, 5/54 Hilltop Crescent, Fairlight, New South Wales 2094, Australia
(72) Inventor: Mark Pulver, 15 Trafalgar Street, Enmore, New South Wales 2042, Australia

Claims

1. A decoding apparatus for decoding a plurality of data blocks that are encoded with a plurality of variable-length codes interleaved with variable-length uncoded bit fields and that contain a plurality of fixed-length uncoded fields, the apparatus comprising: a preprocessing unit that removes the plurality of fixed-length uncoded fields and outputs the plurality of variable-length codes interleaved with the variable-length uncoded bit fields, together with a plurality of position signals indicating the positions of the fixed-length uncoded fields in the plurality of data blocks; and means for synchronizing the position signals with the data being decoded and sending the decoded data to an external device, such that the data decoded from the variable-length coded data input between the fixed-length uncoded fields is output from the decoding apparatus during the position signal corresponding to the fixed-length uncoded field.

2. The decoding apparatus according to claim 1, further comprising: a first processing unit having a first barrel shifter set and a first register, which processes the plurality of variable-length codes interleaved with the variable-length uncoded bit fields; and a second processing unit having a second barrel shifter set and a second register, which processes the plurality of position signals indicating the positions of the plurality of fixed-length uncoded fields in the plurality of data blocks, wherein the first and second processing units are identical, and the first and second barrel shifter sets and the outputs of the first and second processing units receive the same control signal.

3. The decoding apparatus according to claim 2, wherein the output of the second processing unit, which processes the position signals indicating the positions of the fixed-length uncoded fields, is used to determine the size of an uncoded variable-length field to be removed from the data stored in a data register during decoding.
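Claims 2 and 3 pair two identical shifter units under one control signal. One way to picture this, under assumptions that are ours rather than the patent's, is the hypothetical Python model below. The coded bits and a parallel position lane are held in two registers that are always shifted by the same amount, so the position lane stays aligned with the data. The distance to the next marker then gives the size of an uncoded field (for example, padding bits) to discard. Bit strings stand in for registers, and the class and method names are invented for illustration.

```python
class LockstepShifter:
    """Two identical shift units driven by one shift-amount control signal.

    `data` models the register holding coded bits; `marks` models the
    register holding the position signal, where '1' flags the first bit
    that followed a removed fixed-length field. Bit strings stand in for
    hardware registers (a simplification).
    """

    def __init__(self, data_bits: str, mark_bits: str) -> None:
        if len(data_bits) != len(mark_bits):
            raise ValueError("data and position lanes must be the same length")
        self.data = data_bits
        self.marks = mark_bits

    def shift(self, n: int) -> None:
        # The same control value n is applied to both units, keeping them aligned.
        self.data = self.data[n:]
        self.marks = self.marks[n:]

    def bits_before_marker(self) -> int:
        # Output of the position-signal unit: distance to the next marker.
        idx = self.marks.find("1")
        return len(self.marks) if idx < 0 else idx

    def drop_uncoded_field(self) -> int:
        """Discard the bits up to the next marker and return how many were removed."""
        size = self.bits_before_marker()
        self.shift(size)
        return size


# Example: a 4-bit code word, then 3 padding bits, then a marker boundary.
sh = LockstepShifter("1011" + "111" + "0101", "0000" + "000" + "1000")
sh.shift(4)  # consume the 4-bit code word
removed = sh.drop_uncoded_field()
print(removed, sh.data, sh.marks)
```

Driving both lanes from one shift value is what keeps the position information valid without any separate bookkeeping of bit offsets.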

4. The decoding apparatus according to claim 1, wherein the preprocessing unit removes the plurality of fixed-length uncoded fields and outputs the plurality of variable-length codes interleaved with the variable-length uncoded bit fields as a plurality of fixed-length codes, each consisting of a plurality of fixed-length bit fields and including a tag indicating whether a fixed-length bit field was passed through or removed by the preprocessing unit, the tag being output so that it precedes or follows a marker indicating a fixed-length uncoded field.

5. The decoding apparatus according to claim 1, wherein the data blocks are Huffman-coded.
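For claim 5, the variable-length codes are Huffman code words, and each is followed by an uncoded bit field whose length the code word announces, much as in baseline JPEG. The sketch below assumes a small hypothetical prefix-free table (`TOY_TABLE`) that maps each code word to the number of raw bits that follow. It is a bit-serial software illustration, not the barrel-shifter hardware of claims 2 and 7.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix-free (Huffman) table: code word -> length of the
# uncoded bit field that follows it.
TOY_TABLE: Dict[str, int] = {"00": 0, "01": 1, "100": 2, "101": 3, "110": 4}


def split_codes(bits: str, table: Dict[str, int]) -> List[Tuple[str, str]]:
    """Split a bit string into (code_word, uncoded_field) pairs."""
    out: List[Tuple[str, str]] = []
    max_len = max(len(code) for code in table)
    pos = 0
    while pos < len(bits):
        # Try lengths in increasing order; the prefix property makes the
        # first match the only valid one.
        for length in range(1, max_len + 1):
            code = bits[pos:pos + length]
            if code in table:
                break
        else:
            raise ValueError(f"no code word matches at bit {pos}")
        pos += len(code)
        size = table[code]
        field = bits[pos:pos + size]  # uncoded bits are passed through unchanged
        if len(field) != size:
            raise ValueError("stream ends inside an uncoded field")
        out.append((code, field))
        pos += size
    return out


print(split_codes("0111001000", TOY_TABLE))
```

The decoder cannot know where the next code word starts until it has decoded the current one and skipped its uncoded field, which is why the code and the raw bits are handled in one loop.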

6. A decoding method for decoding a plurality of data blocks that are encoded with a plurality of variable-length codes interleaved with variable-length uncoded bit fields and that contain a plurality of fixed-length uncoded fields, the method comprising: a preprocessing step of removing the plurality of fixed-length uncoded fields and outputting the plurality of variable-length codes interleaved with the variable-length uncoded bit fields, together with a plurality of position signals indicating the positions of the fixed-length uncoded fields in the plurality of data blocks; and a step of synchronizing the position signals with the data to be decoded and sending them to an external device, so that the decoded data of the variable-length coded data input between the fixed-length uncoded fields is output from the decoding apparatus during the position signal corresponding to the fixed-length uncoded field.

7. The decoding method according to claim 6, further comprising: processing the plurality of variable-length codes interleaved with the variable-length uncoded bit fields using a first processing unit having a first barrel shifter set and a first register; and processing the plurality of position signals indicating the positions of the plurality of fixed-length uncoded fields in the plurality of data blocks using a second processing unit having a second barrel shifter set and a second register, wherein the first and second processing units are identical, and the first and second barrel shifter sets and the outputs of the first and second processing units receive the same control signal.

8. The decoding method according to claim 7, further comprising determining, in accordance with the output of the second processing unit, which processes the position signals indicating the positions of the fixed-length uncoded fields, the size of an uncoded variable-length field to be removed from the data stored in a data register during decoding.

9. The decoding method according to claim 6, wherein the preprocessing step removes the plurality of fixed-length uncoded fields and outputs the plurality of variable-length codes interleaved with the variable-length uncoded bit fields as a plurality of fixed-length codes, each consisting of a plurality of fixed-length bit fields and including a tag indicating whether a fixed-length bit field was passed through or removed in the preprocessing step, the tag being output so that it precedes or follows a marker indicating a fixed-length uncoded field.

10. The decoding method according to claim 6, wherein the data blocks are Huffman-coded.

11. A discrete cosine transform (DCT) apparatus comprising: transposed matrix storage means; and an arithmetic circuit, interconnected with the transposed matrix storage means, that consists mainly of a combinational circuit performing a DCT operation without using clocked storage means.
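Claim 11 relies on the fact that one pass of a DCT is a pure function of its inputs, so it can be built from combinational logic without clocked storage. The Python function below mirrors that property: its output depends only on the current input vector, and it keeps no state between calls. It assumes an 8-point, orthonormally scaled DCT-II. This is a common choice, but the claims do not state it.

```python
import math
from typing import List, Sequence

N = 8  # assumed block size


def dct_1d(x: Sequence[float]) -> List[float]:
    """Orthonormal N-point DCT-II, computed purely from the current inputs.

    Nothing is stored between calls, mirroring an arithmetic circuit built
    from combinational logic with no clocked registers.
    """
    out: List[float] = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
        out.append(scale * acc)
    return out
```

A constant input produces energy only in the first (DC) coefficient. Because the scaling is orthonormal, the sum of squares is preserved.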

12. The DCT apparatus according to claim 11, wherein the combinational circuit has a predetermined number of stages for performing the DCT operation, the stages being arranged sequentially.
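Claim 12 divides the combinational circuit into a fixed number of sequential stages. The claim does not specify the factorization. One possible staging is to split the 8-point DCT-II into three stages:

1. A butterfly stage.
2. A stage that computes the even coefficients from the sums and the odd coefficients from the differences.
3. A scaling stage.

The split follows from cos((15 - 2n)kπ/16) = (-1)^k cos((2n + 1)kπ/16). The stage functions below are hypothetical names used only for this sketch.

```python
import math
from typing import List, Sequence, Tuple

N = 8
HALF = N // 2


def stage_butterfly(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    # Stage 1: sums and differences of mirrored inputs.
    s = [x[i] + x[N - 1 - i] for i in range(HALF)]
    d = [x[i] - x[N - 1 - i] for i in range(HALF)]
    return s, d


def stage_even_odd(s: Sequence[float], d: Sequence[float]) -> List[float]:
    # Stage 2: even coefficients come only from s, odd ones only from d.
    X = [0.0] * N
    for m in range(HALF):
        X[2 * m] = sum(s[i] * math.cos((2 * i + 1) * (2 * m) * math.pi / (2 * N))
                       for i in range(HALF))
        X[2 * m + 1] = sum(d[i] * math.cos((2 * i + 1) * (2 * m + 1) * math.pi / (2 * N))
                           for i in range(HALF))
    return X


def stage_scale(X: Sequence[float]) -> List[float]:
    # Stage 3: orthonormal scaling factors (an assumed normalization).
    return [X[k] * (math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)) for k in range(N)]


def dct_staged(x: Sequence[float]) -> List[float]:
    """Stages applied strictly in sequence; no state survives between calls."""
    s, d = stage_butterfly(x)
    return stage_scale(stage_even_odd(s, d))
```

The butterfly halves the work of the second stage, since each output coefficient then needs only four products instead of eight.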

13. The DCT apparatus according to claim 11, further comprising multiplexing means for multiplexing data input to said DCT apparatus and an output of said transposed matrix storage means.

14. The DCT apparatus according to claim 11, further comprising control means for controlling the operation of the DCT apparatus.
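Claims 13 and 14 add a multiplexer in front of the arithmetic circuit and a control circuit that steers the data. The Python model below is a hypothetical cycle-level sketch:

- A counter plays the role of the control circuit.
- The multiplexer selects new input rows for the first N cycles and rows of the transposed matrix storage for the next N.
- A single function `f` stands in for the shared arithmetic circuit.

Writing the first-pass results as columns is what performs the transposition, in line with the abstract's description of converting column-type data into row-type data.

```python
from typing import Callable, List, Sequence, Tuple

N = 8
Block = List[List[float]]


def run_datapath(block: Sequence[Sequence[float]],
                 f: Callable[[Sequence[float]], List[float]]) -> Tuple[Block, List[str]]:
    """Send one N x N block through a shared 1-D transform `f` in two passes."""
    transpose_mem: Block = [[0.0] * N for _ in range(N)]
    output: Block = [[0.0] * N for _ in range(N)]
    mux_log: List[str] = []
    for cycle in range(2 * N):  # control circuit: a simple cycle counter
        first_pass = cycle < N
        row = cycle % N
        # Multiplexer: fresh input in pass 1, transposed storage in pass 2.
        src = block[row] if first_pass else transpose_mem[row]
        mux_log.append("input" if first_pass else "transpose")
        result = f(src)  # the single shared arithmetic circuit
        for k in range(N):
            if first_pass:
                transpose_mem[k][row] = result[k]  # store pass-1 result as a column
            else:
                output[k][row] = result[k]  # emit pass-2 result as a column
    return output, mux_log
```

Passing the identity as `f` returns the input block unchanged, which quickly confirms that the two column-wise writes cancel. Passing an 8-point DCT instead yields the 2-D transform of the block.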

15. An inverse discrete cosine transform (inverse DCT) apparatus comprising: transposed matrix storage means; and an arithmetic circuit, interconnected with the transposed matrix storage means, that consists mainly of a combinational circuit performing an inverse DCT operation without using clocked storage means.
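The inverse apparatus of claim 15 uses the same structure, with an inverse DCT in the arithmetic circuit. The sketch assumes orthonormal 8-point scaling, although the claims fix no normalization. Under that assumption, the inverse matrix is simply the transpose of the forward matrix, so a round trip returns the input up to floating-point rounding. The function names are hypothetical.

```python
import math
from typing import List, Sequence

N = 8


def _scale(k: int) -> float:
    # Orthonormal DCT-II scaling (assumed; the claims fix no normalization).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(x: Sequence[float]) -> List[float]:
    return [_scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                            for n in range(N)) for k in range(N)]


def idct_1d(X: Sequence[float]) -> List[float]:
    """Inverse (DCT-III) pass: a pure function of the input coefficients."""
    return [sum(_scale(k) * X[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for k in range(N)) for n in range(N)]
```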

16. The inverse DCT apparatus according to claim 15, wherein the combinational circuit has a predetermined number of stages for performing the inverse DCT operation, the stages being arranged sequentially.

17. The inverse DCT apparatus according to claim 15, further comprising multiplexing means for multiplexing data input to said inverse DCT apparatus and an output of said transposed matrix storage means.

18. The inverse DCT apparatus according to claim 15, further comprising control means for controlling the operation of the inverse DCT apparatus.

19. A discrete cosine transform (DCT) method using an arithmetic circuit that consists mainly of a combinational circuit performing a DCT operation without clocked storage means, the method comprising: performing the DCT operation on input data in a first direction using the arithmetic circuit; storing the data transformed in the first direction in transposed matrix storage means interconnected with the combinational circuit; and performing the DCT operation, using the arithmetic circuit, on the data stored in the transposed matrix storage means in a second direction to obtain transformed data.
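Claim 19 is the row-column method for a 2-D DCT: transform in a first direction, store the result transposed, then transform in the second direction. The Python sketch below assumes an orthonormal 8 x 8 DCT-II and models the transposed matrix storage as a plain list transpose. It checks that the two-pass result equals the direct double-sum definition. All function names are hypothetical.

```python
import math
from typing import List, Sequence

N = 8
Matrix = List[List[float]]


def _scale(k: int) -> float:
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(k: int, n: int) -> float:
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_1d(x: Sequence[float]) -> List[float]:
    return [_scale(k) * sum(x[n] * _basis(k, n) for n in range(N)) for k in range(N)]


def transpose(m: Sequence[Sequence[float]]) -> Matrix:
    # Stand-in for the transposed matrix storage: write by row, read by column.
    return [list(col) for col in zip(*m)]


def dct_2d_two_pass(block: Sequence[Sequence[float]]) -> Matrix:
    first = [dct_1d(col) for col in transpose(block)]  # first direction: columns
    return [dct_1d(row) for row in transpose(first)]  # second direction: rows


def dct_2d_direct(block: Sequence[Sequence[float]]) -> Matrix:
    # Reference: the separable double-sum definition, for checking only.
    return [[_scale(u) * _scale(v) * sum(block[r][c] * _basis(u, r) * _basis(v, c)
                                         for r in range(N) for c in range(N))
             for v in range(N)] for u in range(N)]
```

The two-pass form needs 2 x 8 one-dimensional transforms of 8 points each. The direct form needs a 64-term sum for each of the 64 outputs. This saving is what makes reusing one arithmetic circuit through a transpose memory attractive.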

20. The DCT method according to claim 19, wherein the DCT is calculated in a predetermined number of stages arranged sequentially.

21. The DCT method according to claim 19, further comprising the step of multiplexing the input data with the output of the transposed matrix storage means.

22. An inverse discrete cosine transform (inverse DCT) method using an arithmetic circuit that consists mainly of a combinational circuit performing an inverse DCT operation without clocked storage means, the method comprising: performing the inverse DCT operation on input coefficients in a first direction; storing the coefficients transformed in the first direction in transposed matrix storage means interconnected with the combinational circuit; and performing the inverse DCT operation on the stored coefficients in a second direction to obtain inverse-transformed data.
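Claim 22 applies the same two-pass order to the inverse transform. The sketch below rests on assumptions of its own:

- Orthonormal 8 x 8 scaling.
- A list transpose in place of the transposed matrix storage.
- Hypothetical function names throughout.

It reconstructs a block of integer sample values from its coefficients and rounds the result, as a decoder typically would before output.

```python
import math
from typing import List, Sequence

N = 8
Matrix = List[List[float]]


def _scale(k: int) -> float:
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(k: int, n: int) -> float:
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_1d(x: Sequence[float]) -> List[float]:
    return [_scale(k) * sum(x[n] * _basis(k, n) for n in range(N)) for k in range(N)]


def idct_1d(X: Sequence[float]) -> List[float]:
    return [sum(_scale(k) * X[k] * _basis(k, n) for k in range(N)) for n in range(N)]


def transpose(m: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*m)]


def dct_2d(block: Sequence[Sequence[float]]) -> Matrix:
    first = [dct_1d(col) for col in transpose(block)]
    return [dct_1d(row) for row in transpose(first)]


def idct_2d(coeffs: Sequence[Sequence[float]]) -> Matrix:
    first = [idct_1d(col) for col in transpose(coeffs)]  # first direction
    return [idct_1d(row) for row in transpose(first)]  # second direction


def reconstruct(coeffs: Sequence[Sequence[float]]) -> List[List[int]]:
    # Round to integer sample values after the two inverse passes.
    return [[int(round(v)) for v in row] for row in idct_2d(coeffs)]
```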

23. The inverse DCT method according to claim 22, wherein the inverse DCT is calculated in a predetermined number of stages arranged sequentially.

24. The inverse DCT method according to claim 22, further comprising the step of multiplexing the input data with the output of the transposed matrix storage means.

25. A storage medium storing, as program code executable by a computer, the method according to any one of claims 6 to 10 and 19 to 24.