JP2005348410A

JP2005348410A - Compression equipment and its method

Info

Publication number: JP2005348410A
Application number: JP2005161916A
Authority: JP
Inventors: Tomasz Thomas Prokop; トーマスプロコプトマツ; Trevor Robert Elbourne; ロバートエルボーントレバー; Mark Pulver; プルバーマーク
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1997-04-30
Filing date: 2005-04-27
Publication date: 2005-12-15
Anticipated expiration: 2018-04-30
Also published as: JPH11122116A; JP4101253B2

Abstract

PROBLEM TO BE SOLVED: To provide a DCT/inverse DCT apparatus and method capable of high-speed operation. SOLUTION: The DCT apparatus 1118-1126 has an arithmetic circuit 1122 connected to a transposed matrix memory 1118, and performs the DCT operation with a combinational circuit 1138 that contains no intermediate clocked memory circuit. The combinational circuit has a predetermined number of stages 1158-1164 arranged sequentially to perform the DCT. The DCT apparatus may also have a multiplexer 1124 that multiplexes the input data and the output data of the transposed matrix memory 1118. It is also desirable to have a control circuit 1116 that controls the operation of the DCT apparatus. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a decoder that decodes variable-length codes interleaved with zero or more unencoded variable-length bit fields, and that leaves some of the codes unchanged. The present invention also relates to a discrete cosine transform (DCT) apparatus that uses a data path without pipelining or storage means and can operate at high speed.

In general, large amounts of data are compressed and decompressed for a variety of reasons, including transmission, storage, and retrieval, with variable-length coding such as Huffman coding used in some stages of the process. Huffman coding was first disclosed in D. A. Huffman's paper, "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE, 40:1098, 1952 (Non-Patent Document 1). In many cases the variable-length codes in the coded bit stream are not contiguous, and other, unencoded bit fields are interleaved with them. These bit fields represent control and/or format information, and/or supply additions to the coded data, including marker headers, marker codes, stuff bytes, padding bits, and the additional bits found, for example, in JPEG-coded data.

In variable-length coding, codes of different lengths are assigned to different input data according to their probability of occurrence, so that statistically frequent input data receive shorter codes than infrequent data. Infrequent input data are assigned long codes. Code assignment is either static or adaptive. With static assignment, a given datum always receives the same output code regardless of which data block is being processed. With adaptive assignment, output codes are assigned on the basis of a statistical analysis of a particular input block or set of data blocks, and are expected to change from block to block (or from one set of blocks to the next).
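The static case can be illustrated with a short Python sketch. It is only an illustration of Huffman's construction, not the circuit described in this document; the function name `huffman_code_lengths` and the sample string are assumptions introduced here.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a static Huffman code.

    Hypothetical helper for illustration only.
    """
    freq = Counter(data)
    if len(freq) == 1:
        # A lone symbol still needs one bit.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. its code grows by one bit.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# 'a' occurs 8 times, 'b' 4 times, 'c' twice, 'd' once.
lengths = huffman_code_lengths("aaaaaaaabbbbccd")
total_bits = sum(lengths[s] for s in "aaaaaaaabbbbccd")
```

In this sample the most frequent symbol ends up with the shortest code and the rarest symbols with the longest, which is the property the paragraph above describes.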

A serious problem arises when high-speed variable-length decoding is required. The problem occurs in particular when, as in the JPEG standard, the coded data stream contains unencoded variable-length bit fields interleaved with the coded data. The main difficulty in decoding such variable-length coded data at high speed arises when, as in the JPEG standard, the length of a particular unencoded bit field cannot be determined until the coded data preceding it has been completely decoded. Because the start position of the next coded data is not known until decoding of the preceding code is complete, direct pipelining generally cannot be used in the decoder.
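The dependency described above can be seen in a minimal Python sketch. The prefix code in `TOY_TABLE` is hypothetical (it is not a real JPEG table), and the convention that each codeword announces the number of raw bits that follow it is an assumption modeled loosely on JPEG's additional bits.

```python
# Hypothetical prefix code: codeword -> size, where size is the number of
# raw (unencoded) bits that immediately follow the codeword.
TOY_TABLE = {"0": 0, "10": 2, "110": 3, "111": 5}


def decode(bits):
    """Decode a string of '0'/'1' into (size, raw_value, start_position) tuples."""
    out = []
    pos = 0
    while pos < len(bits):
        start = pos
        code = ""
        # The codeword boundary is found only by reading bit by bit.
        while code not in TOY_TABLE:
            if pos >= len(bits):
                raise ValueError("truncated codeword")
            code += bits[pos]
            pos += 1
        size = TOY_TABLE[code]
        if pos + size > len(bits):
            raise ValueError("truncated raw field")
        raw = int(bits[pos:pos + size], 2) if size else 0
        # Only now is the start of the next codeword known.
        pos += size
        out.append((size, raw, start))
    return out


symbols = decode("10" "11" "0" "110" "101")
```

The start positions (0, 4, 5 in this sample) cannot be computed ahead of time: each one depends on both the length of the previous codeword and the size it announces, which is why a second decoder cannot simply begin work on the next symbol in parallel.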

Existing solutions, which are too slow for many applications, either require several steps (clock cycles) to decode one input datum or use iterative units to decode more than one symbol quasi-simultaneously in one stage (clock cycle). However, adding further decoding blocks often makes such a decoder uneconomical and still does not deliver sufficient speed. This is because the decoders do not operate fully in parallel: the start of each decoder's processing still depends on the result of the decoder before it, which determines where the next input datum begins. As a result, even if multiple symbols are decoded in one stage (clock cycle), that stage (clock period) is relatively long, and the decoder as a whole is too slow for many applications.

Thus, there is clearly a need for a decoder for variable-length codes interleaved with variable-length unencoded bit fields that solves one or more of the problems of conventional decoders. Turning to the DCT, the discrete cosine transform (DCT) apparatus shown in FIG. 77 performs a full two-dimensional (2-D) transform of an 8×8 pixel block by first applying a 1-D DCT to the rows of the block and then applying another 1-D DCT to its columns. Such an apparatus consists of an input circuit 1096, an arithmetic circuit 1104, a control circuit 1098, a transposed matrix memory circuit 1090, and an output circuit 1092.
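The row-column decomposition can be sketched in Python as follows. This is a plain software illustration, not the hardware of FIG. 77; the orthonormal DCT-II scaling and the function names are assumptions, since the text does not state which normalization the apparatus uses.

```python
import math

N = 8


def dct_1d(v):
    """Orthonormal 1-D DCT-II of a length-8 sequence (assumed scaling)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        s = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for n in range(N))
        out.append(scale * s)
    return out


def transpose(m):
    """Swap rows and columns, the role played by a transposed matrix memory."""
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """2-D DCT of an 8x8 block: 1-D DCT on rows, then 1-D DCT on columns."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


flat = [[10] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For a constant block all of the energy lands in the DC coefficient `coeffs[0][0]`, and every other coefficient is zero up to rounding error. The `transpose` step between the two passes is what lets the same 1-D routine serve both rows and columns.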

The input circuit 1096 accepts 8-bit pixels from the 8×8 block. The input circuit 1096 is connected to the arithmetic circuit 1104 through intermediate multiplexers 1100 and 1102. The arithmetic circuit 1104 performs arithmetic operations on either a whole row or a whole column of the 8×8 block. The control circuit 1098 controls all the other circuits and thereby implements the DCT algorithm. The output of the arithmetic circuit is connected to the transposed matrix memory 1090, a register 1095, and the output circuit 1092. The transposed matrix memory is connected to the multiplexer 1100, which supplies its output to the next multiplexer 1102. The multiplexer 1102 also receives input from a register 1094. The transposed matrix memory 1090 accepts the data of the 8×8 block by rows and produces the data by columns. The output circuit 1092 supplies the coefficients of the DCT performed on the 8×8 block of pixel data.

In a typical DCT apparatus the arithmetic circuit is the most complex part, so the speed of the arithmetic circuit 1104 essentially determines the overall speed. The arithmetic circuit 1104 of FIG. 77 is organized by dividing the computation into several stages, described below with reference to FIG. 78. These stages 1144, 1148, 1152, and 1156 can be implemented in a single circuit built from a shared pool of resources such as adders and multipliers. However, because one shared circuit implements all of the stages, such a circuit 1104, which includes storage means for holding intermediate results, is slower than an optimized design. The clock period allotted to such a circuit must be at least as long as its slowest stage, so the total time can exceed the sum of the individual stage delays.

FIG. 78 shows a typical arithmetic data path, forming part of a four-stage DCT, in the apparatus of FIG. 77. The figure reflects the function rather than the actual implementation. The four stages 1144, 1148, 1152, and 1156 are all implemented by a single reconfigurable circuit, which is reconfigured on every cycle to perform each of the four arithmetic stages of the 1-D DCT in turn. In this circuit each of the four stages 1144, 1148, 1152, and 1156 uses the same shared pool of resources (adders and multipliers), which minimizes hardware.

The drawback of this circuit, however, is that it is slower than an optimized design. Each of the four stages 1144, 1148, 1152, and 1156 is built from the same pool of adders and multipliers. The clock period is determined by the slowest stage, which in this example is block 1144 at 20 ns. Adding the delays of the input and output multiplexers 1146 and 1154 (2 ns each) and the delay of the flip-flop 1150 (3 ns) gives a total of 27 ns. This DCT implementation can therefore run with a clock period of 27 ns.
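The 27 ns figure is simply the sum of the delays on the critical path, as the following Python lines reproduce. The last line goes one step beyond the text: it assumes that each of the four stages takes exactly one clock cycle, so the per-transform time it computes is an inference rather than a figure stated in this document.

```python
SLOWEST_STAGE_NS = 20   # block 1144
MUX_DELAY_NS = 2        # each of multiplexers 1146 and 1154
FLIP_FLOP_NS = 3        # flip-flop 1150
STAGES = 4              # stages 1144, 1148, 1152, 1156

# Clock period: slowest stage plus two multiplexers plus the flip-flop.
clock_period_ns = SLOWEST_STAGE_NS + 2 * MUX_DELAY_NS + FLIP_FLOP_NS

# Assumption: one clock cycle per stage, so a 1-D DCT takes four cycles.
one_d_dct_ns = STAGES * clock_period_ns
```

Of the 27 ns period, 7 ns is spent in the multiplexers and the flip-flop rather than in arithmetic, which is the overhead a purely combinational data path avoids.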

Pipelined DCT implementations are also well known. The problem with that approach is that it requires a large amount of hardware. The present invention does not offer the same performance, that is, the same processing speed, but it offers a very good trade-off between performance and size. It also offers a speed advantage over most current DCT implementations. There is thus a clear need for an improved DCT/inverse DCT method and apparatus that can solve one or more of the problems of the prior art. In particular, there is a clear need for a method and apparatus that shorten the time taken by the central arithmetic circuit of a DCT/inverse DCT apparatus to compute the required results, and so improve the overall performance of the DCT or inverse DCT.

D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE, 40:1098, 1952.

A first aspect of the present invention is a decoding apparatus for a plurality of data blocks that contain data encoded as a plurality of variable-length codes interleaved with variable-length unencoded bit fields, together with fixed-length unencoded fields. The apparatus comprises: a preprocessing unit that removes the fixed-length unencoded fields and outputs the variable-length codes interleaved with the variable-length unencoded bit fields, along with a plurality of position signals indicating the positions of the fixed-length unencoded fields within the data blocks; and passing means that passes the position signals to the output of the decoding apparatus in synchronization with the data being decoded, so that the decoded data for the variable-length coded data that was input between fixed-length unencoded fields is output from the decoding apparatus between the position signals corresponding to those fields.
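The role of the preprocessing unit can be illustrated with a small Python sketch. It uses JPEG entropy-coded data as the concrete case, treating stuffed zero bytes (0x00 after 0xFF) and restart markers (0xFFD0 to 0xFFD7) as the fixed-length unencoded fields; that mapping, the function name `preprocess`, and the representation of position signals as a list of byte offsets are all assumptions made for illustration, not the claimed hardware.

```python
def preprocess(stream):
    """Strip stuff bytes and restart markers; return (data, marker_positions).

    Hypothetical software model of a preprocessing step.
    """
    data = bytearray()
    marker_positions = []  # offset in `data` at which each removed marker sat
    i = 0
    while i < len(stream):
        b = stream[i]
        if b == 0xFF and i + 1 < len(stream):
            nxt = stream[i + 1]
            if nxt == 0x00:
                # Stuffed zero: keep the 0xFF data byte, drop the 0x00.
                data.append(0xFF)
                i += 2
                continue
            if 0xD0 <= nxt <= 0xD7:
                # Restart marker: drop both bytes, remember where it was.
                marker_positions.append(len(data))
                i += 2
                continue
        data.append(b)
        i += 1
    return bytes(data), marker_positions


data, marks = preprocess(bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0, 0x56]))
```

A downstream decoder would then see only `data`, while `marks` travels alongside it so that the decoded output can be aligned with the places where the removed fields originally sat.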

Preferably, the decoding apparatus further comprises a first processing unit, having a first barrel shifter set and a first register, that processes the variable-length codes interleaved with the variable-length unencoded bit fields, and a second processing unit, having a second barrel shifter set and a second register, that processes the position signals indicating the positions of the fixed-length unencoded fields within the data blocks, wherein the first and second processing units are identical, and the outputs of the first and second barrel shifter sets and the first and second processing units receive the same control signals.

In another preferred configuration, the output of the second processing unit, which processes the position signals indicating the positions of the fixed-length unencoded fields, is used to determine the size of the unencoded variable-length field that is removed, at decoding time, from the data held in the data register. In yet another preferred configuration, the preprocessing unit removes the fixed-length unencoded fields and outputs the variable-length codes interleaved with the variable-length unencoded bit fields as a plurality of fixed-length codes made up of fixed-length bit fields, where each fixed-length bit field carries a tag indicating how the preprocessing handled it (passed or removed), and the tag is output so that it lies before or after the marker indicating a fixed-length unencoded field.

More preferably, the data blocks are Huffman-coded. A second aspect of the present invention is a decoding method for a plurality of data blocks that contain data encoded as a plurality of variable-length codes interleaved with variable-length unencoded bit fields, together with fixed-length unencoded fields. The method comprises: a preprocessing step of removing the fixed-length unencoded fields and outputting the variable-length codes interleaved with the variable-length unencoded bit fields, along with a plurality of position signals indicating the positions of the fixed-length unencoded fields within the data blocks; and a passing step of passing the position signals to the output of the decoding apparatus in synchronization with the data being decoded, so that the decoded data for the variable-length coded data that was input between fixed-length unencoded fields is output from the decoding apparatus between the position signals corresponding to those fields.

Preferably, the decoding method further comprises the steps of processing the variable-length codes interleaved with the variable-length unencoded bit fields using a first processing unit having a first barrel shifter set and a first register, and processing the position signals indicating the positions of the fixed-length unencoded fields within the data blocks using a second processing unit having a second barrel shifter set and a second register, wherein the first and second processing units are identical, and the outputs of the first and second barrel shifter sets and the first and second processing units receive the same control signals.

The decoding method may further comprise the step of determining, according to the output of the second processing unit that processes the position signals indicating the positions of the fixed-length unencoded fields, the size of the unencoded variable-length field that is removed, at decoding time, from the data held in the data register. Preferably, in the preprocessing step, the fixed-length unencoded fields are removed and the variable-length codes interleaved with the variable-length unencoded bit fields are output as a plurality of fixed-length codes made up of fixed-length bit fields, where each fixed-length bit field carries a tag indicating how the preprocessing handled it (passed or removed), and the tag is output so that it lies before or after the marker indicating a fixed-length unencoded field.

It is also preferable that the data blocks are Huffman-coded. In the detailed description that follows, attention is drawn in particular to FIGS. 82 to 91 and the description relating to them, in addition to the rest of the description.

Table of Contents
1.0 Brief Description of the Drawings
2.0 List of Tables
3.0 Preferred and Other Embodiments
3.1 Overview of the Multiple-Stream Architecture
3.2 Host/Coprocessor Queuing
3.3 Coprocessor Register Description
3.4 Format of the Multiple Streams
3.5 Determining the Currently Active Stream
3.6 Fetching Instructions of the Currently Active Stream
3.7 Instruction Decoding and Execution
3.8 Updating the Instruction Controller Registers
3.9 Semantics of the Register Access Semaphore
3.10 Instruction Controller
3.11 Description of the Local Register File Module
3.12 Register Read/Write Handling
3.13 Memory Area Read/Write Handling
3.14 C Bus Structure
3.15 Coprocessor Data Types and Data Manipulation
3.16 Data Normalization
3.17 Image Processing of the Accelerator Card
3.17.1 Compositing
3.17.2 Color Space Conversion Instructions
a. Single Output Color Space (SOGCS) Conversion Mode
b. Multiple Output Color Space Mode
3.17.3 JPEG Encoding/Decoding
a. Encoding
b. Decoding
3.17.4 Table Indexing
3.17.5 Data Coding Instructions
3.17.6 Fast DCT Apparatus
3.17.7 Huffman Decoding
3.17.8 Image Transformation Instructions
3.17.9 Convolution Instructions
3.17.10 Matrix Multiplication
3.17.11 Halftoning
3.17.12 Hierarchical Image Format Decompression
3.17.13 Memory Copy Instructions
a. General-Purpose Data Move Instructions
b. Local DMA Instructions
3.17.14 Flow Control Instructions
3.18 Modules of the Accelerator Card
3.18.1 Pixel Organizer
3.18.2 MUV Buffer
3.18.3 Result Organizer
3.18.4 Operand Organizers B and C
3.18.5 Main Data Path Unit
3.18.6 Data Cache Controller and Cache
a. Normal Cache Mode
b. Single Output General Color Space Conversion Mode
c. Multiple Output General Color Space Conversion Mode
d. JPEG Encoding Mode
e. Slow JPEG Decoding Mode
f. Matrix Multiplication Mode
g. Disabled Mode
h. Invalidate Mode
3.18.7 Input Interface Switch
3.18.8 Local Memory Controller
3.18.9 Other Modes
3.18.10 External Interface Controller
3.18.11 Peripheral Interface Controller
Table Index
Table 1: Register Description
Table 2: Opcode Description
Table 3: Operand Types
Table 4: Operand Description
Table 5: Module Setup Order
Table 6: C Bus Signal Definitions
Table 7: C Bus Transaction Types
Table 8: Data Manipulation Register Format
Table 9: Desired Data Types
Table 10: Symbol Description
Table 11: Compositing Operations
Table 12: Address Composition for SOGCS Mode
Table 12A: Instruction Encoding for Color Space Conversion
Table 13: Minor Opcode Encoding for Color Conversion Instructions
Table 14: Huffman and Quantization Tables Stored in the Data Cache
Table 15: Fetch Address
Table 16: Tables Used for Huffman Encoding
Table 17: Bank Addresses for Huffman and Quantization Tables
Table 18: Instruction Word - Minor Opcode Field
Table 19: Instruction Word - Minor Opcode Field
Table 20: Instruction Operand - Result Word
Table 21: Instruction Word
Table 22: Instruction Operand - Result Word
Table 23: Instruction Word
Table 24: Instruction Operand - Result Word
Table 25: Instruction Word - Minor Opcode Field
Table 26: Instruction Word - Minor Opcode Field
Table 27: Fraction Table
[Description of Preferred and Other Embodiments]
In the preferred embodiment, significant advantages are obtained in hardware rasterization by having the hardware accelerator use two independent instruction streams. While the first instruction stream is preparing the current page for printing, the next instruction stream can prepare the next page. Hardware resources are used particularly efficiently when the hardware accelerator can operate faster than the output device.
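The benefit of two instruction streams can be illustrated with a simple timing model in Python. It is a sketch under stated assumptions, not the coprocessor's scheduling logic: the function names, the per-page preparation and output times, and the rule that a stream becomes free once the page it held has been output are all introduced here for illustration.

```python
def single_stream_time(prep, out):
    """Each page is prepared and then output before the next one starts."""
    return sum(prep) + sum(out)


def two_stream_time(prep, out):
    """Preparation of page i overlaps the output of page i-1 (two streams)."""
    prep_done = []
    out_done = []
    for i, (p, o) in enumerate(zip(prep, out)):
        # A stream is free again once the page it held two steps ago is output.
        stream_free = out_done[i - 2] if i >= 2 else 0
        # The accelerator prepares only one page at a time.
        prev_prep = prep_done[i - 1] if i >= 1 else 0
        prep_done.append(max(stream_free, prev_prep) + p)
        # The output device handles only one page at a time.
        prev_out = out_done[i - 1] if i >= 1 else 0
        out_done.append(max(prep_done[i], prev_out) + o)
    return out_done[-1]


prep_times = [3, 3, 3, 3]   # accelerator time per page (arbitrary units)
out_times = [5, 5, 5, 5]    # output device time per page (arbitrary units)

serial = single_stream_time(prep_times, out_times)
overlapped = two_stream_time(prep_times, out_times)
```

With these sample numbers the accelerator is faster than the output device, so after the first page the output device never waits and the total time is governed by the output times alone.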

The preferred embodiment is shown with a configuration that uses two instruction streams. However, configurations using more than two instruction streams are also possible, and, subject to the hardware trade-offs, using more streams can bring further advantages. With two streams, the hardware resources of the raster image coprocessor can be kept continuously busy preparing the next page, band, strip, or the like (depending on the output device) while the current page, band, or strip is being transferred to the printing device.
3.1 General Configuration of the Multiple-Stream Architecture
FIG. 1 schematically shows a computer hardware configuration 201 that includes the preferred embodiment. The configuration 201 includes a standard host computer system consisting of a host CPU 202 connected to a host memory 203 via a bridge 204. The host computer system provides the usual computer system functions, such as operating system programs, applications, and information display, and is connected to a standard PCI bus 206 via a PCI bus interface 207. The PCI standard is a well-known industry standard, and most commercially available computer systems, in particular those running the Microsoft Windows (trademark) operating system, are equipped with a PCI bus 206. The PCI bus 206 makes it easy to add to the configuration 201 one or more PCI cards (for example, 209), each of which further includes a PCI bus interface 210, other devices 211, a local memory 212, and so on.

In the preferred embodiment, a raster image accelerator card 220 is provided to speed up the graphics operations expressed in a page description language. The raster image accelerator card (which has a PCI bus interface 221) is designed, like the other PCI cards 209, to operate with the host CPU 202 in a loosely coupled, shared-memory fashion. Further image accelerator cards 220 can be added to the host computer system if required. The raster image accelerator card is intended to accelerate the complex, high-volume operations of raster image processing, which include:
(a) compositing
(b) generalized color space conversion
(c) JPEG encoding/decoding
(d) Huffman, run-length, and predictive encoding/decoding
(e) hierarchical image (trademark) decoding
(f) generalized affine image transformation
(g) small-kernel convolution
(h) matrix operations
(i) halftoning
(j) bulk arithmetic and memory copy operations
The raster image accelerator card 220 further includes a local memory 223 connected to a raster image coprocessor 224, and the raster image coprocessor 224 operates the raster image accelerator card 220 in accordance with instructions from the host CPU 202. The coprocessor 224 is preferably an application-specific LSI (ASIC). The raster image coprocessor 224 is also able to control at least one printer device 226, as required, via a peripheral interface 225. The image accelerator card 220 can additionally control input/output devices such as scanners. The accelerator card 220 is also provided with a general external interface 227 connected to the raster image coprocessor 224, which allows monitoring and testing.

In execution mode, the host CPU 202 sends a series of instructions and data over the PCI bus 206, and the raster image coprocessor 224 generates the image. The transmitted data can be held not only in the local memory 223 but also in a cache 230 within the raster image coprocessor 224 or in registers 229 within the coprocessor 224.

FIG. 2 shows the raster image coprocessor 224 in more detail. The coprocessor 224 accelerates the operations listed above and consists of a number of units under the control of an instruction controller 235. To communicate with the outside world, the coprocessor has a local memory controller 236 for communicating with the local memory 223 of FIG. 1. A peripheral interface controller 237 is used for communication with printer devices and uses standard formats such as the Centronics interface format or other video interface formats. The peripheral interface controller 237 is internally connected to the local memory controller 236. The local memory controller 236 and an external interface controller 238 are connected through an input interface switch 252, which is in turn connected to the instruction controller 235. The input interface switch 252 is also connected to a pixel organizer 246 and a data cache controller 240. The input interface switch 252 switches data from the external interface controller 238 and the local memory controller 236 and forwards it to the instruction controller 235, the data cache controller 240, or the pixel organizer 246.

外部インタフェース制御部２３８は、図１中のＰＣＩバス２０６と通信するためにラスタ画像コプロセッサ２２４中に具備されており、命令制御部２３５と接続されている。また、テスト診断を行ったり、クロック信号やグローバル信号を入力するために、命令制御部２３５に接続され、コプロセッサ２２４と協調して動作する他モジュール２３９が備わっている。 The external interface control unit 238 is provided in the raster image coprocessor 224 to communicate with the PCI bus 206 of FIG. 1 and is connected to the instruction control unit 235. In addition, a miscellaneous module 239, which is connected to the instruction control unit 235 and operates in cooperation with the coprocessor 224, is provided for test diagnostics and for supplying clock and global signals.

データキャッシュ２３０は、接続されているデータキャッシュ制御部２４０の制御下で動作する。データキャッシュ２３０は種々の用途において用いられるが、コプロセッサ２２４において引き続き使用される確率の高い最近使用した値を蓄えるために主として用いられる。上述の高速化処理は、主としてＪＰＥＧ符号化／復号器２４１やメインデータパス部２４２によって複数のデータストリームの処理が行われる。部位２４１、２４２は並列にピクセルオーガナイザ２４６と２つのオペランドオーガナイザ２４７、２４８に接続されている。部位２４１、２４２からの処理されたストリームは、結果オーガナイザ２４９に転送され、必要であれば処理や再フォーマット処理が行われる。なお、中間結果を記録しておきたいことも多いため、データキャッシュ２３０に加えて、ピクセルオーガナイザ２４６と結果オーガナイザ２４９との間にマルチユースト値（ＭＵＶ）バッファ２５０を備えている。結果オーガナイザ２４９からの結果は、必要であれば外部インタフェース制御部２３８、ローカルメモリ制御部２３６、周辺インタフェース制御部２３７に出力される。 The data cache 230 operates under the control of the connected data cache control unit 240. Data cache 230 is used in a variety of applications, but is primarily used to store recently used values that are likely to be subsequently used in coprocessor 224. In the speed-up process described above, a plurality of data streams are processed mainly by the JPEG encoder / decoder 241 and the main data path unit 242. The parts 241 and 242 are connected to the pixel organizer 246 and the two operand organizers 247 and 248 in parallel. The processed streams from the parts 241 and 242 are transferred to the result organizer 249, where they are processed and reformatted if necessary. Since there are many cases where it is desired to record intermediate results, a multi-use value (MUV) buffer 250 is provided between the pixel organizer 246 and the result organizer 249 in addition to the data cache 230. The result from the result organizer 249 is output to the external interface control unit 238, the local memory control unit 236, and the peripheral interface control unit 237 if necessary.

図２中の点線で示されているように、さらなる（第３の）データパス部２４３を、ＪＰＥＧ符号化／復号器２４１とメインデータパス部２４２といった他の二つのデータパスと「並列に」接続することも可能である。また、四あるいはそれ以上のデータパスを構成することも同様に可能である。なお、パスは「並列に」接続されてはいるが、並列に動作するものではなく、一つのパスのみが一時に動作するものであることに注意されたい。 As shown by the dotted lines in FIG. 2, a further (third) data path unit 243 can be connected "in parallel" with the other two data paths, namely the JPEG encoder/decoder 241 and the main data path unit 242. Four or more data paths can be configured in the same way. Note that although the paths are connected "in parallel", they do not operate in parallel; only one path operates at a time.

図２のＡＳＩＣの全体設計は以下のような考えに基づいてなされた。まず第１に、印刷ページでは小さな、或は一時的な画質劣化をも生じさせないことが必須である。映像信号では、このような小さな画質劣化が存在したとしても人間の目では感知されることはないが、印刷物では印刷ページに永久的に小さな画質劣化が残ってしまい、目立つようになることもあるからである。更に、プリンタに至るまでに遅延が生じると、ページがプリンタ内を移動している間に白い未印刷の部位がページ上にできてしまうことがあるため、見苦しいものとなる。そのため、高品質かつ高速に結果を提供することが必須となり、ソフトウエアを用いるアプローチよりもハードウェアの高速性に頼るアプローチの方が好ましい。 The overall design of the ASIC of FIG. 2 is based on the following considerations. First, it is essential that a printed page show no image quality degradation, however small or transient. In a video signal such small degradations go unnoticed by the human eye, but on printed matter they remain permanently on the page and can become conspicuous. Furthermore, any delay in delivering data to the printer can leave white, unprinted areas on the page as it moves through the printer, which is unsightly. It is therefore essential to deliver results at high quality and high speed, and an approach that relies on the speed of hardware is preferable to a software approach.

第２に、印刷処理を実行するのに必要なさまざまな動作ステップ（アルゴリズム）すべてをリストアップし、各ステップごとに対応するハードウェアを並べ上げると、全体のハードウェア量は膨大なものになり、非常に高価なものになってしまう。また、ハードウェアの動作スピードは、処理に必要なデータをフェッチしたり、あるいは処理で生成されたデータを転送するレートによって本質的に制限される。すなわち、動作スピードはインタフェースの帯域幅によって制約を受ける。 Second, if all the various operation steps (algorithms) necessary to execute the printing process are listed and the corresponding hardware is listed for each step, the total amount of hardware becomes enormous. It will be very expensive. In addition, the operation speed of hardware is essentially limited by the rate at which data necessary for processing is fetched or data generated by processing is transferred. That is, the operating speed is limited by the bandwidth of the interface.

これに対して、全体のＡＳＩＣのデザインは、ハードウェアの全体量を模式的に表したときに、必要なハードウェアの種々の部位が（ａ）重複しており、（ｂ）同時に実行されることはない、という驚くべき事実に基づいている。特に、この点はデータの処理をする前にデータを転送する際のオーバヘッドにおいて顕著にみられる。 On the other hand, in the overall ASIC design, when the total amount of hardware is schematically represented, various parts of the necessary hardware are (a) overlapping, and (b) are executed simultaneously. It is based on the surprising fact that it never happens. In particular, this point is conspicuous in the overhead in transferring data before processing the data.

このような観点から、いくつかのステップを経て、ハードウェアのすべての部位をできるだけアクティブにしながら、ハードウェア量を低減することにした。第１のステップにおいて、画像操作では多くの場合同一の基本的種類の繰り返し演算が必要であることを認識した。従って、データがストリーム状に入力されると、特定の処理を行うように処理部を構成して長いデータストリームを処理し、その後次に必要な処理タイプに合うように処理部を再構成する。データストリームがかなり長いと、再構成に要する時間は全体の処理時間と比較して無視できるほど短くなるため、スループットが向上することになる。 From this point of view, it was decided to reduce the amount of hardware in several steps while keeping every part of the hardware as active as possible. The first step was the recognition that image manipulation often requires repeated operations of the same basic type. Accordingly, when data is input as a stream, the processing unit is configured to perform a specific operation, a long data stream is processed, and the processing unit is then reconfigured for the next required type of operation. If the data stream is reasonably long, the time required for reconfiguration is negligible compared with the overall processing time, so throughput improves.
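The amortization argument above can be illustrated with a short calculation. This is only a sketch: the function names and the timing figures (1 time unit per data item, 100 units per reconfiguration) are hypothetical and are not taken from the specification.

```python
def reconfig_fraction(stream_len, per_item_time, reconfig_time):
    """Fraction of the total time spent reconfiguring rather than processing."""
    total_time = reconfig_time + stream_len * per_item_time
    return reconfig_time / total_time


def effective_throughput(stream_len, per_item_time, reconfig_time):
    """Items processed per unit time when one reconfiguration precedes the stream."""
    total_time = reconfig_time + stream_len * per_item_time
    return stream_len / total_time


# Hypothetical figures: 1 time unit per item, 100 units to reconfigure.
for n in (100, 10_000, 1_000_000):
    print(n, reconfig_fraction(n, 1.0, 100.0), effective_throughput(n, 1.0, 100.0))
```

As the stream length grows, the reconfiguration fraction tends to zero and the effective throughput approaches the per-item rate, which is the behaviour the paragraph relies on.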

また、複数のデータ処理パスを設けると、他のパスを使用している間に一つのパスを再構成することで、再構成に要する時間の無駄を省くこともできる。すなわち、メインデータパス部２４２がより汎用的な処理を実行している間に、他のデータパスにおいて部位２４１のようなＪＰＥＧ符号化／復号、あるいは追加部位２４３がある場合にはエントロピー符号化やハフマン符号化などのより特化した処理を行うことができる。 In addition, if a plurality of data processing paths are provided, one path can be reconfigured while another is in use, so that the time required for reconfiguration is not wasted. That is, while the main data path unit 242 performs more general processing, another data path can perform more specialized processing, such as JPEG encoding/decoding in the unit 241 or, where the additional unit 243 is present, entropy coding or Huffman coding.

更に、処理を進めている間に、処理部位へのデータのフェッチや転送を行うこともできる。また、種々の種別のデータを標準化、統一することにより、更に高速化を図ることができるとともに、ハードウェア資源も有効に利用することができる。従って、データのフェッチや転送に関わる全体のオーバヘッドを低減することができる。 Furthermore, data can be fetched or transferred to a processing site while processing is in progress. In addition, by standardizing and unifying various types of data, the speed can be further increased, and hardware resources can be used effectively. Accordingly, it is possible to reduce the overall overhead related to fetching and transferring data.

ここで重要なことは、コプロセッサ２２４がホストＣＰＵ２０２（図１）の制御の下で実行されることである。この点で、命令制御部２３５が、コプロセッサ２２４全体の制御を統括する。命令制御部２３５は、ＣＢｕｓ（Ｃバス）と呼ばれる制御バス２３１によってコプロセッサ２２４を動作させる。ＣＢｕｓ２３１はそれぞれのモジュール中のセットレジスタ（図１の２３１）を含むモジュール２３６−２５０のそれぞれに接続され、コプロセッサ２２４の全体の動作を可能とする。図２を見やすくするために、図２では制御バス２３１からそれぞれのモジュール２３６−２５０までの接続は示していない。 What is important here is that the coprocessor 224 is executed under the control of the host CPU 202 (FIG. 1). In this regard, the instruction control unit 235 controls the entire coprocessor 224. The instruction control unit 235 operates the coprocessor 224 through a control bus 231 called CBus (C bus). The CBus 231 is connected to each of the modules 236-250 including the set register (231 in FIG. 1) in each module, and enables the entire operation of the coprocessor 224. In order to make FIG. 2 easier to see, FIG. 2 does not show connections from the control bus 231 to the respective modules 236-250.

図３は、利用可能なモジュールレジスタの模式的なレイアウト２６０を示した図である。レイアウト２６０は、コプロセッサ２２４の全体制御のためのレジスタ２６１と命令制御部２３５とが含まれる。コプロセッサモジュール２３６−２５０には、同様のレジスタ２６２が含まれる。 FIG. 3 shows a schematic layout 260 of the available module registers. The layout 260 includes registers 261 for the overall control of the coprocessor 224 and the instruction control unit 235. The coprocessor modules 236-250 contain similar registers 262.
３．２ホスト／コプロセッサ・キューイング 3.2 Host/Coprocessor Queuing
上述のアーキテクチャによれば、ホストプロセッサ２０２と画像コプロセッサ２２４との間での協調が十分にとられていることが必要であることがわかる。しかしながら、これに対する解は一般的なものであり、上述のアーキテクチャ特有のものではないため、以下ではより一般的な計算ハードウェア環境を想定して説明する。 The architecture described above requires close cooperation between the host processor 202 and the image coprocessor 224. However, the solution to this is general and not specific to the above architecture, so it is described below in terms of a more general computing hardware environment.

現代のコンピュータシステムは、動的メモリ割当を行うために何かしらのメモリ管理手法を必要とする。１つあるいは複数のコプロセッサを有するシステムでは、コプロセッサによる動的メモリ割当とメモリ使用との間で同期をとるための手法が必要である。一般的なコンピュータハードウェア構成では、ＣＰＵと特別のコプロセッサとを備え、それぞれが一連のメモリ群を共有している。このようなシステムでは、ＣＰＵのみがメモリを動的に割り当てることのできるシステム中唯一の部位である。コプロセッサが使用するようにＣＰＵがメモリを割り当てた時点で、コプロセッサは当該メモリが不必要になりＣＰＵによって解放されるまで、自由にメモリを利用することができる。すなわち、コプロセッサがメモリの使用を終えた後にメモリが解放されることを保証するために、ＣＰＵとコプロセッサとの間には何かしらの同期が必要となる。この同期に関しては、種々の解決策が示されてはいるが、必ずしも性能の面で好ましいとは言い難い。 Modern computer systems require some form of memory management to perform dynamic memory allocation. In systems with one or more coprocessors, a technique is needed to synchronize between dynamic memory allocation and memory usage by the coprocessor. A typical computer hardware configuration includes a CPU and a special coprocessor, each sharing a series of memory groups. In such a system, only the CPU is the only part of the system that can dynamically allocate memory. When the CPU allocates memory for use by the coprocessor, the coprocessor is free to use the memory until it becomes unnecessary and freed by the CPU. That is, some synchronization is required between the CPU and the coprocessor to ensure that the memory is freed after the coprocessor finishes using the memory. Although various solutions have been shown for this synchronization, it is not necessarily preferable in terms of performance.

静的に割り当てられたメモリを用いれば、同期の問題を避けることができるが、メモリ資源の利用を動的に適応させることが不可能となる。同様に、コプロセッサが処理の実行を終えるまでＣＰＵをブロックし待たせておくことも可能であるが、並列性を失い、全体のシステム性能を犠牲にすることになる。コプロセッサからの処理の終了を知らせるインタラプト信号の利用も可能であるが、コプロセッサのスループットが非常に高い場合には大きな処理のオーバヘッドとなってしまう。 Using statically allocated memory can avoid synchronization problems, but it becomes impossible to dynamically adapt the use of memory resources. Similarly, it is possible to block the CPU and wait until the coprocessor finishes executing the process, but it loses parallelism and sacrifices overall system performance. An interrupt signal for informing the end of processing from the coprocessor can be used, but if the throughput of the coprocessor is very high, a large processing overhead occurs.

高性能要件の他に、このようなシステムでは動的なメモリ欠乏に対してしなやかに対処しなければならない。多くのコンピュータシステムでは種々のメモリサイズ構成が可能となっているが、多くのメモリを具備するシステムでは有効資源を最大限に利用して性能を最大にすることが重要である。同様に、最小のメモリサイズ構成のシステムでは、少ないメモリながらも十分な動作を可能にすべきであり、少なくともメモリ欠乏の際には性能がしなやかに劣化すべきである。 In addition to high performance requirements, such systems must flexibly cope with dynamic memory starvation. In many computer systems, various memory size configurations are possible, but in a system with many memories, it is important to maximize performance by making the best use of effective resources. Similarly, a system with a minimum memory size configuration should be able to operate satisfactorily with a small amount of memory, and the performance should be gracefully degraded at least in the event of a memory shortage.

これらの問題を解決するために、システム性能を最大にするとともに、コプロセッサのメモリ使用をシステム容量や実行する処理の複雑さに動的に適応化する同期機構が必要である。図４に、（ホスト）ＣＰＵとコプロセッサとの同期をとる好適な構成を示す。図中の参照番号は、図１の説明において利用したものを用いている。 To solve these problems, there is a need for a synchronization mechanism that maximizes system performance and dynamically adapts the coprocessor's memory usage to system capacity and complexity of processing to be performed. FIG. 4 shows a preferred configuration for synchronizing the (host) CPU and the coprocessor. The reference numbers in the figure are the same as those used in the description of FIG.

図４において、ＣＰＵ２０２はシステム中のすべてのメモリ管理を統括している。ＣＰＵ２０２が、自身、あるいはコプロセッサ２２４での利用のために、メモリ２０３を割り当てる。コプロセッサ２２４はグラフィックス特有の命令セットを有しており、ホストプロセッサ２０２と共有しているメモリ２０３から命令１０２２を実行することができる。これらの命令のそれぞれは結果１０２４を共有メモリ２０３に書き込むことができ、またメモリ２０３からオペランドを読み込むこともできる。ここでコプロセッサ命令のオペランド１０２３や結果１０２４を記憶するに要するメモリ２０３の量は、処理の複雑さや種別に依存する。 In FIG. 4, the CPU 202 controls all memory management in the system. The CPU 202 allocates memory 203 for use by itself or the coprocessor 224. The coprocessor 224 has a graphics specific instruction set and can execute instructions 1022 from the memory 203 shared with the host processor 202. Each of these instructions can write the result 1024 to the shared memory 203 and can also read operands from the memory 203. Here, the amount of the memory 203 required to store the operand 1023 of the coprocessor instruction and the result 1024 depends on the complexity and type of processing.

ＣＰＵ２０２は、コプロセッサ２２４によって実行される命令１０２２を生成する処理をも行う。ＣＰＵ２０２とコプロセッサ２２４との間の並列性を最大にするために、ＣＰＵ２０２によって生成された命令は１０２２に示されるようにキューイングされてからコプロセッサ２２４において実行される。キュー１０２２中の各命令は、コプロセッサ２２４のためにホストＣＰＵ２０２によって割り当てられた共有メモリ２０３中のオペランド１０２３や結果１０２４を参照することができる。 The CPU 202 also performs processing for generating an instruction 1022 to be executed by the coprocessor 224. In order to maximize parallelism between CPU 202 and coprocessor 224, instructions generated by CPU 202 are queued as shown at 1022 and then executed in coprocessor 224. Each instruction in the queue 1022 can refer to the operand 1023 and the result 1024 in the shared memory 203 allocated by the host CPU 202 for the coprocessor 224.

図５に示すように、これらの処理を行うために、命令生成部１０３０、メモリ管理部１０３１、キュー管理部１０３２が接続されている。これらすべてのモジュールはホストＣＰＵ２０２上で単一プロセスとして実行される。コプロセッサ２２４における実行命令は命令生成部１０３０において生成され、メモリ管理部１０３１のサービスを利用して生成された命令のオペランド１０２３や結果１０２４のための領域を割り当てる。また、命令生成部１０３０は、キュー管理部１０３２のサービスを利用して、コプロセッサ２２４で実行する命令をキューイングする。 As shown in FIG. 5, an instruction generation unit 1030, a memory management unit 1031 and a queue management unit 1032 are connected to perform these processes. All these modules are executed as a single process on the host CPU 202. An execution instruction in the coprocessor 224 is generated in the instruction generation unit 1030, and an area for the operand 1023 of the instruction generated using the service of the memory management unit 1031 and the result 1024 is allocated. The instruction generation unit 1030 uses the service of the queue management unit 1032 to queue instructions to be executed by the coprocessor 224.

各命令がコプロセッサ２２４において実行されると、ＣＰＵ２０２はメモリ管理部１０３１によって命令のオペランド用に割り当てられていたメモリを解放することができる。ある命令の結果が次の命令のオペランドとなることも可能であり、その後でＣＰＵによってメモリが解放される。コプロセッサ２２４が命令を終えると同時にインタラプト信号を送出しメモリを解放するのではなく、コプロセッサ２２４が命令を終えた後のある時点でクリーンアップ機構を起動し、命令の処理に要した資源をシステムが解放する。クリーンアップ機構が起動される時点は、メモリ管理部１０３１とキュー管理部１０３２との関係に依存しており、利用可能なシステムメモリ量や各コプロセッサ命令に必要なメモリ量に応じて動的に適応させることができる。 When each instruction is executed in the coprocessor 224, the CPU 202 can release the memory allocated for the operand of the instruction by the memory management unit 1031. The result of one instruction can also be the operand of the next instruction, after which the memory is released by the CPU. Rather than sending an interrupt signal and releasing the memory as soon as the coprocessor 224 finishes the instruction, the cleanup mechanism is activated at some point after the coprocessor 224 finishes the instruction, and the resources required for processing the instruction are reduced. The system releases. The point in time when the cleanup mechanism is activated depends on the relationship between the memory management unit 1031 and the queue management unit 1032 and dynamically depends on the amount of available system memory and the amount of memory required for each coprocessor instruction. Can be adapted.

図６は、コプロセッサ命令キュー１０２２の構成を模式的に示した図である。命令群はホストＣＰＵ２０２によりペンディング命令キュー１０４０に挿入され、コプロセッサ２２４によって読み出され実行に移される。コプロセッサ２２４における実行処理が終了すると、命令はクリーンアップキュー１０４１に転送され、コプロセッサ２２４が処理を終えた後で命令が必要とした資源の解放を行う。 FIG. 6 is a diagram schematically showing the configuration of the coprocessor instruction queue 1022. The instruction group is inserted into the pending instruction queue 1040 by the host CPU 202, read by the coprocessor 224, and executed. When the execution process in the coprocessor 224 is completed, the instruction is transferred to the cleanup queue 1041, and after the coprocessor 224 finishes the process, the resources required by the instruction are released.

命令キュー１０２２自身は固定あるいは動的可変サイズの巡回バッファとして構成される。命令キュー１０２２は、ＣＰＵ２０２による命令の生成とコプロセッサ２２４における命令の実行とを分離している。各命令のオペランドと結果メモリは、命令生成時に命令生成部１０３０からの要求に応じてメモリ管理部１０３１（図５）によって割り当てられる。新しく生成された命令のためのメモリ割当が、以下で説明するメモリ管理部１０３１とキュー管理部１０３２との協調動作を起動させ、利用可能なメモリ量や命令の複雑さにシステムが自動的に適応できるようにしている。 The instruction queue 1022 itself is configured as a fixed or dynamic variable size circular buffer. The instruction queue 1022 separates instruction generation by the CPU 202 and execution of instructions in the coprocessor 224. The operand and result memory of each instruction are allocated by the memory management unit 1031 (FIG. 5) in response to a request from the instruction generation unit 1030 at the time of instruction generation. Memory allocation for newly generated instructions activates the cooperative operation of the memory manager 1031 and queue manager 1032 described below, and the system automatically adapts to the amount of available memory and the complexity of the instructions I can do it.
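A minimal sketch of such a circular buffer, with a pending region and a cleanup region as in FIG. 6, might look as follows. The class and method names are hypothetical, and the use of three monotonically increasing counters is an illustrative choice rather than the structure disclosed in the specification.

```python
class InstructionQueue:
    """Fixed-size circular buffer holding pending and finished-but-uncleaned instructions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.inserted = 0  # total instructions queued by the host
        self.executed = 0  # total instructions finished by the coprocessor
        self.cleaned = 0   # total instructions whose resources were released

    def pending(self):
        return self.inserted - self.executed

    def awaiting_cleanup(self):
        return self.executed - self.cleaned

    def is_full(self):
        # A slot is reusable only after its instruction has been cleaned up.
        return self.inserted - self.cleaned == self.capacity

    def insert(self, instr):
        """Host side: append a new instruction to the pending region."""
        if self.is_full():
            raise OverflowError("instruction queue is full")
        self.slots[self.inserted % self.capacity] = instr
        self.inserted += 1

    def execute_next(self):
        """Coprocessor side: take the oldest pending instruction, or None if idle."""
        if self.pending() == 0:
            return None
        instr = self.slots[self.executed % self.capacity]
        self.executed += 1  # the instruction now sits in the cleanup region
        return instr

    def clean_up(self):
        """Host side: release every finished instruction and return them."""
        released = []
        while self.cleaned < self.executed:
            index = self.cleaned % self.capacity
            released.append(self.slots[index])
            self.slots[index] = None
            self.cleaned += 1
        return released
```

Because the host only advances `inserted` and `cleaned` while the coprocessor only advances `executed`, instruction generation and execution stay decoupled, which is the property the paragraph above describes.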

命令キュー管理部１０３２は、コプロセッサ２２４が命令生成部１０３０によって生成された命令を実行し終えるまで、待機することができる。しかし、メモリ管理部１０３１によって割り当てられる命令キュー１０２２とメモリ２０３が十分大きければ、コプロセッサ２２４を全く待つ必要がないか、あるいは少なくともすべての命令シーケンスが終了するまで待機する必要はない。大きなジョブではこれらの待機時間が、数分間にも及ぶため、効果は大きい。しかし、ピーク時のメモリ使用量は利用可能なメモリ量を容易に超えることもある。この時点で、キュー管理部１０３２とメモリ管理部１０３１との間で協調的な動作が開始される。 The instruction queue management unit 1032 is able to wait until the coprocessor 224 has finished executing an instruction generated by the instruction generation unit 1030. However, if the instruction queue 1022 and the memory 203 allocated by the memory management unit 1031 are sufficiently large, there is no need to wait for the coprocessor 224 at all, or at least no need to wait until the entire instruction sequence has finished. On large jobs such waits can amount to several minutes, so the benefit is significant. However, peak memory usage can easily exceed the available memory. At that point, cooperative operation between the queue management unit 1032 and the memory management unit 1031 begins.

命令キュー管理部１０３２にとって、終了した命令を「クリーンアップ」し、動的に割り当てられたメモリを解放するようにとの指示がなされる時点は適宜で構わない。メモリ管理部１０３１が利用可能なメモリが少なくなりつつある、あるいはなくなったことを検出した場合には、キュー管理部１０３２にクリーンアップ処理を指示し、コプロセッサ２２４によってもはや利用されていないメモリを解放させる手段をとる。これにより、メモリ管理部１０３１は、ＣＰＵ２０２がコプロセッサ２２４を待つ、あるいはコプロセッサ２２４と同期することなく、命令生成部１０３０からの新しく生成された命令に要するメモリ要求を満足させることができる。 The instruction queue management unit 1032 can be instructed at any time to "clean up" finished instructions and release their dynamically allocated memory. When the memory management unit 1031 detects that available memory is running low or has run out, it instructs the queue management unit 1032 to perform a cleanup, releasing memory that is no longer in use by the coprocessor 224. This allows the memory management unit 1031 to satisfy the memory request for a newly generated instruction from the instruction generation unit 1030 without the CPU 202 having to wait for, or synchronize with, the coprocessor 224.

メモリ管理部１０３１からキュー管理部１０３２に終了命令をクリーンアップする要求を出しても、命令生成部の新しい要求を満たすに足る十分メモリが解放されなかった場合には、メモリ管理部１０３１はキュー管理部１０３２にペンディング命令キュー１０４０中の処理中命令の一部、例えば半分が終了するまで待機せよ、と要求する。これにより、コプロセッサ２２４命令のいくつかが終了するまでＣＰＵ２０２処理はブロックされることになる。コプロセッサ２２４命令のいくつかが終了すると、これらの命令のオペランドが解放され、要求を満たすに十分なメモリが得られる。処理中の命令の一部のみを待つことにより、少なくともいくつかの命令はペンディング命令キュー１０４０に存在しており、コプロセッサ２２４は常に動作していることになる。多くの場合、ＣＰＵ２０２が待機するペンディング命令キュー１０４０中の一部をクリーンアップすることにより、メモリ管理部１０３１にとって十分なメモリが解放され、命令生成部１０３０の要求を満たすことができる。 If the request from the memory management unit 1031 for the queue management unit 1032 to clean up finished instructions does not release enough memory to satisfy the new request from the instruction generation unit, the memory management unit 1031 asks the queue management unit 1032 to wait until a fraction, for example half, of the outstanding instructions in the pending instruction queue 1040 have finished. This blocks processing on the CPU 202 until some of the coprocessor 224 instructions have finished. When they finish, their operands are released, which may free enough memory to satisfy the request. Because only a fraction of the outstanding instructions is waited for, at least some instructions remain in the pending instruction queue 1040 and the coprocessor 224 stays busy. In many cases, cleaning up the part of the pending instruction queue 1040 that the CPU 202 waited for releases enough memory for the memory management unit 1031 to satisfy the request of the instruction generation unit 1030.

コプロセッサ２２４がペンディング命令の例えば半分が実行終了するまで待機したとしても要求を満たすだけのメモリが解放されなかったという特殊なケースの場合には、メモリ管理部１０３１はすべてのペンディングコプロセッサ命令が終了するまで待機するという最後の手段をとる。システムの現在のメモリ容量を超えるような非常に大きなかつ複雑なジョブなどを除いて、これにより命令生成部１０３０の要求を満たすに十分な資源が解放される。 In the rare case where waiting for the coprocessor 224 to finish, for example, half of the pending instructions still does not release enough memory to satisfy the request, the memory management unit 1031 takes the last resort of waiting until all pending coprocessor instructions have finished. Except for very large and complex jobs that exceed the current memory capacity of the system, this releases sufficient resources to satisfy the request of the instruction generation unit 1030.

このようなメモリ管理部１０３１とキュー管理部１０３２との協調動作により、システムに与えられたメモリ量２０３の中で効率的にスループットを最大にすることが可能となる。より多くのメモリがあれば同期の必要性は少なくなり、より大きなスループットを得ることができる。逆に、より少ないメモリの場合には、コプロセッサ２２４が乏しいメモリ２０３を使っての処理が終わるまで待機することが多くなり、利用可能なメモリが少なくても動作はするものの性能は劣化する。 This cooperation between the memory management unit 1031 and the queue management unit 1032 makes it possible to maximize throughput efficiently within the amount of memory 203 available to the system. With more memory, less synchronization is needed and higher throughput is obtained. Conversely, with less memory, the CPU 202 more often has to wait for the coprocessor 224 to finish using the scarce memory 203; the system still operates with little memory available, but at reduced performance.

命令生成部１０３０からの要求を満たす際にメモリ管理部１０３１が行う処理ステップを以下にまとめる。各ステップは順々に実行され、ステップ後にメモリ管理部１０３１が要求を満たすに十分なメモリ２０３が得られるかどうか調べる。十分なメモリが得られる場合には要求が満たされるため、ステップを終了する。得られなかった場合には、次のステップに進み、要求を満たすべくより過激な処理に進む。
１．利用可能なメモリ２０３で要求を満たすことを試みる
２．すべての終了した命令をクリーンアップする
３．ペンディング命令の一部が終了するのを待つ
４．すべてのペンディング命令が終了するのを待つ
なお、要求を満たすために、ペンディング命令のうちの異なる部分（例えば、１／３や２／３）を待機するとか、多量のメモリを使用することがわかっている特定の命令を待機するなど、他のオプションを用いることもできる。 The steps taken by the memory management unit 1031 to satisfy a request from the instruction generation unit 1030 are summarized below. The steps are tried in order, and after each step the memory management unit 1031 checks whether enough memory 203 is now available to satisfy the request. If so, the request is satisfied and no further steps are taken. If not, it proceeds to the next, more drastic step.
1. Try to satisfy the request with the available memory 203
2. Clean up all finished instructions
3. Wait for a fraction of the pending instructions to finish
4. Wait for all pending instructions to finish
Other options can also be used to satisfy the request, such as waiting for different fractions of the pending instructions (for example, 1/3 or 2/3), or waiting for specific instructions known to use a large amount of memory.
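The four escalating steps can be sketched as a small simulation. Everything here is an assumption made for illustration: the class name `MemoryManagerSim`, the modelling of memory as a single free-byte counter, and the way a "wait" is simulated by immediately marking pending instructions as finished. It is not the implementation disclosed in the specification.

```python
class MemoryManagerSim:
    """Toy model: each allocation belongs to one queued coprocessor instruction."""

    def __init__(self, total_memory):
        self.free = total_memory
        self.pending = []   # operand sizes of instructions not yet executed (FIFO)
        self.finished = []  # operand sizes of executed but not yet cleaned-up instructions

    def coprocessor_finish(self, count):
        """Simulate the coprocessor finishing `count` pending instructions."""
        for _ in range(min(count, len(self.pending))):
            self.finished.append(self.pending.pop(0))

    def clean_up(self):
        """Release the memory of every finished instruction."""
        self.free += sum(self.finished)
        self.finished.clear()

    def wait_for(self, count):
        """Block until `count` pending instructions finish (simulated), then clean up."""
        self.coprocessor_finish(count)
        self.clean_up()

    def allocate(self, size):
        """Return the step number (1-4) at which the request was satisfied."""
        steps = [
            lambda: None,                                         # 1. use available memory
            self.clean_up,                                        # 2. clean up finished instructions
            lambda: self.wait_for((len(self.pending) + 1) // 2),  # 3. wait for about half
            lambda: self.wait_for(len(self.pending)),             # 4. wait for all
        ]
        for level, step in enumerate(steps, start=1):
            step()
            if self.free >= size:
                self.free -= size
                self.pending.append(size)
                return level
        raise MemoryError("request exceeds the memory capacity of the system")
```

With 100 units of memory, two 40-unit requests succeed at step 1, and a third 40-unit request only succeeds at step 3, after half of the pending instructions have been waited for. The CPU blocks only when the cheaper steps fail, which is the graceful degradation described above.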

図７において、メモリ管理部１０３１とキュー管理部１０３２との間での協調動作に加えて、固定長命令キューバッファ１０５０が溢れた場合にはキュー管理部１０３２がコプロセッサ２２４と同期をとることもできる。このような状況を図７に示しており、ペンディング命令キュー１０４０は長さ１０個の命令のキューとしている。付加される最新の命令が最も大きい数を有しているため、領域が溢れると最新の命令は位置９に格納される。次にコプロセッサ２２４に入力される命令は位置０において待機している。 In FIG. 7, in addition to the cooperative operation between the memory management unit 1031 and the queue management unit 1032, the queue management unit 1032 may synchronize with the coprocessor 224 when the fixed-length instruction queue buffer 1050 overflows. it can. FIG. 7 shows such a situation, and the pending instruction queue 1040 is a queue of instructions having a length of ten. Since the latest instruction to be added has the largest number, the latest instruction is stored in position 9 when the area overflows. The next command input to coprocessor 224 is waiting at position 0.

領域が溢れた場合には、キュー管理部１０３２はコプロセッサ２２４がペンディング命令の例えば半分の処理を終えるまで待機する。この待機により、通常はキュー管理部１０３２によって挿入される新しい命令に必要な十分な領域が解放される。新しい命令をスケジューリングする際のキュー管理部１０３２の動作は以下の通りである。
１．命令キュー１０４０に十分な領域が残っているかテストする
２．十分な領域が残っていない場合は、コプロセッサがある所定数の命令が終了するまで待機する
３．新しい命令をキューに挿入する
ある命令が終了するのを待機せよと指示されたキュー管理部１０３２の動作は以下の通りである。
１．命令が終了したとコプロセッサ２２４から指示されるまで待機する
２．クリーンアップされていない終了した命令がある場合には、次に終了した命令をキューから削除する
新しい命令を生成する際の命令生成部１０３０の動作は以下の通りである。
１．命令オペランド１０２３に必要なメモリをメモリ管理部１０３１に要求する
２．転送する命令を生成する
３．コプロセッサ命令をキュー管理部１０３２に転送し実行する
以上の動作プロセスを擬似コードの形で示した例を以下に示す。 When the buffer overflows, the queue management unit 1032 waits until the coprocessor 224 has finished, for example, half of the pending instructions. This wait normally frees enough space for the new instruction to be inserted by the queue management unit 1032. The operation of the queue management unit 1032 when scheduling a new instruction is as follows.
1. Test whether enough space remains in the instruction queue 1040
2. If not, wait until the coprocessor has finished a predetermined number of instructions
3. Insert the new instruction into the queue
The operation of the queue management unit 1032 when instructed to wait for a given instruction to finish is as follows.
1. Wait until the coprocessor 224 indicates that the instruction has finished
2. While there are finished instructions that have not been cleaned up, remove the next finished instruction from the queue
The operation of the instruction generation unit 1030 when generating a new instruction is as follows.
1. Request the memory required for the instruction operands 1023 from the memory management unit 1031
2. Generate the instruction to be submitted
3. Submit the coprocessor instruction to the queue management unit 1032 for execution
An example of the above operating processes in pseudocode form is shown below.
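The queue-management behaviour listed above can also be sketched in runnable form. This is a simplified model under stated assumptions: `Instruction`, `QueueManager`, and the optional `cleanup` callback are hypothetical names, and waiting for the coprocessor is simulated by treating the oldest pending instructions as finished immediately.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instruction:
    name: str
    cleanup: Optional[Callable[[], None]] = None  # optional per-instruction cleanup hook


class QueueManager:
    """Fixed-length instruction queue that waits for part of its backlog when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()  # pending instructions, oldest first

    def wait_for(self, count: int) -> None:
        """Wait (simulated) for `count` instructions to finish, then clean them up."""
        for _ in range(min(count, len(self.queue))):
            instr = self.queue.popleft()  # assumed finished by the coprocessor
            if instr.cleanup is not None:
                instr.cleanup()           # release the instruction's resources

    def schedule(self, instr: Instruction) -> None:
        """Insert a new instruction, first making room if the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.wait_for(max(1, len(self.queue) // 2))  # wait for about half
        self.queue.append(instr)
```

Waiting for only half of the backlog, rather than all of it, leaves work queued for the coprocessor while the host resumes generating instructions.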

メモリ管理
ＡＬＬＯＣＡＴＥ＿ＭＥＭＯＲＹ
ＢＥＧＩＮ
ＩＦ要求を満たすのに十分なメモリが得られないとすると
ＴＨＥＮ終了した命令すべてをクリーンアップ（一掃）する
ＥＮＤＩＦ
ＩＦ要求を満たすのに十分なメモリが未だ得られないとすると
ＴＨＥＮＷＡＩＴ＿ＦＯＲ＿ＩＮＳＴＲＵＣＴＩＯＮを呼び出
し、ペンディング命令の半分の終了を待つ
ＥＮＤＩＦ
ＩＦ要求を満たすのに十分なメモリが未だ得られないとすると
ＴＨＥＮエラーを出力し戻る
ＥＮＤＩＦ割り当てたメモリを戻す
キュー管理
ＳＣＨＥＤＵＬＥ＿ＩＮＳＴＲＵＣＴＩＯＮ
ＢＥＧＩＮ
ＩＦ命令キューに十分な領域が得られないとすると
ＴＨＥＮある所定数の命令をコプロセッサが終了するまで待機する
ＥＮＤＩＦ新しい命令をキューに付加する
ＥＮＤ
ＷＡＩＴ＿ＦＯＲ＿ＩＮＳＴＲＵＣＴＩＯＮ（ｉ）
ＢＥＧＩＮ
命令ｉが終了したとコプロセッサから指示されるまで待機する
ＷＨＩＬＥ終了しているもののクリーンアップされていない命令が
ある
ＤＯ
ＩＦ次の終了した命令にクリーンアップ機能が備わっている
ＴＨＥＮクリーンアップ機能を呼び出す
ＥＮＤＩＦキューから終了した命令を削除する
ＤＯＮＥ
ＥＮＤ
命令生成部
ＧＥＮＥＲＡＴＥ＿ＩＮＳＴＲＵＣＴＩＯＮＳ
ＢＥＧＩＮ
ＡＬＬＯＣＡＴＥ＿ＭＥＭＯＲＹを呼び出し、命令オペランドに必要な
メモリをメモリ管理部において割り当てる
転送する命令を生成する
ＳＣＨＥＤＵＬＥ＿ＩＮＳＴＲＵＣＴＩＯＮを呼び出し、コプロセッサ
命令をキュー管理部に転送し実行する
ＥＮＤ
Memory management
ALLOCATE_MEMORY
BEGIN
  IF there is not enough memory available to satisfy the request
  THEN clean up all finished instructions
  ENDIF
  IF there is still not enough memory available to satisfy the request
  THEN call WAIT_FOR_INSTRUCTION to wait for half of the pending instructions to finish
  ENDIF
  IF there is still not enough memory available to satisfy the request
  THEN output an error and return
  ENDIF
  return the allocated memory
END
Queue management
SCHEDULE_INSTRUCTION
BEGIN
  IF there is not enough space in the instruction queue
  THEN wait until the coprocessor has finished a predetermined number of instructions
  ENDIF
  add the new instruction to the queue
END
WAIT_FOR_INSTRUCTION(i)
BEGIN
  wait until the coprocessor indicates that instruction i has finished
  WHILE there are finished instructions that have not been cleaned up
  DO
    IF the next finished instruction has a cleanup function
    THEN call the cleanup function
    ENDIF
    remove the finished instruction from the queue
  DONE
END
Instruction generation
GENERATE_INSTRUCTIONS
BEGIN
  call ALLOCATE_MEMORY to allocate the memory required for the instruction operands from the memory management unit
  generate the instruction to be submitted
  call SCHEDULE_INSTRUCTION to submit the coprocessor instruction to the queue management unit for execution
END
３．３コプロセッサのレジスタの説明 3.3 Coprocessor Register Description
図１と３において説明したように、コプロセッサ２２４は各命令ストリームを実行するために複数のレジスタを備える。 As described with reference to FIGS. 1 and 3, the coprocessor 224 includes a plurality of registers for executing each instruction stream.

図２中のモジュールに対して、表１はコプロセッサ２２４において用いられるレジスタの名前、種別、説明を示しており、付録Ｂはそれぞれのレジスタの各フィールドを説明している。 For the modules of FIG. 2, Table 1 lists the name, type, and description of each register used in the coprocessor 224, and Appendix B describes the fields of each register.
レジスタの説明 Register description

これらのレジスタ中で着目すべきものは以下のものである。
（ａ）命令ポインタレジスタ（ｉｃ＿ｉｐａとｉｃ＿ｉｐｂ）。これらのレジスタペアは現在実行している命令の仮想アドレスを格納する。仮想アドレスの昇順に命令がフェッチされ実行される。制御が不連続な仮想アドレスに移る場合にはジャンプ命令が用いられる。各命令には、３２ビットのシーケンス番号が付与され、シーケンス番号は一命令ごとに１ずつ増える。シーケンス番号はコプロセッサ２２４とホストＣＰＵ２０２双方において、命令の生成と実行の同期をとるために用いられる。
（ｂ）終了レジスタ（ｉｃ＿ｆｎａとｉｃ＿ｆｎｂ）。これらのレジスタペアは、終了した命令のシーケンス番号を格納する。
（ｃ）ＴｏＤｏレジスタ（ｉｃ＿ｔｄａとｉｃ＿ｔｄｂ）。これらのレジスタペアは、キューイングされている命令のシーケンス番号を格納する。
（ｄ）インタラプトレジスタ（ｉｃ＿ｉｎｔａとｉｃ＿ｉｎｔｂ）。これらのレジスタペアは、インタラプトをかけるシーケンス番号を格納する。
（ｅ）インタラプト状態レジスタ（ｉｃ＿ｓｔａｔ．ａ＿ｐｒｉｍｅｄとｉｃ＿ｓｔａｔ．ｂ＿ｐｒｉｍｅｄ）。これらのレジスタペアは、インタラプト、終了レジスタとが合致した時点でインタラプトを起動するフラグであるプライムビットを格納する。本ビットは、インタラプト状態（ｉｃ＿ｓｔａｔ）レジスタ中の他のインタラプトイネーブルビットや他の状態／構成情報と同様に格納される。
（ｆ）レジスタアクセスセマフォア（ｉｃ＿ｓｅｍａとｉｃ＿ｓｅｍｂ）。ホストＣＰＵ２０２は、コプロセッサ２２４への高速性、即ち、１回以上のレジスタへの書き込みを必要とするレジスタアクセスに先立ちセマフォアを入手しておかなければならない。これに対して、高速性を必要としないレジスタアクセスの場合は何時でも実行することができる。ホストＣＰＵ２０２がセマフォアを入手することに付随する欠点は、現在実行中の命令が終了するまでコプロセッサの実行が中断することである。レジスタアクセスセマフォアは、コプロセッサ２２４の構成／状態レジスタの１ビットとして構成される。これらのレジスタは命令制御美のレジスタ領域中に存在する。前述の通り、コプロセッサの各サブモジュールは、それぞれ構成／状態レジスタを備えており、通常の命令実行においてレジスタが設定される。これらのすべてのレジスタは、レジスタマップ上に表されており、多くは命令実行において暗黙的に修正される。ホストはレジスタマップを介してこれらのレジスタの内容を知ることができる。
Of these registers, the following are of particular note.
(a) Instruction pointer registers (ic_ipa and ic_ipb). This register pair holds the virtual address of the currently executing instruction. Instructions are fetched and executed in ascending order of virtual address. A jump instruction is used to transfer control to a non-sequential virtual address. Each instruction is assigned a 32-bit sequence number, which increases by one per instruction. The sequence numbers are used by both the coprocessor 224 and the host CPU 202 to synchronize instruction generation and execution.
(b) Finished registers (ic_fna and ic_fnb). This register pair holds the sequence number of the finished instruction.
(c) ToDo registers (ic_tda and ic_tdb). This register pair holds the sequence number of the queued instruction.
(d) Interrupt registers (ic_inta and ic_intb). This register pair holds the sequence number at which an interrupt is to be raised.
(e) Interrupt status registers (ic_stat.a_primed and ic_stat.b_primed). These hold the primed bit, a flag that causes an interrupt to be raised when the interrupt and finished registers match. The bit is held in the interrupt status (ic_stat) register alongside the other interrupt enable bits and other status/configuration information.
(f) Register access semaphores (ic_sema and ic_semb). The host CPU 202 must obtain the semaphore before performing register accesses to the coprocessor 224 that require high speed, that is, accesses requiring one or more register writes. Register accesses that do not require this can be performed at any time. A drawback of the host CPU 202 obtaining the semaphore is that coprocessor execution is suspended until the currently executing instruction has finished. The register access semaphore is implemented as one bit of the configuration/status register of the coprocessor 224. These registers reside in the register area of the instruction control unit. As described above, each submodule of the coprocessor has its own configuration/status registers, which are set during normal instruction execution. All of these registers appear in the register map, and many are modified implicitly during instruction execution. The host can read the contents of these registers through the register map.
３．４複数ストリームフォーマット 3.4 Multiple Stream Format
前述の通り、資源を最大限に有効に利用するために、また外部周辺装置に高速に出力するために、コプロセッサ２２４は２つの独立な命令ストリームの１つを実行する。通常は、１つの命令ストリームは出力デバイスが適時点で必要とする現在の出力ページに対応しており、２つ目の命令ストリームが他の命令ストリームが休止中であるときにコプロセッサ２２４のモジュールを利用する。ここで、最も重要な点は、必要な出力データを適時点で出力することであるとともに、続くページ、バンドなどの準備のために資源を最大限に利用することである。従って、コプロセッサ２２４は、全く独立であるものの同じように実行される２つの命令ストリーム（以下、ＡとＢと呼ぶ）を実行するように設計される。命令はホストＣＰＵ２０２上で動作しているソフトウエアによって生成され、ラスタ画像アクセラレータカード２２０に転送されコプロセッサ２２４によって実行されることが望ましい。通常動作では、命令ストリームの１つ（ストリームＡ）は、他の命令ストリーム（ストリームＢ）よりも高い優先度で動作する。命令ストリームあるいはキューはホストＲＡＭ２０３（図１）中の一つあるいは複数のバッファに書き込まれる。バッファは開始時点で割り当てられ、アプリケーションの実行中はホスト２０３の物理メモリに固定される。各命令はホストＲＡＭ２０３の仮想メモリ環境に格納されることが好ましく、ラスタ画像コプロセッサ２２４が仮想アドレスから物理アドレスへの変換を行い、次の命令の位置としてホストＲＡＭ２０３中の対応する物理アドレスを決定する。これらの命令は順々にコプロセッサ２２４のローカルメモリに格納される。 As described above, to make the most effective use of resources and to output to external peripheral devices at high speed, the coprocessor 224 executes one of two independent instruction streams. Typically, one instruction stream corresponds to the current output page, which the output device requires in a timely manner, while the second instruction stream uses the modules of the coprocessor 224 whenever the other instruction stream is idle. The most important point is to deliver the required output data in a timely manner while making maximum use of resources to prepare subsequent pages, bands, and so on. The coprocessor 224 is therefore designed to execute two instruction streams (hereinafter called A and B) that are completely independent but executed in the same way. The instructions are preferably generated by software running on the host CPU 202, transferred to the raster image accelerator card 220, and executed by the coprocessor 224. In normal operation, one instruction stream (stream A) runs at a higher priority than the other (stream B). An instruction stream or queue is written to one or more buffers in the host RAM 203 (FIG. 1). The buffers are allocated at start-up and locked into the physical memory of the host 203 for the duration of the application. Each instruction is preferably stored in the virtual memory environment of the host RAM 203, and the raster image coprocessor 224 performs virtual-to-physical address translation to determine the physical address in the host RAM 203 at which the next instruction is located. These instructions are in turn stored in the local memory of the coprocessor 224.

FIG. 8 shows the format of the two instruction streams A and B stored in the host RAM 203. The formats of streams A and B are essentially identical. The simple execution model of the coprocessor 224 consists of the following:
* Two virtual instruction streams, stream A and stream B
* In general, only one instruction is executed at any one time
* Either stream may have priority, or priority may alternate between the streams in "round robin" fashion
* Either stream may be "locked" so that it is guaranteed to execute regardless of stream priority or of whether instructions are available on the other stream
* Either stream may be empty
* Either stream may be disabled
* Either stream may contain an instruction that "overlaps" execution of the following instruction, provided the following instruction is not itself an "overlapped" instruction
* Each instruction has a "unique" 32-bit sequence number that increases by one per instruction
* Each instruction may be coded to raise an interrupt or to halt instruction execution
* Instructions may be prefetched to minimize the effect of external interface latency
The instruction control unit 235 implements the instruction execution model of the coprocessor in order to control overall execution of the coprocessor 224 and to fetch instructions from the host RAM 203 when required. For each instruction, the instruction control unit 235 decodes the instruction, configures the various registers in the modules via the CBus 231, and causes the relevant modules to execute the instruction.

FIG. 9 shows, in simplified form, the instruction execution cycle carried out by the instruction control unit 235. The instruction execution cycle consists of four main stages 276-279. The first stage 276 checks whether an instruction is pending on an instruction stream. If so, the instruction is fetched (277), decoded and executed (278), and the registers are updated (279).
3.5 Determining the current active stream
The first stage must perform two steps:
1. Determine whether an instruction is pending
2. Determine which instruction stream to fetch from next
To determine whether an instruction is pending, the following are examined:
1. Whether the instruction control unit is enabled
2. Whether the instruction control unit is paused owing to an internal error or an interrupt
3. Whether an external error condition is pending
4. Whether stream A or stream B is locked
5. Whether sequence numbering is enabled for either stream
6. Whether either stream has a pending instruction
The pseudocode below shows an algorithm that determines, from the above rules, whether an instruction is pending. Using known techniques, this algorithm can be implemented in hardware as a state machine within the instruction control unit 235.

if not in error mode, and enabled, and not in bypass mode, and in self-diagnostic mode
    if stream A is locked and not paused
        if stream A is enabled and (sequence numbering for stream A is disabled or an instruction exists on stream A)
            an instruction is pending
        else
            no instruction is pending
        end if
    else if stream B is locked and not paused
        if stream B is enabled and (sequence numbering for stream B is disabled or an instruction exists on stream B)
            an instruction is pending
        else
            no instruction is pending
        end if
    else /* no stream is locked */
        if stream A is enabled and not paused and (sequence numbering for stream A is disabled or an instruction exists on stream A)
            an instruction is pending
        else
            no instruction is pending
        end if
    end if
else /* interface control unit is not enabled */
    no instruction is pending
end if
If no instruction is pending, the instruction control unit 235 "spins", that is, remains idle, until a pending instruction is found.
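By way of illustration only, the pending test above might be modelled in software as follows. This is a minimal sketch and not the disclosed hardware state machine: the names StreamState, stream_ready and instruction_pending are hypothetical, and the outer mode test of the pseudocode is collapsed into a single assumed flag, controller_ready. The sketch follows the pseudocode as written, including the fact that the unlocked branch tests stream A only.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    """Hypothetical per-stream flags named after the conditions in the pseudocode."""
    enabled: bool = False
    locked: bool = False
    paused: bool = False
    sequencing_enabled: bool = True
    has_instruction: bool = False


def stream_ready(s: StreamState) -> bool:
    # "stream is enabled and (sequence numbering is disabled or an instruction exists)"
    return s.enabled and (not s.sequencing_enabled or s.has_instruction)


def instruction_pending(controller_ready: bool, a: StreamState, b: StreamState) -> bool:
    # controller_ready stands in for the outer mode test (assumption).
    if not controller_ready:
        return False
    if a.locked and not a.paused:
        return stream_ready(a)
    if b.locked and not b.paused:
        return stream_ready(b)
    # No stream is locked: the pseudocode as given examines stream A only.
    return (not a.paused) and stream_ready(a)
```

A hardware implementation would evaluate the same conditions combinationally; the function form is used here only to make the branch structure easy to follow.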

To determine which stream is active, and hence which stream is to be executed next, the following are examined:
1. Whether either stream is locked
2. What priority has been assigned to streams A and B, and which stream executed the last instruction
3. Whether either stream is enabled
4. Whether either stream has a pending instruction
The following pseudocode, as implemented by the instruction control unit, shows how the next active stream is determined.

if stream A is locked
    next stream is A
else if stream B is locked
    next stream is B
else /* neither stream is locked */
    if (stream A is enabled and (sequence numbering for stream A is disabled or an instruction exists on stream A)) and not (stream B is enabled and (sequence numbering for stream B is disabled or an instruction exists on stream B))
        next stream is A
    else if (stream B is enabled and (sequence numbering for stream B is disabled or a pending instruction exists on stream B)) and not (stream A is enabled and (sequence numbering for stream A is disabled or an instruction exists on stream A))
        next stream is B
    else /* both streams, or neither, have an instruction */
        if pri = 0 /* A high, B low */
            next stream is A
        else if pri = 1 /* A low, B high */
            next stream is B
        else if pri = 2 or 3 /* round robin */
            if the last stream was A
                next stream is B
            else
                next stream is A
            end if
        end if
    end if
end if
Because these conditions change continually, all of them must be evaluated within a short time.
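The selection rule above can be sketched in software as shown below. This is an illustrative model under stated assumptions, not the disclosed circuit: the function name next_stream and its parameters are hypothetical, and a_ready and b_ready are assumed to be the precomputed condition "stream is enabled and (sequence numbering is disabled or an instruction exists)" for each stream.

```python
def next_stream(a_locked: bool, b_locked: bool,
                a_ready: bool, b_ready: bool,
                pri: int, last_stream: str) -> str:
    """Return "A" or "B" following the stream selection pseudocode."""
    # A locked stream always wins; stream A is tested first.
    if a_locked:
        return "A"
    if b_locked:
        return "B"
    # Exactly one stream is ready: choose it.
    if a_ready and not b_ready:
        return "A"
    if b_ready and not a_ready:
        return "B"
    # Both or neither ready: fall back to the priority setting.
    if pri == 0:          # A high, B low
        return "A"
    if pri == 1:          # A low, B high
        return "B"
    if pri in (2, 3):     # round robin: alternate with the last stream
        return "B" if last_stream == "A" else "A"
    raise ValueError("pri must be in the range 0..3")
```

For example, with neither stream locked, both streams ready and pri set to 2, the function alternates between "A" and "B" on successive calls when last_stream is fed back from the previous result.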
3.6 Fetching an instruction of the current active stream
Once the next active instruction stream has been determined, the instruction control unit 235 fetches an instruction using the address in the corresponding instruction pointer register (ic_ipa or ic_ipb). However, if a valid instruction is already held in the prefetch buffer of the instruction control unit 235, the instruction control unit 235 does not fetch the instruction.

An instruction in the prefetch buffer is valid when the following conditions are met:
1. The prefetch buffer is valid
2. The instruction in the prefetch buffer is from the same stream as the current active stream
The validity of the contents of the prefetch buffer is indicated by the prefetch bit in the ic_stat register, which is set when an instruction prefetch succeeds. Any external write to any register of the instruction control unit 235 invalidates the contents of the prefetch buffer.
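The validity rules for the prefetch buffer may be illustrated by the following minimal sketch. The class PrefetchBuffer and its method names are hypothetical and serve only to model the two conditions and the invalidation rule stated above; the valid attribute stands in for the prefetch bit in the ic_stat register.

```python
class PrefetchBuffer:
    """Toy model of the prefetch buffer validity rules (names are assumptions)."""

    def __init__(self):
        self.valid = False       # models the prefetch bit in ic_stat
        self.stream = None       # stream ("A" or "B") the instruction came from
        self.instruction = None

    def fill(self, stream, instruction):
        # A successful prefetch sets the prefetch bit.
        self.stream = stream
        self.instruction = instruction
        self.valid = True

    def on_external_register_write(self):
        # Any external write to an instruction control register invalidates the buffer.
        self.valid = False

    def usable_for(self, active_stream):
        # Condition 1: buffer valid. Condition 2: same stream as the active stream.
        return self.valid and self.stream == active_stream
```

When usable_for returns True, the fetch stage can take the instruction from the buffer instead of issuing a new fetch.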
3.7 Decoding and executing an instruction
Once an instruction has been fetched and accepted, the instruction control unit 235 decodes the instruction and configures the registers 229 of the coprocessor 224 so that the instruction is executed.

The instruction format used by the raster image coprocessor 224 differs from a conventional processor instruction set in that instruction generation is carried out by instructions executed on the host CPU 202 and is therefore a direct overhead on the host. In addition, because the instructions are stored in the host RAM 203 and transferred to the coprocessor 224 over the PCI bus 206 of FIG. 1, the instructions should be as small as possible. Preferably, an operation of the coprocessor 224 can be started by a single instruction. It is also desirable to keep the instruction set as flexible as possible so that future changes can be accommodated. Furthermore, the instructions executed by the coprocessor 224 are preferably applicable to long streams of operand data so that optimum performance is obtained. The instruction decoding "philosophy" adopted for the coprocessor 224 is to make the decoding of "typical" instructions simple and fast while still allowing the host system to exercise fine control over the operation of the coprocessor 224 for "atypical" operations.

FIG. 10 shows the format of a single instruction 280, which consists of eight words of 32 bits each. Each instruction includes an instruction word (opcode) 281 and an operand or result type data word 282 that indicates the types of the operands. The addresses 283-285 of the three operands A, B and C are also included, together with a result address 286. An area 287 is further included for storing information about the instruction for use by the host CPU 202.
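The eight-word layout can be sketched with a small packing helper. This is only a sketch under several assumptions that the text does not state: the word order is taken to follow the order of the reference numerals 281 to 287, the area 287 is taken to occupy the two remaining words, and little-endian byte order is assumed. The names Instruction, INSTRUCTION_FORMAT and INSTRUCTION_SIZE are hypothetical.

```python
import struct
from dataclasses import dataclass

# Eight unsigned 32-bit words; little-endian byte order is an assumption.
INSTRUCTION_FORMAT = "<8I"
INSTRUCTION_SIZE = struct.calcsize(INSTRUCTION_FORMAT)  # 8 words * 4 bytes


@dataclass
class Instruction:
    opcode: int                 # instruction word 281
    operand_descriptor: int     # operand/result type data word 282
    operand_a: int              # operand A word 283
    operand_b: int              # operand B word 284
    operand_c: int              # operand C word 285
    result: int                 # result word 286
    host_info: tuple = (0, 0)   # area 287, assumed to be the last two words

    def pack(self) -> bytes:
        return struct.pack(
            INSTRUCTION_FORMAT,
            self.opcode, self.operand_descriptor,
            self.operand_a, self.operand_b, self.operand_c,
            self.result, *self.host_info,
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "Instruction":
        words = struct.unpack(INSTRUCTION_FORMAT, raw)
        return cls(*words[:6], host_info=tuple(words[6:]))
```

A fixed-size record of this kind lets host software append instructions to a stream buffer at a constant stride, which is consistent with the requirement that instructions be compact.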

FIG. 11 shows the structure 290 of the instruction opcode 281 of an instruction. The instruction opcode is 32 bits long and comprises a main opcode 291, a complementary opcode 292, an interrupt (I) bit 293, a partial decode (Pd) bit 294, a register length (R) bit 295, a lock (L) bit 296 and a length 297. The fields of the instruction word 290 are described in the following table.

Opcode description

By setting the I bit field 293, an instruction can be coded so that instruction execution is interrupted and paused when the instruction completes. This interrupt is called an "instruction end interrupt". The partial decode bit 294 provides a partial decoding facility: when the bit is set and partial decoding is enabled in the ic_cfg register, the various modules are micro-coded prior to execution of the instruction, as described below. The lock bit 296 is used for operations that require one or more instructions to set up. In such cases, various registers are set before the instruction, and the current instruction stream is "locked" for the next instruction. When the L bit 296 is set, the next instruction is fetched from the same stream once the current instruction completes. The length field 297 has a generic definition for each instruction: it is defined as the number of "input data items" or "output data items" required, and is 16 bits long. For operations on streams of 64,000 or more input data items, the R bit 295 is set and the input length is taken from the po_len register in the pixel organizer 246 of FIG. 2. That register is set immediately before such an instruction.
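The choice between the 16-bit length field and the po_len register can be illustrated as follows. This is a minimal sketch; the function name input_length and its parameters are hypothetical, and the bit positions of the fields within the opcode are deliberately not modelled because they are not given in the text.

```python
LENGTH_FIELD_BITS = 16  # the length field 297 is 16 bits long


def input_length(length_field: int, r_bit: bool, po_len: int) -> int:
    """Return the number of data items an instruction operates on.

    length_field: value of the 16-bit length field of the opcode
    r_bit:        state of the register length (R) bit
    po_len:       current value of the pixel organizer po_len register
    """
    if r_bit:
        # Long streams: the length comes from po_len, set just before the instruction.
        return po_len
    if not 0 <= length_field < (1 << LENGTH_FIELD_BITS):
        raise ValueError("length field does not fit in 16 bits")
    return length_field
```

The host would therefore write po_len and then issue the instruction with the R bit set whenever the item count does not fit the length field.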

In FIG. 10, the number of operands 283-286 required by an instruction varies with the instruction type. The following table gives the number of operands and the definition of the length for each instruction type.
Operand types

FIG. 12 shows the data word format 300 of the operand descriptor 282, the data word of FIG. 10, for three-operand instructions, and the data word format 301 for two-operand instructions. The following table details the encoding of the operand descriptor.
Operand descriptor

In the above table, in the constant data addressing mode, the coprocessor 224 fetches or calculates one internal data item and uses that item for the length of the instruction for that operand. In the tile addressing mode, the coprocessor 224 cycles through a small set of data items to produce a "tiling effect". When the L bit of the operand descriptor is zero, the data is short, meaning that the data item is held in the operand word itself.

In FIG. 10, each operand/result word 283-286 contains either the value of the operand itself or a 32-bit virtual address giving the start of the location where the operand or result data is stored. The instruction control unit 235 of FIG. 2 decodes an instruction in two stages. First, it checks whether the main opcode of the instruction is valid, and raises an error if the main opcode (FIG. 11) is invalid. Next, the instruction control unit 235 executes the instruction by setting the various registers via the CBus 231 so that the operation specified by the instruction is carried out. Some instructions require no registers to be set.

The registers of each module are divided into several types according to their behaviour. First, there is the status register type, which is "read only" for other modules and "read/write" for the module containing the register. Next, the first type of configuration register (hereinafter config1) is "read/write" externally to the module and "read only" for the module containing the register. These registers are generally used to store large configuration items such as address values. The second type of configuration register (hereinafter config2) can be read and written by all modules except the module containing the register, which can only read it. This register type is used when bit-by-bit addressing of the register is required.

There are several types of control register. The first type (hereinafter control1 registers) can be read and written by all modules, including the module containing the register. Control1 registers are used to store large control items such as address values. Similarly, the second type of control register (hereinafter control2) is set bit by bit.

The last register type (interrupt registers) contains bits that are set to 1 by the module containing the register and that can be reset to zero by externally writing a "1" to each bit that is set. Registers of this type are used to handle the interrupt/error signals from each module.
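The write-one-to-clear behaviour of the interrupt register type may be easier to follow from a small model. The sketch below is illustrative only; the class name InterruptRegister, its method names and the default 32-bit width are assumptions.

```python
class InterruptRegister:
    """Toy model of an interrupt-type register (write 1 to clear)."""

    def __init__(self, width: int = 32):  # register width is an assumption
        self.mask = (1 << width) - 1
        self.value = 0

    def module_set(self, bits: int) -> None:
        # The module that contains the register sets bits to 1.
        self.value |= bits & self.mask

    def external_write(self, data: int) -> None:
        # An external write clears every bit for which a 1 is written;
        # bits written as 0 are left unchanged.
        self.value &= ~data & self.mask
```

Because writing a 0 leaves a bit untouched, an external agent can acknowledge one interrupt source without disturbing others that were raised in the meantime.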

Each module of the coprocessor 224 asserts the c_active line on the CBus 231 while it is busy executing an instruction. The instruction control unit 235 can therefore determine when an instruction has completed by taking the logical "OR" of the c_active lines from all modules on the CBus 231. The local memory control module 236 and the peripheral interface control module 237 can execute overlapped instructions, and have a c_background line that is asserted while an overlapped instruction is executing. An overlapped instruction is a "local DMA" instruction that transfers data between the local memory interface and the peripheral interface.
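The completion test described above amounts to a wired OR over the busy lines. A minimal software analogue, with hypothetical function names and with each line modelled as a boolean, might look like this:

```python
def foreground_busy(c_active_lines) -> bool:
    """Logical OR of the c_active lines: True while any module is still executing."""
    return any(c_active_lines)


def overlap_busy(c_background_lines) -> bool:
    """Logical OR of the c_background lines: True while an overlapped instruction runs."""
    return any(c_background_lines)
```

An ordinary instruction is treated as complete once foreground_busy returns False, whereas an overlapped local DMA instruction is still in progress for as long as overlap_busy returns True.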

The execution cycle of an overlapped local DMA instruction differs from that of other instructions. When an overlapped instruction is about to be executed, the instruction control unit 235 checks whether an overlapped instruction is already executing. If one is, or if overlapped instructions are disabled, the instruction control unit 235 waits for that instruction to finish before proceeding to execute the new instruction. If no overlapped instruction is executing and overlapped instructions are enabled, the instruction control unit 235 immediately decodes the overlapped instruction and configures the peripheral interface control unit 237 and the local memory control unit 236 to carry it out. Once the registers have been configured, the instruction control unit 235 updates its registers (end register, status register, instruction pointer and so on) without waiting for the instruction to finish in the conventional sense. At this point, if the end sequence number equals the interrupt sequence number, the "overlapped instruction end" interrupt signal is primed rather than raised. The "overlapped instruction end" interrupt signal is raised when the overlapped instruction has completely finished.

Once an instruction has been decoded, the instruction control unit prefetches the next instruction while the current instruction executes. For most instructions, execution takes considerably longer than fetching and decoding. The instruction control unit 235 prefetches an instruction when all of the following conditions are met:
1. The currently executing instruction does not interrupt or pause
2. The currently executing instruction is not a jump instruction
3. The next instruction stream is enabled for prefetching
4. Another instruction is pending
When the instruction control unit 235 determines that prefetching is possible, it issues a request for the next instruction, places the instruction in the prefetch buffer and marks the buffer valid. After that, the instruction control unit 235 has nothing further to do until the currently executing instruction finishes; it simply monitors the c_active and c_background lines on the CBus 231 to detect the end of the instruction.
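The four prefetch conditions combine as a simple conjunction, as the following sketch shows. The function and parameter names are hypothetical; each parameter corresponds to one of the numbered conditions above.

```python
def can_prefetch(current_interrupts_or_pauses: bool,
                 current_is_jump: bool,
                 next_stream_prefetch_enabled: bool,
                 another_instruction_pending: bool) -> bool:
    """Return True only when all four prefetch conditions hold."""
    return (not current_interrupts_or_pauses      # condition 1
            and not current_is_jump               # condition 2
            and next_stream_prefetch_enabled      # condition 3
            and another_instruction_pending)      # condition 4
```

Conditions 1 and 2 plausibly guard against prefetching an instruction that will not be the next one executed, since a pause or a jump changes what is fetched next.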
3.8 Updating the registers of the instruction control unit
When an instruction finishes, the instruction control unit 235 updates its registers to reflect the new state. This must be done at high speed to avoid synchronization problems with external accesses. The high-speed update proceeds as follows:
1. Obtain the appropriate register access semaphore. If the semaphore is held by an agent external to the instruction control unit 235, the instruction execution cycle waits until the semaphore is released before proceeding.
2. Update the appropriate registers. If the instruction is not a jump instruction, the instruction pointer (ic_ipa or ic_ipb) is increased by the size of an instruction. For a jump instruction, the jump target is loaded into the instruction pointer. The end register (ic_fna or ic_fnb) is incremented if sequence numbering is enabled.
The status register (ic_stat) is also updated as appropriate to reflect the new state, which may include setting the pause bits. The instruction control unit 235 pauses if an interrupt occurs and pausing on interrupt is enabled, or if an error occurs. Pausing is effected by setting the instruction stream pause bits (a_pause and b_pause) in the status register. To resume instruction execution, these bits must be reset to 0.
3. Assert the c_end signal on the CBus 231 for one clock cycle to inform the other modules in the coprocessor 224 that the instruction has finished.
4. Raise an interrupt if required. An interrupt is raised in the following cases:
a. A "sequence number end" interrupt occurs, that is, the sequence number in the end register (ic_fna or ic_fnb) matches the interrupt sequence number, the interrupt is primed and sequence numbering is enabled; or
b. The finished instruction was coded to interrupt on completion. In this case the interrupt mechanism is activated.
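Step 2 of the update can be sketched as follows. This is an illustrative model rather than the disclosed logic: the names StreamRegisters and update_after_instruction are hypothetical, the instruction size of 32 bytes is inferred from the eight 32-bit words of the instruction format, and byte addressing and a 32-bit instruction pointer are assumed.

```python
from dataclasses import dataclass

INSTRUCTION_SIZE = 32       # bytes; eight 32-bit words (byte addressing assumed)
MASK32 = 0xFFFFFFFF         # registers are modelled as 32 bits wide (assumption)


@dataclass
class StreamRegisters:
    ip: int = 0             # instruction pointer (ic_ipa or ic_ipb)
    finished: int = 0       # end register (ic_fna or ic_fnb)


def update_after_instruction(regs: StreamRegisters,
                             sequencing_enabled: bool,
                             jump_target=None) -> StreamRegisters:
    """Apply the register updates made when an instruction finishes."""
    if jump_target is None:
        # Not a jump: advance to the next instruction.
        regs.ip = (regs.ip + INSTRUCTION_SIZE) & MASK32
    else:
        # Jump: load the jump target into the instruction pointer.
        regs.ip = jump_target & MASK32
    if sequencing_enabled:
        # Sequence numbers wrap from 0xFFFFFFFF to 0x00000000.
        regs.finished = (regs.finished + 1) & MASK32
    return regs
```

In hardware all of these updates are described as taking place together while the register access semaphore is held; the sequential form here is only for readability.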
3.9 Semantics of the register access semaphores
The register access semaphores are a mechanism that provides high-speed access to multiple instruction control registers. The registers that require high-speed access are as follows:
1. Instruction pointer registers (ic_ipa and ic_ipb)
2. ToDo registers (ic_tda and ic_tdb)
3. End registers (ic_fna and ic_fnb)
4. Interrupt registers (ic_inta and ic_intb)
5. Pause bits in the configuration register (ic_cfg)
An external agent can safely read any register at any time. An external agent can also write any register at any time, but to ensure that the instruction control unit 235 does not update the values in these registers, the external agent must first obtain the register access semaphore. The instruction control unit cannot update the values in the above registers while the register access semaphore is claimed externally. To maintain high speed, the instruction control unit 235 updates all of the above registers within a single clock cycle.

As noted above, when the sequencing mechanism is enabled, each instruction is assigned a 32-bit "sequence number". Instruction sequence numbers increase one by one and wrap from 0xFFFFFFFF to 0x00000000. When an external write is made to an interrupt register (ic_inta or ic_intb), the instruction control unit 235 immediately performs the following comparisons and updates:
1. If the interrupt sequence number (the value in the interrupt register) is "greater" (in modulo arithmetic) than the end sequence number (the value in the end register) of the same stream, the instruction control unit primes the "sequence number end" interrupt mechanism by setting the "sequence number end" primed bit in the status register (the a_primed or b_primed bit in ic_stat).
2. If the interrupt sequence number is "less" than the end sequence number, an overlapped instruction is in progress on that stream, and the interrupt sequence number equals the sequence number of the last overlapped instruction (the value in the ic_loa or ic_lob register), the instruction control unit primes the "overlapped instruction sequence number end" interrupt mechanism by setting the a_ol_primed or b_ol_primed bit in the ic_stat register.
3. If the interrupt sequence number is "less" than the end sequence number, an overlapped instruction is in progress on that stream, and the interrupt sequence number does not equal the sequence number of the last overlapped instruction, the interrupt sequence number refers to an instruction that has already finished, and no interrupt mechanism is primed.
4. If the interrupt sequence number is "less" than the end sequence number and no overlapped instruction is in progress on that stream, the interrupt sequence number refers to an instruction that has already finished, and no interrupt mechanism is primed.
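The four cases above can be sketched in software as follows. Two points in this sketch are assumptions not stated in the text: "greater" in modulo arithmetic is interpreted with the usual half-range rule (a is ahead of b when the 32-bit difference a - b lies strictly between 0 and 2^31), and an interrupt sequence number equal to the end sequence number is treated in the same way as a "less" value. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
HALF_RANGE = 1 << 31


def seq_greater(a: int, b: int) -> bool:
    """True if a is 'greater' than b modulo 2**32 (half-range rule, an assumption)."""
    diff = (a - b) & MASK32
    return 0 < diff < HALF_RANGE


def prime_on_interrupt_write(interrupt_seq: int, end_seq: int,
                             overlap_in_progress: bool,
                             last_overlap_seq: int):
    """Return which primed bit to set after a write to ic_inta/ic_intb, or None."""
    if seq_greater(interrupt_seq, end_seq):
        return "primed"        # case 1: a_primed / b_primed
    if overlap_in_progress and interrupt_seq == last_overlap_seq:
        return "ol_primed"     # case 2: a_ol_primed / b_ol_primed
    return None                # cases 3 and 4: instruction already finished
```

The half-range interpretation allows the comparison to remain meaningful across the wrap from 0xFFFFFFFF to 0x00000000, provided the two sequence numbers are never more than 2^31 apart.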

外部のエージェントは、状態レジスタ中のインタラプト準備ビット（ａ＿ｐｒｉｍｅｄ，ａ＿ｏｌ＿ｐｒｉｍｅｄ，ｂ＿ｐｒｉｍｅｄ，ｂ＿ｏｌ＿ｐｒｉｍｅｄビット）をセットすることができ、インタラプト機構を独立に起動、解除することができる。
３．１０命令制御部
図１３は、命令制御部２３５をより詳細に示した図である。命令制御部２３５は、命令実行サイクルを処理しコプロセッサ２２４の全体の実行制御を管理する実行制御部３０５を含む。実行制御部３０５は、命令制御部２３５の全体の実行制御を管理し、命令シーケンスを決定し、命令のフェッチやプレフェッチを行い、命令の復号や命令制御レジスタの更新を行う。命令制御部は更に命令復号器３０６を備える。命令復号器３０６は、プレフェッチバッファ３０７から命令を受信し、前述の通り復号する。命令復号器３０６は、他のコプロセッサモジュール中のレジスタを構成して命令を実行する処理も行う。プレフェッチバッファ制御部３０７は、プレフェッチバッファ制御部中のプレフェッチバッファからの読み込みや書き込みを管理するとともに、命令復号器３０６と入力インタフェーススイッチ２５２（図２）との間のインタフェースをも管理する。また、プレフェッチバッファ制御部３０７は二つの命令ポインタレジスタ（ｉｃ＿ｉｐａとｉｃ＿ｉｐｂ）の更新をも管理する。命令制御部２３５、種々のモジュール２３９（図２）、外部インタフェース制御部２３８（図２）からのＣＢｕｓ２３１（図２）へのアクセスは、三つのモジュールのアクセス要求間での調停を行う「ＣＢｕｓ」調停部３０８において行われる。要求はＣＢｕｓ２３１によって種々のモジュールのレジスタ部に転送される。 The external agent can set the interrupt preparation bits (a_primed, a_ol_primed, b_primed, b_ol_primed bits) in the status register, and can activate and cancel the interrupt mechanism independently.
3.10 Instruction Control Unit
FIG. 13 shows the instruction control unit 235 in more detail. The instruction control unit 235 includes an execution control unit 305 that handles the instruction execution cycle and manages overall execution control of the coprocessor 224. The execution control unit 305 manages overall execution control of the instruction control unit 235, determines instruction sequencing, fetches and prefetches instructions, and handles instruction decoding and the updating of the instruction control registers. The instruction control unit further includes an instruction decoder 306, which receives instructions from the prefetch buffer 307 and decodes them as described above. The instruction decoder 306 also configures registers in the other coprocessor modules so that the instruction can be executed. The prefetch buffer control unit 307 manages reads from and writes to the prefetch buffer it contains, and also manages the interface between the instruction decoder 306 and the input interface switch 252 (FIG. 2). The prefetch buffer control unit 307 also manages updating of the two instruction pointer registers (ic_ipa and ic_ipb). Access to the CBus 231 (FIG. 2) by the instruction control unit 235, the miscellaneous module 239 (FIG. 2), and the external interface control unit 238 (FIG. 2) is controlled by the CBus arbitration unit 308, which arbitrates between the access requests of these three modules. Requests are forwarded over the CBus 231 to the register units of the various modules.

図１４は、図１３の実行制御部３０５をより詳細に示した図である。前述の通り、実行制御部は図９の命令実行サイクル２７５の処理を管理し、特に以下の処理を行う。
１．次の命令をどの命令ストリームから取り出すかを決定し、
２．当該命令のフェッチを開始し、
３．プレフェッチバッファに格納されている命令の復号を命令復号器に指示し、
４．次の命令のプレフェッチを決定して開始し、
５．命令の終了を決定し、
６．命令が終了したらレジスタを更新する。 FIG. 14 is a diagram showing the execution control unit 305 of FIG. 13 in more detail. As described above, the execution control unit manages the processing of the instruction execution cycle 275 in FIG. 9, and particularly performs the following processing.
1. Determine which instruction stream the next instruction is to be fetched from.
2. Start fetching that instruction.
3. Direct the instruction decoder to decode the instruction held in the prefetch buffer.
4. Decide on and start the prefetch of the next instruction.
5. Determine when the instruction has finished.
6. Update the registers once the instruction has finished.

実行制御部は、全体の命令実行サイクルを管理する大きなコア状態器３１０（以下、中枢部と呼ぶ）を備える。図１５は、上述の命令実行サイクルを管理する中枢部３１０状態遷移図を示した図である。図１４において、実行制御部は命令プレフェッチ論理部３１１を備える。この部位は、実行すべき命令が存在するかどうか、どの命令ストリームに命令が属するか、の決定処理を行う。図１５の遷移図において開始３１２ならびにプレフェッチ３１３状態は、この情報を用いて命令を入手する。図１４のレジスタ管理部３１７は、双方の命令ストリームのレジスタアクセスセマフォアをモニタし、各モジュール中の必要なすべてのレジスタを更新する処理を行う。また、終了レジスタ（ｉｃ＿ｆｎａとｉｃ＿ｆｎｂ）とインタラプトレジスタ（ｉｃ＿ｉｎｔａとｉｃ＿ｉｎｔｂ）とを比較し、「シーケンス番号終了」インタラプトを行うべきかどうかを決定する処理も、レジスタ管理部３１７が行う。更に、レジスタ管理部３１７はインタラプト準備処理も行う。オーバラップ命令部３１８は、ｉｃ＿ｓｔａｔレジスタ中の適切な状態ビットの管理を通して、オーバラップ命令の終了処理の管理を行う。実行制御部は、更に中枢部３１０と図１３の命令復号器３０６との間のインタフェースを行う復号インタフェース部３１９を備える。 The execution control unit includes a large core state machine 310 (hereinafter referred to as a central part) that manages the entire instruction execution cycle. FIG. 15 is a diagram showing a state transition diagram of the central part 310 for managing the above instruction execution cycle. In FIG. 14, the execution control unit includes an instruction prefetch logic unit 311. This part determines whether there is an instruction to be executed and which instruction stream the instruction belongs to. The start 312 and prefetch 313 states in the transition diagram of FIG. 15 use this information to obtain instructions. The register management unit 317 in FIG. 14 monitors the register access semaphores of both instruction streams and performs a process of updating all necessary registers in each module. The register management unit 317 also performs processing for comparing the end registers (ic_fna and ic_fnb) with the interrupt registers (ic_inta and ic_intb) and determining whether or not the “sequence number end” interrupt should be performed. Further, the register management unit 317 also performs interrupt preparation processing. The overlap instruction unit 318 manages the end process of the overlap instruction through management of an appropriate status bit in the ic_stat register. 
The execution control unit further includes a decoding interface unit 319 that provides the interface between the central part 310 and the instruction decoder 306 of FIG. 13.

図１６は、命令復号部３０６をより詳細に示した図である。命令復号器はコプロセッサを構成してプレフェッチバッファ内の命令を実行する処理を行う。命令復号器３０６は、多くの小さな状態マシンの組み合わせである大きな状態マシンから構成される命令復号シーケンサ３２１を備える。命令シーケンサ３２１は，各モジュール中のレジスタをセットするＣＢｕｓディスパッチャ３１２と通信する。また、命令復号シーケンサ３２１は、命令の有効性や命令のオーバラップ状況などの関連情報を実行制御部に伝える。ここで、命令の有効性チェックは命令オプコードが予約されているオプコードであるかどうかをチェックするものである。 FIG. 16 shows the instruction decoder 306 in more detail. The instruction decoder configures the coprocessor to execute the instruction held in the prefetch buffer. The instruction decoder 306 includes an instruction decode sequencer 321, a large state machine made up of many smaller state machines. The instruction decode sequencer 321 communicates with the CBus dispatcher 312, which sets the registers in each module. The instruction decode sequencer 321 also passes relevant information, such as instruction validity and instruction overlap status, to the execution control unit. The validity check tests whether the instruction opcode is a reserved opcode.

図１７は、図１６の命令ディスパッチャシーケンサ３２１をより詳細に示した図である。命令ディスパッチャシーケンサ３２１は、全体のシーケンス制御状態マシン３２４と連続したモジュール毎構成シーケンサ状態マシン（例えば３２５や３２６）を備える。モジュール毎構成シーケンサ状態マシンは構成すべき各モジュールに与えられる。全体として状態マシンはモジュールのコプロセッサマイクロプログラミングを定義する。状態マシン（例えば３２５）は、ＣＢｕｓディスパッチャに全体のＣＢｕｓを利用して種々のレジスタをセットするように指示し、処理のための種々モジュールを構成する。特定のレジスタに書き込みをするためには、命令の実行が開始されなければならない。一般に命令の実行にはシーケンサ３２１が処理のためにコプロセッサのレジスタを構成する以上の時間が必要である。付録Ａにおいて、コプロセッサの命令シーケンサによって実行されるマイクロプログラミング処理と命令シーケンサ３２１によってセットアップされた形式を示す。 FIG. 17 is a diagram showing the instruction dispatcher sequencer 321 of FIG. 16 in more detail. The instruction dispatcher sequencer 321 includes an overall sequence control state machine 324 and a continuous module-by-module sequencer state machine (eg, 325 or 326). A per-module configuration sequencer state machine is provided for each module to be configured. Overall, the state machine defines the coprocessor microprogramming of modules. A state machine (eg, 325) instructs the CBus dispatcher to set various registers using the entire CBus and configures various modules for processing. In order to write to a particular register, the execution of the instruction must be started. In general, execution of instructions requires more time than the sequencer 321 configures the coprocessor registers for processing. Appendix A shows the microprogramming process performed by the instruction sequencer of the coprocessor and the format set up by the instruction sequencer 321.

実際には、命令復号シーケンサ３２１は命令ごとにコプロセッサ中のすべてのモジュールを構成するわけではない。以下の表では、命令クラスに対するモジュール構成順序を、ピクセルオーガナイザ２４６（ＰＯ）、データキャッシュ制御部２４０（ＤＣＣ）、オペランドオーガナイザＢ２４７（ＯＯＢ）、オペランドオーガナイザＣ２４８（ＯＯＣ）、主データパス２４２（ＭＤＰ）、結果オーガナイザ２４９（ＲＯ）、ＪＰＥＧエンコーダ２４１（ＪＣ）などの構成されるモジュールとともに示している。なお、外部インタフェース制御部２３８（ＥＩＣ），ローカルメモリ制御部２３６（ＬＭＣ），命令制御部２３５自身（ＩＣ）、入力インタフェーススイッチ２５２（ＩＩＳ）、雑多モジュール（ＭＭ）などのモジュールは、命令復号処理中には構成されることはない。 In practice, the instruction decode sequencer 321 does not configure every module in the coprocessor for every instruction. The following table shows the module configuration order for each instruction class, where the configured modules are the pixel organizer 246 (PO), the data cache controller 240 (DCC), the operand organizer B 247 (OOB), the operand organizer C 248 (OOC), the main data path 242 (MDP), the result organizer 249 (RO), and the JPEG encoder 241 (JC). Modules such as the external interface control unit 238 (EIC), the local memory control unit 236 (LMC), the instruction control unit 235 itself (IC), the input interface switch 252 (IIS), and the miscellaneous module (MM) are never configured during instruction decoding.

モジュール立ち上げ順序 Module startup sequence

図１７において、各モジュール構成シーケンサ（例えば３２５）は必要なレジスタアクセス処理を行って特定のモジュールを構成するように管理する。また、全体のシーケンス制御状態マシン３２４は、前述の順序でモジュール構成シーケンサの全体の動作を管理する。図１８は、上の表に従って関連するモジュール構成シーケンサを起動する全体シーケンス制御を状態遷移図３３０で表した図である。各モジュール構成シーケンサは、モジュールの実行中に種々のレジスタをセットするために、ＣＢｕｓディスパッチャを制御して、レジスタ内容を変更する処理を行う。 In FIG. 17, each module configuration sequencer (for example, 325) performs necessary register access processing and manages to configure a specific module. The overall sequence control state machine 324 manages the overall operation of the module configuration sequencer in the order described above. FIG. 18 is a diagram showing the overall sequence control for activating the related module configuration sequencer according to the above table as a state transition diagram 330. Each module configuration sequencer controls the CBus dispatcher to change the register contents in order to set various registers during execution of the module.

図１９は、図１３のプリフェッチバッファ制御部３０７をより詳細に示した図である。プリフェッチバッファ制御部は単一のコプロセッサ命令（６×３２ビットワード）を格納するためのプリフェッチバッファ３３５を備える。そして、プリフェッチバッファはＩＢｕｓシーケンサ３３６によって制御される一つの書き込みポートと、命令復号器、実行制御部、命令制御部ＣＢｕｓインタフェースにデータを送出する一つの読み込みポートを備える。ＩＢｕｓシーケンサ３３６は、プリフェッチバッファ３３５の入力インタフェーススイッチへの接続においてバスプロトコルを監視する。また、命令をフェッチするためにアドレスを生成するアドレス管理部３３７をも備える。アドレス管理部３３７は、ｉｃ＿ｉｐａあるいはｉｃ＿ｉｐｂの一つを選択し入力インタフェーススイッチへのバスに接続する機能と、最後の命令がどのストリームからフェッチされたかに基づいてｉｃ＿ｉｐａあるいはｉｃ＿ｉｐｂの一つを増加させる機能と、ｉｃ＿ｉｐａとｉｃ＿ｉｐｂレジスタにジャンプ先のアドレスを格納する機能とを有する。ＰＢＣ制御部３３９はプレフェッチバッファ制御部３０７の全体の制御を行う。
３．１１モジュールローカルレジスタファイルの説明
図１３に示したように、命令制御モジュール自身を含む各モジュールは、図２０に示してあるＣＢｕｓインタフェース制御部３０３とともに上述したレジスタ３０４の内部セットを備え、ＣＢｕｓ要求を受け付けるとともに当該要求に応じて内部レジスタを更新する処理を行う。モジュールの制御は、ＣＢｕｓインタフェース３０２を介してモジュール中のレジスタ３０４に書き込むことによって行われる。ＣＢｕｓ調整部３０８（図１３）は、命令制御部２３５、外部インタフェース制御部、雑多モジュールのどのモジュールがＣＢｕｓを制御し、ＣＢｕｓのマスターとして動作し、レジスタの書き込み／読み出しを行うのかを決定する。 FIG. 19 is a diagram showing the prefetch buffer control unit 307 of FIG. 13 in more detail. The prefetch buffer controller includes a prefetch buffer 335 for storing a single coprocessor instruction (6 × 32 bit word). The prefetch buffer has one write port controlled by the IBus sequencer 336 and one read port for sending data to the instruction decoder, execution control unit, and instruction control unit CBus interface. The IBus sequencer 336 monitors the bus protocol at the connection of the prefetch buffer 335 to the input interface switch. An address management unit 337 that generates an address for fetching an instruction is also provided. The address management unit 337 has a function of selecting one of ic_ipa or ic_ipb and connecting it to the bus to the input interface switch, and a function of increasing one of ic_ipa or ic_ipb based on from which stream the last instruction is fetched And a function of storing the jump destination address in the ic_ipa and ic_ipb registers. The PBC control unit 339 controls the prefetch buffer control unit 307 as a whole.
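The three functions attributed to the address management unit 337 (selecting a pointer, advancing the pointer of the stream last fetched from, and loading a jump target) can be modelled roughly as below. This is a sketch under stated assumptions: the class name `AddressManager`, the stream labels "A" and "B", byte addressing, and the 24-byte increment (six 32-bit words per instruction) are illustrative choices, not details confirmed by the text.

```python
from dataclasses import dataclass

INSTRUCTION_BYTES = 6 * 4   # one instruction is six 32-bit words (from the text)
MASK32 = 0xFFFFFFFF         # assumption: pointers are 32 bits wide


@dataclass
class AddressManager:
    ic_ipa: int = 0   # instruction pointer for stream A
    ic_ipb: int = 0   # instruction pointer for stream B

    def fetch_address(self, stream: str) -> int:
        """Select the pointer that would be driven toward the input interface switch."""
        return self.ic_ipa if stream == "A" else self.ic_ipb

    def advance(self, last_stream: str) -> None:
        """Increment the pointer of the stream the last instruction was fetched from."""
        if last_stream == "A":
            self.ic_ipa = (self.ic_ipa + INSTRUCTION_BYTES) & MASK32
        else:
            self.ic_ipb = (self.ic_ipb + INSTRUCTION_BYTES) & MASK32

    def jump(self, stream: str, target: int) -> None:
        """Store a jump target address in the pointer of the given stream."""
        if stream == "A":
            self.ic_ipa = target & MASK32
        else:
            self.ic_ipb = target & MASK32
```

Keeping the two pointers independent mirrors the description: a fetch from one stream never disturbs the pointer of the other.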
3.11 Description of the Module Local Register File
As shown in FIG. 13, each module, including the instruction control module itself, has the internal set of registers 304 described above together with the CBus interface control unit 303 shown in FIG. 20, which accepts CBus requests and updates the internal registers accordingly. A module is controlled by writing to its registers 304 via the CBus interface 302. The CBus arbitration unit 308 (FIG. 13) determines which of the instruction control unit 235, the external interface control unit, and the miscellaneous module gains control of the CBus, acts as CBus master, and performs register writes and reads.

図２０は、各モジュールにおいて用いられるＣＢｕｓインタフェース３０３の標準構成を示した図である。標準ＣＢｕｓインタフェース３０３はＣＢｕｓ３０２からの読み出し要求や書き込み要求を受信するとともに、モジュール内の種々のサブモジュールによって３４１を介して更新されるレジスタファイル３０４を備える。更に、メモリ領域の読み出しを含むサブモジュールのメモリ領域の更新を行う制御ライン３４４が備わっている。標準ＣＢｕｓインタフェース３０３はＣＢｕｓの目的地として振る舞い、レジスタ３０４や他のサブモジュールのメモリオブジェクトの読み出し要求や書き込み要求を受け付ける。 FIG. 20 is a diagram showing a standard configuration of the CBus interface 303 used in each module. The standard CBus interface 303 includes a register file 304 that receives read requests and write requests from the CBus 302 and is updated via 341 by various submodules in the module. Further, a control line 344 for updating the memory area of the submodule including reading of the memory area is provided. The standard CBus interface 303 behaves as a CBus destination and accepts read requests and write requests for the memory objects of the register 304 and other submodules.

「ｃ＿ｒｅｓｅｔ」信号３４５は標準ＣＢｕｓインタフェース１０３内のすべてのレジスタをデフォルト状態にセットする。しかし、「ｃ＿ｒｅｓｅｔ」は自身とＣＢｕｓマスターとの間の信号のやり取りを制御する状態マシンはリセットしない。そのため、「ｃ＿ｒｅｓｅｔ」がＣＢｕｓ処理中に送出されたとしても、当該処理は何かしらの形で終了することになる。「ｃ＿ｉｎｔ」３４７、「ｃ＿ｅｘｐ」３４８、「ｃ＿ｅｒｒ」３４９信号は、以下の式に基づいてモジュールｅｒｒ＿ｉｎｔとｅｒｒ＿ｉｎｔ＿ｅｎレジスタの内容より生成される。 A “c_reset” signal 345 sets all registers in the standard CBus interface 103 to a default state. However, “c_reset” does not reset the state machine that controls the exchange of signals between itself and the CBus master. Therefore, even if “c_reset” is transmitted during the CBus process, the process ends in some form. The “c_int” 347, “c_exp” 348, and “c_err” 349 signals are generated from the contents of the modules err_int and err_int_en registers based on the following equations.

信号「ｃ＿ｓｄａｔａ＿ｉｎ」と「ｃ＿ｓｖａｌｉｄ＿ｉｎ」３４５は、モジュール列の中での前のモジュールからのデータ／有効信号であり、信号「ｃ＿ｓｄａｔａ＿ｏｕｔ」と「ｃ＿ｓｖａｌｉｄ＿ｏｕｔ」３５０は、モジュール列の中での次のモジュールへのデータ／有効信号である。標準ＣＢｕｓインタフェース３０３の機能としては以下のものが含まれる。
１．レジスタの読み出し／書き込み管理
２．メモリ領域の読み出し／書き込み管理
３．テストモードの読み出し／書き込み管理
４．サブモジュールの監視／更新管理
３．１２レジスタ読み出し／書き込み管理
標準ＣＢｕｓインタフェース３０３はＣＢｕｓ上に流れるレジスタ読み出し／書き込み要求やビットセット要求を受け付ける。標準ＣＢｕｓインタフェースが管理するＣＢｕｓ命令として以下の２種類ある。
１．タイプＡ
タイプＡは、他のモジュールが標準ＣＢｕｓインタフェース３０３内のレジスタに１、２、３、４バイト読み出し／書き込みする動作をする。書き込み動作では、命令サイクルの直後のクロックサイクルでデータサイクルが生じる。なお、レジスタ書き込み／読み出しのタイプフィールドはそれぞれ「１０００」と「１００１」である。標準ＣＢｕｓインタフェース３０３は命令を復号して、命令がモジュールのアドレスを指しているか、読み出し／書き込み動作のどちらかであるか、を調べる。読み出し動作では、標準ＣＢｕｓインタフェース３０３は、ＣＢｕｓ処理の「ｒｅｇ」フィールドを用いてどのレジスタ出力に「ｃ＿ｓｄａｔａ」バス３５０を接続するかを選択する。書き込み動作では、標準ＣＢｕｓインタフェース３０３は「ｒｅｇ」フィールドと「ｂｙｔｅ」フィールドを用いて選択されたレジスタにデータを書き込む。読み出し動作が終了すると、標準ＣＢｕｓインタフェースはデータを戻すと同時に「ｃ＿ｓｖａｌｉｄ」３５０を送出する。書き込み動作が終了すると、標準ＣＢｕｓインタフェース３０３は「ｃ＿ｓｖａｌｉｄ」３５０を送出して返答する。
２．タイプＣ
タイプＣは、１つのレジスタ中のバイトの１つに他のモジュールが１ビットあるいは複数ビット書き込む動作をする。命令とデータとは１つのワードにまとめられる。 Signals “c_sdata_in” and “c_svalid_in” 345 are data / valid signals from the previous module in the module row, and signals “c_sdata_out” and “c_svalid_out” 350 are to the next module in the module row. Data / valid signal. The functions of the standard CBus interface 303 include the following.
1. Register read/write management
2. Memory area read/write management
3. Test mode read/write management
4. Submodule monitoring/update management
3.12 Register Read/Write Management
The standard CBus interface 303 accepts register read/write requests and bit set requests appearing on the CBus. The standard CBus interface handles the following two types of CBus instruction.
1. Type A
In a Type A operation, another module reads or writes 1, 2, 3, or 4 bytes of a register in the standard CBus interface 303. For a write operation, the data cycle occurs in the clock cycle immediately after the instruction cycle. The type fields for register write and register read are “1000” and “1001”, respectively. The standard CBus interface 303 decodes the instruction to check whether it is addressed to this module and whether it is a read or a write operation. For a read operation, the standard CBus interface 303 uses the “reg” field of the CBus transaction to select which register output is connected to the “c_sdata” bus 350. For a write operation, the standard CBus interface 303 uses the “reg” and “byte” fields to write the data into the selected register. When a read operation completes, the standard CBus interface asserts “c_svalid” 350 at the same time as it returns the data. When a write operation completes, the standard CBus interface 303 acknowledges by asserting “c_svalid” 350.
2. Type C
In type C, another module writes one bit or a plurality of bits to one byte in one register. Instruction and data are combined into one word.

標準ＣＢｕｓインタフェース３０３は命令をチェックして、命令がモジュールのアドレスを指しているかを調べる。また、「ｒｅｇ」「ｂｙｔｅ」「ｅｎａｂｌｅ」フィールドを復号して、必要なイネーブル信号を生成する。また、命令のデータフィールドを取り出し、取り出したデータをワードの４バイトすべてに転送する。これにより、必要なビットはすべてのイネーブルバイト中のイネーブルビットに書き込まれることになる。この動作においては返答は必要ない。
３．１３メモリ領域読み出し／書き込み管理
標準ＣＢｕｓインタフェース３０３はＣＢｕｓ上のメモリ読み出し／書き込み要求を受け付ける。メモリ読み出し／書き込み要求を受け付けると、標準ＣＢｕｓインタフェース３０３は要求がモジュールのアドレスを指しているかを調べる。そして、命令のアドレスフィールドを復号することで、標準ＣＢｕｓインタフェースは適切なアドレスと、メモリ読み出し／書き込みを行うサブモジュールへのアドレスストローブ信号３４４とを生成する。書き込み動作では、標準ＣＢｕｓインタフェースは、命令からのバイトイネーブル信号をサブモジュールに転送する。 The standard CBus interface 303 checks the instruction to see if the instruction points to the module address. Further, the “reg”, “byte”, and “enable” fields are decoded to generate a necessary enable signal. Also, the data field of the instruction is extracted, and the extracted data is transferred to all 4 bytes of the word. This causes the necessary bits to be written to the enable bits in all enable bytes. No response is necessary for this operation.
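The effect of a Type C bit set operation, as described above, can be illustrated with a short function: the data byte is replicated to every byte lane, and only the enabled bits of the enabled bytes change. The function name `apply_type_c` and the field encodings (a 4-bit byte select mask, an 8-bit bit enable mask, and an 8-bit data value) are assumptions made for illustration; the actual field layout is given in FIG. 21, which is not reproduced here.

```python
def apply_type_c(reg_value: int, byte_select: int, bit_enable: int, data: int) -> int:
    """Return the 32-bit register value after a Type C bit set operation.

    byte_select: 4-bit mask, bit i selects byte i of the register (assumed encoding).
    bit_enable:  8-bit mask of the bits to change within each selected byte.
    data:        8-bit value supplying the new bit values.
    """
    result = reg_value & 0xFFFFFFFF
    for i in range(4):
        if byte_select & (1 << i):
            shift = 8 * i
            old = (result >> shift) & 0xFF
            # Keep the bits that are not enabled, take enabled bits from data.
            new = (old & ~bit_enable & 0xFF) | (data & bit_enable)
            result = (result & ~(0xFF << shift) & 0xFFFFFFFF) | (new << shift)
    return result
```

Because instruction and data travel in one word, no data cycle and no acknowledgement are modelled here, in line with the statement that this operation needs no response.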
3.13 Memory Area Read/Write Management
The standard CBus interface 303 accepts memory read/write requests on the CBus. On receiving such a request, it checks whether the request is addressed to this module. Then, by decoding the address field of the instruction, it generates the appropriate address and the address strobe signal 344 for the submodule in which the memory read or write takes place. For a write operation, the standard CBus interface passes the byte enable signals from the instruction on to the submodule.

標準ＣＢｕｓインタフェース３０３の動作は、ＣＢｕｓ３０２上のＣＢｕｓ命令のタイプフィールドを復号し、次のサイクルにおいてデータがレジスタファイル３０４に取り込まれるか、あるいは他のサブモジュール３４４に転送されるようにするために、レジスタファイル３０４と出力セレクタ３５３に適切なイネーブル信号を生成するような読み出し／書き込み制御部３５２によって制御される。ＣＢｕｓ命令がレジスタ読み出し動作であれば、読み出し／書き込み制御部３５２は出力セレクタ３５３をイネーブルにし、「ｃ＿ｓｄａｔａバス」３４５への正しいレジスタ出力を選択する。命令がレジスタ書き込み動作であれば、読み出し／書き込み制御部３５２はレジスタファイル３０４をイネーブルにし、次にサイクルでデータを選択する。もしその命令がメモリエリアのリード／ライトであれば、読み出し／書き込み制御部３５２は適切な信号３４４を生成し、モジュールが管理するメモリ領域を制御する。レジスタファイル３０４は、レジスタ選択復号部３５５、出力セレクタ３５３、インタラプト３５６、エラー３５７、例外３５８生成部、アンマスクエラー生成部３５９、あるモジュールのレジスタを構成するレジスタ部３６０の４つの部位から構成される。レジスタ選択復号部３５５は、読み出し／書き込み制御部３５２からの信号「ｒｅｆ＿ｅｎ」（レジスタファイルイネーブル）「ｗｒｉｔｅ」「ｒｅｇ」を復号し、あるレジスタをイネーブルにするためのレジスタイネーブル信号を生成する。出力セレクタ３５３は、読み出し／書き込み制御部３５２からの信号「ｒｅｇ」出力に応じて、レジスタ読み出し処理のために正しいレジスタデータを選択しｃ＿ｓｄａｔｅ＿ｏｕｔラインに出力する。 The operation of the standard CBus interface 303 is to decode the type field of the CBus instruction on the CBus 302 so that in the next cycle the data is captured in the register file 304 or transferred to another submodule 344. The read / write control unit 352 generates appropriate enable signals for the register file 304 and the output selector 353. If the CBus instruction is a register read operation, the read / write control unit 352 enables the output selector 353 and selects the correct register output to the “c_sdata bus” 345. If the instruction is a register write operation, the read / write control unit 352 enables the register file 304 and then selects data in a cycle. If the instruction is a read / write of a memory area, the read / write control unit 352 generates an appropriate signal 344 and controls a memory area managed by the module. The register file 304 includes four parts: a register selection decoding unit 355, an output selector 353, an interrupt 356, an error 357, an exception 358 generation unit, an unmask error generation unit 359, and a register unit 360 that constitutes a register of a module. . 
The register selection decoding unit 355 decodes the signals “ref_en” (register file enable), “write”, and “reg” from the read / write control unit 352 and generates the register enable signal that enables a particular register. The output selector 353 selects the correct register data for a register read according to the “reg” signal output by the read / write control unit 352 and drives it onto the c_sdata_out line.

例外生成部３５６〜３５９は入力中にエラーが検出されたら出力エラー信号（例えば、３４７〜３４９、３６２）を生成する。各出力エラーを計算する手法は前述の通りである。レジスタ部３６０は、表５においてレジスタセットの構成を説明したときに論じたように、要求に応じて種々のタイプになり得る。
３．１４ＣＢｕｓ構成
前述の通り、ＣＢｕｓ（制御バス）は、各モジュールの標準ＣＢｕｓインタフェース中のレジスタをセットするための情報を転送することによって、全体的に各モジュールを制御する。標準ＣＢｕｓインタフェースの記述から明らかなように、ＣＢｕｓは以下の二つの目的を有する。
１．各モジュールを駆動する制御バス
２．ＲＡＭ，ＦＩＦＯ，各モジュール中の状態情報のためのアクセスバス
ＣＢｕｓは命令−アドレス−データプロトコルを用いて、モジュール中の構成レジスタをセットすることにより、モジュールを制御する。一般に、レジスタは各命令ごとにセットされるが、修正はどの時点でも行うことができる。ＣＢｕｓは状態情報や他の情報を集め、データを要求することにより種々のモジュールからＲＡＭやＦＩＦＯデータにアクセスする。 The exception generation units 356 to 359 generate output error signals (for example, 347 to 349 and 362) when an error is detected during input. The method for calculating each output error is as described above. The register unit 360 can be of various types as required as discussed in Table 5 when describing the configuration of the register set.
3.14 CBus Configuration
As described above, the CBus (control bus) provides overall control of each module by transferring the information used to set the registers in each module's standard CBus interface. As is clear from the description of the standard CBus interface, the CBus serves the following two purposes.
1. It is the control bus that drives each module.
2. It is the access bus for the RAM, FIFOs, and status information in each module.
The CBus controls a module by setting configuration registers in the module using an instruction-address-data protocol. In general, the registers are set once per instruction, but they can be modified at any time. The CBus gathers status and other information from the various modules and accesses their RAM and FIFO data by requesting the data.

ＣＢｕｓは以下の３つのどちらかにより処理ごとに駆動される。
１．命令実行時の命令制御部２３５（図２）
２．ターゲット（スレーブ）モードバス動作実行時の外部インタフェース制御部２３８（図２）
３．外部ＣＢｕｓインタフェースが構成された際には外部デバイス
いずれの場合でも、駆動モジュールはＣＢｕｓの発モジュールとなり、他のすべてのモジュールが可能な着モジュールとなる。バスの調整処理は命令制御部が行う。 The CBus is driven for each process by one of the following three methods.
1. The instruction control unit 235 (FIG. 2), when executing instructions
2. The external interface control unit 238 (FIG. 2), when performing target (slave) mode bus operations
3. An external device, when the external CBus interface is configured
In each case, the driving module becomes the source module of the CBus, and all other modules are potential destination modules. Bus arbitration is performed by the instruction control unit.

以下の表は、好適な実施例において用いるのに適しているＣＢｕｓ信号の一つの定義を示したものである。
ＣＢｕｓ信号定義 The following table shows one definition of a CBus signal that is suitable for use in the preferred embodiment.
CBus signal definition

ＣＢｕｓのｃ＿ｉａｄ信号はアドレスデータを含み、二つの異なるサイクルにおいて制御部によって駆動される。
１．ｃ＿ｉａｄ上でＣＢｕｓ命令やアドレスが駆動される命令サイクル（ｃ＿ｖａｌｉｄ高）
２．ｃ＿ｉａｄ（書き込み動作）やｃ＿ｓｄａｔａ（読み出し動作）上でデータが駆動されるデータサイクル（ｃ＿ｖａｌｉｄ低）
書き込み動作の場合は、命令に関するデータは命令サイクルの直後にｃ＿ｉａｄバス上に置かれる。読み出し動作の場合は、データサイクルが終了するまで読み出し動作のターゲットモジュールがｃ＿ｓｄａｔａ信号を駆動する。 The CBus c_iad signal contains address data and is driven by the controller in two different cycles.
1. Instruction cycle in which CBus instruction and address are driven on c_iad (c_valid high)
2. Data cycle in which data is driven on c_iad (write operation) or c_sdata (read operation) (c_valid low)
For a write operation, the data for the instruction is placed on the c_iad bus immediately after the instruction cycle. In the case of a read operation, the target module for the read operation drives the c_sdata signal until the data cycle is completed.
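The two-cycle protocol can be pictured with a toy model of one destination module. It is a sketch only: the class `CBusTarget` is hypothetical, the instruction is passed as an already decoded tuple `(type, module, reg)` because the bit layout of the instruction word is not reproduced here, and timing is reduced to one instruction cycle followed by one data cycle. The type codes 1000 (register write) and 1001 (register read) are taken from the description of Type A operations.

```python
class CBusTarget:
    """Toy model of one module's registers on the CBus (Type A operations only)."""

    WRITE = 0b1000   # register write type code
    READ = 0b1001    # register read type code

    def __init__(self, module_id, n_regs=4):
        self.module_id = module_id
        self.regs = [0] * n_regs
        self.pending = None   # (op, reg) decoded in the preceding instruction cycle

    def clock(self, c_valid, c_iad):
        """Advance one clock cycle and return (c_svalid, c_sdata)."""
        if c_valid:
            # Instruction cycle: remember the request if it addresses this module.
            op, module, reg = c_iad
            self.pending = (op, reg) if module == self.module_id else None
            return (False, 0)
        if self.pending is None:
            return (False, 0)
        # Data cycle for a request addressed to this module.
        op, reg = self.pending
        self.pending = None
        if op == self.WRITE:
            self.regs[reg] = c_iad & 0xFFFFFFFF   # data arrives on c_iad
            return (True, 0)                      # acknowledge the write
        if op == self.READ:
            return (True, self.regs[reg])         # data returned on c_sdata
        return (False, 0)
```

A write is then two calls, `clock(True, (CBusTarget.WRITE, module, reg))` followed by `clock(False, value)`, and a read returns its data on the second call.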

図２１において、バスは３２ビットの命令−アドレス−データフィールドを含む。このフィールドは以下の３つのタイプ（３７０〜３７２）がある。
１．タイプＡ動作（３７０）は、コプロセッサ中のレジスタや各モジュールのデータ領域の読み出し／書き込みを行うために用いられる。これらの動作は、ターゲットモードＰＣＩサイクルを実行している外部インタフェース制御部２３８、命令のためにコプロセッサを構成している命令制御部２３１、外部ＣＢｕｓインタフェースによって生成される。 In FIG. 21, the bus includes a 32-bit instruction-address-data field. This field has the following three types (370 to 372).
1. Type A operations (370) are used to read or write registers in the coprocessor and the data areas of each module. These operations are generated by the external interface control unit 238 when it executes target mode PCI cycles, by the instruction control unit 235 when it configures the coprocessor for an instruction, and by the external CBus interface.

これらの動作では、命令サイクルの直後のクロックサイクルがデータサイクルとなる。
２．タイプＢ動作（３７１）は診断モードで用いられ、ローカルメモリにアクセスしたり、一般インタフェース上のサイクルを生成する。これらの動作は、ターゲットモードＰＣＩサイクルを実行している外部インタフェース制御部や外部ＣＢｕｓインタフェースによって生成される。データサイクルは命令サイクルの後のどの時点でも良く、データサイクルはｃ＿ｓｖａｌｉｄ信号を用いて着モジュールから返答される。
３．タイプＣ動作（３７２）はモジュールのレジスタ中の各ビットをセットするために用いられる。これらの動作は、命令のためにコプロセッサを構成している命令制御部２３１や外部ＣＢｕｓインタフェースによって生成される。タイプＣ動作ではデータサイクルはなく、データは命令サイクル中に含まれる。 In these operations, the clock cycle immediately after the instruction cycle is the data cycle.
2. Type B operations (371) are used in diagnostic mode to access local memory and to generate cycles on the general interface. These operations are generated by the external interface control unit when it executes target mode PCI cycles and by the external CBus interface. The data cycle may occur at any time after the instruction cycle and is acknowledged by the destination module using the c_svalid signal.
3. Type C operations (372) are used to set individual bits in a module's registers. These operations are generated by the instruction control unit 235 when it configures the coprocessor for an instruction and by the external CBus interface. A Type C operation has no data cycle; the data is carried in the instruction cycle.

各命令のタイプフィールドは、以下の表に従って関連するＣＢｕｓ処理を符号化したものである。
ＣＢｕｓ処理タイプ The type field of each instruction is an encoding of the associated CBus process according to the following table.
CBus processing type

バイトフィールドは、レジスタ中のビットをセットするために用いられる。モジュールフィールドはＣＢｕｓ上の命令のアドレス先モジュールを指定するフィールドである。レジスタフィールドはモジュール中のどのレジスタを更新するかを指定するフィールドである。アドレスフィールドは、動作を行うメモリ部位を指定するフィールドである、ＲＡＭ，ＦＩＦＯなどのアドレスを指定するものである。イネーブルフィールドは、ビット設定命令が用いられたときに選択されたバイト中の選択されたビットをイネーブルにするフィールドである。データフィールドは、更新されるべきバイトに書き込まれるビットデータを含む。 The byte field is used to set a bit in the register. The module field is a field for designating an address destination module of an instruction on the CBus. The register field is a field for designating which register in the module is updated. The address field is used for designating an address such as RAM or FIFO, which is a field for designating a memory part to be operated. The enable field is a field that enables a selected bit in a selected byte when a bit setting instruction is used. The data field contains bit data that is written to the byte to be updated.

前述の通り、ＣＢｕｓは各モジュールごとに、モジュールが命令実行中のときに送出されるｃ＿ａｃｔｉｖｅラインを含む。命令制御部はこの信号に基づいて命令の終了時を知ることができる。また、ＣＢｕｓは各モジュールごとにバックグラウンドモード時に動作するｃ＿ｂａｃｋｇｒｏｕｎｄラインを、リセット、エラー検出、インタラプトを行うためのリセット、エラー、インタラプトラインとともに含む。
３．１５コプロセッサデータタイプとデータ操作
図２において、コプロセッサ部２２４の動作、特にＪＰＥＧ符号化器２４１や主データパスのコプロセッサ中の主な計算処理動作を簡潔にするため、コプロセッサは外部フォーマットと内部フォーマットとを差別化するデータモデルを用いる。外部データフォーマットは、ローカルメモリインタフェースやＰＣＩバスなどのコプロセッサの外部インタフェースに現われるデータフォーマットである。逆に、内部データフォーマットは、コプロセッサ２２４の主機能モジュール間で現われるフォーマットである。図２２は、種々の入力／出力フォーマットを模式的に示した図である。入力外部フォーマット３８１は、ピクセルオーガナイザ２４６、オペランドオーガナイザＢ２４７，オペランドオーガナイザＣ２４８への入力フォーマットである。これらのオーガナイザは、入力外部フォーマットを、ＪＰＥＧ符号化器２４１や主データパス部２４２へ入力される入力内部フォーマット３８２に再フォーマットする。また、これら２つの機能部は出力データを出力内部フォーマットで出力し、結果オーガナイザ２４９が出力内部フォーマットを所望出力フォーマット３０４に変換する。 As described above, the CBus includes, for each module, a c_active line that is sent when the module is executing an instruction. The command control unit can know the end time of the command based on this signal. In addition, the CBus includes a c_background line that operates in the background mode for each module, together with reset, error detection, and reset for performing interrupts, errors, and interrupt lines.
3.15 Coprocessor Data Types and Data Manipulation
Referring to FIG. 2, to simplify the operation of the coprocessor 224, and in particular its main computational units, the JPEG encoder 241 and the main data path 242, the coprocessor uses a data model that distinguishes between external formats and internal formats. An external data format is one that appears on an external interface of the coprocessor, such as the local memory interface or the PCI bus. Conversely, an internal data format is one that appears between the main functional modules of the coprocessor 224. FIG. 22 schematically shows the various input and output formats. The input external format 381 is the format of the data input to the pixel organizer 246, the operand organizer B 247, and the operand organizer C 248. These organizers reformat the input external format into the input internal format 382, which is fed to the JPEG encoder 241 and the main data path unit 242. These two functional units produce their output in the output internal format, and the result organizer 249 converts the output internal format into the desired output format 304.

実施例では、外部データフォーマットは３つのタイプに分けられる。第一のタイプは、データごとに４つまでのチャネルを有し、各チャネルが１、２、４、８、あるいは１６ビットサンプルから成り立っているような連続ストリームから成るデータの「パックストリーム」である。パックストリームは、ピクセル、ピクセルに変換されるデータ、まとめられたビットなどを表現する際に用いられる。また、コプロセッサはリトルエンディアンバイトアドレッシングとバイト中ではビッグエンディアンビットアドレッシングを用いる。図２３はパックストリームフォーマットの第一の例を示している。ここでは、各オブジェクト３８７は、各チャネルごとに２ビットのチャネル０、チャネル１、チャネル２の三つのチャネルから構成される。このフォーマットのデータ配置が３８８である。図２４の次の例３９０では、各データオブジェクトが３２ビットワードを有し、チャネルごとに８ビット有する４チャネルオブジェクト３９５が示されている。図２５の第三の例３９５では、ビットアドレス３９７から始まるチャネルごとに８ビットを有するチャネルオブジェクト３９６が示されている。もちろん、アプリケーションに応じて、データチャネルの実際の幅や数は変化する。 In the embodiment, the external data formats fall into three types. The first type is the “packed stream”, a contiguous stream of data objects in which each object has up to four channels and each channel consists of a 1, 2, 4, 8, or 16 bit sample. Packed streams are used to represent pixels, data that is to be converted into pixels, packed bits, and so on. The coprocessor uses little endian byte addressing and, within a byte, big endian bit addressing. FIG. 23 shows a first example of the packed stream format. Here, each object 387 consists of three channels, channel 0, channel 1, and channel 2, with 2 bits per channel. The data layout for this format is shown at 388. The next example 390, in FIG. 24, shows a 4-channel object 395 with 8 bits per channel, each data object occupying one 32-bit word. The third example 395, in FIG. 25, shows a channel object 396 with 8 bits per channel starting at bit address 397. The actual width and number of data channels will, of course, vary with the application.
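To make the packed stream layout concrete, the sketch below splits a byte sequence into data objects. It rests on several assumptions that the text does not spell out: bytes are consumed in ascending address order, "big endian bit addressing" is read as taking the most significant bits of each byte first, channel 0 comes first within an object, and objects follow one another with no padding. The function name `unpack_packed_stream` is hypothetical, and 16-bit samples are left out because their byte order is not stated.

```python
from typing import List, Tuple


def unpack_packed_stream(data: bytes, channels: int, bits: int) -> List[Tuple[int, ...]]:
    """Split a packed stream into objects of `channels` samples, `bits` bits each."""
    if not 1 <= channels <= 4:
        raise ValueError("a data object has between 1 and 4 channels")
    if bits not in (1, 2, 4, 8):
        raise ValueError("this sketch handles 1, 2, 4, or 8 bit samples only")
    mask = (1 << bits) - 1
    samples = []
    for byte in data:                              # ascending byte address
        for shift in range(8 - bits, -1, -bits):   # most significant bits first
            samples.append((byte >> shift) & mask)
    n_objects = len(samples) // channels           # drop any incomplete trailing object
    return [tuple(samples[i * channels:(i + 1) * channels]) for i in range(n_objects)]
```

For the three-channel, 2-bit case of FIG. 23, two input bytes hold eight samples, so the call yields two complete objects and discards the two leftover samples.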

外部データフォーマットの第二のタイプは「アンパックバイトストリーム」であり、各ワード中の１バイトのみが有効であるような３２ビットワードのシーケンスである。このフォーマットの例が図２６の３９９として示されており、各ワード中の単一バイト４００のみが用いられる。さらなる外部データフォーマットは「他」フォーマットとして分類されるオブジェクトで表現される。一般に、これらのデータオブジェクトは色空間変換表、ハフマン符号化表などの大きな表型のデータである。 The second type of external data format is an “unpacked byte stream”, which is a sequence of 32-bit words such that only one byte in each word is valid. An example of this format is shown as 399 in FIG. 26, where only a single byte 400 in each word is used. Further external data formats are represented by objects classified as “other” formats. Generally, these data objects are large tabular data such as a color space conversion table and a Huffman encoding table.

コプロセッサは４つの内部データタイプを用いる。第一のタイプは「パックバイト」フォーマットであり、最後の３２ビットワードを除いて４アクティブバイトの３２ビットワードから成るフォーマットである。図２７に、ワードが４バイトであるパックバイトフォーマットの例４０２を示す。図２８に示す次のデータタイプは「ピクセル」フォーマットであり、４アクティブバイトチャネルの３２ビットワード４０３から成るフォーマットである。このピクセルフォーマットは４つのチャネルデータとして解釈される。 The coprocessor uses four internal data types. The first type is a “packed byte” format, which consists of a 32-bit word of 4 active bytes, excluding the last 32-bit word. FIG. 27 shows an example 402 of packed byte format in which a word is 4 bytes. The next data type shown in FIG. 28 is the “pixel” format, which consists of a 32-bit word 403 with 4 active byte channels. This pixel format is interpreted as four channel data.

The next internal data type, shown in FIG. 29, is the "unpacked bytes" format, in which each word consists of one active byte channel 405 and three inactive byte channels. The active byte channel occupies the least significant byte. All other internal data objects are classified as the "other" data format. Input data in an external format is converted to the appropriate internal format. FIG. 30 shows the conversions from the external formats 410 to the internal formats 411 performed by the various organizers. FIG. 31 shows the conversions from the internal formats 412 to the external formats 413 performed by the result organizer 249.

The conversion processes are described in more detail below. Starting with the conversion of input data from external to internal format, FIG. 32 shows the steps used by the various organizers. The first case is the external "other" format 416, which simply passes through the organizers unchanged. Next, the external unpacked byte format 417 undergoes unpack normalization 418 to produce the internal unpacked bytes format 419. Unpack normalization 418 discards the three inactive bytes of each word of the external unpacked byte stream. FIG. 33 illustrates unpack normalization: of the four byte channels in each input word, only one is valid, and only that byte is carried into the output format 419.

In FIG. 32, pack normalization 421 converts the element objects of the external packed stream 422 into a byte stream 423. If each channel sample is smaller than a byte, the sample is interpolated to an 8-bit value. For example, when 4-bit samples are converted to bytes, the 4-bit value 0xN becomes the byte value 0xNN. Samples larger than one byte are truncated. The input object sizes supported for stream 422 are 1, 2, 4, 8, and 16 bits, although these depend on the overall width of the data objects and words in the system to which the invention is applied.
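The interpolation and truncation rules above can be sketched as follows. This is a minimal illustration in Python, not the circuit itself: the function name `normalize_sample` is hypothetical, and truncating a 16-bit sample to its most significant byte is an assumption that matches the data normalization circuit described later in this document.

```python
def normalize_sample(value: int, width: int) -> int:
    """Convert one channel sample of `width` bits to an 8-bit value.

    Hypothetical helper: samples narrower than a byte are expanded by
    repeating their bit pattern (0xN -> 0xNN); 16-bit samples are
    truncated to their most significant byte (an assumption).
    """
    if width not in (1, 2, 4, 8, 16):
        raise ValueError("supported widths are 1, 2, 4, 8 and 16 bits")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    if width == 16:
        return value >> 8  # keep the most significant byte
    result = 0
    for _ in range(8 // width):  # repeat the pattern until 8 bits are filled
        result = (result << width) | value
    return result
```

For instance, `normalize_sample(0xA, 4)` gives `0xAA`, which is the 0xN to 0xNN rule from the text.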

FIG. 34 shows pack normalization 421 applied to input data 422 consisting of three-channel objects with 2 bits per channel (as in data format 386 of FIG. 23). The output data is in the byte-per-channel format 423. Where necessary, each channel is "interpolated" to produce an 8-bit sample.

In FIG. 32, the pixel stream is then passed to one of a pack process 425, an unpack process 426, or an element selection process 427. FIG. 35 shows an example of the pack process 425, which simply discards the inactive byte channels and produces a byte stream packed four active bytes to a word. That is, a stream 430 with a single valid byte per word is compacted into a format 431 with four active bytes per word. The unpack process 426 is essentially the reverse of the pack process, with each unpacked byte placed in the least significant byte of a word. FIG. 36 shows a packed byte stream 433 being unpacked to give the result 434.
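A small Python sketch of the pack and unpack processes follows. The function names are hypothetical, and the lane order (the first byte of the stream goes to the least significant byte of the word) is an assumption based on the little-endian byte addressing mentioned earlier; the figures are not available here to confirm it.

```python
from typing import List


def pack_bytes(unpacked: List[int]) -> List[int]:
    """Pack one-byte-per-word values into 32-bit words, four bytes per word.

    Assumption: byte k of each group of four lands in lane k, lane 0
    being the least significant byte.
    """
    words = []
    for start in range(0, len(unpacked), 4):
        word = 0
        for lane, byte in enumerate(unpacked[start:start + 4]):
            word |= (byte & 0xFF) << (8 * lane)
        words.append(word)
    return words


def unpack_words(words: List[int], count: int) -> List[int]:
    """Reverse of pack_bytes: emit `count` words with one active byte each.

    Each returned word carries its byte in the least significant lane,
    so the other three lanes are zero (inactive).
    """
    out = []
    for word in words:
        for lane in range(4):
            if len(out) < count:
                out.append((word >> (8 * lane)) & 0xFF)
    return out
```

The `count` argument is needed because the last packed word may hold fewer than four active bytes.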

FIG. 37 shows the element selection process 427, which selects N elements from the input stream, where N is the number of input channels per unit. Unpacking is used to generate "prototype pixels" such as 437. The pixel channels are filled starting from the least significant byte. FIG. 38 shows input data in the format 436 being converted by the element selection unit 427 into the prototype pixel format 437.

After element selection, the element replacement process 440 (FIG. 32) is performed. FIG. 38 illustrates element replacement, in which selected elements are replaced with constant values stored in an internal data register 441 to produce output elements 242, as in the example shown. In FIG. 32, the outputs of processing stages 425, 426, and 440 are passed to the lane swap process 444. As shown in FIG. 39, the lane swap process multiplexes any lane onto any other lane on a byte-by-byte basis, which includes replicating one lane onto other lanes. The example in FIG. 38 shows channel 3 being swapped with channel 1, and channel 3 being replicated onto channels 2 and 1.
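The lane swap can be pictured as four independent byte multiplexers, one per output lane. The Python sketch below illustrates that idea only; the function name `lane_swap` and the `sources` tuple (which input lane feeds each output lane) are hypothetical and are not the register encoding used by the coprocessor.

```python
from typing import Sequence


def lane_swap(word: int, sources: Sequence[int]) -> int:
    """Build a 32-bit word whose output lane i is copied from input lane sources[i].

    Lane 0 is the least significant byte. Naming the same source lane
    more than once replicates that lane.
    """
    if len(sources) != 4 or any(s not in (0, 1, 2, 3) for s in sources):
        raise ValueError("sources must name four input lanes in the range 0..3")
    lanes = [(word >> (8 * i)) & 0xFF for i in range(4)]
    result = 0
    for dst, src in enumerate(sources):
        result |= lanes[src] << (8 * dst)
    return result
```

With this convention, `lane_swap(w, (0, 3, 2, 1))` exchanges channels 3 and 1, and `lane_swap(w, (0, 3, 3, 3))` replicates channel 3 onto channels 2 and 1.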

In FIG. 32, once the lane swap process 444 is complete, the data stream may be stored in the multi-used value RAM 250 before being read back and passed to the replication process 446. The replication process 446 simply replicates data objects. FIG. 40 shows the replication process 446 applied to pixel data with a replication factor of 1.

FIG. 41 shows the replication process applied to packed byte data. FIG. 42 shows the processing performed by the result organizer 249 to convert data from the output internal format 383 to the output external format 384. This processing includes processes 424, 425, 426, and 440, which are the same as in the conversion shown in FIG. 32, while process 450 additionally includes element deselection 451, denormalization 452, byte addressing 453, and write masking 454. The element deselection process 451 shown in FIG. 43 is the inverse of the element selection process of FIG. 37 and discards unneeded data. In FIG. 43, for example, only the three valid channels of the input are taken and packed into data items 456.

The denormalization process shown in FIG. 44 performs essentially the reverse of the pack normalization process 421 shown in FIG. 34: each object or data item that has been handled as a byte is converted to a non-byte value. The byte addressing process 453 of FIG. 42 performs the byte-wise rearrangement required for byte addressing. For an external unpacked byte output stream, the two least significant bits of the stream address correspond to the active stream. When external unpacked bytes are used (FIG. 45), the byte addressing process 453 remaps the output stream from one byte channel to another. When an external packed stream is used (FIG. 46), the byte addressing module 453 remaps the start address of the output stream as illustrated.
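One way to read the byte addressing step for unpacked byte output is sketched below in Python. It rests on two assumptions that the text does not spell out: that the two least significant address bits select the byte lane that carries the active byte, and that lane 0 is the least significant byte (little-endian byte addressing). The function name is hypothetical.

```python
def place_unpacked_byte(byte: int, stream_address: int) -> int:
    """Move an active byte from lane 0 to the lane selected by the address.

    Assumption: lane = stream_address & 3, with lane 0 the least
    significant byte of the 32-bit output word.
    """
    lane = stream_address & 0b11
    return (byte & 0xFF) << (8 * lane)
```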

FIG. 47 shows the write masking process 454 of FIG. 42, which masks those channels of a packed stream (for example 460) that are not to be written. The input and output data type conversions to be applied are determined by the contents of the following data manipulation registers:
* Pixel organizer data manipulation register (po_dmr)
* Operand organizer B and operand organizer C data manipulation registers (oor_dmr, ooc_dmr)
* Result organizer data manipulation register (ro_dmr)
Each data manipulation register can be set for an instruction in one of two ways:
1. It is written explicitly, using the standard method of writing to coprocessor registers, immediately before the instruction executes.
2. It is set by the coprocessor itself on the basis of the current instruction.
During instruction decoding, the coprocessor examines the contents of the instruction word and data words of the instruction and, among other things, determines how to set the various data manipulation registers. Not all combinations of instructions and operands are valid. Some instructions dictate the operand format. An instruction with unsuitable operands produces "undefined" results, although it may still complete without raising an error. If the "S" bit of the corresponding data descriptor is 0, the coprocessor sets the data manipulation register to reflect the current instruction.

FIG. 48 shows the format of the data manipulation registers. The following table describes the various bit fields of the register shown in FIG. 48.
Data manipulation register format

A single instruction may involve several internal and external data types. Every combination of operand, result, and instruction type is accepted, but only a subset of these combinations produces meaningful results. Table 9 lists the specific combinations of operand and result data types expected for each instruction, summarized by external and internal format.

Expected data types

The symbols used in Table 9 are as follows.
Symbol descriptions

3.16 Data Normalization Circuit
FIG. 49 shows a computer graphics processor comprising three main functional blocks: the data normalization units 1062 in the pixel organizer 246 and operand organizers B and C 247, 248; the central graphics engine in the main data path 242 or JPEG unit 241; and the programming agent 1064 in the instruction control unit 235. The operation of the data normalization unit 1062 and the central graphics engine is determined by the instruction stream 1064 supplied to the programming agent 1064. For each instruction, the programming agent 1064 performs decoding and issues internal control signals 1067 and 1068 to the other blocks in the system. For each input data word 1069, the normalization unit 1062 formats the data according to the current instruction and passes the result to the central graphics engine 1063 for further processing.

The data normalization unit stands, in simplified form, for the pixel organizer and operand organizers B and C. Each of these organizers contains a data normalization circuit that normalizes the input data appropriately before passing the result to the central graphics engine in the JPEG coder or main data path. The central graphics engine 1063 operates on data in a standard format, namely 32-bit pixels. The normalization unit therefore converts its input data into the 32-bit pixel format. The input data words 1069 to the normalization unit are also 32 bits wide, but may be in either the packed element or the unpacked byte format. A packed element input stream consists of consecutive data objects within the data words, each object being 1, 2, 4, 8, or 16 bits wide. An unpacked byte input stream, by contrast, consists of 32-bit words in which only one 8-bit byte is valid. The pixel data 11 produced by the normalization unit consists of 1, 2, 3, or 4 valid channels, each channel being 8 bits wide.

FIG. 50 shows a specific hardware configuration of the data normalization unit 1062. The data normalization unit 1062 comprises a FIFO buffer (FIFO) 1073, a 32-bit input register (REG1), a 32-bit output register (REG2), a normalization multiplexer 1075, and a control unit 1076. Each input data word 1069 is stored in the FIFO 1073 and then latched in REG1 1074, where it remains until all of its input bits have been converted to the desired output format. The normalization multiplexer 1075 comprises thirty-two combinational switches that generate the pixel to be latched in REG2 by selecting bits from the value in REG1 1074 and from the current output of the FIFO 1073. The normalization multiplexer 1075 thus takes as input two 32-bit words 1077 and 1078, denoted x[63..32] and x[31..0].

This arrangement improves the overall throughput of the device, particularly when the FIFO holds at least two valid data words during an instruction. This follows from the way data words are fetched from memory. A desired data word or object may be spread, or "wrapped", across adjacent input data words in the FIFO buffer. With the input register 1074, a complete input object can be reconstructed from elements of adjacent data words in the FIFO buffer, which removes the need for additional storage or bit-stripping operations ahead of the main data manipulation stages. This is a significant advantage when many data words of a similar type are supplied to the normalization unit.

The control unit generates the enable signals REG1_EN 1080 and REG2_EN[3..0] 1081, which update REG1 1074 and REG2 1076, as well as the signals that control the FIFO 1073 and the normalization multiplexer 1075. The programming agent 1064 of FIG. 49 supplies the following configuration signals to the data normalization unit 1062: the FIFO_WR4 signal, the normalization factor n[2..0], the bit offset b[2..0], the channel count c[1..0], and the external format (E). Input data is written into the FIFO 1073 by asserting the FIFO_WR signal 1085 on every clock cycle in which valid data is present. When no space is available, the FIFO asserts the fifo_full status flag 1086. Given 32-bit input data, the external format signal indicates whether the input is in the packed stream format (E = 1) or the unpacked byte format (E = 0). When E = 1, the normalization factor encodes the size of each element of the packed stream: n = 0 denotes 1-bit elements, n = 1 denotes 2-bit elements, n = 2 denotes 4-bit elements, n = 3 denotes 8-bit elements, and n > 3 denotes 16-bit elements. The channel count is the maximum number of consecutive input objects to be formatted per clock cycle in order to produce a pixel with the desired number of valid bytes. Specifically, c = 1 yields a pixel in which only the least significant byte is valid, c = 2 a pixel in which the two least significant bytes are valid, c = 3 a pixel in which the three least significant bytes are valid, and c = 0 a pixel in which all four bytes are valid.
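The encodings of the normalization factor and the channel count described above can be written out as two small decoding functions. This Python sketch only restates the mapping given in the text; the function names are hypothetical.

```python
def element_width_bits(n: int) -> int:
    """Decode the 3-bit normalization factor n[2..0] into an element width in bits."""
    if not 0 <= n <= 7:
        raise ValueError("n is a 3-bit field")
    # n = 0, 1, 2, 3 select 1, 2, 4, 8 bits; any larger value selects 16 bits.
    return 16 if n > 3 else 1 << n


def valid_byte_count(c: int) -> int:
    """Decode the 2-bit channel count c[1..0] into the number of valid output bytes."""
    if not 0 <= c <= 3:
        raise ValueError("c is a 2-bit field")
    # c = 0 is the special case meaning all four bytes are valid.
    return 4 if c == 0 else c
```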

When the packed stream consists of elements 8 bits wide or narrower, the bit offset determines the position in x[31..0], the value stored in REG1, at which data processing begins. With the bit offset taken as the displacement from the most significant bit of the first input byte, the output data byte y[7..0] is generated as follows.
For n = 0:
y[i] = x[7-b], for 0 ≤ i ≤ 7
For n = 1:
y[i] = x[7-b], for i = 1, 3, 5, 7
y[i] = x[6-b], for i = 0, 2, 4, 6
For n = 2:
y[3] = x[7-b]
y[2] = x[6-b]
y[1] = x[5-b]
y[0] = x[4-b]
y[7] = y[3]
y[6] = y[2]
y[5] = y[1]
y[4] = y[0]
For n = 3:
y[i] = x[i], for 0 ≤ i ≤ 7
For n > 3:
y[7..0] = x[15..8]
The equations for the output data bytes y[15..8], y[23..16], and y[31..24] are analogous.
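The equations for y[7..0] can be checked with a short bit-level model. The Python sketch below is an illustration under stated assumptions, not the multiplexer implementation: the function name is hypothetical, x is passed as an integer holding x[31..0], and offsets that would index below bit 0 are rejected because the text does not say how the hardware treats them.

```python
def first_output_byte(x: int, n: int, b: int) -> int:
    """Compute y[7..0] from x[31..0], normalization factor n and bit offset b."""
    if n == 0:
        src = [7 - b] * 8                                   # one bit fills the byte
    elif n == 1:
        src = [7 - b if i % 2 else 6 - b for i in range(8)]  # 2-bit element repeated
    elif n == 2:
        src = [(4 - b) + (i % 4) for i in range(8)]          # 4-bit element repeated
    elif n == 3:
        src = list(range(8))                                 # byte copied unchanged
    else:
        src = [8 + i for i in range(8)]                      # top byte of a 16-bit element
    if min(src) < 0:
        raise ValueError("bit offset is out of range for this element width")
    y = 0
    for i, s in enumerate(src):
        y |= ((x >> s) & 1) << i  # y[i] = x[src[i]]
    return y
```

For example, with x = 0xA5 and n = 2, an offset of b = 0 selects the upper nibble and gives 0xAA, while b = 4 selects the lower nibble and gives 0x55.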

This scheme can be extended to produce output arrays of any length by taking each element of the input stream and replicating it as many times as needed to produce an output object of the standard width. The input elements may be processed in either little-endian or big-endian order. The example above uses big-endian element order, since processing always starts from the most significant bit of the input byte. To use little-endian order, the bit offset must be redefined relative to the least significant bit of the input byte. When the input element is wider than the standard output width, the output element is produced by truncating the input element, typically by discarding the appropriate number of least significant bits. In the equations above, a 16-bit input element is truncated to the standard 8-bit output by selecting the most significant byte of the 16-bit data object.

The control unit of FIG. 50 decodes n[2..0] and c[1..0] and uses them, together with b[2..0], to generate the select signals for the normalization multiplexer and the enable signals for REG1 and REG2. Because the FIFO may run empty in the middle of an instruction, the control unit also contains counters that hold the current bit position in_bit[4..0], from which input data is taken in REG1, and the current byte position out_byte[4..0], at which writing of output data begins. The control unit detects that processing of an input word is complete by comparing the value of in_bit[4..0] with the position of the last object in REG1, and then initiates a FIFO read by asserting the FIFO_RD signal for one clock cycle in which the FIFO is not empty. The signals fifo_empty and fifo_full are FIFO status flags: fifo_empty = 1 when the FIFO holds no valid data, and fifo_full = 1 when the FIFO is full. In the clock cycle in which FIFO_RD is asserted, REG1_EN is also asserted so that new data is captured in REG1. REG2 has four enable signals, one for each byte of the output register. The control unit computes REG2_EN[3..0] by taking the minimum of three values: the decoded c[1..0], the number of valid elements in REG1 awaiting processing, and the number of unused channels in REG2. When E = 0, REG1 contains only one valid element. A complete output word is available when the number of channels occupied in REG2 equals the decoded channel count.
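The "minimum of three values" rule for the REG2 byte enables can be illustrated as below. This Python sketch makes several assumptions that go beyond the text: that the unused channel count equals the decoded channel count minus the current output byte position, and that the enabled bytes are contiguous starting at that position. The function name and return shape are hypothetical.

```python
from typing import Tuple


def reg2_enable(c: int, pending_elements: int, out_byte: int) -> Tuple[int, bool]:
    """Return (byte-enable mask for REG2, whether the output pixel is now complete).

    c is the 2-bit channel count (0 means four channels), pending_elements
    is the number of valid elements still waiting in REG1, and out_byte is
    the number of channels of REG2 already filled.
    """
    wanted = 4 if c == 0 else c          # decoded channel count
    if not 0 <= out_byte < wanted:
        raise ValueError("out_byte must be below the decoded channel count")
    unused = wanted - out_byte           # assumption: channels still empty in REG2
    count = min(wanted, pending_elements, unused)
    mask = 0
    for lane in range(out_byte, out_byte + count):
        mask |= 1 << lane                # one enable bit per output byte
    return mask, out_byte + count == wanted
```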

In the preferred embodiment, the circuit area occupied by the device of FIG. 50 is reduced substantially by adding a function that restricts the bit offset parameter, so that the control unit and the normalization multiplexer use only a subset of the possible offsets. The restriction depends on the normalization factor and operates according to the following equations.

b_trunc[2..0] = 0, for n ≥ 3
= b[2..0], for n = 0
= b[2..1], for n = 1
= b[2] & "00", for n = 2
(where "&" denotes bitwise concatenation)
As a result, the size of each of the normalization multiplexers, shown as MUX0, MUX1, ..., MUX31 in FIG. 50, is reduced from 32-to-1 without the restriction to at most 20-to-1 with the bit offset restricted. The smaller multiplexers also allow the circuit to run faster.
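The offset restriction amounts to clearing the low-order offset bits that cannot occur for a given element width. The Python sketch below shows one reading of the equations: for n = 1 it assumes that b[2..1] is padded with a single low zero bit, by analogy with the n = 2 case, since the text gives only two bits for a three-bit result. The function name is hypothetical.

```python
def truncate_bit_offset(b: int, n: int) -> int:
    """Return b_trunc[2..0] for bit offset b[2..0] and normalization factor n."""
    if not 0 <= b <= 7:
        raise ValueError("b is a 3-bit field")
    if n >= 3:
        return 0            # byte-wide or wider elements always start at offset 0
    if n == 0:
        return b            # 1-bit elements may start at any bit
    if n == 1:
        return b & 0b110    # assumption: b[2..1] followed by a zero bit
    return b & 0b100        # n == 2: b[2] followed by "00"
```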

As described above, the preferred embodiment provides an efficient circuit for converting data into one of several normalized formats.
3.17 Image Processing Operations of the Accelerator Card
Referring to FIG. 2 and Table 2, the instruction control unit 235 "executes" instructions, which results in the corresponding operations being carried out within the coprocessor 224. The instructions include a variety of instructions that cause useful functions to be performed by the main data path unit 242. One of these useful instructions is compositing.

3.17.1 Compositing
FIG. 51 shows the compositing model implemented in the main data path unit 242. The compositing model 462 generally has three input data sources and one output (sink) 463. One input source is pixel data 464 from the same destination in memory as the output 463. Another is the instruction operand 465, which serves as a source of data such as color and opacity. The color and opacity may be flat, a blend, pixels, or a tile. Flats and blends are generated internally by the blend generator 467, since generating them is faster than fetching them through input/output. The input data also includes attenuation data 466, which attenuates the operand data 465.

As noted earlier, pixel data normally consists of four channels, each one byte wide, with the byte at the highest address being the opacity channel. For the operation and usefulness of compositing, see standard references such as Thomas Porter and Tom Duff, "Compositing Digital Images", Computer Graphics, Volume 18, Number 3, July 1984.

The coprocessor can also work with premultiplied data. Premultiplication multiplies each color channel by the opacity channel in advance. Two optional premultipliers 468 and 469 are provided so that, when required, the color data can be premultiplied by the opacity channels 470 and 471 to give premultiplied outputs 472 and 473. The compositing unit 475 composites the two inputs according to the current instruction data. Table 11 below lists the compositing operators.

Compositing operations

Here, (aco, ao) denotes a premultiplied pixel with color ac and opacity ao, R is an offset value, and "wc" is the wrapping/clamping operator described below. Note that the compositing unit 475 also provides the reverse of each operator in the table above. The clamp/wrap unit 476 clamps or wraps the data into the range 0 to 255. If required, the data can then undergo optional "unpremultiplication" 477 to restore the original pixel values. Finally, the output data 463 is produced and returned to memory.

FIG. 52 shows the format of the instruction sent to the main data path unit for compositing. If the X field in the major opcode is 1, the addition operator is applied as given in the table above. If this field is 0, an operator other than addition is applied. The Pa field indicates whether the first data stream 464 (FIG. 51) is to be premultiplied. Likewise, the Pb field indicates whether the second data stream 465 is to be premultiplied, and the Pr field indicates whether the result is to be "unpremultiplied" by unit 477. The C field indicates whether overflow or underflow is wrapped or clamped into the range 0 to 255, and the "com-code" field indicates which operator is applied. The addition operator can also use an offset register (mdp_por). The offset is subtracted from the result of the addition before wrapping or clamping is performed. For the addition operator, the com-code field instead indicates, channel by channel, whether the offset register is enabled.
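The order of operations for the addition operator (add, subtract the offset, then wrap or clamp) can be sketched for a single 8-bit channel as follows. This Python fragment is illustrative only: the function name and the boolean `clamp` argument are hypothetical, the actual encoding of the C field is not given in the text, and wrapping is assumed to mean arithmetic modulo 256.

```python
def add_channel(a: int, b: int, offset: int = 0, clamp: bool = True) -> int:
    """Add two channel values, subtract the offset, then clamp or wrap to 0..255."""
    raw = a + b - offset        # the offset is applied before wrapping/clamping
    if clamp:
        return max(0, min(255, raw))
    return raw % 256            # assumption: wrapping is modulo 256
```

Clamping saturates an overflowing sum at 255, whereas wrapping keeps only the low eight bits of the result.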

The standard instruction word encoding 280 of FIG. 10, described earlier, is modified for compositing operands. Since the destination of the output data is the same as the original source, operand A is always the same as the result word. The operand A word can therefore be combined with the operand B word to describe operand B at greater length. As with other instructions, the A descriptor in the instruction describes the input format and the R descriptor defines the output format.

FIG. 53 shows, as a first example 470, the instruction word format for a blend instruction. A blend is specified by a start value 471 and an end value 472 for each channel. Similarly, FIG. 54 shows the tile instruction format, in which a tile is specified by a tile address 476, a start offset 477, and a length 478. All tile addresses and sizes are specified in bytes. Tiling is performed in a modular fashion, and FIG. 55 illustrates the fields 476 to 478 of FIG. 54. The tile address 476 gives the start address of the tile in memory, the tile start offset 477 gives the first byte to be used at the start of the tile, and the tile length 478 gives the total length of the tile over which to wrap.
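The modular tile addressing described above can be illustrated by listing the byte addresses read for successive outputs. The Python sketch below is an interpretation of the three fields, with a hypothetical function name; it assumes the start offset is counted from the tile address and that reads wrap back to the tile address after `length` bytes.

```python
from typing import List


def tile_byte_addresses(tile_address: int, start_offset: int,
                        length: int, count: int) -> List[int]:
    """Return the addresses of `count` consecutive tile bytes, wrapping modulo `length`."""
    if length <= 0:
        raise ValueError("tile length must be positive")
    return [tile_address + (start_offset + i) % length for i in range(count)]
```

For a 4-byte tile at 0x1000 with a start offset of 2, the reads visit 0x1002, 0x1003, 0x1000, 0x1001 and then repeat.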

In FIG. 51, the color components and opacity may be attenuated by the attenuation values 466. The attenuation values are obtained in one of the following three ways:
1. Software can specify a flat attenuation by placing the attenuation factor in the operand C word of the instruction.
2. Software can specify a bitmap attenuation, in which 1 means on and 0 means off, by placing the address of the bitmap in the operand C word of the instruction.
3. A bytemap attenuation can be specified by placing the address of the bytemap in the operand C word of the instruction.

Because the attenuation value is an unsigned integer from 0 to 255, each premultiplied color channel is multiplied by the attenuation factor by computing
Coa = Co × A / 255
where A is the attenuation factor, Co is the premultiplied color channel, and Coa is the attenuated result.
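
As a small sketch of the attenuation formula, assuming 8-bit channels and truncating integer division (the rounding behaviour is not stated in the text), and with a hypothetical function name:

```python
def attenuate(co, a):
    """Scale a premultiplied 8-bit channel `co` by attenuation factor `a`.

    Implements Coa = Co * A / 255; truncating the quotient is an assumption.
    """
    if not (0 <= co <= 255 and 0 <= a <= 255):
        raise ValueError("co and a must be in 0..255")
    return (co * a) // 255


print(attenuate(200, 255), attenuate(200, 128), attenuate(200, 0))
```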

3.17.2 Color Space Conversion Instructions
Referring to FIG. 2 and Table 2, color conversion is carried out mainly by the main data path unit 242 and the data cache 230. Color space conversion converts data from a first color space format (for example, one suited to an RGB color display) into a second color space format (for example, one suited to CYM or CYMK printing). The conversion is designed to work with all color spaces and can be used for any function from one dimension to many dimensions. The instruction control unit 235 configures, via the CBus 231, the main data path unit 242, the data cache control unit 240, the input interface switch 252, the pixel organizer 246, the MUV buffer 250, the operand organizer B 247, the operand organizer C 248, and the result organizer 249 to operate in color conversion mode. In this mode, an input image made up of multiple lines of pixels is sent to the main data path unit 242 as a pixel stream, one line of pixels at a time. The main data path unit 242 (FIG. 2) receives the pixel stream from the input interface switch 252 via the pixel organizer 246 and converts the color space one pixel at a time. The interval and fraction tables are loaded into the MUV buffer 250 in advance, and the color conversion table is loaded into the data cache 230. The main data path unit 242 accesses these tables through the operand organizers B and C, converts each pixel, for example from the RGB color space to the CYM or CYMK color space, and sends the converted pixel to the result organizer 249. Under the control of the instruction control unit 235, the main data path unit 242, the data cache 230, the data cache control unit 240, and the other devices mentioned above operate in one of two modes: single output general color space (SOGCS) conversion mode or multiple output general color space (MOGCS) conversion mode. For details of the data cache control unit 240 and the data cache 230, see the section "Data Cache Control Unit and Cache" 240, 230 (FIG. 2).

Accurate color space conversion is a complicated nonlinear process. For example, converting an RGB pixel to a single primary color component of the CYMK color space (say, cyan) is linear in theory, but in practice nonlinearities arise, mainly in the output device that renders the color components of the pixel. The same is true of the conversion from RGB pixels to the other primary color components of the CYMK color space (yellow, magenta, and black). Nonlinear color space conversion is therefore generally used to compensate for the nonlinearity introduced in each color component. Because of this nonlinearity, color conversion is implemented either with a complicated transfer function or with a lookup table. Given an input color space of 24-bit RGB pixels, a lookup table that maps each pixel to a single 8-bit primary color component (cyan) of the CYMK color space needs at least 16 megabytes. Likewise, a lookup table that maps 24-bit RGB pixels to all four 8-bit primary color components of the CYMK color space needs at least 64 megabytes, which is an excessive amount of storage. Instead, the main data path unit 242 (FIG. 2) uses a lookup table stored in the data cache 230 that holds output color values only for a sparse set of points in the input color space, and obtains intermediate outputs by interpolating between those output color values.

a. Single Output General Color Space (SOGCS) Conversion Mode

In both the single and multiple output color conversion modes (SOGCS and MOGCS), the RGB color space consists of 24-bit pixels with 8-bit red, green, and blue components. Each dimension of the RGB color space is divided into 15 intervals, and the length of each interval is set inversely to the nonlinearity of the printer's transfer function from the RGB color space to the CYMK color space. That is, the intervals are made shorter where the transfer function is strongly nonlinear and longer where it is close to linear. To locate the nonlinear regions of the transfer function, the color space of each output printer should ideally be measured accurately. However, the transfer function can also be approximated or modeled from know-how and from the measured characteristics of the printer type (for example, ink jet). For each color channel of the input pixel, the position of the color component value within the 15 intervals is determined. The main data path unit 242 uses two tables for this: one to determine which interval the input color component value falls in, and one to determine its position within that interval. Different tables may of course be used for output printers that have different transfer functions.

As described above, each dimension of RGB is divided into 15 intervals. The RGB color space thus forms a three-dimensional lattice delimited by the intervals, and the input pixels at the ends of the intervals are sparsely distributed in the input color space. Only the output color values that correspond to the interval endpoints are stored in the lookup table. The output color value for an input color pixel is therefore obtained by determining the output color values at the endpoints of the interval in which the input pixel lies and interpolating between them according to the pixel's position within the interval. This approach reduces the need for a large memory.
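
The saving can be checked with a little arithmetic. The sketch below assumes one byte per output color component and counts the 16 endpoints that bound the 15 intervals along each axis; the variable names are illustrative.

```python
BYTES_PER_COMPONENT = 1              # one 8-bit output color component

# Full lookup table: one entry for every 24-bit RGB pixel value.
full_single = (2 ** 24) * BYTES_PER_COMPONENT        # one output component
full_four = (2 ** 24) * 4 * BYTES_PER_COMPONENT      # four output components

# Sparse lattice: 15 intervals per axis are bounded by 16 endpoints.
points_per_axis = 15 + 1
lattice_single = (points_per_axis ** 3) * BYTES_PER_COMPONENT
lattice_four = lattice_single * 4

print(full_single // 2 ** 20, "MiB versus", lattice_single, "bytes")
print(full_four // 2 ** 20, "MiB versus", lattice_four, "bytes")
```

The 4096 single-component entries correspond to eight memory banks of 512 bytes each, which is the bank size given later for the single output mode.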

FIG. 56 shows an example 480 of determining, for an input RGB color pixel, the corresponding interval and the position within that interval. The conversion is performed for each 8-bit input color channel of the 24-bit input pixel using the interval table 482 and the fraction table 483, which gives the position within the interval. In FIG. 56, the 8-bit input color component 481 is the decimal value 4 shown in binary, and it is used as the lookup index into both tables. The interval table 482 outputs, as a 4-bit value, which of the intervals 0 to 14 contains the input color component value 481. Similarly, the fraction table 483 gives the position of the input color component value 481 within that interval. The fraction table stores 8-bit values in the range 0 to 255, each interpreted as a fraction of 256. For the input color component 481 with decimal value 4, looking up the interval table 482 therefore produces the output value 0, and looking up the fraction table 483 produces the output value 160, which represents the fraction 160/256. As the interval table 482 and the fraction table 483 show, the interval lengths are not uniform. As described above, the interval lengths are determined by the nonlinearity of the transfer function.
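
One way to build the two per-channel tables is sketched below. The breakpoints are made up for illustration and do not reproduce the values in FIG. 56; the function name and the use of truncating division for the fraction are likewise assumptions.

```python
def build_tables(boundaries):
    """Build the 256-entry interval and fraction tables for one channel.

    `boundaries` holds the 16 endpoints of the 15 intervals, starting at 0
    and ending at 256 (hypothetical values; a real set would be derived
    from the printer's transfer function).
    """
    if len(boundaries) != 16 or boundaries[0] != 0 or boundaries[-1] != 256:
        raise ValueError("need 16 endpoints running from 0 to 256")
    if any(lo >= hi for lo, hi in zip(boundaries, boundaries[1:])):
        raise ValueError("endpoints must be strictly increasing")

    interval_table, fraction_table = [], []
    interval = 0
    for value in range(256):
        # Advance to the interval whose upper endpoint lies above `value`.
        while value >= boundaries[interval + 1]:
            interval += 1
        lo, hi = boundaries[interval], boundaries[interval + 1]
        interval_table.append(interval)                         # 0..14
        fraction_table.append((value - lo) * 256 // (hi - lo))  # 0..255
    return interval_table, fraction_table


# Shorter intervals where the (imagined) transfer function bends most.
EXAMPLE_BOUNDARIES = [0, 8, 16, 24, 32, 48, 64, 80, 96,
                      128, 160, 192, 208, 224, 240, 256]
intervals, fractions = build_tables(EXAMPLE_BOUNDARIES)
print(intervals[4], fractions[4])      # interval and fraction for input 4
```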

As described above, applying the interval table and the fraction table to each RGB color component yields three interval outputs and three fraction outputs. The interval and fraction tables for each color component are loaded into the MUV buffer 250 (FIG. 2) and accessed by the main data path unit 242 as needed. FIG. 57 shows the organization of the MUV buffer 250 for color conversion. The MUV buffer 250 (FIG. 57) is divided into three areas 488, 489, and 490, one for each color component. Each area (for example, 488) is further divided into a 4-bit interval table and an 8-bit fraction table. For each input color channel, the main data path unit 242 retrieves a 12-bit output 492 from the MUV buffer 250. In the example above of a single input color component with decimal value 4, the 12-bit output is 000010100000 (interval 0000 followed by fraction 10100000, that is, 160).

FIG. 58 shows an example of the interpolation process. The interpolation is primarily one from a three-dimensional space 500 (for example, the RGB color space) to another color space (for example, CMY or CMYK). Pixels P0 to P7 are sparsely located in the RGB input color space and have corresponding output color values CV(P0) to CV(P7) in the output color space. The output color component value for an input pixel Pi lying between the pixels P0 to P7 is determined as follows. First, the interval endpoints P0, P1, ..., P7 that surround the input pixel Pi are determined. Next, the fractional components frac_r, frac_g, and frac_b are determined. Finally, the fractional components are used to interpolate between the output color values CV(P0) to CV(P7) that correspond to the endpoints P0 to P7.

The interpolation first performs a one-dimensional interpolation in the red (R) direction, computing the values temp11, temp12, temp13, and temp14 from the following equations.
temp11 = CV(P0) + frac_r × (CV(P1) − CV(P0))
temp12 = CV(P2) + frac_r × (CV(P3) − CV(P2))
temp13 = CV(P4) + frac_r × (CV(P5) − CV(P4))
temp14 = CV(P6) + frac_r × (CV(P7) − CV(P6))
Next, a one-dimensional interpolation in the green (G) direction computes temp21 and temp22 from the following equations.
temp21 = temp11 + frac_g × (temp12 − temp11)
temp22 = temp13 + frac_g × (temp14 − temp13)
Finally, a one-dimensional interpolation in the blue (B) direction computes the final output color value from the following equation.
final = temp21 + frac_b × (temp22 − temp21)
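
The three interpolation steps can be written out directly. This sketch assumes the fractions are supplied as 8-bit table values n interpreted as n/256, and it uses floating-point arithmetic for clarity, whereas the hardware presumably works in fixed point; the function name is illustrative.

```python
def interpolate(cv, frac_r, frac_g, frac_b):
    """Trilinear interpolation between the eight values CV(P0)..CV(P7).

    `cv` is a sequence of eight output color values; each fraction is an
    8-bit table value n, interpreted as n / 256.
    """
    fr, fg, fb = frac_r / 256, frac_g / 256, frac_b / 256

    # Step 1: four interpolations in the red direction.
    temp11 = cv[0] + fr * (cv[1] - cv[0])
    temp12 = cv[2] + fr * (cv[3] - cv[2])
    temp13 = cv[4] + fr * (cv[5] - cv[4])
    temp14 = cv[6] + fr * (cv[7] - cv[6])

    # Step 2: two interpolations in the green direction.
    temp21 = temp11 + fg * (temp12 - temp11)
    temp22 = temp13 + fg * (temp14 - temp13)

    # Step 3: one interpolation in the blue direction.
    return temp21 + fb * (temp22 - temp21)


print(interpolate([0, 64, 0, 64, 128, 192, 128, 192], 128, 0, 64))
```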
The input and output ranges often do not coincide. When the output range is narrower than the input range, the range frequently has to be clamped at both ends, and this often causes undesirable distortion when colors near the ends of the range are converted. FIG. 59 illustrates an example of this problem, showing a one-dimensional mapping of input range values to output range values. Suppose the output values for the input values are defined at points 510 and 511. If the maximum output value is clamped at point 512, point 511 must be limited to that output level. Interpolating between the two points 510 and 511 then follows line 515, and the input point 516 yields the output value 517. However, if the output value would be point 518 in the absence of the range restriction, this is not necessarily the best color mapping. The interpolation line between 510 and 518 gives the output value 519 for the input point 516. The difference between the two output values 517 and 519 often appears as noticeable distortion, particularly when printing colors near the ends of the range. To avoid this problem, the main data path unit can also compute in an extended output color space and then scale and clamp the result to the appropriate range using the following formula.

Referring to FIG. 58, the interpolation is performed both in the SOGCS conversion mode, which converts RGB pixels to a single output color component (for example, cyan), and in the MOGCS mode, which converts RGB pixels to all output color components simultaneously. When color conversion is applied to every pixel of an image, millions of pixels are each converted independently. For high-speed operation it is therefore desirable to locate the eight values (P0 to P7) surrounding the input value quickly.

As described with reference to FIG. 57, the main data path unit 242 retrieves, for each color input channel, a 12-bit output made up of a 4-bit interval part and an 8-bit fraction part. The main data path unit 242 concatenates the 4-bit interval parts of the red, green, and blue channels to form a single 12-bit address (IR, IG, IB), shown as 520 in FIG. 60. FIG. 60 is a data flow diagram showing how a single output color component 563 is obtained from the single 12-bit address 520. The 12-bit address 520 is first sent to an address generator of the data cache control unit 240, such as the generator 1881 (FIG. 141), which produces eight 9-bit line/byte addresses 521, one for each of the memory banks (B0, B1, ..., B7). The data cache 230 (FIG. 2) is divided into eight independent memory banks 522, each of which is addressed independently by one of the eight line/byte addresses. The address generator converts the 12-bit address 520 into the eight line/byte addresses according to the following table.

Address composition in SOGCS mode

Here, BIT[8:6], BIT[5:3], and BIT[2:0] denote bits 6 to 8, bits 3 to 5, and bits 0 to 2 of the 9-bit bank address, respectively. R[3:1], G[3:1], and B[3:1] denote bits 1 to 3 of the 4-bit intervals IR, IG, and IB of the 12-bit address 520. The mapping from 12 bits to 9 bits is explained in detail for memory bank 5 of Table 12. Bits 1 to 3 of the 4-bit red interval Ir in the 12-bit address 520 are mapped to bits 6 to 8 of the 9-bit address B5; bits 1 to 3 of the 4-bit green interval Ig, after the addition specified in the table, are mapped to bits 3 to 5 of the 9-bit address B5; and bits 1 to 3 of the 4-bit blue interval Ib are mapped to bits 0 to 2 of the 9-bit address B5.

The eight line/byte addresses 521 are used to address the corresponding memory banks 522, each of which holds 512 × 8 bits, and the corresponding 8-bit output color component 523 is latched from each memory bank 522. With this addressing scheme, the output color values CV(P0) to CV(P7) for the endpoints P0 to P7 may be located at different addresses in the memory banks. For example, the 12-bit address 0000 0000 0000 gives the same bank address, 000 000 000, in every bank, whereas the 12-bit address 0000 0000 0001 gives different bank addresses: 000 000 000 in banks 7, 5, 3, and 1, and 000 000 001 in banks 6, 4, 2, and 0. In this way the eight single output color values CV(P0) to CV(P7) surrounding the input pixel value are obtained from the memory banks simultaneously, and no output color value has to be duplicated across the memory banks.
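
The address table itself is not reproduced in this text, so the following sketch rests on an assumption: each bank holds the lattice points of one parity combination, a bank whose selector bit for a channel is 1 uses bits [3:1] of that channel's interval, and a bank whose selector bit is 0 uses bits [3:1] plus bit 0. This assumed rule agrees with the two worked addresses above and with the description of bank 5, but it should be checked against Table 12. The function name is hypothetical.

```python
def bank_addresses(addr12):
    """Map a 12-bit (IR, IG, IB) address to eight 9-bit bank addresses.

    Assumed rule (not taken verbatim from Table 12): the bank number bits
    (r, g, b) select, per channel, either I[3:1] (bit = 1) or
    I[3:1] + I[0] (bit = 0).
    """
    ir, ig, ib = (addr12 >> 8) & 0xF, (addr12 >> 4) & 0xF, addr12 & 0xF
    if max(ir, ig, ib) > 14:
        raise ValueError("interval numbers run from 0 to 14")

    result = {}
    for bank in range(8):
        fields = []
        for interval, selector_bit in ((ir, 2), (ig, 1), (ib, 0)):
            field = interval >> 1
            if not (bank >> selector_bit) & 1:
                field += interval & 1
            fields.append(field)
        # BIT[8:6] = red field, BIT[5:3] = green field, BIT[2:0] = blue field.
        result[bank] = (fields[0] << 6) | (fields[1] << 3) | fields[2]
    return result


for bank, address in sorted(bank_addresses(0b000000000001).items()):
    print(bank, format(address, "09b"))
```

Under this assumption the two endpoints of an interval along any axis always fall in different banks, which is what allows all eight surrounding values to be read in the same cycle.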

FIG. 61 shows the organization of a memory bank of the data cache 230 as used in the single output color conversion mode. Each memory bank consists of 128 line entries; each line entry is 32 bits long and is made up of four 8-bit memories 533 to 536. The upper 7 bits of the memory address 521 select the corresponding line of data in the memory bank, which is latched 542 as the memory bank output. The lower 2 bits are the byte address; they are fed to the multiplexer 543 and determine which of the four 8-bit entries is selected 544 as the output. Data for each of the eight memory banks is output every clock cycle and sent to the main data path unit 242. That is, the data cache control unit receives the 12-bit byte address from the operand organizer 248 (FIG. 2) and outputs the 8-bit output color values to the operand organizers 247 and 248 for interpolation in the main data path unit 242.
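
Splitting a 9-bit bank address into its line and byte parts can be sketched as below. The placement of byte 0 in the least significant 8 bits of the 32-bit line is an assumption, as are the function names.

```python
def split_bank_address(addr9):
    """Return (7-bit line address, 2-bit byte address) for a 9-bit address."""
    return (addr9 >> 2) & 0x7F, addr9 & 0x3


def read_color_byte(bank_lines, addr9):
    """Select one 8-bit color value from a bank of 128 32-bit lines."""
    line, byte = split_bank_address(addr9)
    word = bank_lines[line]                  # the latched 32-bit line
    return (word >> (8 * byte)) & 0xFF       # multiplexer picks one byte


bank = [0] * 128
bank[85] = 0xAABBCCDD
print(hex(read_color_byte(bank, 0b101010111)))
```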

Referring to FIG. 60, the main data path unit 242 (FIG. 2) performs the interpolation in three stages. In the first stage, each multiplier/adder unit (for example, 550) takes as inputs the color values output from the corresponding memory banks (for example, 522) and the red fractional component 551, and the four output values are computed according to the first-stage equations above. The outputs of the first stage (for example, 553 and 554) are passed to the second stage 556, which uses the frac_g input 557 to compute the output 558 according to the second-stage equations above. Finally, the second-stage outputs 558 and 559 and the frac_b input 562 are used to compute the final output color 563 according to the final equation above.

The process shown in FIG. 60 is pipelined to maximize overall throughput. The method of FIG. 60 is used when a single output color component 563 is required. For example, it can be used to generate the cyan component of the output image first and then, reloading the cache tables between passes, to generate the magenta, yellow, and black components of the output image. This is particularly suitable for four-pass printing, in which each output color is printed in a separate pass.

b. Multiple Output General Color Space Mode

The coprocessor 224 also operates in the MOGCS mode, which works in much the same way as the SOGCS mode apart from a few differences. In the MOGCS mode, the main data path unit 242, the data cache control unit 240, and the data cache 230 of FIG. 2 cooperate to produce the four primary output color components simultaneously. This would require the data cache 230 to be four times larger; to save storage, however, in the MOGCS mode the data cache control unit 240 holds only one quarter of all the output color values of the output color space in the cache. The remaining output color values are stored in slower external memory and fetched when needed. This apparatus and method rely on the surprising finding that the miss rate of the sparse color conversion table in such a cache system is very low. The reason is that, in most color images, the color value varies little from one pixel to the next, so neighboring pixels are very likely to use the same sparse output color values.

FIG. 62 shows how the coprocessor performs multiple channel cached color conversion. Each input pixel is separated into its color components, and the corresponding interval table values (FIG. 56) are determined as described above, giving three 4-bit intervals Ir, Ig, and Ib 570. The combined 12-bit number 570 is converted according to Table 12 above into eight 9-bit addresses. Each address (for example, 572) is then remapped as described below with reference to FIG. 63 and used to look up the corresponding memory bank 573, yielding four color output channels 574. Of the 512 × 32-bit entries that could be addressed in total, the memory bank 573 stores 128 × 32-bit entries. The memory bank 573 forms part of the data cache 230 and is used as a cache as described with reference to FIG. 63.

FIG. 63 shows how the 9-bit bank input 578 is remapped to 579. Reordering the bits 580 to 582 counters aliasing of memory patterns, lowering the probability that adjacent pixel values alias to the same cache element. The reorganized memory address 579 is used to address the corresponding memory bank (for example, 585), which consists of 128 entries of 32 bits each. Accessing the memory 585 with the 7-bit line address produces an output that is latched 586 for each memory bank. Each memory bank (for example, 585) has an associated tag memory 587 consisting of 128 entries of 2 bits each, and the same 7-bit line address is used to access the corresponding tag in the tag memory 587. The two most significant bits of the address 579 are compared with that tag to determine whether the output color value is held in the cache. These two most significant bits of the 9-bit address correspond to the most significant bits of the red and green data intervals (see Table 12). In the MOGCS mode, the RGB input color space is therefore effectively divided into four quadrants along the red and green dimensions, and the two most significant bits of the 9-bit address identify the quadrant of the RGB input color space. In other words, the output color values are effectively partitioned among the four quadrants identified by the 2-bit tag. As a result, the output color values that share a line but have different tag values lie far apart in the output color space, which reduces aliasing of memory patterns.

If the 2-bit tag does not match, the data cache control unit records a cache miss and initiates the required memory read. The cache lookup stalls until all the values of the line corresponding to the 2-bit tag entry have been read from external memory and stored in the cache; this involves reading the relevant line of the color conversion table held in external memory. Because the process 575 of FIG. 63 is carried out for each memory bank (for example, 573) of FIG. 62, it can take some time, depending on the cache contents, before the result (for example, 586) is available from the memory bank. The eight 32-bit sets of data 586 are then forwarded to the main data path unit 242, where the three stages 590 to 592 of the interpolation described above (FIG. 62) are carried out for all color channels simultaneously and in pipelined fashion, producing the four color outputs 595 that are sent to the printer device.
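
The tag check and miss handling can be modelled as a small direct-mapped cache. In this sketch the exact bit reordering of FIG. 63 is not known, so the order of the seven line-address bits is an assumption; only the use of the red and green most significant bits as the 2-bit tag comes from the text. The class and attribute names are hypothetical, a line that has never been filled is modelled with a tag of None, and external memory is modelled as a plain list indexed by the 9-bit address before remapping.

```python
class CachedBank:
    """One MOGCS memory bank: 128 lines of 32 bits with 2-bit tags."""

    def __init__(self, external_table):
        self.external = external_table     # 512 x 32-bit words (modelled)
        self.lines = [0] * 128
        self.tags = [None] * 128           # None marks a line never filled
        self.misses = 0

    @staticmethod
    def remap(addr9):
        """Return (tag, line) for a 9-bit address R[3:1] G[3:1] B[3:1]."""
        r, g, b = (addr9 >> 6) & 7, (addr9 >> 3) & 7, addr9 & 7
        tag = ((r >> 2) << 1) | (g >> 2)            # red and green MSBs
        line = ((r & 3) << 5) | ((g & 3) << 3) | b  # assumed bit order
        return tag, line

    def lookup(self, addr9):
        tag, line = self.remap(addr9)
        if self.tags[line] != tag:
            # Cache miss: fetch the line from external memory and fill it.
            self.misses += 1
            self.lines[line] = self.external[addr9]
            self.tags[line] = tag
        return self.lines[line]


bank = CachedBank([3 * address for address in range(512)])
for address in (5, 5, 5 | (1 << 8), 5):
    print(address, bank.lookup(address), bank.misses)
```

Addresses 5 and 261 share a line but differ in the red most significant bit, so alternating between them forces a refill each time, whereas repeated lookups of the same address hit.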

Experiments show that, for typical images, the cache miss rate averages between 0.01 and 0.03 cache line fetches per pixel, which demonstrates that the caching mechanism shown in FIG. 62 and FIG. 63 is effective. In most cases, such a caching mechanism substantially reduces the demand for memory accesses outside the data cache.

The instructions for the two color space conversion modes performed by the coprocessor are encoded (FIG. 10) with the following structure.
Instruction encoding for color space conversion

FIG. 64 shows the instruction field encoding for color space conversion instructions. The minor opcode of the color conversion instructions is encoded as follows.
Minor opcode encoding for color conversion instructions

FIG. 65 shows a method of converting an RGB pixel stream into CYMK color values in the MOGCS mode. In step S1, a stream of 24-bit RGB pixels is input to the pixel organizer 246 (FIG. 2). In step S2, the pixel organizer 246 uses lookup tables to determine a 4-bit interval value and an 8-bit fractional value for each input pixel, as described with reference to FIG. 56 and FIG. 57. The interval value and the fractional value of an input pixel indicate which interval the pixel lies in and where within that interval it lies. In step S3, the main data path unit 242 concatenates the 4-bit intervals of the red, green, and blue components of the input pixel to form a 12-bit address word, and sends it to the data cache control unit 240 (FIG. 2). In step S4, the data cache control unit 240 converts the 12-bit address word into eight 9-bit addresses, as described with reference to Table 12 and FIG. 62. These eight addresses identify the locations of the eight output values CV(P0) to CV(P7) in the memory banks 573 (FIG. 62). In step S5, the data cache control unit 240 (FIG. 2) remaps the eight 9-bit addresses as described with reference to FIG. 63, so that the most significant bits of the red and green 4-bit intervals are mapped to the two most significant bits of the 9-bit address.

In step S6, the data cache control unit 240 compares the two most significant bits of the 9-bit address with the 2-bit tag in the memory 587 (FIG. 63). If the 2-bit tag does not match the two most significant bits of the 9-bit address, the output color values CV(P0) to CV(P7) are not present in the data cache 230. In that case, in step S7, the output color values corresponding to the 2-bit tag entry are read from external memory into the data cache 230. If the 2-bit tag matches the two most significant bits of the 9-bit address, the data cache control unit 240 fetches the eight output color values CV(P0) to CV(P7) in step S8 in the manner described with reference to FIG. 62. In this way, the eight output color values CV(P0) to CV(P7) surrounding the input pixel are fetched from the data cache 230 by the main data path unit 242. In step S9, the main data path unit 242 interpolates the output color values CV(P0) to CV(P7) using the positions within the intervals determined in step S2, and the interpolated output color value is output.
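Steps S3 and the final interpolation step can be illustrated with a minimal sketch. It assumes that the three 4-bit interval values are concatenated in red, green, blue order, that the eight surrounding values are indexed with red as bit 2, green as bit 1 and blue as bit 0, and that the interpolation is trilinear; none of these details is stated in the text, and the function names are hypothetical.

```python
def pack_address(r_interval: int, g_interval: int, b_interval: int) -> int:
    """Concatenate three 4-bit interval values into a 12-bit address word
    (red, green, blue ordering is an assumption)."""
    return ((r_interval & 0xF) << 8) | ((g_interval & 0xF) << 4) | (b_interval & 0xF)


def interpolate(cv, pos_r: int, pos_g: int, pos_b: int) -> float:
    """Trilinear interpolation of eight corner values cv[0..7].

    pos_r, pos_g, pos_b are the 8-bit positions within the interval (0..255).
    Corner index bit 2 selects the upper red corner, bit 1 green, bit 0 blue
    (an assumed ordering of CV(P0)..CV(P7)).
    """
    fr, fg, fb = pos_r / 256, pos_g / 256, pos_b / 256
    total = 0.0
    for i in range(8):
        wr = fr if i & 4 else 1 - fr
        wg = fg if i & 2 else 1 - fg
        wb = fb if i & 1 else 1 - fb
        total += cv[i] * wr * wg * wb
    return total
```

For example, `pack_address(0xA, 0xB, 0xC)` gives `0xABC`, and interpolating halfway along the red axis between corner values 0 and 255 gives 127.5.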

It will be apparent to those skilled in the art that the storage required for the data cache can be reduced by dividing the RGB color space and the corresponding output color values into more than four quadrants, for example into 32 blocks. With a division into 32 blocks, the data cache need only hold a 1/32 block of the output color values. It will also be apparent to those skilled in the art that the data cache mechanism used in the MOGCS mode can also be used in the single output general conversion mode. In that case too, the storage required for the data cache can be reduced.

3.17.3 JPEG encoding / decoding
Storing images in encoded form brings substantial benefits, particularly in terms of memory savings and the speed at which images can be transferred from one place to another. A variety of widely used standards for image coding have emerged. One very well-known standard is the JPEG standard; for a detailed description, see the well-known book "JPEG: Still Image Data Compression Standard" by Pennebaker and Mitchell, published in 1993 by Van Nostrand Reinhold. The coprocessor 224 stores images using a subset of the JPEG standard. The advantage of the JPEG standard is that a high compression ratio can be obtained while image quality is maintained. Other standards may of course be used to store images in compressed form. The JPEG standard is well known to those skilled in the art, and a variety of JPEG implementations, including JPEG core products that can be used in ASICs, are commercially available from manufacturers.

The coprocessor 224 can JPEG-encode and decode images consisting of one, three or four color components. One-color-component images may be meshed or unmeshed; that is, a single color component can be extracted either from meshed data or from unmeshed data. An example of meshed data is data with three color components per pixel (that is, RGB per pixel), and an example of unmeshed data is data in which each color component of the image is stored separately so that each component can be processed independently. For three-color-component images, the coprocessor 224 uses one pixel per word and assumes that the three color channels are encoded in the three lowest bytes.

The JPEG standard divides an image into small two-dimensional units called minimum coded units (MCUs), each of which is processed independently. The JPEG encoder (FIG. 2) can handle MCUs 16 pixels wide by 8 pixels high for downsampled images, or MCUs 8 pixels wide by 8 pixels high for images that are not downsampled.

FIG. 66 shows a method of downsampling a three-component image. The original pixel data 600 is stored in the MUV buffer 250 (FIG. 2) in pixel form, each pixel 601 consisting of Y, U and V components in the YUV color space. This data is first converted into an MCU consisting of four data blocks 601 to 604. The data blocks contain the various color components: blocks 601 and 602 are directly sampled Y components, and blocks 603 and 604 are the subsampled U and V components in the example of FIG. 3. The coprocessor 224 provides two kinds of subsampling. One is direct sampling without filtering, which keeps the odd pixel data and discards the even pixel data. The other filters the U and V components by taking the average of adjacent values.
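The two subsampling options can be sketched as follows for one row of a chroma component. The sketch assumes that "odd" pixels are counted from 1 (so the first, third, fifth, ... samples are kept) and that the averaging filter uses integer division over non-overlapping pairs; both are assumptions, and the function names are hypothetical.

```python
def subsample_direct(row):
    """Keep the 1st, 3rd, 5th, ... samples and drop the rest (no filtering)."""
    return row[0::2]


def subsample_average(row):
    """Replace each pair of adjacent samples with their integer average."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]
```

Applied to the 16 U samples of one MCU row, either function yields the 8 samples needed for one row of an 8 × 8 block.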

Another form of JPEG subsampling is the four-color-channel subsampling shown in FIG. 67. Here, a pixel data block 610 of 16 × 8 pixels has four components 611, comprising an opacity component (O) in addition to the usual Y, U and V components. This pixel data 610 is subsampled in the same way as in FIG. 66. In this case, however, the data blocks 612 and 613 are created from the opacity channel.

FIG. 68 shows the JPEG encoder 241 of FIG. 2 in more detail. The JPEG encoder / decoder 241 performs both JPEG encoding and JPEG decoding. For encoding, block data is received from the pixel organizer 246 (FIG. 2) via the bus 620. The block data is stored in the MUV buffer 250 and processed block by block. The JPEG encoding process is divided into several distinct steps:
1. Discrete cosine transform, performed in the DCT unit 621.
2. Quantization of the DCT output, performed in the quantizer 622.
3. Arrangement of the DCT coefficients in zigzag scan order, performed in the quantizer 622.
4. Predictive coding of the DC DCT coefficients and run-length coding of the AC DCT coefficients, performed in the coefficient encoder 623.
5. Variable-length coding of the coefficient encoder output, performed in the Huffman encoder 624. The output is sent to the result organizer 249 (FIG. 2) via the multiplexer 625 and the Rbus 626.
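The zigzag arrangement of step 3 can be sketched by generating the order in which the positions of an 8 × 8 block are visited. This is the conventional JPEG zigzag order and not a description of the hardware in the quantizer 622; the function name is hypothetical.

```python
def zigzag_order(n: int = 8):
    """Return the (row, column) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals are walked from bottom-left to top-right
        order.extend(diagonal)
    return order


def to_zigzag(block):
    """Flatten an 8 x 8 block (list of rows) into a 64-entry zigzag sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The inverse zigzag used during decoding writes entry k of the sequence back to position `zigzag_order()[k]`.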

The JPEG decoding process is the reverse of the JPEG encoding operation. Compressed JPEG block data is input from the bus 620 and sent via the bus 630 to the Huffman encoder 624, where it is decoded into DC differences and AC run lengths. The data is then sent to the coefficient encoder 623, where the AC and DC coefficients are decoded and returned to normal scan order. Next, the quantizer 622 dequantizes the DCT coefficients by multiplying each coefficient by the corresponding quantization value. Finally, the DCT unit 621 applies the inverse discrete cosine transform to restore the original data, which is sent via the bus 631, the multiplexer 625 and the bus 626 to the result organizer. The JPEG encoder 241 is operated in the usual manner via a standard Cbus interface 632, which includes registers that are set by the instruction controller to start operation of the JPEG encoder. The quantizer 622 and the Huffman encoder 624 require tables, which are loaded from the data cache 230 when needed. The table data is accessed via the Obus interface unit 634, which is connected to the operand organizer B 247 and interacts with the data cache control unit 240.

The DCT unit 621 performs the discrete cosine transform and the inverse discrete cosine transform on pixel data. Many ways of implementing the DCT are known, some of which are described in "Still Image Data Compression Standard" (ibid.); the DCT unit 621 uses the high-speed method detailed below in the section "High-speed DCT device". Alternatively, the DCT operation may use a method based on the paper "A Fast DCT-SQ Scheme for Images" by Arai et al., published in The Transactions of the IEICE, vol. E71, no. 11, November 1988, page 1095.

The quantizer 622 performs quantization and dequantization of the DCT coefficients, fetching the relevant values via the Obus interface unit 634 from the corresponding tables stored in the data cache. In quantization, the input data stream is divided by values read from the quantization table in the data cache. This division is implemented as a fixed-point multiplication. In dequantization, the data stream is multiplied by values from the dequantization table.
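Division implemented as a fixed-point multiplication can be sketched as below. The 16 fractional bits, the rounding scheme and the function names are assumptions made for illustration; the text does not give the precision used by the quantizer 622, and in hardware the reciprocal would typically be stored in the table instead of being computed.

```python
SHIFT = 16  # assumed number of fractional bits in the fixed-point reciprocal


def reciprocal(q: int) -> int:
    """Fixed-point approximation of 1/q, scaled by 2**SHIFT."""
    return ((1 << SHIFT) + q // 2) // q


def quantize(coeff: int, q: int) -> int:
    """Approximate round(coeff / q) with one multiplication and one shift."""
    return (coeff * reciprocal(q) + (1 << (SHIFT - 1))) >> SHIFT


def dequantize(level: int, q: int) -> int:
    """Dequantization is a plain multiplication by the table value."""
    return level * q
```

Because both directions reduce to a multiplication, a single multiplier can serve quantization and dequantization.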

FIG. 69 shows the quantizer 622 in more detail. The quantizer 622 includes a DCT interface 640, which passes data to and receives data from the DCT module 621 via a local bus. In quantization, the quantizer 622 receives two DCT coefficients per clock cycle. These values are written into one of the quantizer's internal buffers 641 and 642, which are dual-ported buffers used to buffer the incoming data. When a buffer is full, the data is read out of it in zigzag scan order and multiplied in the multiplier 643 by the quantization values received via the Obus interface unit 634. The output is passed to the coefficient encoder 623 (FIG. 68) via the coefficient encoder interface 645. While this is taking place, the coefficients of the next block are written into the other buffer. In JPEG decoding, the quantization module performs dequantization by multiplying the decoded DCT coefficients by the values stored in the table. Because quantization and dequantization are mutually exclusive operations, the multiplier 643 is used for both. The position of a coefficient within the 8 × 8 block is used as the index into the dequantization table.

In dequantization, as in quantization, the two buffers 641 and 642 are used to buffer the incoming coefficient data, which here comes from the coefficient encoder 623 (FIG. 68). The data is multiplied by the quantization values and written into a buffer in inverse zigzag scan order. When the buffer is full, the dequantized coefficients are read out of it two at a time in normal order and sent to the DCT submodule 621 (FIG. 68) via the DCT interface 640. The coefficient encoder interface module 645 thus forms the interface to the coefficient encoder, sending data to it and reading data from it via a local bus. This module reads data from the buffers in zigzag scan order during encoding and writes data to the buffers in inverse zigzag scan order during decoding. Both the DCT interface module 640 and the CC interface module 645 can read from and write to the buffers. An address / control multiplexer 647 therefore selects which buffer each interface works with, under the control of a control module 648 consisting of a state machine that controls all the modules of the quantizer. The multiplier 643 may be a 16 × 8 two's complement multiplier that multiplies the DCT coefficients by the quantization table values.

In FIG. 68, the coefficient encoder 623 performs the following functions:
(a) predictive encoding / decoding of DC coefficients in JPEG mode;
(b) run-length encoding / decoding of AC coefficients in JPEG mode.
Preferably, the coefficient encoder 623 can also be used, independently of JPEG mode operation, for predictive encoding / decoding of pixels and for memory copy operations when required. The coefficient encoder 623 performs prediction and run-length encoding / decoding of the DC and AC coefficients as specified in the Pink Book. In addition to the run-length encoding / decoding of JPEG AC coefficients as specified in the JPEG standard, it also provides a standard predictive encoding / decoding function.
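Functions (a) and (b) can be sketched as follows, using the usual baseline JPEG conventions: the DC predictor starts at 0, a run of 16 zeros is written as the pair (15, 0), and trailing zeros are replaced by an end-of-block pair (0, 0). These conventions come from the JPEG standard and not from this description, and the function names are hypothetical.

```python
def dc_differences(dc_values):
    """Predictive coding: each DC value is replaced by its difference
    from the previous DC value (the predictor starts at 0)."""
    diffs, previous = [], 0
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def run_length_ac(ac_values):
    """Run-length coding of AC coefficients given in zigzag order.

    Returns (zero_run, value) pairs; (15, 0) stands for 16 zeros and
    (0, 0) marks the end of block when only zeros remain.
    """
    pairs, run = [], 0
    for value in ac_values:
        if value == 0:
            run += 1
            continue
        while run > 15:
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, value))
        run = 0
    if run > 0:
        pairs.append((0, 0))
    return pairs
```

Decoding reverses both steps: a running sum restores the DC values, and each pair expands to `zero_run` zeros followed by the value.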

The Huffman encoder 624 performs Huffman encoding and decoding of the JPEG data stream. In Huffman encoding mode, run-length encoded data is received from the coefficient encoder 623 and a Huffman stream of packed bytes is produced. In Huffman decoding mode, the Huffman stream is read from the Pbus interface 620 in packed-byte form, and the Huffman-decoded coefficients are sent to the coefficient encoder module 623. The Huffman encoder 624 uses Huffman tables that are stored in the data cache and accessed via the Obus interface 634. Alternatively, the Huffman tables may be hardwired for higher speed.

When the data cache is used for Huffman coding, the eight banks of the data cache store data tables as detailed below for each table.
Huffman and quantization tables stored in the data cache

In FIG. 70, the Huffman encoder 624 consists mainly of two independent blocks, an encoder 660 and a decoder 661. Both blocks 660 and 661 share the same Obus interface via the multiplexer module 662. Each block has its own input and output, and only one of the blocks is active at any time, depending on the function being performed by the JPEG encoder.
a. Encoding
In JPEG mode encoding, Huffman tables are used to assign variable-length codes (up to 16 bits per code) to the DC difference values and AC run-length values, which are passed from the CC submodule to the HC submodule. The Huffman tables must be preloaded from the data cache before the operation starts. The variable-length codes are concatenated with the additional bits of the DC and AC coefficients, which are also passed from the CC submodule, and packed into bytes. If packing produces an X'FF byte, an X'00 byte is stuffed after it. When an RSTm marker is required, it is inserted; this involves padding the byte that holds the last Huffman code with "1" bits and stuffing an X'00 byte if the padded byte is X'FF. Whether an RSTm marker is required is indicated by the CC submodule. The HC submodule also inserts an EOI marker at the end of the image, as indicated by the "last" signal on the Pbus-CC slave interface. Inserting the EOI marker requires the same packing, padding and stuffing as the RSTm marker. Finally, the output stream is sent as packed bytes to the result organizer 249 and written to external memory.
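The packing, padding and stuffing rules described above can be sketched in software. The sketch assumes the variable-length codes and additional bits arrive as strings of "0" and "1" characters and that padding is applied once, at the end of the stream; marker insertion is left out, and the function name is hypothetical.

```python
def pack_codes(bit_strings) -> bytes:
    """Pack variable-length bit strings into bytes.

    The final partial byte is padded with 1 bits, and an X'00 byte is
    stuffed after every X'FF byte produced by packing or padding.
    """
    bits = "".join(bit_strings)
    bits += "1" * (-len(bits) % 8)  # pad the last byte with 1 bits
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = int(bits[i:i + 8], 2)
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)  # stuffed byte, so X'FF is not mistaken for a marker prefix
    return bytes(out)
```

For example, the single code `101` is padded to `10111111` and emitted as X'BF, while eight consecutive 1 bits produce X'FF followed by the stuffed X'00.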

In non-JPEG mode, data is sent to the encoder as unpacked bytes from the CC submodule (Pbus-CC slave interface). Each byte is encoded independently using tables preloaded into the cache (as in JPEG mode), and the variable-length symbols are assembled into packed bytes and sent to the result organizer 249. The last byte of the output stream is padded with 1 bits.
b. Decoding
Two decoding algorithms are provided, a fast (real-time) one and a slow one. The fast algorithm operates only in JPEG mode; the slow algorithm operates in both JPEG and non-JPEG modes.

The fast JPEG Huffman decoding algorithm maps Huffman symbols to either DC difference values or AC run-length values. It is designed specifically for JPEG and assumes that the example Huffman tables (K3, K4, K5, K6) were used for encoding. These tables are hardwired into the algorithm so that decoding requires no access to the cache memory. This decoding mode is intended for cases where a decoded image must be printed at a guaranteed data rate. The data rate of the HC submodule when decoding a band (a block delimited by RSTm markers) is approximately one DC / AC coefficient per clock cycle. An extra clock cycle between the HC and CC submodules is sometimes needed to remove a stuffed X'00 byte from the data stream, but how often this happens depends strongly on the data.

In fast mode, the Huffman decoder extracts one Huffman symbol per clock cycle. The fast Huffman decoder is described below in "Decoder of variable length code". The Huffman decoder 661 also implements a heap-based slow decoding algorithm and has the structure 670 shown in FIG. 71.

For a JPEG-encoded stream, the stripper 671 removes the stuffed X'00 bytes, the X'FF padding bytes and the RSTm markers, and passes the Huffman symbols, together with their concatenated additional bits, to the shifter 672. This step is not performed for a Huffman-only encoded stream. The first step in decoding a Huffman symbol is a lookup in the 256-entry HUFVAL table stored in the cache, addressed by the first 8 bits of the Huffman data stream. If the entry is a hit, that is, it holds the decoded value and the true length of the corresponding Huffman symbol, the value is passed to the output formatter 676, and the symbol length and the number of additional bits for the decoded value are fed back to the shifter 672, so that the shifter can pass the relevant additional bits to the output formatter 676 and align the new start of the Huffman stream presented to the decoding unit 673. The number of additional bits is a function of the decoded value. If the first lookup does not yield a decoded value, that is, if the Huffman symbol is longer than 8 bits, a heap address is calculated and the heap (located in the cache) is accessed repeatedly until a hit occurs or an "illegal Huffman symbol" condition is met. On a hit, the same processing as above is performed; an "illegal Huffman symbol" condition raises an interrupt.

The heap-based decoding algorithm is as follows.

    Loop until end of image
        Set symbol length N to 8
        Store the first 8 bits of the input stream in INDEX
        Fetch HUFVAL(INDEX)
        If HUFVAL(INDEX) == 00xx000111.. -- (ILL)
            Signal "illegal Huffman symbol"
            exit
        elseif HUFVAL(INDEX) == 1nnn eeee eeee -- (HIT)
            Pass eeeeeeee as the value
            Pass symbol length N = decimal(nnn)   /* 000 means symbol length 8 */
            Adjust the input stream
            break
        else   /* HUFVAL(INDEX) == 01ii iiii iiii -- (MISS) */
            Set HEAPINDEX = iiiiiiiiii (heap base assumed to be 0)
            Set N = 9
            If the 9th bit of the input stream is 0
                Increment HEAPINDEX by 1
            fi
            Fetch VALUE = HEAP(HEAPINDEX)   (code for the 9th bit)
            Loop
                If VALUE == 0001 0000 1111 -- (ILL)
                    Signal "illegal Huffman symbol"
                    exit
                elseif VALUE == 1000 eeee eeee -- (HIT)
                    Pass eeeeeeee as the value
                    Pass symbol length N
                    Adjust the input stream
                    break
                else   /* VALUE == 01ii iiii iiii -- (MISS) */
                    Set N = N + 1
                    Set HEAPINDEX = iiiiiiiiii
                    If the Nth bit of the input stream is 0
                        Increment HEAPINDEX by 1
                    fi
                    Fetch VALUE = HEAP(HEAPINDEX)
                fi
            pool
        fi
    pool

The stripper 671 removes the stuffed X'00 bytes, the X'FF padding bytes and the RSTm markers from the incoming JPEG-encoded stream and passes the "clean" Huffman symbols, together with their concatenated additional bits, to the shifter 672. In Huffman-only coding there are no additional bits, so in this mode the stream passed on consists of Huffman symbols only.
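The byte-level part of the stripper's job can be sketched as below. The marker values (X'FFD0 to X'FFD7 for RST0 to RST7) are taken from the JPEG standard, not from this description; removal of the padding bits in front of a marker happens at bit level and is not shown, and the function name is hypothetical.

```python
def strip_stuffing(data: bytes):
    """Remove stuffed X'00 bytes and RSTm markers from entropy-coded data.

    Returns the cleaned bytes and a list of (offset, m) pairs giving the
    offset in the cleaned data at which each RSTm marker was found.
    """
    out = bytearray()
    markers = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            if following == 0x00:  # stuffed byte: keep X'FF, drop X'00
                out.append(0xFF)
                i += 2
                continue
            if 0xD0 <= following <= 0xD7:  # RSTm marker: record it and drop it
                markers.append((len(out), following - 0xD0))
                i += 2
                continue
        out.append(byte)
        i += 1
    return bytes(out), markers
```

The recorded marker positions correspond to the RSTm marker information that accompanies the decoded data further down the pipeline.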

The shifter 672 block has a 16-bit output register and presents the next Huffman symbol to the decoding unit 673 (as a bit stream ordered from MSB to LSB). Symbols are often shorter than 16 bits, and it is left to the decoding unit 673 to decide how many of the bits to examine. The shifter 672 receives feedback 678 from the decoding unit 673, namely the length of the current symbol and (in JPEG mode) the length of the additional bits that follow it, which allows the shifter 672 to align the start of the next symbol correctly.
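The heap-based lookup and the symbol-length feedback to the shifter can be sketched in software. The sketch assumes 12-bit table entries laid out as in the algorithm listing (bit 11 set for a hit, bits 11 to 10 equal to 01 for a miss with a 10-bit heap index, anything else illegal), takes the input as a string of "0" and "1" characters, pads reads past the end with 1 bits, and ignores additional bits. The function names and the tiny tables used in the test are hypothetical.

```python
HIT = 0x800   # bit 11 set: entry holds a decoded value in its low 8 bits
MISS = 0x400  # bits 11..10 == 01: low 10 bits hold a heap index


def bit_at(bits: str, i: int) -> int:
    """Bit i of the stream; positions past the end read as 1 (padding)."""
    return int(bits[i]) if i < len(bits) else 1


def decode_symbols(bits: str, hufval, heap):
    """Decode Huffman symbols using a 256-entry first table and a heap."""
    out, pos = [], 0
    while pos < len(bits):
        index = 0
        for k in range(8):  # the first 8 bits address HUFVAL
            index = (index << 1) | bit_at(bits, pos + k)
        entry = hufval[index]
        if entry & HIT:
            length = ((entry >> 8) & 0x7) or 8  # nnn == 000 means length 8
        elif entry & MISS:
            length = 8
            while True:
                length += 1
                heap_index = entry & 0x3FF
                if bit_at(bits, pos + length - 1) == 0:
                    heap_index += 1  # a 0 bit selects the next heap entry
                entry = heap[heap_index]
                if entry & HIT:
                    break
                if not entry & MISS:
                    raise ValueError("illegal Huffman symbol")
        else:
            raise ValueError("illegal Huffman symbol")
        out.append(entry & 0xFF)
        pos += length  # feedback: realign the stream on the next symbol
    return out
```

Each step through the heap consumes one further input bit, so a symbol of length N above 8 costs N - 8 heap accesses after the first table lookup.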

The decoding unit 673 implements the core of the heap-based algorithm and is connected to the data cache via the Obus 674. It comprises a data cache fetch block, a lookup value comparator, a symbol length counter, a heap index adder and a decoder for the number of additional bits (which is derived from the decoded value). The fetch address is interpreted as follows.

Fetch address

The output formatter block 676 handles decoded 8-bit values (stand-alone Huffman mode), or combines a 24-bit value, the additional bits and the RSTm marker information into a 32-bit word (JPEG mode). The additional bits are passed to the output formatter 676 by the shifter 672 after the decoding unit 673 has determined where the additional bits for the current symbol start. The output formatter 676 also contains a two-deep FIFO buffer with a one-word delay, used to predict the final value word. During decoding (fast or slow), the shifter 672 may attempt to decode the padding bits at the end of the input bit stream. This situation is normally detected by the shifter, which then issues a "force end" signal instead of an "illegal symbol" interrupt. When the "force end" signal is asserted, the output formatter 676 outputs the previous decoded word (still held in the FIFO) as the "last" word and discards the more recent word, which does not belong to the decoded stream.

FIG. 72 shows the Huffman encoder 660 of FIG. 70 in detail. The Huffman encoder 660 maps byte data to Huffman symbols via lookup tables and comprises an encoding unit 681, a shifter 682, an output formatter 683 and lookup tables accessed from the cache. Each input value 685 is encoded in the encoding unit 681 using the encoding tables stored in the data cache. Two tables are needed, one holding the code for each value to be encoded and one holding the code length, but only a single access to the cache 230 is needed to encode a symbol. In JPEG compression, separate tables are needed for the AC coefficients and the DC coefficients. If subsampling is in use, separate tables are also needed for the subsampled and non-subsampled components. Non-JPEG compression needs only two tables (code and size). The codes are processed by the shifter 682, which builds the output stream at bit level. The shifter 682 also inserts the RSTm and EOI markers, with byte padding where required. The data bytes are then passed to the output formatter 683, which performs stuffing with X'00 bytes, filling with X'FF bytes, including the FF bytes that precede marker codes, and formatting of the packed bytes. In non-JPEG mode, only the formatting of packed bytes is performed.

Because markers are inserted by the shifter 682, the output formatter 683 needs to know which bytes coming from the shifter 682 are markers so that it can insert an X'FF byte before each of them. This is done by providing, in the shifter 682, a register of tags corresponding to the bytes. Each marker, which lies on a byte boundary, is tagged by the shifter 682 when it is inserted. The output formatter 683 does not perform stuffing after the X'FF bytes that precede markers. The tags are shifted in step with the main shift register.

The Huffman encoder uses four or eight tables for JPEG compression and two tables for direct Huffman encoding. The tables used are listed below.
Tables used by the Huffman encoder

3.17.4 Table Indexing
The Huffman tables are stored locally in the coprocessor data cache 230. The data cache 230 is organized as a 128-line direct-mapped cache in which each line comprises eight words. Each word in a cache line can be addressed independently, and the Huffman decoder uses this feature to access multiple tables simultaneously. Because the tables are small (≤ 256 entries), the 32-bit address field of the Obus can hold indexes into several tables.
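
By way of illustration only, packing several table indexes into one 32-bit address word can be sketched as follows. The field layout (four 8-bit indexes, with the first index in the least significant byte) and the names pack_indexes and unpack_indexes are assumptions made for this sketch; the actual Obus address format is not given in this description.

```python
INDEX_BITS = 8                  # each table has at most 256 entries
MAX_INDEXES = 32 // INDEX_BITS  # indexes that fit in a 32-bit word (assumed layout)


def pack_indexes(indexes):
    """Pack up to four 8-bit table indexes into one 32-bit word."""
    if len(indexes) > MAX_INDEXES:
        raise ValueError("too many indexes for a 32-bit word")
    word = 0
    for position, index in enumerate(indexes):
        if not 0 <= index < (1 << INDEX_BITS):
            raise ValueError("index out of range for a 256-entry table")
        word |= index << (INDEX_BITS * position)
    return word


def unpack_indexes(word, count):
    """Recover `count` 8-bit indexes from a packed 32-bit word."""
    mask = (1 << INDEX_BITS) - 1
    return [(word >> (INDEX_BITS * position)) & mask for position in range(count)]
```

Because every word of a cache line is separately addressable, each unpacked index could select an entry of a different table in the same access.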

As described above, in the JPEG slow decoding mode the data cache is used to store the various Huffman tables. The format of the data cache is shown below.
Bank addresses of the Huffman and quantization tables

Before a JPEG instruction is executed by the JPEG coder 241 (FIG. 2), the appropriate image width must be set in the image dimensions register (PO_IDR or RO_IDR). As with other instructions, the instruction length refers to the number of input data items to be processed. This count includes any padding data and takes into account the subsampling options and the number of color channels in use.

All instructions issued by the coprocessor 224 can use two facilities for limiting the amount of output data produced. These facilities are most useful when the input and output data sizes differ, and in particular when the output data size is unknown, as in JPEG encoding and decoding. They determine whether output data is written out or simply discarded while everything else proceeds as if the instruction had executed normally. By default the facilities are disabled; they are enabled by setting the appropriate bits in the RO_CFG register. JPEG instructions, however, include specific options for setting these bits. When JPEG compression is used, the coprocessor 224 preferably supports the "cut" and "limit" facilities for output data.

The cut and limit processing is described with reference to FIG. 73. An input image 690 has a certain height 691 and a certain width 692. Often only part of the image is of interest, the other parts being irrelevant for printing. The JPEG encoding system, however, operates on 8×8 pixel blocks. The image width may therefore not be a multiple of 8, and the region of interest, made up of MCUs 695, may not fall on exact boundaries. The output cut register RO_CUT specifies the number of output bytes 696 to discard at the beginning of the output data stream. The output limit register RO_LMT specifies the maximum number of output bytes to be produced; this count includes the bytes that are not written to memory as a result of the cut register. In this way a final output byte 698 can be targeted, beyond which no data is output.
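
The combined effect of the two registers can be sketched as follows. This is a minimal software model under the reading given above (the limit counts the cut bytes as well); the function name apply_cut_and_limit and the use of a Python bytes object for the output stream are assumptions made for illustration.

```python
def apply_cut_and_limit(produced, cut, limit):
    """Return the bytes that would actually be written to memory.

    produced: every byte the instruction generates, in order
    cut:      number of leading output bytes to discard (RO_CUT)
    limit:    maximum number of output bytes produced, counting the
              discarded leading bytes as well (RO_LMT)
    """
    if cut < 0 or limit < 0:
        raise ValueError("cut and limit must be non-negative")
    limited = produced[:limit]   # nothing beyond the final output byte
    return limited[cut:]         # leading bytes are generated but dropped
```

For example, with ten output bytes, a cut of 2 and a limit of 7, bytes 2 through 6 are written and the rest are discarded.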

There are two cases in which the cut and limit facilities are particularly useful in the JPEG decoder. The first, shown in FIG. 74, is the extraction or decompression of a sub-section 700 of one strip 701 of a decoded image. The second, shown in FIG. 75, is the extraction or decompression of a number of complete strips (for example 711, 712, 713) from an overall image 714.

The instruction format and field encoding of the JPEG instruction are shown in FIG. 76. The minor opcode fields are described below.
Instruction word: minor opcode fields

3.17.5 Data Coding Instructions
The coprocessor 224 preferably allows portions of the JPEG coder of FIG. 2 to be used for other purposes. For example, Huffman coding is used not only in JPEG but also in other compression methods. It is also desirable to provide data coding instructions that control only the Huffman coding unit, for use in hierarchical image decoding. The run-length encoder and decoder and the predictive coder can likewise be used independently by means of similar instructions.

3.17.6 Fast DCT Apparatus
A conventional discrete cosine transform (DCT) apparatus, as shown in FIG. 77, performs a two-dimensional transform of an 8×8 pixel block by first performing a one-dimensional DCT on the columns of the block and then performing a further one-dimensional DCT on its rows. Such an apparatus generally comprises an input circuit 1096, an arithmetic circuit 1104, a control circuit 1098, a transpose memory circuit 1090 and an output circuit 1092.

The input circuit 1096 receives 8-bit pixels from the 8×8 block and is connected to the arithmetic circuit 1104 via intermediate multiplexers 1100 and 1102. The arithmetic circuit 1104 performs arithmetic on a complete column or row of the 8×8 block. The control circuit 1098 controls all the other circuits and so carries out the DCT algorithm. The output of the arithmetic circuit is passed to the transpose memory 1090, a register 1095 and the output circuit 1092. The transpose memory is in turn connected to the multiplexer 1100, which feeds the next multiplexer 1102. The multiplexer 1102 also receives data from a register 1094. The transpose memory 1090 accepts the 8×8 block data in column form and delivers it in row form. The output circuit 1092 outputs the DCT coefficients for the 8×8 block of pixel data.

In a typical DCT apparatus the arithmetic circuit 1104 is the most complex part, so its speed determines the speed of the whole apparatus. The arithmetic circuit 1104 of FIG. 77 typically divides the arithmetic into several stages, as described with reference to FIG. 78, and uses a single circuit that performs each of the stages 1144, 1148, 1152, 1156 with a shared set of resources such as adders and multipliers. Such an arithmetic circuit 1104 is slower than optimal because one common circuit is used to perform all of its stages, and it also includes storage means for holding intermediate results. Because the clock cycle must be at least as long as the slowest stage, the total processing time can exceed the sum of the times of the individual stages.

FIG. 78 shows a typical arithmetic data path of the apparatus of FIG. 77, as part of a DCT performed in four stages. The figure shows function rather than actual implementation. The four stages 1144, 1148, 1152, 1156 are implemented by a single reconfigurable circuit, which is reconfigured cycle by cycle to perform each of the four stages of the one-dimensional DCT. Because the four stages 1144, 1148, 1152, 1156 share a common pool of resources (adders, multipliers and so on), the amount of hardware is reduced.

A drawback of this circuit, however, is that its speed is not optimal. The four stages 1144, 1148, 1152, 1156 are all built from the same pool of adders and multipliers, so the clock period is set by the slowest stage (20 ns for block 1144 in this example). Adding the delays of the input and output multiplexers 1146 and 1154 (2 ns each) and of the flip-flop 1150 (3 ns) gives a total of 27 ns. This DCT implementation therefore cannot be clocked with a period shorter than 27 ns.

Pipelined DCT implementations are also well known. Their drawback is the large amount of hardware they require. Although the present arrangement does not match a pipelined implementation in throughput, it offers very good performance-to-size and speed characteristics compared with most current DCT implementations. FIG. 79 shows the structure of a preferred discrete cosine transform unit for use in the JPEG coder (FIG. 2), in which pixel data is presented to an input circuit 1126 that holds a column of 8-bit pixel data. The transpose memory 1118 converts column-form data into row-form data for the second pass of the two-dimensional DCT. The data from the input circuit 1126 and from the transpose memory 1118 is multiplexed by a multiplexer 1124, whose output is passed to an arithmetic circuit 1122. The results of the arithmetic circuit 1122 are passed to an output circuit 1120 after the second pass. A control circuit 1116 controls the flow of data through the DCT apparatus.

In the first pass of the DCT, column data of the image to be transformed, or transformed image coefficients to be inverse transformed back into pixel data, is presented to the input circuit 1126. During this pass the multiplexer 1124 is configured by the control circuit 1116 to pass data from the input circuit 1126 to the arithmetic circuit 1122. FIG. 80 shows the structure of the arithmetic circuit 1122 in more detail. When a forward DCT is performed, the result of the forward circuit 1138, which computes the forward DCT, is selected by the multiplexer 1142, as configured by the control circuit 1116. When an inverse DCT is performed, the output of the inverse circuit 1140 is selected by the multiplexer 1142, again as configured by the control circuit 1116. During the first pass, each column vector is processed by the arithmetic circuit 1122 (appropriately configured by the control circuit 1116) and then written into the transpose memory 1118. Once all eight column vectors of the 8×8 block have been processed and written into the transpose memory 1118, the second pass of the DCT begins.

In the second pass of the forward or inverse DCT, row vectors are read from the transpose memory 1118 and passed through the multiplexer 1124 to the arithmetic circuit 1122. For this pass the multiplexer 1124 is configured by the control circuit to ignore data from the input circuit 1126 and to pass row vector data from the transpose memory 1118 to the arithmetic circuit 1122. The multiplexer 1142 in the arithmetic circuit 1122 passes the result data from the inverse circuit 1140 (or, for a forward transform, from the forward circuit 1138, as in the first pass) to the output of the arithmetic circuit 1122. When the result from the arithmetic circuit 1122 is available, the output circuit 1120 captures it under the direction of the control circuit 1116 and outputs it at a later time.
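
The two-pass data flow described above can be modelled in software as follows. This is a sketch only: the one-dimensional transform is the textbook orthonormal DCT-II formula rather than the hardware algorithm of the arithmetic circuit 1122, and the function names dct_1d, transpose and dct_2d are assumptions made for illustration.

```python
import math


def dct_1d(vec):
    """Orthonormal DCT-II of one vector (stand-in for the arithmetic circuit)."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(vec))
        out.append(scale * total)
    return out


def transpose(block):
    """Written in column form, read back in row form (the memory 1118)."""
    return [list(line) for line in zip(*block)]


def dct_2d(block):
    """Two-dimensional DCT of a square block given as block[row][column]."""
    # First pass: transform each column vector and store the result.
    first_pass = [dct_1d(column) for column in transpose(block)]
    # Second pass: read the stored data as row vectors and transform again.
    return [dct_1d(row) for row in transpose(first_pass)]
```

For a constant 8×8 block of ones, only the DC coefficient is non-zero (8.0 with this scaling), which gives a quick check that both passes are wired correctly.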

The arithmetic circuit 1122 is a combinational circuit in that it contains no storage for intermediate results. The control circuit 1116 knows how long data takes to travel from the input circuit 1126 through the multiplexer 1124 and the arithmetic circuit 1122, and can therefore direct the output circuit 1120 to capture the result vector from the output of the arithmetic circuit 1122 at exactly the right time. Having no intermediate storage in the arithmetic circuit 1122 has two advantages: no time is spent moving data into and out of intermediate storage elements, and the time data takes to pass through the arithmetic circuit 1122 is the sum of the delays of the internal stages rather than N times the delay of the slowest stage (as in a conventional DCT apparatus), where N is the number of stages in the arithmetic circuit.

FIG. 81 shows that the overall delay is simply the sum of the delays of the four stages 1158, 1160, 1162 and 1164, namely 20 ns + 10 ns + 12 ns + 15 ns = 57 ns, which is faster than the circuit of FIG. 78. Such a circuit allows the overall system clock cycle to be shortened. If four clock cycles are allowed for the circuit of FIG. 81 to produce a result, the clock cycle of the overall DCT system can be as short as 57/4 ns (14.25 ns), a substantial improvement over the circuit of FIG. 78, which forces a DCT clock cycle of 27 ns.
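
The timing figures quoted for FIG. 78 and FIG. 81 can be reproduced with a few lines of arithmetic. The delay values are those given in the text; the constant and function names are assumptions for this sketch, and the 108 ns figure in the usage note is derived here on the assumption that the shared circuit needs one 27 ns cycle per stage.

```python
STAGE_DELAYS_NS = [20, 10, 12, 15]  # the four stage delays from the text
MUX_DELAY_NS = 2                    # each of the input and output multiplexers
FLIPFLOP_DELAY_NS = 3               # intermediate storage element


def shared_circuit_clock_ns():
    """Clock period of the reconfigurable circuit: slowest stage plus overheads."""
    return max(STAGE_DELAYS_NS) + 2 * MUX_DELAY_NS + FLIPFLOP_DELAY_NS


def combinational_delay_ns():
    """Delay through the combinational circuit: plain sum of the stages."""
    return sum(STAGE_DELAYS_NS)


def combinational_clock_ns(cycles_allowed):
    """Shortest system clock if the result may take several cycles to settle."""
    return combinational_delay_ns() / cycles_allowed
```

With these numbers the shared circuit needs a 27 ns clock, so four stages take 4 × 27 = 108 ns, whereas the combinational circuit settles in 57 ns and permits a 14.25 ns clock when four cycles are allowed.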

In a practical implementation of the present DCT apparatus, the DCT algorithm described in the paper "A Fast DCT-SQ Scheme for Images" by Yukihiro Arai, Takeshi Agui and Masayuki Nakajima, The Transactions of the IEICE, vol. E71, no. 11, November 1988, page 1095, can be used. Implemented in hardware, this algorithm fits readily into the arithmetic circuit 1122 of the present DCT apparatus. Other DCT algorithms can likewise be implemented in hardware in the arithmetic circuit 1122.

3.17.7 Huffman Decoder
The following embodiment relates to a method and apparatus for decoding variable-length codes interleaved with bit fields of various lengths. In particular, the embodiments provide efficient, fast, single-stage (single clock cycle) decoding of variable-length encoded data. Data that is byte aligned and not variable-length encoded is assumed to have been removed from the encoded data stream beforehand in a separate preprocessing block. Information about the positions of the removed byte-aligned data is passed to the decoder output in synchronization with the data being decoded. The embodiments also provide fast detection and removal of the non-variable-length-encoded bit fields that remain in the preprocessed input data.

The preferred embodiment of the present invention preferably includes a fast Huffman decoder capable of decoding JPEG encoded data at a rate of one Huffman symbol per clock cycle between marker codes. This is achieved by first separating and removing the byte-aligned, non-Huffman-encoded marker headers, marker codes and stuff bytes from the input data in a separate preprocessing block. Once the byte-aligned data has been removed, the input data is passed to a combinational data shifting block, which keeps the data decode register continuously filled; the register in turn presents data to the decoding unit. The positions of the markers removed from the original input data are passed to a marker shifting block, which shifts the marker position bits in synchronization with the input data being shifted in the data shifting block.

The decoding unit decodes the encoded bit field presented by the data decode register using combinational logic. Its outputs are the decoded value (v) and the actual length (m) of the input code, where m is less than or equal to n. It also outputs the length (a) of the variable-length bit field, where a is greater than or equal to 0. The variable-length bit field is not Huffman encoded and immediately follows the Huffman code. The n-bit field presented at the input of the decoding unit is at least as long as the actual code. The decoding unit determines the actual code length (m) and passes it, together with the length (a) of the additional bits, to the control block. The control block calculates the shift value (a + m) and drives the data and marker shifting blocks to shift the input data in preparation for the next decoding cycle.
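
The decode cycle described above (look up v, m and a from an n-bit window, then shift by a + m) can be sketched in software as follows. The prefix code in TOY_CODES, the window width of 3 bits and the function names are hypothetical and chosen only to keep the example small; they are not JPEG tables, and a real decoding unit would be a ROM, RAM or PLA rather than a loop.

```python
# Hypothetical prefix code: code bits -> (decoded value v, extra-bit count a).
TOY_CODES = {"0": (0, 0), "10": (1, 1), "110": (2, 2), "111": (3, 3)}
WINDOW_BITS = 3  # n: width of the field presented to the decoding unit


def decode_window(window):
    """Return (v, m, a) for the code at the start of an n-bit window."""
    for m in range(1, WINDOW_BITS + 1):
        entry = TOY_CODES.get(window[:m])
        if entry is not None:
            value, extra = entry
            return value, m, extra
    raise ValueError("no code matches the window")


def decode_stream(bits):
    """Decode a bit string, returning (value, raw additional bits) pairs."""
    results = []
    position = 0
    while position < len(bits):
        # Pad the final window; the padding is never consumed as code bits
        # unless the stream is truncated, which the check below rejects.
        window = bits[position:position + WINDOW_BITS].ljust(WINDOW_BITS, "1")
        value, m, a = decode_window(window)
        if position + m + a > len(bits):
            raise ValueError("stream ends inside a code or its additional bits")
        results.append((value, bits[position + m:position + m + a]))
        position += m + a  # shift value (a + m) for the next cycle
    return results
```

Each loop iteration corresponds to one decoding cycle: the window is decoded, the additional bits are split off undecoded, and the stream advances by a + m bits.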

In the apparatus of the present invention, any combinational decoding unit, such as a ROM, RAM or PLA, can be used, provided that it outputs the decoded value, the actual length of the input code and the length of the non-Huffman-encoded bit field within a predetermined time. In this embodiment the decoding unit outputs predictively coded DC coefficient values and AC run-length values as defined in the JPEG standard. As also defined in the JPEG standard, the non-Huffman-encoded bit fields removed from the input data together with the decoded values are the additional bits that determine the values of the DC and AC coefficients. Another type of non-Huffman-encoded bit field removed from the data in the data decode register is the padding bits that, as defined in the JPEG standard, precede a byte-aligned marker in the original input data stream. These bits are detected by the control block, which checks the contents of the padding zone of the data register. The padding zone consists of the k most significant bits of the data register and is indicated by the presence of a marker bit among the most significant bits of the marker register. If all the bits in the padding zone are identical (1s in the JPEG standard), they are treated as padding bits and removed from the data register without being decoded. The contents of the data and marker registers are then updated for the next decoding cycle.

The apparatus of the embodiment includes an output block that formats the output data as required by the preferred embodiment of the present invention. The output block outputs each decoded value together with the corresponding variable-length unencoded bit field (such as the additional bits in JPEG) and with a signal indicating the position of byte-aligned, unencoded input bit fields (such as markers in JPEG).

The data decoded by the JPEG coder 241 (FIG. 2) is JPEG compatible and consists of variable-length Huffman codes interleaved with variable-length unencoded bit fields called "additional bits", variable-length unencoded bit fields called "padding fields", and fixed-length, byte-aligned, unencoded bit fields called "markers", "fill bytes" and "stuff bytes". Typical input data is shown in FIG. 82.

FIGS. 83 and 84 show the overall structure and data flow of the Huffman decoder in the JPEG coder 241. FIG. 83 shows the architecture of the Huffman decoder for JPEG data in more detail. The stripper 1171 removes marker codes (FFXX hex, where XX is non-zero), fill bytes (FF hex) and stuff bytes (00 hex following FF hex). These are all byte-aligned components of the input data, which is presented to the stripper as 32-bit words. The most significant bit of the first word to be processed is the head of the input bit stream. In the stripper 1171, the byte-aligned bit fields are removed from the input data before the actual decoding of the Huffman codes takes place in the downstream parts of the decoder.

Input data arrives at the stripper 1171 as 32-bit words, one word per clock cycle. The numbering of the input bytes 1211 from 0 to 3 is shown in FIG. 85. If byte number (i) is removed because it is a fill byte, a stuff byte or a marker, the remaining bytes numbered (i-1) down to 0 are shifted left at the output of the stripper 1171 to close the gap, and byte 0 becomes a "don't care" byte. The validity of the bytes output by the stripper 1171 is encoded by separate output tags 1212, shown in FIG. 85. Bytes that are not removed by the stripper 1171 are output left-justified. Each output byte carries a tag indicating whether the byte is valid (passed by the stripper 1171), invalid (removed by the stripper 1171), or valid and following a marker. The tags 1212 control the loading of data bytes into the data register 1182 through the data shifter and the loading of marker positions into the marker register 1183 through the marker shifter. The same scheme applies when more than one byte is removed from an input word: all remaining valid bytes are left-justified and the corresponding output tags indicate the validity of the output bytes. FIG. 85 gives examples 1213 of output bytes and output tags for various combinations of input bytes.
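
The stripping rules can be illustrated with a simplified software model. This sketch works on a plain byte stream and ignores the 32-bit word framing and the 2-bit tag encoding of FIG. 85; instead, each surviving byte carries a single flag saying whether it follows a removed marker. The function name strip_stream and the handling of a lone trailing FF byte are assumptions made for illustration.

```python
def strip_stream(data):
    """Remove markers, fill bytes and stuff bytes from entropy-coded data.

    Returns a list of (byte, follows_marker) pairs for the surviving bytes.
    """
    kept = []
    follows_marker = False
    i = 0
    while i < len(data):
        byte = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if byte != 0xFF:
            kept.append((byte, follows_marker))   # ordinary data byte
            follows_marker = False
            i += 1
        elif nxt == 0x00:
            kept.append((0xFF, follows_marker))   # data FF; drop the stuffed 00
            follows_marker = False
            i += 2
        elif nxt == 0xFF:
            i += 1                                # fill byte: drop this FF only
        elif nxt is not None:
            follows_marker = True                 # marker FFXX: drop both bytes
            i += 2
        else:
            i += 1                                # lone trailing FF: dropped here
    return kept
```

In the hardware described above the same classification is made on four bytes at a time, with the surviving bytes left-justified in the output word and the flag carried in the per-byte tags.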

The role of the pre-shifter and post-shifter blocks 1172, 1173, 1180, 1181 in FIG. 83 is to load data continuously into the data register 1182 and the marker register 1183 whenever there is enough room in them. The data shifter and the marker shifter each consist of a pre-shifter block and a post-shifter block; the two shifters are identical and are controlled identically. The difference is that the data shifter handles the data passed by the stripper 1171, whereas the marker shifter handles only the tags, so that marker positions are delivered to the decoder output in synchronization with the decoded Huffman values. The outputs of the post-shifters 1180, 1181 are fed directly to the corresponding registers 1182, 1183, as shown in FIG. 83.

The data pre-shifter 1172, also shown in FIG. 86, appends 32 zeros at the least significant end 1251 of the data from the stripper 1171, extending it to 64 bits. The extended data is then shifted right by the 64-bit-wide barrel shifter 1252 by the number of bits currently held in the data register 1182. This number is supplied by the control logic 1185, which keeps track of how many valid bits are in the data register 1182 and the marker register 1183. The barrel shifter 1252 passes its 64 bits to the multiplexer block 1253, which is made up of 64 elementary 2×1 multiplexers 1254. Each elementary 2×1 multiplexer 1254 takes one bit from the barrel shifter 1252 and one bit from the data register 1182. It outputs the data register bit when that bit is valid and the barrel shifter bit otherwise. The control signals for all the elementary multiplexers 1254 are decoded from the shift control 1 signal of the control block, shown as pre-shifter control bits 0...5 of the register 1223 in FIGS. 86 and 87. The outputs of the elementary multiplexers 1254 are passed to the barrel shifter 1255, which shifts them left by the number of bits given by the 5-bit control signal shift control 2, as shown in FIG. 86. This number is the number of bits consumed by decoding the current data in the data register 1182: the length of the currently decoded Huffman code plus the number of additional bits that follow it, or the number of padding bits to be removed if padding bits have been detected, or zero if the number of valid bits in the data register 1182 does not exceed the number of bits to be removed. The data output by the barrel shifter 1255 thus contains the new data to be loaded into the data register 1182 after a single decoding cycle. The contents of the data register 1182 change as follows: the most significant bits that have been decoded are shifted out of the register, and 0, 8, 16, 24 or 32 bits from the stripper 1171 are appended. If the data register 1182 does not hold enough bits to decode, data from the stripper 1171 is loaded in the current cycle if it is available. If no data from the stripper 1171 is available in the current cycle, the decoded bits are removed from the data register 1182 if it holds enough bits; otherwise the contents of the data register 1182 are left unchanged.
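
One load-and-shift cycle of the data path just described can be modelled with integer bit operations. This is a sketch under stated assumptions: the data register is taken to be 64 bits wide with valid bits at the most significant end, the new word is left-justified, and the function name load_cycle and its argument names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def load_cycle(data_reg, valid_bits, new_word, new_bits, consumed):
    """Model one cycle of pre-shift, merge and post-shift.

    data_reg:   64-bit register value, valid bits at the most significant end
    valid_bits: number of valid bits currently in the register
    new_word:   32-bit left-justified word from the stripper
    new_bits:   number of valid bits in new_word (0, 8, 16, 24 or 32)
    consumed:   bits used by this decode (code length plus additional bits)
    Returns (new register value, new valid-bit count).
    """
    if valid_bits + 32 > 64:
        raise ValueError("not enough room in the register for a new word")
    if consumed > valid_bits + new_bits:
        raise ValueError("cannot consume more bits than are available")
    # Pre-shifter: append 32 zeros, then shift right past the valid bits.
    extended = (new_word & 0xFFFFFFFF) << 32
    pre_shifted = extended >> valid_bits
    # Multiplexer block: keep register bits where valid, else take new data.
    valid_mask = (MASK64 << (64 - valid_bits)) & MASK64
    merged = (data_reg & valid_mask) | (pre_shifted & ~valid_mask & MASK64)
    # Post-shifter: shift the consumed bits out at the most significant end.
    post_shifted = (merged << consumed) & MASK64
    return post_shifted, valid_bits + new_bits - consumed
```

For instance, a register holding the 16 valid bits ABCD hex, loaded with the word 12345678 hex while 8 bits are consumed, ends the cycle holding CD12345678 hex in its top 40 bits.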

The marker preshifter 1173, marker postshifter 1181, and marker register 1183 are identical to the data preshifter 1172, data postshifter 1180, and data register 1182, respectively. The data flow within and between units 1173, 1181, and 1183 is likewise identical to the data flow within and between units 1172, 1180, and 1182. The control block 1185 supplies the same control signals to both sets of units. The two sets differ only in the type of data fed to the marker preshifter 1173 and the data preshifter 1172, and in how the contents of the marker register 1183 and the data register 1182 are used. As shown in FIG. 88, the tags 1261 from the stripper 1171 arrive as an 8-bit word, with 2 bits allocated to each data byte headed for the data register 1182. Under the encoding scheme shown in FIG. 85, the most significant bit of a 2-bit tag is 1 for a byte that is valid and follows a marker. Only the most significant bits of the four tags sent together from the stripper 1171 are passed on as the input 1262 of the marker preshifter 1173. The input to the marker preshifter therefore carries a bit set to 1 at each position where the first encoded data bits after a marker are located. At the same time, these bits mark the position in the data register 1182 of the first encoded data bits that follow a marker. Because the marker position bits in the marker register 1183 move in step with the data bits in the data register 1182, the control block 1185 can detect and remove padding bits, and can deliver the marker position to the decoder output together with the decoded data. As noted above, the two preshifters (data 1172 and marker 1173), postshifters (data 1180 and marker 1181), and registers (data 1182 and marker 1183) receive the same control signals, which allows them to operate fully in parallel and in synchrony.

The 16 most significant bits of the data register 1182 are fed to the decoding unit 1184 (also shown in FIG. 89), a combinational circuit that extracts the decoded Huffman value, the length of the input code currently being decoded, and the length of the additional bits that follow the input code (which is a function of the decoded value). The additional-bit length is known only once the preceding Huffman symbol has been decoded, and it determines where the next Huffman symbol starts. To sustain a rate of one decoded value per clock cycle, Huffman decoding must therefore be done in a combinational block. As shown in FIG. 89, the decoding unit preferably comprises four PLA-style decoding tables hardwired as a combinational block that takes a 16-bit token from the data register 1182 and produces the Huffman value (8 bits), the length of the corresponding Huffman-coded symbol (4 bits), and the number of additional bits (4 bits).

Padding bits are removed during actual decoding, when the padding bit decoder, which is part of the control block 1185, detects a run of padding bits in the data register 1182. FIG. 90 shows the padding bit decoder. The 8 most significant bits of the marker register 1183, 1242 are checked for a marker position bit. If one is present, all bits in the data register 1182, 1241 that correspond to the bits preceding the marker bit in the marker register 1242 are treated as the current padding zone. The padding bit detector 1243 checks whether the contents of the current padding zone are all 1s. If every bit in the current padding zone is 1, the bits are taken to be padding bits and are removed from the data register. They are removed by shifting the contents of the data register 1182, 1241 (and, simultaneously, the marker register 1183, 1242) left in one clock cycle using the corresponding shifters 1172, 1173, 1180, 1181. This operation is the same as the normal decode mode except that no decoded value is output. If the bits in the current padding zone are not all 1s, a normal decode cycle is executed rather than a padding-bit removal cycle. Padding bits are detected in this way on every cycle, and any padding bits present in the data register 1182 are removed.
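The padding check described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the detector 1243 itself: it takes only the 8 most significant bits of each register as plain integers, and the function name is hypothetical.

```python
def padding_bits_to_remove(data_top8, marker_top8):
    """Return how many leading bits to drop as padding (0 if none).

    data_top8   -- the 8 most significant bits of the data register
    marker_top8 -- the 8 most significant bits of the marker register
    """
    if marker_top8 == 0:
        return 0                              # no marker position bit in view
    # Bits ahead of the first (most significant) marker bit form the padding zone.
    zone = 8 - marker_top8.bit_length()
    if zone == 0:
        return 0                              # marker bit is already at the top
    zone_bits = data_top8 >> (8 - zone)
    all_ones = (1 << zone) - 1
    return zone if zone_bits == all_ones else 0


print(padding_bits_to_remove(0b11101010, 0b00010000))   # 3
print(padding_bits_to_remove(0b11001010, 0b00010000))   # 0
```

In the first call the three bits ahead of the marker position are all 1s, so they are treated as padding. In the second call one of them is 0, so a normal decode cycle would run instead.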

FIG. 87 shows the control block 1185 in detail. The core of the control block is register 1223, which holds the current number of valid bits in the data register 1182. The number of valid bits in the marker register 1183 is always equal to the number of valid bits in the data register 1182. The control block performs three functions. First, it calculates the new number of bits in the data register 1182, which is stored in register 1223. Second, it generates the control signals for the shifters 1172, 1173, 1180, 1181, 1186, and 1187, the decoding unit 1184, and the output formatter 1188. Third, it detects padding bits in the data register 1182, as described above.

The new number of bits in the data register 1182 (new_nob) is calculated by adding the current number of bits in the data register 1182 (nob) to the number of bits available for loading from the stripper 1171 in the current cycle (nos), and subtracting the number of bits removed from the data register 1182 in the current cycle (nor). The current cycle is either a decode cycle or a padding-bit removal cycle. The new number of bits is therefore:

new_nob = nob + nos − nor
These operations are performed by the adder 1221 and the subtractor 1222. If no data arrives from the stripper 1171 in the current cycle, (nos) is 0. Likewise, (nor) is 0 if no decoding takes place in the current cycle because the data register 1182 does not hold enough bits, that is, because the number of bits in the data register is less than or equal to the sum of the current code length from the control block 1185 and the length of the additional bits that follow. The value (new_nob) may exceed 64, which is checked in block 1224. In that case the stripper 1171 is stalled and no new data is loaded. The multiplexer 1233 is used to zero the number of bits loaded from the stripper 1171. The signal that stalls the stripper 1171 is not shown. The "padding cycle" signal from the decoder 1231 controls the multiplexer 1234, selecting either the number of padding bits or the number of decoded bits (the length of the code bits plus the additional bits) as the number of bits to remove (nor). If the comparator 1228 determines that the number of decoded bits is greater than or equal to the number of bits in the data register (nob), the number of valid bits to shift that is supplied to the multiplexer 1234 is forced to zero by the NAND gate 1230. That is, (nor) is set to zero and no bits are removed from the data register. The output of the multiplexer 1234 is also used to control the postshifters 1180 and 1181. The width of the data register 1182 is chosen to avoid deadlock: either the data register has room for the maximum number of bits from the stripper 1171, or it holds enough valid bits for some to be removed by a decode or padding-bit removal cycle.
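The bookkeeping above can be expressed as a small function. The sketch below is a software reading of the text, not the control block itself; the function name, the argument layout, and the decision to return the adjusted (nos) and (nor) alongside (new_nob) are assumptions made for illustration. The "greater than or equal" comparison follows the wording of the description.

```python
REGISTER_WIDTH = 64

def update_bit_count(nob, nos, decoded_len, padding_len, padding_cycle):
    """Return (new_nob, nos, nor) for one cycle.

    nob           -- valid bits currently in the data register
    nos           -- bits offered by the stripper this cycle (0, 8, 16, 24 or 32)
    decoded_len   -- code length plus additional-bit length for the current code
    padding_len   -- size of the detected padding zone
    padding_cycle -- True for a padding-bit removal cycle, False for a decode cycle
    """
    if padding_cycle:
        nor = padding_len                     # multiplexer 1234 selects padding count
    elif decoded_len >= nob:
        nor = 0                               # comparator 1228: not enough bits, remove none
    else:
        nor = decoded_len
    if nob + nos - nor > REGISTER_WIDTH:
        nos = 0                               # block 1224: stall the stripper, load nothing
    return nob + nos - nor, nos, nor


print(update_bit_count(40, 32, 10, 0, False))   # (62, 32, 10)
print(update_bit_count(60, 32, 12, 0, False))   # (48, 0, 12)
print(update_bit_count(5, 16, 9, 0, False))     # (21, 16, 0)
```

The second call shows the stall case: accepting 32 more bits would overflow the 64-bit register, so only the removal takes effect.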

The number of bits removed in a decode cycle is calculated in the adder 1226. Its operands come from the combinational decoding unit 1184. Because a code length of 16 bits is encoded as "0000" by the decoding unit, the "ou_reduce" logic 1225 re-encodes "0000" as "10000", yielding an unsigned operand. This operand and the output of the subtractor 1227 provide the control signals for the output format shifters 1186 and 1187.
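The re-encoding of the 4-bit length field can be illustrated in a few lines. This is a hedged sketch: the function names are hypothetical, and it assumes that the two operands of the adder are the code length and the additional-bit length.

```python
def decode_code_length(field):
    """Map the 4-bit code-length field to a length of 1..16 ("0000" means 16)."""
    return 16 if field == 0 else field

def bits_removed(code_length_field, additional_bits):
    """Number of bits consumed by one decode: code length plus additional bits."""
    return decode_code_length(code_length_field) + additional_bits


print(bits_removed(0b0000, 0))   # 16
print(bits_removed(0b0011, 5))   # 8
```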

Block 1229 is used to detect the position of the EOI (end of image) marker. The EOI marker itself is removed by the stripper 1171, but padding bits may remain as the final bits of the data that preceded the EOI marker before it was removed. The comparator 1229 checks whether the number of bits in the data register 1182, as stored in register 1223, is 8 or fewer. If it is, no new data arrives from the stripper 1171 (the data register 1182 holds the remaining bits of the data segment being decoded), and the remaining bits give the size of the padding zone that preceded the removed EOI marker. Further handling of the padding zone and removal of the padding bits follow the same procedure as for padding bits before an RST marker, described above.

The barrel shifters 1186 and 1187 and the output formatter 1188 play a supporting role, and their implementation can vary widely between embodiments; they may also be omitted entirely. Their control signals come from the control block 1185, as described above. The additional-bits preshifter 1186 takes 32 bits from the data register and shifts them left by the length of the Huffman code currently being decoded. All additional bits following the current code are thereby left-aligned at the output of the barrel shifter 1186, and are passed as input to the barrel shifter 1187. The additional-bits postshifter 1187 moves the additional bits from left-aligned to right-aligned within the 11-bit field used in the output data format, which is also shown in FIG. 91. The additional-bits field spans bits 8 to 18 of the output word format 1196, and some of its most significant bits may be invalid, depending on the actual number of additional bits. That number is encoded in bits 0 to 3 of 1196, as specified by the JPEG standard. If a different output data format is used, the barrel shifters 1186 and 1187 and their functions are changed to suit it.

The output format block 1188 packs the decoded values. For the JPEG standard, it assembles a word in the format shown in FIG. 91 from the DC/AC coefficient (1196, bits 0 to 7) and the DC coefficient indicator bit (1196, bit 19) supplied by the control block 1185, the additional bits (1196, bits 8 to 18) supplied by the additional-bits postshifter 1187, and the marker position bit (1196, bit 23) supplied by the marker register 1183. The output formatter 1188 also handles the functional requirements of the decoder's output interface. When different functional requirements change the output interface, the implementation of the output formatter normally changes accordingly. The Huffman decoder described above decodes very efficiently and achieves high-speed decoding.
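The bit positions listed above are enough to sketch the packing step. The following Python function is illustrative only: its name and arguments are hypothetical, and bits not mentioned in the text (20 to 22 and 24 upward) are simply left at zero.

```python
def pack_output_word(huffman_value, additional_bits, is_dc, marker):
    """Assemble an output word using the bit positions given for format 1196.

    huffman_value   -- bits 0..7
    additional_bits -- bits 8..18 (11-bit field, right-aligned)
    is_dc           -- bit 19, DC coefficient indicator
    marker          -- bit 23, marker position bit
    """
    if not 0 <= huffman_value < (1 << 8):
        raise ValueError("huffman_value must fit in 8 bits")
    if not 0 <= additional_bits < (1 << 11):
        raise ValueError("additional_bits must fit in 11 bits")
    return (huffman_value
            | (additional_bits << 8)
            | (int(bool(is_dc)) << 19)
            | (int(bool(marker)) << 23))


print(hex(pack_output_word(0x05, 0b10110, True, False)))   # 0x81605
print(hex(pack_output_word(0xFF, 0x7FF, False, True)))     # 0x87ffff
```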

3.17.8 Image Transformation Instructions
These instructions perform general affine transformations of a source image. Generating a portion of the transformed image divides broadly into two parts: determining which part of the source image relates to the current output scanline, and performing the necessary subsampling and/or interpolation to build the output image pixel by pixel.

FIG. 92 shows a flowchart of the steps 720 needed to calculate a destination pixel value, assuming that the appropriate region of the source image has already been decoded. First, subsampling, if any, is taken into account at 721. Two kinds of processing are then typically implemented: interpolation 722 and further subsampling. Interpolation and subsampling are normally separate steps, but they may also be performed together. In the interpolation step, the four surrounding pixels are located first, and whether premultiplication 723 is needed is determined before bilinear interpolation 724 is performed. Bilinear interpolation 724 is generally very compute-intensive, and this limits the performance of image transformation. The final step in calculating the destination pixel value is to sum the bilinearly interpolated subsamples from the source image. The summed pixel values are accumulated 727 in various ways to produce the destination image pixel 728.
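The bilinear interpolation step 724 can be illustrated with the textbook formulation. The sketch below is a generic floating-point version for a single channel, not the coprocessor's fixed-point data path; the function name and the edge-clamping behavior are assumptions.

```python
import math

def bilinear_sample(image, x, y):
    """Interpolate a single-channel image (list of rows) at real-valued (x, y)."""
    x0, y0 = math.floor(x), math.floor(y)
    # Clamp the neighbors at the right and bottom edges (an assumption of this sketch).
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two rows, then vertically between them.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


image = [[0, 10],
         [20, 30]]
print(bilinear_sample(image, 0.5, 0.5))   # 15.0
```

Each output sample needs four source reads and several multiplications per channel, which is why this step tends to dominate the cost of the transformation.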

The instruction word encoding for the image transformation instructions is shown in FIG. 93, and the minor opcode fields are described in the following table.
Instruction word: minor opcode fields

The instruction operand and result fields are described below.
Instruction operands and result word

Operand A points to a data structure known as a "kernel descriptor", which holds all the information needed to define the actual transformation. This data structure takes one of two formats (selected by the L bit of the A descriptor). FIG. 94 shows the long format of the kernel descriptor and FIG. 95 the short format. The kernel descriptor describes the following:
1. The source image start coordinates 730 (unsigned fixed point, 24.24 resolution). Position (0, 0) is the top left of the image.
2. The horizontal 731 and vertical 732 (subsample) deltas (two's complement, fixed point, 24.24 resolution).
3. A 3-bit bp field 733 giving the position of the binary point in the fixed-point matrix coefficients described below.
4. The accumulation matrix coefficients 735 (if present). These have a "variable" point resolution of 20 binary places (two's complement), with the position of the binary point implicitly specified by the bp field.
5. An rl field 736 giving the number of remaining words in the kernel descriptor. This value is the number of rows multiplied by the number of columns, minus 1.
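To make the 24.24 fixed-point convention concrete, the sketch below steps a source coordinate by a delta and splits each position into its integer and fractional parts. It is illustrative only: the helper names are hypothetical, and applying the delta once per subsample is an assumption about how the start coordinate and deltas are used.

```python
FRACTION_BITS = 24
FRACTION_MASK = (1 << FRACTION_BITS) - 1

def to_fixed(value):
    """Convert a real number to 24.24 fixed point."""
    return int(round(value * (1 << FRACTION_BITS)))

def source_positions(start, delta, count):
    """Yield (integer part, 24-bit fraction) for `count` steps of `delta` from `start`."""
    position = start
    result = []
    for _ in range(count):
        result.append((position >> FRACTION_BITS, position & FRACTION_MASK))
        position += delta                     # delta may be negative (two's complement)
    return result

def remaining_words(rows, columns):
    """Value of the rl field as described: rows times columns, minus 1."""
    return rows * columns - 1


for whole, fraction in source_positions(to_fixed(1.5), to_fixed(0.25), 3):
    print(whole, hex(fraction))
# 1 0x800000
# 1 0xc00000
# 2 0x0
```

The integer part selects the source pixel and the fraction would feed the interpolation weights.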

The kernel coefficients in the descriptor are listed column by column, with adjacent columns listed in opposite directions so that the order forms a zigzag scan. Operand B consists of a pointer to an index table that points to the scanlines of the source image. As shown in FIG. 96, operand B 740 points to the index table 741, which in turn points to the scanlines (e.g. 742) of the required source image pixels. In general, the index table and the source image pixels are cacheable and reside in local memory.

Operand C holds the horizontal and vertical subsample rates. These rates are defined by the dimensions of the subsample weight matrix that is specified when the C descriptor is present. The matrix dimensions r and c are encoded in the data word of the image transformation instruction, as shown in FIG. 97. Channel N of the result pixel, P[N], is calculated from the following equation.

Internally, the accumulated value is held to 36 binary places for each channel. The position of the binary point within this field is specified by the BP field, which gives the number of leading bits of the accumulated result to discard. The 36-bit accumulated value is treated as a signed two's-complement number and is clamped or wrapped as specified. FIG. 98 shows an example of how the BP field is interpreted in the coefficient encoding.

3.17.9 Convolution Instructions
Convolution, as applied to rendered images, applies a two-dimensional convolution kernel to a source image to produce a result image. It is typically used for edge sharpening and a variety of other image filters. Convolution is implemented in the coprocessor 224 in the same way as image transformation, except that in image transformation the kernel is translated by the kernel width for each output pixel, whereas in convolution the kernel moves by one source pixel for each output pixel.

If the source image S has values S(x, y) and the n × m convolution kernel C has values C(x, y), then the n-th channel of the convolution of S and C, H[n], is given by

where i ∈ [0, c] and j ∈ [0, r]. The meaning of the offset value, the resolution of the intermediate results, and the meaning of the bp field are the same as for the image transformation instructions. FIG. 99 shows an example in which a convolution kernel 750 is applied to a source image 751 to produce a result image 752. Source image address generation and output pixel calculation are performed in the same way as for the image transformation instructions. The instruction operands also take the same form as for image transformation. FIG. 100 shows the instruction word encoding for the convolution instruction, and the following table describes the various fields.
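The difference in kernel stepping between convolution and image transformation can be shown with one routine and two step sizes. The sketch below is a generic sum of products over a single channel, written in correlation form (the kernel is not flipped). It omits the offset, bp scaling, and clamping mentioned in the text, and it is not a reconstruction of the specification's equation. All names are illustrative.

```python
def apply_kernel(source, kernel, step_x=1, step_y=1):
    """Slide `kernel` over `source` (lists of rows) and sum the products."""
    kernel_height, kernel_width = len(kernel), len(kernel[0])
    result = []
    for y in range(0, len(source) - kernel_height + 1, step_y):
        row = []
        for x in range(0, len(source[0]) - kernel_width + 1, step_x):
            accumulator = 0
            for j in range(kernel_height):
                for i in range(kernel_width):
                    accumulator += source[y + j][x + i] * kernel[j][i]
            row.append(accumulator)
        result.append(row)
    return result


source = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
kernel = [[1, 1]]

# Convolution-style stepping: the kernel moves one source pixel per output pixel.
print(apply_kernel(source, kernel))              # [[3, 5, 7], [11, 13, 15]]
# Transformation-style stepping: the kernel moves by its own width.
print(apply_kernel(source, kernel, step_x=2))    # [[3, 7], [11, 15]]
```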

Instruction word

3.17.10 Matrix Multiplication
Matrix multiplication is used, for example, for color space conversion where the two color spaces are related by an affine transformation. Matrix multiplication is defined by the following equation.

The operands and result word of the matrix multiplication instruction have the following format.
Instruction operands and result word

FIG. 101 shows the instruction word encoding for the matrix multiplication instruction, and the following table shows the minor opcode fields.
Instruction word

3.17.11 Halftoning
The coprocessor 224 provides multi-level dithering for halftoning. Any value from 2 to 255 is a meaningful number of halftone levels. The data to be halftoned may be either bytes (one channel of meshed or unmeshed data) or pixels (meshed), provided the screen is correspondingly meshed or unmeshed. Up to four output channels (or four bytes from the same channel) can be produced, either as packed bits (for bi-level halftoning) or as codes (for more than two output levels), and either packed together or unpacked at one code per byte.

The output halftone value is calculated using the following formula:
(p × (l − 1) + d) / 255
where p is the pixel value (0 ≤ p ≤ 255), l is the number of levels (2 ≤ l ≤ 255), and d is the dither matrix value (0 ≤ d ≤ 254). The operand encoding is as follows.
Instruction operands and result word
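The halftone formula (p × (l − 1) + d) / 255 can be checked with a short function. This sketch assumes that the division truncates to an integer, which the text does not state explicitly; the function name is illustrative.

```python
def halftone(p, levels, d):
    """Output level for pixel value p, `levels` output levels, dither value d."""
    if not (0 <= p <= 255 and 2 <= levels <= 255 and 0 <= d <= 254):
        raise ValueError("argument out of range")
    return (p * (levels - 1) + d) // 255      # integer division is an assumption


print(halftone(128, 2, 127))   # 1
print(halftone(128, 2, 126))   # 0
print(halftone(255, 4, 254))   # 3
```

With d limited to 254, the result never exceeds levels − 1, and a pixel value of 0 always maps to level 0.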

In the instruction word encoding, the minor opcode specifies the number of halftone levels. The operand B encoding is for the halftone screen and is encoded in the same way as for tile compositing.

3.17.12 Hierarchical Image Format Decoding
Hierarchical image format decoding involves several steps: horizontal interpolation, vertical interpolation, Huffman decoding, and residual merging. Each step is carried out by a separate instruction. In the Huffman decoding step, the residual values to be added to the interpolated values from the interpolation steps are Huffman coded. The JPEG decoder is therefore used for Huffman decoding.

FIG. 102 shows horizontal interpolation. The output stream 761 contains twice as much data as the input stream 762, with the last data value 763 replicated 764. FIG. 103 shows horizontal interpolation by a factor of 4. The second step of hierarchical image format decoding upsamples rows of pixels vertically by a factor of 2 or 4 using linear interpolation. In this step, one row of pixels is operand A and the other row is operand B.
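Horizontal interpolation by a factor of 2 might look like the following in software. The sketch assumes that each inserted sample is the truncated average of its two neighbors, which is one plausible reading of "interpolation" here and is not confirmed by the text. The replication of the last value follows the description of FIG. 102.

```python
def interpolate_horizontal_2x(values):
    """Double the length of `values` by inserting a midpoint after each sample."""
    output = []
    for index, value in enumerate(values):
        # The last sample has no right-hand neighbor, so it is replicated.
        following = values[index + 1] if index + 1 < len(values) else value
        output.append(value)
        output.append((value + following) // 2)   # truncating average (assumed)
    return output


print(interpolate_horizontal_2x([10, 20, 40]))   # [10, 15, 20, 30, 40, 40]
```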

For vertical interpolation, by a factor of either 2 or 4, the output data stream contains the same number of pixels as each input stream. FIG. 104 shows an example of vertical interpolation in which two input data streams 770 and 771 are used to produce the output stream 772 for interpolation by 2 and the output stream 773 for interpolation by 4. For pixel interpolation, the interpolation is performed separately on each of the four channels of a four-channel pixel.

Residual merging involves the byte-wise addition of two data streams. The first stream (operand A) is a stream of base values and the second stream (operand B) is a stream of residual values. FIG. 105 shows two input streams 780 and 781 and the corresponding output stream 782 for residual merging.
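Byte-wise residual merging is straightforward to sketch. The text does not say how overflow is handled, so the version below assumes that each sum wraps modulo 256, which also lets a residual of 0xFF act as −1. The function name is hypothetical.

```python
def merge_residuals(base, residual):
    """Add two equal-length byte streams element by element, wrapping modulo 256."""
    if len(base) != len(residual):
        raise ValueError("streams must have the same length")
    return bytes((a + b) & 0xFF for a, b in zip(base, residual))


merged = merge_residuals(bytes([10, 200, 255]), bytes([5, 100, 1]))
print(list(merged))   # [15, 44, 0]
```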

FIG. 106 shows the instruction word encoding for the hierarchical image format instructions, and the following table gives details of the minor opcode fields.
Instruction word: minor opcode fields

3.17.13 Memory Copy Instructions
These instructions fall into two distinct groups.
a. General-purpose data movement instructions
These instructions use the normal data flow path through the coprocessor 224, consisting of the input interface module, input interface switch 252, pixel organizer 246, JPEG coder 241, result organizer 249, and output interface module. In this case the JPEG coder module passes the data straight through without processing it.

Other data manipulation operations include the following:
・ packing and unpacking of sub-byte values (bits, 2-bit values, and 4-bit values) to and from bytes
・ packing and unpacking of bytes within a word
・ alignment
・ byte lane swapping and replication
・ memory clearing
・ value replication
The data manipulation operations are carried out by a combination of the pixel organizer (on input) and the result organizer (on output). In many cases these instructions are used in combination with other instructions.
b. Local DMA instructions
No data manipulation is performed. As shown in FIG. 2, data is transferred (in either direction) between the local memory 236 and the peripheral interface 237. These are the only instructions whose execution can overlap with that of other instructions. At most one of these instructions can execute at the same time as a "non-overlapped" instruction.

In a memory copy operation, operand A specifies the data to be copied and the result operand specifies the destination address of the memory copy instruction. For the general-purpose memory copy instruction, operand B specifies the data manipulation operations to be applied on input, and operand C those to be applied to the output operand word.

3.17.14 Flow Control Instructions
The flow control instructions are a family of instructions that control various parts of the instruction execution model shown in FIG. 9. They include conditional and unconditional jumps, which allow execution of the instruction stream to move from one virtual address to another. The condition of a conditional jump instruction is determined by masking the relevant field of a coprocessor register and comparing it with a given value. This keeps the instruction general. The flow control instructions also include wait instructions, which are used to synchronize overlapped and non-overlapped instructions or as part of microprogramming.

FIG. 107 shows the encoding of the flow control instructions, and the following table describes the minor opcodes.
Instruction word: minor opcode fields

In a jump instruction, the operand A word specifies the target address of the jump. If the S bit of the minor opcode is set to 0, operand B specifies a coprocessor register to be used as the source of the condition: the value of the operand B descriptor gives the address of the register, and the value of the operand B word is the value against which the register contents are compared. The operand C word specifies a bitwise mask that is applied to the result. That is, the jump condition is true when the following bitwise expression holds:

(((register_value xor OperandB) and OperandC) = 0x00000000)
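The jump condition can be illustrated with a minimal sketch. The function name `jump_condition` and the 32-bit operand width are assumptions made for this illustration; only the xor-then-mask comparison itself is taken from the expression above.

```python
MASK32 = 0xFFFFFFFF  # operands are treated as 32-bit words (assumption)


def jump_condition(register_value: int, operand_b: int, operand_c: int) -> bool:
    """Return True when the masked register field equals the masked Operand B."""
    difference = (register_value ^ operand_b) & MASK32  # bits that differ
    return (difference & operand_c) == 0                # only masked bits count


# Jump if the low byte of the register equals 0x5A, ignoring all other bits.
taken = jump_condition(0x1234005A, 0x0000005A, 0x000000FF)
not_taken = jump_condition(0x1234005B, 0x0000005A, 0x000000FF)
```

With a mask of zero the condition holds for any register value, which is one way an unconditional jump could be expressed under this model.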
Further, this instruction is also used for register access, allowing full control at the microprogramming level.
3.18 Accelerator Card Modules
The various modules shown in FIG. 2 are now described in more detail.

3.18.1 Pixel Organizer
The pixel organizer 246 addresses and buffers the data stream from the input interface switch 252. Input data is stored either in the internal memory of the pixel organizer or in the MUV buffer 250. After performing all necessary data manipulation on the input stream, the pixel organizer passes it to the main data path 242 or the JPEG encoder 241 as required. The operating mode of the pixel organizer is configured through the usual CBus interface. The pixel organizer 246 operates in one of five modes, as specified by the PO_CFG control register:
(a) Idle mode: the pixel organizer 246 performs no operation.
(b) Sequential mode: input data is stored in the internal FIFO, and the pixel organizer 246 generates the 32-bit addresses of the data and requests the data from the input interface switch 252.
(c) Color space conversion mode: the pixel organizer buffers pixels for color space conversion and also requests the interval and fraction values stored in the MUV buffer 250.
(d) JPEG compression mode: the pixel organizer 246 stores image data in the MUV buffer in the form of MCUs.
(e) Convolution and image transformation mode: the pixel organizer 246 stores matrix coefficients in the MUV buffer 250 and passes them to the main data path 242 as required.

The pixel organizer 246 uses the MUV buffer 250 for operations of both the main data path 242 and the JPEG encoder 241. In color space conversion, the interval and fraction tables are stored in the MUV RAM 250, which is accessed as 36 bits of data: (4 color channels) × (4-bit interval value and 8-bit fraction value). For image transformation and convolution, the MUV RAM 250 stores the matrix coefficients and related configuration data. The coefficient matrix is limited to 16 rows × 16 columns, and each coefficient is at most 20 bits wide. One coefficient per clock cycle is required from the MUV RAM 250. In addition to the coefficient data, control information such as the binary point, source start coordinates, and subsample deltas must be passed to the main data path 242. The pixel organizer 246 fetches this control information before the matrix coefficients.

In JPEG compression, the pixel organizer 246 uses the MUV buffer 250 to double-buffer MCUs. Double buffering is desirable because it improves JPEG compression performance. One half of the MUV RAM 250 is written with data from the input interface switch 252 while the other half is read by the pixel organizer to obtain the data to be sent to the JPEG encoder 241. The pixel organizer 246 also performs horizontal subsampling of the color components where required, and pads MCUs when the size of the input image is not an integer multiple of the MCU size.
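The double-buffering scheme can be sketched as follows. The class `DoubleBuffer`, its method names, and the explicit `swap` step are hypothetical and chosen only for illustration; the text states only that one half of the MUV RAM is written while the other half is read.

```python
class DoubleBuffer:
    """Two equally sized halves: one is filled while the other is drained."""

    def __init__(self, half_size: int):
        self.halves = [[0] * half_size, [0] * half_size]
        self.write_half = 0  # index of the half currently being filled

    def write_mcu(self, data):
        # Store an incoming MCU in the half reserved for writing.
        self.halves[self.write_half][:len(data)] = data

    def read_mcu(self):
        # Read the MCU held in the other half.
        return list(self.halves[self.write_half ^ 1])

    def swap(self):
        # Exchange the roles of the two halves once both sides are done.
        self.write_half ^= 1


buf = DoubleBuffer(4)
buf.write_mcu([1, 2, 3, 4])
buf.swap()
buf.write_mcu([5, 6, 7, 8])   # the writer fills the second half...
first = buf.read_mcu()        # ...while the reader drains the first
```

In hardware the two sides proceed concurrently; the sequential calls here only show which half each side touches.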

The pixel organizer 246 also formats the input data, including the byte lane swapping, normalization, byte substitution, byte packing and unpacking, and copying operations described earlier with reference to FIG. 32. These operations are carried out as required by setting the pixel organizer registers. The pixel organizer 246 is shown in more detail in FIG. 108. It operates under the control of its own register set, contained in the CBus interface control unit 801, which is connected to the instruction control unit 235 via the global CBus. The pixel organizer 246 includes an operand fetch unit 802, which requests the operand data needed by the pixel organizer 246 from the input interface switch 252. The start address of the operand data is given by the PO_SAID register, which is set immediately before execution. Depending on the L bit of the PO_DMR register, the PO_SAID register may instead hold immediate data. The current address pointer is stored in the PO_CDP register and is incremented by the burst length whenever a request is made to the input interface switch. When data is fetched into the MUV RAM 250, the current offset of the data is concatenated with the base address of the MUV RAM 250 given by the PL_MUV register.

A FIFO 803 buffers the sequential input data fetched by the operand fetch unit 802. The data operation unit 804 performs the various manipulations described with reference to FIG. 32. Its output is passed to the MUV address generation unit 805, which, according to the configuration registers, routes the data to the MUV RAM 250, the main data path 242, or the JPEG encoder 241. The pixel organizer control unit 806 is a state machine that generates the control signals required by all the sub-modules of the pixel organizer 246, including the signals that control communication on the various Bus interfaces. The pixel organizer control unit also outputs the diagnostic information required by the other module 239, according to the settings of its status registers.

FIG. 109 shows the operand fetch unit 802 of FIG. 108 in more detail. The operand fetch unit 802 includes an instruction bus address generation unit (IAG) 810, which contains a state machine that generates requests to fetch operand data. These requests are sent to the request arbitration unit 811, which arbitrates between requests from the address generation unit 810 and requests from the MUV address generation unit (MAG) 805 (FIG. 108) and forwards the winning request to the input interface switch 252. The request arbitration unit 811 contains a state machine for handling requests; it monitors the state of the FIFO through the FIFO count unit 814 to decide when the next request should be dispatched. The byte enable generation unit 812 takes information from the IAG 810 and generates a byte enable pattern 816 that marks the valid bytes in each operand word returned by the input interface switch 252. The byte enable pattern is stored in the FIFO together with the associated operand data. When a MAG request and an IAG request arrive simultaneously, the request arbitration unit 811 services the MAG request first.
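The fixed-priority rule of the request arbitration unit can be sketched in a few lines. The function `arbitrate` and the representation of a request as an optional address are assumptions for illustration; the text specifies only that a MAG request takes priority over a simultaneous IAG request.

```python
from typing import Optional, Tuple


def arbitrate(mag_request: Optional[int],
              iag_request: Optional[int]) -> Optional[Tuple[str, int]]:
    """Pick the request to forward; a pending MAG request always wins."""
    if mag_request is not None:
        return ("MAG", mag_request)
    if iag_request is not None:
        return ("IAG", iag_request)
    return None  # nothing pending this cycle


both = arbitrate(0x100, 0x200)      # simultaneous arrival
only_iag = arbitrate(None, 0x200)
```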

The MUV address generation unit 805 of FIG. 108 operates in several different modes. The first of these is the JPEG (compression) mode. In this mode, the input data for JPEG compression is supplied by the data operation unit 804, and the MUV buffer 250 is used as a double buffer. The MAG 805 generates the MUV buffer addresses at which the input data processed by the data operation unit 804 is stored. It also generates the read addresses used to retrieve color component data from the stored pixels, so as to form the 8 × 8 blocks for JPEG compression. The MAG 805 also handles the case in which an MCU only partially overlaps the image. FIG. 110 shows an example of the padding performed by the MAG 805.

For normal pixel data, the MAG 805 stores the four color components at the same address in the four 8-bit RAMs of the MUV RAM 250. So that data from the same color channel can be retrieved simultaneously, the MCU data is barrel-shifted to the left before being stored in the MUV RAM 250. The number of bytes by which the data is shifted is given by the two least significant bits of the write address. For example, FIG. 111 shows how 32-bit pixel data is arranged in the MUV RAM 250 when no subsampling is required. In three-channel or four-channel interleaved JPEG mode, subsampling of the input data may be selected. In multi-channel JPEG compression mode with subsampling, the MAG 805 (FIG. 108) performs the subsampling before the 32-bit data is stored in the MUV RAM 250, for optimal performance of the JPEG encoder. For the first four input pixels, only the first and fourth channels stored in the MUV RAM 250 contain useful data; the data of the second and third channels is subsampled and held in registers of the pixel organizer 246. For the next four input pixels, the second and third channels are filled with the subsampled data. FIG. 112 shows an example of the MCU data organization in multi-channel subsampling mode. The MAG treats all single-channel unpacked data exactly like multi-channel pixel data. FIG. 113 shows an example of single-channel packed data read from the MUV RAM.

While the write process stores incoming MCUs in the MUV RAM, the read process reads 8 × 8 blocks out of it. In general, the MAG 805 generates these blocks four coefficients at a time by reading the data for each channel in turn. For pixel data and unpacked input data, the stored data is organized as shown in FIG. 111. Therefore, to assemble an 8 × 8 block of non-subsampled pixel data, the read process reads the data from the MUV RAM diagonally. FIG. 114 shows an example of this process: it gives the read sequence for four-channel data, and shows that the storage format of the MUV RAM 250 makes it easy to read several values from the same channel simultaneously.
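The write-side barrel shift and the diagonal read can be sketched together. This is a model under stated assumptions, not the circuit itself: the four 8-bit RAMs are represented as four Python lists (`rams[lane][address]`), a left shift by n bytes is taken to mean that component n lands in lane 0, and the helper names are hypothetical.

```python
LANES = 4  # four 8-bit RAMs side by side


def rotate_left(pixel, count):
    """Barrel-shift a 4-component pixel left by `count` byte positions."""
    count %= LANES
    return pixel[count:] + pixel[:count]


def write_pixel(rams, address, pixel):
    """Store one pixel; the shift amount is the low 2 bits of the address."""
    rotated = rotate_left(pixel, address & 0x3)
    for lane in range(LANES):
        rams[lane][address] = rotated[lane]


def read_channel(rams, base, channel):
    """Read one channel from four consecutive addresses, one byte per RAM."""
    values = []
    for offset in range(LANES):
        address = base + offset
        lane = (channel - address) % LANES  # lane holding `channel` here
        values.append(rams[lane][address])
    return values


rams = [[None] * 8 for _ in range(LANES)]
for address in range(8):
    # Component c of the pixel at address a is encoded as 10*a + c.
    write_pixel(rams, address, [10 * address + c for c in range(LANES)])

channel_2 = read_channel(rams, 0, 2)
```

Because the shift amount changes with each address, the four reads in `read_channel` always fall in four different lanes, so in hardware they could be issued in the same cycle.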

In color conversion mode, the MUV RAM 250 is used as a cache holding interval and fraction values, and the MAG 805 acts as the cache controller. The MUV RAM 250 caches values for three color channels, each channel having 256 pairs of 4-bit interval and fraction values. For each pixel output via the DMU, the MAG 805 is used to obtain the values from the MUV RAM 250. When a value is not available, the MAG 805 issues a memory read request to fetch the missing interval and fraction values. To make efficient use of bandwidth, several entries are fetched per request rather than just one.

For image transformation and convolution, the MUV RAM 250 stores the matrix coefficients for the MDP. The MAG cycles through all the matrix coefficients stored in the MUV RAM 250. At the start of an image transformation or convolution instruction, the MAG 805 issues a request to the operand fetch unit to fetch the kernel description "header" (FIG. 94) and the first matrix coefficient in a burst request.

FIG. 115 shows the MUV address generation unit (MAG) 805 of FIG. 108 in more detail. The MAG 805 includes an IBus request module 820 that multiplexes the IBus requests generated by the image transformation control unit (ITX) 821 and the color space conversion (CSC) control unit 822. The requests are sent to the operand fetch unit, which services them. Because the pixel organizer 246 operates in only one of the image transformation and color space conversion modes at a time, no arbitration is needed between the control units 821 and 822. The IBus request module 820 derives the information needed to generate a request to the operand fetch unit, including the burst address and burst length, from the pixel organizer.

The JPEG control unit 824 is used in JPEG mode and comprises two state machines: the JPEG write control unit and the JPEG read control unit. The two run concurrently and synchronize with each other through internal registers. In a JPEG compression operation, the DMU outputs MCU data, which is stored in the MUV RAM. The JPEG write control unit is responsible for horizontal padding and for controlling pixel subsampling, while the JPEG read control unit is responsible for vertical padding. Horizontal padding is performed by stalling the DMU output, and vertical padding by rereading the 8 × 8 block data that has already been read.
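The effect of the two kinds of padding on a partial block can be sketched as follows. The sketch assumes that stalling the DMU output results in the last valid pixel of a row being repeated, and that vertical padding repeats the last valid row; the function `pad_block` is hypothetical and models only the resulting data, not the two state machines.

```python
def pad_block(rows, width=8, height=8):
    """Extend a partial block to width x height by edge replication."""
    # Horizontal padding: repeat the last valid pixel of each row.
    padded = [row + [row[-1]] * (width - len(row)) for row in rows]
    # Vertical padding: repeat the last row until the block is full.
    while len(padded) < height:
        padded.append(list(padded[-1]))
    return padded


# A 2-row, 3-column fragment at the bottom-right corner of an image.
block = pad_block([[1, 2, 3], [4, 5, 6]])
```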

The JPEG write control unit tracks the current position of the DCU and DMU output pixels in the source image and uses this information to decide when to stall the DMU for horizontal padding. When an MCU has been written to the MUV RAM 250, the JPEG write control unit sets or resets internal registers to indicate whether the MCU lies on the right edge or on the bottom edge of the image. From the contents of these registers, the JPEG read control unit determines whether vertical padding is required and whether the last MCU of the image has been read.

The JPEG write control unit tracks the DMU output data and stores it in the MUV RAM 250. It uses a set of registers to record the current position of the input pixel; this information is used to stall the DMU output when horizontal padding is performed. When a complete MCU has been written to the MUV RAM 250, the write control unit writes the MCU information to the JPEG-RW-IPC register so that the JPEG read control unit can use it later.

After the last MCU has been written to the MUV RAM 250, the write control unit enters the SLEEP state and remains there until the current instruction completes. The JPEG read control unit reads 8 × 8 blocks from the MCUs stored in the MUV RAM 250. For multi-channel pixels, it reads each MCU several times, extracting a different byte from each stored pixel on each pass.

The read control unit uses the information provided by JPEG-RW-IPC to detect whether vertical padding is required. Vertical padding is performed by rereading the 8 bytes most recently read from the MUV RAM 250. The image transformation control unit 821 reads the kernel descriptor from the IBus and passes the kernel header to the MDP 242. It then cycles through the matrix coefficients the number of times specified in the po.len register. For image transformation and convolution instructions, all data output by the PO 246 is fetched directly from the IBus and does not pass through the DMU.

The first 8 bits of the first matrix coefficient fetched immediately after the kernel header give the number of remaining matrix coefficients to be fetched. The kernel header is passed to the MDP unmodified, whereas the matrix coefficients are sign-extended before being passed to the MDP. The pixel subsampler 825 comprises two identical channel subsamplers, each operating on one byte of the input word. When the associated configuration register is not asserted, the pixel subsampler copies its input to its output unchanged. When it is asserted, the subsampler subsamples the input data by either averaging or decimation.
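Sign extension and the two subsampling options can be sketched as follows. The 20-bit default follows the maximum coefficient width given earlier; the 2:1 subsampling ratio, the truncating average, and the function names are assumptions made for this illustration.

```python
def sign_extend(value: int, bits: int = 20) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def subsample(samples, average=True):
    """Halve the sample rate by averaging pairs or by keeping every other one."""
    pairs = zip(samples[0::2], samples[1::2])
    if average:
        return [(a + b) // 2 for a, b in pairs]  # averaging (truncated)
    return [a for a, _ in pairs]                 # decimation


minus_one = sign_extend(0xFFFFF)            # all 20 bits set
averaged = subsample([10, 20, 30, 50])
decimated = subsample([10, 20, 30, 50], average=False)
```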

The MUV multiplexing module 826 selects the MUV read and write signals from the currently active control unit. Internal multiplexers select the read address output from among the various control units that use the MUV RAM 250. The MUV RAM write address is held in an 8-bit register in the MUV multiplexing module. The control units that use the MUV RAM 250 load this write address register, in addition to controlling how the next MUV RAM address is determined.

The MUV valid access module 827 is used by the color space conversion control unit to determine whether the interval and fraction values for the pixel currently output by the data operation unit are available in the MUV RAM 250. When the values for one or more color channels are missing, the MUV valid access module 827 passes the relevant address to the IBus request module 820 so that the interval and fraction values are loaded in burst mode. Once the cache miss has been serviced, the MUV valid access module 827 sets internal valid bits indicating which sets of interval and fraction values have been fetched so far.
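The cache behavior described for color space conversion (valid bits, burst fill on a miss) can be sketched for a single channel. The class name, the burst size of 8 entries, and the use of a Python list as the backing table are assumptions; the text gives neither the burst length nor the valid-bit granularity.

```python
class IntervalFractionCache:
    """One-channel cache of (interval, fraction) pairs filled in bursts."""

    def __init__(self, table, burst=8):
        self.table = table                        # stands in for external memory
        self.burst = burst                        # entries fetched per request
        self.entries = [None] * len(table)
        self.valid = [False] * (len(table) // burst)  # one bit per burst
        self.misses = 0

    def lookup(self, index):
        line = index // self.burst
        if not self.valid[line]:
            # Cache miss: fetch the whole burst containing the entry.
            start = line * self.burst
            self.entries[start:start + self.burst] = \
                self.table[start:start + self.burst]
            self.valid[line] = True
            self.misses += 1
        return self.entries[index]


table = [(i & 0xF, (i * 3) & 0xFF) for i in range(256)]  # made-up contents
cache = IntervalFractionCache(table)
value = cache.lookup(10)   # miss: entries 8..15 are loaded
cache.lookup(15)           # hit: same burst
cache.lookup(16)           # miss: next burst
```

Fetching a burst per miss rather than a single entry reflects the bandwidth argument in the text: neighboring entries are likely to be needed soon and cost little extra to fetch.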

The copy module 829 copies the input data the number of times set in an internal pixel register. The input stream is stalled while the copy module is copying the current input word. The PBus interface module 830 retimes the signals passing from the pixel organizer 246 to the main data path 242 and the JPEG encoder 241, and vice versa. Finally, the MAG control unit 831 generates the signals that start up and shut down the various sub-modules. It also multiplexes the incoming PBus signals from the main data path 242 and the JPEG encoder 241.

3.18.2 MUV Buffer
As is apparent from the description above, the pixel organizer 246 of FIG. 2 interacts with the MUV buffer 250. The reconfigurable MUV buffer 250 supports several operating modes, including a simple lookup table mode (mode 0), a multiple lookup table mode (mode 1), and a JPEG mode (mode 2). A different type of data object is stored in the buffer in each mode. For example, the data objects stored in the buffer may be data words, values of multiple lookup tables, single-channel data, or multi-channel data. In general, these data objects have different sizes. Furthermore, the data objects stored in the reconfigurable MUV buffer 250 can be accessed in substantially different ways depending on the operating mode of the buffer.

To accommodate the different ways in which different types of data must be stored and retrieved, data objects are often encoded before being stored. The encoding applied to a data object is determined by its size, the format in which it is presented, the way in which it will be retrieved from the buffer, and the organization of the memory modules that make up the buffer.

FIG. 116 is a block diagram of the components used to implement the reconfigurable MUV buffer 250. The reconfigurable MUV buffer 250 comprises an encoder 1290, a storage device 1293, a decoder 1291, and a read address and rotation signal generator 1292. When a data object arrives on the input data stream 1295, it is encoded into an internal representation by the encoder 1290 and placed on the internal data stream 1296. The encoded data object is then stored in the storage device 1293.

When a stored data object is to be decoded, the encoded data is read from the storage device on the encoded data output stream 1297 and decoded by the decoder 1291. The decoded data object then appears on the output data stream 1298.

A write address 1035 for the storage device 1293 is supplied by the MAG 805 (FIG. 108). Write addresses 1299, 1300, and 1301 are likewise supplied by the MAG 805 (FIG. 108) and are distributed to the storage device 1293 by the read address and rotation signal generator 1292. The generator 1292 also produces the input and output rotation signals 1303 and 1304 for the encoder and the decoder respectively. The write enable signals 1306 and 1307 are supplied from an external source. The operating mode signal 1302, supplied by the controller 801 (FIG. 108), is connected to the encoder 1290, the decoder 1291, the read address and rotation signal generator 1292, and the storage device 1293. The increment signal 1308 increments an internal counter in the read address and rotation signal generator and may be used in JPEG mode (mode 2).

When the reconfigurable MUV buffer 250 is in the simple lookup table mode (mode 0), it behaves essentially like a single memory module: data objects are stored in and retrieved from the buffer in essentially the same way as a memory module is accessed.

When the reconfigurable MUV buffer 250 is operating in the multiple lookup table mode (mode 1), the buffer 250 is partitioned into several tables, with up to three lookup tables held in the storage device 1293. The lookup tables can be accessed simultaneously and independently. For example, interval and fraction values are stored in the storage device 1293 in multiple lookup table mode, and the tables are indexed using the three least significant bytes of the input data stream 1295. Each of the three bytes is issued to a separate lookup table held in the storage device 1293.
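Indexing three tables with the three low bytes of one input word can be sketched as follows. The function `multi_lookup`, the assignment of byte 0 to table 0, and the table contents are assumptions for illustration; in hardware the three lookups would proceed in parallel.

```python
def multi_lookup(tables, word):
    """Index three 256-entry tables with the three low bytes of `word`."""
    return tuple(
        tables[i][(word >> (8 * i)) & 0xFF]  # byte i selects an entry of table i
        for i in range(3)
    )


# Made-up tables whose entries record (table number, index) for inspection.
tables = [[(t, i) for i in range(256)] for t in range(3)]
result = multi_lookup(tables, 0xAA112233)  # the top byte 0xAA is ignored
```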

When an image is JPEG-compressed, it is converted into an encoded data stream. Pixels are taken from the original image in the form of MCUs, which are read from left to right and from top to bottom across the image. Each MCU is decomposed into a number of 8 × 8 blocks. The number of 8 × 8 blocks extracted from an MCU depends on several factors, such as the color components of the original image, the multi-channel JPEG mode, and whether subsampling is required. Each 8 × 8 block then undergoes forward DCT (FDCT), quantization, and entropy coding. In JPEG decompression, the encoded data is read sequentially from the data stream and undergoes entropy decoding, inverse quantization, and inverse DCT (IDCT). The output of the IDCT is an 8 × 8 block, and a number of 8 × 8 blocks are combined to reconstruct an MCU. Here too, the number of 8 × 8 blocks depends on the factors mentioned above. The reconfigurable MUV buffer 250 can be used both to decompose MCUs into 8 × 8 blocks and to reassemble 8 × 8 blocks into MCUs.

When the reconfigurable MUV buffer 250 is operating in JPEG mode, the input data stream 1295 to the buffer 250 carries pixels in JPEG compression or single-component data in JPEG decompression. The output data stream of the buffer 250 carries single-channel data blocks in JPEG compression or pixel data in JPEG decompression. In this JPEG compression example, an input pixel may have up to four channels, denoted Y, U, V, and O. Once the required number of pixels has been accumulated as a complete pixel block, extraction of single-component data blocks can begin. Each single-component data block consists of the data of the same channel taken from each pixel stored in the buffer, so in this example up to four single-component data blocks can be extracted from one pixel data block. In this embodiment, when the reconfigurable MUV buffer 250 is operating in JPEG mode (mode 2) for JPEG compression, a number of minimum coded units (MCUs), each containing 64 single-channel or multi-channel pixels, can be stored in the buffer, and a number of 64-byte single-channel component data blocks can be extracted from each MCU stored in the buffer.
Conversely, while the buffer 1289 is in JPEG mode (mode 2) for JPEG decompression, the output data stream consists of output pixels with up to four components Y, U, V, and O. Once the required number of complete single-component data blocks has been written to the buffer, extraction of pixel data can begin. One byte from each of up to four single-component data blocks, corresponding to the different color components, is retrieved to form an output pixel.

FIG. 117 is a detailed view of the encoder 1290 of FIG. 116. In pixel block decomposition, each input data object is encoded by a byte-wise rotation before it is stored in the storage device 1293 (FIG. 129). The amount of rotation is determined by the input rotation control signal 1303. In this example, where pixel data is at most four bytes wide, the 32-bit 4-input, 1-output multiplexers 1320 and 1325 select one of the four possible rotations of the input pixel. For example, if the four bytes of a pixel are labeled (3, 2, 1, 0), the possible rotations of the pixel are (3, 2, 1, 0), (0, 3, 2, 1), (1, 0, 3, 2) and (2, 1, 0, 3). The four encoded bytes are output to the storage device 1293.
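The four rotations listed above can be reproduced with a short sketch. It is only an illustration of byte-wise rotation on a 32-bit word, not the multiplexer circuit itself; the function name `rotate_right_bytes` and the sample pixel value are assumptions introduced here, with byte 3 taken as the most significant byte.

```python
MASK32 = 0xFFFFFFFF

def rotate_right_bytes(word, r):
    """Rotate a 32-bit word right by r bytes (r is taken modulo 4)."""
    shift = 8 * (r & 3)
    return ((word >> shift) | (word << (32 - shift))) & MASK32

# Hypothetical pixel whose bytes 3, 2, 1, 0 are D3, C2, B1, A0.
pixel = 0xD3C2B1A0
rotations = [rotate_right_bytes(pixel, r) for r in range(4)]
# rotations[1] corresponds to the ordering (0, 3, 2, 1), and so on.
```

In hardware the same selection would be made by a 4-to-1 multiplexer over four fixed wirings; the arithmetic form is used here only because it is easy to check.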

When the buffer is in a mode other than JPEG mode (mode 2), for example single lookup table mode (mode 0) or multiple lookup table mode (mode 1), byte-wise rotation is unnecessary and is not applied to the input data objects. In these modes, rotation of the input data object is prevented by overriding the input rotation control signal with a no-rotation value, the no-operation value 1323. The 2-input, 1-output multiplexer 1321 generates the control signal 1326 by selecting between the input rotation control signal 1303 and the no-operation value 1323. The current processing mode 1302 is compared with the value of the pixel block decomposition mode to generate the multiplexer select signal. The 4-input, 1-output multiplexer 1320, controlled by the signal 1326, selects one of the four rotations of the input data object and places the encoded input data object on the encoded input data stream 1296.

FIG. 118 is a circuit diagram of a combinational circuit implementing the decoder 1291, which decodes the encoded output data stream 1297. The decoder 1291 operates in essentially the same manner as the encoder. It manipulates data only when the data buffer is in JPEG mode (mode 2). The lower 32 bits of each encoded output data object on the encoded output data stream 1297 are passed to the decoder. The data is decoded by a byte-wise rotation in the opposite sense to the rotation applied by the encoder 1290. A 32-bit 4-input, 1-output multiplexer selects one of the four possible rotations of the encoded data. For example, if the four bytes of the input pixel are labeled (3, 2, 1, 0), the four possible rotations of the pixel are (3, 2, 1, 0), (2, 1, 0, 3), (1, 0, 3, 2) and (0, 3, 2, 1). The output rotation control signal 1304 is used when the buffer is in pixel block decomposition mode and is overridden by the no-operation value in the other operating modes. The no-operation value 1333 is 0. The 2-input, 1-output multiplexer 1331 generates the signal 1334 by selecting between the output rotation control signal 1304 and the no-operation value 1333. The current processing mode 1302 is compared with the value of the pixel block decomposition mode to generate the multiplexer select signal 1332. The 4-input, 1-output multiplexer 1330, controlled by the signal 1334, selects one of the four rotations of the encoded output data object on the encoded output data stream 1297 and produces the output data on the output data stream 1298.
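The relationship between the encoder rotation, the decoder rotation and the mode-dependent override can be sketched as follows. This is a behavioural model under stated assumptions, not the circuit of FIGS. 117 and 118: the mode constants, the function names and the use of 0 as the no-operation value on the encoder side are assumptions made for illustration (the text gives 0 only for the decoder value 1333).

```python
MASK32 = 0xFFFFFFFF
MODE_SINGLE_LUT, MODE_MULTI_LUT, MODE_JPEG = 0, 1, 2
NO_OPERATION = 0  # assumed no-rotation value for both encoder and decoder

def effective_rotation(mode, rotation_control):
    """Model of the 2-to-1 multiplexer: pass the control only in JPEG mode."""
    return (rotation_control & 3) if mode == MODE_JPEG else NO_OPERATION

def encode(word, mode, rotation_control):
    """Encoder model: rotate right by the selected number of bytes."""
    shift = 8 * effective_rotation(mode, rotation_control)
    return ((word >> shift) | (word << (32 - shift))) & MASK32

def decode(word, mode, rotation_control):
    """Decoder model: rotate left, i.e. in the opposite sense to encode()."""
    shift = 8 * effective_rotation(mode, rotation_control)
    return ((word << shift) | (word >> (32 - shift))) & MASK32
```

With the same rotation control, `decode(encode(w, MODE_JPEG, r), MODE_JPEG, r)` returns `w` for every `r`, and in the lookup table modes both functions leave the word unchanged.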

In FIG. 116, the method of generating the internal read addresses used by the circuit is selected by the processing mode 1302 of the reconfigurable MUV buffer 250. In single lookup table mode (mode 0) and multiple lookup table mode (mode 1), the read addresses are generated by the MAG 805 (FIG. 108) in the form of the external read addresses 1299, 1300 and 1301. In single lookup table mode (mode 0), the memory modules 1380, 1381, 1382, 1383, 1384 and 1385 (FIG. 121) of the storage device 1293 operate together. The write address and the read address supplied to the memory modules 1380 to 1385 (FIG. 121) are essentially the same for every module. That is, the storage device 1293 requires the external circuitry to supply only one read address and one write address, and uses internal logic to distribute these addresses to the memory modules 1380 to 1385 (FIG. 121). In mode 0, the read address is supplied by the external address 1299 (FIG. 116) and is passed essentially unchanged to the internal address 1348 (FIG. 121). The internal read addresses 1349, 1350 and 1351 (FIG. 121) are not used in mode 0. The write address is supplied by the external write address 1305 (FIG. 116) and is connected, essentially without modification, to the write address of each of the memory modules 1380 to 1385 (FIG. 121).

A configuration with three lookup tables in multiple lookup table mode (mode 1) is described here. Because the three tables are accessed independently while the encoded input data is written to all of the memory modules 1380 to 1385 (FIG. 121) simultaneously, one index is required for each of the three tables. The three indices, that is, the read addresses for the memory modules 1380 to 1385 (FIG. 121), are supplied to the storage device 1293. These read addresses are distributed to the appropriate memory modules 1380 to 1385 by internal logic. In essentially the same way as in single lookup table mode, the externally supplied write address is connected, essentially without modification, to the write address of each of the memory modules 1380 to 1385. Consequently, in multiple lookup table mode (mode 1), the external read addresses 1299, 1300 and 1301 are passed to the internal read addresses 1348, 1349 and 1350 respectively. The internal read address 1351 is not used in mode 1. The internal address generation method used in JPEG mode (mode 2) differs from the methods described above.

FIG. 119 is a circuit diagram of a combinational circuit implementing the read address and rotation signal generation circuit 1292 for the reconfigurable data buffer in JPEG mode (mode 2) when JPEG compression is performed. In JPEG mode (mode 2), the signal generator 1292 uses the outputs of the component block counter 1340 and the data byte counter 1341 to compute the internal read addresses of the memory modules that make up the storage device 1293. The component block counter 1340 supplies the number of component blocks that have been extracted from the pixel data block held in the storage device. The number of blocks is given by multiplying the output of the data byte counter 1341 by four. Specifically, the internal read addresses 1348, 1349, 1350 and 1351 in pixel block decomposition mode are computed as follows. The component block counter is used to compute the offset values 1343, 1344, 1345 and 1347, and the output data byte counter 1341 is used to generate the base read address 1354. The offset value 1343 is added (1358) to the base read address 1354, and the sum is the internal read address 1348 (and likewise for 1349, 1350 and 1351). The offset values generally differ from one memory module to another for a simultaneous read across the memory modules, but remain essentially the same throughout the extraction of a component block. The same applies to the base address 1354 used to compute the four internal read addresses in pixel data block decomposition mode. The increment signal 1308 is used as the increment signal for the component byte counter, which is incremented after each successful read. The component block counter increment signal 1356 increments the component block counter 1340 after a single-component data block has been successfully fetched from the buffer.

The output rotation control signal 1304 (FIG. 116) is derived from the output of the component block counter and the output of the output data byte counter, in essentially the same way as the internal addresses. The output of the component block counter is used to compute the rotation offset 1347. The output rotation control signal 1304 is given by the two least significant bits of the sum of the rotation offset 1355 and the base read address 1354. The input rotation control signal is given by the two least significant bits of the external write address 1305, as in this example of the address and rotation control signal generator.
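The reason for combining a write-side rotation with per-module read offsets and a read-side rotation can be seen in a small simulation. The sketch below is a simplified model of skewed storage across four byte-wide banks, not the exact circuit of FIG. 119: the function names, the list-based banks and the particular offset formula are assumptions chosen so that the example is self-consistent. Pixels are written with a rotation equal to the low two bits of the write address; one channel of four consecutive pixels can then be read in a single access.

```python
NUM_BANKS = 4  # four byte-wide memory banks (assumed model)

def rotate(word, r):
    """Move the byte at position j of a 4-byte list to position (j + r) % 4."""
    return [word[(m - r) % 4] for m in range(4)]

def write_pixels(pixels):
    """Store each 4-byte pixel at address a, rotated by the low 2 bits of a."""
    banks = [[0] * len(pixels) for _ in range(NUM_BANKS)]
    for a, px in enumerate(pixels):
        rotated = rotate(px, a & 3)
        for m in range(NUM_BANKS):
            banks[m][a] = rotated[m]
    return banks

def read_component(banks, c, base):
    """Read channel c of pixels base..base+3 (base a multiple of 4) at once."""
    # Each bank receives the base address plus its own offset.
    raw = [banks[m][base + ((m - c) % 4)] for m in range(NUM_BANKS)]
    # Output rotation: byte k of the result comes from bank (k + c) % 4.
    return [raw[(k + c) % 4] for k in range(4)]

# Eight hypothetical pixels; channel j of pixel a holds the value 10 * a + j.
pixels = [[10 * a + j for j in range(4)] for a in range(8)]
banks = write_pixels(pixels)
component_2 = read_component(banks, 2, 4)
```

In this model the per-bank offsets depend only on the component being extracted, and the output rotation is the low two bits of the component index plus the base address, which is loosely analogous to the offset and rotation signals described above.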

FIG. 120 shows another address generator 1292, used to reconstruct multi-channel pixel data from the single-component data stored in the reconfigurable MUV buffer 250. In this case the buffer is in JPEG mode (mode 2) for JPEG decompression: single-component data blocks are stored in the buffer and pixel data blocks are fetched from it. In this example, the write address for the memory modules is supplied, essentially unchanged, by the external write address 1305. Single-component blocks are stored in contiguous memory. The input rotation control signal 1303 in this example is simply set to the two least significant bits of the write address. The pixel counter 1360 keeps a record of the number of pixels that have been extracted from the single-component blocks stored in the buffer. The output of the pixel counter is used to generate the read addresses 1348, 1349, 1350 and 1351 and the output rotation control signal 1304. In general, the read address differs for each of the modules that make up the storage device 1293. In this example, a read address consists of two parts: a single-component block index 1362, 1363, 1364 or 1365, and a byte index 1361. To compute the single-component block index for a particular block, an offset is added to bits 3 and 4 of the output pixel counter. In general, the offsets 1366, 1367, 1368 and 1369 differ for each read address. Bits 2 to 0 of the pixel counter are used as the byte index 1361 of the read address. As shown in FIG. 120, the read address is the concatenation of the single-component block index 1362, 1363, 1364 or 1365 with the byte index 1361. In this example, the output rotation control signal 1304 is formed, essentially unchanged, from bits 4 and 3 of the pixel counter output. The increment signal 1308 is used as the pixel counter increment signal for the pixel counter 1360, which is incremented each time a pixel is successfully fetched from the buffer.
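The bit-field arithmetic described for FIG. 120 can be written out as a brief sketch. The field positions (bits 4 to 3 and bits 2 to 0 of the pixel counter) follow the text; the function names, the wrap-around of the block index modulo 4 and the sample offset are assumptions made for illustration only.

```python
def read_address(pixel_counter, offset):
    """Concatenate a block index with a byte index, as described for FIG. 120."""
    byte_index = pixel_counter & 0b111          # bits 2..0 of the pixel counter
    counter_bits = (pixel_counter >> 3) & 0b11  # bits 4..3 of the pixel counter
    block_index = (counter_bits + offset) & 0b11  # modulo-4 wrap is assumed
    return (block_index << 3) | byte_index

def output_rotation(pixel_counter):
    """Output rotation control: bits 4..3 of the pixel counter, unchanged."""
    return (pixel_counter >> 3) & 0b11
```

For instance, with a pixel counter of 0b10101 and a hypothetical offset of 1, the byte index is 5 and the block index is 3, so the read address is 0b11101.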

FIG. 121 shows the structure of the storage device 1293. The storage device 1293 can comprise three 4-bit-wide memory modules 1383, 1384 and 1385 and three 8-bit-wide memory modules 1380, 1381 and 1382. The memory modules can be combined to store 36-bit words in single lookup table mode (mode 0), three 12-bit words in multiple lookup table mode (mode 1), and 32-bit pixels or 4 × 8-bit single-component data in JPEG mode (mode 2). Each memory module is normally associated with a different part of the encoded input and output data streams (1296 and 1297). For example, the memory module 1380 has a data input port connected to bits 0 to 7 of the encoded input data stream 1296 and a data output port connected to bits 0 to 7 of the encoded output data stream 1297. In this example, the write addresses of all memory modules are connected together and always share the same value. By contrast, the read addresses 1386 to 1391 of the memory modules shown in FIG. 121 are supplied by the read address generator 1292 and generally take different values. In this example, one common write enable signal provides the write enable for all of the 8-bit memory modules, and a second common write enable signal provides the write enable for all of the 4-bit memory modules.

FIG. 122 is a circuit diagram of a combinational circuit that generates the read addresses 1386 to 1391 for accessing the memory modules in the storage device 1293. Each encoded input data object is split into parts, and each part is stored in a separate memory module of the storage device. The write address of all memory modules is therefore normally essentially the same in every processing mode, and virtually no logic is needed to compute it. The read addresses, by contrast, normally differ from one processing mode to another and, within each processing mode, from one memory module to another. Every byte in the output data stream 1298 of the reconfigurable MUV buffer 250 must contain either single-component data extracted from the pixel data stored in the buffer (JPEG mode (mode 2) for JPEG compression) or pixel data extracted from the single-component data stored in the buffer (JPEG mode for JPEG decompression). This requirement on the output data is met by generating four read addresses 1348, 1349, 1350 and 1351 for the buffer. In multiple lookup table mode (mode 1), up to three lookup tables are stored in the buffer, so up to three read addresses 1348, 1349 and 1350 are needed to index the three tables. In single lookup table mode (mode 0), the read addresses of all memory modules are the same, and only the read address 1348 is used. The example control circuit shown in FIG. 122 uses the processing mode signal of the buffer and up to four read addresses to compute the read addresses 1386 to 1391 of the six memory modules that make up the storage device 1293. The read address generator 1292 takes as its inputs the external read signals carried on the address buses 1348, 1349, 1350 and 1351 and generates the internal read addresses 1386 to 1391 of the memory modules that make up the storage device 1293.

FIG. 123 shows how 20-bit matrix coefficients are stored in the buffer 250 when the buffer 250 is in single lookup table mode. In this case, no encoding is normally applied to the data objects as they are written to the reconfigurable MUV buffer. The matrix coefficients are stored in the 8-bit memory modules 1380, 1381 and 1382: bits 7 to 0 of a matrix coefficient are stored in the memory module 1380, bits 15 to 8 in the memory module 1381, and bits 19 to 16 in the lower 4 bits of the memory module 1382. A data object stored in the buffer is fetched as many times as required for the remainder of the instruction. In single lookup table mode, the read and write addresses of all memory modules are essentially the same.
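The bit assignment described for FIG. 123 amounts to slicing a 20-bit value into three fields. A minimal sketch follows; the function names are hypothetical and the three returned values simply stand for the contents written to the modules 1380, 1381 and 1382.

```python
def split_coefficient(coeff):
    """Split a 20-bit coefficient into the parts held by three 8-bit modules."""
    if not 0 <= coeff < (1 << 20):
        raise ValueError("coefficient must fit in 20 bits")
    low = coeff & 0xFF           # bits 7..0   -> module 1380
    mid = (coeff >> 8) & 0xFF    # bits 15..8  -> module 1381
    high = (coeff >> 16) & 0x0F  # bits 19..16 -> lower 4 bits of module 1382
    return low, mid, high

def join_coefficient(low, mid, high):
    """Reassemble the 20-bit coefficient from the three module outputs."""
    return (low & 0xFF) | ((mid & 0xFF) << 8) | ((high & 0x0F) << 16)
```

Because all modules share one read address in this mode, a single access returns all three parts, and `join_coefficient` models the reassembly on the output side.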

FIG. 124 shows how table entries are stored in the buffer in multiple lookup table mode (mode 1). In this case, three lookup tables are stored in the buffer, and each table entry has a 4-bit interval value and an 8-bit fraction value. The interval values are normally stored in the 4-bit memory modules and the fraction values in the 8-bit memory modules. The three lookup tables 1410, 1411 and 1412 are stored in the memory banks 1380 and 1383, 1381 and 1384, and 1382 and 1385 respectively. The separate write enable control signals 1306 and 1307 (FIG. 121) allow interval values to be written to the storage device 1293 without affecting the fraction values already stored there. In essentially the same way, fraction values can be written without affecting the interval values.
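The effect of the two separate write enables can be modelled with a small class. This is a sketch under assumptions: the class name, the method signatures and the table depth are hypothetical, and one instance stands for a single lookup table held in one 4-bit module and one 8-bit module.

```python
class LookupTableBank:
    """One lookup table: a 4-bit interval store and an 8-bit fraction store."""

    def __init__(self, depth):
        self.interval = [0] * depth  # models a 4-bit-wide memory module
        self.fraction = [0] * depth  # models an 8-bit-wide memory module

    def write(self, addr, interval=0, fraction=0,
              we_interval=False, we_fraction=False):
        """Write only the fields whose write enable is asserted."""
        if we_interval:
            self.interval[addr] = interval & 0x0F
        if we_fraction:
            self.fraction[addr] = fraction & 0xFF

    def read(self, addr):
        """Return the (interval, fraction) pair stored at addr."""
        return self.interval[addr], self.fraction[addr]
```

Writing with only `we_interval` asserted leaves the stored fraction untouched, and vice versa, which is the behaviour the separate enable signals are described as providing.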

FIG. 125 shows how pixel data is written to the reconfigurable MUV buffer 250 in JPEG mode (mode 2) when pixel data blocks are decomposed into single-component data blocks. The storage device 1293 is organized as four 8-bit memory banks formed from the memory modules 1380, 1381, 1382, 1383 and 1384, with the memory modules 1383 and 1384 combined and treated in the same way as a single 8-bit memory module. The memory module 1385 is not used in JPEG mode (mode 2). Each 32-bit encoded pixel is split into four bytes, each of which is stored in a different 8-bit memory bank.

FIG. 126 shows how single-component data blocks are stored in the storage device 1293 in single-component mode. As in FIG. 125, the storage device 1293 is organized as four 8-bit memory banks formed from the memory modules 1380, 1381, 1382, 1383 and 1384, with the memory modules 1383 and 1384 combined and treated in the same way as a single 8-bit memory module, and the memory module 1385 is not used in JPEG mode (mode 2). Each 32-bit encoded word is split into four bytes, each of which is stored in a different 8-bit memory bank. In this case, a single-component block consists of 64 bytes. When the single-component blocks are written to the buffer, a different amount of byte rotation is applied to each block. The 32-bit encoded pixel data is then fetched by reading from the different single-component data blocks in the buffer.

For more detail on how the reconfigurable data buffer 250 is organized, see the section on the pixel organizer. The embodiment above shows that a reconfigurable data buffer can be used to handle the data associated with different instructions. A reconfigurable data buffer with three processing modes has been described, and a different address generation technique is required in each mode. Single lookup table mode (mode 0) is used to store matrix coefficients in the buffer for image transformation. Multiple lookup table mode (mode 1) is used to store a number of interval and fraction lookup tables in the buffer for multi-channel color space conversion (CSC). JPEG mode (mode 2) is used to decompose MCU data into 8 × 8 single-component blocks for JPEG compression, or to reassemble 8 × 8 single-component blocks into MCUs for JPEG decompression.

3.18.3 Result Organizer
The MUV buffer 250 is also used by the result organizer 249. The result organizer 249 buffers and formats the data stream from the main data path 242 or the JPEG coder 241. It is also responsible for the compression, decompression, denormalization, byte lane swapping and reorganization of result data described with reference to FIG. 42. In addition, the result organizer 249 transfers its results on request to the external interface controller 238, the local memory controller 236 and the peripheral interface controller 237.

In JPEG decompression mode, the result organizer 249 uses the MUV RAM 250 to double-buffer the image data from the JPEG coder 241. Double buffering improves the performance of JPEG decompression: while data from the JPEG coder 241 is being written into one half of the MUV RAM 250, the image data already written into the other half is simultaneously output to the designated destination.
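The double-buffering arrangement can be illustrated with a simple software model. It is a sketch of the general ping-pong technique only; the class name, the method names and the explicit `swap` call are assumptions, and in hardware the write and the drain would proceed concurrently rather than in sequence.

```python
class DoubleBuffer:
    """Two halves of one memory: one is filled while the other is drained."""

    def __init__(self, half_size):
        self.halves = [[0] * half_size, [0] * half_size]
        self.fill = 0  # index of the half currently being written

    def write_block(self, data):
        """Write decoded data into the half currently being filled."""
        self.halves[self.fill][:len(data)] = data

    def swap(self):
        """Exchange the roles of the two halves."""
        self.fill ^= 1

    def drain(self):
        """Return a copy of the half that is not being written."""
        return list(self.halves[self.fill ^ 1])
```

After each `swap`, the block just written becomes available through `drain` while the next block is written into the other half.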

During JPEG decompression, 1-, 3- and 4-channel image data is passed to the result organizer 249 in the form of 8 × 8 blocks, each containing 8-bit components from a single channel. The result organizer stores these blocks in the MUV RAM 250 in the specified order and, for multi-channel interleaved images, meshes the channels as the data is read back from the MUV RAM 250. For example, with 3-channel YUV JPEG data, the JPEG coder 241 outputs three 8 × 8 blocks in the order Y, then U, then V. Meshing is performed by taking one component from each block and assembling pixels of the form (YUVX), where X is an unused channel. Byte swapping is performed when the output channels need to be swapped. The result organizer must also perform any resampling needed to reconstruct the chroma data of the decompressed output where subsampling was used. This involves repeating samples of the relevant channels to generate the output.
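The meshing of three single-channel blocks into (YUVX) pixels can be sketched in a few lines. The function name and the fill value for the unused channel are assumptions, and chroma resampling is deliberately left out of this sketch.

```python
def mesh_blocks(y_block, u_block, v_block, unused=0):
    """Interleave three 64-sample blocks into 64 (Y, U, V, X) pixels."""
    if not (len(y_block) == len(u_block) == len(v_block) == 64):
        raise ValueError("each block must hold 8 x 8 = 64 samples")
    # Take one component from each block per pixel; X is an unused channel.
    return [(y, u, v, unused) for y, u, v in zip(y_block, u_block, v_block)]

# Hypothetical blocks in the order the coder is described as emitting them.
y = list(range(64))
u = [100] * 64
v = [200] * 64
pixels = mesh_blocks(y, u, v)
```

Each resulting tuple corresponds to one 32-bit output pixel, with the fourth byte available for a fourth channel in the 4-channel case.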

Turning to FIG. 127, the result organizer 249 of FIG. 2 is shown in more detail. The result organizer 249 is built around the usual standard CBus interface 840, which contains the register file of the registers that configure its operation. The operation of the result organizer 249 is similar to that of the pixel organizer, except that the data manipulation is performed in reverse. The data manipulation unit 842 performs byte lane swapping, component substitution, component release and denormalization on the data supplied via the MUV address generator (MAG) 805. These operations are as described above with reference to FIG. 42 and are carried out according to the various fields set in the internal registers. The FIFO queue 843 buffers the output data before it is output by the RBus control unit 844. The RBus control unit 844 comprises an address decoder and an address generator. The destination address for the stored data is held in an internal register, together with the number of output bytes required. In addition, the internal RO_CUT register specifies how many output bytes are discarded at the start before output is sent on the byte stream of the output bus, and the RO_LMT register specifies the maximum number of data items to be output, with subsequent data beyond the output limit being discarded. The MAG 805 generates the addresses for the MUV RAM 250 during JPEG decompression. The MUV RAM 250 is used to double-buffer the output of the JPEG coder. The MAG 805 meshes the components in the MUV RAM 250 according to the internal configuration registers and outputs single-channel, 3-channel or 4-channel pixels. Because byte lane swapping is required before the pixel data is stored at its destination, the data obtained from the MUV RAM 250 is passed through the data manipulation unit. When the result organizer 249 is not in JPEG mode, the MAG 805 simply passes the data from the PBus receiver 845 directly to the data manipulation unit 842.
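One plausible reading of the RO_CUT and RO_LMT registers is sketched below: a number of leading items is dropped and the output is then capped at a maximum count. This interpretation is an assumption, as are the function name, the parameter names and the choice to count the limit after the cut; the text does not state the units or the exact ordering.

```python
from itertools import islice

def cut_and_limit(stream, ro_cut, ro_lmt):
    """Drop the first ro_cut items, then pass on at most ro_lmt items."""
    # Assumed semantics: the limit is counted after the cut has been applied.
    return list(islice(stream, ro_cut, ro_cut + ro_lmt))

# Ten hypothetical output items, cutting 2 and limiting to 5.
result = cut_and_limit(range(10), 2, 5)
```

Under these assumptions, items beyond the limit are simply never forwarded, which matches the description of data after the output limit being discarded.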

3.18.4 Operand Organizers B and C
Returning to FIG. 2, the two independent operand organizers 247 and 248 buffer data from the data cache controller 240 and transfer it to the JPEG coder 241 or the main data path 242. The operand organizers 247 and 248 operate in a number of modes:
(a) Idle mode, in which the operand organizer responds only to CBus requests.
(b) Direct mode, in which the data of the current instruction is held in internal registers of the operand organizer.
(c) Sequential mode, in which the operand organizer generates sequential addresses for fetching data via the data cache controller 240, depending on the state of its buffer.

Many operating modes of the main data path 242 require at least one of the operand organizers to be in sequential mode. These modes include compositing, in which operand organizer B 247 buffers the pixels that are to be composited with another image. Operand organizer C 248 is used for compositing operations that attenuate the values of each data channel. In halftoning mode, operand organizer B 247 buffers 8-bit matrix coefficients, and in hierarchical image format decompression mode it buffers data for both the vertical interpolation and the residual merging instructions.
(d) Constant mode, in which operand organizer B assembles a single internal data word and repeats that word the number of times specified by an internal register.
(e) Tile mode, in which operand organizer B buffers the data that makes up a pixel tile.
(f) Random mode, in which the operand organizer forwards addresses from the MDP 242 or the JPEG coder 241 directly to the data cache controller.

An internal length register determines the number of items generated by each of the operand organizers 247, 248 when operating in sequential, tile or constant mode. Each operand organizer 247, 248 keeps a count of the data items processed so far and stops when the count reaches the value determined by the internal register. Each operand organizer is also responsible for formatting input data by means of byte lane swapping, component substitution, packing and unpacking, and normalization functions. The required operations are configured through internal registers. Further, each of the operand organizers 247 and 248 can be configured to limit data items.

FIG. 128 shows the structure of the operand organizers (247, 248) in more detail. Each operand organizer 247, 248 includes the usual standard CBus interface and registers 850 that govern overall control of the operand organizer. Further, an OBus control unit 851 is connected to the data cache controller and is responsible for generating addresses in sequential, tile and constant modes, generating the control signals that enable communication over the OBus interface of the operand organizers 247, 248, and controlling those data manipulation unit operations, such as normalization and replication, that require state saved from previous clock cycles of the input stream. When an operand organizer 247, 248 is in sequential or tile mode, the OBus control unit 851 sends requests for data to the data cache controller, at addresses determined by internal registers.

Each operand organizer further contains a 36-bit wide FIFO buffer 852, which is used to buffer data from the data cache controller 240 in the various modes of operation. The data manipulation unit 853 performs the same functions as the corresponding data manipulation unit 804 of the pixel organizer 246.

The main data path/JPEG coder interface 854 multiplexes the data and addresses exchanged with the main data path and JPEG coder modules 242, 241 in the normal operating mode. The MDP/JC interface 854 passes data from the data manipulation unit 853 to the main data path and, in the process, can be configured to replicate that data. In color conversion mode, the units 851 and 854 are bypassed to provide high-speed access to the data cache controller 240 and the color conversion tables.

3.18.5 Main data path unit
The features of the following embodiment relate to an image processor that provides a low-cost computer architecture capable of performing a number of image processing operations at high speed. A further aim is for the image processor to provide a flexible computer architecture that can be configured to perform image processing operations that were not originally specified. Another aim is for the image processor to provide a computer architecture containing a large amount of identical logic, which makes the design process simpler and cheaper.

The computer architecture comprises a control register block, a decoding block, a data object processor and flow control logic. The control register block stores all the information relating to an image processing operation. The decoding block decodes this information into configuration signals, which configure the input data object interface. The input data object interface receives data objects from outside, stores them, and distributes them to the data object processor. In some image processing operations the input data object interface also generates the addresses of the data objects, so that the source of those data objects can supply the correct ones. The data object processor performs arithmetic operations on the data objects it receives. The flow control logic controls the flow of data objects through the data object processing logic.

In particular, the data object processor can comprise a number of identical data object sub-processors, each of which processes one part of an incoming data object. A data object sub-processor has a number of identical multi-function arithmetic units that perform arithmetic operations on that part of the data object, post-processing logic that processes the output data object, and multiplexing logic that connects the multi-function arithmetic units to the post-processing logic. Each multi-function arithmetic unit includes storage for calculated data objects. This storage is enabled or disabled by the flow control logic. The multi-function arithmetic units and the multiplexing logic are configured by the configuration signals generated by the decoding logic.

The configuration signals from the decoding logic can also be overridden by an external programming agent. Through this mechanism, any multi-function block and any part of the multiplexing logic can be configured individually by the external programming agent, which makes it possible to configure the image processor to perform image processing operations that were not specified in advance. These and other features of the embodiments of the present invention are described in detail below.

As described above with reference to FIG. 2, the main data path unit 242 performs all data manipulation operations and instructions other than JPEG data coding. These instructions include compositing, color space conversion, image transformation, convolution, matrix multiplication, halftoning, memory copying and hierarchical image format decompression. The main data path 242 receives pixel and operand data from the pixel organizer 246 and the operand organizers 247, 248, and sends its results to the result organizer 249.

FIG. 129 is a block diagram of the main data path unit 242. The main data path unit 242 is a general-purpose image processor and comprises an input interface 1460, an image data processor 1462, an instruction word register 1464, an instruction word decoder 1468, a control signal register 1470, a register file 1472 and a ROM 1475.

The instruction controller 235 transfers instruction words to the instruction word register 1464 over bus 1454. Each instruction word contains information such as the type of image processing operation to be executed and flags that select the various options of that operation. The instruction word is carried to the instruction word decoder 1468 over bus 1465. The instruction controller 235 can then instruct the instruction word decoder 1468 to decode the instruction word. On receiving this instruction, the instruction word decoder 1468 decodes the instruction word into control signals, which are carried over bus 1469 to the control signal register 1470. The output of the control signal register is in turn connected to the input interface 1460 and the image data processor 1462 over bus 1471.

To make the main data path unit 242 more flexible, the instruction controller 235 can also write directly to the control signal register 1470. This allows anyone familiar with the structure of the main data path unit 242 to configure it in fine detail, so that the main data path unit 242 can also execute image processing operations that are not described by any instruction word.

If the instruction word cannot hold all the information needed to execute the desired image processing operation, the instruction controller 235 can write the remaining information into selected registers of the register file 1472. This information is passed to the input interface 1460 and the image data processor 1462 over bus 1473. In some image processing operations, the input interface 1460 may update the contents of selected registers in the register file 1472 to reflect the current state of the main data path unit 242. If a problem arises while an image processing operation is being executed, the instruction controller 235 can use this feature to locate the problem easily.

Once the instruction word has been decoded and the desired control signals have been loaded into the control signal register, the instruction controller 235 can instruct the main data path unit 242 to begin executing the desired image processing operation. On receiving this instruction, the input interface 1460 begins to accept data objects from bus 1451. Depending on the type of image processing operation being executed, the input interface 1460 either begins to accept operand data from operand bus 1452 or operand bus 1453, or generates the addresses of the operand data and then accepts that data from operand bus 1452 or operand bus 1453. The input interface 1460 stores and rearranges the incoming data according to the output of the control signal register 1470. For calculations such as affine image transformation and convolution, the input interface 1460 also generates the coordinates to be fetched over buses 1452 and 1453.

The image data processor 1462 performs the main arithmetic operations on the data objects rearranged by the input interface 1460. The image data processor 1462 can perform the following operations: interpolation between two data objects using a given interpolation factor; multiplication of two data objects followed by division of the result by 255; ordinary multiplication and addition of two data objects; truncation of the fractional part of a data object to various precisions; clamping, which limits overflow of a data object to a maximum value and underflow to a minimum value; and scaling and clamping of a data object. The control signals on bus 1471 determine which of these arithmetic operations are applied to the data objects, and in what order.
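
As an illustration only, three of the operations listed above can be sketched in software for a single 8-bit channel. The function names, the use of truncating integer division, and the convention that an interpolation factor of 255 means "fully the second operand" are assumptions made for this sketch; the description does not specify rounding behavior or the encoding of the interpolation factor.

```python
def mul_div255(a, b):
    """Multiply two channel values, then divide the product by 255.

    Truncating division is an assumption of this sketch.
    """
    return (a * b) // 255


def clamp(value, minimum=0, maximum=255):
    """Limit overflow to `maximum` and underflow to `minimum`."""
    return max(minimum, min(maximum, value))


def interpolate(a, b, factor):
    """Interpolate between a and b with a factor in 0..255 (assumed scale)."""
    return a + ((b - a) * factor) // 255


if __name__ == "__main__":
    print(mul_div255(128, 255))      # full-scale multiplier leaves the value unchanged
    print(clamp(300), clamp(-5))     # overflow and underflow are limited
    print(interpolate(10, 20, 255))  # factor 255 selects the second operand
```

In the hardware described, the control signals on bus 1471 would select and order such operations; here they are simply separate functions.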

The ROM 1475 holds the quotients 255/x, truncated to 8.8 fixed-point format, where x is a number from 0 to 255. The ROM 1475 is connected to the input interface 1460 and the image data processor 1462 over bus 1476. The ROM 1475 is used to generate blends of short length, and for operations in which a data object is multiplied by 255 and the result is divided by another data object.
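
The contents of such a table can be sketched as follows. This is a minimal illustration, assuming that "8.8 format" means a 16-bit value with 8 integer bits and 8 fractional bits; the entry stored for x = 0 is not stated in the description, so the sketch stores 0 there as a placeholder. The names `build_reciprocal_rom` and `mul255_div` are hypothetical.

```python
def build_reciprocal_rom():
    """Return 256 entries holding 255/x truncated to 8.8 fixed point."""
    rom = [0] * 256              # entry 0 is a placeholder (assumption)
    for x in range(1, 256):
        rom[x] = (255 << 8) // x  # truncate, do not round
    return rom


def mul255_div(a, b, rom):
    """Approximate a * 255 / b with one table lookup and one multiply."""
    return (a * rom[b]) >> 8     # drop the 8 fractional bits


if __name__ == "__main__":
    rom = build_reciprocal_rom()
    print(rom[1], rom[2], rom[255])   # 255.0, 127.5 and 1.0 in 8.8 format
    print(mul255_div(100, 200, rom))  # truncated approximation of 127.5
```

Because each entry is truncated, the result can fall slightly below the exact quotient; the sketch makes no attempt to correct for this.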

The number of operand buses, for example 1452, is limited to two, which is sufficient for the great majority of image processing operations. FIG. 130 shows the input interface 1460 in more detail. The input interface 1460 comprises a data object interface unit 1480, operand interface units 1482 and 1484, an address generation state machine 1486, a blend generation state machine 1488, a matrix multiplication state machine 1490, an interpolation state machine 1494, a data synchronizer 1500, an arithmetic unit 1496, miscellaneous registers 1498, and data distribution logic 1503.

The data object interface unit 1480 and the operand interface units 1482 and 1484 receive data objects and operands from outside. The interface units 1482, 1484 are both configured by control signals from control bus 1515. Each of the interface units 1482, 1484 contains a data register holding the data object or operand just received, and each asserts a VALID signal when its data register holds valid data. The outputs of the data registers of the interface units 1482, 1484 are connected to data bus 1505, and their VALID signals are connected to flow bus 1510. When configured to fetch operands, the operand interface units 1482 and 1484 receive addresses from the arithmetic unit 1496, from the matrix multiplication state machine 1490 and from the output of the data register of the data object interface unit 1480, and select the required address among these according to the control signals on control bus 1515. In some cases, particularly when there is no need to receive and store data from outside, the data registers of the operand interface units 1482 and 1484 can be configured to store data from the output of the data register of the data object interface unit 1480 or from the arithmetic unit 1496.

The address generation state machine 1486 controls the arithmetic unit 1496 during affine image transformation and convolution operations so as to calculate the next coordinate of the source image to be accessed. The address generation state machine 1486 waits for the START signal on control bus 1515 to be set. When the START signal is set, the address generation state machine 1486 deasserts the STALL signal to the data object interface unit 1480 and waits for data objects to arrive. It also sets a counter to the number of data objects in the kernel descriptor that it needs to fetch. The output of the counter is decoded to provide the enable signals for the data registers of the operand interface units 1482 and 1484 and for the miscellaneous registers 1498. Each time the VALID signal from the data object interface unit 1480 is asserted, the address generation state machine 1486 decrements the counter, so that the next part of the data object is latched into a different register.

When the counter reaches zero, the address generation state machine 1486 instructs the operand interface unit 1482 to begin fetching index table values and pixels from the operand interface unit 1484. The address generation state machine 1486 also loads two counters, one with the number of rows and one with the number of columns. On every clock edge on which they are not halted by a STALL signal, for example from the operand interface unit 1482, the counters are decremented to give the remaining rows and columns, and the arithmetic unit 1496 calculates the next coordinate to be fetched. When both counters reach zero, they are reloaded with the numbers of rows and columns, and the arithmetic unit 1496 is configured to find the top-left corner of the next matrix.

If interpolation is used to determine the true value of a pixel, the address generation state machine 1486 decrements the row and column counts only once every two clock cycles. This is implemented with a 1-bit counter whose output is used as the enable for the row and column counters. Each time the matrix has been scanned once, the state machine sends a signal that decrements the length counter. When that counter reaches 1 and the final index table address has been sent to the operand interface unit 1482, the state machine asserts a final signal and resets the start bit.

The blend generation state machine 1488 controls the arithmetic unit 1496 to generate a sequence of numbers from 0 to 255 over the length of the blend. This sequence is used as the interpolation factor for interpolating between the blend start value and the blend end value. The blend generation state machine 1488 decides which of two modes, jump mode or step mode, to run in. If the blend length is 256 or less, jump mode is used; otherwise step mode is used.

The blend generation state machine 1488 performs the following calculations and latches the results into registers reg0, reg1 and reg2. If the blend ramp of the given length is in step mode, it latches 511 - length into reg0 (24 bits), 512 - 2 * length into reg1 (24 bits), and end - start into reg2 (4 × 9 bits). If the ramp is in jump mode, it latches 0 into reg0 (24 bits), 255 / (length - 1) into reg1 (24 bits), and end - start into reg2 (4 × 9 bits).

In step mode, the following is performed in every cycle. If reg0 > 0, reg1 is added to reg0 and the result is stored in reg0; a separate incrementer is also enabled, so that its output increases by one. If reg0 ≤ 0, 510 is added to reg0 and the result is stored in reg0; the incrementer is not incremented. The output of the incrementer is the ramp value.
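
The step-mode procedure can be sketched in software as below. The register initialisation follows the values given for reg0 and reg1 in step mode (511 - length and 512 - 2 * length); with those values the update has the same form as an integer line-drawing error accumulator with a rise of 255 over length - 1 steps. The function name and the choice to output the ramp value before the registers are updated in each cycle are assumptions of this sketch, as the description does not state the ordering.

```python
def step_mode_ramp(length):
    """Generate `length` ramp values rising from 0 to 255 (length > 256)."""
    if length <= 256:
        raise ValueError("step mode is used only for blend lengths above 256")
    reg0 = 511 - length       # running decision variable
    reg1 = 512 - 2 * length   # negative correction added after an increment
    ramp = 0                  # output of the separate incrementer
    values = []
    for _ in range(length):
        values.append(ramp)   # assumed: value is output before the update
        if reg0 > 0:
            reg0 += reg1
            ramp += 1         # incrementer enabled in this cycle
        else:
            reg0 += 510       # incrementer held in this cycle
    return values


if __name__ == "__main__":
    ramp = step_mode_ramp(600)
    print(len(ramp), ramp[0], ramp[-1])
```

Only additions and a sign test are needed per cycle, which is consistent with the ramp adder and incrementer described for the arithmetic unit.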

In jump mode, the following is performed in every cycle. reg1 is added to reg0. The output of the adder is 24 bits wide, in 16.8 fixed-point format, and is stored in reg0. If the first bit of the fractional part of the result is 1, the integer part is incremented. The lower 8 bits of the integer part at the incrementer output are the ramp value. The ramp value, the output of reg2 and the blend start value are sent to the image data processor 1462 to generate the ramp.
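
A software sketch of the jump-mode procedure follows. It assumes that the jump value 255 / (length - 1) is held in 16.8 fixed point and truncated, that the ramp value is output before the addition in each cycle, and that a blend of length 1 is not handled; none of these details is stated in the description, and the function name is hypothetical.

```python
def jump_mode_ramp(length):
    """Generate `length` ramp values from 0 towards 255 (2 <= length <= 256)."""
    if not 2 <= length <= 256:
        raise ValueError("this sketch handles blend lengths from 2 to 256")
    reg0 = 0                             # 16.8 fixed-point accumulator
    reg1 = (255 << 8) // (length - 1)    # jump value, truncated (assumption)
    values = []
    for _ in range(length):
        integer_part = reg0 >> 8
        if reg0 & 0x80:                  # first fractional bit set: round up
            integer_part += 1
        values.append(integer_part & 0xFF)   # low 8 bits are the ramp value
        reg0 = (reg0 + reg1) & 0xFFFFFF      # keep the sum within 24 bits
    return values


if __name__ == "__main__":
    print(jump_mode_ramp(6))   # [0, 51, 102, 153, 204, 255]
```

For lengths where 255 / (length - 1) is not exactly representable, the truncated jump value accumulates a small error, so the final value of the sketch may fall just short of 255.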

The matrix multiplication state machine 1490 performs linear color space conversion on incoming data objects using a conversion matrix. The conversion matrix has dimensions 4 × 5. The first four columns are multiplied by the four channels of the data object, and the last column contains the constant coefficients that are added to the sum of products. When the START signal from control bus 1515 is asserted, the matrix multiplication state machine operates as follows.

1) It generates the line number from which the constant coefficients of the conversion matrix are to be fetched over buses 1482 and 1484, and enables the miscellaneous registers 1498 so that the constant coefficients can be stored.
2) Using a 1-bit flip-flop, it generates the line number that serves as the address for fetching each half of the matrix over buses 1482 and 1484. It also generates the MAT_SEL signal, which selects the half of the data object that is to be multiplied by that half of the matrix.

3) It finishes when no more data objects arrive from the data object interface unit 1480.
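
The arithmetic of the 4 × 5 color space conversion described above can be sketched as follows. The sketch shows only the calculation, not the fetching of matrix halves or the MAT_SEL sequencing; the function name and the use of plain Python integers without truncation or clamping are assumptions.

```python
def convert_color(matrix, pixel):
    """Apply a 4 x 5 conversion matrix to one four-channel data object.

    Each row holds four coefficients, one per input channel, followed by
    a constant coefficient that is added to the sum of products.
    """
    if len(matrix) != 4 or any(len(row) != 5 for row in matrix):
        raise ValueError("matrix must have 4 rows of 5 coefficients")
    if len(pixel) != 4:
        raise ValueError("pixel must have 4 channels")
    return [
        sum(coeff * channel for coeff, channel in zip(row[:4], pixel)) + row[4]
        for row in matrix
    ]


if __name__ == "__main__":
    # Identity coefficients with a constant offset on the first channel.
    matrix = [
        [1, 0, 0, 0, 10],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
    ]
    print(convert_color(matrix, [1, 2, 3, 4]))
```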
The interpolation state machine 1494 performs horizontal interpolation of data objects. In horizontal interpolation, the main data path unit 242 receives a stream of data objects from bus 1451, interpolates between neighboring data objects, and outputs a stream of data objects that is two or four times as long as the original stream. Because data objects can be packed as bytes or as pixels, the interpolation state machine 1494 operates differently in each case so as to maximize throughput. The interpolation state machine 1494 operates as follows.

1) It generates the INT_SEL signal, which causes the data distribution logic 1503 to rearrange the incoming data objects so that interpolation is performed on the correct pairs of data objects.
2) It generates the interpolation factors for interpolating between adjacent pairs of data objects.

3) It generates a STALL signal that stops the data object interface unit 1480 from accepting further data objects. This is needed because the output stream is longer than the input stream. The STALL signal is sent to flow bus 1510.
The arithmetic unit 1496 contains circuitry for performing arithmetic calculations and is configured by the control signals on control bus 1515. It is used by only two instructions: affine image transformation and convolution, and blend generation in compositing.

In affine image transformation and convolution, the arithmetic unit 1496 performs the following operations.
1) It calculates the next x and y coordinates. To calculate the x coordinate, the arithmetic unit 1496 either uses an adder to add the x component of the horizontal or vertical delta to the current x coordinate, or uses a subtractor to subtract that x component from the current x coordinate. To calculate the y coordinate, the arithmetic unit 1496 either uses an adder to add the y component of the horizontal or vertical delta to the current y coordinate, or uses a subtractor to subtract that y component from the current y coordinate.

2) It adds the y coordinate to the index table offset to calculate the index table address. If interpolation is used to determine the true value of the pixel, this sum is further incremented by 4 to find the index table entry.
3) It adds the x coordinate to the index table entry to find the address of the pixel.

4) It subtracts 1 from the length count.
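
Steps 1) to 3) of the affine address calculation can be sketched as follows. This is an illustration under several assumptions: coordinates are plain integers (the description does not give their format), the index table is modeled as a mapping from address to entry, only the adder path is shown, and the extra increment used with interpolation is omitted. The function and parameter names are hypothetical.

```python
def next_pixel_address(x, y, delta, index_table, index_table_offset):
    """Advance one step and return the new coordinate and addresses.

    delta              -- (x component, y component) of the horizontal or
                          vertical delta applied in this step
    index_table        -- mapping from index table address to entry
    index_table_offset -- address of the first index table entry
    """
    delta_x, delta_y = delta
    x += delta_x                                     # step 1: next x
    y += delta_y                                     # step 1: next y
    index_address = index_table_offset + y           # step 2
    pixel_address = index_table[index_address] + x   # step 3
    return x, y, index_address, pixel_address


if __name__ == "__main__":
    table = {10: 1000, 11: 2000}   # one entry per image row (assumption)
    print(next_pixel_address(5, 0, (1, 1), table, 10))
```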
In blend generation, the arithmetic unit 1496 operates as follows.
1) In step mode, it uses one ramp adder to calculate the internal variable of the ramp generation algorithm, while one other adder is used to increment the ramp value when the interval variable is greater than zero.

2) In jump mode, only one adder is needed, to add the jump value to the current ramp value.
3) In jump mode, the fractional part is rounded off.
4) At the start of ramp generation, it subtracts the blend start value from the blend end value.

5) It subtracts 1 from the length count.
The miscellaneous registers 1498 provide extra storage in addition to the data registers of the data object interface unit 1480 and the operand interface units 1482 and 1484. The miscellaneous registers 1498 are normally used to store internal variables or to buffer previous data objects from the data object interface unit 1480. The registers 1498 are configured by the control signals on control bus 1515.

The data synchronizer 1500 is configured by the control signals on control bus 1515. The data synchronizer 1500 supplies STALL signals to the data object interface unit 1480 and to the operand interface units 1482 and 1484, so that if one interface unit has received a data object that the other interface units have not yet received, that interface unit is stalled until all the other interface units have received their data.
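
The stall rule described for the data synchronizer can be sketched as a small combinational function. This is an illustrative model only: it assumes each interface unit reports a single "has valid data" flag, and the function name is hypothetical.

```python
def synchronizer_stalls(has_data):
    """Return one STALL flag per interface unit.

    An interface unit is stalled when it already holds data but at least
    one other interface unit is still waiting for its data.
    """
    everyone_ready = all(has_data)
    return [ready and not everyone_ready for ready in has_data]


if __name__ == "__main__":
    # Data object interface and one operand interface have data; the
    # other operand interface is still waiting.
    print(synchronizer_stalls([True, True, False]))
```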

データ配分ロジック１５０５は、行列乗算状態器１４９０からのＭＡＴ＿ＳＥＬ信号と、補間状態器１４９４からのＩＮＴ＿ＳＥＬ信号とを含む制御バス１５１５の制御信号に応じて、データバス１５１０およびレジスタファイル１４７２からのデータオブジェクトをバス１５３０を経由して再配列する。再配列されたデータはバス１４６１へ出力される。 The data distribution logic 1505 rearranges, via bus 1530, the data objects from the data bus 1510 and the register file 1472 in response to control signals on the control bus 1515, including the MAT_SEL signal from the matrix multiplication state machine 1490 and the INT_SEL signal from the interpolation state machine 1494. The rearranged data is output to bus 1461.

図１３１は、図１２９の画像データプロセッサ１４６２をより詳細に示す。画像データプロセッサ１４６２は、パイプライン制御部１５４０と、多数のカラーチャネルプロセッサ１５４５，１５５０，１５５５、及び１５６０とを有する。全てのカラーチャネルプロセッサは、入力インターフェース１４６０（図１３１）によって駆動されるバス１５６５から入力を受け取る。全てのチャネルプロセッサとパイプライン制御部１５４０は、バス１４７２を経由する、制御信号レジスタ１４７０からの制御信号によって構成される。全てのカラーチャネルプロセッサは、図１２９のレジスタファイル１４７２及びＲＯＭ１４７５からの入力をもバス１５８０を経由して受け取ることがある。全てのカラーチャネルプロセッサとパイプライン制御部との出力はグループされてバス１５７０となり、画像データプロセッサ１４６２の出力１４５５を形成する。 FIG. 131 shows the image data processor 1462 of FIG. 129 in more detail. The image data processor 1462 includes a pipeline control unit 1540 and a number of color channel processors 1545, 1550, 1555, and 1560. All color channel processors receive input from bus 1565, which is driven by the input interface 1460 (FIG. 131). All the color channel processors and the pipeline control unit 1540 are configured by control signals from the control signal register 1470 via bus 1472. All color channel processors may also receive input from the register file 1472 and the ROM 1475 of FIG. 129 via bus 1580. The outputs of all color channel processors and the pipeline control unit are grouped into bus 1570, which forms the output 1455 of the image data processor 1462.

パイプライン制御部１５４０は、全てのカラーチャネルプロセッサのレジスタをイネーブル又はデスエーブルすることによって、全てのカラーチャネルプロセッサのデータオブジェクトのフローを制御する。パイプライン制御部１５４０の中には、レジスタパイプラインがある。パイプラインの形態及び長さは、バス１４７１からの制御信号により構成されるようになっており、パイプライン制御部１５４０のパイプラインとカラーチャネルプロセッサのパイプラインとは、その形態が同じである。パイプライン制御部はバス１５６５からＶＡＬＩＤ信号を受け取る。パイプライン制御部１５４０のパイプラインステージそれぞれにおいて、入力ＶＡＬＩＤ信号が起動され、パイプラインステージが停止されていない場合、パイプラインステージは全てのカラーチャネルプロセッサに対してレジスタイネーブル信号を起動させるとともに入力ＶＡＬＩＤ信号をラッチする。それから、ラッチの出力、即ち、ＶＡＬＩＤ信号は、次のパイプラインステージに移る。このようにして、パイプラインにおけるデータオブジェクトの移動が、データ記憶装置を用いずに、シミュレートかつ制御される。 Pipeline controller 1540 controls the flow of data objects for all color channel processors by enabling or disabling registers for all color channel processors. The pipeline control unit 1540 includes a register pipeline. The form and length of the pipeline are configured by a control signal from the bus 1471, and the form of the pipeline of the pipeline control unit 1540 and that of the color channel processor are the same. The pipeline control unit receives the VALID signal from the bus 1565. In each pipeline stage of the pipeline control unit 1540, when the input VALID signal is activated and the pipeline stage is not stopped, the pipeline stage activates the register enable signal for all color channel processors and the input VALID. Latch the signal. Then the output of the latch, i.e. the VALID signal, moves to the next pipeline stage. In this way, the movement of data objects in the pipeline is simulated and controlled without using a data storage device.
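The VALID-bit pipeline described above can be sketched as a shift register of one-bit latches. This is a simplified model under stated assumptions: a single global `stall` flag freezes every stage (the source describes stalling per stage), and an unstalled stage also latches an inactive VALID so that bubbles propagate. The class name `ValidPipeline` is hypothetical.

```python
class ValidPipeline:
    """Tracks which pipeline stages hold valid data, without storing the data."""

    def __init__(self, num_stages):
        self.valid = [False] * num_stages  # latched VALID bit of each stage

    def clock(self, valid_in, stall=False):
        """Advance one clock; return the register-enable signal of each stage."""
        if stall:
            # Assumed behaviour: a stall holds every latch and enables nothing.
            return [False] * len(self.valid)
        # Each stage sees the VALID latched by the previous stage.
        stage_inputs = [valid_in] + self.valid[:-1]
        enables = list(stage_inputs)     # enable registers where input is valid
        self.valid = list(stage_inputs)  # latch the incoming VALID
        return enables


pipe = ValidPipeline(3)
print(pipe.clock(True))   # one valid data object enters stage 0
print(pipe.clock(False))  # it moves on to stage 1
```

The enable outputs would drive the registers of the color channel processors, which is how the controller tracks data movement with only one bit per stage.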

カラーチャネルプロセッサ１５４５，１５５０，１５５５、及び１５６０は、入力データオブジェクトに対する主な算術動作を行い、各プロセッサは出力データオブジェクトの１つのチャネルを担当している。好適な実施例においては、大多数のピクセルデータオブジェクトが最大４つのチャネルを持っているため、カラーチャネルプロセッサの数は４に制限される。 Color channel processors 1545, 1550, 1555, and 1560 perform the main arithmetic operations on the input data object, and each processor is responsible for one channel of the output data object. In the preferred embodiment, the number of color channel processors is limited to four because most pixel data objects have a maximum of four channels.

カラーチャネルプロセッサの中には、ピクセルの不透明（ｏｐａｃｉｔｙ）チャネルを処理する部分がある。図１３１には示されていないが、制御バス１４７１に接続されている追加の回路があり、カラーチャネルプロセッサは不透明チャネルを正しく処理するように制御バス１４７１からの制御信号を変換する。これは、ある画像処理動作においては、不透明チャネルに対する動作がカラーチャネルに対する動作と少し異なるからである。 Within the color channel processor is the part that processes the pixel's opacity channel. Although not shown in FIG. 131, there is additional circuitry connected to the control bus 1471, and the color channel processor converts the control signal from the control bus 1471 to correctly handle the opaque channel. This is because in certain image processing operations, the operation on the opaque channel is slightly different from the operation on the color channel.

図１３２は、カラーチャネルプロセッサ１５４５，１５５０，１５５５、１５６０を（図１３２においては一般的に１６００で示した）より詳細に示す。各カラーチャネルプロセッサ１５４５，１５５０，１５５５、１５６０は、処理ブロックＡ１６１０と、処理ブロックＢ１６１５と、ビッグ加算器１６２０と、分数切り捨て部１６２５と、クランプまたはラッパー１６３０と、出力多重化部１６３５とを備えている。カラーチャネルプロセッサ１６００は、制御信号レジスタ１４７０からの制御信号をバス１６０２を経由して、パイプライン制御部１５４０からのイネーブル信号をバス１６０４を経由して、レジスタファイル１４７２からの情報をバス１６０５を経由して、その他カラーチャネルプロセッサからのデータオブジェクトをバス１６０３を経由して、入力インターフェース１４６０からのデータオブジェクトをバス１６０１を経由して、それぞれ受け取る。 FIG. 132 shows the color channel processors 1545, 1550, 1555, 1560 in more detail (generally indicated as 1600 in FIG. 132). Each color channel processor 1545, 1550, 1555, 1560 includes a processing block A1610, a processing block B1615, a big adder 1620, a fraction truncation unit 1625, a clamp or wrapper 1630, and an output multiplexing unit 1635. Yes. The color channel processor 1600 passes the control signal from the control signal register 1470 via the bus 1602, the enable signal from the pipeline control unit 1540 via the bus 1604, and the information from the register file 1472 via the bus 1605. Then, the data object from the other color channel processor is received via the bus 1603 and the data object from the input interface 1460 is received via the bus 1601.

処理ブロックＡ１６１０は，バス１６０１からのデータオブジェクトに対していくつかの算術動作を行い、部分的に計算されたデータオブジェクトをバス１６１１に出力する。処理ブロックＡ１６１０が画像処理動作のために行うべき処理を以下に説明する。合成において、処理ブロックＡ１６１０はデータオブジェクトバス１４５１からのデータオブジェクトに不透明度を掛け、ブレンド開始値とブレンド終了値との間を図１２９の入力インターフェース１４６０からの補間ファクタによって補間し、図１２９のオペランドバス１４５２からのオペランドをプレ乗算するかまたはブレンドカラーに不透明度を掛けるかする。そして、プレ乗算されたオペランドまたはブレンドカラーデータに対する乗算を減衰させる。 Processing block A 1610 performs some arithmetic operations on the data object from bus 1601 and outputs the partially calculated data object to bus 1611. Processing to be performed by the processing block A 1610 for the image processing operation will be described below. In compositing, processing block A 1610 multiplies the data object from data object bus 1451 by opacity, interpolates between the blend start value and blend end value by the interpolation factor from input interface 1460 of FIG. 129, and the operand of FIG. Either pre-multiply operands from bus 1452 or multiply the blend color by opacity. Then, the multiplication for the pre-multiplied operand or blend color data is attenuated.

一般色空間変換において、処理ブロックＡ１６１０は、図１２９のバス１４５１からの２つの分数値を用いて４つのカラーテーブル値の間を補間する。アフィン画像変換および畳込み演算において、処理ブロックＡ１６１０はソースピクセルの色に不透明度をプレ乗算し、現在ｘ座標の分数部を用いて同じ行のピクセルの間を補間する。 In general color space conversion, processing block A 1610 interpolates between four color table values using two fractional values from bus 1451 of FIG. In an affine image transformation and convolution operation, processing block A1610 premultiplies the source pixel color by the opacity and interpolates between pixels in the same row using the fractional part of the current x coordinate.

線形色空間変換において、処理ブロックＡ１６１０はソースピクセルのカラーに不透明度をプレ乗算し、プレ乗算されたカラーデータに変換行列係数を掛ける。水平補間と垂直補間において、処理ブロックＡ１６１０は２つのデータオブジェクトの間を補間する。 In linear color space conversion, processing block A 1610 pre-multiplies the source pixel color by opacity and multiplies the pre-multiplied color data by a transform matrix coefficient. In horizontal interpolation and vertical interpolation, processing block A 1610 interpolates between two data objects.

レジデュアルマージンにおいて、処理ブロックＡ１６１０は２つのデータオブジェクトを加算する。処理ブロックＡ１６１０は多数の多機能ブロック１６４０と、処理ブロックＡグルーロジック１６４５とを備える。多機能ブロック１６４０は制御信号によって構成されていて、以下の機能のどちらかの１つを実行することができる。 In residual merging, processing block A 1610 adds two data objects. Processing block A 1610 includes a number of multi-function blocks 1640 and processing block A glue logic 1645. Each multi-function block 1640 is configured by control signals and can perform any one of the following functions.

２つのデータオブジェクトに対し加減算を行う。１つのデータオブジェクトを伝える。２つのデータオブジェクトの間をある補間ファクタによって補間する。色に不透明度をプレ乗算する。２つのデータオブジェクトを掛け、その積に第３のデータオブジェクトを掛ける。 Addition / subtraction is performed on two data objects. Communicate one data object. Interpolate between two data objects by some interpolation factor. Premultiply color by opacity. Multiply two data objects and multiply the product by a third data object.

２つのデータオブジェクトに対し加減算を行い、その結果に不透明度をプレ乗算する。多機能ブロック１６４０のレジスタは、図１３１のパイプライン制御部１５４０によって生成される、バス１６０４からのイネーブル信号によってイネーブルされるかデスエーブルされる。処理ブロックＡグルーロジック１６４５はバス１６０１からのデータオブジェクトおよびバス１６０３からのデータオブジェクトと、いくつかの多機能ブロック１６４０の出力とを受け取り、これらをその他の選択された多機能ブロック１６４０の入力に送る。処理ブロックＡグルーロジック１６４５もバス１６０２からの制御信号によって構成される。 Add / subtract two data objects and pre-multiply the result by opacity. The registers of the multifunction block 1640 are enabled or disabled by an enable signal from the bus 1604 generated by the pipeline controller 1540 of FIG. Processing block A glue logic 1645 receives data objects from bus 1601 and data objects from bus 1603 and the outputs of some multifunction block 1640 and sends them to the inputs of other selected multifunction blocks 1640. . The processing block A glue logic 1645 is also configured by a control signal from the bus 1602.

処理ブロックＢ１６１５は，バス１６０１からのデータオブジェクトとバス１６１１からの部分的に計算されたデータオブジェクトとに対して算術動作を行い、部分的に計算されたデータオブジェクトをバス１６１６に出力する。処理ブロックＢ１６１５が画像処理動作のために行う処理を以下に説明する。非正のオペレータをもつ合成において、処理ブロックＢ１６１５はデータオブジェクトバス１４５１からのプレ処理されたデータオブジェクトと、オペランドバス１４５２からのオペランドに対して、バス１６０３からの合成被乗数を掛けるとともに、８．８フォーマットの２５５／不透明度の値であるＲＯＭの出力を、クランプ／ラップされたデータオブジェクトに掛ける。 Processing block B 1615 performs arithmetic operations on the data object from bus 1601 and the partially calculated data object from bus 1611 and outputs the partially calculated data object to bus 1616. Processing performed by the processing block B1615 for the image processing operation will be described below. In composition with a non-positive operator, processing block B1615 multiplies the preprocessed data object from data object bus 1451 and the operand from operand bus 1452 by the composite multiplicand from bus 1603 and 8.8. The output of the ROM, which is the format 255 / opacity value, is multiplied by the clamped / wrapped data object.

正のオペレータをもつ合成において、処理ブロックＢ１６１５は、プレ処理された２つのデータオブジェクトを加算する。更に、不透明チャネルにおいては、前記の和から２５５を引いて、その差をオフセットに掛け、その積を２５５で割る。一般色空間変換において、処理ブロックＢ１６１５は、バス１４５１からの２つの分数値を用いて４つのカラーテーブル値の間を補間し、残っている分数値を用いて処理ブロックＡ１６１０からの部分的に補間されたカラー値と、以前の補間結果との間を補間する。 In composition with a positive operator, processing block B1615 adds the two preprocessed data objects. Further, in the opaque channel, subtract 255 from the sum, multiply the difference by the offset, and divide the product by 255. In general color space conversion, processing block B1615 interpolates between four color table values using the two fractional values from bus 1451 and partially interpolates from processing block A1610 using the remaining fractional values. Interpolate between the determined color value and the previous interpolation result.

アフィン画像変換および畳込み演算において、処理ブロックＢ１６１５は、現在ｙ座標の分数部を用いて、部分的に補間されたピクセルの間を補間し、補間されたピクセルにサブサンプルウェート行列の係数を掛ける。線形色空間変換において、処理ブロックＢ１６１５はソースピクセルのカラーに不透明度をプレ乗算し、プレ乗算されたカラーに変換行列係数を掛ける。 In the affine image transformation and convolution operations, processing block B1615 interpolates between the partially interpolated pixels using the fractional part of the current y coordinate and multiplies the interpolated pixels by the subsample weight matrix coefficients. . In linear color space conversion, processing block B1615 pre-multiplies the source pixel color by opacity and multiplies the pre-multiplied color by a transform matrix coefficient.

処理ブロックＢ１６１５は、多数の多機能ブロックと、処理ブロックＢグルーロジック１６５０とを備える。多機能ブロックは、処理ブロックＡ１６１０のものと同様であるが、処理ブロックＢグルーロジック１６５０においては、バス１６０１，１６０３，１６１１，１６３１からのデータオブジェクトと、選択された多機能ブロックの出力とを受け入れ、これらを選択された多機能ブロックの入力に送る。処理ブロックＢグルーロジック１６５０もバス１６０２からの制御信号によって構成される。 The processing block B1615 includes a large number of multifunction blocks and a processing block B glue logic 1650. The multifunction block is similar to that of processing block A 1610, but the processing block B glue logic 1650 accepts data objects from buses 1601, 1603, 1611, 1631 and the output of the selected multifunction block. These are sent to the input of the selected multifunction block. The processing block B glue logic 1650 is also configured by a control signal from the bus 1602.

ビッグ加算器１６２０は、処理ブロックＡ１６１０と処理ブロックＢ１６１５からの部分的結果のいくつかを結合する。これは、バス１６０１を経由して入力インターフェース１６４０から、バス１６１１を経由して処理ブロックＡ１６１０から、バス１６１６を経由して処理ブロックＢ１６１５から、そして、バス１６０５を経由してレジスタファイル１４７２から、それぞれの入力を受け取り、バス１６２１に結合された結果を出力する。ビッグ加算器１６２０も、バス１６０２の制御信号によって構成される。 Big adder 1620 combines some of the partial results from processing block A 1610 and processing block B 1615. This is done from the input interface 1640 via the bus 1601, from the processing block A 1610 via the bus 1611, from the processing block B 1615 via the bus 1616, and from the register file 1472 via the bus 1605, respectively. And outputs the result coupled to bus 1621. The big adder 1620 is also configured by a control signal of the bus 1602.

ビッグ加算器１６２０は、様々な画像処理動作に従って、異なる構成にすることができる。ビッグ加算器１６２０の所定の画像処理動作における動作を以下に説明する。非正のオペレータを持つ合成において、ビッグ加算器１６２０は処理ブロックＢ１６１５からの２つの部分積を合算する。 The big adder 1620 can be configured differently according to various image processing operations. The operation in the predetermined image processing operation of the big adder 1620 will be described below. In a synthesis with a non-positive operator, big adder 1620 adds the two partial products from processing block B1615.

正のオペレータを持つ合成において、オフセットイネーブルが起動されているときに、ビッグ加算器１６２０は不透明度チャネルからオフセットのある先処理されたデータオブジェクトの和を引く。アフィン画像変換／畳込み演算において、ビッグ加算器１６２０は処理ブロックＢ１６１５からの積を累算する。 In a composite with a positive operator, when offset enable is activated, big adder 1620 subtracts the sum of the preprocessed data objects with offsets from the opacity channel. In the affine image transformation / convolution operation, the big adder 1620 accumulates the product from the processing block B1615.

線形色空間変換において、第１サイクルでビッグ加算器は２つの行列係数／データオブジェクト積と常係数とを合算する。第２サイクルで、直前サイクルの和に他のもう２つの行列係数／データオブジェクト積を加える。分数切り捨て（丸め）部１６２５は、バス１６２１を経由してビッグ加算器１６２０からの入力を受け取り、出力の分数部を切り捨てる。分数部を表すビットの数は、レジスタファイル１４７２からバス１６０５のＢＰ信号によって表示される。ＢＰ信号を解釈する仕方を以下の表に表す。切り捨てられた出力はバス１６２６に提供される。 In the linear color space conversion, in the first cycle, the big adder adds the two matrix coefficients / data object product and the ordinary coefficient. In the second cycle, add another two matrix coefficient / data object products to the sum of the previous cycle. The fraction truncation (rounding) unit 1625 receives the input from the big adder 1620 via the bus 1621 and truncates the fractional part of the output. The number of bits representing the fractional part is indicated by the BP signal on the bus 1605 from the register file 1472. The following table shows how to interpret the BP signal. The truncated output is provided on bus 1626.

分数テーブル Fraction table

分数切り捨て部１６２５は、分数の切り捨ての以外に２つの作業を行う。
１）切り捨てられた結果が負であるかどうかを決定する。
２）切り捨てられた結果の絶対値が２５５より大きいかどうかを決定する。
クランプ又はラッパー１６３０はバス１６２６を経由して分数切り捨て部１６２５から入力を受け取り、下記の動作をその順序に従い行う。 The fraction truncation unit 1625 performs two operations other than fraction truncation.
1) Determine if the truncated result is negative.
2) Determine if the absolute value of the truncated result is greater than 255.
The clamp or wrapper 1630 receives input from the fraction truncation unit 1625 via the bus 1626 and performs the following operations in that order.

切り捨てられた結果の絶対値を求めるべきというオプションがイネーブルされているとき、その絶対値を求める。データオブジェクトのアンダフローをある最低値に、そして、データオブジェクトのオーバフローをある最大値に、それぞれクランプする。出力多重化部１６３５は、バス１６１６の処理ブロックＢの出力とバス１６３１のクランプまたはラッパーの出力とのなかで、最終の出力を選択する。なお、データオブジェクトに対して、いくつかの最終処理をも行うが、以下は所定の画像処理動作のために行われる動作を説明する。 Find the absolute value when the option to find the absolute value of the truncated result is enabled. Clamp data object underflow to a certain minimum value and data object overflow to a certain maximum value. The output multiplexing unit 1635 selects the final output among the output of the processing block B of the bus 1616 and the output of the clamp or wrapper of the bus 1631. Although some final processing is also performed on the data object, the operation performed for a predetermined image processing operation will be described below.
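The fraction truncation, flag detection, optional absolute value, and clamping steps can be sketched as below. This is an illustration only: the table that maps the BP signal to a bit count is not reproduced in this text, so the sketch assumes `frac_bits` is given directly, and it assumes truncation is an arithmetic right shift. The function names are hypothetical.

```python
def truncate_fraction(value, frac_bits):
    """Drop `frac_bits` fractional bits and report the two status flags."""
    truncated = value >> frac_bits   # arithmetic shift (assumed truncation rule)
    is_negative = truncated < 0
    exceeds_255 = abs(truncated) > 255
    return truncated, is_negative, exceeds_255


def clamp_stage(truncated, use_abs=False, lo=0, hi=255):
    """Optionally take the absolute value, then clamp underflow and overflow."""
    if use_abs:
        truncated = abs(truncated)
    return max(lo, min(hi, truncated))


value, negative, too_big = truncate_fraction(-300 << 4, 4)
print(value, negative, too_big, clamp_stage(value), clamp_stage(value, use_abs=True))
```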

非正のオペレータをもつ、プレ乗算なしの合成において、多重化部１６３５は処理ブロックＢ１６１５のいくつかの出力を結合し、プレ乗算なしのデータオブジェクトを形成する。非正のオペレータをもつ、プレ乗算ありの合成において、多重化部１６３５はクランプまたはラッパー１６３０の出力を通過させる。 In synthesis without pre-multiplication with non-positive operators, the multiplexer 1635 combines several outputs of processing block B 1615 to form a data object without pre-multiplication. In pre-multiplication synthesis with non-positive operators, the multiplexer 1635 passes the output of the clamp or wrapper 1630.

正のオペレータをもつ合成において、多重化部１６３５は処理ブロックＢ１６３０のいくつかの出力を結合し、データオブジェクト結果を形成する。一般色空間変換において、多重化部１６３５は出力データオブジェクトに対して、翻訳・クランプ機能を適用する。他の動作において、多重化部１６３５は、クランプ又はラッパー１６３０の出力を通過させる。 In composition with positive operators, the multiplexer 1635 combines several outputs of processing block B 1630 to form a data object result. In the general color space conversion, the multiplexing unit 1635 applies a translation / clamp function to the output data object. In other operations, the multiplexer 1635 passes the output of the clamp or wrapper 1630.

図１３３は、例えば１６４０のような、１つの多機能ブロックをより詳細に示す。多機能ブロック１６４０は、モード検出部１７１０と、２つの加算オペランド論理部１６６０及び１６７０と、３つの多重化論理部１６８０，１６８５，及び１６９０と、２入力加算部１６７５と、２つの加数を持つ２入力乗算部１６９５と、レジスタ１７０５とを備える。 FIG. 133 shows one multi-function block, such as 1640, in more detail. The multi-function block 1640 includes a mode detection unit 1710, two addition operand logic units 1660 and 1670, three multiplexing logic units 1680, 1685, and 1690, a two-input adder 1675, a two-input multiplier 1695 with two addends, and a register 1705.

モード検出部１７１０は、図１２９の制御信号レジスタ１４７０からのＭＯＤＥ信号１７１１と、図１２９の入力インターフェース１４６０からの２つのＳＵＢ信号１７１２及びＳＷＡＰ信号１７１３とを受け取る。モード検出部１７１０は、これらの信号を復号して、加算オペランド論理部１６６０および１６７０と、多重化論理部１６８０，１６８５，および１６９０に伝えられる制御信号を生成する。そして、この制御信号は、多機能ブロック１６４０を種々な動作のできるように構成する。多機能ブロック１６４０は、８つのモードを有する。 The mode detection unit 1710 receives the MODE signal 1711 from the control signal register 1470 in FIG. 129 and the two SUB signals 1712 and the SWAP signal 1713 from the input interface 1460 in FIG. 129. Mode detection unit 1710 decodes these signals and generates control signals that are communicated to addition operand logic units 1660 and 1670 and multiplexing logic units 1680, 1685, and 1690. This control signal configures the multifunction block 1640 to perform various operations. Multifunctional block 1640 has eight modes.

１）加減算モード：ＳＵＢ信号１７１２に従い、入力１６５５を入力１６６５に加えるか、または、入力１６６５から引く。更に、ＳＷＡＰ信号６９３に従い、入力をスワップすることもできる。
２）バイパスモード：入力１６５５を出力にバイパスする。
３）補間モード：入力１６７５を補間ファクタとして、入力１６５５と１６６５の間を補間する。ＳＷＡＰ信号１７１３に従い、入力１６５５および１６６５をスワップすることができる。 1) Addition / subtraction mode: Input 1655 is added to input 1665 or subtracted from input 1665 in accordance with SUB signal 1712. Furthermore, the inputs can be swapped according to the SWAP signal 693.
2) Bypass mode: Bypass the input 1655 to the output.
3) Interpolation mode: Interpolates between inputs 1655 and 1665 using input 1675 as an interpolation factor. According to the SWAP signal 1713, the inputs 1655 and 1665 can be swapped.

４）プレ乗算モード：入力１６５５に入力１６７５を掛け、その結果を２５５で割る。ＩＮＣレジスタ１７０８の出力は、正しい結果を得るためにバス１７０７における、このステージの結果を増加すべきかどうかを、次のステージに教える。
５）乗算モード：入力１６５５に入力１６７５を掛ける。 4) Pre-multiplication mode: The input 1655 is multiplied by the input 1675, and the result is divided by 255. The output of INC register 1708 tells the next stage whether the result of this stage on bus 1707 should be increased to obtain the correct result.
5) Multiplication mode: The input 1655 is multiplied by the input 1675.

６）加減算およびプレ乗算モード：入力１６６５を入力１６５５に加えるか、または、入力１６５５から引き、その結果に入力１６７５を掛け、そして、この積を２５５で割る。ＩＮＣレジスタ１７０８の出力は、正しい結果を得るためにバス１７０７にあるこのステージの結果を増加すべきかどうかを、次のステージに教える。 6) Addition / subtraction and pre-multiplication modes: add input 1665 to input 1655 or subtract from input 1655, multiply the result by input 1675, and divide this product by 255. The output of INC register 1708 tells the next stage whether the result of this stage on bus 1707 should be increased to get the correct result.
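The six modes listed above can be summarised in a short behavioural sketch. It is not a description of the actual circuit: the arguments are named after the input reference numerals, the interpolation formula (a linear blend with a factor in the range 0 to 255) is an assumption, and division by 255 is modelled with plain floor division. The function name `multifunction` and the mode strings are hypothetical.

```python
def multifunction(mode, in_1655, in_1665=0, in_1675=0, sub=False, swap=False):
    """Behavioural model of one multi-function block."""
    if swap and mode in ("addsub", "interpolate"):
        # SWAP exchanges inputs 1655 and 1665 in the modes that mention it.
        in_1655, in_1665 = in_1665, in_1655
    if mode == "addsub":
        # Add 1655 to 1665, or subtract 1655 from 1665.
        return in_1665 - in_1655 if sub else in_1665 + in_1655
    if mode == "bypass":
        return in_1655
    if mode == "interpolate":
        # Assumed formula: input 1675 is a factor in 0..255 (floor division).
        return in_1655 + ((in_1665 - in_1655) * in_1675) // 255
    if mode == "premultiply":
        return (in_1655 * in_1675) // 255
    if mode == "multiply":
        return in_1655 * in_1675
    if mode == "addsub_premultiply":
        # Add 1665 to 1655, or subtract 1665 from 1655, then pre-multiply.
        total = in_1655 - in_1665 if sub else in_1655 + in_1665
        return (total * in_1675) // 255
    raise ValueError(f"unknown mode: {mode}")


print(multifunction("premultiply", 200, in_1675=128))
print(multifunction("interpolate", 0, 255, 128))
```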

加算オペランド論理部１６６０及び１６７０は、加算器によって減算もできるようにするために、必要に応じて入力に対する１の補数を求める。加算器１６７５は、バス１６６２と１６７２の加算オペランドロジック１６６０及び１６７０の出力を合算し、その和をバス１６７７に出力する。多重化ロジック１６８０，１６８５、及び１６９０は、所望の機能を実行するために適する被乗数と加数を選ぶ。これらは全てモード検出部１７１０からのバス１７１４の制御信号によって構成される。 The add operand logic units 1660 and 1670 determine the one's complement for the input as necessary to allow subtraction by the adder. Adder 1675 adds the outputs of addition operand logics 1660 and 1670 on buses 1662 and 1672 and outputs the sum to bus 1677. Multiplexing logic 1680, 1685, and 1690 chooses the multiplicand and addend that are appropriate to perform the desired function. These are all configured by a control signal of the bus 1714 from the mode detection unit 1710.

２つの加数を持つ乗算部１６９５は、バス１６８２からの入力をバス１６７７からの入力に掛ける。そして、前記積にバス１６８７および１６９２からの入力の和を加える。加算器１７００は、乗算部１６９５の出力の下位８ビットに乗算部１６９５の出力の上位８ビットを加える。加算器１７００の桁上げはＩＮＣレジスタ１７０１にラッチされる。ＩＮＣレジスタ１７０１は、信号１７０２によってイネーブルされる。レジスタ１７０５は乗算部１６９５からの積を記憶する。これも信号１７０２によってイネーブルされる。 A multiplication unit 1695 having two addends multiplies the input from the bus 1682 by the input from the bus 1677. Then, the sum of the inputs from buses 1687 and 1692 is added to the product. Adder 1700 adds the upper 8 bits of the output of multiplier 1695 to the lower 8 bits of the output of multiplier 1695. The carry of adder 1700 is latched in INC register 1701. INC register 1701 is enabled by signal 1702. The register 1705 stores the product from the multiplication unit 1695. This is also enabled by signal 1702.
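Adding the upper and lower bytes of the product and keeping the carry is a common way to divide by 255 without a divider. The sketch below shows one way this could work; it assumes that a rounding constant of 128 is supplied through one of the multiplier's addends, which the source does not state. The function names are hypothetical.

```python
def divide_by_255(product, rounding_addend=128):
    """Return (high_byte, inc) such that high_byte + inc == round(product / 255).

    Valid for products of two 8-bit values. `rounding_addend` is an assumption.
    """
    t = product + rounding_addend  # multiplier output including the addend
    high, low = t >> 8, t & 0xFF   # upper and lower 8 bits
    carry = (high + low) >> 8      # carry out of the 8-bit adder (INC)
    return high, carry


def premultiply(color, opacity):
    """Pre-multiply an 8-bit colour by an 8-bit opacity."""
    high, inc = divide_by_255(color * opacity)
    return high + inc  # the next stage applies the increment


print(premultiply(255, 255), premultiply(128, 255), premultiply(200, 128))
```

Under this assumption the high byte plus the latched carry equals the product divided by 255, rounded to the nearest integer, which would explain why the INC bit is passed on to the next stage.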

図１３４は、合成動作のブロック図を示す。この合成動作は３つの入力データストリームを受け取る。
１）累算ピクセルデータ：この累算部モデルにおいて、結果が格納された位置と同一な位置から誘導される。
２）合成オペランド：カラーと不透明度からなる。カラーと不透明度の両方はフラット、ブレンド、ピクセル、またはタイルであることができる。 FIG. 134 shows a block diagram of the composition operation. This compositing operation receives three input data streams.
1) Accumulated pixel data: In this accumulator model, it is derived from the same location where the result was stored.
2) Composite operand: Consists of color and opacity. Both color and opacity can be flat, blend, pixel, or tile.

３）減衰：オペランドデータを減衰する。減衰はフラットなビットマップまたはバイトマップであることができる。
ピクセルデータは典型的に４つのチャネルからなる。その３つのチャネルがピクセルのカラーを形成する。残りのチャネルはピクセルの不透明度である。ピクセルデータはプレ乗算されても、或はされなくてもよい。ピクセルデータがプレ乗算されるとき、各カラーチャネルに不透明度を掛ける。ピクセルがプレ乗算されると合成動作の式が簡単になるため、ピクセルデータがプレ乗算されてから他のピクセルと合成されるのが普通である。 3) Attenuation: Attenuates operand data. The attenuation can be a flat bitmap or a byte map.
Pixel data typically consists of four channels. The three channels form the pixel color. The remaining channel is pixel opacity. Pixel data may or may not be premultiplied. When the pixel data is premultiplied, each color channel is multiplied by the opacity. When a pixel is pre-multiplied, the formula of the composition operation becomes simple. Therefore, pixel data is usually pre-multiplied and then synthesized with other pixels.

好適な実施例で実行される合成命令を表１に示す。各命令はプレ乗算されたデータに働きかける。（ａｃ０，ａ０）はプレ乗算されたピクセルカラーａｃと不透明度ａ０を、ｒは“オフセット”値、ｗｃ（）はラップ／クランプ・オペレータを意味し、表１におけるｏｖｅｒ、ｉｎ、ｏｕｔ、ａｔｏｐの各オペレータの逆オペレータも実装されている。また、合成モデルは左側に累算器を備える。 Table 1 shows the synthesis instructions executed in the preferred embodiment. Each instruction operates on pre-multiplied data. (Ac0, a0) is the premultiplied pixel color ac and opacity a0, r is the “offset” value, wc () is the wrap / clamp operator, and A reverse operator for each operator is also implemented. The composite model also has an accumulator on the left side.

図１３４における合成ブロック１７６０は、３つのカラーサブブロックと不透明サブブロックを具備する。各々のカラーサブブロックは、入力ピクセルの１つのカラーチャンネルと不透明チャンネルに対して動作して、出力ピクセルのカラーを得る。以上の動作を擬似コードの形で以下に示す。
ＰＩＸＥＬＣｏｍｐｏｓｉｔｅ（
ＩＮｃｏｌｏｒＡ，ｃｏｌｏｒＢ：ＰＩＸＥＬ；
ＩＮｏｐａｃｉｔｙＡ，ｏｐａｃｉｔｙＢ：ＰＩＸＥＬ；
ＩＮｃｏｍｐ＿ｏｐ：ＣＯＭＰＯＳＩＴＥ＿ＯＰＥＲＡＴＯＲ）
（
ＰＩＸＥＬｒｅｓｕｌｔ；
ＩＦｃｏｍｐ＿ｏｐがｒｏｖｅｒ，ｒｉｎ，ｒｏｕｔ，ｒａｔｏｐである
とＴＨＥＮ
ｃｏｌｏｒＡとｃｏｌｏｒＢをスワップする；
ｏｐａｃｉｔｙＡ，ｏｐａｃｉｔｙＢをスワップする；
ＥＮＤＩＦ；
ＩＦｃｏｍｐ＿ｏｐがｏｖｅｒ，ｒｏｖｅｒ，ｌｏａｄｏ，又は、ｐｌｕ
ｓであるとＴＨＥＮ
Ｘ＝１；
ＥＬＳＥＩＦｃｏｍｐ＿ｏｐがｉｎ，ｒｉｎ，ａｔｏｐ，又は、ｒａｔ
ｏｐであるとＴＨＥＮ
Ｘ＝ｏｐａｃｉｔｙＢ；
ＥＬＳＥＩＦｃｏｍｐ＿ｏｐがｏｕｔ，ｒｏｕｔ，又は、ｘｏｒである
とＴＨＥＮ
Ｘ＝ｎｏｔ（ｏｐａｃｉｔｙＢ）；
ＥＬＳＥＩＦｃｏｍｐ＿ｏｐがｌｏａｄｚｅｒｏ，ｌｏａｄｃ，又は、
ｌｏａｄｃｏであるとＴＨＥＮ
Ｘ＝０；
ＥＮＤＩＦ；
ＩＦｃｏｍｐ＿ｏｐがｏｖｅｒ，ｒｏｖｅｒ，ａｔｏｐ，ｒａｔｏｐ，又
は、ｘｏｒであるとＴＨＥＮ
Ｙ＝ｎｏｔ（ｏｐａｃｉｔｙａ）；
ＥＬＳＥＩＦｃｏｍｐ＿ｏｐがｐｌｕｓ，ｌｏａｄｃ，又は、ｌｏａｄ
ｃｏであるとＴＨＥＮ
Ｙ＝ｎｏｔ（ｏｐａｃｉｔｙａ）；
ＥＬＳＥＩＦｃｏｍｐ＿ｏｐがｐｌｕｓ，ｌｏａｄｃ，又は、ｌｏａｄ
ｃｏであるとＴＨＥＮ
Ｙ＝１；
ＥＬＳＥＩＦｃｏｍｐ＿ｏｐがｉｎ，ｒｉｎ，ｏｕｔ，ｒｏｕｔ，ｌｏ
ａｄｚｅｒｏ，又は、ｌｏａｄｏＴＨＥＮ
Ｙ＝０；
ＥＮＤＩＦ；
ｒｅｓｕｌｔ＝ｃｏｌｏＡ＊Ｘ＋ｃｏｌｏｒＢ＊Ｙ；
ＲＥＴＵＲＮｒｅｓｕｌｔ；
命令’ｌｏａｄ’と’ｌｏａｄｏ’が不透明チャンネルに対して異なる意味を持っているため、以上のコードは不透明サブブロックにおいて異なる。 The composite block 1760 in FIG. 134 includes three color sub-blocks and an opaque sub-block. Each color sub-block operates on one color channel and an opaque channel of the input pixel to obtain the color of the output pixel. The above operation is shown below in the form of pseudo code.
PIXEL Composite(
    IN colorA, colorB: PIXEL;
    IN opacityA, opacityB: PIXEL;
    IN comp_op: COMPOSITE_OPERATOR)
{
    PIXEL result;
    IF comp_op is rover, rin, rout, or ratop THEN
        swap colorA and colorB;
        swap opacityA and opacityB;
    END IF;
    IF comp_op is over, rover, loado, or plus THEN
        X = 1;
    ELSE IF comp_op is in, rin, atop, or ratop THEN
        X = opacityB;
    ELSE IF comp_op is out, rout, or xor THEN
        X = not(opacityB);
    ELSE IF comp_op is loadzero, loadc, or loadco THEN
        X = 0;
    END IF;
    IF comp_op is over, rover, atop, ratop, or xor THEN
        Y = not(opacityA);
    ELSE IF comp_op is plus, loadc, or loadco THEN
        Y = 1;
    ELSE IF comp_op is in, rin, out, rout, loadzero, or loado THEN
        Y = 0;
    END IF;
    result = colorA * X + colorB * Y;
    RETURN result;
}
Because the instructions 'load' and 'loado' have a different meaning for the opacity channel, the above code differs for the opacity sub-block.
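The color sub-block pseudocode can be turned into a small runnable sketch. The version below makes several assumptions that are not in the source: colors and opacities are normalized floats in the range 0 to 1, `not(o)` is modelled as `1 - o`, and the branch for plus, loadc, and loadco sets Y to 1. The function name `composite` is hypothetical.

```python
def composite(color_a, color_b, opacity_a, opacity_b, comp_op):
    """Combine one pre-multiplied colour channel of two pixels."""
    if comp_op in ("rover", "rin", "rout", "ratop"):
        # Reverse operators are the forward operators with swapped inputs.
        color_a, color_b = color_b, color_a
        opacity_a, opacity_b = opacity_b, opacity_a

    if comp_op in ("over", "rover", "loado", "plus"):
        x = 1.0
    elif comp_op in ("in", "rin", "atop", "ratop"):
        x = opacity_b
    elif comp_op in ("out", "rout", "xor"):
        x = 1.0 - opacity_b
    elif comp_op in ("loadzero", "loadc", "loadco"):
        x = 0.0
    else:
        raise ValueError(f"unknown operator: {comp_op}")

    if comp_op in ("over", "rover", "atop", "ratop", "xor"):
        y = 1.0 - opacity_a
    elif comp_op in ("plus", "loadc", "loadco"):
        y = 1.0
    else:  # in, rin, out, rout, loadzero, loado
        y = 0.0

    return color_a * x + color_b * y


print(composite(0.5, 0.2, 1.0, 1.0, "over"))
print(composite(0.5, 0.2, 1.0, 0.5, "in"))
```

The result is not clamped here; in the source that is the job of the clamp or wrap block that follows.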

図１３４におけるブロック１７６５は、ブロック１７６０の出力をクランプまたはラップする。ブロック１７６５がクランプするように構成されると、許容される最小値より小さい全ての値を最小値に、許容される最大値より大きい全ての値を最大許容値に抑える。ブロック１７６５がスワップするように構成されると、以下の式を計算する。 Block 1765 in FIG. 134 clamps or wraps the output of block 1760. When block 1765 is configured to clamp, every value below the minimum allowed value is forced to the minimum and every value above the maximum allowed value is forced to the maximum. When block 1765 is configured to wrap, it computes the following expression:

（（ｘ−ｍｉｎ）ｍｏｄ（ｍａｘ−ｍｉｎ））＋ｍｉｎ，
ここで、ｍｉｎとｍａｘはカラーにおいて許容される最小値と最大値を意味する。最小値と最大値としては、０と２５５が望ましい。図１３４におけるブロック１７７０は、ブロック１７６５からの結果をプレ乗算する。これはプレ乗算されたカラー値に２５５／ｏを掛けることによりピクセルをプレ乗算する。ここで、ｏは合成後の不透明度を意味する。２５５／ｏの値は合成エンジン内のＲＯＭから得られる。ＲＯＭ内の値は８．８フォーマットで記憶されており、分数以下の部分は丸められる。乗算の結果は１６．８フォーマットで格納される。逆プレ乗算されたピクセルを生成するために、この結果は８ビットで丸められる。 ((X−min) mod (max−min)) + min,
Here, min and max are the minimum and maximum values allowed for a color; 0 and 255 are preferred. Block 1770 in FIG. 134 un-premultiplies the result from block 1765. It does so by multiplying the pre-multiplied color value by 255/o, where o is the opacity after compositing. The value 255/o is obtained from a ROM in the compositing engine. Values in the ROM are stored in 8.8 format, with the fractional part rounded. The result of the multiplication is stored in 16.8 format and is then rounded to 8 bits to produce the un-premultiplied pixel.
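The clamp, wrap, and 255/o multiplication steps can be sketched as follows. The ROM contents are an assumption (each entry is 255/o rounded to 8.8 fixed point, with entry 0 set to 0), as is the round-to-nearest rule used when reducing the product to 8 bits. All function names are hypothetical.

```python
def clamp(x, lo=0, hi=255):
    """Force x into the range lo..hi."""
    return max(lo, min(hi, x))


def wrap(x, lo=0, hi=255):
    """Apply ((x - min) mod (max - min)) + min as given in the text."""
    return ((x - lo) % (hi - lo)) + lo


def build_reciprocal_rom():
    """Assumed ROM contents: 255/o in 8.8 fixed point for o = 1..255."""
    return [0] + [round(255 * 256 / o) for o in range(1, 256)]


def unpremultiply(color, opacity, rom):
    """Multiply a pre-multiplied colour by 255/opacity and round to 8 bits."""
    product = color * rom[opacity]          # 8 fractional bits
    return min(255, (product + 128) >> 8)   # round to nearest, then limit


rom = build_reciprocal_rom()
print(clamp(300), wrap(300), unpremultiply(64, 128, rom))
```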

ブランド生成部１７２１は特定の開始値と終了値を持つ特定長さのブランドを生成する。これは以下の２つのステージに渡って行なわれる。
１）ランプ生成
２）補間
ランプ生成において、合成エンジンは命令の長さに対して、０から２５５まで線形増加する数列を生成する。ランプ生成には、長さが２５５以下の“ジャンプ”モードと長さが２５５より長い“ステップ”モードの２つがある。モードは長さの上位２４ビットによって決まる。ジャンプモードにおいて、ランプ値の増加分はクロック周期ごとに少なくとも１である。ステップモードおいて、ランプ値の増加分はクロック周期ごとに最大１である。 The blend generation unit 1721 generates a blend of a specified length with specified start and end values. This is done in the following two stages:
1) Ramp generation
2) Interpolation
In ramp generation, the compositing engine generates a sequence that increases linearly from 0 to 255 over the length of the instruction. There are two ramp generation modes: "jump" mode, for lengths of 255 or less, and "step" mode, for lengths greater than 255. The mode is determined by the upper 24 bits of the length. In jump mode, the ramp value increases by at least 1 per clock cycle. In step mode, the ramp value increases by at most 1 per clock cycle.

ジャンプモードにおいて、合成エンジンはステップ値２５５／（長さ−１）を求めるために８．８フォーマットのＲＯＭを用いる。この値は１６ビット累算器に加えられる。累算器の出力は８ビットで切り捨てられて数列を形成する。ステップモードおいて、合成エンジンはＢｒｅｓｅｎｈａｍの線描アルゴリズムに似たアルゴリズムを用いる。そのアルゴリズムを以下に示す。 In jump mode, the composition engine uses 8.8 format ROM to determine the step value 255 / (length-1). This value is added to the 16-bit accumulator. The accumulator output is truncated at 8 bits to form a sequence. In step mode, the composition engine uses an algorithm similar to Bresenham's line drawing algorithm. The algorithm is shown below.

Ｖｏｉｄｌｉｎｅｄｒａｗ（ｌｅｎｇｔｈ：ＩＮＴＥＲＧＥＲ）
｛
ｄ＝５１１−ｌｅｎｇｔｈ；
ｉｎｃｒＥ＝５１０；
ｉｎｃｒＮＥ＝５１２−２＊ｌｅｎｇｔｈ；
ｒａｍｐ−０；
ｆｏｒ（ｉ＝０；ｉ（ｌｅｎｇｔｈ；ｉ＋＋）
｛
ｉｆｄ（＝０ｔｈｅｎ
ｄ＋＝ｉｎｃｒＥ；
ｅｌｓｅ｛
ｄ＋＝ｉｎｃｒＮＥ；
ｒａｍｐ＋＋；
｝
｝
｝
その後、ランプからブランドを生成するために次の式が使われる。 void linedraw(length: INTEGER)
{
    d = 511 - length;
    incrE = 510;
    incrNE = 512 - 2 * length;
    ramp = 0;
    for (i = 0; i < length; i++)
    {
        if d <= 0 then
            d += incrE;
        else {
            d += incrNE;
            ramp++;
        }
    }
}
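Both ramp generation modes can be sketched in a few lines. The step-mode function follows the pseudocode above directly; the assumption added here is that the ramp value is sampled at the start of each iteration, before it is updated. The jump-mode function assumes the ROM entry is 255/(length - 1) rounded to 8.8 fixed point. The function names are hypothetical.

```python
def ramp_step_mode(length):
    """Step mode (length > 255): the ramp rises by at most 1 per cycle."""
    d = 511 - length
    incr_e = 510
    incr_ne = 512 - 2 * length
    ramp = 0
    values = []
    for _ in range(length):
        values.append(ramp)  # assumed sampling point: before the update
        if d <= 0:
            d += incr_e
        else:
            d += incr_ne
            ramp += 1
    return values


def ramp_jump_mode(length):
    """Jump mode (length <= 255): add an 8.8 step to a 16-bit accumulator."""
    if length < 2:
        return [0] * length
    step = round(255 * 256 / (length - 1))  # assumed ROM entry, 8.8 format
    acc = 0
    values = []
    for _ in range(length):
        values.append(acc >> 8)             # keep the upper 8 bits
        acc = (acc + step) & 0xFFFF
    return values


def ramp(length):
    """Select the mode from the upper 24 bits of a 32-bit length."""
    return ramp_step_mode(length) if length >> 8 else ramp_jump_mode(length)


print(ramp(2), ramp(511)[:6], ramp(511)[-1])
```

Because the jump-mode output is truncated, the last value of a jump-mode ramp can fall just short of 255 in this sketch (for example 254 for a length of 100).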
The following formula is then used to generate the blend from the ramp:

Ｂｌｅｎｄ＝（（ｅｎｄ−ｓｔａｒｔ）×ｒａｍｐ／２５５）＋ｓｔａｒｔ
２５５による割算に対して切り捨てが行われる。上記式は、２つの加算器と、各チャンネルのランプによって（ｅｎｄ−ｓｔａｒｔに対し）“プレ乗算”を行なうブロックとを必要とする。主データパス部２４２が行なうことのできる他の画像処理は、一般色空間変換である。一般化色空間変換（ＧＣＳＣ）は出力カラー値を求めるためにピースワイズトライーリニア（３次線形）補間を用いる。３次元の入力空間から１次元もしくは４次元出力空間への変換が行なわれるのが望ましい。 Blend = ((end - start) × ramp / 255) + start
The division by 255 is truncated. The above formula requires two adders and a block that "pre-multiplies" (end - start) by the ramp for each channel. Another image processing operation that the main data path unit 242 can perform is general color space conversion. General color space conversion (GCSC) uses piecewise trilinear interpolation to determine the output color values. The conversion is preferably from a three-dimensional input space to a one-dimensional or four-dimensional output space.
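The blend formula above can be checked with a short sketch. The one assumption is the treatment of negative differences: the text only says the division by 255 is truncated, so the sketch truncates toward zero. The function name `blend_value` is hypothetical.

```python
def blend_value(start, end, ramp):
    """Blend = ((end - start) * ramp / 255) + start, with a truncated division."""
    product = (end - start) * ramp
    quotient = abs(product) // 255   # truncate toward zero (assumed)
    if product < 0:
        quotient = -quotient
    return quotient + start


print([blend_value(10, 20, r) for r in (0, 128, 255)])
```

One subtraction forms (end - start) and one addition adds start back, which matches the two adders mentioned in the text.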

いくつかの場合においては、色域のエッジにおけるトライーリニア補間の正確さが問題になる。この問題はエッジ付近に対して敏感なプリントデバイスにおいて著しくなる。この問題を避けるためにＧＣＳＣは、選択的に拡張出力色空間において計算されることができ、次の式を用いて適当な範囲内にスケール及びクランプされる。 In some cases, the accuracy of tri-linear interpolation at the gamut edges is a problem. This problem becomes significant in printing devices that are sensitive to near edges. To avoid this problem, the GCSC can be selectively computed in the extended output color space and scaled and clamped within the appropriate range using the following equation:

好適な実施例が実行できるその他の画像処理には、画像変換および畳込み演算である。画像変換においてソース画像はスケール、回転、スキューされる。畳込み演算において、ソース画像のピクセルは畳込み行列をもってサンプリングされ、目的画像を生成する。目的画像におけるスキャンラインを生成するためには次の段階が必要である。 Other image processes that the preferred embodiment can perform are image conversion and convolution operations. In image conversion, the source image is scaled, rotated, and skewed. In the convolution operation, the pixels of the source image are sampled with a convolution matrix to produce the target image. In order to generate a scan line in the target image, the following steps are necessary.

１）図１３５に示すような目的画像のスキャンラインを逆変換する。これによって目的画像のスキャンラインを生成するに必要なソース画像のピクセルを識別することができる。
２）ソース画像の必要部分を解凍する。
３）目的画像の水平、垂直サブサンプリング距離、開始ｘ，ｙ座標をソース画像に逆変換する。 1) Inversely transform the scan line of the target image as shown in FIG. This makes it possible to identify the pixels of the source image that are necessary to generate the scan line of the target image.
2) Decompress the necessary part of the source image.
3) Invert the horizontal and vertical sub-sampling distances and start x, y coordinates of the target image back to the source image.

４）上記情報を処理部に伝送し、必要なサブサンプリングと補間を行ない、出力画像のピクセルを求める。
サブサンプリング、補間、目的ピクセルの書き込みなどは好適な実施例によって行なわれ、ソース画像における関連する部分、使うべきサブサンプリング周波数などの計算はホストアプリケーションによって行なわれる。 4) Transmit the above information to the processing unit, perform necessary sub-sampling and interpolation, and obtain the pixel of the output image.
Subsampling, interpolation, writing of the target pixel, etc. are performed by the preferred embodiment, and calculations of the relevant parts in the source image, the subsampling frequency to be used, etc. are performed by the host application.

図１３６は目的ピクセル値の計算において必要な段階のブロック図である。図１３６は必要なソース画像のピクセルが利用可能であるものと想定している。目的ピクセルを計算する最後の段階は、ソース画像から２次線形補間された全てのサブサンプルを合算することである。主データパス部２４２における適当な設定によって引き出される画像変換エンジンのブロック図を図１３７に示す。画像変換エンジン１８３０はアドレス生成部１８３１、プレ乗算部１８３２、補間部１８３３、累算部１８３４、切捨て、クランプ、絶対値を求める論理部１８３５からなる。 FIG. 136 is a block diagram of the steps required to calculate a target pixel value. FIG. 136 assumes that the required source image pixels are available. The final step in calculating the target pixel is to sum all of the bilinearly interpolated subsamples from the source image. FIG. 137 shows a block diagram of the image transformation engine obtained by suitably configuring the main data path unit 242. The image transformation engine 1830 comprises an address generation unit 1831, a pre-multiplication unit 1832, an interpolation unit 1833, an accumulation unit 1834, and a logic unit 1835 for truncation, clamping, and absolute value.
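The target pixel calculation (interpolate each subsample, weight it by a matrix coefficient, and accumulate) can be sketched as below. This is a simplified floating-point model, not the fixed-point hardware: it handles a single channel, omits pre-multiplication by opacity, assumes non-negative coordinates, and repeats edge pixels at the image border. The function names and the argument layout are hypothetical.

```python
def bilinear_sample(image, x, y):
    """Interpolate along x within two rows, then along y between them."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0                  # fractional parts of x and y
    x1 = min(x0 + 1, len(image[0]) - 1)      # repeat edge pixels (assumed)
    y1 = min(y0 + 1, len(image) - 1)
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def target_pixel(image, start_x, start_y, delta_x, delta_y, kernel):
    """Accumulate coefficient-weighted bilinear subsamples."""
    total = 0.0
    for row, coeffs in enumerate(kernel):
        for col, coeff in enumerate(coeffs):
            sample = bilinear_sample(image,
                                     start_x + col * delta_x,
                                     start_y + row * delta_y)
            total += coeff * sample
    return total


image = [[0, 100], [100, 200]]
print(bilinear_sample(image, 0.5, 0.5))
print(target_pixel(image, 0, 0, 1, 1, [[0.25, 0.25], [0.25, 0.25]]))
```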

The address generation unit 1831 generates the x and y coordinates of the source image that are needed to construct the result pixel. It also generates the addresses used to obtain index offsets from the input index table 1815 and pixels from the image 1810. Before the address generation unit 1831 generates the x and y coordinates of the source image, it reads a kernel descriptor. There are two kernel descriptor formats, which are shown in FIG. 138. The kernel descriptor contains:
1) The starting coordinates in the source image (unsigned fixed point, 24.24 precision). Position (0, 0) is the upper left corner of the image.
2) The horizontal and vertical subsample deltas (two's complement, 24.24 precision).
3) A 3-bit bp field indicating the position of the binary point in the fixed-point matrix coefficients. FIG. 150 shows the definition of the bp field and its interpretation.
4) The accumulation matrix coefficients. These have "variable" point precision with 20 binary places (two's complement); the position of the binary point is implicitly specified by the bp field.
5) An rl field indicating the number of remaining words in the kernel descriptor. This value is equal to the number of rows multiplied by (the number of columns minus 1).
In the short kernel descriptor, every parameter other than the integer part of the starting x coordinate takes the following value:
fractional part of the starting x coordinate ← 0,
starting y coordinate ← 0,
horizontal delta ← 1.0,
vertical delta ← 1.0.
After the address generation unit 1831 has been configured, it calculates the current coordinates. It does so in one of two ways, depending on the dimensions of the subsample matrix. If the subsample matrix is 1 × 1, the address generation unit 1831 adds the horizontal delta to the current coordinates until enough coordinates have been generated.

If the subsample matrix is not 1 × 1, the address generation unit 1831 adds the horizontal delta to the current coordinates until one row of the matrix is complete. It then adds the vertical delta to the current coordinates to obtain the coordinates of the next row. To obtain the following coordinates, it subtracts the horizontal delta from the current coordinates until one more row is complete. It then adds the vertical delta to the current coordinates again and repeats the process. The diagram at the top of FIG. 150 shows how the matrix is traversed. With this scheme the matrix is scanned in a zigzag, and because the current x, y coordinates are computed incrementally in this way, fewer registers are needed. The accumulation matrix coefficients must be listed in the kernel descriptor in the same order.
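The zigzag traversal described above can be sketched as follows. This is a minimal illustration, not the hardware logic: the function name `zigzag_coords` and its parameters are hypothetical, and the deltas are assumed to be axis-aligned scalars (the horizontal delta affects only x, the vertical delta only y).

```python
def zigzag_coords(x0, y0, dx, dy, rows, cols):
    """Return the (x, y) coordinates visited for a rows x cols subsample matrix.

    Assumption: dx and dy are scalar horizontal and vertical deltas.
    Only the current coordinate and the current direction are kept,
    which mirrors the small register count mentioned in the text.
    """
    x, y = x0, y0
    step = dx                      # the first row runs left to right
    coords = []
    for _ in range(rows):
        for c in range(cols):
            coords.append((x, y))
            if c < cols - 1:
                x += step          # move along the current row
        y += dy                    # add the vertical delta for the next row
        step = -step               # the next row runs in the opposite direction
    return coords
```

Because odd rows are visited right to left, coefficients supplied in the kernel descriptor would have to follow the same visiting order.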

After generating the current coordinates, the address generation unit 1831 adds the y coordinate to the index table base address to obtain the address of the index table entry (if the source pixels are interpolated, the address generation unit 1831 must also fetch the next index table entry). The index table base address points to the index table entry for y = 0. After obtaining the index offset from the index table, the address generation unit 1831 adds it to the x coordinate. This sum is used to fetch one pixel from the source image (two pixels if the source pixels are interpolated). If the source pixels are interpolated, the address generation unit 1831 also adds the x coordinate to the next index offset to fetch two more pixels.

A similar method is used to determine the coordinates for convolution. The only difference is that, in convolution, the starting coordinate of the matrix for the next output pixel is one horizontal delta away from the starting coordinate of the matrix for the previous output pixel, whereas in image transformation it is one horizontal delta away from the coordinate of the upper right pixel of the matrix for the previous output pixel.

The middle diagram of FIG. 139 illustrates this difference. The pre-multiplication unit 1832 multiplies the color channels of a pixel by its opacity channel if required. The interpolation unit 1833 interpolates the source pixels to determine the true color of the required pixel. It fetches two pixels from the source image memory, interpolates between them using the fractional part of the current x coordinate, and stores the result in a register. It then fetches two pixels from the next line of the source image memory and interpolates between them using the same x fraction. Finally, the interpolation unit 1833 interpolates between this value and the previously stored value using the fractional part of the current y coordinate.
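The three interpolation steps performed by the interpolation unit amount to ordinary bilinear interpolation. The sketch below uses floating-point values for readability; the hardware works in fixed point, and the names `lerp` and `bilinear` are illustrative only.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with fraction t in [0, 1]."""
    return a + (b - a) * t


def bilinear(p00, p01, p10, p11, fx, fy):
    """Bilinear interpolation of four neighbouring pixel values.

    p00, p01: the two pixels on the current line
    p10, p11: the two pixels on the next line
    fx, fy:   fractional parts of the current x and y coordinates
    """
    first = lerp(p00, p01, fx)    # first pair, interpolated with the x fraction
    second = lerp(p10, p11, fx)   # second pair, same x fraction
    return lerp(first, second, fy)  # combine the two results with the y fraction
```

For example, with fx = fy = 0.5 the result is the mean of the four pixels.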

The accumulation unit 1834 performs two tasks:
1) It multiplies each pixel by its matrix coefficient.
2) It accumulates these products over the whole matrix and outputs the accumulated value to the next stage. The accumulation unit 1834 is initialized to 0 or to a specified value, depending on the channel.

Block 1835 truncates the output of the accumulation unit 1834 and, if required, clamps underflowed or overflowed values to the minimum or maximum value. It can also take the absolute value of the output if required. The position of the binary point in the accumulator output is specified by the bp field of the kernel descriptor; the bp field indicates the number of bits to discard from the accumulated result. This is shown in the bottom diagram of FIG. 139. The accumulated value is treated as a signed two's complement number.
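The multiply-accumulate stage followed by truncation and clamping can be sketched as below. Several details are assumptions made for illustration: the function name and parameters are hypothetical, the output range is taken to be an 8-bit channel (0 to 255), and the absolute value is applied before clamping. Discarding low-order bits of a two's complement value is modelled with Python's arithmetic right shift.

```python
def accumulate_and_finish(pixels, coeffs, bp, init=0, take_abs=False, lo=0, hi=255):
    """Multiply-accumulate, then discard bp fractional bits and clamp.

    pixels, coeffs: equal-length sequences of integers
    bp:             number of low-order bits to discard from the sum
    init:           initial accumulator value (0 or a channel-specific value)
    """
    acc = init
    for p, k in zip(pixels, coeffs):
        acc += p * k                 # multiply pixel by coefficient and accumulate
    acc >>= bp                       # arithmetic shift drops the fractional bits
    if take_abs:
        acc = abs(acc)               # optional absolute value (ordering assumed)
    return max(lo, min(hi, acc))     # clamp underflow and overflow
```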

Another image processing operation that the main data path unit 242 can perform is matrix multiplication. Matrix multiplication is used for color space conversion when the two color spaces are related by an affine transformation. This distinguishes it from general color space conversion, which is based on trilinear interpolation. The result of the matrix multiplication is defined by the following equation.

Here, ri is the result pixel and ai is the A operand pixel. The matrix must have 5 columns and 4 rows. FIG. 140 is a block diagram of the multiplier-adder that performs matrix multiplication in the main data path unit 242. It comprises multipliers that multiply the pixel channels by the matrix coefficients, an adder that sums the products, and a logic unit that clamps the output and takes its absolute value as required.
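The defining equation is not reproduced in this text. One reading that is consistent with an affine relationship and a matrix of 4 rows and 5 columns is that each result channel is a weighted sum of the four operand channels plus a constant taken from the fifth column. The sketch below illustrates that reading only; the coefficient layout and the name `matrix` are assumptions, not the patent's notation.

```python
def affine_color_transform(matrix, pixel):
    """Apply an assumed 4 x 5 affine color matrix to a 4-channel pixel.

    matrix: 4 rows of 5 coefficients; columns 0..3 weight the operand
            channels and column 4 is treated as a constant offset (assumed).
    pixel:  4 channel values a0..a3 of the A operand pixel.
    """
    if len(matrix) != 4 or any(len(row) != 5 for row in matrix):
        raise ValueError("matrix must have 4 rows and 5 columns")
    if len(pixel) != 4:
        raise ValueError("pixel must have 4 channels")
    result = []
    for row in matrix:
        linear = sum(m * a for m, a in zip(row[:4], pixel))
        result.append(linear + row[4])   # add the constant term
    return result
```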

A matrix multiplication takes two clock cycles to complete. In each cycle the multiplexers are set so that the correct data is selected for the multipliers and the adder. In cycle 0, the two least significant bytes of the pixel are selected by the multiplexers 1851 and 1852. They are then multiplied by the coefficients in the two left-hand columns of the matrix, that is, the matrix coefficients held in line 0 of the cache.

In cycle 1, the two more significant bytes of the pixel are selected by the top multiplexers. They are then multiplied by the coefficients in the two right-hand columns of the matrix. The products are added to the result of the previous cycle by the adder 1854, and the sum is truncated to 8 bits by the logic 1855. The "operand logic unit" 1856 rearranges the multiplier outputs into four inputs for the adder 1854. This rearrangement allows the multiplier results to be added so that the correct product of a 24-bit coefficient and an 8-bit pixel component is produced.

The "AC logic unit" 1855 discards the 12 least significant bits of the adder output and, if so configured, takes the absolute value of the truncated result. It then clamps or wraps the result, depending on the configuration. When the "AC logic unit" is set to clamp, all values below 0 are forced to 0 and all values above 255 are forced to 255. When it is set to wrap, the lower 8 bits of the integer part are output.
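The behaviour of the "AC logic unit" can be sketched in a few lines. The function name and flag names are hypothetical; the 12-bit truncation, the optional absolute value, and the clamp and wrap behaviours follow the description above.

```python
def ac_logic(adder_out, take_abs=False, wrap=False):
    """Model of the truncate / absolute value / clamp-or-wrap stage.

    adder_out: signed integer adder output with 12 fractional bits
    """
    value = adder_out >> 12              # discard the 12 least significant bits
    if take_abs:
        value = abs(value)               # optional absolute value
    if wrap:
        return value & 0xFF              # keep the lower 8 bits of the integer part
    return max(0, min(255, value))       # clamp to the range 0..255
```

With an integer part of 300, clamping yields 255, whereas wrapping yields 300 mod 256 = 44.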

The main data path unit 242 can also be configured to perform image processing operations other than those described above. This computer architecture reduces cost through design reuse and can perform a variety of image processing operations quickly. Because the architecture is flexible, an external programming agent that is familiar with it can configure the hardware to perform image processing operations that were not originally anticipated. In addition, because the core of the design consists mainly of a small number of multi-function blocks, the design effort is significantly reduced.

3.18.6 Data Cache Control Unit and Cache
The data cache control unit 240 includes a 4-kilobyte read data cache 230 in the coprocessor 224. The data cache 230 is organized as a direct-mapped RAM cache, in which any of a set of equal-length lines in external memory can be mapped directly onto the same line of that length in the cache memory 230 (FIG. 2). Such a line in the cache memory is usually called a cache line, and the cache memory consists of many such cache lines.

The data cache control unit 240 services data requests from the two operand organizers 247 and 248. It first checks whether the data is present in the cache 230. If it is not, the data is fetched from external memory. The data cache control unit 240 contains a programmable address generator, which allows it to operate in several different addressing modes. There is also a special addressing mode in which the address of the requested data is generated by the data cache control unit 240 itself. In this mode, up to eight words (256 bits) of data can be delivered to the operand organizers 247 and 248 simultaneously.

The cache RAM consists of eight independently addressable memory banks, each of which can be addressed with a different line address. This is required for some of the special addressing modes, in which the data from the individual banks is assembled into a 256-bit unit. This arrangement allows up to eight 32-bit requests to be serviced simultaneously, provided that they fall in different banks.

The cache operates in the following modes, which are described in detail below. If required, the entire cache can also be filled automatically.
1. Normal mode
2. Single-output general color space conversion mode
3. Multi-output general color space conversion mode
4. JPEG encoding mode
5. Slow JPEG decoding mode
6. Matrix multiplication mode
7. Disabled mode
8. Invalidate mode
FIG. 141 shows the address, data, and control flow of the data cache control unit 240 and the data cache 230 of FIG. 2.

The data cache 230 is the direct-mapped cache described above. The data cache control unit 240 has a tag memory 1872 that holds a tag entry for each cache line; the tag entry contains the most significant part of the external memory address to which the cache line is currently mapped. It also has a line valid status memory 1873 that indicates whether each cache line is currently valid. All cache lines are initially invalid.

The data cache control unit 240 can simultaneously service data requests from operand organizer B 247 (FIG. 2) and operand organizer C 248 (FIG. 2) through the operand bus interface. In operation, one or both of the operand organizers 247 and 248 (FIG. 2) supply an index 1874 and assert a data request signal 1876. The address generator 1881 generates one or more complete external addresses 1877 from the index 1874. The cache controller 1878 determines whether the requested data is present in the cache 230 by checking the tag memory 1872 for the tag address of the generated address 1877 and by checking the line valid status memory 1873 to see whether the corresponding cache line is valid. If the requested data is present in the cache memory 230, an acknowledge signal 1879 is sent to the relevant operand organizer 247 or 248 together with the requested data 1880. If the requested data is not present in the cache memory 230, the requested data 1870 is fetched from external memory through the input bus interface 1871 and the input interface switch 252 (FIG. 2). The data 1870 is fetched by asserting a request signal 1882 and supplying the generated address 1877 of the requested data 1870. An acknowledge signal 1883 and the requested data 1870 are then returned to the cache controller 1878 and the cache memory 230, respectively. The corresponding cache line in the cache memory 230 is updated with the new data 1870, the tag address of the new cache line is written into the tag memory 1872, and the line valid status 1873 of the new cache line is asserted. Finally, the acknowledge signal 1879 is sent to the relevant operand organizer 247 or 248 (FIG. 2) together with the data 1870.
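The hit and miss handling described above can be modelled in software as a conventional direct-mapped cache. The sketch below is a simplified behavioural model, not the circuit: the class name and method are hypothetical, external memory is modelled as a bytes object, and only single-byte reads are shown. The sizes (128 lines of 32 bytes, 4 kilobytes in total) follow the figures given for the data cache 230.

```python
class DirectMappedCache:
    LINES = 128        # number of cache lines
    LINE_BYTES = 32    # bytes per cache line (128 x 32 = 4 kilobytes)

    def __init__(self, memory):
        self.memory = memory                          # model of external memory
        self.tags = [0] * self.LINES                  # tag memory
        self.valid = [False] * self.LINES             # all lines start invalid
        self.lines = [bytes(self.LINE_BYTES)] * self.LINES

    def read_byte(self, addr):
        """Return (value, hit) for the byte at external address addr."""
        line = (addr // self.LINE_BYTES) % self.LINES
        tag = addr // (self.LINE_BYTES * self.LINES)
        hit = self.valid[line] and self.tags[line] == tag
        if not hit:
            # Miss: fetch the whole line, then update tag and valid status.
            base = addr - addr % self.LINE_BYTES
            self.lines[line] = bytes(self.memory[base:base + self.LINE_BYTES])
            self.tags[line] = tag
            self.valid[line] = True
        return self.lines[line][addr % self.LINE_BYTES], hit
```

Two addresses that are 4096 bytes apart map to the same cache line, so reading them alternately causes a miss each time.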

FIG. 142 shows the memory organization of the data cache 230. The data cache 230 is organized as a direct-mapped cache with 128 cache lines C0, ..., C127, each with a cache line length of 32 bytes. The cache RAM comprises eight separately addressable memory banks B0, ..., B7, each of which holds 128 bank lines of 32 bits. Each cache line Ci consists of the eight corresponding bank lines B0i, ..., B7i of the eight memory banks B0, ..., B7.

FIG. 143 shows the format of a generated external memory address. The generated address is a 32-bit word consisting of a 20-bit tag address, a 7-bit line address, a 3-bit bank address, and a 2-bit byte address. The 20-bit tag address is compared with the tag stored in the tag memory 1872. The 7-bit line address selects the relevant cache line in the cache memory 230. The 3-bit bank address selects the relevant memory bank in the cache memory 230. The 2-bit byte address selects the relevant byte within the 32-bit bank line.
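The address format can be illustrated by a small decoding function. It assumes that the four fields are packed from the most significant bit downward in the order in which they are listed (tag, line, bank, byte); the function name and the dictionary keys are illustrative.

```python
def split_address(addr):
    """Split a 32-bit external address into its cache fields.

    Assumed layout, most significant bits first:
    20-bit tag | 7-bit line | 3-bit bank | 2-bit byte
    """
    return {
        "tag": (addr >> 12) & 0xFFFFF,   # bits 31..12
        "line": (addr >> 5) & 0x7F,      # bits 11..5
        "bank": (addr >> 2) & 0x7,       # bits 4..2
        "byte": addr & 0x3,              # bits 1..0
    }
```

Under this layout, consecutive 32-bit words fall in consecutive banks, so eight adjacent words occupy one bank line in each of the eight banks.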

FIG. 144 is a block diagram of the structure of the data cache control unit 240 and the data cache 230. A 128 × 256-bit RAM forms the cache memory 230; it consists of eight separately addressable memory banks of 128 × 32 bits each. The RAM has a write enable port (write), a write address port (write_addr), and a write data port (write_data). It also has a read enable port (read), eight read address ports (read_addr), and eight read data output ports (read_data). The cache control block 1878 generates a write enable signal that allows all memory banks of the cache memory 230 to be written simultaneously. When required, the data cache 230 is updated with one or more lines of data from external memory through the write data port (write_data). A line of data is written by supplying its line address to the write address port (write_addr) through the 8:1 multiplexer MUX, which selects the line address from among the generated external addresses under the control of the data cache control unit (addr_select). The cache control block 1878 likewise generates a read enable signal that allows all memory banks of the cache memory 230 to be read simultaneously. In this way, data from eight different bank lines can be read simultaneously from the eight read data ports (read_data), according to the line addresses supplied to the eight read address ports (read_addr) of the memory banks of the cache memory 230.

Each bank of the cache memory 230 has its own programmable address generator 1881. This allows eight different locations to be accessed simultaneously, one in each of the eight memory banks. Each address generator 1881 has a dcc mode input for setting its operating mode, an index packet input, a base address input, and an address output. The operating modes of the programmable address generators 1881 are:
(a) Random access mode, in which the signal on the dcc mode input places each address generator 1881 in random access mode, and an external memory address supplied on the index packet input is passed to the address output of one or more of the address generators 1881.
(b) JPEG encoding and decoding, color space conversion, and matrix multiplication modes, in which the signal on the dcc mode input places each address generator 1881 in the appropriate mode. In these modes, each address generator 1881 receives an index on its index packet input and generates an indexed address. Depending on the operating mode, the address generators can generate up to eight different external memory addresses.

The eight address generators 1881 consist of eight different logic circuits, each of which takes a base address, the dcc mode, and an index as inputs and produces an external memory address as output. The base address register 1885 stores the current base address, which is combined with the index packet, and the dcc mode register 1888 stores the current operating mode (dcc mode) of the data cache control unit 240.

The tag memory 1872 is implemented as a single block of 128 × 20-bit multi-port RAM. This RAM has one write port (update-line-addr), one write enable port (write), and eight read ports (tag0_data, ..., tag7_data). It allows eight simultaneous lookups on the ports (read0line-addr, ..., read7line-addr), so that the tag addresses currently stored for the lines addressed by the one or more memory addresses generated by the eight address generators 1881 can be determined. The current tag addresses of these lines are output from the ports (tag0-data, ..., tag7-data) to the tag comparison unit 1886. When required, the cache control block 1878 generates a tag write signal that enables the tag memory 1872 to be written at the line given on the port (update-line-addr).

The 128-bit line valid memory 1873 holds the valid status of each cache line of the cache memory 230. It is a 128 × 1-bit memory with one write port (update-line-addr), one write enable port (update), eight read ports (read0line-addr, ..., read7line-addr), and eight read outputs (linevalid0, ..., linevalid7). Like the tag memory, it allows eight simultaneous lookups on the ports (read0line-addr, ..., read7line-addr), so that the line valid status currently stored for each line addressed by the one or more memory addresses generated by the eight address generators 1881 can be determined. The current line valid bits of these lines are output from the ports (linevalid0, ..., linevalid7) to the tag comparison unit 1886. When required, the cache control block 1878 generates a write signal on the write enable port of the line valid status memory 1873 that enables it to be written at the line given on the port (update-line-addr).

The tag comparison unit 1886 consists of eight tag comparators. Each has a tag_data input that receives the tag address currently stored in the tag memory 1872 for the line addressed by the line address of the currently generated external address; a tag_addr input that receives the tag address of the currently generated external memory address; a dcc_input that receives the current operating mode signal (dcc_mode), which sets the portion of the tag address to be compared; and a line_valid input that receives the line valid status currently stored in the line valid status memory 1873 for the line addressed by the line address of the currently generated external address. The comparison unit 1886 has eight hit outputs, one for each of the eight address generators 1881. A hit signal is asserted when the tag address of the generated external memory address matches the contents of the tag memory 1872 at the location addressed by the line address of the generated external memory address, and the line valid status bit 1873 for that line is asserted. In this embodiment, the data structures stored in external memory are small, so the most significant bits of their tag addresses are all the same. It is therefore sufficient to compare only the varying least significant bits of the tag address. This is achieved by setting the current operating mode signal (dcc_mode) so that the tag comparison unit 1886 compares only those least significant bits.
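A single tag comparator with a mode-dependent comparison width can be sketched as follows. The parameter `compare_bits` is a hypothetical stand-in for whatever the dcc_mode signal selects; the text does not state how many bits are compared in each mode.

```python
def tag_hit(stored_tag, request_tag, line_valid, compare_bits=20):
    """Return True when the line is valid and the compared tag bits match.

    compare_bits: number of least significant tag bits to compare
                  (hypothetical parameter; 20 compares the full tag).
    """
    mask = (1 << compare_bits) - 1                       # select the low-order bits
    return line_valid and ((stored_tag ^ request_tag) & mask) == 0
```

Comparing fewer bits is only safe when the upper tag bits are known to be identical for every address in use, which is the situation the text describes for small data structures.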

The cache control unit 1878 accepts requests (proc_req) from operand organizer B 247 and operand organizer C 248 and acknowledges them (proc_ack) when the data in the cache memory 230 can be accessed. Depending on the operating mode, data at different addresses may be requested from up to eight banks of the cache memory 230. When requested data is available in the cache memory 230, the tag comparison unit 1886 asserts a hit for the corresponding line. In response to the asserted hit signals (hit0, ..., hit7), the cache control unit 1878 generates a read enable signal on the port (cache_read), enabling the cache lines for which hits were asserted to be read. When a request (proc_req) 1876 is asserted but the corresponding hit signal (hit0, ..., hit7) is not, the external memory address of the cache line containing the data is sent to external memory together with a generated request (ext_req). When the cache line becomes available on the input (ext_data), it is written into the eight banks of the cache memory 230. At the same time, the tag information is written into the tag memory 1872 at that line address, and the line valid bit 1873 for that line is asserted.

Data from the eight banks of the cache memory 230 passes through a set of multiplexers in the data organizer 1892, which places it in the output data packet 1894 in a predetermined arrangement. In some operating modes, the data organizer 1892 uses the current operating mode signal (dcc_mode) and the byte address (byte_addr) of the generated external memory address to select and output 8-bit words from the eight 32-bit words output by the eight memory banks. In other modes, the data organizer 1892 outputs the eight 32-bit words from the eight memory banks directly. In either case, as noted above, the data organizer arranges the data in the predetermined format before outputting it.
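The byte-selection behaviour of the data organizer can be illustrated as below. This is a sketch under two assumptions that the text does not confirm: one byte is selected from each bank word, and byte address 0 refers to the least significant byte. The function name is hypothetical.

```python
def select_bytes(bank_words, byte_addrs):
    """Pick one 8-bit value from each 32-bit bank word.

    bank_words: 32-bit words read from the memory banks
    byte_addrs: 2-bit byte address for each word (0 = least significant
                byte, an assumed convention)
    """
    selected = []
    for word, byte_addr in zip(bank_words, byte_addrs):
        selected.append((word >> (8 * byte_addr)) & 0xFF)
    return selected
```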

A request is processed in the following steps:
1) The processing unit sends an address to the processing unit interface in the cache control unit 1878 to request a packet of data.
2) The eight address generation units 1881 each generate an address for their block of the cache memory according to the operating mode.
3) The tag portion of each generated address is compared with the tag addresses stored in the four blocks of the three-port tag memory 1886, which are located by the line portions of the eight generated addresses.
4) If the tags match and the line valid state 1873 for that line is asserted, the requested data is considered to be present in the cache memory 230.
5) Data that is not present is fetched via the external bus 1890, and the eight blocks of the cache memory 230 are updated with the contents of the data line from external memory. The tag address of the new data is written to the tag memory 1886, and the line valid state 1873 for that line is asserted.
6) Once all requested data is present in the cache memory 230, it is presented to the processing unit in the predetermined packet format.
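The hit, miss and refill sequence in steps 3) to 6) can be sketched in Python as follows. This is a simplified single-bank, direct-mapped model and not the eight-bank arrangement of the embodiment; the class names, the line count, the line size and the `fetch` callback are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLine:
    valid: bool = False   # corresponds to the line valid state
    tag: int = 0          # corresponds to the stored tag address
    data: bytes = b""


@dataclass
class TinyCache:
    num_lines: int = 128  # assumed number of lines
    line_size: int = 32   # assumed bytes per line
    lines: list = field(default_factory=list)

    def __post_init__(self):
        self.lines = [CacheLine() for _ in range(self.num_lines)]

    def split(self, addr):
        """Split an address into (tag, line index)."""
        line = (addr // self.line_size) % self.num_lines
        tag = addr // (self.line_size * self.num_lines)
        return tag, line

    def read_line(self, addr, fetch):
        """Return (hit, data). On a miss, refill the line via fetch(base, n)."""
        tag, line = self.split(addr)
        entry = self.lines[line]
        hit = entry.valid and entry.tag == tag
        if not hit:
            base = addr - addr % self.line_size
            entry.data = fetch(base, self.line_size)  # external bus access
            entry.tag = tag                           # write the new tag
            entry.valid = True                        # assert line valid
        return hit, entry.data
```

A first access to an address misses and refills the line; a repeated access to the same line then hits, while an address that maps to the same line with a different tag causes another refill.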

As described above, every module of the coprocessor 224 (FIG. 2) includes a standard CBus interface 303 (FIG. 20). Details of the standard CBus interface registers of the data cache control unit 240 and the cache 230 are given in Appendix B, B42 to B46. The settings of these registers control the operation of the data cache control unit 240. For simplicity, only two registers (base_address and bcc_mode) are shown in FIG. 153.

When the data cache control unit 240 and the data cache 230 are enabled, the data cache control unit first invalidates all cache lines and then operates in the standard mode. At the end of an instruction, the data cache control unit 240 and the cache 230 always revert to the standard operating mode. Every mode except the "Invalidate" mode has an "Auto-fill and validate" option. Setting a bit in the dcc_cfg2 register causes the entire cache to be filled starting from the address stored in the base_address register. During this operation, data requests from the operand organizers B and C 247, 248 are held off. The cache is marked valid once the operation completes.
a. Standard Cache Mode
In this mode, the two operand organizers supply the external memory addresses of the requested data. The address generation unit 1881 outputs the external memory address, and the internal tag memory is used to check whether the data is present in the cache memory 230. If neither requested data item is present in the cache 230, the data is requested from the input interface switch 252. Round-robin scheduling is used to service persistent simultaneous requests.

For simultaneous requests, if one of the data items is present in the cache, it is placed on the lower 32 bits of the requested data bus. The other data item is requested externally through the input interface switch.
b. Single Output General Color Space Conversion Mode
In this mode, requests are issued by the operand organizer B in the form of a 12-bit byte address. As shown in FIG. 60, the requested data item is an 8-bit color output value. The 12-bit address is fed to the index_packet inputs of the address generation units 1881, and the eight address generation units 1881 generate 32-bit external memory addresses in the format shown in FIG. 96. The bank, line and byte addresses of the generated addresses are determined by Table 12 and FIG. 61. The external memory addresses are interpreted as eight 9-bit line-and-byte addresses, which are used to address a byte in each of the eight banks of RAM. The cache is accessed to obtain the eight byte values, one from each bank, which are returned to the operand organizer for interpolation by the main data path 242, on the principle described above with reference to FIG. 60. Because the entire single output general color value table fits in the cache memory 230, it is preferable to load the table into the cache memory 230 before starting single output color conversion.
c. Multi Output General Color Space Conversion Mode
In this mode, a 12-bit word address is received from the operand organizer B 247. The requested data item is a 32-bit color output value, as described above with reference to FIG. 62. The 12-bit address is fed to the index_packet inputs of the address generation units 1881, and the eight address generation units 1881 generate eight different 32-bit external memory addresses in the format shown in FIG. 96. The line and tag addresses of the external memory addresses are determined by Table 12 and FIG. 63. As described above with reference to FIG. 63, the external memory addresses are interpreted as eight 9-bit addresses, each divided into a 7-bit line address and a 2-bit tag address. If a tag address is not found, the cache stalls until the appropriate data has been loaded from the input interface switch 252 (FIG. 2). Once the data is available, it is output to the operand organizer.
d. JPEG Encoding Mode
In this mode, the tables required for JPEG encoding are stored in the banks of the cache RAM. The table layout is described in the section on the JPEG encoding mode (Tables 14 and 16).
e. Slow JPEG Decoding Mode
In this mode, data is generated according to Table 17.
f. Matrix Multiplication Mode
In this mode, the cache is used to access data in 256-byte lines.
g. Disabled Mode
In this mode, all requests are passed to the input interface switch 252.
h. Invalidate Mode
In this mode, the entire contents of the cache are invalidated by clearing the line valid status bits.

3.18.7 Input Interface Switch
Referring to FIG. 2, the input interface switch 252 arbitrates data requests from the pixel organizer unit 246, the data cache control unit 240 and the instruction control unit 235. It also forwards the required addresses and data to the external interface control unit 238 and the local memory control unit 236.

The input interface switch 252 stores in a configuration register the base address of the coprocessor's memory object in the host memory map. This is a virtual address aligned on a page boundary, so 20 address bits are required. For each request from the pixel organizer unit, data cache control unit or instruction control unit, the input interface switch 252 first subtracts the coprocessor base address bits from the upper 6 bits of the start address of the data. If the result is negative, or if the upper 6 bits of the result are not zero, the intended target is the PCI bus.

If the upper 6 bits of the result are zero, the address maps to a coprocessor memory location. The input interface switch then examines the next 3 bits to determine whether the coprocessor location is valid. The valid coprocessor locations are:
1) the 16 megabytes occupied by the general interface, starting at offset 0x01000000 from the base address of the coprocessor's memory object;
2) the 32 megabytes occupied by the local memory control unit (LMC), starting at offset 0x02000000 from the base address of the coprocessor's memory object.

A request that addresses an invalid coprocessor location is treated as an error by the input interface switch. The PCI bus is the data source for all addresses outside the region occupied by the coprocessor's memory object. The input interface switch uses the i-source signal to tell the EIC whether the requested data comes from the PCI bus or from the general interface.
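The address decoding described above can be sketched in Python as follows. The sketch assumes 32-bit addresses (so that the "upper 6 bits" are bits 31 to 26) and subtracts full addresses rather than only the upper bit fields; the function name and the returned labels are hypothetical.

```python
GENERAL_IF_BASE = 0x01000000  # start of the 16 MB general interface region
LMC_BASE = 0x02000000         # start of the 32 MB local memory region
REGION_END = 0x04000000       # 64 MB: the span reachable with upper 6 bits zero


def decode_target(addr, coproc_base):
    """Classify a request address as 'PCI', 'GENERAL', 'LMC' or 'ERROR'.

    Assumption: addresses are 32 bits wide and coproc_base is the base
    address of the coprocessor's memory object.
    """
    diff = addr - coproc_base
    # Negative result, or any of the upper 6 bits set: not a coprocessor address.
    if diff < 0 or (diff >> 26) != 0:
        return "PCI"
    if GENERAL_IF_BASE <= diff < LMC_BASE:
        return "GENERAL"
    if LMC_BASE <= diff < REGION_END:
        return "LMC"
    # Inside the coprocessor's memory object but not a valid location.
    return "ERROR"
```

With a base address of 0x40000000, for example, 0x41000000 falls in the general interface region, 0x42000000 falls in the LMC region, and 0x40000010 is rejected as an invalid coprocessor location.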

After address decoding, valid requests are forwarded to the appropriate IBus interface. The EIC and LMC transfer data to the input interface switch when the i-ack signal is asserted. The input interface switch does not count the incoming words; instead, the i-oe signal controlled by the pixel organizer unit, instruction control unit or data cache control unit must be monitored to determine when the current data transfer ends.

The input interface switch 252 arbitrates between three modules: the pixel organizer unit, the data cache control unit and the instruction control unit. All three can request data at the same time, but because there are only two physical resources, not every request can be serviced immediately. The arbitration scheme used by the input interface switch is priority based and programmable. Control bits in the configuration register of the input interface switch specify the relative priorities of the instruction control unit, the data cache control unit and the pixel organizer unit. A request from a lower-priority module is granted only when neither of the other two modules is requesting access to the same resource. If two or more requesters are given the same priority, a round-robin scheme is used to decide which request is granted.

Because a resource may not be immediately accessible, the input interface switch needs to store the address and burst length of the requested data and to determine whether to prefetch the data on behalf of the requester. Arbitration to decide priority is required for a resource when no IBus transaction is in progress on it.

FIG. 145 shows the input interface switch 252 in detail. In addition to the standard CBus interface and register file 860, the switch 252 has two IBus transceivers 861, 862 between an address decoder 863 and an arbiter 864. The address decoder 863 decodes the addresses of requests received from the pixel organizer unit, data cache control unit and instruction control unit. It checks whether each address is valid and remaps the address where necessary. The arbiter 864 decides which request is passed from the IBus transceiver 861 to the IBus transceiver 862. The priorities are programmable.

The IBus transceivers 861, 862 provide multiplexing and demultiplexing functions, together with tri-state buffering that allows the other interfaces to communicate with the input interface switch.
3.18.8 Local Memory Control Unit
Referring to FIG. 2, the local memory control unit 236 is responsible for controlling the local memory and for handling all access requests between the local memory and the modules within the coprocessor. The local memory control unit 236 responds to write requests from the result organizer 249 and read requests from the input interface switch 252. It also responds to read and write requests from the peripheral interface control unit 237 and the general CBus input. The local memory control unit uses a programmable priority system and employs FIFO buffers to maximize throughput.

In the present invention, a multi-port burst dynamic memory controller is used together with first-in first-out (FIFO) buffers to decouple the ports from the memory array. FIG. 146 is a block diagram of a four-port burst dynamic memory controller according to a first embodiment of the invention. The circuit has two write ports (A 1944 and B 1946) and two read ports (C 1948 and D 1950) that require access to the memory array 1910. The data paths from the two write ports pass through separate FIFOs 1920, 1922 and reach the memory array 1910 via the multiplexer 1912, while the data paths of the read ports 1948, 1950 leave the memory array 1910 via separate FIFOs 1936, 1938. The central control unit 1932 drives all control signals needed to interface to the dynamic memory 1910 and coordinates access by all ports. The refresh counter 1934 determines when dynamic memory refresh cycles are required for the memory array 1910 and coordinates them with the control unit 1932.

Preferably, data is read from and written to the memory array 1910 at twice the rate at which it is transferred from the write ports 1944, 1946 to the FIFOs 1920, 1922, or from the FIFOs 1936, 1938 to the read ports 1948, 1950. As a result, the time spent transferring data to or from the memory array 1910 (the bottleneck of any memory system) is kept as short as possible relative to the time required to transfer data through the write and read ports 1944, 1946, 1948, 1950.

Data is written to the memory array 1910 via either of the write ports 1944, 1946. The circuitry connected to a write port 1944, 1946 sees only a FIFO 1920, 1922 that is initially empty. Data transfer through the write port proceeds without interruption until the FIFO 1920, 1922 fills or the burst ends. When data is first written to the FIFO 1920, 1922, the control unit 1932 arbitrates with the other ports for access to the DRAM. When access is granted, data is read from the FIFO 1920, 1922 at the highest rate and written to the memory array 1910. A burst write cycle to the DRAM 1910 is started only when a preset number of data words has accumulated in the FIFO 1920, 1922, or when the burst from the write port has ended. In either case, the burst to the DRAM 1910 proceeds from the time it is granted and continues until the FIFO 1920, 1922 is empty or a higher-priority port requests a cycle. In either event, data continues to be written from the write port to the FIFO 1920, 1922 without hindrance until the FIFO fills, or until the current burst ends and a new burst begins. In the latter case, the new burst cannot proceed until the previous burst has been emptied from the FIFO 1920, 1922 and written to the DRAM 1910. In the former case, data transfer resumes as soon as the first word has been read from the FIFO 1920, 1922 and written to the DRAM 1910. Because data is transferred out of the FIFO 1920, 1922 at the highest rate, the write ports 1944, 1946 can stall only when the control unit 1932 is interrupted by a cycle request from another port. Any interruption to data transfer from the write ports 1944, 1946 to the FIFOs 1920, 1922 is preferably kept to a minimum.
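The write-side buffering policy can be modelled with a short Python sketch. It is a behavioural illustration only: the class and method names are hypothetical, the threshold of 8 words is an assumed value for the "preset number" (which the text does not give), and the `max_words` argument stands in for pre-emption by a higher-priority port.

```python
from collections import deque


class WriteFifo:
    """Behavioural model of one write FIFO (names are illustrative)."""

    def __init__(self, depth=32, threshold=8):
        self.depth = depth            # FIFO depth in words
        self.threshold = threshold    # assumed "preset number" of words
        self.words = deque()
        self.port_burst_done = False

    def push(self, word):
        """Port side: returns False (the port stalls) when the FIFO is full."""
        if len(self.words) >= self.depth:
            return False
        self.words.append(word)
        return True

    def end_port_burst(self):
        """Port side: the burst from the write port has ended."""
        self.port_burst_done = True

    def wants_dram_burst(self):
        """A DRAM burst starts once enough words are stored or the port is done."""
        if not self.words:
            return False
        return len(self.words) >= self.threshold or self.port_burst_done

    def drain(self, memory, max_words):
        """Memory side: move up to max_words words into `memory` (a list)."""
        moved = 0
        while self.words and moved < max_words:
            memory.append(self.words.popleft())
            moved += 1
        if not self.words:
            self.port_burst_done = False  # ready for the next port burst
        return moved
```

A short burst that never reaches the threshold is still flushed, because ending the port burst also triggers the DRAM write.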

The read ports 1948, 1950 operate in the converse manner. When a read port 1948, 1950 issues a read request, a DRAM cycle is requested immediately. Once the request is granted, the memory array 1910 is read and the data is written to the corresponding FIFO 1936, 1938. As soon as the first data word has been written to the FIFO 1936, 1938, it can be read by the read port 1948, 1950. There is therefore an initial delay in obtaining the first data word, but subsequent consecutive data words are likely to be obtained without further delay. The DRAM read ends when a higher-priority DRAM request arrives, when the read FIFO 1936, 1938 fills, or when the read port 1948, 1950 requests no more data. Once a read has been terminated in this way, it is not resumed until there is room in the FIFO 1936, 1938 for a preset number of data words. Once the read port ends its cycle, any data remaining in the FIFO 1936, 1938 is discarded.

To keep DRAM control overhead to a minimum, re-arbitration for DRAM access is restricted so that a burst cannot be interrupted until a preset number of data words has been transferred (or until the corresponding write FIFO 1920, 1922 is empty or the read FIFO 1936, 1938 is full). Each access port 1944, 1946, 1948, 1950 has an associated burst start address, which is latched into a counter 1942 at the start of a burst. This counter holds the current address for the port's transaction, so that if the transfer is interrupted it can be resumed at any time at the correct memory address. Only the address of the currently active DRAM cycle is selected by the multiplexer 1940 and passed to the row address counter 1916 and the column address counter 1918. The low-order N bits of the address go to the column counter 1918 and the remaining high-order bits go to the row counter 1916. The multiplexer 1914 passes the row address from the row counter 1916 to the memory array 1910 during the DRAM row address time, and the column address from the column counter 1918 during the DRAM column address time. The row address counter 1916 and the column address counter 1918 are loaded at the start of every burst to the DRAM 1910, both at the start of a port cycle and when an interrupted burst is continued. The column address counter 1918 is incremented after each transfer to memory, and the row address counter 1916 is incremented when the column address counter 1918 rolls over to zero. In the latter case, the burst must be terminated and restarted with the new row address.
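The row and column counter behaviour can be illustrated with the following Python sketch. The class name and the example column width are assumptions; the text only states that the low-order N bits form the column address.

```python
class BurstAddress:
    """Row/column address counters for one burst (illustrative model)."""

    def __init__(self, start_addr, col_bits):
        self.col_bits = col_bits                       # N low-order bits
        self.col = start_addr & ((1 << col_bits) - 1)  # column counter
        self.row = start_addr >> col_bits              # row counter

    def advance(self):
        """Step to the next word after a transfer.

        Returns True when the column counter rolls over to zero, which
        means the row changed and the burst must be restarted.
        """
        self.col = (self.col + 1) & ((1 << self.col_bits) - 1)
        if self.col == 0:
            self.row += 1
            return True
        return False

    @property
    def address(self):
        """Recombine the counters into a flat word address."""
        return (self.row << self.col_bits) | self.col
```

Starting at address 0x1FE with an 8-bit column counter, the second call to `advance()` crosses into row 2 and signals that a new row address must be issued.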

In this embodiment, the memory array 1910 is assumed to comprise four 8-bit byte lanes, giving 32 bits per word. Each write port 1944, 1946 has an associated set of four byte write enable signals 1950, 1952, which allow data to be written individually to each 8-bit portion of each 32-bit data word in the memory array 1910. Because writing can be masked for any byte within each word written to the memory array 1910, the write enable information must be stored alongside each data word in the corresponding FIFO 1926, 1928. These FIFOs 1926, 1928 are controlled by the same signals that control the write FIFOs 1920, 1922, but they are only 4 bits wide rather than the 32 bits needed for the write data in the FIFOs 1920, 1922. Similarly, the multiplexer 1930 is controlled in the same way as the multiplexer 1912. The selected write enables are passed to the control unit 1932, which uses them to selectively enable or disable writing to the bytes of the addressed word in the memory array 1910, in step with the write data that the multiplexer 1912 passes to the memory array 1910.
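The effect of the four byte write enables on a 32-bit word can be shown with a small Python sketch. The function name is hypothetical, and the sketch assumes that the enables are active high and that enable bit 0 controls the least significant byte, neither of which is stated in the text.

```python
def masked_write(old_word, new_word, byte_enables):
    """Merge new_word into old_word under a 4-bit byte enable mask.

    Assumptions: enables are active high and bit 0 of byte_enables
    controls the least significant byte lane.
    """
    mask = 0
    for lane in range(4):
        if byte_enables & (1 << lane):
            mask |= 0xFF << (8 * lane)
    # Keep the disabled bytes of the old word, take the enabled bytes
    # from the new word.
    return (old_word & ~mask & 0xFFFFFFFF) | (new_word & mask)
```

For example, writing 0x11223344 over 0xAABBCCDD with enables 0b0101 replaces only byte lanes 0 and 2.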

The arrangement of FIG. 146 operates under the control of the control unit 1932. FIG. 147 is a state diagram showing the operation of the control unit 1932 of FIG. 146 in detail. After power-up and on completion of reset, the state machine is forced into the IDLE 1960 state, in which all DRAM control signals are inactive (high) and the multiplexer 1914 passes the row address to the DRAM array 1910. When a refresh or cycle request is detected, a transition is made to the RASDEL1 1962 state. If the cycle request and refresh request have gone away at the next clock edge, the state machine returns to the IDLE 1960 state. Otherwise, once the DRAM tRP (RAS precharge timing constraint) period has been satisfied, a transition is made to the RASON 1966 state, in which the row address strobe (RAS) signal goes low. After tRCD (RAS-to-CAS delay timing constraint) has been satisfied, a transition is made to the COL 1968 state, and the multiplexer 1914 is switched to select the column address for input to the DRAM array 1910. At the next clock edge a transition is made to the CASON 1970 state, and the DRAM column address strobe (CAS) signal goes active low. Once tCAS (CAS active timing constraint) has been satisfied, a transition is made to the CASOFF 1972 state, in which CAS goes inactive high again. At this point, if further data words are to be transferred, and either no higher-priority cycle request or refresh is pending or it is too soon to re-arbitrate, then once the tCP (CAS precharge timing constraint) period has been satisfied the state machine returns to the CASON 1970 state and CAS goes active low again. If there are no further data words to transfer, or if re-arbitration takes place and a higher-priority cycle request or refresh is pending, a transition is made instead to the RASOFF 1974 state once both tRAS (RAS active timing constraint) and tCP (CAS precharge timing constraint) have been satisfied. In this state the DRAM row address strobe (RAS) signal goes inactive high. At the next clock edge the state machine returns to the IDLE 1960 state, ready to start the next cycle.

When a refresh request is detected in the RASDEL2 1964 state, a transition is made to the RCASON 1980 state once tRP (RAS precharge timing constraint) has been satisfied. In this state the DRAM column address strobe (CAS) goes active low, starting a CAS-before-RAS refresh cycle. At the next clock edge a transition is made to the RRASON 1978 state, and the DRAM row address strobe (RAS) goes active low. When tCAS (CAS active timing constraint) has been satisfied, a transition is made to the RCASOFF 1976 state and CAS goes inactive high. Once tRAS (RAS active timing constraint) has been satisfied, a transition is made to the RASOFF 1974 state and RAS goes inactive high, effectively ending the refresh cycle. The state machine then continues as described above for a normal DRAM cycle and returns to the IDLE 1960 state.
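The state sequence for normal and refresh cycles can be sketched as a small Python state machine. This is a simplified model under several assumptions: every timing constraint (tRP, tRCD, tCAS, tCP, tRAS) is treated as already satisfied at each clock step, RASDEL1 is assumed to lead to RASDEL2 while a request remains pending, and a refresh request is assumed to take precedence over a cycle request in RASDEL2. The function and input names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RASDEL1 = auto()
    RASDEL2 = auto()
    RASON = auto()
    COL = auto()
    CASON = auto()
    CASOFF = auto()
    RASOFF = auto()
    RCASON = auto()
    RRASON = auto()
    RCASOFF = auto()


# Transitions that do not depend on any request input.
FIXED = {
    State.RASON: State.COL,
    State.COL: State.CASON,
    State.CASON: State.CASOFF,
    State.RCASON: State.RRASON,
    State.RRASON: State.RCASOFF,
    State.RCASOFF: State.RASOFF,
    State.RASOFF: State.IDLE,
}


def next_state(state, cycle_req=False, refresh_req=False,
               more_words=False, preempt=False):
    """One clock step, assuming all DRAM timing constraints are met."""
    if state in FIXED:
        return FIXED[state]
    pending = cycle_req or refresh_req
    if state is State.IDLE:
        return State.RASDEL1 if pending else State.IDLE
    if state is State.RASDEL1:
        return State.RASDEL2 if pending else State.IDLE
    if state is State.RASDEL2:
        return State.RCASON if refresh_req else State.RASON
    # The only remaining state is CASOFF: continue the burst or close the row.
    return State.CASON if (more_words and not preempt) else State.RASOFF


def run(steps, start=State.IDLE):
    """Apply a list of input dicts and return the visited states."""
    trace = [start]
    for inputs in steps:
        trace.append(next_state(trace[-1], **inputs))
    return trace
```

Running the model with a persistent cycle request and one extra data word visits CASON and CASOFF twice before RASOFF, while a refresh request takes the RCASON, RRASON, RCASOFF path.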

The refresh counter 1934 of FIG. 146 is simply a counter that generates a refresh request signal at a fixed rate of once every 15 microseconds, or at whatever rate the manufacturer of the particular DRAM requires. Once a refresh request is issued, it remains asserted until it is acknowledged by the state machine of FIG. 147. The acknowledgment is given when the state machine enters the RCASON 1980 state, and it remains asserted until the state machine detects that the refresh request has been withdrawn.

FIG. 148 shows, in pseudocode form, the operation of the arbiter 1924 of FIG. 146. It describes how the arbiter decides which of the four cycle requesters is granted access to the memory array 1910, and the mechanism by which the priorities of the cycle requesters are modified to keep access fair. The symbols used in the pseudocode are explained in FIG. 149.

Each requester has 4 bits representing the priority of its request. The upper 2 bits are preset to an overall priority by configuration values set in a general configuration register. The lower 2 bits are held in a 2-bit counter that is updated by the arbiter 1924. To decide the winner of an arbitration, the arbiter 1924 simply compares the 4-bit values of the requesters and grants access to the requester with the highest value. When a requester is granted a cycle, its lower 2-bit priority counter is reset to zero, and the lower 2-bit priority count of every other requester that has the same upper 2-bit priority value and a lower 2-bit priority value below the winner's is incremented by one. As a result, the requester that has just been granted access to the memory array 1910 has the lowest priority among the requesters with the same upper 2-bit priority value. The lower 2-bit priority values of requesters whose upper 2 bits differ from the winner's are not affected. The upper 2 bits thus determine a requester's overall priority, and the lower 2 bits implement a fair arbitration scheme among requesters with the same upper priority.

This scheme can realize a range of arbitration schemes, from hard-wired fixed priority (the upper 2 bits of every requester are unique), through part rotating, part hard-wired (some but not all of the upper 2-bit priorities differ from the others), to strictly fair rotating arbitration (all upper 2-bit priority values are the same).
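The priority update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pseudocode of FIG. 148: the names `Requester` and `arbitrate` are hypothetical, the lower 2-bit counters within a group are assumed to start at distinct values, and a tie between equal 4-bit priorities is assumed to go to the first requester in the list.

```python
from dataclasses import dataclass


@dataclass
class Requester:
    name: str
    fixed: int    # upper 2 bits, preset from a configuration register (0..3)
    counter: int  # lower 2 bits, maintained by the arbiter (0..3)

    @property
    def priority(self) -> int:
        # 4-bit priority: fixed part in the upper bits, counter in the lower bits
        return (self.fixed << 2) | self.counter


def arbitrate(requesters, requesting):
    """Grant one cycle and update the counters; return the winner's name or None."""
    candidates = [r for r in requesters if r.name in requesting]
    if not candidates:
        return None
    # Highest 4-bit value wins (ties go to the first candidate: an assumption).
    winner = max(candidates, key=lambda r: r.priority)
    for r in requesters:
        # Only requesters in the winner's group with a lower count move up by one.
        if r is not winner and r.fixed == winner.fixed and r.counter < winner.counter:
            r.counter += 1
    winner.counter = 0  # the winner drops to the bottom of its group
    return winner.name
```

With four requesters that share the same upper bits and start with counters 0 to 3, repeated calls with all four requesting grant them in strict rotation, while a requester with larger upper bits always wins against the group.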

FIG. 149 shows the structure of the priority bits for each requester and how the bits are used. It also defines the meaning of the symbols used in FIG. 148. In the embodiment described above, the FIFOs 1920, 1922, 1938 and 1936 are each 32 bits wide and 32 words deep. This depth is a good compromise between efficiency and the circuit area consumed. However, the depth can be changed to suit the needs of a particular application, with a corresponding change in performance.

The four-port configuration shown here is also only one embodiment. A benefit is obtained even when only a single FIFO buffer is provided between the memory array and either a read port or a write port. However, the greatest speed improvement is obtained by using multiple read and write ports.

3.18.9 Other Modules
The other module 239 provides clock generation and selection for the operation of the coprocessor 224, reset synchronization, multiplexing of error and interrupt signals by routing internal diagnostic signals to external pins as required, interfacing between the internal and external forms of the CBus, and multiplexing of the internal and general bus signals onto the general/external CBus output pins. The operation of the other module 239 naturally depends on the clocking requirements and implementation details of the ASIC technology used.

3.18.10 External Interface Controller
The features of the invention described next relate to a method and apparatus for providing virtual memory in a host computer having a coprocessor that shares the virtual memory. Embodiments of the invention seek to enable the coprocessor to operate in a virtual memory mode in conjunction with the host processor.

In particular, the coprocessor can operate in the virtual memory mode of the host processor. The coprocessor includes a virtual-to-physical memory mapping device that can consult the host processor's virtual memory tables and that maps instruction addresses generated by the coprocessor to the corresponding physical addresses in the host processor's memory. Preferably, the virtual-to-physical memory mapping device forms part of a computer graphics coprocessor for generating graphic images. The coprocessor includes a number of modules that can perform various complex operations on images. The mapping device is responsible for the interaction between the coprocessor and the host processor.

The external interface controller (EIC) 238 provides the coprocessor's interface to the PCI bus and the general bus. It also provides memory management that translates between the coprocessor's internal virtual address space and the physical address space of the host system. The external interface controller 238 operates as a master on the PCI bus when reading data from host memory in response to requests from the input interface switch 252 and when writing data to host memory in response to requests from the result organizer 249. Access to the PCI bus is implemented in accordance with the "PCI Local Bus Specification, draft 2.1", PCI Special Interest Group, 1994.

The external interface controller 238 arbitrates between simultaneous requests for PCI transactions from the input interface switch 252 and the result organizer 249. The arbitration is preferably configurable. The request types handled include reads of one host cache line or less at a time, reads of between one and two host cache lines, and reads of two or more host cache lines. Writes of unlimited length are also implemented by the external interface controller 238. In addition, the external interface controller 238 optionally performs data prefetching.

The external interface controller 238 includes memory management that provides address mapping from virtual memory to host physical memory for all of the coprocessor's internal modules. This mapping is completely transparent to the module requesting the access. When the external interface controller 238 receives a request for a host memory access, it invokes the memory management unit to translate the requested address. If the memory management unit cannot translate the address immediately, one or more PCI bus transactions may be needed to complete the translation. The memory management unit is therefore itself another source of PCI bus transaction requests. If a burst requested by the input interface switch 252 or the result organizer 249 crosses a virtual page boundary, the external interface controller 238 automatically invokes the memory management unit again so that every virtual address is mapped correctly.

The memory management unit (MMU) (915 in FIG. 150) is based on a 16-entry translation lookaside buffer (TLB). The TLB acts as a cache of virtual-to-physical address mappings. The TLB supports the following operations.
1) Compare: Given a virtual address, the TLB returns either the corresponding physical address or a TLB miss signal (if no valid entry matches the address).
2) Replace: A new virtual-to-physical mapping is written into the TLB in place of an existing entry or an invalid entry.
3) Invalidate: Given a virtual address, if it matches a TLB entry, that entry is invalidated.
4) Invalidate all: All TLB entries are invalidated.
5) Read: The virtual or physical address of a TLB entry is read, selected by a 4-bit address. Used for testing only.
6) Write: The virtual or physical address of a TLB entry is written, selected by a 4-bit address.

Entries in the TLB have the format shown in FIG. 151. Each valid entry consists of a 20-bit virtual address 670, a 20-bit physical address 671, and a flag indicating whether the corresponding physical page is writable. The entries allow a page size of 4 Kbytes. A register in the MMU can be used to mask up to 10 bits of the address used in the comparison, which allows the TLB to support pages of up to 4 Mbytes. Because there is only one mask register, all TLB entries refer to pages of the same size.

The TLB uses a least recently used (LRU) replacement algorithm. A new entry overwrites the entry for which the longest time has elapsed since it was last written or last matched in a compare operation. This applies only when there are no invalid entries. If any invalid entries exist, they are written to before any valid entry is overwritten.
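The compare, replace and invalidate operations, together with the LRU rule, can be modelled in software roughly as follows. This is a behavioural sketch only: the class name `TLB`, its method names and the list-based LRU bookkeeping are illustrative assumptions and say nothing about how the hardware stores its LRU state. Page size masking is left out for brevity.

```python
class TLB:
    """Behavioural model of a small fully associative TLB with LRU replacement."""

    def __init__(self, size=16):
        # Each slot holds (vpn, ppn, writable), or None when the entry is invalid.
        self.entries = [None] * size
        # Slot numbers ordered from least to most recently used.
        self.lru = list(range(size))

    def _touch(self, slot):
        self.lru.remove(slot)
        self.lru.append(slot)

    def compare(self, vpn):
        """Return the physical page number, or None to signal a TLB miss."""
        for slot, entry in enumerate(self.entries):
            if entry is not None and entry[0] == vpn:
                self._touch(slot)
                return entry[1]
        return None

    def replace(self, vpn, ppn, writable):
        """Write a mapping into an invalid slot if one exists, else the LRU slot."""
        invalid = [s for s in self.lru if self.entries[s] is None]
        slot = invalid[0] if invalid else self.lru[0]
        self.entries[slot] = (vpn, ppn, writable)
        self._touch(slot)
        return slot

    def invalidate(self, vpn):
        for slot, entry in enumerate(self.entries):
            if entry is not None and entry[0] == vpn:
                self.entries[slot] = None
                # Make the freed slot the first choice for the next replace.
                self.lru.remove(slot)
                self.lru.insert(0, slot)

    def invalidate_all(self):
        self.entries = [None] * len(self.entries)
```

In this model a successful compare counts as a use of the entry, so an entry that is matched often is not chosen as the replacement victim.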

FIG. 152 shows the flow of a TLB compare operation. The incoming virtual address 880 is divided into three parts 881 to 883. The lower 12 bits 881 are always part of the offset within the page and are passed directly to the corresponding physical address bits 885. The next 10 bits 882 are either part of the offset or part of the page number, depending on the page size as set by the mask bits. A zero in the mask register 887 indicates that the corresponding bit is part of the page offset and must not be used in the TLB comparison. The 10 address bits are logically ANDed with the 10 mask bits to give the lower 10 bits of the virtual page number 889 for the TLB lookup. The upper 10 bits 883 of the virtual address are used directly as the upper 10 bits of the virtual page number 889.

The 20-bit virtual page number generated in this way is presented to the TLB. If it matches one of the entries, the TLB returns the corresponding physical page number 872 and the number of the matching location. The physical address 873 is generated from the physical page number, again using the mask register 887. The upper 10 bits of the physical page number 872 are used directly as the upper 10 bits of the physical address 873. Each of the next 10 bits of the physical address is selected at 875 from either the physical page number (if the corresponding mask bit is 1) or the virtual address (if the mask bit is 0). The lower 12 bits 885 of the physical address come directly from the virtual address.
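The bit manipulation in the compare flow can be written out as a short Python sketch. The function names are hypothetical, and addresses are assumed to be 32-bit values split into 10, 10 and 12 bits as described above.

```python
def virtual_page_number(vaddr: int, mask: int) -> int:
    """Form the 20-bit page number presented to the TLB."""
    upper = (vaddr >> 22) & 0x3FF   # upper 10 bits, always part of the page number
    middle = (vaddr >> 12) & 0x3FF  # next 10 bits, page number or offset
    return (upper << 10) | (middle & mask)


def physical_address(vaddr: int, ppn: int, mask: int) -> int:
    """Combine the returned physical page number with the virtual address."""
    offset = vaddr & 0xFFF          # lower 12 bits pass straight through
    v_mid = (vaddr >> 12) & 0x3FF
    p_mid = ppn & 0x3FF
    # Mask bit 1: take the bit from the physical page number.
    # Mask bit 0: the bit is part of the offset, so take it from the virtual address.
    middle = (p_mid & mask) | (v_mid & ~mask & 0x3FF)
    upper = (ppn >> 10) & 0x3FF
    return (upper << 22) | (middle << 12) | offset
```

With a mask of 0x3FF the pages are 4 Kbytes and all 20 page number bits take part in the lookup. With a mask of 0 the pages are 4 Mbytes and only the upper 10 bits do.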

Finally, following a match, the LRU buffer 876 is updated to record the use of the matched address. A TLB miss occurs when the input interface switch 252 or the result organizer 249 requests access to a virtual address that is not in the TLB 872. In that case, the MMU must fetch the required virtual-to-physical translation from the page table in host memory 203 and write it into the TLB before the requested access can proceed.

The page table is a hash table in host main memory. Each page table entry consists of two 32-bit words in the format shown in FIG. 153. The upper 20 bits of the second word hold the physical address, and its lower 12 bits are reserved. The upper 20 bits of the first word hold the corresponding virtual address. The lower 12 bits of the first word contain a valid (V) bit and a writable (W), or "read only", bit, and the remaining 10 bits are reserved.
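As an illustration of this two-word layout, the following Python sketch packs and unpacks a page table entry. The exact bit positions of the V and W flags are given only in FIG. 153, so the positions used here (bits 0 and 1 of the first word) are assumptions, as are the function names.

```python
V_BIT = 1 << 0  # assumed position of the valid flag in the first word
W_BIT = 1 << 1  # assumed position of the writable flag in the first word


def pack_pte(vpn, ppn, valid=True, writable=False):
    """Return (word0, word1) for a page table entry."""
    word0 = (vpn & 0xFFFFF) << 12        # upper 20 bits: virtual page number
    if valid:
        word0 |= V_BIT
    if writable:
        word0 |= W_BIT
    word1 = (ppn & 0xFFFFF) << 12        # upper 20 bits: physical page number
    return word0, word1


def unpack_pte(word0, word1):
    return {
        "vpn": (word0 >> 12) & 0xFFFFF,
        "ppn": (word1 >> 12) & 0xFFFFF,
        "valid": bool(word0 & V_BIT),
        "writable": bool(word0 & W_BIT),
    }
```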

A page table entry contains essentially the same information as a TLB entry. The extra flags in the page table are reserved. The page table itself is normally distributed over multiple pages in main memory 203 and is in general contiguous in virtual space but not in physical space. The MMU contains a set of 16 page table pointers set by software, each of which is a 20-bit pointer to a 4 Kbyte memory region containing part of the page table. This means that the coprocessor 224 supports a page table 64 Kbytes in size, holding 8K page mappings. In a system with a 4 Kbyte page size, this means a maximum mapped virtual address space of 32 Mbytes. Preferably, the page table pointers always refer to 4 Kbyte memory regions, regardless of the page size used in the TLB.
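The sizes quoted above follow from the pointer count, the region size and the 8-byte entry size. A short Python check of the arithmetic, with a hypothetical function name and the figures from the text as default arguments:

```python
def page_table_capacity(pointers=16, region_bytes=4096, pte_bytes=8, page_bytes=4096):
    """Return (table size in bytes, number of mappings, mapped bytes)."""
    table_bytes = pointers * region_bytes   # 16 regions of 4 Kbytes
    mappings = table_bytes // pte_bytes     # each entry is two 32-bit words
    mapped_bytes = mappings * page_bytes    # one page per mapping
    return table_bytes, mappings, mapped_bytes
```

With the defaults this gives 64 Kbytes of page table, 8192 mappings and 32 Mbytes of mapped virtual address space, matching the figures in the text.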

The MMU operation after a TLB miss is shown at 690 in FIG. 154 and proceeds as follows.
1. Execute the hash function 892 on the virtual page number 891 that missed in the TLB to generate a 13-bit index into the page table.
2. Use the upper 4 bits 894 of the page table index 894, 896 to select a page table pointer 895.
3. Generate the physical address 890 of the required page table entry by concatenating the 20-bit page table pointer 895 with the lower 9 bits of the page table index 896 and setting the lowest 3 bits to 000 (since a page table entry occupies 8 bytes of host memory).
4. Read 8 bytes from host memory, starting at the physical address 898 of the page table entry.
5. When the 8-byte page table entry 900 is returned on the PCI bus, and its VALID bit is set to 1, its virtual page number is compared with the original virtual page number that caused the TLB miss. If they do not match, the next page table entry is fetched using the process above (with the physical address incremented by 8 bytes). This continues until a page table entry with a matching virtual page number is found or an invalid page table entry is encountered. If an invalid page table entry is encountered, a page fault error is signalled and processing stops.
6. When a page table entry with a matching virtual page number is found, the complete entry is written into the TLB by a replace operation. The new entry is placed in the TLB location pointed to by the LRU buffer 876. The TLB compare operation is then retried. It now succeeds, and the originally requested host memory access can proceed. The LRU buffer 876 is updated when the new entry is written into the TLB.

The hash function 892 implemented in the EIC 238 uses the following equation on the 20-bit virtual page number (vpn).
index = ((vpn >> S1) XOR (vpn >> S2) XOR (vpn >> S3)) & 0x1fff;
Here, S1, S2 and S3 are independently programmable shift amounts (positive or negative), each of which can take one of four values.
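A direct transcription of this equation into Python might look as follows. The text does not say how a negative shift amount is applied, so this sketch assumes that a negative value means a left shift by its magnitude. The function names are hypothetical, and the four permitted values of each shift amount are not listed in the text, so they are not checked here.

```python
def _shift(value: int, amount: int) -> int:
    # Assumption: a negative amount shifts left instead of right.
    return value >> amount if amount >= 0 else value << -amount


def page_table_index(vpn: int, s1: int, s2: int, s3: int) -> int:
    """Hash a 20-bit virtual page number to a 13-bit page table index."""
    return (_shift(vpn, s1) ^ _shift(vpn, s2) ^ _shift(vpn, s3)) & 0x1FFF
```

The final mask keeps 13 bits, which matches the 8K entries that the page table can hold.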

If the linear search through the page table crosses a 4 Kbyte boundary, the MMU automatically selects the next page table pointer and continues the search at the correct physical memory location. This includes wrapping from the end of the page table to its beginning. The page table always contains at least one invalid (null) entry, so that the search always terminates.
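The miss handling and the linear search, including the change of page table pointer at a 4 Kbyte boundary and the wrap at the end of the table, can be sketched in Python as below. This is a software model under assumptions: `read_pte` is a hypothetical callback standing in for the 8-byte PCI read, the valid flag is assumed to be bit 0 of the first word, and stepping a 13-bit index is used in place of adding 8 to the physical address, which has the same effect within a region.

```python
class PageFault(Exception):
    """Raised when the search reaches an invalid (null) page table entry."""


VALID = 1 << 0  # assumed position of the V flag in the first word


def pte_address(index, pointers):
    """Physical address of the entry for a 13-bit page table index."""
    pointer = pointers[(index >> 9) & 0xF]           # upper 4 bits pick the pointer
    return (pointer << 12) | ((index & 0x1FF) << 3)  # lower 9 bits, 8 bytes per entry


def walk_page_table(vpn, index, pointers, read_pte):
    """Search linearly from the hashed index; return the physical page number."""
    for _ in range(1 << 13):                 # at most one pass over the table
        word0, word1 = read_pte(pte_address(index, pointers))
        if not word0 & VALID:
            raise PageFault(hex(vpn))
        if (word0 >> 12) == vpn:
            return word1 >> 12
        index = (index + 1) & 0x1FFF         # next entry, wrapping at the end
    raise PageFault(hex(vpn))
```

Because the pointer is selected again from the index on every step, moving from index 511 to 512 switches to the next 4 Kbyte region, and moving past index 8191 wraps to the first region.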

Whenever software replaces a page in host memory, it must add a page table entry for the new virtual page and delete the entry corresponding to the page that was replaced. It must also ensure that the old page table entry is not still cached in the TLB of the coprocessor 224. This is done by performing a TLB invalidation cycle in the MMU.

An invalidation cycle is performed by a register write to the MMU that specifies the virtual page number to be invalidated together with a bit that triggers the invalidate operation. This register write can be performed directly by software or through an instruction interpreted by the instruction decoder. The invalidate operation is performed on the TLB for the supplied virtual page number. If it matches a TLB entry, that entry is marked invalid, and the LRU table is updated so that the invalidated location is used for the next replace operation.

A pending invalidate operation has priority over any pending TLB compare. When the invalidate operation completes, the MMU clears the invalidate bit to indicate that another invalidation can be processed. If the MMU cannot find a valid page table entry for a requested virtual address, this is called a page fault. The MMU signals an error and stores the faulting virtual address in a software-accessible register. The MMU then enters an idle state and waits until the error is cleared. When the interrupt is cleared, the MMU resumes operation with the next requested transaction.

A page fault is also signalled when a write is attempted to a page that is marked read only (not marked writable). The external interface controller (EIC) 238 can also service transaction requests from the input interface switch 252 and the result organizer 249 that are addressed to the general bus. Each requesting module indicates whether the current request is for the general bus or for the PCI bus. Apart from using common buses for communication with the input interface switch 252 and the result organizer 249, the EIC's handling of general bus requests is entirely separate from its handling of PCI requests. The EIC 238 can also service CBus transaction types that address the general bus space directly.

FIG. 150 shows the structure of the external interface controller 238. IBus requests pass through a multiplexer 910, which directs each request to the appropriate internal module according to its destination (PCI bus or general bus). Requests for the general bus are passed to the general bus controller 911, which also has RBus and CBus interfaces. Because general bus and PCI bus requests on the RBus use different control signals, no multiplexer is needed on that bus.

IBus requests directed to the PCI bus are handled by the IBus driver (IBD) 912. Similarly, RBus requests to the PCI bus are handled by the RBus receiver (RBR) 914. The IBD 912 and the RBR 914 pass virtual addresses to the memory management unit (MMU) 915, which returns physical addresses. The IBD, the RBR and the MMU can each request PCI transactions, which are generated and controlled by the PCI master mode controller (PMC) 917. The IBD and the MMU request only PCI reads, and the RBR requests only PCI writes.

A separate PCI target mode controller (PTC) 918 handles all PCI transactions addressed to the coprocessor as a target. It drives CBus master mode signals to the instruction controller, giving access to all other modules. The PTC passes returned CBus data to the PCI bus via the PMC, so that control of the PCI data bus pins comes from a single source.

CBus transactions addressed to the EIC registers and module memory are handled by the standard CBus interface 7. All submodules receive bits from the control registers and return bits to the status registers, which are located inside the standard CBus interface. Parity generation and checking for PCI bus transactions are handled by the parity generate and check (PGC) module 921, which operates under the control of the PMC and the PTC. Generated parity is driven onto the PCI bus, as are parity error signals. The results of parity checks are also sent to the configuration registers of the PTC for error reporting.

FIG. 155 shows the structure of the IBus driver 912 of FIG. 150. The incoming IBus address and control signals are latched 930 at the start of a cycle. An OR gate 931 detects the start of the cycle and generates a start signal to the control logic 932. The upper address bits from the latch 930, which form the virtual page number, are loaded into a counter 935. The virtual page number is passed to the MMU 915 (FIG. 150), which returns a physical page number that is latched 936.

The physical page number and the lower virtual address bits are recombined according to the mask 937 to form the address 938 for PCI requests to the PMC 717 (FIG. 102). The burst count for the cycle is also loaded into a counter 939. Prefetch operations use a separate counter 941 and an address latch and compare circuit 943. Data returned from the PMC is loaded into a FIFO 944 together with a marker indicating whether the data is part of a prefetch. As data becomes available at the front of the FIFO 944, it is clocked out by the read logic via latches 945, 946. The read logic 946 also generates the IBus acknowledge signal.

The central control block 932, which includes a state machine, controls the sequencing of all the address and data elements and the interface to the PMC. The virtual page number counter 935 is loaded with the page number bits from the IBus address at the start of an IBus transaction. The upper 10 bits of this 20-bit counter always come from the incoming address. Each of the lower 10 bits is loaded from the incoming address if the corresponding mask bit 937 is set to 1, and is otherwise set to 1. The 20-bit value is passed to the MMU interface.

In normal operation, the virtual page number is not used after the initial address translation. However, if the IBD detects that a burst has crossed a page boundary, the virtual page counter is incremented and another translation is performed. Because the lower bits that are not part of the virtual page number were set to 1 when the counter was loaded, a simple increment of the 20-bit value increments the actual page number field. After the increment, the mask bits 937 are used again to set up the counter for the next increment.
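The reason the low bits are forced to 1 is that the carry from a plain increment then ripples through them into the page number field. A small Python sketch of the loading and incrementing of the counter, with hypothetical function names and the assumption that the zero bits of the mask are contiguous at the low end:

```python
def load_counter(vaddr: int, mask: int) -> int:
    """Load the 20-bit virtual page counter at the start of a transaction."""
    upper = (vaddr >> 22) & 0x3FF
    middle = (vaddr >> 12) & 0x3FF
    # Masked-in bits come from the address; masked-out bits are forced to 1.
    low = (middle & mask) | (~mask & 0x3FF)
    return (upper << 10) | low


def next_page(counter: int, mask: int) -> int:
    """Advance the counter to the next virtual page."""
    counter = (counter + 1) & 0xFFFFF      # the carry ripples through the forced 1s
    return counter | (~mask & 0x3FF)       # force the low bits to 1 again
```

For example, with a mask of 0x3FC (16 Kbyte pages) the two lowest counter bits are not part of the page number, and incrementing a counter value of 0x17 gives 0x1B, which moves the page number field from 5 to 6.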

The physical address is latched 936 whenever the MMU returns a valid physical page number after a translation. The mask bits are used to combine the returned physical page number correctly with the original virtual address bits. The physical address counter 938 is loaded from the physical address latch 936 and is incremented each time a word is returned from the PMC. The counter is monitored as it is incremented to determine whether the transaction is about to cross a page boundary. The mask bits determine which bits of the counter are used in this comparison. When the counter detects that two or fewer words remain in the page, it signals the control logic 932, which terminates the current PCI request after two data transfers and requests a new address translation if one is needed. The counter is reloaded after the new address translation, and the PCI request resumes.
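The page boundary check can be expressed arithmetically as follows. This Python sketch is an interpretation rather than a description of the comparison circuit: it assumes 32-bit (4-byte) words, word-aligned byte addresses, and a mask whose zero bits are contiguous at the low end, and the function names are hypothetical.

```python
def page_bytes(mask: int) -> int:
    """Page size implied by the 10-bit mask (4 Kbytes when all mask bits are 1)."""
    offset_bits = ((~mask & 0x3FF) << 12) | 0xFFF
    return offset_bits + 1


def words_left_in_page(paddr: int, mask: int) -> int:
    """Number of 4-byte words from paddr to the end of its page, inclusive."""
    size = page_bytes(mask)
    return (size - (paddr & (size - 1))) // 4


def near_page_end(paddr: int, mask: int) -> bool:
    """True when two or fewer words remain, the condition signalled to the control logic."""
    return words_left_in_page(paddr, mask) <= 2
```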

The burst counter 939 is a 6-bit down counter that is loaded with the IBus burst value at the start of a transaction. It is decremented each time a word is returned from the PMC. When the counter value is two or less, it signals the control logic 932, which can then terminate the PCI transaction after two data transfers (unless prefetching is enabled).

The prefetch address register 943 is loaded with the physical address of the first word of any prefetch. When a subsequent IBus transaction starts and the prefetch counter indicates that at least one word was successfully prefetched, the first physical address of the transaction is compared with the value in the prefetch address register. If they match, the prefetched data is used to satisfy the IBus transaction, and any PCI transaction request starts at the address following the last prefetched word.

The prefetch counter 941 is a 4-bit counter that is incremented each time a word is returned by the PMC during a prefetch operation, up to a maximum count equal to the depth of the input FIFO. When a subsequent IBus transaction matches the prefetch address, the prefetch count is added to the address counter and subtracted from the burst counter, so that the PCI request can start at the required location. Alternatively, if the IBus transaction needs only part of the prefetched data, the requested burst length is subtracted from the prefetch count and added to the latched prefetch address, and the remaining prefetched data is retained to satisfy further requests.
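The two cases of prefetch accounting can be summarized in a short Python sketch. It is illustrative only: addresses are counted in words for simplicity, the function name and the returned field names are hypothetical, and the case where the request does not match the prefetch address is not modelled because the text does not describe it.

```python
def use_prefetch(req_addr, burst, pf_addr, pf_count):
    """Split a matching request between prefetched data and a new PCI request."""
    if pf_count == 0 or req_addr != pf_addr:
        return None  # no usable prefetched data; this case is not modelled
    if burst > pf_count:
        # All prefetched words are used; the PCI request covers the remainder.
        return {
            "from_fifo": pf_count,
            "pci_addr": req_addr + pf_count,  # prefetch count added to the address
            "pci_words": burst - pf_count,    # and subtracted from the burst count
            "pf_addr": None,
            "pf_count": 0,
        }
    # Only part (or exactly all) of the prefetched data is needed.
    return {
        "from_fifo": burst,
        "pci_addr": None,
        "pci_words": 0,
        "pf_addr": pf_addr + burst,           # burst length added to the prefetch address
        "pf_count": pf_count - burst,         # and subtracted from the prefetch count
    }
```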

The data FIFO 944 is an asynchronous fall-through FIFO of 8 words by 33 bits. Data from the PMC is written into the FIFO together with a bit indicating whether the data is part of a prefetch. Data at the front of the FIFO is read out and driven onto the IBus as soon as it becomes available. The logic that generates the data read signals operates synchronously with clk and generates the IBus acknowledge output. When a transaction is satisfied using prefetched data, a signal from the control logic tells the read logic how many prefetched words to read from the FIFO.

FIG. 156 shows the structure of the RBus receiver 914 of FIG. 150. Control is split between two state machines 950, 951. The write state machine 951 controls the interface to the RBus. The input address 752 is latched at the start of an RBus burst. Each data word of the burst is written into the FIFO 754 together with its byte enables. When the FIFO 954 becomes full, the write logic 951 deasserts r-ready, preventing the result organizer from writing any more words.

書き込みロジック９５１は、再同期開始信号を介してメイン状態器９５０にＲＢｕｓバーストの開始を通知し、オーガナイザがそれ以上のワードを書き込まないようにする。仮想ページ番号を形成する上位アドレスビットはカウンタ９５７にロードされる。仮想ページ番号はＭＭＵへ送られ、ＭＭＵからは物理ページ番号９５８が返される。物理ページ番号と仮想アドレスの下位ビットはマスクに従って再結合され、カウンタ９６０にロードされ、ＰＭＣへのＰＣＩ要求のためのアドレスを提供する。ＰＣＩ要求のそれぞれのワードのためのデータとバイトイネーブルは、すべてのＰＭＣＭインターフェースコントロール信号も扱うメインコントロールロジック９５０によってＦＩＦＯ９５４からクロックアウトされる。メイン状態器は、ビジー信号を介してアクティヴであることを示し、それは書き込み状態器へ再同期して返される。 Write logic 951 notifies main state machine 950 of the start of the RBus burst via a resynchronization start signal, preventing the organizer from writing any more words. The upper address bits forming the virtual page number are loaded into the counter 957. The virtual page number is sent to the MMU, and a physical page number 958 is returned from the MMU. The physical page number and the low order bits of the virtual address are recombined according to the mask and loaded into the counter 960 to provide the address for the PCI request to the PMC. Data and byte enables for each word of the PCI request are clocked out of the FIFO 954 by the main control logic 950 that also handles all PMCM interface control signals. The main state machine indicates active via the busy signal, which is resynchronized back to the write state machine.
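
The recombination of the physical page number with the low-order bits of the virtual address can be sketched as follows. This is an illustration only: the function name `recombine` is hypothetical, and the sketch assumes 32-bit addresses, a mask whose set bits select the page-number bits, and a physical page number that is already aligned to its position in the address.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assumption: 32-bit addresses


def recombine(virtual_addr, physical_page, mask):
    """Build the PCI address: masked bits come from the physical page number,
    the remaining (offset) bits come from the virtual address."""
    page_bits = physical_page & mask
    offset_bits = virtual_addr & ~mask & ADDRESS_MASK
    return page_bits | offset_bits
```

For example, with a 4 KB page mask of 0xFFFFF000, virtual address 0x12345678 and physical page 0xABCDE000 combine to 0xABCDE678; a wider mask simply moves the boundary between page bits and offset bits.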

書き込み状態器９５１は、ｒ−ファイナルを用いてＲＢｕｓバーストの終了を検出する。するとＦＩＦＯ９５４へのデータのロードを中止し、メイン状態器にＲＢｕｓバーストが終了したことを通知する。メイン状態器はデータＦＩＦＯが空になるまでＰＣＩ要求を継続する。それからビジーを取り消し、書き込み状態器が次のＲＢｕｓバーストを開始するようにする。 The write state machine 951 detects the end of the RBus burst using the r-final. Then, the data loading to the FIFO 954 is stopped, and the main state machine is notified that the RBus burst has ended. The main state machine continues the PCI request until the data FIFO is empty. It then cancels busy and causes the write state machine to start the next RBus burst.

図１５０に再び戻り、メモリマネジメントユニット９１５は、ＩＢｕｓドライバ（ＩＢＤ）９１２とＲＢｕｓレシーバ（ＲＢＲ）９１４のために仮想ページ番号から物理ページ番号への変換を担当する。図１５７に、メモリマネジメントユニットの詳細を示している。１６エントリの変換ルックアサイドバッファ（ＴＬＢ）９７０は、ＴＬＢアドレスロジック９７１から入力データを受け取って出力を送り返す。状態器が含まれているＴＬＢコントロールロジック９７２は、ＲＢＲまたはＩＢＤからＴＬＢアドレスロジックにバッファされている要求を受け取る。要求を受け取ると、入力のソースとＴＬＢによって行われる作業を選択する。有効なＴＬＢ作業は、比較、無効化、全無効化、書き込みと読み出しである。ＴＬＢ入力アドレスのソースとしては、ＩＢＤとＲＢＲインターフェース（比較作業用）、ページテーブルエントリバッファ９７４（ＴＬＢミスサービス用）またはＴＬＢアドレスロジック内のレジスタなどがある。ＴＬＢは、ＴＬＢコントロールロジックにそれぞれの作業のステータスを返す。成功した比較作業からの物理ページ番号はＩＢＤとＲＢＲへ送り返す。ＴＬＢは最も長い間アクセスされていない（ＬＲＵ）位置の記録を保有し、これはＴＬＢアドレスロジックにとっては書き込み作業用の位置として用いるのに有用である。 Returning again to FIG. 150, the memory management unit 915 is responsible for translating virtual page numbers into physical page numbers for the IBus driver (IBD) 912 and the RBus receiver (RBR) 914. FIG. 157 shows the memory management unit in detail. A 16-entry translation lookaside buffer (TLB) 970 receives its input from the TLB address logic 971 and returns its output to it. The TLB control logic 972, which contains a state machine, receives requests from the RBR or the IBD that are buffered in the TLB address logic. On receiving a request, it selects the source of the input and the operation to be performed by the TLB. The valid TLB operations are compare, invalidate, invalidate all, write and read. The sources of the TLB input address are the IBD and RBR interfaces (for compare operations), the page table entry buffer 974 (for servicing TLB misses) and registers within the TLB address logic. The TLB returns the status of each operation to the TLB control logic. The physical page number from a successful compare operation is sent back to the IBD and the RBR. The TLB keeps a record of the least recently used (LRU) location, which is available to the TLB address logic as the location for write operations.

比較作業が失敗した場合、ＴＬＢコントロールロジック９７２はページテーブルアクセスコントロールロジック９７６にＰＣＩ要求を開始するよう信号を出す。ページテーブルアドレスゼネレータ９７７は、内部ページテーブルポインタレジスタを用い、仮想ページ番号をもとにＰＣＩアドレスを生成する。ＰＣＩ要求から返されたデータは、ページテーブルエントリバッファ９７４へラッチされる。要求される仮想アドレスにマッチするページテーブルエントリが見つかると、物理ページ番号がＴＬＢアドレスロジック９７１へ送られ、その後ページテーブルアクセスコントロールロジック９７６はページテーブルアクセスが完了したことを通知する。それからＴＬＢコントロールロジック９７２は、ＴＬＢに新たなエントリを書き込み、比較作業を再び開始する。 If the compare operation fails, the TLB control logic 972 signals the page table access control logic 976 to initiate a PCI request. The page table address generator 977 uses the internal page table pointer register to generate the PCI address from the virtual page number. Data returned from the PCI request is latched into the page table entry buffer 974. When a page table entry matching the requested virtual address is found, the physical page number is passed to the TLB address logic 971, and the page table access control logic 976 then signals that the page table access is complete. The TLB control logic 972 then writes the new entry into the TLB and restarts the compare operation.
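
The compare, miss-service and retry sequence described above can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not the circuit: the class name `Tlb` and its methods are hypothetical, the page table is modeled as a plain dictionary rather than a PCI access, and a missing page table entry is reported with `LookupError` as a stand-in for a page fault.

```python
from collections import OrderedDict

TLB_ENTRIES = 16  # the TLB 970 is described as having 16 entries


class Tlb:
    def __init__(self, page_table):
        self.entries = OrderedDict()  # virtual page -> physical page, least recently used first
        self.page_table = page_table  # stand-in for the page table reached over PCI

    def compare(self, vpn):
        """Compare operation: return the physical page number, or None on a miss."""
        ppn = self.entries.get(vpn)
        if ppn is not None:
            self.entries.move_to_end(vpn)  # record the entry as most recently used
        return ppn

    def translate(self, vpn):
        ppn = self.compare(vpn)
        if ppn is None:
            # Miss: fetch the page table entry, write it to the LRU slot, then compare again.
            if vpn not in self.page_table:
                raise LookupError("no page table entry for virtual page %#x" % vpn)
            if len(self.entries) >= TLB_ENTRIES:
                self.entries.popitem(last=False)  # overwrite the least recently used entry
            self.entries[vpn] = self.page_table[vpn]
            ppn = self.compare(vpn)
        return ppn
```

Using an ordered dictionary keeps the least recently used entry at the front, so the write operation after a miss always has a replacement location available.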

ＳＣＩへのレジスタ信号とＳＣＩからのレジスタ信号は両方の方向に再同期される９８０。信号は全てのサブモジュールへ行き来する。モジュールメモリインターフェース９８１は、標準ＣＢｕｓインターフェースからＴＬＢとページテーブルポインタメモリ要素へのアクセスをデコードする。ＴＬＢアクセスは読み出し専用で、データを得るためにＴＬＢコントロールロジックを用いる。ページテーブルポインタは読み出し・書き込み両方可能で、モジュールメモリインターフェースによってダイレクトにアクセスされる。これらのパスには同期回路も含まれている。 Register signals to and from the SCI are resynchronized 980 in both directions. The signal goes back and forth to all submodules. The module memory interface 981 decodes access to the TLB and page table pointer memory elements from the standard CBus interface. TLB access is read-only and uses TLB control logic to obtain data. The page table pointer can be both read and written, and is directly accessed by the module memory interface. These paths also include a synchronization circuit.

３．１８．１１周辺インターフェース制御部 3.18.11 Peripheral Interface Control Unit

図１５８には、図２の周辺インターフェース制御部（ＰＩＣ）の一例を詳細に示している。ＰＩＣ２３７は、外部周辺デバイスへ、又はデバイスからデータを転送するいくつかのモードの１つで動作する。基本的なモードは、 FIG. 158 shows in detail an example of the peripheral interface control unit (PIC) of FIG. 2. The PIC 237 operates in one of several modes to transfer data to or from an external peripheral device. The basic modes are:

１）ビデオ出力モード：このモードで、データは外部ビデオクロックとクロック・データイネーブルのコントロール下で、周辺へ転送される。ＰＩＣ２３７は、出力データに対し必要とされるタイミングで出力クロックとクロックイネーブルサインを送る。 1) Video output mode: In this mode, data is transferred to the peripheral under the control of an external video clock and clock/data enables. The PIC 237 drives the output clock and clock enable signals with the timing required for the output data.

２）ビデオ入力モード：このモードで、データは外部ビデオクロックとクロック・データイネーブルのコントロール下で、周辺から転送される。 2) Video input mode: In this mode, data is transferred from the peripheral under the control of an external video clock and clock/data enables.

３）セントロニクスモード：このモードは、ＩＥＥＥ１２８４標準に定義されている標準プロトコルに従い、周辺へと周辺からデータを転送する。 3) Centronics mode: This mode transfers data to and from the peripheral according to the standard protocol defined in the IEEE 1284 standard.

ＰＩＣ２３７は、必要に応じて、内部データソースや目的地から外部インターフェースのプロトコルを分離する。内部データソースは、出力データの単一ストリームにデータを書き込み、選択されているモードによって外部周辺機器へ転送される。同様に、外部周辺からの全てのデータは単一入力データストリームに書き込まれ、可能な内部データ目的地の１つに要求されたトランザクションを満たすのに用いられる。 The PIC 237 decouples the protocol of the external interface from the internal data sources and destinations as required. The internal data sources write data into a single output data stream, which is transferred to the external peripheral according to the selected mode. Similarly, all data from the external peripheral is written into a single input data stream, which is used to satisfy transactions requested by one of the possible internal data destinations.

可能な出力データのソースとしては、ＬＭＣ２３６（ＡＢｕｓを用いる）、ＲＯ２４９（ＲＢｕｓを用いる）、それから一般ＣＢｕｓの３つが挙げられる。ＰＩＣ２３７は、これらのデータソースからのトランザクションに一度に１つのみに応答する。１つのソースからのトランザクションは次のソースが考慮される前に完全に終了するのである。一般に、いつでも１つのみのデータソースしかアクティヴになってはならないのである。２つ以上のソースがアクティヴになった場合にはＣＢｕｓ、ＡＢｕｓ、ＲＢｕｓのプライオリティで順に処理される。 There are three possible sources of output data: LMC 236 (using ABus), RO249 (using RBus), and general CBus. PIC 237 responds to transactions from these data sources only one at a time. Transactions from one source are completely terminated before the next source is considered. In general, only one data source should be active at any given time. When two or more sources become active, they are processed in order of priority of CBus, ABus, and RBus.
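
The source selection rule described above (one transaction at a time, with CBus, ABus, RBus priority when several sources are active) can be sketched as follows. The function name `next_source` and the use of string labels are hypothetical choices made for this illustration.

```python
PRIORITY = ("CBus", "ABus", "RBus")  # priority order stated in the text


def next_source(active, current=None):
    """Pick the data source that owns the output stream.

    A transaction in progress runs to completion, so the current owner is kept
    while it is still active; otherwise the highest-priority active source wins.
    """
    if current is not None and current in active:
        return current
    for source in PRIORITY:
        if source in active:
            return source
    return None
```

For instance, `next_source({"RBus", "ABus"})` selects the ABus source, while `next_source({"CBus", "RBus"}, current="RBus")` keeps the RBus source until its transaction has finished.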

通常通り、モジュールはＰＩＣの内部レジスタが含まれている標準ＣＢｕｓインターフェース９９０のコントロール下で動作する。更に、ＣＢｕｓインターフェース９９２は、コプロセッサ２２４を介して周辺デバイスをアクセスし、コントロールすることができる。ＡＢｕｓインターフェース９９１もローカルメモリ制御部とのメモリ相互作用を処理することができる。結果オーガナイザ２４９に加え、ＡＢｕｓインターフェース９９１とＣＢｕｓインターフェース９９２は両方ともバイト−ワイドＦＩＦＯが含まれている出力データパス９９３へデータを送る。出力データパスへのアクセスは、どのソースが出力ストリームに対してプライオリティまたは所有権を持っているかを常にチェックする仲裁者によってコントロールされる。出力データパスは、どっちがイネーブルになっているかによってビデオ出力制御部９９４とセントロニクス制御部９９７とインターフェースする。それぞれのモジュール９９４、９９７は出力データパスの内部ＦＩＦＯから一度に１バイトを読み出す。セントロニクス制御部９９７は、周辺デバイスをコントロールするために標準セントロニクスデータインターフェースを具現する。ビデオ出力制御部には、要求されるビデオ出力プロトコルに従い、出力パッドをコントロールするロジックが含まれている。同様に、ビデオ入力制御部９９８には、用いられているいかなるビデオ入力標準もコントロールするロジックが含まれている。ビデオ入力制御部９９８は入力データパスユニット９９９へ出力を出し、これは再びビデオ入力制御部９９８かセントロニクス制御部９９７かのいずれかによって一度に１バイトずつ非同期でＦＩＦＯに書き込まれるデータとバイトワイド入力ＦＩＦＯを構成する。 As usual, the module operates under the control of a standard CBus interface 990, which contains the PIC's internal registers. In addition, a CBus interface 992 can access and control peripheral devices via the coprocessor 224. An ABus interface 991 handles memory interactions with the local memory controller. The ABus interface 991 and the CBus interface 992, together with the result organizer 249, send data to an output data path 993 that contains a byte-wide FIFO. Access to the output data path is controlled by an arbiter that keeps track of which source has priority or ownership of the output stream. The output data path interfaces with the video output controller 994 or the Centronics controller 997, depending on which is enabled. Each of the modules 994, 997 reads one byte at a time from the internal FIFO of the output data path. The Centronics controller 997 implements the standard Centronics data interface for controlling peripheral devices. The video output controller contains logic to control the output pads according to the required video output protocol. Similarly, the video input controller 998 contains logic to control whatever video input standard is in use. The video input controller 998 outputs to an input data path unit 999, which likewise comprises a byte-wide input FIFO; data is written into this FIFO asynchronously, one byte at a time, by either the video input controller 998 or the Centronics controller 997.

データタイマ９９６には種々のカウンタが含まれており、出力データパス９９３と入力データパス９９９内のＦＩＦＯの現在状態をモニタするために用いられている。以上のことから、コプロセッサを用いると多重イメージまたは単一イメージの多重部分を同時に生成するために二重ストリームの命令を実行するのが可能に思われる。一次命令ストリームは現在ページの出力イメージを得るのに用いられ、一次命令ストリームがアイドルになっている間に次のページのレンダリングを始めるために二次命令ストリームを用いることができる。その結果、標準モードの動作で、現在ページのイメージはレンダリングされてからＪＰＥＧコーダ２４１を用いて圧縮される。イメージをプリントする必要がある時に、コプロセッサ２２４は再度ＪＰＥＧコーダ２４１を用いてＪＰＥＧエンコーデッドイメージを解凍する。出力デバイスからそれ以上のＪＰＥＧデコーデッドイメージの部分が必要とされないアイドルタイムの間に、次のページまたはバンドの構成のために命令を実行するのが可能である。一般にこのプロセスは、コプロセッサの動作オーバーラップにより、イメージを生成するレートを上げる。特に、コプロセッサ２２４を用いると、コプロセッサに付いたプリンタによってプリントが行われ、結果的にレンダリングスピードが上がるため、イメージプロセシング作業のスピードアップの面でベネフィットが得られるのである。 The data timer 996 contains various counters and is used to monitor the current state of the FIFOs in the output data path 993 and the input data path 999. From the foregoing, it can be seen that the coprocessor can execute two instruction streams in order to produce multiple images, or multiple portions of a single image, simultaneously. A primary instruction stream is used to derive the output image for the current page, and a secondary instruction stream can be used to begin rendering the next page while the primary instruction stream is idle. Thus, in the standard mode of operation, the image of the current page is rendered and then compressed using the JPEG coder 241. When the image needs to be printed, the coprocessor 224 again uses the JPEG coder 241 to decompress the JPEG-encoded image. During idle time, when the output device requires no further portion of the JPEG-decoded image, instructions for composing the next page or band can be executed. In general, overlapping the coprocessor's operations in this way increases the rate at which images are produced. In particular, when printing is performed by a printer attached to the coprocessor 224, the increased rendering speed is a benefit in speeding up image processing operations.

上記好適な実施例は本発明の１つの実施形態であり、本発明の範囲を外れずに当業者にとって自明な修正ができることが、以上から明らかであろう。 It will be apparent from the foregoing that the preferred embodiment is one embodiment of the present invention and that modifications obvious to those skilled in the art can be made without departing from the scope of the invention.

付録Ａ
コプロセッサマイクロプログラミング
この節では新しい命令の実行毎にコプロセッサ内で行われる動作について詳述する。命令実行の間にコプロセッサにより行われるすべてのセルフコンフィグレーションは内部のレジスタのリード／ライトにより実現されており、従って、コプロセッサは外部のＣバスインターフェースあるいはホストによってＰＣＩバスインターフェースを用いることで完全にマイクロプログラミング可能である。但し、ホストを用いるマイクロプログラミングの場合には一般的にホスト同期の問題から困難となることが予想される。本章は読者がコプロセッサについて以下の点で十分な知識を持っていることを前提している。
１．実行モデル
２．命令セットとコーディング
３．レジスタセット
４．内部構造
Ａ．１一般事項
Ａ１．１コプロセッサのセットアップに関する一般事項
コントロール命令とローカルＤＭＡ命令以外のすべての命令については、コプロセッサで内のデータの流れは基本的にピクセルオーガナイザの制御下におかれる。ピクセルオーガナイザは入力データストリームの先頭のフェッチ、データのカウント、及び最後のデータがフェッチされた時期の決定について責任を持っている。コプロセッサ内のその他のモジュールは基本的に、送られてきたデータに単に応答するだけである。
Ａ１．２モジュールのコンフィグレーション順序
すべてのモジュールが命令毎にセットアップされるわけではない。いくつかのモジュールは命令デコーディング時に、全くコンフィグレーションされない。モジュールのコンフィグレーション順序は常にＰＯ，ＤＣＣ，ＯＯＢ，ＯＯＣ，ＭＤＰ，ＪＣ，ＲＯ，ＰＩＣの順である。
Ａ１．３その他のレジスタの設定
命令が、あるレジスタ値の設定を含んで符号化された場合にはそのレジスタは次の順序に従うマイクロプログラミングにより設定される。
１．設定されるべきレジスタを持つモジュールに、ほかにレジスタセットが存在しなければ、そのレジスタはほかのいかなるレジスタ設定よりも先に設定される。
２．設定されるべきレジスタを持つモジュールに、ほかにもレジスタセットがあるときはそのレジスタはほかのレジスタの設定が終わった後に、そのモジュールの＿ｃｆｇレジスタの直前に設定される。
Ａ１．４整合性のない命令オペランドのコーディング
多くの命令は、オペランド及び結果のデータタイプが指定されているので、ほかのデータタイプが指定された場合には、無意味な結果を返す。各オペランドに対し、コプロセッサは次の手順で目的のオペランドのフォーマットを決定する。
１．オペランドの内部フォーマットが１つのピクセル（圧縮バイトあるいは非圧縮バイト）に特化されている場合には、対応するオペランドオーガナイザはこれを反映して設定される。データキャッシュコントローラはコンフィグレーションされず、従ってノーマルモードで演算が継続される。
２．オペランドの内部フォーマットが「その他の形式」に特化されている場合には、コプロセッサは命令からオペランドのフォーマットを生成する。オペランドＢとオペランドＣについては前進的である。オペランドＡについて「その他の形式」は元来指定されていなく、コプロセッサの振る舞いは定義されていない。対応するオペランドオーガナイザはバイパスモードになり、データキャッシュコントローラは得られたフォーマットのオペランドデータを管理するように設定される。マイクロプログラミングは合理的に様々なモジュール間で相互独立である。
Ａ１．５疑似命令の文法
・命令の実行順序は左端の番号で決定される。
・レジスタ名はＨｅｌｖｅｔｉｃａＢｏｌｄ体でかかれている。
・レジスタフィールドはｒｅｇｉｓｔｅｒ．ｆｉｅｌｄによって示される。
・Ｉ，Ｄは現在復号化されている命令ワードとデータワードをそれぞれ示す。
・Ａ，Ｂ及びＣは現在復号化されているオペランドワードＡ、オペランドワードＢ、オペランドワードＣを示す。
・Ａ＿ｄｅｓｃｒｉｐｔｏｒ，Ｂ＿ｄｅｓｃｒｉｐｔｏｒおよびＣ＿ｄｅｓｃｒｉｐｔｏｒは現在復号化されている命令のデータワードのデスクリプタを示す。
・Ｒは現在復号化されている命令の結果ワードを示す。
・”Ｘ：Ｙ”はＸとＹの連結を示す。
・”＠Ｘ”はコプロセッサのレジスタ番号Ｘを示す。
・”Ｃｂｕｓ（Ｘ）”はＣバスオペレーションＸの実行を示す。
・”^＊Ｃｂｕｓ（Ｘ）”はＣバスオペレーションＸによる受け取りデータを示す。
・”^＊Ｘ”は仮想メモリ番地Ｘを示す。
・”？？”は不明な値、あるいは未定の値を示す。
・”ｓｅｔ”はデータマニピュレーションレジスタの設定を示す。
Ａ．２合成演算子
注：
１．主要オペコードは０ｘＣと０ｘＤ
２．曖昧さは最上位アドレスのバイト（すなわち、最上位バイト）であると考える。
３．アキュムレータあるいはオペランドはプレ乗算されていてもよい。
４．結果は非プレ乗算されていてもよい。
５．命令長は入力ピクセルの数により定義されている。
Ａ．３色空間変換
注：
１．入力空間は常に３次元である。デフォルトでは３つの最下位なピクセルのチャネルである。曖昧さは排除される。
２．カラーテーブルのフォーマットはひとつの出力チャネルを含むものか、４つの出力チャネルを含むもののうちどちらかである。
Ａ．４ＪＰＥＧ命令
注：
１．オペコードは０ｘ２である。
２．オペランドＣはセットするためのレジスタでもよい。
３．オプションは多数存在する。
・サブサンプリングを行う／行わない。
・フィルタリングを行う／行わない。
・１，３あるいは４スキャン。
４．これらの命令は命令実行前に設定されたいくつかのレジスタと関係している。
Ａ．４．１伸長
注：１．以下のレジスタは命令実行前に設定されている必要がある。
・ｒｏ＿ｉｄｒ：出力画像次元数レジスタ
・ｒｏ＿ｃｕｔ：出力カットレジスタ
・ｒｏ＿ｌｍｔ：出力制限レジスタ
Ａ．４．２圧縮
注：
１．以下のレジスタは命令実行前に設定されている必要がある。
・ｐｏ＿ｉｄｒ：出力画像次元数レジスタ
・ｊｃ＿ｒｍｌ：再スタートマーカのインターバル
・ｒｏ＿ｃｕｔ：出力カットレジスタ
・ｒｏ＿ｌｍｔ：出力制限レジスタ
Ａ．５データコーディング
注：
１．すべてのデータコーディング操作は圧縮、圧縮解除いずれの場合も同じ様に扱われる。これらの操作設定はＪＰＥＧの時とほとんど同じである。
２．可能なエンコーディング操作
・ハフマン符号化
・予測符号化
３．可能なデコーディング操作
・高速ハフマン復号化
・低速ハフマン復号化
・ｐａｃｋｂｉｔｓ復号化（バージョンＡ）
・ｐａｃｋｂｉｔｓ復号化（バージョンＢ）
・予測復号化
４．オペランドＣは設定するためのレジスタでも良い。
５．以下のレジスタは命令実行前に設定されている必要がある。
・ｒｏ＿ｃｕｔ：出力カットレジスタ
・ｒｏ＿ｌｍｔ：出力制限レジスタ
Ａ．６変換と畳み込み
１．オペコードは０ｘ４（畳み込み）と０ｘ５（変換）。
２．コプロセッサは画像変換と画像畳み込みのそれぞれのために必要となるスーパーセットである操作を行う。画像変換と画像畳込みの唯一の違いは、コプロセッサに関する限り、画像変換ではカーネルステップサイズがカーネルの大きさ（水平、垂直）なのに対して、畳込みではステップサイズが１ソースピクセルとなっていることである。
３．オプション：
・隣接ピクセルへのスナッピングおよび補間
・ピクセル（カーネル）の蓄積を行うか否か
・ソースピクセルのプレ乗算を行うか否か
・最終結果のクランプ、ラッピング、絶対値
４．注：変換と畳込みは元の位置には実行できない。つまり、ソースのポインタとデスティネーションのポインタが同じであるときは、その内容が破壊される。
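
The step-size difference between image transformation and convolution described in note 2 of section A.6 can be shown with a one-dimensional sketch. The function name `kernel_origins` is hypothetical, and the sketch assumes that the kernel is only placed where it fits entirely inside the source line (no edge padding), which the text does not specify.

```python
def kernel_origins(length, kernel, step):
    """Positions at which the kernel is applied along one axis of the source."""
    return list(range(0, length - kernel + 1, step))


def transformation_origins(length, kernel):
    # Image transformation: the step equals the kernel size.
    return kernel_origins(length, kernel, kernel)


def convolution_origins(length, kernel):
    # Convolution: the step is one source pixel.
    return kernel_origins(length, kernel, 1)
```

For a line of 8 source pixels and a kernel of 4, the transformation visits two non-overlapping positions while the convolution visits five overlapping ones.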
Ａ．７行列乗算
注：
１．オペコードは０ｘ３
２．オプション：
・ソースピクセルのプレ乗算を行うか否か
・最終結果のクランプ、ラッピング、絶対値化
・オペランドＣはレジスタに書き込んでも良い
Ａ．８ハーフトーン処理
注：
１．オペコードは０ｘ７
２．オプションはハーフトーンのレベル値のみ
３．ハーフトーンスクリーンが適切にメッシュあるいはアンメッシュされているかぎり、ピクセルあるいはバイトに対して行うことができる。
Ａ．９メモリーコピー
注：
１．オペコードは０ｘ９
２．この命令はメモリーコピーの操作を完了するために、全く個別の機構を用いている。
・汎用データ転送命令はコプロセッサにおける通常のデータフローを利用し、ＰＯおよびＲＯ内のデータ操作ユニットを用いる様々な関数を利用できる。
・ペリフェラルＤＭＡ命令はＰＩＣとＬＭＣ間の直接的なコネクションを利用する。このことはデータ操作ができないことを意味し、後続の命令と同時実行が可能である。
Ａ．９．１汎用データ転送
Ａ．９．２ペリフェラルＤＭＡ転送
注：
１．同時実行でもそうでなくとも良い。このことは、ＩＣによって扱われている。
２．オペランドＣは設定するレジスタでも良い
３．ＰＩＣはデータを扱うモジュールなので、この命令はほかの”能動”命令と異なる。
Ａ．１０フォトＣＤ伸長
この命令群は３つの異なる操作すなわち、水平補間、垂直補間、残部融合から構成される。垂直補間と残部融合の設定方法は同じである。これら全ての命令のオペコードは０ｘ９である。
Ａ．１０．１水平補間
注：
１．ピクセルあるいはバイトに対して実行可能
２．この命令はオペランドが１つの命令であり、オペランドＣは設定するレジスタでも良い。
Ａ．１０．２垂直補間と残部融合
注：
１．垂直補間と残部融合の設定は同じである。
２．ピクセルとバイトの両方に対して実行可能。
３．この命令はオペランドが２つの命令であり、オペランドＣはレジスタセットでも良い。
Ａ．１１制御命令
注：
１．制御命令は２種類の操作、すなわちフロー制御命令と内部アクセス命令からなる。
Ａ．１１．１フロー制御
注：
１．オペコードは０ｘＢ
２．フロー制御命令は現在、各種ジャンプ命令と各種の待機命令から成っている。
３．コプロセッサ内では明確な設置は行われず、またこの命令は、”能動”命令ではない。つまり、ほかの命令のようにコプロセッサ内のサブモジュールが実際に何かを行ったりはしない。
４．オペランドＣは設定するレジスタでも良い。
Ａ．１１．２内部アクセス（リード）
注：
１．オペコードは０ｘＡ
２．リード命令はデータをコプロセッサ外に転送する。
３．ＲＯが実際にコプロセッサ内ですべてを行う唯一のモジュールである。
Ａ．１１．３内部アクセス（ライト）
注：
１．オペコードは０ｘＡ
２．ライト命令はデータをコプロセッサ内に転送する。
３．この命令は”能動”命令ではないので、ＩＣ以外のモジュールは実際には何も行わない。
Ａ．１２予約された命令
注：
１．オペコード０ｘ０，０ｘＦは予約されている。
２．予約された命令はマスク可能なエラーを出す。
３．これらの予約された命令はコプロセッサが今後改訂されたときにほかの命令として使用されることになっている。
付録Ｂ：レジスタ
１．１レジスタおよびテーブル
本節ではコプロセッサのレジスタについて解説する。これらのレジスタは３通りの方法で変更可能である。
１．特定のコプロセッサの命令群はレジスタの読み書きをするためにある。これらの命令群を用いることでレジスタは、イニシエータのＰＩＣバスサイクルの開始あるいは汎用インターフェースのトランザクションを用いて、ローカルメモリインターフェースに関連するメモリへの、あるいはメモリからの読み書きが行われる。
２．多くのレジスタは命令実行の副作用により内容が変化する。命令実行のためにコプロセッサが自身の設定を行うという主要な機構は、様々なレジスタを現在の状態を反映するように設定することで実現されている。命令実行終了後には各レジスタはコプロセッサの状態を反映する。多くの典型的な処理はある命令により完全に特定され、設定される。いくつかのレジスタでは命令実行の直前に設定する必要がある。
「予約」レジスタビットの意味
あらゆるレジスタ或はその構成要素の「予約」の意味は次の通りである。
・予約された場所への書き込みは行えるが、そのデータは棄却される。
・予約された場所からの読み込みは行えるが、そのデータは不定である
全ての特定されていないレジスタ及びレジスタフィールドは「予約」である。
１．１．１レジスタの分類
コプロセッサ内のレジスタは本節に記述される振る舞いに基づいて分類される。これらの記述は
・外部：モジュール外部（からのアクセス）。ＣＢｕｓインターフェースを用いた外部アクセスである。すなわち、命令コントローラあるいは外部ＣＢｕｓインターフェースによるターゲットモードのＰＣＩを用いる。注、レジスタは、バイセットモードを介してＰＣＩバスからセットできない。
・内部：モジュール内部（からのアクセス）
状態レジスタ
状態レジスタは外部からは読み込み専用で、内部からは読み書き可能。
コンフィグ１レジスタ
コンフィグ１レジスタは外部からは読み書き可能で、内部からは読み込み専用である。コンフィグ１レジスタはタイプＣのＣＢｕｓ操作はサポートせず（すなわち、ビットセットモードをサポートしない）、アドレス値のようなバイト（またはそれより大きな）コンフィギュレーション情報を保持するレジスタとして用いられる。
コンフィグ２レジスタ
コンフィグ２レジスタも外部から読み書き可能で、内部からは読み込み専用である。コンフィグ２レジスタはタイプＣのＣＢｕｓ操作（すなわちビットセットモード）をサポートし、ビット単位で設定する必要のあるコンフィギュレーション情報を保持するレジスタとして用いられる。
コントロール１レジスタ
コントロール１レジスタは外部および内部から読み書き可能。コントロール１レジスタはタイプＣのＣＢｕｓ操作をサポートせず（すなわちビットセットモードをサポートしない）、アドレス値のようなバイト（またはそれより大きなコントロール情報を保持するレジスタとして用いられる。
コントロール２レジスタ
コントロール２レジスタは外部および内部から読み書き可能。コントロール２レジスタはタイプＣのＣＢｕｓ操作（すなわちビットセットモード）をサポートし、ビット単位で設定する必要のあるコントロール情報を保持するレジスタとして用いられる。
割り込みレジスタ
割り込みレジスタ内のビットは内部からは１にセットでき、外部からは１を書き込むことによって０にリセットできる。モジュール割り込み／エラーレジスタもこのタイプである。モジュールの割り込み／エラーレジスタは３つのフィールドから構成される。
［７：０］モジュールによって生成されたあらゆるエラー状態（ステータス）を意味する
［２３：８］モジュールによって生成されたあらゆる例外状態を意味する
［３１：２４］モジュールによって生成されたあらゆる割り込み状態を意味する
１．１．２レジスタマップ
表１．１はコプロセッサのレジスタである。番号はアドレスではなくレジスタ番号である。
表１．１コプロセッサレジスタ
１．１．３レジスタ定義
汎用モジュールレジスタ
命令コントローラレジスタ
Ｉ．ｉｃ＿ｃｆｇ
ｉｃ＿ｃｆｇレジスタは３つの部分に別れる。最下位バイトはグローバルコンフィギュレーション情報を含む。最下位から３番目のバイトはストリームＡのコンフィギュレーション情報を含み、最上位バイトはストリームＢのコンフィギュレーション情報を含む。このレジスタのリセット値は０ｘ００００００００である。
ｍ．ｉｃ＿ｓｔａｔ
このレジスタは４つのセクションに分かれている。最下位バイトはＩＣの内部状態を保持する。最下位から２番目のバイトは現在の命令の復号化された結果と現在及びプリフェッチした命令ストリームを保持する。最上位から２番目のバイトはＡストリームに関してすべてのステータス情報を保持する。最上位バイトはＢストリームに関する情報を保持する。このレジスタのリセット値は０ｘ００００００００である。
ｎ．ｉｃ＿ｅｒｒｉｎｔ
このレジスタはＩＣ内部で割り込みやエラーが発生したかどうかを示す、アクティブ・ハイのフラグを含む。それぞれのビットは１を書き込むことでクリアされる。
ｏ．ｉｃ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
このレジスタは様々なエラーや割り込みの許可のマスクを含み、リセット値は０ｘ００００００００である。
ｐ．ｉｃ＿ｉｐａ
このレジスタはストリームＡの命令フェッチに用いられる仮想アドレスの最上位３０ビットを保持する。２つの最下位ビットは命令が整列されてるはずであるとして０に仮定される。このレジスタのリセット値は０ｘ００００００００である。
ｑ．ｉｃ＿ｔｄａ
このレジスタはストリームＡの“ｔｏｄｏ”値を保持する。これは適正な命令が存在するまでの３２ビット（ラッピング）のシーケンス番号である。このレジスタのリセット値は０ｘ００００００００である。
ｒ．ｉｃ＿ｆｎａ
このレジスタはストリームＡの「終了」値を保持する。これは３２ビット（ラッピング）のシーケンス番号で最後に完了した命令を示している。このレジスタのリセット値は０ｘ００００００００である。
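
Because the "todo" and "finished" values are wrapping 32-bit sequence numbers, they are best compared with modular arithmetic. The sketch below is an illustration only: the function name `pending_instructions` is hypothetical, and it assumes that the number of outstanding instructions is the difference between the two registers.

```python
MASK32 = 0xFFFFFFFF  # the sequence numbers wrap at 32 bits


def pending_instructions(todo, finished):
    """Instructions queued but not yet completed, tolerant of wrap-around."""
    return (todo - finished) & MASK32
```

The mask keeps the result correct across the wrap: a "todo" value of 5 with a "finished" value of 0xFFFFFFFE means seven instructions are outstanding.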
ｓ．ｉｃ＿ｉｎｔａ
このレジスタはストリームＡの「割り込み」番号を保持する。これは機構が有効であり用意されている場合にどこへ割り込みをかけるかの、３２ビット（ラッピング）のシーケンス番号である。このレジスタのリセット値は０ｘ００００００００である。
ｔ．ｉｃ＿ｌｏａ
このレジスタはストリームＡで実行される最後の重複命令の３２ビット（ラッピング）のシーケンス番号を保持する。このレジスタのリセット値は０ｘ００００００００である。
ｕ．ｉｃ＿ｉｐｂ
このレジスタはストリームＢの命令フェッチに用いられる仮想アドレスの最上位３０ビットを保持する。２つの最下位ビットは命令が整列されているはずであるとして０に仮定される。このレジスタのリセット値は０ｘ００００００００である。
ｖ．ｉｃ＿ｔｄｂ
このレジスタはストリームＢの“ｔｏｄｏ”値を保持する。これは適正な命令が存在するまでの３２ビット（ラッピング）番号である。このレジスタのリセット値は０ｘ００００００００である。
ｗ．ｉｃ＿ｆｎｂ
このレジスタはストリームＢの「終了」値を保持する。これは３２ビット（ラッピング）のシーケンス番号で最後に完了した命令を示している。このレジスタのリセット値は０ｘ００００００００である。
ｘ．ｉｃ＿ｉｎｔｂ
このレジスタはストリームＢの「割り込み」番号を保持する。これは機構が有効であり用意されている場合にどこへ割り込みをかけるかの、３２ビット（ラッピング）のシーケンス番号である。このレジスタのリセット値は０ｘ００００００００である。
ｙ．ｉｃ＿ｌｏｂ
このレジスタはストリームＢで実行される最後の重複命令の３２ビット（ラッピング）のシーケンス番号を保持する。このレジスタのリセット値は０ｘ００００００００である。
ｚ．ｉｃ＿ｓｅｍａ
このレジスタはｉｃ＿ｓｔａｔレジスタの副作用を用いたエイリアスであり、このレジスタの読み込はストリームＡのレジスタセマフォの要求の副作用である。
ａａ．ｉｃ＿ｓｅｍｂ
このレジスタはｉｃ＿ｓｔａｔレジスタの副作用を用いたエイリアスであり、このレジスタの読み込みはストリームＢのレジスタセマフォの要求の副作用である。
入力インターフェースレジスタ
ａｂ．ｉｉｓ＿ｃｆｇ
ａｃ．ｉｉｓ＿ｓｔａｔ
ａｄ．ｉｉｓ＿ｅｒｒ＿ｉｎｔ
ａｅ．ｉｉｓ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ａｆ．ｉｉｓ＿ｉｃ＿ａｄｄｒ
ａｇ．ｉｉｓ＿ｄｃｃ＿ａｄｄｒ
ａｈ．ｉｉｓ＿ｐｏ＿ａｄｄｒ
ａｉ．ｉｉｓ＿ｂｕｒｓｔ
ａｊ．ｉｉｓ＿ｂａｓｅ＿ａｄｄｒ
ａｋ．ｉｉｓ＿ｔｅｓｔ
外部インターフェースコントローラレジスタ
ａｌ．ｅｉｃ＿ｃｆｇ
ａｍ．ｅｉｃ＿ｓｔａｔ
ａｎ．ｅｉｃ＿ｅｒｒ＿ｉｎｔ
ｅｉｃ＿ｅｒｒ＿ｉｎｔレジスタのエラー及び割り込みビットはＥＩＣのみによって設定でき、ソフトウェアのみによってリセットできる。通常のエラー及び割り込みビットはそのビットに１を書き込むことでリセットされる。ＰＣＩコンフィギュレーションレジスタビットのコピーであるエラービットはＰＣＩコンフィギュレーションレジスタに書き込むことでクリアされなければならない。すなわち、ｅｉｃ＿ｅｒｒ＿ｉｎｔでのコピーは何も影響しない。
ａｏ．ｅｉｃ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ａｐ．ｅｉｃ＿ｔｅｓｔ
ａｑ．ｅｉｃ＿ｐｏｂ
ａｒ．ｅｉｃ＿ｈｉｇｈ＿ａｄｄｒ
ａｓ．ｅｉｃ＿ｗｔｌｂ＿ｖ
ａｔ．ｅｉｃ＿ｗｔｌｂ＿ｐ
ａｕ．ｅｉｃ＿ｍｍｕ＿ｖ
注：このレジスタの値は、ＭＭＵがページフォールトエラーあるいはＭＭＵからＰＣＩバスのエラーにより無効でないなら、いつでも変更可能である。
ａｖ．ｅｉｃ＿ｍｍｕ＿ｐ
注：このレジスタの値は、ＭＭＵがページフォールトエラーあるいはＭＭＵからＰＣＩバスのエラーにより無効でないなら、いつでも変更可能である。
ａｗ．ｅｉｃ＿ｉｐ＿ａｄｄｒ
注：このレジスタの値はＩＢＤがＩＢｕｓからＰＣＩバスへのエラーによって無効でないならいつでも変更可能である。
ａｘ．ｅｉｃ＿ｒｐ＿ａｄｄｒ
注：このレジスタの値はＲＢＲがＲＢｕｓからＰＣＩバスへのエラーによって無効でないなら、いつでも変更可能である。
ａｙ．ｅｉｃ＿ｉｇ＿ａｄｄｒ
注：このレジスタの値はＧＢＣが汎用バスのエラーによって無効でないなら、いつでも変更可能である。
ａｚ．ｅｉｃ＿ｒｇ＿ａｄｄｒ
注：このレジスタの値はＧＢＣが汎用バスのエラーによって無効でないなら、いつでも変更可能である。
ＰＣＩバスコンフィギュレーション空間のエイリアス
１６ワードからなるＰＣＩバスコンフィギュレーション空間は０ｘｃ０から０ｘｃｆまでのアドレスで示されるレジスタにエイリアスされている。
ローカルメモリコントローラレジスタ
ｂａ．ｌｍｉ＿ｃｆｇ
このレジスタはＬＭＣの処理モードとパラメータを決定するのに用いられる多くのコンフィギュレーションビットと制御ビットを含む。ｓｄｒａｍ＿１ピンがハイの時ＳＤＲＡＭ処理を特別に参照するビットは全く影響を持たない。このレジスタはｃｌｋｉｎの周波数が８０ＭＨｚのとき３．２マイクロ秒のリフレッシュ間隔であるようなリセット値０ｘ２００００１００をもつ。すべての特別なモードや機能は電源投入時には無効であり、すべてのアクセス権限は等しく０に設定される。リフレッシュはリセット時に有効であるが、ほかのモジュールは無効（Ｅ＝０）である。リフレッシュはＥビットに影響されない。
ｂｂ．ｌｍｉ＿ｓｔａｔ
ステータスレジスタはマシン内部の情報と同様にモジュールのアクティブや未決定ビットからなる。ステートマシンはＣＢｕｓインターフェースの２倍のクロックで駆動されており、従って最新の８０ＭＨｚクロック２サイクルそれぞれの状態情報を保持するのには２フィールド必要である。
ｂｃ．ｌｍｉ＿ｅｒｒ＿ｉｎｔ
エラーと割り込みのステータスレジスタは割り込み、例外、エラー状態の情報を保持する。レジスタは読み書きでき、読み込みはステータス情報を返し、特定ビットへの１の書き込みはそのビットをリセットする。０の書き込みはそのビットに対して全く影響を持たない。
このレジスタはリセット値０ｘ００００００００を持たなくてはならず、これは割り込み及びエラーが発生していないことを示す。予約ビットは常に０であり決して状態を変更できない。
ｂｄ．ｌｍｉ＿ｅｒｒ＿ｉｎｔ＿ｅｎレジスタ
エラー、例外、割り込み有効レジスタはエラー、例外割り込み信号の有効、無効の選択に用いられる。レジスタは読み書きできる。このレジスタはｌｍｉ＿ｅｒｒ＿ｉｎｔレジスタ内のエラー、例外、割り込みそれぞれに基づいて、ビット単位で有効化するのに用いられる。このレジスタのビットとｌｍｉ＿ｅｒｒ＿ｉｎｔレジスタのビットとの間には１対１の対応がある。もしｌｍｉ＿ｅｒｒ＿ｉｎｔ＿ｅｎレジスタの特定のビットがハイになったらｌｍｉ＿ｅｒｒ＿ｉｎｔレジスタの対応するビットが有効になり、それがハイであるならば、ＬＭＣモジュールエラー、例外あるいは割り込み信号、ｃ＿ｅｒｒ、ｃ＿ｅｘｐ、あるいはｃ＿ｉｎｔが発生できる。もしｌｍｉ＿ｅｒｒ＿ｉｎｔ＿ｅｎレジスタの特定のビットがクリアされたらｌｍｉ＿ｅｒｒ＿ｉｎｔレジスタの対応するビットが無効になり、ｃ＿ｅｒｒ、ｃ＿ｅｘｐあるいはｃ＿ｉｎｔを発生させることはできない。ＬＭＣには例外はないので、このレジスタのｅｘｐ＿ｍａｓｋビットは全く影響せず、すべて予約である。このレジスタのリセット値はすべてのエラー及び割り込み源を無効にする０ｘ００００００００である。使用されないビットは常に０であり、ハイにセットすることはできない。
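
The interaction between a module's error/interrupt status register and its enable register can be modeled with a short Python sketch. It is a behavioral illustration, not the register-transfer design: the class name `ErrIntRegisters` and its method names are hypothetical, and the field boundaries ([7:0] errors, [23:8] exceptions, [31:24] interrupts) are taken from the description of the interrupt register type.

```python
ERR_FIELD = 0x000000FF  # bits [7:0]: error conditions
EXP_FIELD = 0x00FFFF00  # bits [23:8]: exception conditions
INT_FIELD = 0xFF000000  # bits [31:24]: interrupt conditions


class ErrIntRegisters:
    def __init__(self):
        self.err_int = 0     # status register, reset value 0x00000000
        self.err_int_en = 0  # enable register, reset value 0x00000000

    def module_set(self, bits):
        """The module itself sets status bits to 1."""
        self.err_int |= bits & 0xFFFFFFFF

    def write_status(self, value):
        """External write: a 1 clears the corresponding bit, a 0 has no effect."""
        self.err_int &= ~value & 0xFFFFFFFF

    def signals(self):
        """A status bit raises its signal only if the matching enable bit is high."""
        pending = self.err_int & self.err_int_en
        return {
            "c_err": bool(pending & ERR_FIELD),
            "c_exp": bool(pending & EXP_FIELD),
            "c_int": bool(pending & INT_FIELD),
        }
```

With the enable register at its reset value no signal is raised even if status bits are set; setting an enable bit exposes the matching status bit, and writing 1 to that status bit clears it again.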
ｂｅ．ｌｍｉ＿ｄｃｆｇ
このコンフィギュレーションレジスタはＤＲＡＭチップを使用する場合のサイズやコンフィギュレーションを決定する設計パラメータを保持する。このレジスタはすべてのタイミング制限の値を最大値にするようなリセット値０ｘ０００７ｆｆ８０を保持する。
ｂｆ．ｌｍｉ＿ｍｏｄｅレジスタ
このコンフィギュレーションレジスタは初期化処理の一環としてＳＤＲＡＭモードレジスタに書き込まれる情報を保持する。このレジスタは常に読み書き可能で、初期化ビットをセットすることによってＳＤＲＡＭに書き込んでも良い。このレジスタはリセット値０ｘ００３７をもつ。この有用なデフォルト値は電源投入プリチャージ後あるいはレベル１のリセット後直ちに要求される。これは読み込み遅延を３クロックに設定し、バースト長をシーケンシャルラップを用いたフルページに設定する。あらゆるリセットの後、もしｓｄｒａｍ＿１ピンがローであれば、ＳＤＲＡＭモードレジスタを初期的にプログラムするために、初期化ビットはセットされる。モードレジスタの書き込み実行後、このビットは自動的にゼロにクリアされる。
周辺インターフェースレジスタ
ｂｇ．ｐｉｃ＿ｃｆｇレジスタ
ｂｈ．ｐｉｃ＿ｓｔａｔ
ｂｉ．ｐｉｃ＿ｅｒｒ＿ｉｎｔ
ｐｉｃ＿ｅｒｒ＿ｉｎｔレジスタのエラーおよび割り込みビットはＰＩＣのみによりセットされ、ソフトウェアのみによってリセットされる。それぞれのビットは１を書き込むことでリセットされる
ｂｊ．ｐｉｃ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ｂｋ．ｐｉｃ＿ａｂｕｓ＿ｃｆｇ
ｂｌ．ｐｉｃ＿ａｂｕｓ＿ａｄｄｒ
ｂｍ．ｐｉｃ＿ｃｅｎｔ＿ｃｆｇ
ｐｉｃ＿ｃｅｎｔ＿ｃｆｇレジスタはセントロニクスモードが有効の場合に、すべてのインターフェースの局面を制御する読み込み／書き込み信号及び読み込み専用ステータス信号を含んでいる。
ｂｎ．ｐｉｃ＿ｃｅｎｔ＿ｄｉｒ
ｂｏ．ｐｉｃ＿ｒｅｖｅｒｓｅ＿ｃｆｇ
ｂｐ．ｐｉｃ＿ｔｉｍｅｒ０
ｂｑ．ｐｉｃ＿ｔｉｍｅｒ１
データキャッシュコントローラレジスタ
ｂｒ．ｄｃｃ＿ｃｆｇ１
ｂｓ．ｄｃｃ＿ｃｆｇ２
ｂｔ．ｄｃｃ＿ｓｔａｔ
ｂｕ．ｄｃｃ＿ｅｒｒ＿ｉｎｔ
ｂｖ．ｄｃｃ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ｂｗ．ｄｃｃ＿ｌｖ０
ｂｘ．ｄｃｃ＿ｌｖ１
ｂｙ．ｄｃｃ＿ｌｖ２
ｂｚ．ｄｃｃ＿ｌｖ３
ｃａ．ｄｃｃ＿ａｄｄｒ
ｃｂ．ｄｃｃ＿ｒａｄｄｒｂ
ｃｃ．ｄｃｃ＿ｒａｄｄｒｃ
ｃｄ．ｄｃｃ＿ｔｅｓｔ
オペランドオーガナイザレジスタ
オペランドオーガナイザレジスタには同様の２つのオペランドオーガナイザが存在する：オペランドオーガナイザＢとオペランドオーガナイザＣである。これらの２つのオペランドオーガナイザ用のレジスタはここに記述されている。
ｃｅ．ｏｏｎ＿ｃｆｇ（ｏｏｂ＿ｃｆｇ＝０ｘ７０，ｏｏｃ＿ｃｆｇ＝０ｘ８０）
ｃｆ．ｏｏｎ＿ｓｔａｔ（ｏｏｂ＿ｓｔａｔ＝０ｘ７１，ｏｏｃ＿ｓｔａｔ＝０ｘ８１）
ｃｇ．ｏｏｎ＿ｅｒｒ＿ｉｎｔ（ｏｏｂ＿ｅｒｒ＿ｉｎｔ＝０ｘ７２，ｏｏｃ＿ｅｒｒ＿ｉｎｔ＝０ｘ８２）
ｃｈ．ｏｏｎ＿ｅｒｒ＿ｉｎｔ＿ｅｎ（ｏｏｂ＿ｅｒｒ＿ｉｎｔ＿ｅｎ＝０ｘ７３，ｏｏｃ＿ｅｒｒ＿ｉｎｔ＿ｅｎ＝０ｘ８３）
ｃｉ．ｏｏｎ＿ｄｍｒ（ｏｏｂ＿ｄｍｒ＝０ｘ７４，ｏｏｃ＿ｄｍｒ＝０ｘ８４）
ｃｊ．ｏｏｎ＿ｓｕｂｓｔ（ｏｏｂ＿ｓｕｂｓｔ＝０ｘ７５，ｏｏｃ＿ｓｕｂｓｔ＝０ｘ８５）
ｃｋ．ｏｏｎ＿ｃｄｐ（ｏｏｂ＿ｃｄｐ＝０ｘ７６，ｏｏｃ＿ｃｄｐ＝０ｘ８６）
ｃｌ．ｏｏｎ＿ｌｅｎ（ｏｏｂ＿ｌｅｎ＝０ｘ７７，ｏｏｃ＿ｌｅｎ＝０ｘ８
ｃｍ．ｏｏｎ＿ｓａｉｄ（ｏｏｂ＿ｓａｉｄ＝０ｘ７８，ｏｏｃ＿ｓａｉｄ＝０ｘ８８）
ｃｎ．ｏｏｎ＿ｔｉｌｅ（ｏｏｂ＿ｔｉｌｅ＝０ｘ７９，ｏｏｃ＿ｔｉｌｅ＝０ｘ８９）
ピクセルオーガナイザレジスタ
ｃｏ．ｐｏ＿ｃｆｇ
ｃｐ．ｐｏ＿ｓｔａｔ
ｃｑ．ｐｏ＿ｅｒｒ＿ｉｎｔ
ｃｒ．ｐｏ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ｃｓ．ｐｏ＿ｄｍｒ
ｃｔ．ｐｏ＿ｓｕｂｓｔ
ｃｕ．ｐｏ＿ｃｄｐ
ｃｖ．ｐｏ＿ｌｅｎ
ｃｗ．ｐｏ＿ｓａｉｄ
ｃｘ．ｐｏ＿ｉｄｒ
ｃｙ．ｐｏ＿ｍｕｖ＿ｖａｌｉｄ
ｃｚ．ｐｏ＿ｍｕｖ
主データパスレジスタ
ｄａ．ｍｄｐ＿ｃｆｇ
すべてのビットは０にリセットされる。
ｄｂ．ｍｄｐ＿ｓｔａｔ
すべてのビットは０にリセットされる。
ｄｃ．ｍｄｐ＿ｅｒｒ＿ｉｎｔ
すべてのビットは０にリセットされる。
ｄｄ．ｍｄｐ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
すべてのビットは０にリセットされる。
ｄｅ．ｍｄｐ＿ｔｅｓｔ
すべてのビットは０にリセットされる。
ｄｆ．ｍｄｐ＿ｏｐ１
すべてのビットは０にリセットされる。
ｄｇ．ｍｄｐ＿ｏｐ２
すべてのビットは０にリセットされる。
ｄｈ．ｍｄｐ＿ｐｏｒ
すべてのビットは０にリセットされる。
ｄｉ．ｍｄｐ＿ｂｉ
すべてのビットは０にリセットされる。ｍｄｐ＿ｂｉレジスタは種々のモードの様々なものに用いられる。
ｄｊ．ｍｄｐ＿ｂｍ
すべてのビットは０にリセットされる。ｍｄｐ＿ｂｍレジスタは異なるモードの異なるものに用いられる。
ｄｋ．ｍｄｐ＿ｌｅｎ
すべてのビットは０にリセットされる。
ＪＰＥＧ符号化器レジスタ
ｄｌ．ｊｃ＿ｃｆｇ
ｄｍ．ｊｃ＿ｓｔａｔ
ｄｎ．ｊｃ＿ｅｒｒ＿ｉｎｔ
ｄｏ．ｊｃ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ｄｐ．ｊｃ＿ｒｓｉ
ｄｑ．ｊｃ＿ｄｅｃｏｄｅ
ｄｒ．ｊｃ＿ｒｅｓ
ｄｓ．ｊｃ＿ｔａｂｌｅ＿ｓｅｌ
結果オーガナイザレジスタ
ｄｔ．ｒｏ＿ｃｆｇ
ｄｕ．ｒｏ＿ｓｔａｔ
ｄｖ．ｒｏ＿ｅｒｒ＿ｉｎｔ
ｄｗ．ｒｏ＿ｅｒｒ＿ｉｎｔ＿ｅｎ
ｄｘ．ｒｏ＿ｄｍｒ
ｄｙ．ｒｏ＿ｓｕｂｓｔ
ｄｚ．ｒｏ＿ｃｄｐ
ｅａ．ｒｏ＿ｌｅｎ
ｅｂ．ｒｏ＿ｓａ
ｅｃ．ｒｏ＿ｉｄｒ
ｅｄ．ｒｏ＿ｖｂａｓｅ
ｅｅ．ｒｏ＿ｃｕｔ
ｅｆ．ｒｏ＿ｌｍｔ
ＰＣＩコンフィギュレーション空間のエイリアス
ＰＣＩコンフィギュレーション空間は２５６バイトの、ＰＣＩによって定義されたレジスタのブロックであり、ホストがＰＣＩデバイスをコンフィギュレーションしたり、その状態を読んだりすることを認めている。それはＰＣＩコンフィギュレーションサイクルを用いてアクセスされる。レジスタはまたコプロセッサの内部メモリの読み込み専用エリアにミラーされており、従ってＰＣＩの通常のメモリサイクルを用いて読むことができる。ＥＩＣに実装されているコンフィギュレーション空間のフォーマットを表１．１４１．１に示す。
表１．１４１．１コプロセッサＰＣＩ構成の空間的レイアウト
予約のレジスタと実装されたレジスタにおける予約のビットは読み込みに対しては０を返し、また書き込みによって影響しない。０ｘ４０−０ｘｆｆの範囲のコンフィギュレーション空間のアドレスもまた予約である。ベンダー専用のコンフィギュレーションレジスタは定義されない。
ｅｇベンダーＩＤ
このレジスタは読み込み専用である。ＣＩＳＲＡのベンダーＩＤは０ｘ１１ＡＣである。
ｅｈデバイスＩＤ
このレジスタは読み込み専用である。コプロセッサのデバイスＩＤは０ｘ０００１である。デバイスＩＤフィールドは二つの８ビットのフィールドに分割されている：最上位の８ビットはデバイスの特徴を示す番号（０ｘ０はコプロセッサ）で、最下位の８ビットはそのデバイスのバージョン番号（０ｘ１はコプロセッサのバージョン）を示す。
ｅｉコマンドレジスタ
コマンドレジスタのフィールドの定義を表１．１４２に示す。このレジスタのすべての予約されていないビットは読みこみ／書き込みができる。リセット後にはこのレジスタは０ｘ００００にセットされる。
ｅｊステータスレジスタ
ステータスレジスタのフィールドの定義を表１．１４３に示す。このレジスタの読み込みは通常通りである。このレジスタのいくつかのビットは読み込み専用である。その他のビットはコプロセッサのみにより１にセットされ、ホストのみによって０にリセットされる（テストモードを除く）。ホストはそのビットに１を書き込むことでリセットする；０の書き込みは意味をなさない。リセット後にはこのレジスタは０ｘ０２８０にセットされる。
ｅｋリビジョンＩＤ
これは読み込み専用のレジスタである。コプロセッサの初期リビジョンＩＤは０ｘ０１である。
ｅｌクラスコード
これは読み込み専用のレジスタである。コプロセッサはＰＣＩＳＩＧの定義されたクラスコードに適さないのでこのレジスタは０ｘＦＦ００００にセットされる。
ｅｍキャッシュラインサイズ
これは３２ビットワード単位でシステムのキャッシュラインサイズを決定する読み書き可能なレジスタである。これはコプロセッサがメモリ読み込みラインやメモリ多重読み込みコマンドを使用するときに決定する。コプロセッサはこのレジスタの０から２５５までの値をサポートする。このレジスタにおける０はメモリ読み込みラインおよびメモリ多重読み込みの形式を無効にする。このレジスタはリセット時には０ｘ００にセットされる。
ｅｎ遅延タイマ
このレジスタはすべてのＰＣＩの処理にＣＰＵが使用する最大のクロック数を特定する読み書きできるレジスタである。コプロセッサはこのレジスタにおいて０から２５５の値をサポートする。このレジスタはリセット時には０ｘ００にセットされる。
ｅｏヘッダタイプ
この読み込み専用のレジスタは０ｘ００にセットされる。このことはコプロセッサがタイプ０のレイアウトのコンフィギュレーション空間を使用することを意味する。
ｅｐベースアドレス
この読み書き可能なレジスタはコプロセッサの内部レジスタ、内部メモリ、ローカルメモリ、及び汎用インターフェースをホストのメモリマップ内に配置するために用いられる。コプロセッサの様々なリソースは６４ＭＢ（すべてが使用される訳ではない）を占有し、従ってこのレジスタの先頭６ビットだけが書き込み可能である。残りのビットはすべて０にハード的に結線されている。このレジスタの下位の４ビットは読み込み専用の制御ビットであり、これらもまた０に結線されている。このことはレジスタがメモリ空間を参照することを意味し、コプロセッサがホスト側の３２ビット空間のどこにでもマッピングされ、コプロセッサのリソースがターゲットであるときはプリフェッチできないことを意味する。
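
The statement that only the top six bits of the base address register are writable follows from the 64 MB size of the mapped region (64 MB needs 26 address bits, leaving 32 - 26 = 6 bits for the base). The sketch below illustrates this with the usual PCI sizing probe, in which the host writes all ones and reads the register back; the function names are hypothetical and a 32-bit register is assumed.

```python
BAR_WRITABLE = 0xFC000000  # top 6 bits writable; all other bits hardwired to 0


def write_base_address(value):
    """Value the register holds after the host writes `value` to it."""
    return value & BAR_WRITABLE


def decoded_size(readback):
    """Region size implied by the read-back of an all-ones write."""
    address_bits = readback & 0xFFFFFFF0  # ignore the four low control bits
    return (~address_bits & 0xFFFFFFFF) + 1


def claims(address, base):
    """True if `address` falls inside the region mapped at `base`."""
    return (address & BAR_WRITABLE) == base
```

Writing 0xFFFFFFFF reads back as 0xFC000000, from which the host derives a region size of 0x04000000 bytes, that is, 64 MB.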
ｅｑサブシステムベンダーＩＤ
この読み込み専用レジスタはホストがシステムに実装されたＰＣＩボードのベンダーを識別できるようにする（ボード上のＰＣＩインターフェースに実装したコンポーネントのベンダーに対して）。このレジスタの内容はリセット時にＥＩＣコンフィギュレーションシリアルポートを用いてロードされる。
ｅｒサブシステムＩＤ
この読み込み専用レジスタはホストがシステムに実装されたＰＣＩボードを識別できるようにする。このレジスタの内容はリセット時にＥＩＣコンフィギュレーションシリアルポートを用いてロードされる。このメカニズムはボードの機能あるいはコンフィギュレーションに必要な情報の外部からの符号化およびホストからの読み込みを可能にする。
ｅｓ割り込みライン
この読み書きできるレジスタはシステムソフトウェアが割り込みラインルーティング情報を記録できる様にするために使用され、割り込みサービスソフトウェアによりアクセスできる。コプロセッサ内の処理には全く影響を与えない。このレジスタはリセット時には０ｘ００にセットされる。
ｅｔ割り込みピン
この読み込み専用レジスタはハード的に０ｘ０１に結線されている。このことはコプロセッサがＰＣＩのｉｎｔａ＿１割り込みピンを使用することを示す。
ｅｕＭｉｎ＿Ｇｎｔ
この読み込み専用レジスタはコプロセッサが要求する１／４マイクロ秒単位のバースト期間長をホストに示す。このレジスタの最適な値はまだ決まっていない。
ｅｖＭａｘ＿Ｌａｔ
この読み込み専用レジスタは１／４マイクロ秒単位での、コプロセッサが要求するＰＣＩバスのゲインコントロール最大遅延をホストに示す。このレジスタの最適な値はまだ決まっていない。
１．１．４内部メモリマップ
本節ではコプロセッサの内部メモリマップ内のプレモジュールデータエリアに生ずるオブジェクトの詳細について述べる。
１．１．５メモリワードフィールド
ａｅｉｃ＿ｐｔｐ
Appendix A
Coprocessor microprogramming
This section details the operations performed within the coprocessor each time a new instruction is executed. All self-configuration performed by the coprocessor during instruction execution is accomplished by reading and writing internal registers; the coprocessor can therefore be completely microprogrammed through the external CBus interface, or by the host through the PCI bus interface. Microprogramming from the host, however, is generally expected to be difficult because of host synchronization issues. This chapter assumes that the reader has sufficient knowledge of the following aspects of the coprocessor:
1. Execution model
2. Instruction set and coding
3. Register set
4. Internal structure
A.1 General matters
A1.1 General information on coprocessor setup
For all instructions other than control instructions and local DMA instructions, the data flow within the coprocessor is essentially under the control of the pixel organizer. The pixel organizer is responsible for fetching the beginning of the input data stream, counting the data, and determining when the last data was fetched. The other modules in the coprocessor basically simply respond to the data sent.
A1.2 Module configuration order
Not all modules are set up on a per instruction basis. Some modules are not configured at all during instruction decoding. The module configuration order is always the order of PO, DCC, OOB, OOC, MDP, JC, RO, and PIC.
A1.3 Other register settings
If an instruction is encoded including setting a certain register value, that register is set by microprogramming according to the following order:
1. If there is no other register set in the module that has the register to be set, that register is set before any other register setting.
2. If there is another register set in the module having the register to be set, that register is set immediately before the _cfg register of the module after the other registers are set.
A1.4 Coding of inconsistent instruction operands
Many instructions specify operands and result data types, so if other data types are specified, they return a meaningless result. For each operand, the coprocessor determines the format of the target operand in the following procedure.
1. If the internal format of the operand is specialized for one pixel (compressed byte or uncompressed byte), the corresponding operand organizer is set to reflect this. The data cache controller is not configured and therefore the operation continues in normal mode.
2. If the internal format of the operand is specified as “other forms”, the coprocessor derives the operand format from the instruction. This is straightforward for operand B and operand C. For operand A, “other forms” is not normally specified and the coprocessor's behavior is undefined. The corresponding operand organizer is put into bypass mode, and the data cache controller is configured to manage operand data of the resulting format. Microprogramming is reasonably independent between the various modules.
A1.5 Pseudo-instruction grammar
• The order of instruction execution is determined by the number at the left.
• Register names are written in Helvetica Bold.
• A register field is denoted by register.field.
• I and D denote the instruction word and the data word of the instruction currently being decoded.
• A, B and C denote operand word A, operand word B and operand word C of the instruction currently being decoded.
• A_descriptor, B_descriptor and C_descriptor denote the data word descriptors of the instruction currently being decoded.
• R denotes the result word of the instruction currently being decoded.
• "X:Y" denotes the concatenation of X and Y.
• "@X" denotes coprocessor register number X.
• "Cbus(X)" denotes execution of CBus operation X.
• "^*Cbus(X)" denotes the data received from CBus operation X.
• "^*X" denotes virtual memory address X.
• "??" denotes an unknown or undetermined value.
• "set" denotes the setting of the data manipulation register.
A. 2 Composition operators
note:
1. Major opcodes are 0xC and 0xD
2. The ambiguity is considered to be the byte at the highest address (ie, the highest byte).
3. The accumulator or operand may be premultiplied.
4). The result may be non-premultiplied.
5). The instruction length is defined by the number of input pixels.
A. 3 color space conversion
note:
1. The input space is always three dimensional. By default, it consists of the three lowest channels of the pixel. Opacity is discarded.
2. The format of the color table is either one that contains one output channel or one that contains four output channels.
A. 4 JPEG instructions
note:
1. The opcode is 0x2.
2. Operand C may be a register to be shut down.
3. There are many options.
・ Subsampling or no subsampling.
・ Filtering or no filtering.
・ 1, 3 or 4 scans.
4. These instructions are associated with a number of registers that are set prior to instruction execution.
A. 4.1 Decompression
note:
1. The following registers must be set before executing the instruction.
Ro_idr: Output image dimension number register
Ro_cut: Output cut register
Ro_lmt: Output restriction register
A. 4.2 Compression
note:
1. The following registers must be set before executing the instruction.
Po_idr: Output image dimension number register
Jc_rml: interval of restart marker
Ro_cut: Output cut register
Ro_lmt: Output restriction register
A. 5 Data coding
note:
1. All data coding operations are handled in the same way for both compression and decompression. These operation settings are almost the same as in JPEG.
2. Possible encoding operations
・ Huffman coding
・ Predictive coding
3. Possible decoding operations
・ High-speed Huffman decoding
・ Slow Huffman decoding
・ Packbits decoding (version A)
・ Packbits decoding (version B)
・ Predictive decoding
4. Operand C may be a register for setting.
5. The following registers must be set before executing the instruction.
Ro_cut: Output cut register
Ro_lmt: Output restriction register
A. 6 Conversion and convolution
1. Opcodes are 0x4 (convolution) and 0x5 (conversion).
2. The coprocessor performs a superset of the operations required for image conversion and image convolution. As far as the coprocessor is concerned, the only difference between the two is the kernel step size: in image conversion the step is the kernel size (horizontally and vertically), whereas in convolution the step is one source pixel.
3. Options:
・ Snap to neighboring pixel or interpolation
・ Pixel (kernel) accumulation or not
・ Whether to perform pre-multiplication of source pixels
・ Clamping, wrapping or absolute value of the final result
4. Note: Conversion and convolution cannot be performed in place. That is, if the source pointer and destination pointer are the same, the contents are destroyed.
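The step-size distinction in note 2 can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the coprocessor's data path: the function names `apply_kernel`, `convolve` and `convert` are hypothetical, images are plain lists of rows, the kernel is applied without flipping, and edge handling is reduced to skipping positions where the kernel would not fit. The result is written to a new buffer, in line with the note that the operation cannot be performed in place.

```python
def apply_kernel(image, kernel, step_x, step_y):
    """Slide `kernel` over `image` with the given step and return the weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []  # separate destination buffer: never overwrite the source
    for y in range(0, ih - kh + 1, step_y):
        row = []
        for x in range(0, iw - kw + 1, step_x):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    acc += kernel[j][i] * image[y + j][x + i]
            row.append(acc)
        out.append(row)
    return out


def convolve(image, kernel):
    """Convolution-style pass: the kernel advances one source pixel at a time."""
    return apply_kernel(image, kernel, 1, 1)


def convert(image, kernel):
    """Conversion-style pass: the kernel advances by its own width and height."""
    return apply_kernel(image, kernel, len(kernel[0]), len(kernel))
```

With a 4x4 source and a 2x2 kernel, `convolve` yields a 3x3 result and `convert` a 2x2 result, which is the only behavioral difference between the two passes in this sketch.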
A. 7 Matrix multiplication
note:
1. Opcode is 0x3
2. Options:
・ Whether to perform pre-multiplication of source pixels
・ Clamping, wrapping or absolute value of the final result
3. Operand C may be written to the register.
A. 8 Halftone processing
note:
1. Opcode is 0x7
2. Options are halftone level values only
3. This can be done on pixels or bytes as long as the halftone screen is properly meshed or unmeshed.
A. 9 Memory copy
note:
1. The opcode is 0x92. This instruction uses a completely separate mechanism to complete the memory copy operation.
General data transfer instructions use the normal data flow in the coprocessor and can use various functions using the data manipulation units in the PO and RO.
Peripheral DMA instructions use a direct connection between PIC and LMC. This means that data cannot be manipulated and can be executed simultaneously with subsequent instructions.
A. 9.1 General-purpose data transfer
A. 9.2 Peripheral DMA transfer
note:
1. It may or may not be concurrent. This is handled by the IC.
2. Operand C may be a register to be set
3. Since PIC is a module that handles data, this instruction is different from other “active” instructions.
A. 10 Photo CD extension
This group of commands consists of three different operations: horizontal interpolation, vertical interpolation and remainder fusion. The setting method of the vertical interpolation and the remainder fusion is the same. The opcodes for all these instructions are 0x9.
A. 10.1 Horizontal interpolation
note:
1. Can be performed on pixels or bytes
2. This instruction is an instruction having one operand, and the operand C may be a register to be set.
A. 10.2 Vertical interpolation and remainder fusion
note:
1. The settings for vertical interpolation and remainder fusion are the same.
2. Executable for both pixels and bytes.
3. This instruction is an instruction having two operands, and the operand C may be a register set.
A. 11 Control instructions
note:
1. The control instruction consists of two types of operations, namely a flow control instruction and an internal access instruction.
A. 11.1 Flow control
note:
1. Opcode is 0xB
2. The flow control command currently consists of various jump commands and various standby commands.
3. There is no explicit placement within the coprocessor, and this instruction is not an “active” instruction. That is, submodules in the coprocessor do not actually do anything like other instructions.
4. Operand C may be a register to be set.
A. 11.2 Internal access (read)
note:
1. Opcode is 0xA
2. A read instruction transfers data out of the coprocessor.
3. The RO is the only module in the coprocessor that actually does anything.
A. 11.3 Internal access (write)
note:
1. Opcode is 0xA
2. A write instruction transfers data into the coprocessor.
3. Since this instruction is not an “active” instruction, modules other than the IC actually do nothing.
A. 12 Reserved instructions
note:
1. Opcodes 0x0 and 0xF are reserved.
2. Reserved instructions give maskable errors.
3. These reserved instructions are to be used as other instructions when the coprocessor is revised in the future.
Appendix B: Registers
1.1 Registers and tables
This section describes the coprocessor registers. These registers can be changed in three ways.
1. Specific coprocessor instructions are for reading and writing registers. By using these instruction groups, the register is read from or written to the memory associated with the local memory interface using the start of the PIC bus cycle of the initiator or the transaction of the general-purpose interface.
2. The contents of many registers change due to side effects of instruction execution. The main mechanism by which the coprocessor sets itself for instruction execution is realized by setting various registers to reflect the current state. After completion of instruction execution, each register reflects the state of the coprocessor. Many typical processes are completely specified and configured by certain instructions. Some registers need to be set just before instruction execution.
"Reserved" register bit meaning
The meaning of “reserved” for any register or its components is as follows.
・ You can write to a reserved location, but the data is discarded.
・ You can read from the reserved place, but the data is undefined.
All unspecified registers and register fields are “reserved”.
1.1.1 Register classification
Registers in the coprocessor are classified based on the behavior described in this section. The descriptions use the following terms:
・ External: access from outside the module. External access uses the CBus interface, that is, access by the instruction controller, through target-mode PCI, or through the external CBus interface. Note that registers cannot be set from the PCI bus using bit set mode.
・ Internal: access from inside the module.
Status register
The status register is read-only externally and can be read and written internally.
Config 1 register
The Config 1 register can be read and written externally and is read-only internally. The Config 1 register does not support Type C CBus operations (i.e., does not support bit set mode) and is used to hold byte (or larger) configuration information such as address values.
Config 2 register
The Config 2 register can also be read and written externally and is read-only internally. The Config 2 register supports Type C CBus operations (i.e., bit set mode) and is used to hold configuration information that needs to be set bit by bit.
Control 1 register
The Control 1 register can be read and written both externally and internally. The Control 1 register does not support Type C CBus operations (i.e., does not support bit set mode) and is used to hold byte (or larger) control information such as address values.
Control 2 register
Control 2 register can be read and written from outside and inside. The control 2 register supports type C CBus operations (ie, bit set mode) and is used as a register for holding control information that needs to be set in bit units.
Interrupt register
The bit in the interrupt register can be set to 1 from the inside and can be reset to 0 by writing 1 from the outside. Module interrupt / error registers are also of this type. The module interrupt / error register consists of three fields.
[7:0] Any error condition (status) generated by the module
[23:8] Any exception condition generated by the module
[31:24] Any interrupt condition generated by the module
1.1.2 Register Map
Table 1.1 shows the coprocessor registers. The number is not an address but a register number.
Table 1.1 Coprocessor registers
1.1.3 Register definition
Function module register
Instruction controller register
l. ic_cfg
The ic_cfg register is divided into three parts. The least significant byte contains global configuration information. The third least significant byte contains stream A configuration information, and the most significant byte contains stream B configuration information. The reset value of this register is 0x00000000.
m. ic_stat
This register is divided into four sections. The least significant byte holds the internal state of the IC. The second byte from the least significant holds the decoded result of the current instruction and the current and prefetched instruction streams. The second byte from the most significant holds all status information regarding the A stream. The most significant byte holds information about the B stream. The reset value of this register is 0x00000000.
n. ic_err_int
This register includes an active high flag that indicates whether an interrupt or error has occurred within the IC. Each bit is cleared by writing 1.
o. ic_err_int_en
This register contains various error and interrupt permission masks, and the reset value is 0x00000000.
p. ic_ipa
This register holds the most significant 30 bits of the virtual address used for stream A instruction fetch. The two least significant bits are assumed to be 0 as the instruction should be aligned. The reset value of this register is 0x00000000.
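Because only the upper 30 bits are stored, the fetch address is always word aligned. The following is a minimal sketch of that relationship. It assumes, without confirmation from the text, that the 30 stored bits occupy bit positions [31:2] of a 32-bit register value and that the low two bits read as zero; `fetch_address` is a hypothetical helper name.

```python
ADDRESS_MASK = 0xFFFFFFFC  # bits [31:2]; bits [1:0] are taken to be zero


def fetch_address(register_value: int) -> int:
    """Return the 4-byte-aligned instruction fetch address implied by the register value."""
    return register_value & ADDRESS_MASK
```

Any value written with nonzero low bits therefore resolves to the enclosing aligned word in this model.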
q. ic_tda
This register holds the “to do” value for stream A. This is the 32-bit (wrapping) sequence number up to which valid instructions exist. The reset value of this register is 0x00000000.
r. ic_fna
This register holds the “end” value for stream A. This indicates the last completed instruction with a 32-bit (wrapping) sequence number. The reset value of this register is 0x00000000.
s. ic_inta
This register holds the “interrupt” number for stream A. This is the 32-bit (wrapping) sequence number at which to interrupt when the mechanism is enabled and armed. The reset value of this register is 0x00000000.
t. ic_loa
This register holds the 32-bit (wrapping) sequence number of the last duplicate instruction executed in stream A. The reset value of this register is 0x00000000.
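Since the stream A sequence numbers wrap at 32 bits, ordinary integer comparison gives wrong answers near the wrap point. The sketch below shows one common way to compare such counters (modular subtraction with a half-range rule). The text only states that the numbers wrap; the comparison rule and the names `pending_instructions` and `seq_reached` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def pending_instructions(todo: int, finished: int) -> int:
    """Number of instructions between the 'finished' and 'to do' counters, modulo 2**32."""
    return (todo - finished) & MASK32


def seq_reached(current: int, target: int) -> bool:
    """True if `current` is at or past `target`, assuming they are less than 2**31 apart."""
    return ((current - target) & MASK32) < HALF_RANGE
```

For example, with the "to do" counter at 0x00000002 and the "end" counter at 0xFFFFFFFE, four instructions are outstanding even though the raw values appear out of order.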
u. ic_ipb
This register holds the most significant 30 bits of the virtual address used for stream B instruction fetch. The two least significant bits are assumed to be 0 as the instruction should be aligned. The reset value of this register is 0x00000000.
v. ic_tdb
This register holds the “to do” value for stream B. This is the 32-bit (wrapping) sequence number up to which valid instructions exist. The reset value of this register is 0x00000000.
w. ic_fnb
This register holds the “end” value for stream B. This indicates the last completed instruction with a 32-bit (wrapping) sequence number. The reset value of this register is 0x00000000.
x. ic_intb
This register holds the “interrupt” number for stream B. This is the 32-bit (wrapping) sequence number at which to interrupt when the mechanism is enabled and armed. The reset value of this register is 0x00000000.
y. ic_lob
This register holds the 32-bit (wrapping) sequence number of the last duplicate instruction executed in stream B. The reset value of this register is 0x00000000.
z. ic_sema
This register is an alias of the ic_stat register with a side effect: reading this register also requests the stream A register semaphore.
aa. ic_semb
This register is an alias of the ic_stat register with a side effect: reading this register also requests the stream B register semaphore.
Input interface register
ab. iis_cfg
ac. iis_stat
ad. iis_err_int
ae. iis_err_int_en
af. iis_ic_addr
ag. iis_dcc_addr
ah. iis_po_addr
ai. iis_burst
aj. iis_base_addr
ak. iis_test
External interface controller register
al. eic_cfg
am. eic_stat
an. eic_err_int
The error and interrupt bits in the eic_err_int register can be set only by the EIC and can be reset only by software. Normal error and interrupt bits are reset by writing a 1 to the bit. An error bit that is a copy of a PCI configuration register bit must be cleared by writing to the PCI configuration register; writing to the copy in eic_err_int has no effect.
ao. eic_err_int_en
ap. eic_test
aq. eic_pob
ar. eic_high_addr
as. eic_wtlb_v
at. eic_wtlb_p
au. eic_mmu_v
Note: The value of this register can be changed at any time if the MMU is not invalid due to a page fault error or an MMU to PCI bus error.
av. eic_mmu_p
Note: The value of this register can be changed at any time if the MMU is not invalid due to a page fault error or an MMU to PCI bus error.
aw. eic_ip_addr
Note: The value of this register can be changed at any time if the IBD is not invalid due to an error from the Ibus to the PCI bus.
ax. eic_rp_addr
Note: The value of this register can be changed at any time if the RBR is not invalid due to an error from the RBus to the PCI bus.
ay. eic_ig_addr
Note: The value of this register can be changed at any time if the GBC is not invalid due to a universal bus error.
az. eic_rg_addr
Note: The value of this register can be changed at any time if the GBC is not invalid due to a universal bus error.
PCI bus configuration space alias
The PCI bus configuration space consisting of 16 words is aliased to a register indicated by an address from 0xc0 to 0xcf.
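The aliasing of the 16 configuration words onto register numbers 0xc0 to 0xcf can be sketched as a simple offset calculation. This assumes that the words are aliased in ascending order, one 32-bit word per register number, which the text implies but does not state outright; `config_alias_register` is a hypothetical helper name.

```python
CONFIG_ALIAS_BASE = 0xC0  # first aliased register number
CONFIG_WORDS = 16         # sixteen 32-bit words, i.e. byte offsets 0x00 to 0x3f


def config_alias_register(byte_offset: int) -> int:
    """Map a word-aligned PCI configuration byte offset to its aliased register number."""
    if byte_offset % 4 != 0:
        raise ValueError("offset must be aligned to a 32-bit word")
    word = byte_offset // 4
    if not 0 <= word < CONFIG_WORDS:
        raise ValueError("offset lies outside the aliased configuration words")
    return CONFIG_ALIAS_BASE + word
```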
Local memory controller register
ba. lmi_cfg
This register contains a number of configuration and control bits that determine the processing mode and parameters of the LMC. Bits that refer specifically to SDRAM processing have no effect when the sdram_1 pin is high. The reset value of this register is 0x20000100, which gives a refresh interval of 3.2 microseconds when the frequency of clkin is 80 MHz. All special modes and functions are disabled at power-on, and all access rights are set to zero. Refresh is enabled at reset, but the module is otherwise disabled (E = 0). Refresh is not affected by the E bit.
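The quoted refresh interval is consistent with a refresh count of 256 clkin cycles, since 256 / 80 MHz = 3.2 microseconds, and 256 is 0x100, the low-order part of the reset value. Which bits of the register actually hold the refresh count is not stated here, so the sketch below takes the count as a plain parameter; the function names are hypothetical.

```python
def refresh_interval_us(refresh_clocks: int, clkin_hz: float) -> float:
    """Refresh interval in microseconds for a given count of clkin cycles."""
    return refresh_clocks / clkin_hz * 1e6


def refresh_clocks_for(interval_us: float, clkin_hz: float) -> int:
    """Nearest whole number of clkin cycles giving the requested refresh interval."""
    return round(interval_us * 1e-6 * clkin_hz)
```

The same arithmetic can be used to work out a suitable count for a different clkin frequency.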
bb. lmi_stat
The status register consists of the module active and pending bits as well as internal state machine information. The state machine is driven at twice the clock rate of the CBus interface, so two fields are required to hold the state information for each of the two latest 80 MHz clock cycles.
bc. lmi_err_int
The error and interrupt status registers hold interrupt, exception, and error status information. The register can be read and written, read returns status information, and writing a 1 to a specific bit resets that bit. Writing a zero has no effect on that bit.
This register must have a reset value of 0x00000000, which indicates that no interrupts or errors have occurred. The reserved bit is always 0 and can never change state.
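The write-one-to-clear behavior described for lmi_err_int can be modeled in a few lines. This is a behavioral sketch only: the class name `W1CRegister` and the `implemented_mask` parameter (used here to keep reserved bits at zero) are illustrative and do not come from the text.

```python
MASK32 = 0xFFFFFFFF


class W1CRegister:
    """Status register whose bits are set internally and cleared by writing 1."""

    def __init__(self, implemented_mask: int = MASK32):
        self.implemented_mask = implemented_mask & MASK32
        self.value = 0  # reset value: no interrupts or errors recorded

    def set_internal(self, bits: int) -> None:
        """Hardware side: record events; reserved bits stay at 0."""
        self.value |= bits & self.implemented_mask

    def write(self, data: int) -> None:
        """Software side: each 1 written clears that bit, each 0 leaves it unchanged."""
        self.value &= ~data & MASK32

    def read(self) -> int:
        return self.value
```

A practical consequence of this scheme is that software can acknowledge exactly the bits it has handled by writing back the value it read, without disturbing events that arrived in the meantime.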
bd. lmi_err_int_en register
The error / exception / interrupt valid register is used to select whether the error / exception interrupt signal is valid or invalid. Registers can be read and written. This register is used to enable bit by bit based on errors, exceptions and interrupts in the lmi_err_int register. There is a one-to-one correspondence between the bits in this register and the bits in the lmi_err_int register. If a particular bit in the lmi_err_int_en register goes high, the corresponding bit in the lmi_err_int register becomes valid, and if it is high, an LMC module error, exception or interrupt signal, c_err, c_exp, or c_int can be generated. If a particular bit in the lmi_err_int_en register is cleared, the corresponding bit in the lmi_err_int register becomes invalid and c_err, c_exp, or c_int cannot be generated. Since there is no exception in LMC, the exp_mask bit in this register has no effect and is all reserved. The reset value of this register is 0x00000000, which disables all error and interrupt sources. Unused bits are always 0 and cannot be set high.
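The one-to-one masking between lmi_err_int_en and lmi_err_int can be sketched as follows. The split into error, exception and interrupt fields uses the [7:0], [23:8] and [31:24] layout given earlier for module interrupt / error registers; applying that layout to the LMC registers is an assumption, and `module_signals` is a hypothetical name.

```python
ERR_MASK = 0x000000FF  # bits [7:0]: error conditions
EXP_MASK = 0x00FFFF00  # bits [23:8]: exception conditions
INT_MASK = 0xFF000000  # bits [31:24]: interrupt conditions


def module_signals(err_int: int, err_int_en: int) -> dict:
    """Derive c_err, c_exp and c_int from the status bits that are also enabled."""
    active = err_int & err_int_en & 0xFFFFFFFF
    return {
        "c_err": bool(active & ERR_MASK),
        "c_exp": bool(active & EXP_MASK),
        "c_int": bool(active & INT_MASK),
    }
```

Because the LMC has no exceptions, `c_exp` would never be asserted for this module; the field is included only to show the general pattern.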
be. lmi_dcfg
This configuration register holds design parameters that determine the size and configuration when a DRAM chip is used. This register holds a reset value 0x0007ff80 that maximizes all timing limit values.
bf. lmi_mode register
This configuration register holds information written to the SDRAM mode register as part of the initialization process. This register is always readable and writable and may be written to the SDRAM by setting the initialization bit. This register has a reset value of 0x0037. This useful default value is required immediately after power-up precharge or immediately after level 1 reset. This sets the read delay to 3 clocks and sets the burst length to a full page using sequential wrap. After any reset, if the sdram_1 pin is low, the initialization bit is set to initially program the SDRAM mode register. After writing the mode register, this bit is automatically cleared to zero.
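The reset value 0x0037 can be checked against the settings named above. The sketch assumes the conventional SDR SDRAM mode register layout (burst length in bits [2:0], burst type in bit 3, CAS latency in bits [6:4]); the text does not spell out the bit positions, so this layout and the name `decode_sdram_mode` are assumptions.

```python
BURST_LENGTHS = {0b000: "1", 0b001: "2", 0b010: "4", 0b011: "8", 0b111: "full page"}


def decode_sdram_mode(mode: int) -> dict:
    """Decode a mode register value under the assumed conventional field layout."""
    return {
        "burst_length": BURST_LENGTHS.get(mode & 0x7, "reserved"),
        "burst_type": "interleaved" if mode & 0x8 else "sequential",
        "read_delay_clocks": (mode >> 4) & 0x7,
    }
```

Under that layout, 0x0037 decodes to a full-page sequential burst with a read delay of 3 clocks, matching the description.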
Peripheral interface register
bg. pic_cfg register
bh. pic_stat
bi. pic_err_int
The error and interrupt bits in the pic_err_int register are set only by the PIC and reset only by software. Each bit is reset by writing a 1 to it.
bj. pic_err_int_en
bk. pic_abus_cfg
bl. pic_abus_addr
bm. pic_cent_cfg
The pic_cent_cfg register contains read / write signals and read-only status signals that control all interface aspects when the Centronics mode is enabled.
bn. pic_cent_dir
bo. pic_reverse_cfg
bp. pic_timer0
bq. pic_timer1
Data cache controller register
br. dcc_cfg1
bs. dcc_cfg2
bt. dcc_stat
bu. dcc_err_int
bv. dcc_err_int_en
bw. dcc_lv0
bx. dcc_lv1
by. dcc_lv2
bz. dcc_lv3
ca. dcc_addr
cb. dcc_raddrb
cc. dcc_raddrc
cd. dcc_test
Operand organizer register
There are two similar operand organizers: operand organizer B and operand organizer C. The registers for both operand organizers are described here.
ce. oon_cfg (oob_cfg = 0x70, ooc_cfg = 0x80)
cf. oon_stat (oob_stat = 0x71, ooc_stat = 0x81)
cg. oon_err_int (oob_err_int = 0x72, ooc_err_int = 0x82)
ch. oon_err_int_en (oob_err_int_en = 0x73, ooc_err_int_en = 0x83)
ci. oon_dmr (oob_dmr = 0x74, ooc_dmr = 0x84)
cj. oon_subst (oob_subst = 0x75, ooc_subst = 0x85)
ck. oon_cdp (oob_cdp = 0x76, ooc_cdp = 0x86)
cl. oon_len (oob_len = 0x77, ooc_len = 0x8
cm. oon_said (oob_said = 0x78, ooc_said = 0x88)
cn. oon_tile (oob_tile = 0x79, ooc_tile = 0x89)
Pixel organizer register
co. po_cfg
cp. po_stat
cq. po_err_int
cr. po_err_int_en
cs. po_dmr
ct. po_subst
cu. po_cdp
cv. po_len
cw. po_said
cx. po_idr
cy. po_muv_valid
cz. po_muv
Main data path register
da. mdp_cfg
All bits are reset to zero.
db. mdp_stat
All bits are reset to zero.
dc. mdp_err_int
All bits are reset to zero.
dd. mdp_err_int_en
All bits are reset to zero.
de. mdp_test
All bits are reset to zero.
df mdp_op1
All bits are reset to zero.
dg mdp_op2
All bits are reset to zero.
dh mdp_port
All bits are reset to zero.
di mdp_bi
All bits are reset to zero. The mdp_bi register is used for different purposes in different modes.
dj mdp_bm
All bits are reset to zero. The mdp_bm register is used for different purposes in different modes.
dk mdp_len
All bits are reset to zero.
JPEG encoder register
dl jc_cfg
dm jc_stat
dn jc_err_int
do jc_err_int_en
dp jc_rsi
dq jc_decode
dr jc_res
ds jc_table_sel
Results organizer register
dt ro_cfg
du ro_stat
dv ro_err_int
dw ro_err_int_en
dx ro_dmr
dy ro_subst
dz ro_cdp
ea ro_len
eb ro_sa
ec ro_idr
ed ro_vbase
ee ro_cut
ef ro_lmt
PCI configuration space alias
The PCI configuration space is a 256-byte PCI-defined block of registers that allows the host to configure PCI devices and read their status. It is accessed using PCI configuration cycles. The registers are also mirrored in a read-only area of the coprocessor's internal memory and can therefore be read using normal PCI memory cycles. The format of the configuration space implemented in the EIC is shown in Table 1.141.1.
Table 1.141.1 Spatial layout of coprocessor PCI configuration
The reserved bits in the reserved register and the implemented register return 0 for reading and are not affected by writing. Configuration space addresses in the range 0x40-0xff are also reserved. Vendor-specific configuration registers are not defined.
eg Vendor ID
This register is read-only. The vendor ID of CISRA is 0x11AC.
eh Device ID
This register is read-only. The device ID of the coprocessor is 0x0001. The device ID field is divided into two 8-bit fields: the most significant 8 bits are a number indicating the device type (0x0 is a coprocessor), and the least significant 8 bits are the version number of the device (0x1 indicates the coprocessor version).
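Splitting the device ID into its two 8-bit fields is a matter of a shift and a mask. A minimal sketch, with the hypothetical helper name `decode_device_id`:

```python
def decode_device_id(device_id: int) -> dict:
    """Split the 16-bit device ID into its device-type and version fields."""
    return {
        "device_type": (device_id >> 8) & 0xFF,  # most significant 8 bits
        "version": device_id & 0xFF,             # least significant 8 bits
    }
```

For the value 0x0001 this gives a device type of 0x0 and a version of 0x1.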
ei command register
Table 1.142 shows the definition of the command register field. All unreserved bits in this register can be read / written. After reset, this register is set to 0x0000.
ej Status register
Table 1.143 shows the definition of the status register fields. This register can be read normally. Some bits in this register are read-only. The other bits are set to 1 only by the coprocessor and reset to 0 only by the host (except in test mode). The host resets a bit by writing a 1 to it; writing a 0 has no effect. After reset, this register is set to 0x0280.
ek revision ID
This is a read-only register. The initial revision ID of the coprocessor is 0x01.
el class code
This is a read-only register. This register is set to 0xFF0000 because the coprocessor does not fit any class code defined by the PCI SIG.
em cache line size
This is a read / write register that determines the cache line size of the system in units of 32 bit words. This is determined when the coprocessor uses a memory read line or a memory multiple read command. The coprocessor supports values from 0 to 255 in this register. A 0 in this register disables the memory read line and memory multiple read formats. This register is set to 0x00 at reset.
en Delay timer
This register is a readable / writable register that specifies the maximum number of clocks used by the CPU for all PCI processing. The coprocessor supports values from 0 to 255 in this register. This register is set to 0x00 at reset.
eo header type
This read-only register is set to 0x00. This means that the coprocessor uses a configuration space of type 0 layout.
ep base address
This readable / writable register is used to place the coprocessor's internal registers, internal memory, local memory, and general purpose interface in the host's memory map. The various resources of the coprocessor occupy 64 MB (not all of which is used), so only the most significant 6 bits of this register can be written. All remaining bits are hard wired to 0. The lower 4 bits of this register are read-only control bits, which are also wired to 0. These indicate that the register refers to memory space, that the coprocessor can be mapped anywhere in the host's 32-bit address space, and that coprocessor resources cannot be prefetched when they are the target.
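The relationship between the 6 writable bits and the 64 MB window can be checked with the usual PCI base address sizing procedure (write all ones, read back, invert the address bits and add one). The sketch below models the register behaviorally; the class and function names are hypothetical, and the sizing procedure is the standard PCI one rather than something stated in this text.

```python
MASK32 = 0xFFFFFFFF
WRITABLE_BITS = 0xFC000000  # most significant 6 bits; everything else is wired to 0


class BaseAddressRegister:
    """Behavioral model of a base address register with 6 writable upper bits."""

    def __init__(self):
        self.value = 0

    def write(self, data: int) -> None:
        self.value = data & WRITABLE_BITS

    def read(self) -> int:
        return self.value


def probe_size(bar: BaseAddressRegister) -> int:
    """Standard sizing: write all ones, read back, then restore the old value."""
    saved = bar.read()
    bar.write(MASK32)
    readback = bar.read()
    bar.write(saved)
    address_bits = readback & ~0xF & MASK32  # drop the 4 low control bits
    return (~address_bits & MASK32) + 1
```

With 26 address bits wired to zero, the probe returns 2**26 bytes, which is the 64 MB quoted above.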
eq Subsystem vendor ID
This read-only register allows the host to identify the vendor of the PCI board installed in the system (as opposed to the vendor of the PCI interface component on the board). The contents of this register are loaded using the EIC configuration serial port at reset.
er subsystem ID
This read-only register allows the host to identify the PCI board installed in the system. The contents of this register are loaded using the EIC configuration serial port at reset. This mechanism allows external encoding and reading from the host of information required for board function or configuration.
es interrupt line
This read / write register is used to allow the system software to record interrupt line routing information and can be accessed by the interrupt service software. It has no effect on the processing in the coprocessor. This register is set to 0x00 at reset.
et interrupt pin
This read-only register is wired to 0x01 in hardware. This indicates that the coprocessor uses the PCI inta_1 interrupt pin.
eu Min_Gnt
This read-only register indicates to the host the burst period length in 1/4 microsecond units required by the coprocessor. The optimal value for this register has not yet been determined.
ev Max_Lat
This read-only register indicates to the host, in 1/4 microsecond units, the maximum latency the coprocessor can tolerate in gaining control of the PCI bus. The optimal value for this register has not yet been determined.
1.1.4 Internal memory map
This section details the objects that occur in the per-module data area in the coprocessor's internal memory map.
1.1.5 Memory word field
a eic_ptp

Diagram showing the operation of a raster image coprocessor in a host computer environment
Diagram showing the raster image coprocessor of FIG. 1 in more detail
Diagram showing the memory map of the raster image coprocessor
Diagram showing the relationship between the CPU, instruction queue, instruction operands, results in shared memory, and the coprocessor
Diagram showing the relationship between the instruction generator, memory manager, queue manager, and coprocessor
Diagram showing the operation of the graphics coprocessor, which reads instructions from the pending instruction queue and places them in the finished instruction queue
Diagram showing a fixed-length circular buffer implementation of the instruction queue and explaining the need to wait until the buffer overflows
Diagram showing the instruction execution streams used in the coprocessor
Instruction execution flowchart
Diagram showing the standard instruction word format used in the coprocessor
Diagram showing the instruction word fields of a standard instruction
Diagram showing the data word fields of a standard instruction
Diagram schematically showing the instruction controller of FIG. 2
Diagram showing the execution controller of FIG. 13 in more detail
State transition diagram of the instruction controller
Diagram showing the instruction decoder of FIG. 13
Diagram showing the instruction sequencer of FIG. 16 in more detail
State transition diagram of the ID sequencer of FIG. 16
Diagram showing the prefetch buffer controller of FIG. 13 in more detail
Diagram showing the standard format for register storage and inter-module relationships used in the coprocessor
Diagram showing the format of control bus transactions used in the coprocessor
Diagram showing the data flow within a part of the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing various data reformatting examples used in the coprocessor
Diagram showing format conversion performed in the coprocessor
Diagram showing format conversion performed in the coprocessor
Diagram showing the input data conversion process performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various data conversions performed in the coprocessor
Diagram showing various internal-to-output data conversions performed in the coprocessor
Diagram showing various data conversion examples performed in the coprocessor
Diagram showing various data conversion examples performed in the coprocessor
Diagram showing various data conversion examples performed in the coprocessor
Diagram showing various data conversion examples performed in the coprocessor
Diagram showing various data conversion examples performed in the coprocessor
Diagram showing the various fields used in the internal registers that determine which data conversion should be used
Block diagram of a graphics subsystem using data normalization
Circuit diagram of the data normalizer
Diagram showing the pixel processing performed in a compositing operation
Diagram showing the instruction word format for compositing operations
Diagram showing the data word format for compositing operations
Diagram showing the instruction word format for tile processing
Diagram showing the operation of a tile instruction on an image
Diagram showing the use of the color interval / intra-interval position table for remapping color values
Diagram showing the storage format of the interval / intra-interval position table in the MUV buffer of the coprocessor
Diagram showing the color conversion process using interpolation performed in the coprocessor
Diagram showing the improved color conversion process at edges performed in the coprocessor
Diagram showing the color space conversion process for one output color performed in the coprocessor
Diagram showing memory storage in the coprocessor cache when single color output color space conversion is used
Diagram showing the technique used in multiple color space conversion
Diagram showing the address remapping process for the cache used in multiple color space conversion
Diagram showing the instruction word format of the color space conversion instruction
Diagram showing the multiple color conversion method
Diagram explaining the generation of MCUs in the JPEG conversion process performed by the coprocessor
Diagram explaining the generation of MCUs in the JPEG conversion process performed by the coprocessor
Diagram showing the structure of the JPEG encoder of the coprocessor
Diagram showing the quantization unit of FIG. 68 in more detail
Diagram showing the Huffman encoder of FIG. 68 in more detail
Diagram showing the Huffman encoder and decoder
Diagram showing the Huffman encoder and decoder
Diagram explaining the deletion / restriction processing of JPEG data used in the coprocessor
Diagram explaining the deletion / restriction processing of JPEG data used in the coprocessor
Diagram explaining the deletion / restriction processing of JPEG data used in the coprocessor
Diagram showing the instruction word format of the JPEG instruction
Block diagram of a general discrete cosine transform apparatus (conventional example)
Diagram showing the arithmetic data path of the conventional DCT apparatus
Block diagram of the DCT apparatus used in the coprocessor
Block diagram showing the arithmetic circuit of FIG. 79 in more detail
Diagram showing the arithmetic data path of the DCT apparatus of FIG. 79
Diagram showing typical Huffman-encoded data interleaved with unencoded bit fields (both byte-aligned and not byte-aligned), as in the JPEG format
Diagram showing the overall structure of the Huffman decoder for JPEG data of FIG. 84 in more detail
Diagram showing the overall structure of the Huffman decoder for JPEG data
Diagram showing the data processing in the stripper block, which removes byte-aligned unencoded bit fields from the input data, together with examples of the tag codes corresponding to the data output from the stripper
Diagram showing the configuration and data flow of the data preshifter
Diagram showing the control logic of the decoder of FIG. 81
Diagram showing the configuration and data flow of the marker preshifter
Block diagram of a combinational circuit for decoding Huffman code values in JPEG encoding
Block diagram showing the concept of the padding area and the decoder for padding bits
Diagram showing an example of the data format output from the decoder and used in the coprocessor
Diagram showing the technique used in the image conversion instruction
Diagram showing the instruction word format of the image conversion instruction
Diagram showing the format of the image conversion kernel used in the coprocessor
Diagram showing the format of the image conversion kernel used in the coprocessor
Diagram showing the use of the index table for image conversion in the coprocessor
Diagram showing the data field format for instructions used in conversion and convolution
Explanatory diagram of the bp field of the instruction word
Diagram showing the convolution process used in the coprocessor
Diagram showing the instruction word format of the convolution instruction used in the coprocessor
convolution instruction used in coprocessor コプロセッサで用いられる行列乗算の命令ワードフォーマット図、Instruction word format diagram of matrix multiplication used in coprocessor, コプロセッサで用いられる階層的画像操作処理を示す図Diagram showing hierarchical image manipulation processing used in the coprocessor コプロセッサで用いられる階層的画像操作処理を示す図Diagram showing hierarchical image manipulation processing used in the coprocessor コプロセッサで用いられる階層的画像操作処理を示す図Diagram showing hierarchical image manipulation processing used in the coprocessor コプロセッサで用いられる階層的画像操作処理を示す図Diagram showing hierarchical image manipulation processing used in the coprocessor 階層的画像命令での命令ワード符号を示す図Diagram showing instruction word codes in hierarchical image instructions コプロセッサで用いられるフロー制御命令の命令ワード符号を示す図The figure which shows the instruction word code of the flow control instruction which is used with the coprocessor ピクセルオーガナイザをより詳細に示す図Diagram showing the pixel organizer in more detail ピクセルオーガナイザにおけるオペランドフェッチ部をより詳細に示す図The figure which shows the operand fetch section in the pixel organizer in more detail コプロセッサで用いられる種々の格納フォーマットを示す図Diagram showing various storage formats used in coprocessors コプロセッサで用いられる種々の格納フォーマットを示す図Diagram showing various storage formats used in coprocessors コプロセッサで用いられる種々の格納フォーマットを示す図Diagram showing various storage formats used in coprocessors コプロセッサで用いられる種々の格納フォーマットを示す図Diagram showing various storage formats used in coprocessors コプロセッサで用いられる種々の格納フォーマットを示す図Diagram showing various storage formats used in coprocessors コプロセッサのピクセルオーガナイザにおけるＭＵＶアドレス生成部をより詳細に示す図The figure which shows the MUV address production | generation part in the pixel organizer of a coprocessor in detail. コプロセッサで用いられる多重値（ＭＵＶ）バッファのブロック図Block diagram of a multi-value (MUV) buffer used in a coprocessor 図１１６の符号化器の構造を示す図116 shows the structure of the encoder in FIG. 図１１６の復号器の構造を示す図116 shows the structure of the decoder in FIG. ＪＰＥＧモード（ピクセル分解）において読み出しアドレスを生成する図１１６のアドレス生成部の構造を示す図The figure which shows the structure of the address generation part of FIG. 
116 which produces | generates a read address in JPEG mode (pixel decomposition | disassembly). ＪＰＥＧモード（ピクセル復元）において読み出しアドレスを生成する図１１６のアドレス生成部の構造を示す図The figure which shows the structure of the address generation part of FIG. 116 which produces | generates a read address in JPEG mode (pixel restoration). 図１１６の記憶装置を備えるメモリモジュールの構成を示す図116 is a diagram showing a configuration of a memory module including the storage device of FIG. 読み出しアドレスをメモリモジュールに多重化する回路の構造を示す図The figure which shows the structure of the circuit which multiplexes the read address to the memory module 単一ルックアップテーブルモードで動作するバッファ内にルックアップテーブルエントリがどのように格納されるかを示す図Diagram showing how lookup table entries are stored in a buffer operating in single lookup table mode 多重ルックアップテーブルモードで動作するバッファ内にルックアップテーブルエントリがどのように格納されるかを示す図Diagram showing how lookup table entries are stored in a buffer operating in multiple lookup table mode ＪＰＥＧモード（ピクセル分解）で動作するバッファ内にピクセルがどのように格納されるかを示す図Diagram showing how pixels are stored in a buffer operating in JPEG mode (pixel decomposition) ＪＰＥＧモード（ピクセル復元）で動作するバッファから単一カラーがどのように格納されるかを示す図Diagram showing how a single color is stored from a buffer operating in JPEG mode (pixel restoration) コプロセッサの結果オーガナイザの構造をより詳細に示す図Diagram showing the structure of the coprocessor results organizer in more detail コプロセッサのオペランドオーガナイザの構造をより詳細に示す図Diagram showing the structure of the coprocessor operand organizer in more detail コプロセッサにおいて用いられる主データパス部のためのコンピュータアーキテクチャのブロック図Block diagram of computer architecture for main data path used in coprocessor 更なる処理のために入力データオブジェクトを受け取り、格納し、再配列するための入力インターフェースのブロック図Block diagram of the input interface for receiving, storing and rearranging input data objects for further processing 入力データオブジェクトに対して算術演算を実行するための画像データプロセッサのブロック図Block diagram of an image data processor for performing arithmetic operations on input data objects 入力データオブジェクトの１つのチャネルに対して算術演算を実行するためのカラーチャネルプロセッサのブロック図A block diagram of a color channel processor for performing arithmetic operations on one channel of an input 
data object. カラーチャネルプロセッサにおける多機能ブロックのブロック図Block diagram of multi-function block in color channel processor 合成動作のためのブロック図Block diagram for compositing operation スキャンラインの逆変換を示す図Diagram showing inverse scanline transformation 指定されたピクセルにおける値を計算するために必要なステップのブロック図A block diagram of the steps necessary to calculate the value at a specified pixel 画像変換エンジンのブロック図Block diagram of image conversion engine カーネルデスクリップションにおける２つのフォーマットを示す図Diagram showing two formats for kernel description ｂｐフィールドの定義と解釈を示す図Diagram showing the definition and interpretation of the bp field 行列乗算を実行する乗算・加算部のブロック図Block diagram of multiplication / addition unit that performs matrix multiplication コプロセッサでのキャッシュ及びキャッシュ制御部における制御、アドレス及びデータフローを示す図Diagram showing control, address and data flow in cache and cache controller in coprocessor キャッシュのメモリ構成を示す図Diagram showing cache memory configuration コプロセッサにおけるキャッシュ制御部のためのアドレスフォーマットを示す図Diagram showing the address format for the cache controller in the coprocessor 、, カラーチャネルプロセッサにおける多機能ブロックのブロック図Block diagram of multi-function block in color channel processor 図１４４のキャッシュ及びキャッシュコントローラのコプロセッサ入力インターフェーススイッチを示す図144 shows the coprocessor input interface switch of the cache and cache controller of FIG. 主アドレス及びデータパスを示すコプロセッサの４ポートダイナミックローカルメモリ制御部を示す図The figure which shows the 4 port dynamic local memory control part of the coprocessor which shows the main address and the data path 図１４６の制御部における状態機構図146 is a state mechanism diagram of the control unit of FIG. 図１４６の仲裁部における機能を詳細にリストした擬似コードを示す図The figure which shows the pseudo code which listed the function in the arbitration department of Figure 146 in detail 図１４６において用いられる要求者プライオリティビットの構造および用語を示す図146 is a diagram showing the structure and terminology of requester priority bits used in FIG. 
コプロセッサにおける外部インターフェース制御部をより詳細に示す図Diagram showing the external interface controller in the coprocessor in more detail コプロセッサで用いられる物理アドレスへのマッピング処理又は物理アドレスからのマッピング処理を示す図The figure which shows the mapping process to the physical address used by the coprocessor or the mapping process from the physical address コプロセッサで用いられる物理アドレスへのマッピング処理又は物理アドレスからのマッピング処理を示す図The figure which shows the mapping process to the physical address used by the coprocessor or the mapping process from the physical address コプロセッサで用いられる物理アドレスへのマッピング処理又は物理アドレスからのマッピング処理を示す図The figure which shows the mapping process to the physical address used by the coprocessor or the mapping process from the physical address コプロセッサで用いられる物理アドレスへのマッピング処理又は物理アドレスからのマッピング処理を示す図The figure which shows the mapping process to the physical address used by the coprocessor or the mapping process from the physical address 、, 図１５０におけるＩＢｕｓ受信部をより詳細に示す図The figure which shows the IBus receiving part in FIG. 150 in detail 、, 図２におけるＲＢｕｓ受信部をより詳細に示す図The figure which shows the RBus receiving part in FIG. 2 in detail 、, 図１５０におけるメモリ管理部をより詳細に示す図The figure which shows the memory management part in FIG. 150 in detail 図２における周辺インターフェース制御部をより詳細に示す図The figure which shows the peripheral interface control part in FIG. 2 in detail

Claims

A decoding device for decoding a plurality of data blocks that are encoded with a plurality of variable-length codes interleaved with variable-length uncoded bit fields and that contain a plurality of fixed-length uncoded fields, the device comprising:
a preprocessing unit that removes the plurality of fixed-length uncoded fields and outputs the plurality of variable-length codes interleaved with the variable-length uncoded bit fields, together with a plurality of position signals indicating the positions of the fixed-length uncoded fields in the plurality of data blocks; and
means for sending the position signals to an external device in synchronization with the data being decoded, such that data decoded from variable-length coded data input between fixed-length uncoded fields is output from the decoding device between the position signals corresponding to those fixed-length uncoded fields.
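
The preprocessing recited in the claim above can be pictured in software. The following Python sketch is an illustration only, under assumptions that the claims do not state: it treats the fixed-length uncoded fields as JPEG-style stuffed bytes (0xFF 0x00) and two-byte markers (0xFF followed by a nonzero byte), and it models each position signal as an index into the stripped output. The function name `strip_fixed_length_fields` is hypothetical.

```python
from typing import List, Tuple


def strip_fixed_length_fields(stream: bytes) -> Tuple[bytes, List[int]]:
    """Remove stuffed bytes and markers; report where each marker sat.

    Assumed model (not taken from the claims): 0xFF 0x00 is a stuffed
    byte pair, and 0xFF followed by any other byte is a marker.
    """
    out = bytearray()
    marker_positions: List[int] = []  # stand-in for the position signals
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte == 0xFF and i + 1 < len(stream):
            if stream[i + 1] == 0x00:
                # Stuffed byte: keep the 0xFF data byte, drop the 0x00.
                out.append(0xFF)
            else:
                # Marker: drop both bytes and record the output index
                # at which the marker originally sat.
                marker_positions.append(len(out))
            i += 2
            continue
        out.append(byte)
        i += 1
    return bytes(out), marker_positions
```

Recording the marker position as an offset into the stripped data is one simple way to keep the position information aligned with the data that continues on to the decoder.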

The decoding device according to claim 1, further comprising:
a first processing unit that has a first barrel shifter set and a first register and that processes the plurality of variable-length codes interleaved with the variable-length uncoded bit fields; and
a second processing unit that has a second barrel shifter set and a second register and that processes the plurality of position signals indicating the positions of the plurality of fixed-length uncoded fields in the plurality of data blocks,
wherein the first and second processing units are identical, and the first and second barrel shifter sets and the outputs of the first and second processing units receive the same control signal.
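
The point of driving two identical units with the same control signal can be sketched as follows. This is a toy software model, not the claimed circuit: the class `PreShifter`, the function `consume`, and the use of Python lists in place of barrel shifters and registers are all assumptions made for illustration.

```python
class PreShifter:
    """Toy shift unit: a register whose contents are consumed from the front."""

    def __init__(self, items):
        self.register = list(items)

    def shift(self, amount):
        """Remove `amount` items from the front of the register and return them."""
        taken = self.register[:amount]
        self.register = self.register[amount:]
        return taken


def consume(data_unit, marker_unit, amount):
    """Apply one control value (the shift amount) to both units at once."""
    return data_unit.shift(amount), marker_unit.shift(amount)


# One unit holds data bits, the other holds a marker flag per bit position.
data_unit = PreShifter("10110")
marker_unit = PreShifter([0, 0, 0, 1, 0])
bits, flags = consume(data_unit, marker_unit, 3)
```

Because both units always shift by the same amount, the flag at any position in the marker unit still refers to the bit at the same position in the data unit after every step.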

The decoding device according to claim 2, wherein the output of the second processing unit, which processes the position signals indicating the positions of the fixed-length uncoded fields, is used to determine the size of a variable-length uncoded field that is removed from the data stored in the data register at the time of decoding.
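
One possible reading of the claim above, given as a hedged sketch: if a marker flag accompanies each bit position, then the distance from the front of the register to the first flagged position gives the number of bits (for example padding bits) to discard before the marker. The function name `padding_size` and the list-of-flags representation are assumptions, not details taken from the claims.

```python
from typing import Optional, Sequence


def padding_size(marker_flags: Sequence[int]) -> Optional[int]:
    """Return how many positions precede the first set marker flag.

    Returns None when no marker is visible in the current window.
    """
    for offset, flag in enumerate(marker_flags):
        if flag:
            return offset
    return None
```

A caller in this model would shift both the data and the flags by the returned amount, which discards the uncoded field and brings the marker position to the front.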

The decoding device according to claim 1, wherein the preprocessing unit removes the plurality of fixed-length uncoded fields and outputs the plurality of variable-length codes interleaved with the variable-length uncoded bit fields as a plurality of fixed-length codes, each consisting of a fixed-length bit field and a tag, the tag indicating whether the fixed-length bit field was passed or removed by the preprocessing unit, and the tag being output so that it precedes or follows a marker indicating a fixed-length uncoded field.

The decoding device according to claim 1, wherein the plurality of data blocks are Huffman encoded.
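
To make the phrase "variable-length codes interleaved with variable-length uncoded bit fields" concrete, the following Python sketch decodes a bit string in which each prefix code is followed by a raw field whose length the code itself announces, much as a JPEG Huffman symbol announces the number of additional bits that follow it. The table contents, the string-of-bits representation, and the name `decode_interleaved` are assumptions for illustration; a hardware decoder would not work character by character.

```python
from typing import Dict, List, Tuple


def decode_interleaved(bits: str, table: Dict[str, int]) -> List[Tuple[int, str]]:
    """Decode prefix codes, each followed by an uncoded field.

    `table` maps a prefix code (a string of '0'/'1') to the length in
    bits of the uncoded field that immediately follows that code.
    Returns one (field_length, raw_field_bits) pair per decoded code.
    """
    pos = 0
    code = ""
    out: List[Tuple[int, str]] = []
    while pos < len(bits):
        code += bits[pos]
        pos += 1
        if code in table:
            size = table[code]
            if pos + size > len(bits):
                raise ValueError("uncoded field runs past the end of the input")
            # The raw field is copied through without decoding.
            out.append((size, bits[pos:pos + size]))
            pos += size
            code = ""
    if code:
        raise ValueError("trailing bits do not form a complete code")
    return out
```

For example, with the assumed table `{"0": 0, "10": 2, "11": 3}`, the input `"1001011101"` splits into the code `10` with raw field `01`, the code `0` with an empty field, and the code `11` with raw field `101`.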

A decoding method for decoding a plurality of data blocks that are encoded with a plurality of variable-length codes interleaved with variable-length uncoded bit fields and that contain a plurality of fixed-length uncoded fields, the method comprising:
a preprocessing step of removing the plurality of fixed-length uncoded fields and outputting the plurality of variable-length codes interleaved with the variable-length uncoded bit fields, together with a plurality of position signals indicating the positions of the fixed-length uncoded fields in the plurality of data blocks; and
a step of sending the position signals to an external device in synchronization with the data being decoded, such that data decoded from variable-length coded data input between fixed-length uncoded fields is output from a decoding device between the position signals corresponding to those fixed-length uncoded fields.

The decoding method according to claim 6, further comprising the steps of:
processing the plurality of variable-length codes interleaved with the variable-length uncoded bit fields using a first processing unit having a first barrel shifter set and a first register; and
processing the plurality of position signals indicating the positions of the plurality of fixed-length uncoded fields in the plurality of data blocks using a second processing unit having a second barrel shifter set and a second register,
wherein the first and second processing units are identical, and the first and second barrel shifter sets and the outputs of the first and second processing units receive the same control signal.

The decoding method according to claim 7, further comprising a step of determining, in accordance with the output of the second processing unit that processes the position signals indicating the positions of the fixed-length uncoded fields, the size of a variable-length uncoded field to be removed from the data stored in the data register at the time of decoding.

The decoding method according to claim 6, wherein, in the preprocessing step, the plurality of fixed-length uncoded fields are removed and the plurality of variable-length codes interleaved with the variable-length uncoded bit fields are output as a plurality of fixed-length codes, each consisting of a fixed-length bit field and a tag, the tag indicating whether the fixed-length bit field was passed or removed in the preprocessing step, and the tag being output so that it precedes or follows a marker indicating a fixed-length uncoded field.

The decoding method according to claim 6, wherein the plurality of data blocks are Huffman encoded.

A storage medium storing computer-executable program code for performing the method according to any one of claims 6 to 10.