JPH11353305A - Address specification for vector register

Publication number: JPH11353305A
Application number: JP11147256A
Authority: JP
Inventors: Neal Hinds Christopher; Paul Ellwood Matthew
Original assignee: ARM Ltd; Advanced Risc Machines Ltd
Current assignee: ARM Ltd
Priority date: 1998-05-27
Filing date: 1999-05-26
Publication date: 1999-12-24
Also published as: GB2338094B; GB2338094A; GB9905296D0

Abstract

PROBLEM TO BE SOLVED: To avoid increased overhead in cost and complexity, and the loss of efficiency caused by performing individual load and store operations. SOLUTION: Data values are transferred between a memory and registers in a register bank 38 using contiguous block memory access instructions. A vector processing instruction specifies a sequence of processing operations to be performed on a sequence of data values stored in the registers. Between each operation, the register address is incremented by an amount controlled by a stride value, so that the register address can be incremented by 0, 1, 2 or 4 on each iteration. This provides a mechanism that retains block memory access instructions to contiguous memory addresses while supporting vector operations in which the data values required for successive iterations are not adjacent to one another in memory.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to the field of data processing. More particularly, it relates to data processing systems having vector data processing registers.

[0002]

[Description of the Related Art] Data processing instructions typically include an operation code (opcode) portion and one or more register specifying fields. In some systems, a register may be treated as a vector register or as a scalar register. A vector register specifies a sequence of registers, each storing its own data value; the data values are operated on separately as the data processing instruction repeats its operation on each data value in the sequence. Conversely, a scalar register is a single register storing a single value that operates independently of other registers.

[0003] Data processing instructions that use vector registers have a number of advantages over purely scalar operations. The required instruction bandwidth can be reduced, because only a single data processing instruction is needed to specify that multiple similar data processing operations be performed (as is common in DSP (digital signal processing) functions such as FIR filters). In a single-issue machine (i.e., one in which one instruction is fetched and decoded per cycle), which is desirable for its simplicity, higher performance can be achieved with multiple functional units that execute in parallel on different vector instructions.

[0004] Figures 16 and 17 of the accompanying drawings show, respectively, the register bank of the Cray 1 processor and the register bank of the Digital Equipment Corporation MultiTitan processor. Both of these prior art processors provide vector registers and scalar registers.

[0005] In the Cray 1, a separate vector register bank 10 and scalar register bank 12 are provided. The 16-bit instructions provide individual opcodes corresponding to the different combinations of the registers specified in the instruction being treated as vectors or scalars. The 3-bit register specifying fields R1, R2 and R3 allow eight scalar registers and eight vector registers to be addressed. In practice, each vector register comprises a number of registers, each able to store a different data value, which can be accessed in dependence upon a vector length value stored in a length register 16 and mask bits stored in a mask register 18.

[0006] The Cray 1 processor also uses a scatter-gather mechanism that allows loads and stores to non-adjacent addresses in memory, so that interleaved matrices and/or complex values stored in main memory can be loaded into the appropriate registers for subsequent manipulation.

[0007] The Cray 1 architecture offers great flexibility in the operations it can support, but has the disadvantage of requiring considerable overhead in cost and complexity.

[0008] In the MultiTitan processor, a single register bank 20 is provided in which each register can operate as a scalar or as part of a vector register. The MultiTitan processor uses 32-bit instructions to specify its data processing operations. The instructions include fields VS2 and VS3, which specify whether registers are vectors or scalars, and the vector length (Len). Providing the vector length within the instruction itself makes it difficult to change the vector length globally without resorting to self-modifying code. Moreover, if the data on which the processor is operating is an interleaved matrix and/or complex data, each data item to be manipulated by a vector instruction must be loaded and stored by an individual load or store operation, which is inefficient.

[0009]

[Problems to be Solved by the Invention] It is an object of the present invention to address at least some of the limitations of the systems described above.

[0010]

[Means for Solving the Problems] Viewed from one aspect, the present invention provides a data processing apparatus comprising: a register bank having a plurality of registers for holding data values to be manipulated, each of said registers having a register address; a memory access circuit, responsive to at least one block memory access instruction, for performing memory accesses between memory locations at a plurality of consecutive addresses in a memory and registers at a plurality of consecutive addresses in said register bank; and an instruction decoder, responsive to at least one vector processing instruction, for executing a data processing operation a plurality of times in succession upon operands stored in a predetermined sequence of said registers; wherein, between each execution of said data processing operation, said instruction decoder is responsive to a stride value to increment the register address of a register storing an operand used in said data processing operation by an amount specified by said stride value.

[0011] The invention allows a simple implementation of contiguous block memory access instructions to be used, without the cost and complexity of a scatter-gather mechanism. The increment applied to the register address on each iteration (which may vary between vectors) lets the system step through non-adjacent data values and select the appropriate matrix, real or imaginary values to be processed. In practice, restricting the increment between register addresses to a fixed value within the execution of a vector instruction has been found not to be a significant constraint, because many important real-world tasks fit this behavior. In short, the system can use efficient block memory access instructions to memory while retaining the power of vector processing, in which a single instruction specifies a sequence of operations, for data arranged in a regular format.
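The idea of stepping through non-adjacent values after a contiguous block load can be sketched as follows. This is only an illustration under stated assumptions: the interleaved real/imaginary layout, the sample data and the helper name vector_operand_addresses are hypothetical, and wrapping within register subsets is ignored.

```python
def vector_operand_addresses(start, length, stride):
    """Register addresses visited by one vector operand (no wrapping modeled)."""
    return [start + i * stride for i in range(length)]

# Hypothetical memory image: complex numbers stored interleaved as re, im, re, im, ...
memory = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0]

# A block memory access copies consecutive memory locations
# into consecutive registers, preserving the interleaving.
registers = list(memory)

# With an increment of 2, starting at register 0 selects the real parts
# and starting at register 1 selects the imaginary parts.
real_parts = [registers[a] for a in vector_operand_addresses(0, 4, 2)]
imag_parts = [registers[a] for a in vector_operand_addresses(1, 4, 2)]
```

No gather step is needed: the same contiguous load serves both operands, and only the start register and the increment differ.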

[0012] While it would be possible to give each register specified in a vector processing operation its own stride value, in preferred embodiments said stride value applies to all registers operating as vector registers when said vector processing instruction is executed.

[0013] This feature simplifies the implementation and has been found not to be a significant constraint in practice.

[0014] Similarly, the stride value could be specified within the vector instruction itself. However, in preferred embodiments of the invention, said stride value is set independently of said vector processing instruction.

[0015] This feature saves bit space within the vector instructions without imposing any significant practical constraint on the processing to be performed. In practice, the stride value usually remains constant for long periods, so the instruction bit space saved has been found to more than compensate for the occasional need to execute a specific instruction to change the stride value.

[0016] It has been found particularly efficient to use embodiments in which said stride value, specifying said amount of said register address increment for all vector processing instructions executed, is stored in a control register.

[0017] Incrementing of the register address between vector operations can conveniently be achieved by providing embodiments in which said instruction decoder includes a register address adder that increments said register address by adding a current register address, applied to a first adder input, and said amount, applied to a second adder input.

[0018] Such embodiments incur little overhead in cost and complexity in providing a system in which block memory access instructions can be used, and data processing operations can be performed with vector instructions, even when the operands are not arranged contiguously in memory in the order in which they are to be used.

[0019] Although the stride value could directly represent the amount of the increment, in preferred embodiments of the invention said stride value is an encoded representation of said amount.

[0020] In practice, the increments commonly applied are a subset of all possible increment values, so it is more efficient for the stride value to be an encoding of the amount than to waste bit space by having the stride value represent the amount directly.
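A minimal sketch of such an encoded representation is given below. The description states elsewhere that a stored stride value of 0 indicates a stride of 1, and that stored length values of 0 and 3 indicate lengths of 1 and 4. Everything else here is an assumption made for illustration: the table entries other than 0 -> 1, the "field + 1" rule for other lengths, and the names STRIDE_AMOUNT, decode_stride and decode_length.

```python
# Hypothetical decode table. Only the entry 0 -> 1 comes from the description;
# the remaining entries are assumed for illustration.
STRIDE_AMOUNT = {0: 1, 1: 2, 2: 4}


def decode_stride(stride_field):
    """Map the stored stride field to the register address increment."""
    return STRIDE_AMOUNT[stride_field]


def decode_length(length_field):
    """Map the stored length field to the number of iterations.

    The description gives 0 -> 1 and 3 -> 4; "field + 1" is assumed
    to hold for the other values as well.
    """
    return length_field + 1
```

A small table of this kind needs fewer control register bits than a field wide enough to hold every possible increment directly.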

[0021] Sophistication is improved in embodiments in which whether the register address is incremented is also controlled in dependence upon whether said register address lies within a subset of the registers in said register bank.

[0022] This feature allows additional instructions to be encoded by way of the register numbers used, rather than with scarce opcode bits.

[0023] The increment applied may add a positive or a negative number to the register address. However, the most useful increments have been found to be positive natural-number powers of two.

[0024] Viewed from another aspect, the present invention provides a data processing method comprising the steps of: holding data values to be manipulated in a register bank having a plurality of registers, each register having a register address; in response to at least one block memory access instruction, performing memory accesses between memory locations at a plurality of consecutive addresses in a memory and registers at a plurality of consecutive addresses in said register bank; and, in response to at least one vector processing instruction, executing a data processing operation a plurality of times in succession upon operands stored in a predetermined sequence of said registers; wherein, between each execution of said data processing operation, in response to a stride value, the register address of a register storing an operand used in said data processing operation is incremented by an amount specified by said stride value.

[0025] The invention will now be described further with reference to preferred embodiments thereof as illustrated in the accompanying drawings. These are given by way of example only.

[0026] Section 1

Figure 1 shows a data processing system comprising a main processor 24, a floating point unit coprocessor 26, a cache memory 28, a main memory 30 and an input/output system 32. The main processor 24, the cache memory 28, the main memory 30 and the input/output system 32 are linked via a main bus 34. A coprocessor bus 36 links the main processor 24 to the floating point unit coprocessor 26.

[0027] In operation, the main processor 24 (also referred to as the ARM core) executes a stream of data processing instructions that control data processing operations of a general type, including interactions with the cache memory 28, the main memory 30 and the input/output system 32. Embedded within the stream of data processing instructions are coprocessor instructions. The main processor 24 recognizes these coprocessor instructions as being of a type that should be executed by an attached coprocessor. Accordingly, the main processor 24 issues these coprocessor instructions on the coprocessor bus 36, from which any attached coprocessor receives them. In this case, the floating point unit coprocessor 26 accepts and executes any received coprocessor instructions that it detects are intended for it. This detection is made via a coprocessor number field within the coprocessor instruction.

[0028] Figure 2 is a schematic diagram showing the floating point unit coprocessor 26 in more detail. The floating point unit coprocessor 26 includes a register bank 38 formed of thirty-two 32-bit registers (only a few are shown in Figure 2). These registers can operate individually as single precision registers, each storing a 32-bit data value, or as pairs that together store a 64-bit data value. Within the floating point unit coprocessor 26 there are provided a pipelined multiply accumulate unit 40 and a load store control unit 42. In appropriate circumstances, the multiply accumulate unit 40 and the load store control unit 42 can operate concurrently: while the multiply accumulate unit 40 performs arithmetic operations (including multiply accumulate operations as well as other operations) on data values in the register bank 38, the load store control unit 42 transfers data values not being used by the multiply accumulate unit 40 to and from the floating point unit coprocessor 26 via the main processor 24.

[0029] Within the floating point unit coprocessor 26, an accepted coprocessor instruction is latched in an instruction register 44. In this simplified view, the coprocessor instruction can be considered as formed of an opcode portion followed by three register specifying fields R1, R2 and R3 (in practice, these fields may be split and spread differently within the full instruction). The register specifying fields R1, R2 and R3 correspond respectively to the registers in the register bank 38 that serve as the destination, the first source and the second source of the data processing operation being performed. The length and stride values of a vector control register 46 can be initialized and updated in response to vector control register instructions. Because the vector length and stride values apply globally within the floating point unit coprocessor 26, they can be changed dynamically on a global basis without resorting to self-modifying code.

[0030] The register control and instruction issue unit 48, the load store control unit 42 and the vector control unit 50 can together be considered to perform the main part of the role of instruction decoder. The register control and instruction issue unit 48 is responsive to the opcode and the register specifying fields R1, R2 and R3 to first output the initial register access (address) signals to the register bank 38, without performing any decode upon the opcode and without needing to use the vector control unit 50. Having direct access to the initial register values in this way helps to achieve a faster implementation. If a vector register is specified, the vector control unit 50 serves to generate the necessary sequence of register access signals using a 3-bit incrementer (adder) 52. The vector control unit 50 is responsive to the length value and the stride value stored in the vector control register 46 in performing its addressing of the register bank 38. A register scoreboard 54 is provided to perform register locking, so that the pipelined multiply accumulate unit 40 and the concurrently operating load store control unit 42 do not give rise to data consistency problems (the register scoreboard 54 may alternatively be considered part of the register control and instruction issue unit 48).

[0031] The opcode in the instruction register 44 specifies the nature of the data processing operation to be performed (e.g., whether the instruction is an add, subtract, multiply, divide, load, store, etc.). This is independent of the vector or scalar nature of the registers specified, which further simplifies instruction decoding and set-up of the multiply accumulate unit 40. The first register specifying value R1 and the second register specifying value R2 together encode the vector/scalar nature of the operation specified by the opcode. The three common cases supported by the encoding are S = S * S (e.g., basic random math as generated by a C compiler from a block of C code), V = V op S (e.g., to scale the elements of a vector) and V = V op V (e.g., matrix operations such as FIR filters and graphics transformations). Note that in this context "op" denotes a general operation and the syntax has the form destination = second operand op first operand. Some instructions (e.g., compare, compare with zero, or absolute value) may have no destination register (e.g., the output is the condition flags) or fewer input operands (a compare with zero has only one input operand). In these cases, more opcode bit space is available to specify options such as the vector/scalar nature, and the full range of registers can be made available for each operand (e.g., compares may always be fully scalar whatever the register).

[0032] The register control and instruction issue unit 48 and the vector control unit 50, which together perform the main part of the role of instruction decoder, are responsive to the first register specifying field R1 and the second register specifying field R2 to determine, and then control, the vector/scalar nature of the specified data processing operation. If the length value stored in the vector control register 46 indicates a length of one (corresponding to a stored value of zero), this can be used as an early indication of a purely scalar operation.

[0033] Figure 3 is a flow diagram showing the processing logic used to decode the vector/scalar nature from the register specifying values in single precision mode. At step 56, a test is made as to whether the vector length is globally set to one (length value equal to zero). If the vector length is one, all registers are treated as scalars at step 58. At step 60, a test is made as to whether the destination register R1 is within the range S0 to S7. If it is, the operation is all scalar and has the form S = S op S, as indicated at step 62. If step 60 returns "no", the destination is determined to be a vector, as indicated at step 64. If the destination is a vector, the encoding takes the second operand also to be a vector. Accordingly, the two possibilities remaining at this stage are V = V op S and V = V op V. These are distinguished by the test at step 66, which determines whether the first operand is one of S0 to S7. If so, the operation is V = V op S; otherwise it is V = V op V. These states are recognized at steps 68 and 70 respectively.
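The decision sequence of Figure 3 can be sketched as a small function. This is an illustrative model only: the function name, the use of plain integers 0 to 31 for registers S0 to S31, and the returned strings are assumptions, not part of the described hardware.

```python
def classify_single_precision(length_field, dest, first_operand):
    """Return the vector/scalar form of an operation in single precision mode.

    length_field is the stored length value (0 means a vector length of one);
    dest and first_operand are register numbers 0..31 standing for S0..S31.
    """
    if length_field == 0:        # steps 56 and 58: length one, everything is scalar
        return "S = S op S"
    if 0 <= dest <= 7:           # steps 60 and 62: destination in S0..S7
        return "S = S op S"
    # Step 64: the destination, and hence the second operand, is a vector.
    if 0 <= first_operand <= 7:  # steps 66 and 68: first operand in S0..S7
        return "V = V op S"
    return "V = V op V"          # step 70
```

For example, with a stored length value of 3, a destination of S8 and a first operand of S2 give V = V op S, while the same destination with a first operand of S16 gives V = V op V.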

[0034] Note that when the vector length is set to one, all thirty-two registers of the register bank 38 can be used as scalars. This is because the scalar nature of the operation is recognized at step 58, without relying on the test of step 60, which restricts the range of registers usable for the destination. The test of step 60 is useful for recognizing an all-scalar operation when mixed vector and scalar instructions are being used. It will also be noted that, when operating in the mixed vector and scalar mode, a scalar first operand may be any of S0 to S7, whereas a vector first operand may be any of S8 to S31. Providing three times as many registers in the register bank for a vector first operand accommodates the generally greater number of registers needed to hold sequences of data values when vector operations are used.

[0035] It will be appreciated that a common operation one may wish to perform is a graphics transformation. In the general case, the transformation to be performed can be represented by a 4*4 matrix. The reuse of operands in such calculations means that it is desirable for the matrix values to be stored in registers that can be manipulated as vectors. Similarly, an input pixel value is usually stored in four registers, which should also be able to be manipulated as a vector to aid reuse. The output of the matrix operation will usually be scalars (accumulating the separate vector row multiplications) stored in four registers. If it is desired to double pump the input and output values, 24 (= 16 + 4 + 4) vector registers and 8 (= 4 + 4) scalar registers are required.

[0036] Figure 4 is a flow diagram corresponding to that of Figure 3, but for double precision mode. As previously mentioned, in double precision mode the register slots within the register bank 38 operate as pairs to store sixteen 64-bit data values in logical registers D0 to D15. In this case, the encoding of the vector/scalar nature of the registers is modified from that of Figure 3: the tests of steps 60 and 66 become, at steps 72 and 74 respectively, "Is the destination one of D0 to D3?" and "Is the first operand one of D0 to D3?".
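Since the double precision flow differs from the single precision one only in the register range tested, both can be modeled with one parameter: the number of registers at the bottom of the bank that denote scalars (8 for S0 to S7, 4 for D0 to D3). The sketch below is illustrative; the names SCALAR_REGISTERS and classify, the mode strings and the integer register numbering are assumptions.

```python
# Number of low-numbered registers that denote scalars in each mode:
# S0..S7 in single precision, D0..D3 in double precision.
SCALAR_REGISTERS = {"single": 8, "double": 4}


def classify(mode, length_field, dest, first_operand):
    """Vector/scalar form of an operation for the given precision mode."""
    limit = SCALAR_REGISTERS[mode]
    if length_field == 0 or dest < limit:
        return "S = S op S"      # length one, or a scalar destination
    if first_operand < limit:
        return "V = V op S"      # vector destination, scalar first operand
    return "V = V op V"
```

Register number 4 illustrates the difference: as a destination it is a scalar (S4) in single precision mode but a vector (D4) in double precision mode.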

[0037] Encoding the vector/scalar nature of the registers within the register specifying fields as described above saves a significant amount of instruction bit space, but causes some difficulties for non-commutative operations such as subtraction and division. Given the register configuration V = V op S, the lack of symmetry between the first and second operands for non-commutative operations can be overcome, without additional instructions to swap register values, by extending the instruction set to include pairs of opcodes such as SUB, RSUB and DIV, RDIV that represent the two different operand options for non-commutative operations.
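The effect of such opcode pairs can be illustrated with a small model of a V = V op S operation. The opcode names come from the description; which operand each opcode places on the left, and the names OPCODES and vector_op_scalar, are assumptions made for this sketch.

```python
# Each non-commutative operation gets a "reversed" partner, so the scalar can
# appear on either side without swapping register contents first.
# The operand ordering chosen for each opcode here is an assumption.
OPCODES = {
    "SUB": lambda v, s: v - s,
    "RSUB": lambda v, s: s - v,
    "DIV": lambda v, s: v / s,
    "RDIV": lambda v, s: s / v,
}


def vector_op_scalar(opcode, vector, scalar):
    """Model of V = V op S: apply the operation to every vector element."""
    operation = OPCODES[opcode]
    return [operation(v, scalar) for v in vector]
```

With a vector [5, 7] and a scalar 2, SUB yields [3, 5] and RSUB yields [-3, -5], using the same register assignment in both cases.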

[0038] Figure 5 shows the wrapping of vectors within subsets of the register bank 38. In particular, in single precision mode the register bank is divided into four ranges of registers with addresses S0 to S7, S8 to S15, S16 to S23 and S24 to S31. These ranges are disjoint and contiguous. As shown in Figure 2, the wrapping function for these subsets of eight registers can be provided by using a 3-bit incrementer (adder) 52 within the vector control unit 50. In this way, the incrementer wraps back when a subset boundary is crossed. This simple implementation is facilitated by aligning the subsets on eight-word boundaries within the register address space.

【００３９】図５に戻って、レジスタのラッピングの理
解を助けるために多数のベクトルオペレーションが示さ
れる。第一のベクトルオペレーションは、スタートレジ
スタＳ２、（ベクトル制御レジスタ４６の長さ値３によ
って示される）ベクトル長さ４、および（ベクトル制御
レジスタ４６の中のストライド値０によって示される）
ストライド１を指定する。したがって、これらのグロー
バルベクトル制御パラメータセットをそなえたベクトル
としてレジスタＳ２を参照するように復号された命令を
実行するとき、レジスタＳ２、Ｓ３、Ｓ４、およびＳ５
の中のデータ値をそれぞれ使用して命令が４回実行され
る。このベクトルがサブセット境界を横切らないので、
ベクトルラッピングは無い。Returning to FIG. 5, a number of vector operations are shown to aid understanding of register wrapping. The first vector operation specifies start register S2, a vector length of 4 (indicated by a length value of 3 in the vector control register 46) and a stride of 1 (indicated by a stride value of 0 in the vector control register 46). Accordingly, when an instruction decoded as referring to register S2 as a vector is executed with these global vector control parameters set, the instruction is executed four times, using the data values in registers S2, S3, S4 and S5 respectively. Since this vector does not cross a subset boundary, there is no vector wrapping.

【００４０】第二の例では、スタートレジスタはＳ１４
であり、長さはＳ１４であり、ストライドは１である。
その結果、レジスタＳ１４から始まって、命令は６回実
行される。使用される次のレジスタはＳ１５となる。レ
ジスタが再びストライドだけ歩進すると、レジスタＳ１
６を使用する代わりに、レジスタＳ８にラップされる。
次に、命令が更に３回実行されることにより、Ｓ１４、
Ｓ１５、Ｓ８、Ｓ９、Ｓ１０、およびＳ１１の全シーケ
ンスが完了される。In the second example, the start register is S14, the length is 6 and the stride is 1. As a result, the instruction is executed six times, starting from register S14. The next register used is S15. When the register is stepped by the stride again, it wraps to register S8 instead of using register S16. The instruction is then executed three more times, completing the full sequence S14, S15, S8, S9, S10 and S11.

【００４１】図５の最後の例は、Ｓ２５のスタートレジ
スタ、８の長さ、および２のストライドを示す。使用さ
れる第一のレジスタはＳ２５であり、ストライド値２に
従ってその後にＳ２７、Ｓ２９、およびＳ３１が続く。
レジスタＳ３１の使用に続いて、次のレジスタ値はサブ
セットのスタートにラップバックし、ストライド２であ
るからレジスタＳ２４を通過し、レジスタＳ２５を使用
してオペレーションを実行する。インクリメンタ５２
は、ベクトルレジスタ相互間を動くとき現在値にストラ
イドを加算する３ビット加算器の形式を取り得る。した
がって、加算器に異なるストライド値を与えることによ
り、ストライドを調整することができる。The last example in FIG. 5 shows a start register of S25, a length of 8 and a stride of 2. The first register used is S25, followed by S27, S29 and S31 in accordance with the stride value of 2. After register S31 is used, the next register value wraps back to the start of the subset, steps past register S24 because the stride is 2, and the operation is performed using register S25. The incrementer 52 may take the form of a three-bit adder that adds the stride to the current value when moving between vector registers. The stride can therefore be adjusted by supplying a different stride value to the adder.
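The three FIG. 5 examples can be reproduced with a short Python sketch of the wrapping rule. The function name `vector_registers` is hypothetical, and the arguments are the actual length and stride rather than the encoded values held in the vector control register; the sketch models only the register sequence, not the hardware.

```python
SUBSET_SIZE = 8  # single precision subsets: S0-S7, S8-S15, S16-S23, S24-S31

def vector_registers(start, length, stride, subset_size=SUBSET_SIZE):
    """Return the S register numbers used by one vector operand.

    The upper bits of the register number select the subset and stay fixed;
    only the low-order offset is incremented, so it wraps within the subset.
    """
    base = start - (start % subset_size)
    offset = start % subset_size
    registers = []
    for _ in range(length):
        registers.append(base + offset)
        offset = (offset + stride) % subset_size  # behaves as a 3-bit adder when the size is 8
    return registers
```

Calling `vector_registers(14, 6, 1)` yields S14, S15, S8, S9, S10, S11, and `vector_registers(25, 8, 2)` visits S25, S27, S29 and S31 twice over, matching the second and third examples.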

【００４２】図６は倍精度モードでのレジスタバンク３
８のラッピングを示す。このモードでは、レジスタのサ
ブセットにＤ０からＤ３、Ｄ４からＤ７、Ｄ８からＤ１
１、およびＤ１２からＤ１５が含まれる。倍精度モード
でインクリメンタ５２としての役目を果たす加算器への
最小値入力は２となる。これは倍精度ストライド１に対
応する。倍精度ストライド２は加算器への入力４を必要
とする。図６の第一の例では、スタートレジスタがＤ
０、長さが４、ストライドが１である。その結果、Ｄ
０、Ｄ１、Ｄ２およびＤ３のベクトルレジスタ系列が得
られる。サブセット境界を横切らないので、この例では
ラッピングは無い。第二の例では、スタートレジスタが
Ｄ１５、長さが２、ストライドが２である。その結果、
Ｄ１５およびＤ１３のベクトルレジスタ系列が得られ
る。FIG. 6 shows the wrapping of the register bank 38 in double precision mode. In this mode, the subsets of registers comprise D0 to D3, D4 to D7, D8 to D11, and D12 to D15. The minimum value input to the adder serving as the incrementer 52 in double precision mode is 2, which corresponds to a double precision stride of 1. A double precision stride of 2 requires an input of 4 to the adder. In the first example of FIG. 6, the start register is D0, the length is 4 and the stride is 1. This yields the vector register sequence D0, D1, D2 and D3. There is no wrapping in this example because no subset boundary is crossed. In the second example, the start register is D15, the length is 2 and the stride is 2. This yields the vector register sequence D15 and D13.
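A minimal Python sketch of the double precision case follows. It assumes that logical register Dn occupies the pair of 32-bit slots 2n and 2n+1, which is consistent with the adder inputs of 2 and 4 given in the text but is not stated there in those words; the function name is hypothetical.

```python
def double_precision_registers(start_d, length, stride):
    """Return the D register numbers used by one double precision vector operand."""
    slot = 2 * start_d        # assumed mapping: Dn starts at 32-bit slot 2n
    base = slot & ~0b111      # upper bits select the subset and never change
    offset = slot & 0b111     # low three bits pass through the adder
    step = 2 * stride         # adder input: 2 for stride 1, 4 for stride 2
    registers = []
    for _ in range(length):
        registers.append((base | offset) // 2)
        offset = (offset + step) & 0b111  # three-bit wrap
    return registers
```

For the second FIG. 6 example, D15 starts at slot 30 (offset 6 in the subset beginning at slot 24); adding 4 wraps the offset to 2, which is slot 26, that is D13.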

【００４３】図２を参照して、ロード記憶制御ユニット
４２はその出力に５ビットのインクリメンタをそなえて
おり、多重ロード／記憶オペレーションはベクトルオペ
レーションに適用されるレジスタラッピングを受けない
ことがわかる。これにより、単一の多重ロード／記憶命
令はそれが必要とするだけの数の連続レジスタにアクセ
スすることができる。Referring to FIG. 2, it can be seen that load storage control unit 42 has a 5-bit incrementer at its output, so that multiple load / store operations are not subject to register wrapping applied to vector operations. This allows a single multiple load / store instruction to access as many consecutive registers as it requires.

【００４４】このラッピング構成を良好に使用するオペ
レーションの一例は、４個の信号値のユニットと４個の
タップに分割されたＦＩＲフィルタである。シンタック
スＲ８−Ｒ１１ｏｐＲ１６−Ｒ１９がベクトルオペレー
ションＲ８ｏｐＲ１６、Ｒ９ｏｐＲ１７、Ｒ１０ｏｐＲ
１８、およびＲ１１ｏｐＲ１９を表す場合には、ＦＩＲ
フィルタオペレーションは次のように行うことができ
る。One example of an operation that makes good use of this wrapping arrangement is an FIR filter split into units of four signal values and four taps. If the syntax R8-R11 op R16-R19 denotes the vector operations R8 op R16, R9 op R17, R10 op R18 and R11 op R19, the FIR filter operation can be performed as follows.

【００４５】８個のタップをＲ８−Ｒ１５に、８個の信
号値をＲ１６−Ｒ２３にロードする。Eight taps are loaded into R8-R15 and eight signal values are loaded into R16-R23.

【００４６】Ｒ８−Ｒ１１ｏｐＲ１６−Ｒ１９、そして
結果をＲ２４−Ｒ２７に入れる。Ｒ９−Ｒ１２ｏｐＲ１
６−Ｒ１９、そして結果をＲ２４−Ｒ２７に入れる。Ｒ
１０−Ｒ１３ｏｐＲ１６−Ｒ１９、そして結果をＲ２４
−Ｒ２７に入れる。Ｒ１１−Ｒ１４ｏｐＲ１６−Ｒ１
９、そして結果をＲ２４−Ｒ２７に入れる。R8-R11 op R16-R19, and put the results into R24-R27. R9-R12 op R16-R19, and put the results into R24-R27. R10-R13 op R16-R19, and put the results into R24-R27. R11-R14 op R16-R19, and put the results into R24-R27.

【００４７】Ｒ８−Ｒ１１に新しいタップを再ロードす
る。Reload new taps in R8-R11.

【００４８】Ｒ１２−Ｒ１５ｏｐＲ１６−Ｒ１９、そし
て結果をＲ２４−Ｒ２７に累積する。Ｒ１３−Ｒ８ｏｐ
Ｒ１６−Ｒ１９、そして結果をＲ２４−Ｒ２７に累積す
る（Ｒ１５−＞Ｒ８ラップ）。Ｒ１４−Ｒ９ｏｐＲ１６
−Ｒ１９、そして結果をＲ２４−Ｒ２７に累積する（Ｒ
１５−＞Ｒ８ラップ）。Ｒ１５−Ｒ１０ｏｐＲ１６−Ｒ
１９、そして結果をＲ２４−Ｒ２７に累積する（Ｒ１５
−＞Ｒ８ラップ）。R12-R15 op R16-R19, and accumulate the results into R24-R27. R13-R8 op R16-R19, and accumulate the results into R24-R27 (R15->R8 wrap). R14-R9 op R16-R19, and accumulate the results into R24-R27 (R15->R8 wrap). R15-R10 op R16-R19, and accumulate the results into R24-R27 (R15->R8 wrap).

【００４９】Ｒ１２−Ｒ１５に新しいタップを再ロード
する。Reload new taps on R12-R15.

【００５０】タップがなくなると、Ｒ１６−Ｒ１９に新
しいデータを再ロードする。When there are no more taps, new data is reloaded into R16-R19.

【００５１】Ｒ１２−Ｒ１５ｏｐＲ２０−Ｒ２３、そし
て結果をＲ２８−Ｒ３１に入れる。R12-R15 op R20-R23, and put the results into R28-R31.

【００５２】Ｒ１３−Ｒ８ｏｐＲ２０−Ｒ２３、そして
結果をＲ２８−Ｒ３１に入れる（Ｒ１５−＞Ｒ８ラッ
プ）。Ｒ１４−Ｒ９ｏｐＲ２０−Ｒ２３、そして結果を
Ｒ２８−Ｒ３１に入れる（Ｒ１５−＞Ｒ８ラップ）。Ｒ
１５−Ｒ１０ｏｐＲ２０−Ｒ２３、そして結果をＲ２８
−Ｒ３１に入れる（Ｒ１５−＞Ｒ８ラップ）。R13-R8 op R20-R23, and put the results into R28-R31 (R15->R8 wrap). R14-R9 op R20-R23, and put the results into R28-R31 (R15->R8 wrap). R15-R10 op R20-R23, and put the results into R28-R31 (R15->R8 wrap).

【００５３】残りは上記と同様The rest is the same as above

【００５４】上記のことからわかるように、ロードは多
重累算から異なるレジスタに対して行われるので、並列
に行われ得る（すなわち、二重バッファを行う）。As can be seen from the above, the loads target different registers from those used by the accumulate operations, and so can take place in parallel with them (that is, double buffering is achieved).
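The tap register windows used in the FIR steps above can be generated with a short Python sketch. The helper names `window` and `fir_schedule` are hypothetical, and the sketch lists only the operand registers of successive vector operations for one unit of four signal values; the reloads and the accumulation itself are not modelled.

```python
def window(start, count=4, subset_size=8):
    """Registers of a vector operand of `count` elements, wrapping in its subset."""
    base = start - (start % subset_size)
    return [base + (start - base + i) % subset_size for i in range(count)]

def fir_schedule(first_tap=8, taps=8, signal_start=16, unit=4):
    """List (tap registers, signal registers) for successive vector operations."""
    signal = list(range(signal_start, signal_start + unit))
    return [(window(first_tap + i), signal) for i in range(taps)]
```

The sixth entry of `fir_schedule()` pairs taps R13, R14, R15, R8 with signals R16 to R19, which is the "R13-R8 op R16-R19" step with its R15->R8 wrap.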

【００５５】図７Ａは、主プロセッサ２４がコプロセッ
サ命令をどのように調べるかを示す概略図である。主プ
ロセッサは命令の中の（分割できる）フィールド７６の
ビット組み合わせを使用することにより、命令をコプロ
セッサ（ｃｏｐｒｏｃｅｓｓｏｒ）命令と識別する。標
準のＡＲＭプロセッサの命令のセットの中で、コプロセ
ッサ命令にはコプロセッサ番号フィールド７８が含まれ
る。主プロセッサに付属したコプロセッサ（一つまたは
複数）はコプロセッサ番号フィールド７８を使用して、
特定のコプロセッサ命令がそれらを目標としているか識
別する。ＤＳＰコプロセッサ（たとえば、ＡＲＭ社製の
ピッコロ（Ｐｉｃｃｏｌｏ）コプロセッサ）または浮動
小数点ユニットコプロセッサのような異なる型のコプロ
セッサには異なるコプロセッサ番号を割り当てることが
でき、したがって、同じコプロセッサバス３６を使用し
て単一のシステムの中で別々にアドレス指定することが
できる。コプロセッサ命令には、コプロセッサが使用す
る操作コード（ｏｐｃｏｄｅ）、ならびにコプロセッサ
レジスタの中からデスティネーション、第一のオペラン
ド、および第二のオペランドをそれぞれ指定する３個の
５ビットフィールドも含まれる。コプロセッサロードま
たは記憶のような、いくつかの命令では、コプロセッサ
と主プロセッサが一緒になって所望のデータ処理オペレ
ーションを完了できるように、主プロセッサは少なくと
も部分的にコプロセッサ命令を復号する。主プロセッサ
は、このような状況でそれが行う命令復号の一部として
コプロセッサ番号の中で符号化されたデータ型に応答し
てもよい。FIG. 7A schematically illustrates how the main processor 24 views a coprocessor instruction. The main processor uses a bit combination in a field 76 (which may be split) within the instruction to identify the instruction as a coprocessor instruction. Within the standard ARM processor instruction set, a coprocessor instruction includes a coprocessor number field 78. The coprocessor or coprocessors attached to the main processor use the coprocessor number field 78 to identify whether a particular coprocessor instruction is targeted at them. Different types of coprocessor, such as a DSP coprocessor (for example, the Piccolo coprocessor produced by ARM) or a floating point unit coprocessor, can be allocated different coprocessor numbers and so be addressed separately within a single system using the same coprocessor bus 36. The coprocessor instruction also includes an opcode used by the coprocessor, and three five-bit fields respectively specifying the destination, the first operand and the second operand from among the coprocessor registers. For some instructions, such as coprocessor loads or stores, the main processor at least partially decodes the coprocessor instruction so that the coprocessor and the main processor can together complete the desired data processing operation. The main processor may also be responsive to the data type encoded within the coprocessor number as part of the instruction decoding it performs in such circumstances.

【００５６】図７Ｂは、倍精度と単精度の両方のオペレ
ーションをサポートするコプロセッサが受信したコプロ
セッサ命令をどのように解釈するかを示す。このような
コプロセッサには二つの隣接したコプロセッサ番号が割
り当てられる。コプロセッサはコプロセッサ番号の最上
位の３ビットを使用して、それがターゲットのコプロセ
ッサであるか識別する。このようにして、コプロセッサ
番号の最下位ビットはターゲットのコプロセッサを識別
する目的で冗長であり、代わりにこれを使用して、その
コプロセッサ命令を実行する際に使用されるべきデータ
型を指定することができる。この例では、データ型はデ
ータサイズが単精度であるか、倍精度であるかというこ
とに対応する。FIG. 7B illustrates how a coprocessor that supports both double precision and single precision operations interprets a coprocessor instruction it receives. Such a coprocessor is allocated two adjacent coprocessor numbers. The coprocessor uses the three most significant bits of the coprocessor number to identify whether it is the target coprocessor. The least significant bit of the coprocessor number is thus redundant for the purpose of identifying the target coprocessor, and can instead be used to specify the data type to be used in executing that coprocessor instruction. In this example, the data type corresponds to whether the data size is single precision or double precision.

【００５７】倍精度モードではレジスタ数が事実上３２
から１６に減る。それに応じてレジスタのフィールドサ
イズを小さくすることは可能ではあるが、その場合に
は、使用すべきレジスタの復号化はコプロセッサ命令の
中の既知の位置のそれだけで完備したフィールドから直
接得ることはできず、コプロセッサ命令の他の部分の復
号化に左右される。これには、複雑で、多分、コプロセ
ッサのオペレーションが遅くなるという欠点がある。コ
プロセッサ番号の最下位ビットを使用してデータ型を復
号するということは、操作コードは完全にデータ型によ
らないようにできることを意味し、これによっても、そ
の復号化は簡単になり、速度が早くなる。In double precision mode the number of registers is effectively reduced from 32 to 16. It would be possible to reduce the register field size accordingly, but in that case the decoding of which register to use would no longer be available directly from a self-contained field at a known position within the coprocessor instruction, and would instead depend on the decoding of other parts of the coprocessor instruction. This has the disadvantage of being complicated and possibly slowing the operation of the coprocessor. Using the least significant bit of the coprocessor number to decode the data type means that the opcode can be completely independent of the data type, which also simplifies and speeds up its decoding.

【００５８】図７Ｃは、図７Ｂのコプロセッサによって
サポートされたデータ型のサブセットである単一のデー
タ型だけをサポートするコプロセッサがどのようにコプ
ロセッサ命令を解釈するかを示す。この場合には、全コ
プロセッサ番号を使用して、その命令をアクセプトする
べきか否か判定する。このようにして、コプロセッサ命
令がサポートされていないデータ型である場合には、そ
れは異なるコプロセッサ番号に対応し、アクセプトされ
ない。このとき、主プロセッサ２４は未定義の命令例外
処理に頼って、サポートされていないデータ型に対して
オペレーションのエミュレーションを行うことができ
る。FIG. 7C illustrates how a coprocessor that supports only a single data type, being a subset of the data types supported by the coprocessor of FIG. 7B, interprets a coprocessor instruction. In this case, the full coprocessor number is used to determine whether or not to accept the instruction. Thus, if a coprocessor instruction is of an unsupported data type, it corresponds to a different coprocessor number and is not accepted. The main processor 24 can then fall back on undefined instruction exception handling to emulate the operation on the unsupported data type.

【００５９】図８は、主プロセッサとしての役目を果た
し、単精度と倍精度の両方のデータ型をサポートするコ
プロセッサ８４とコプロセッサバス８２を介して通信す
るデータ処理システムを示す。コプロセッサ番号を含む
コプロセッサ命令は命令ストリームの中に出て来たと
き、ＡＲＭコア８０からコプロセッサバス８２上に送出
される。次にコプロセッサ８４は、コプロセッサ番号を
それ自身の番号と比較し、一致していればアクセプト信
号をＡＲＭコア８０に返送する。アクセプト信号を受信
しなければ、ＡＲＭコアは未定義の命令例外と認識し、
メモリシステム８６に記憶されている例外処理コードを
参照する。FIG. 8 shows a data processing system comprising an ARM core 80 that serves as the main processor and communicates via a coprocessor bus 82 with a coprocessor 84 that supports both single and double precision data types. When a coprocessor instruction, including its coprocessor number, is encountered in the instruction stream, it is issued from the ARM core 80 onto the coprocessor bus 82. The coprocessor 84 then compares the coprocessor number with its own number and, if they match, returns an accept signal to the ARM core 80. If no accept signal is received, the ARM core recognizes an undefined instruction exception and refers to the exception handling code stored in the memory system 86.

【００６０】図９は、コプロセッサ８４を単精度オペレ
ーションのみをサポートするコプロセッサ８８に置き換
えることにより修正された図８のシステムを示す。この
場合、コプロセッサ８８は単一のコプロセッサ番号だけ
を認識する。したがって、図８のコプロセッサ８４によ
って実行される、オリジナル命令ストリームの中の倍精
度コプロセッサ命令は、単精度のコプロセッサ８８によ
ってアクセプトされない。したがって、同じコードを実
行することが望ましい場合には、メモリシステム８６の
中の未定義の例外処理コードに倍精度エミュレーション
ルーチンを含めることができる。FIG. 9 shows the system of FIG. 8 modified by replacing the coprocessor 84 with a coprocessor 88 that supports only single precision operations. In this case, the coprocessor 88 recognizes only a single coprocessor number. Accordingly, double precision coprocessor instructions in the original instruction stream, which would have been executed by the coprocessor 84 of FIG. 8, are not accepted by the single precision coprocessor 88. If it is desired to run the same code, the undefined exception handling code in the memory system 86 can therefore include double precision emulation routines.

【００６１】倍精度命令をエミュレーションしなければ
ならないことにより、これらの命令の実行が遅くなる
が、単精度コプロセッサ８８は倍精度コプロセッサ８４
より小さく、安価にでき、倍精度命令が充分にまれであ
れば、正味の利点が得られる。Having to emulate double precision instructions makes those instructions slow to execute, but the single precision coprocessor 88 can be smaller and cheaper than the double precision coprocessor 84, and a net benefit is obtained if double precision instructions are sufficiently rare.

【００６２】図１０は、単精度と倍精度の両方の命令を
サポートし、二つの隣接したコプロセッサ番号をそなえ
たコプロセッサ８４の中の命令ラッチ回路を示す。この
場合、コプロセッサ命令の中の望ましいコプロセッサ番
号の最上位３ビットＣＰ＃［３：１］がそのコプロセッ
サ８４に割り当てられたものと比較される。この例で、
コプロセッサ８４がコプロセッサ番号１０と１１をそな
えている場合には、コプロセッサ番号の最上位３ビット
ＣＰ＃［３：１］を２進１０１と比較することにより、
この比較を行うことができる。一致が生じると、アクセ
プト信号がＡＲＭコア８０に返送され、コプロセッサ命
令が実行のためにラッチされる。FIG. 10 shows the instruction latch circuit within the coprocessor 84, which supports both single and double precision instructions and has two adjacent coprocessor numbers. In this case, the three most significant bits CP#[3:1] of the coprocessor number within the coprocessor instruction are compared with those allocated to the coprocessor 84. In this example, if the coprocessor 84 has coprocessor numbers 10 and 11, the comparison can be made by comparing the three most significant bits CP#[3:1] of the coprocessor number with binary 101. If a match occurs, an accept signal is returned to the ARM core 80 and the coprocessor instruction is latched for execution.

【００６３】図１１は図９の単精度コプロセッサ８８の
中の同等の回路を示す。この場合には、単一のコプロセ
ッサ番号だけが認識され、単精度オペレーションがデフ
ォルトにより使用される。コプロセッサ命令をアクセプ
トしてラッチすべきか否かについて判定する際に行われ
る比較は、コプロセッサ番号ＣＰ＃［３：０］の４ビッ
ト全体と単一の埋め込まれたコプロセッサ番号である２
進１０１０との間で行われる。FIG. 11 shows the equivalent circuit within the single precision coprocessor 88 of FIG. 9. In this case, only a single coprocessor number is recognized and single precision operations are used by default. The comparison made in determining whether to accept and latch a coprocessor instruction is between all four bits of the coprocessor number CP#[3:0] and the single embedded coprocessor number, binary 1010.
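The two accept comparisons of FIGS. 10 and 11 can be contrasted in a short Python sketch. The function names are hypothetical; the mapping of CP#[0] to precision (0 for single, 1 for double) is inferred from the coprocessor numbers 1010 and 1011 used in the examples rather than stated as a general rule.

```python
def dual_precision_accepts(cp_num, assigned_upper_bits=0b101):
    """Coprocessor 84: compare only CP#[3:1]; CP#[0] is left free for the data type."""
    return ((cp_num >> 1) & 0b111) == assigned_upper_bits

def precision_of(cp_num):
    """Inferred mapping: CP#[0] = 0 selects single precision, 1 selects double."""
    return "double" if cp_num & 1 else "single"

def single_precision_accepts(cp_num, assigned_number=0b1010):
    """Coprocessor 88: compare all four bits CP#[3:0]."""
    return (cp_num & 0b1111) == assigned_number
```

Under these assumptions, coprocessor number 11 is accepted by the dual precision coprocessor as a double precision instruction, but is rejected by the single precision coprocessor, which leaves it to the undefined instruction exception path.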

【００６４】図１２は、図９の実施例の未定義例外処理
ルーチンをどのようにトリガして倍精度エミュレーショ
ンコードを動かせるかを示す流れ図である。これは、未
定義命令例外を生じた命令が、コプロセッサ番号が２進
１０１１であるコプロセッサ命令であるか検出する（ス
テップ９０）ことにより、行われる。「イエス」であれ
ば、これは倍精度命令を意図したものであるので、ステ
ップ９２でエミュレーションを行った後、主プログラム
のフローに戻ることができる。ステップ９０でトラップ
されなければ、以後のステップにより他の例外の型の検
出と処理を行ってもよい。FIG. 12 is a flow chart showing how the undefined exception handling routine of the FIG. 9 embodiment can be triggered to run the double precision emulation code. This is done by detecting (step 90) whether the instruction that caused the undefined instruction exception is a coprocessor instruction with a coprocessor number of binary 1011. If so, it was intended as a double precision instruction, and so it can be emulated at step 92 before returning to the main program flow. If the instruction is not trapped at step 90, subsequent steps may detect and handle other types of exception.
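The dispatch of FIG. 12 amounts to a single test ahead of the other exception handling. The following Python sketch uses hypothetical names throughout; `emulate` and `handle_other` stand in for the emulation routine and the remaining exception checks, whose contents the text does not specify.

```python
DOUBLE_PRECISION_CP_NUM = 0b1011  # coprocessor number tested at step 90

def handle_undefined_instruction(is_coprocessor_instruction, cp_num,
                                 emulate, handle_other):
    """Step 90: test for the double precision coprocessor number.
    Step 92: emulate the instruction; otherwise fall through to other checks."""
    if is_coprocessor_instruction and cp_num == DOUBLE_PRECISION_CP_NUM:
        return emulate()
    return handle_other()
```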

【００６５】図１３は、レジスタバンク２２０の各３２
ビットレジスタ、すなわち各データスロット、に記憶さ
れたデータの型を識別する情報を記憶するための、フォ
ーマットレジスタＦＰＲＥＧ２００の使用を示す。前に
説明したように、各データスロットは３２ビットのデー
タ値（１データワード）を記憶するための単精度レジス
タとして個別に動作するか、またはもう一つのデータス
ロットと対にして６４ビットのデータ値（２データワー
ド）を記憶するための倍精度レジスタを提供することが
できる。本発明の好適実施例によれば、ＦＰＲＥＧレジ
スタ２００は任意の特定のデータスロットがその中に単
精度のデータを記憶しているか、倍精度のデータを記憶
しているかを識別するように構成される。FIG. 13 illustrates the use of a format register FPREG 200 to store information identifying the type of data stored in each 32-bit register, that is each data slot, of the register bank 220. As described above, each data slot can either operate individually as a single precision register for storing a 32-bit data value (one data word), or be paired with another data slot to provide a double precision register for storing a 64-bit data value (two data words). According to the preferred embodiment of the present invention, the FPREG register 200 is arranged to identify whether any particular data slot holds single precision or double precision data.

【００６６】図１３に示すように、レジスタバンク２２
０の中の３２個のデータスロットは１６対のデータスロ
ットを提供するように配列される。ある第一のデータス
ロットがその中に単精度のデータ値を記憶している場合
には、その対の他方のデータスロットは単精度のデータ
値だけを記憶するように構成され、倍精度のデータ値を
記憶するために他のどのデータスロットともリンクされ
ることはない。これにより、どの特定のデータスロット
対も二つの単精度データ値、または一つの倍精度データ
値を記憶するように構成される。この情報は、レジスタ
バンク２２０の中の各データスロット対と結合された１
ビットの情報により識別することができる。したがって
好適実施例ではＦＰＲＥＧレジスタ２００は、レジスタ
バンク２２０の各データスロット対に記憶されたデータ
の型を識別するために１６ビットの情報を記憶するよう
に構成される。したがって、レジスタＦＰＲＥＧ２００
は１６ビットのレジスタとして具体化するか、またはＦ
ＰＵコプロセッサ２６の中の他のレジスタとの一貫性の
ため、１６スペアビットの情報をそなえた３２ビットの
レジスタとして具体化することができる。As shown in FIG. 13, the 32 data slots of the register bank 220 are arranged to provide 16 pairs of data slots. If a first data slot holds a single precision data value, the other data slot of that pair is arranged to store only a single precision data value, and is not linked with any other data slot to store a double precision data value. This ensures that any particular pair of data slots stores either two single precision data values or one double precision data value. This information can be identified by one bit of information associated with each pair of data slots in the register bank 220. Accordingly, in the preferred embodiment the FPREG register 200 is arranged to store 16 bits of information identifying the type of data stored in each pair of data slots of the register bank 220. The register FPREG 200 can therefore be implemented as a 16-bit register or, for consistency with the other registers within the FPU coprocessor 26, as a 32-bit register with 16 spare bits.

【００６７】図１５はレジスタバンク２２０の中の６対
のデータスロットを示す。好適実施例によれば、この６
対のデータスロットを使用して、６個の倍精度のデータ
値または１２個の単精度のデータ値を記憶することがで
きる。データスロットの中に記憶し得るデータの例が図
１５に示されている。ＤＨは倍精度データ値の３２個の
最上位ビットを表し、ＤＬは倍精度データ値の３２個の
最下位ビットを表し、Ｓは単精度のデータ値を表す。FIG. 15 shows six pairs of data slots within the register bank 220. In the preferred embodiment, these six pairs of data slots can be used to store six double precision data values or twelve single precision data values. An example of data that might be stored in the data slots is shown in FIG. 15, where DH denotes the 32 most significant bits of a double precision data value, DL denotes the 32 least significant bits of a double precision data value, and S denotes a single precision data value.

【００６８】本発明の好適実施例によるＦＰＲＥＧレジ
スタ２００の中の対応するエントリも図１５に示されて
いる。好適実施例によれば、対応するデータスロット対
に倍精度データ値が入っていることを示すためにＦＰＲ
ＥＧレジスタ２００に値「１」が記憶され、対応するデ
ータスロット対の少なくとも一方に単精度データ値が入
っているか、または両方のデータスロットとも初期化さ
れていないことを示すために値「０」が使用される。し
たがって、両方のデータスロットとも初期化されていな
い場合、一方のデータスロットが初期化されていなく
て、その対の他方のデータスロットに単精度データ値が
入っている場合、または対の両方のデータスロットに単
精度データ値が入っている場合には、ＦＰＲＥＧレジス
タ２００の対応するビットに論理「０」の値が記憶され
る。The corresponding entries in the FPREG register 200 according to the preferred embodiment of the present invention are also shown in FIG. 15. In the preferred embodiment, a value of "1" is stored in the FPREG register 200 to indicate that the corresponding pair of data slots holds a double precision data value, and a value of "0" is used to indicate either that at least one slot of the corresponding pair holds a single precision data value or that both data slots are uninitialized. Thus a logic "0" is stored in the corresponding bit of the FPREG register 200 if both data slots are uninitialized, if one data slot is uninitialized and the other data slot of the pair holds a single precision data value, or if both data slots of the pair hold single precision data values.
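The rule for each FPREG bit can be written down as a small Python sketch. The tag strings, the function name `fpreg_bits` and the choice of bit n for slot pair n are assumptions made for illustration; the text fixes only the meaning of "1" and "0".

```python
def fpreg_bits(slots):
    """Compute one FPREG bit per pair of data slots.

    `slots` is a list of tags: "DL"/"DH" for the two halves of a double
    precision value, "S" for a single precision value, and None for an
    uninitialised slot. Bit n describes the pair (slot 2n, slot 2n+1).
    """
    bits = 0
    for pair in range(len(slots) // 2):
        low, high = slots[2 * pair], slots[2 * pair + 1]
        if {low, high} == {"DL", "DH"}:
            bits |= 1 << pair  # "1": the pair holds a double precision value
        # every other combination (S/S, S/None, None/None) leaves the bit at "0"
    return bits
```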

【００６９】前に説明したように、好適実施例のＦＰＵ
プロセッサ２６を使用して単精度または倍精度のデータ
値を処理してもよく、また主プロセッサ２４が送出した
コプロセッサ命令は、任意の特定の命令が単精度命令で
あるか倍精度命令であるかを識別する（図７Ｂと付属の
説明参照）。命令がコプロセッサよりアクセプトされる
と、その命令はレジスタ制御−命令送出ユニット４８に
送られて、復号されて実行される。命令がロード命令で
あれば、レジスタ制御−命令送出論理４８はロード記憶
制御ユニット４２に命じて、識別されたデータをメモリ
から検索させ、レジスタバンク２２０の指定されたデー
タスロットにそのデータを記憶させる。この段階でコプ
ロセッサは単精度データ値が検索されているのか、倍精
度データ値が検索されているのかを知り、ロード記憶制
御ユニット４２はそれに応じて動作する。したがって、
ロード記憶制御ユニット４２は経路２２５で３２ビット
の単精度データ値または６４ビットの倍精度データ値を
レジスタバンク入力論理２３０に送って、レジスタバン
ク２２０に記憶させる。As described earlier, the FPU coprocessor 26 of the preferred embodiment may be used to process either single precision or double precision data values, and a coprocessor instruction issued by the main processor 24 identifies whether any particular instruction is a single precision or a double precision instruction (see FIG. 7B and the associated description). When an instruction is accepted by the coprocessor, it is passed to the register control-instruction sending unit 48 for decoding and execution. If the instruction is a load instruction, the register control-instruction sending unit 48 instructs the load storage control unit 42 to retrieve the identified data from memory and store it in the specified data slots of the register bank 220. At this stage the coprocessor knows whether single precision or double precision data values are being retrieved, and the load storage control unit 42 acts accordingly. The load storage control unit 42 thus passes either 32-bit single precision data values or 64-bit double precision data values over path 225 to the register bank input logic 230 for storage in the register bank 220.

【００７０】データはロード記憶制御ユニット４２によ
りレジスタバンク２２０にロードされるだけでなく、フ
ォーマットレジスタＦＲＰＥＧ２００にも与えられる。
これにより、データを受ける各データスロット対が単精
度データを記憶しようとしているのか、倍精度データを
記憶しようとしているのかを表すために必要な情報ビッ
トを付加することができる。好適実施例では、このデー
タがフォーマットレジスタＦＲＰＥＧ２００に記憶され
た後に、データがレジスタバンクにロードされるので、
この情報をレジスタバンク入力論理２３０が利用でき
る。As well as being loaded into the register bank 220 by the load storage control unit 42, the data is also provided to the format register FPREG 200. This allows the necessary bits of information to be added to indicate whether each pair of data slots receiving data is to store single precision or double precision data. In the preferred embodiment, this information is stored in the format register FPREG 200 before the data is loaded into the register bank, so that it is available to the register bank input logic 230.

【００７１】好適実施例では、レジスタバンク２２０の
内部フォーマットは外部フォーマットと同じであるの
で、レジスタバンク２２０の中では単精度データ値は３
２ビットのデータ値として記憶され、倍精度データ値は
６４ビットのデータ値として記憶される。レジスタバン
ク入力論理２３０はフォーマットレジスタＦＲＰＥＧ２
００にアクセスするので、レジスタバンク入力論理２３
０はそれが受けているデータが単精度であるか、倍精度
であるかがわかる。したがって、このような実施例で
は、レジスタバンク入力論理２３０はレジスタバンク２
２０の適当なデータスロット（一つまたは複数）に記憶
するために経路２２５で受け取ったデータを単に配列す
るだけである。しかし、代替実施例でレジスタバンクの
中の内部表現が外部フォーマットと異なる場合には、レ
ジスタバンク入力論理２３０は必要な変換を行うように
構成される。たとえば、ある数は通常、１．ａｂ
ｃ．．．に基数を乗じて、ある指数の累乗としたもので
表される。効率性のため、通常の単精度と倍精度の表現
は１０進小数点の左側の１を表すためにデータビットを
使用しないで、１は暗示されているものとする。何らか
の理由で、レジスタバンク２２０の中で使用される内部
表現が１を明示しなければならない場合には、レジスタ
バンク入力論理２３０はデータの必要な変換を行う。こ
のような実施例では、レジスタバンク入力論理２３０が
発生する付加的なデータを収容するために、データスロ
ットは通常、３２ビットより若干大きくなる。In the preferred embodiment, the internal format of the register bank 220 is the same as the external format, so that single precision data values are stored as 32-bit data values and double precision data values are stored as 64-bit data values within the register bank 220. Since the register bank input logic 230 has access to the format register FPREG 200, it knows whether the data it is receiving is single or double precision. In such an embodiment, the register bank input logic 230 therefore merely arranges the data received over path 225 for storage in the appropriate data slot or slots of the register bank 220. In alternative embodiments where the internal representation within the register bank differs from the external format, however, the register bank input logic 230 is arranged to perform the necessary conversion. For example, a number is usually represented as 1.abc... multiplied by a base raised to the power of an exponent. For efficiency, the usual single and double precision representations do not use a data bit to represent the 1 to the left of the decimal point; the 1 is instead taken to be implied. If, for any reason, the internal representation used within the register bank 220 requires the 1 to be represented explicitly, the register bank input logic 230 performs the necessary conversion of the data. In such embodiments, the data slots would typically be somewhat larger than 32 bits in order to accommodate the additional data generated by the register bank input logic 230.
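The kind of conversion described for the alternative embodiment can be sketched in Python, assuming the external format is IEEE 754 single precision (the text says only "the usual single and double precision representations"). The function names are hypothetical, and zero, subnormal, infinity and NaN encodings are deliberately left out.

```python
import struct

def explicit_single(value):
    """Split a single precision value into sign, biased exponent and a 24-bit
    significand with the leading 1 made explicit (normal numbers only)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 0xFF):
        raise ValueError("zero, subnormal, infinity and NaN are not handled here")
    return sign, exponent, (1 << 23) | fraction

def implicit_single(sign, exponent, significand):
    """Reverse the conversion by dropping the explicit leading 1."""
    bits = (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value
```

The explicit form needs 1 + 8 + 24 = 33 bits, which illustrates why the data slots in such an embodiment would be somewhat larger than 32 bits; `implicit_single` plays the role of the reverse conversion applied on the way back out of the register bank.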

【００７２】データ値をレジスタバンク２２０にロード
する他に、ロード記憶制御ユニット４２はコプロセッサ
２６の一つ以上のシステムレジスタ、たとえばユーザス
テータス制御レジスタＦＰＳＣＲ２１０にデータをロー
ドしてもよい。好適実施例では、ＦＰＳＣＲレジスタ２
１０に、ユーザがアクセスできるコンフィギュレーショ
ンビットと例外ステータスビットとが含まれている。こ
れについては、好適実施例の説明の最後に行う浮動小数
点ユニットのアーキテクチャの説明で更に詳しく説明す
る。In addition to loading data values into the register bank 220, the load storage control unit 42 may also load data into one or more system registers of the coprocessor 26, for example the user status control register FPSCR 210. In the preferred embodiment, the FPSCR register 210 contains user-accessible configuration bits and exception status bits. It is discussed in more detail in the architectural description of the floating point unit given at the end of the description of the preferred embodiment.

【００７３】その内容をメモリに記憶すべきレジスタバ
ンク２２０の中の特定のデータスロットを表す記憶命令
をレジスタ制御−命令送出ユニット４８が発すると、そ
れに応じてロード記憶制御ユニット４２が命令され、必
要なデータワードがレジスタバンク２２０からレジスタ
バンク出力論理２４０を介してロード記憶制御ユニット
４２に呼び出される。読み出されつつあるデータが単精
度データか倍精度データかを判定するために、レジスタ
バンク出力論理２４０はＦＰＲＥＧレジスタ２００の内
容にアクセスする。次にレジスタバンク出力論理２４０
は、レジスタバンク入力論理２３０によって加えられた
データ変換を逆にするために適当なデータ変換を加え、
そのデータを経路２３５でロード記憶制御ユニット４２
に与える。When the register control-instruction sending unit 48 issues a store instruction identifying particular data slots in the register bank 220 whose contents are to be stored to memory, the load storage control unit 42 is instructed accordingly, and the necessary data words are read out from the register bank 220 to the load storage control unit 42 via the register bank output logic 240. The register bank output logic 240 accesses the contents of the FPREG register 200 to determine whether the data being read out is single precision or double precision data. The register bank output logic 240 then applies the appropriate data conversion to reverse any conversion applied by the register bank input logic 230, and provides the data to the load storage control unit 42 over path 235.

【００７４】本発明の好適実施例によれば、記憶命令が
倍精度命令であれば、コプロセッサ２６は、命令が倍精
度データ値に適用される第二のオペレーションモードで
動作していると考えることができる。倍精度データ値に
は偶数個のデータワードが含まれているので、第二のオ
ペレーションモードで送出されるどの記憶命令も通常、
その内容がメモリに記憶されるべき偶数個のデータスロ
ットを表す。しかし、本発明の好適実施例によれば、奇
数個のデータスロットが指定された場合には、ロード記
憶制御ユニット４２は、ＦＰＲＥＧレジスタ２００の内
容を読んで、まずそれらの内容をメモリに記憶した後、
レジスタバンク２２０からの識別された偶数個のデータ
スロットを記憶するように構成される。通常、転送すべ
きデータスロットは、レジスタバンクの中の特定のデー
タスロットを表すベースアドレスの後に、そのデータス
ロットから数えた、記憶すべきデータスロット数（すな
わち、データワード数）を示す数を続けたもので表され
る。According to the preferred embodiment of the present invention, if the store instruction is a double precision instruction, the coprocessor 26 can be considered to be operating in a second mode of operation in which instructions are applied to double precision data values. Since a double precision data value contains an even number of data words, any store instruction issued in the second mode of operation would normally identify an even number of data slots whose contents are to be stored to memory. According to the preferred embodiment, however, if an odd number of data slots is specified, the load storage control unit 42 is arranged to read the contents of the FPREG register 200 and store those contents to memory first, before storing the identified even number of data slots from the register bank 220. The data slots to be transferred are typically identified by a base address identifying a particular data slot in the register bank, followed by a number indicating how many data slots (that is, how many data words), counting from that data slot, are to be stored.

【００７５】ここで、たとえば、記憶命令がベースアド
レスとしてレジスタバンク２２０の第一のデータスロッ
トを与えて、３３個のデータスロットを指定した場合に
は、これにより３２個の全部のデータスロットの内容が
メモリに記憶されるが、指定されたデータスロット数が
奇数であるので、ＦＰＲＥＧレジスタ２００の内容もメ
モリに記憶される。Hence if, for example, a store instruction gives the first data slot of the register bank 220 as the base address and specifies 33 data slots, the contents of all 32 data slots are stored to memory and, because the specified number of data slots is odd, the contents of the FPREG register 200 are stored to memory as well.

【００７６】このアプローチにより単一の命令を使用し
て、レジスタバンクの内容と、レジスタバンク２２０の
種々のデータスロットの中に記憶されたデータの型を表
すＦＰＲＥＧレジスタ２００の内容の両方を記憶するこ
とができる。これにより、ＦＰＲＥＧレジスタ２００の
内容を明示的に記憶するために別々の命令を発する必要
が無くなるので、メモリへの記憶またはメモリプロセス
からのロードの間の処理速度にあまり悪影響を及ぼすこ
とは無い。This approach allows a single instruction to store both the contents of the register bank and the contents of the FPREG register 200, which identifies the type of data stored in the various data slots of the register bank 220. It avoids the need to issue a separate instruction to store the contents of the FPREG register 200 explicitly, and so has little adverse effect on processing speed during a store to memory or a load from memory.

【００７７】本発明のもう一つの実施例では、この手法
をもう一段階進めることにより、単一の命令を使用し
て、必要な場合には、ＦＰＳＣＲレジスタ２１０のよう
な付加的なシステムレジスタもメモリに記憶し得るよう
にできる。したがって、３２個のデータスロットをそな
えたレジスタバンク２２０の例を考えると、前に説明し
たように、記憶命令で３３個のデータスロットが表され
た場合には、レジスタバンク２２０の３２個のデータス
ロットの内容の他に、ＦＰＲＥＧレジスタ２００がメモ
リに記憶される。しかし、レジスタバンクの中のデータ
スロット数を超える異なる奇数、たとえば、３５が表さ
れた場合には、これをロード記憶制御ユニット４２は、
ＦＰＲＥＧレジスタ２００とレジスタバンク２２０のデ
ータスロットの内容の他に、ＦＰＳＣＲレジスタ２１０
の内容もメモリに記憶する必要性と解釈することができ
る。コプロセッサはそれ以上のシステムレジスタ、たと
えば、コプロセッサによる命令の処理の間に生じた例外
を表す例外レジスタを含んでもよい。記憶命令に異なる
奇数、たとえば、３７が表された場合には、これをロー
ド記憶制御ユニット４２は、ＦＰＳＣＲレジスタ２１
０、ＦＰＲＥＧレジスタ２００、およびレジスタバンク
２２０の内容の他に、一つ以上の例外レジスタの内容も
付加的に記憶する必要性と解釈することができる。In another embodiment of the present invention, this technique is taken one stage further, so that a single instruction can also be used to store additional system registers, such as the FPSCR register 210, to memory when required. Considering the example of a register bank 220 with 32 data slots, if a store instruction identifies 33 data slots then, as described above, the FPREG register 200 is stored to memory in addition to the contents of the 32 data slots of the register bank 220. If, however, a different odd number exceeding the number of data slots in the register bank is identified, for example 35, the load storage control unit 42 can interpret this as a requirement to store the contents of the FPSCR register 210 to memory as well, in addition to the contents of the FPREG register 200 and the data slots of the register bank 220. The coprocessor may include further system registers, for example exception registers identifying exceptions that have occurred during the processing of instructions by the coprocessor. If a different odd number, for example 37, is identified in a store instruction, the load storage control unit 42 can interpret this as a requirement to additionally store the contents of one or more exception registers, in addition to the contents of the FPSCR register 210, the FPREG register 200 and the register bank 220.

【００７８】This technique is particularly useful when the code that initiates the store or load instruction has no knowledge of the contents of the register bank, and the contents of the register bank are stored to memory only temporarily for later retrieval into the register bank. If the code knows the contents of the register bank, it may not be necessary to store the contents of the FPREG register 200 to memory as well. Typical examples of code that may have no knowledge of the register bank contents are context switch code and procedure call entry and exit routines.

【００７９】In such cases, the contents of the FPREG register 200 can be stored to memory efficiently along with the contents of the register bank. Indeed, as described above, certain other system registers can also be stored as required.

【００８０】A similar process is used when a subsequent load instruction is received. When the load store control unit 42 receives a double-precision load instruction specifying an odd number of data slots, it is arranged to load the FPREG contents into the FPREG register 200, followed by the contents of any system registers indicated by the number of slots specified in the load instruction, followed by an even number of data words that are stored into the designated data slots of the register bank 220. Considering the example described earlier, if the number of data slots specified in the load instruction is 33, the FPREG contents are loaded into the FPREG register 200, followed by the contents of the 32 data slots. Similarly, if the number of data slots specified in the load instruction is 35, the contents of the FPSCR register 210 are also loaded into the FPSCR register in addition to the above contents. Finally, if the number of data slots specified is 37, the contents of the exception registers are also loaded into those exception registers in addition to the above contents. As will be apparent to those skilled in the art, the particular operations associated with particular odd numbers are entirely arbitrary and can be varied as desired.
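The interpretation of the slot count described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `decode_slot_count`, the table `EXTRA_REGISTERS`, and the register labels are hypothetical, the bank is assumed to have 32 data slots as in the example, and the assignment of 33, 35, and 37 follows the example values in the text (which the text itself says are arbitrary).

```python
# Illustrative sketch only. The association between particular odd numbers
# and particular system registers is arbitrary according to the text; the
# values 33, 35 and 37 are simply the examples it gives.
BANK_SLOTS = 32  # data slots in register bank 220 in the example

EXTRA_REGISTERS = {
    33: ["FPREG"],
    35: ["FPREG", "FPSCR"],
    37: ["FPREG", "FPSCR", "EXCEPTION"],
}


def decode_slot_count(slot_count):
    """Return (system_registers, data_slots) for a double-precision transfer.

    system_registers is listed in the order the text gives for loads:
    FPREG first, then any further system registers, then the data words.
    """
    if slot_count < 0:
        raise ValueError("slot count must be non-negative")
    if slot_count in EXTRA_REGISTERS:
        # Odd count larger than the bank: whole bank plus system registers.
        return list(EXTRA_REGISTERS[slot_count]), BANK_SLOTS
    if slot_count > BANK_SLOTS:
        raise ValueError("no meaning assigned to this slot count in the sketch")
    if slot_count % 2 == 1:
        # Odd count within the bank: FPREG plus an even number of data words.
        return ["FPREG"], slot_count - 1
    return [], slot_count
```

For instance, `decode_slot_count(35)` yields the FPREG and FPSCR labels together with all 32 data slots, while an even count such as 32 yields no system registers at all.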

【００８１】FIG. 14 is a flow diagram illustrating the operation of the register control and instruction issue unit 48 in accordance with the preferred embodiment of the present invention when executing store and load instructions. First, at step 300, the number of data words (which in the preferred embodiment is identical to the number of data slots) is read from the instruction, along with the first register number identified in the instruction, i.e. the base register. Next, at step 310, it is determined whether the instruction is a double-precision instruction. As described earlier, the instruction indicates whether it is a double-precision or a single-precision instruction, so this information is available to the coprocessor at this stage.

【００８２】If the instruction is a double-precision instruction, the process proceeds to step 320, where it is determined whether the number of words specified in the instruction is odd. Assuming for this embodiment that the technique described above for selectively transferring various system registers in addition to the FPREG register 200 is not used, an odd number of words indicates that the contents of the FPREG register 200 should be transferred, and accordingly the contents of the FPREG register 200 are transferred by the load store control unit 42 at step 325. Next, at step 327, the number of words is decremented by 1, and the process proceeds to step 330. If the number of words is determined to be even at step 320, the process proceeds directly to step 330.

【００８３】At step 330, it is determined whether the number of words is greater than zero. If it is not, the instruction is deemed complete and the process exits at step 340. If the number of words is greater than zero, the process proceeds to step 332, where a double-precision data value (i.e. the contents of two data slots) is transferred to or from the first specified register number. Next, at step 334, the number of words is decremented by 2, and at step 336 the register number is incremented by 1. As described earlier, for a double-precision instruction a register actually consists of two data slots, so incrementing the register count by 1 is equivalent to incrementing the data slot number by 2.

【００８４】The process then returns to step 330, where it is determined whether the number of words is still greater than zero. If it is, the process is repeated. When the number of words reaches zero, the process exits at step 340. If it is determined at step 310 that the instruction is not a double-precision instruction, the process proceeds to step 350, where it is again determined whether the number of words is greater than zero. If it is, the process proceeds to step 352, where a single-precision data value is transferred to or from the first register number identified in the instruction. Next, at step 354, the number of words is decremented by 1, and at step 356 the register number count is incremented by 1 so as to point to the next data slot. The process then returns to step 350, where it is determined whether the number of words is still greater than zero. If it is, the process is repeated, and when the number of words equals zero the process exits at step 360.
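The flow of FIG. 14 (steps 300 to 360) can be sketched as a short function that lists the transfers in the order they would occur. This is a minimal sketch, not the hardware implementation: the name `transfer_sequence` and the tuple labels are hypothetical, and, as in the embodiment described, an odd word count in a double-precision instruction is taken to mean only that the FPREG register is transferred.

```python
def transfer_sequence(base_register, word_count, double_precision):
    """List the transfers performed for one store or load instruction.

    Each entry is a (kind, register_number) tuple, where kind is "FPREG",
    "D" (double-precision value, two data slots) or "S" (single-precision
    value, one data slot). FPREG entries carry None as register number.
    """
    transfers = []
    register = base_register                      # step 300
    if double_precision:                          # step 310
        if word_count % 2 == 1:                   # step 320: odd word count?
            transfers.append(("FPREG", None))     # step 325
            word_count -= 1                       # step 327
        while word_count > 0:                     # step 330
            transfers.append(("D", register))     # step 332
            word_count -= 2                       # step 334
            register += 1                         # step 336: one D register = two slots
    else:
        while word_count > 0:                     # step 350
            transfers.append(("S", register))     # step 352
            word_count -= 1                       # step 354
            register += 1                         # step 356
    return transfers                              # step 340 or step 360
```

With a base register of 0 and a word count of 5, a double-precision instruction transfers FPREG and then D0 and D1, whereas a single-precision instruction with a word count of 3 starting at register 4 transfers S4, S5, and S6.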

【００８５】The above approach provides considerable flexibility when executing code that has no knowledge of the register bank contents, for example context switch code or procedure call entry and exit sequences. In these cases, the operating system has no knowledge of the contents of the registers, and it is desirable that it need not treat the registers differently depending on their contents. The above approach allows these code routines to be written with a single store or load instruction specifying an odd number of data words. If the coprocessor requires the use of register content information, it interprets the odd number of data words in the instruction as a requirement to store to memory, or load from memory, the format information needed to identify the contents of the data in the register bank. This flexibility removes the need for operating system software written specifically to support coprocessors that require register content information.

【００８６】This technique also removes the need to load and store the register content information in separate operations within the code. Because the option to load and store the register content information is built into the instruction, no additional memory access is required. This reduces code length and potentially saves time.

【００８７】An architecture of a floating-point unit incorporating the above technique is described below.

【００８８】1. Introduction
VFPv1 is a floating-point system (FPS) architecture designed to be implemented as a coprocessor for use with ARM processor modules. Implementations of this architecture may incorporate features in hardware or software, or an implementation may use software to complete the functionality or to provide IEEE 754 compatibility. This specification aims to achieve full IEEE 754 compatibility using a combination of hardware and software support.

【００８９】Two coprocessor numbers are used by VFPv1: 10 is used for operations on single-precision operands, while 11 is used for operations on double-precision operands. Conversion between single-precision and double-precision data is performed by two conversion instructions, which operate in the coprocessor space of the source operand.

【００９０】Features of the VFPv1 architecture include the following.

【００９１】
・Full compatibility with IEEE 754 in hardware with support code.
・32 single-precision registers, each addressable as a source operand or a destination register.
・16 double-precision registers, each addressable as a source operand or a destination register. (The double-precision registers overlap the physical single-precision registers.)
・A vector mode that significantly increases floating-point code density and concurrency with load and store operations.
・Four banks of 8 circulating single-precision registers, or four banks of 4 circulating double-precision registers, to enhance DSP (digital signal processing) and graphics operations.
・A denormal handling option that selects either IEEE 754 compatibility (with intended support from the floating-point emulation package) or a fast flush-to-zero capability.
・A fully pipelined chained multiply-accumulate implementation that produces IEEE 754 compatible results.
・Floating-point to integer conversion for C, C++, and Java with the FFTOSIZ instruction.
Implementers may choose to implement VFPv1 entirely in hardware, or to use a combination of hardware and support code. VFPv1 may also be implemented entirely in software.

【００９２】2. Terminology
The following terms are used in this specification.

【００９３】Automatic exception - An exception condition that always bounces to the support code, regardless of the value of the respective exception enable bit. The choice of which exceptions, if any, are automatic is an implementation option. See Section 1, 6. Exception Handling.
Bounce - An exception reported to the operating system that is handled entirely by the support code, without calling user trap handlers or otherwise interrupting the normal flow of user code.
CDP - Coprocessor Data Processing. For the FPS, CDP operations are arithmetic operations rather than load or store operations.

【００９４】ConvertToUnsignedInteger(Fm) - Conversion of the contents of Fm to an unsigned 32-bit integer value. The result depends on the rounding mode, both for the final rounding and for the handling of floating-point values outside the range of a 32-bit unsigned integer. An INVALID exception is possible if the floating-point input value is negative or too large for a 32-bit unsigned integer.
ConvertToSignedInteger(Fm) - Conversion of the contents of Fm to a signed 32-bit integer value. The result depends on the rounding mode, both for the final rounding and for the handling of floating-point values outside the range of a 32-bit signed integer. An INVALID exception is possible if the floating-point input value is too large for a 32-bit signed integer.
ConvertUnsignedIntToSingle/Double(Rd) - Conversion of the contents of an ARM register (Rd), interpreted as a 32-bit unsigned integer, to a single-precision or double-precision floating-point value. If the destination precision is single, an INEXACT exception is possible in the conversion operation.
ConvertSignedIntToSingle/Double(Rd) - Conversion of the contents of an ARM register (Rd), interpreted as a 32-bit signed integer, to a single-precision or double-precision floating-point value. If the destination precision is single, an INEXACT exception is possible in the conversion operation.

【００９５】Denormalized value - A representation of a value in the range (-2^Emin < x < 2^Emin). In the IEEE 754 formats for single-precision and double-precision operands, a denormalized value, or denormal, has a zero exponent and a leading significand bit of 0 rather than 1. The IEEE 754-1985 specification requires that the generation and manipulation of denormalized operands be performed with the same precision as for normal operands.
Disabled exception - An exception whose corresponding exception enable bit in the FPSCR is set to 0 is referred to as "disabled". For these exceptions, the IEEE 754 specification defines the correct result to be returned. An operation that generates an exception condition bounces to the support code to produce the result defined by IEEE 754. The exception is not reported to the user exception handler.
Enabled exception - An exception whose respective exception enable bit is set to 1. When this exception occurs, a trap to the user handler is taken. An operation that generates an exception condition bounces to the support code to produce the result defined by IEEE 754. The exception is then reported to the user exception handler.

【００９６】Exponent - The component of a floating-point number that normally signifies the integer power of two used in determining the value of the represented number. The exponent is occasionally called the signed or unbiased exponent.
Fraction - The field of the significand that lies to the right of its implied binary point.
Flush-to-zero mode - In this mode, all values in the range (-2^Emin < x < 2^Emin) after rounding are treated as zero rather than being converted to a denormalized value.
High(Fn/Fm) - The upper 32 bits [63:32] of a double-precision value as represented in memory.

【００９７】IEEE 754-1985 - "IEEE Standard for Binary Floating-Point Arithmetic", ANSI/IEEE Std 754-1985, The Institute of Electrical and Electronics Engineers, Inc., New York, 10017. This standard, often referred to as the IEEE 754 standard, defines data types, correct operation, exception types and handling, and error bounds for floating-point systems. Most processors are built in compliance with the standard, either in hardware or in a combination of hardware and software.
Infinity - An IEEE 754 special format used to represent ∞. The exponent is the maximum for the precision and the significand is all zeros.
Input exception - An exception condition in which one or more of the operands for a given operation are not supported by the hardware. The operation bounces to the support code for completion.

【００９８】Intermediate result - An internal format used to store the result of a calculation before rounding. This format may have a larger exponent field and significand field than the destination format.
Low(Fn/Fm) - The lower 32 bits [31:0] of a double-precision value as represented in memory.
MCR - "Move to Coprocessor from ARM Register". For the FPS, this includes instructions that transfer data or control registers between an ARM register and an FPS register. Only 32 bits of information may be transferred using a single MCR class instruction.

【００９９】MRC - "Move to ARM Register from Coprocessor". For the FPS, this includes instructions that transfer data or control registers between the FPS and an ARM register. Only 32 bits of information may be transferred using a single MCR class instruction.
NaN - Not a number. A symbolic entity encoded in a floating-point format. There are two types of NaN: signaling and non-signaling, or quiet. A signaling NaN causes an invalid operand exception if it is used as an operand. A quiet NaN propagates through almost every arithmetic operation without signaling an exception. The format for a NaN has an exponent field of all ones with a non-zero significand. To represent a signaling NaN, the most significant bit of the fraction is 0, whereas a quiet NaN has that bit set to 1.
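The encodings defined in this terminology list (denormal, infinity, signaling NaN, quiet NaN) can be illustrated for the single-precision format. The sketch below is an illustration only: the function name `classify_single` and the returned labels are hypothetical, and the field widths (8-bit exponent, 23-bit fraction) are those of the IEEE 754 single-precision format rather than anything specific to the FPS.

```python
def classify_single(bits):
    """Classify a 32-bit IEEE 754 single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    fraction = bits & 0x7FFFFF       # 23-bit fraction field
    if exponent == 0:
        # Zero exponent: zero, or a denormal whose leading significand bit is 0.
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        # Maximum exponent: infinity if the fraction is all zeros, else a NaN.
        if fraction == 0:
            return "infinity"
        # The most significant fraction bit separates quiet from signaling NaNs.
        return "quiet NaN" if fraction & 0x400000 else "signaling NaN"
    return "normal"
```

For example, the pattern 0x7FC00000 has an all-ones exponent and the top fraction bit set, so it is classified as a quiet NaN, while 0x7F800001 has that bit clear and is classified as a signaling NaN.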

【０１００】Reserved - A field in a control register or instruction format is "reserved" if the field is to be defined by the implementation. If the contents of the field are not zero, UNPREDICTABLE results occur. These fields are reserved for use in future extensions of the architecture, or are implementation specific. All reserved bits not used by the implementation must be written as zero and are read as zero.

【０１０１】Rounding mode - The IEEE 754 specification requires all calculations to be performed as if to infinite precision. That is, in a multiply of two single-precision values, the significand must be calculated exactly, to twice the number of bits of the significand. To represent this value in the destination precision, rounding of the significand is often required. The IEEE 754 standard specifies four rounding modes: round to nearest (RN), round to zero (RZ), round to plus infinity (RP), and round to minus infinity (RM). The first rounds at the halfway point, and in the halfway case rounds so that the least significant bit of the significand becomes zero, making it "even". The second effectively discards any bits to the right of the significand; it therefore always truncates, and is used by the C, C++, and Java languages for integer conversion. The latter two modes are used in interval arithmetic.
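The four rounding modes can be illustrated on the simpler problem of rounding an exact rational value to an integer. This is a sketch for intuition only, not the significand rounding performed by the hardware: the function name `round_to_integer` is hypothetical, and exact arithmetic is modeled with Python's `fractions.Fraction`.

```python
from fractions import Fraction
import math


def round_to_integer(value, mode):
    """Round an exact value to an integer under mode RN, RZ, RP or RM."""
    x = Fraction(value)
    if mode == "RZ":                 # round to zero: discard the fraction
        return math.trunc(x)
    if mode == "RP":                 # round to plus infinity
        return math.ceil(x)
    if mode == "RM":                 # round to minus infinity
        return math.floor(x)
    if mode == "RN":                 # round to nearest, halfway cases to even
        lower = math.floor(x)
        remainder = x - lower
        if remainder > Fraction(1, 2):
            return lower + 1
        if remainder < Fraction(1, 2):
            return lower
        return lower if lower % 2 == 0 else lower + 1
    raise ValueError("unknown rounding mode: " + mode)
```

Rounding 5/2 in RN mode gives 2 and rounding 7/2 gives 4, since halfway cases go to the even neighbor; rounding -5/2 gives -2 in RZ and RP modes and -3 in RM mode.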

【０１０２】Significand - The component of a binary floating-point number that consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right.

【０１０３】Support code - Software that must be used to complement the hardware in order to provide compatibility with the IEEE 754 standard. The support code is intended to have two components. One is a library of routines that perform operations beyond the scope of the hardware, such as transcendental computations, as well as supported functions such as divide with unsupported inputs or with inputs that may generate an exception. The other is a set of exception handlers that process exception conditions in order to provide IEEE 754 compliance. The support code is required to perform implemented functions in order to emulate proper handling of any unsupported data type or data representation (for example, denormal values or decimal data types). The routines may be written to use the FPS in their intermediate calculations, provided care is taken to restore the user's state on exit from the routine.

【０１０４】Trap - An exception condition whose respective exception enable bit is set in the FPSCR. The user's trap handler is executed.
UNDEFINED - Indicates an instruction that generates an undefined instruction trap. See the ARM Architecture Reference Manual for more information on ARM exceptions.
UNPREDICTABLE - The result of an instruction or control register field value that cannot be relied upon. UNPREDICTABLE instructions or results must not represent security holes, and must not halt the processor or any part of the system.

【０１０５】Unsupported data - Specific data values that are not processed by the hardware but are bounced to the support code for completion. These data may include infinities, NaNs, denormal values, and zeros. An implementation is free to choose which of these values it supports in hardware, fully or partially, and which require assistance from the support code to complete the operation. Any exception resulting from the processing of unsupported data is trapped to user code if the corresponding exception enable bit for that exception is set.

【０１０６】3. Register File

【０１０７】3.1 Introduction
The architecture provides 32 single-precision registers and 16 double-precision registers, all individually addressable within a fully defined 5-bit register index as source or destination operands.

【０１０８】The 32 single-precision registers overlap the 16 double-precision registers; that is, a write of double-precision data to D5 overwrites the contents of S10 and S11. It is the job of the compiler or the assembly language programmer to be aware of register usage conflicts, in an overlapped implementation, between the use of a register as single-precision data storage and its use as half of a double-precision data storage. No hardware is provided to restrict register use to one precision; if this is violated, the result is UNPREDICTABLE.
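The overlap between the two register views can be expressed as a simple index mapping. The sketch below is an illustration only, with a hypothetical function name; it assumes that double-precision register Dn occupies single-precision registers S(2n) and S(2n+1), which is consistent with the D5 example in the text.

```python
def single_registers_for_double(d_index):
    """Return the indices of the two S registers overlapped by register D<d_index>."""
    if not 0 <= d_index < 16:
        raise ValueError("double-precision register index must be 0..15")
    return (2 * d_index, 2 * d_index + 1)
```

Under this mapping, D5 corresponds to the pair (10, 11), i.e. S10 and S11, so a compiler tracking live single-precision values would treat a write to D5 as clobbering both.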

【０１０９】VFPv1 provides access to these registers in a scalar mode, in which one, two, or three operand registers are used to produce a result that is written to a destination register, or in a vector mode, in which the specified operands refer to a group of registers. VFPv1 supports vector operations on up to eight elements in a single instruction for single-precision operands, and on up to four elements for double-precision operands.

【０１１０】Table 1: LEN bit encoding

LEN   Vector length encoding
000   Scalar
001   Vector length 2
010   Vector length 3
011   Vector length 4
100   Vector length 5
101   Vector length 6
110   Vector length 7
111   Vector length 8

【０１１１】Vector mode is enabled by writing a non-zero value to the LEN field. If the LEN field contains 0, the FPS operates in scalar mode, and the register fields are interpreted as addressing 32 individual single-precision registers or 16 double-precision registers in a flat register model. If the LEN field is non-zero, the FPS operates in vector mode, and the register fields are interpreted as addressing vectors of registers. See Table 1 for the encoding of the LEN field.
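The LEN encoding of Table 1 reduces to adding one to the field value whenever it is non-zero. The following is a minimal sketch with a hypothetical function name; it decodes only the 3-bit field and does not model the separate limit of four elements that the text gives for double-precision operands.

```python
def decode_len(len_field):
    """Decode the 3-bit LEN field into (mode, number_of_elements)."""
    if not 0 <= len_field <= 0b111:
        raise ValueError("LEN is a 3-bit field")
    if len_field == 0:
        return ("scalar", 1)             # LEN = 000 selects scalar mode
    return ("vector", len_field + 1)     # LEN = 001..111 gives lengths 2..8
```

For example, a LEN value of 0b011 decodes to a vector of length 4, matching the fourth row of Table 1.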

【０１１２】A means of mixing scalar and vector operations without changing the LEN field is available through the specification of the destination register. Scalar operations may be specified while in vector mode if the destination register is in the first bank of registers (S0-S7 or D0-D3). See Section 1 for more information.

【０１１３】3.2 Single-Precision Register Usage
If the LEN field in the FPSCR is 0, 32 single-precision registers are available, numbered S0 to S31. Any of the registers may be used as a source or destination register.

【０１１４】[0114]

【外１】Illustration 1: Single-precision register map

【０１１５】The single-precision (coprocessor 10) register map can be drawn as shown in Illustration 1.

【０１１６】If the LEN field in the FPSCR is greater than 0, the register file behaves as four banks of 8 circulating registers, as shown in Illustration 2. The first bank of vector registers, V0 to V7, overlaps the scalar registers S0 to S7, and is addressed as scalars or as vectors according to the register selected for each operand. See Section 1, 3.4 Register Usage, for more information.

【０１１７】[0117]

【外２】Illustration 2: Circulation of single-precision registers

【０１１８】For example, if LEN in the FPSCR is set to 3, a reference to vector V10 includes registers S10, S11, S12, and S13 in the vector operation. Similarly, V22 includes S22, S23, S16, and S17 in the operation. When the register file is accessed in vector mode, the register that follows V7 in order is V0. Similarly, V8 follows V15, V16 follows V23, and V24 follows V31.
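The wrap-around described above can be sketched in Python. This is only an illustration, not part of the specification: the function name `vector_registers` is hypothetical, a stride of one register is assumed, and the element count is taken to be LEN + 1, which is inferred from the example (LEN = 3 selecting four registers) rather than stated in this paragraph.

```python
def vector_registers(first, len_field, bank_size=8):
    """Return the register numbers touched by a vector starting at `first`.

    Assumptions (not from the specification text): unit stride, and an
    element count of len_field + 1, as the LEN = 3 example suggests.
    """
    bank_base = (first // bank_size) * bank_size  # first register of the bank
    count = len_field + 1
    # Step through the bank, wrapping back to its first register at the end.
    return [bank_base + (first - bank_base + i) % bank_size
            for i in range(count)]


print(vector_registers(10, 3))  # [10, 11, 12, 13]
print(vector_registers(22, 3))  # [22, 23, 16, 17]
```

Passing `bank_size=4` gives the corresponding behavior for the double-precision banks of four registers.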

【０１１９】3.3 Use of Double-Precision Registers
If the LEN field of the FPSCR is 0, sixteen double-precision scalar registers are available.

【０１２０】[0120]

【外３】Illustration 3: Double-precision register map

【０１２１】Any register can be used as a source register or a destination register. The register map can be drawn as shown in Illustration 3.

【０１２２】If the LEN field of the FPSCR is greater than 0, the register file is arranged as four banks of four circular registers, as shown in Illustration 4, giving 4 scalar registers and 16 vector registers. The first bank of vector registers, V0 to V3, overlaps the scalar registers D0 to D3. The registers are addressed as scalars or as vectors depending on the register selected for each operand. See Section 3.4, Register Usage, for more information.

【０１２３】[0123]

【外４】Illustration 4: Circulation of double-precision registers

【０１２４】As in the single-precision example in Section 1, the double-precision registers circulate within each of the four banks.

【０１２５】3.4 Register Usage
The following operations between scalars and vectors are supported. (OP₂ may be any of the two-operand operations supported by the floating-point coprocessor; OP₃ may be any of the three-operand operations.)

【０１２６】In the following description, the "first bank" of the register file is defined as registers S0-S7 for single-precision operations and D0-D3 for double-precision operations.

【０１２７】
• ScalarD = OP₂ ScalarA, or ScalarD = ScalarA OP₃ ScalarB, or ScalarD = ScalarA * ScalarB + ScalarD
• VectorD = OP₂ ScalarA, or VectorD = ScalarA OP₃ VectorB, or VectorD = ScalarA * VectorB + VectorD
• VectorD = OP₂ VectorA, or VectorD = VectorA OP₃ VectorB, or VectorD = VectorA * VectorB + VectorD

【０１２８】3.4.1 Scalar Operations
The FPS operates in scalar mode under either of two conditions.

【０１２９】1. The LEN field of the FPSCR is 0. The destination and source registers may be any of scalar registers 0 through 31 for single-precision operations, or any of scalar registers 0 through 15 for double-precision operations. The operation is performed only on the registers explicitly specified in the instruction.

【０１３０】2. The destination register is in the first bank of the register file. The source scalars may be any of the other registers. This mode allows scalar and vector operations to be mixed without having to change the LEN field of the FPSCR.

【０１３１】3.4.2 Operations Involving a Scalar and a Vector Source with a Vector Destination
To operate in this mode, the LEN field of the FPSCR must be greater than zero and the destination register must not be in the first bank of the register file. The scalar source register may be any register in the first bank of the register file, while VectorB may be any of the remaining registers. If the source scalar register is a member of VectorB, or if VectorD overlaps VectorB in fewer than LEN elements, the behavior is UNPREDICTABLE. VectorD and VectorB must either be the same vector or be completely distinct in all members. See the summary tables in Section 3.4.4.

【０１３２】3.4.3 Operations Involving Only Vectors
To operate in this mode, the LEN field of the FPSCR must be greater than zero and the destination vector register must not be in the first bank of the register file. The individual elements of VectorA are combined with the corresponding elements of VectorB and written to VectorD. Any register not in the first bank of the register file is available for VectorA, while all vectors are available for VectorB. As in the second case, if either source vector and the destination vector overlap in fewer than LEN elements, the behavior is UNPREDICTABLE. They must either be the same vector or be completely distinct in all members. See the summary tables in Section 3.4.4.
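The overlap rule for source and destination vectors can be expressed as a small check. The sketch below is illustrative only: the helper names are hypothetical, a unit stride is assumed, and `count` stands for the number of elements in each vector, whose exact relation to the LEN field is defined elsewhere in the specification.

```python
def vector_members(first, count, bank_size=8):
    """Registers of a vector starting at `first`, wrapping within its bank."""
    base = (first // bank_size) * bank_size
    return [base + (first - base + i) % bank_size for i in range(count)]


def overlap_is_predictable(dest_first, src_first, count, bank_size=8):
    """True if the two vectors are identical or share no register at all.

    A partial overlap is the case the text describes as UNPREDICTABLE.
    """
    dest = vector_members(dest_first, count, bank_size)
    src = vector_members(src_first, count, bank_size)
    same = dest == src
    disjoint = not set(dest) & set(src)
    return same or disjoint


print(overlap_is_predictable(8, 8, 4))    # True: same vector
print(overlap_is_predictable(8, 16, 4))   # True: different banks
print(overlap_is_predictable(8, 10, 4))   # False: S10 and S11 are shared
```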

【０１３３】Note that for the FMAC family of operations, the destination register or vector is always the accumulate register or vector.

【０１３４】3.4.4 Operation Summary Tables
The following tables show the register usage options for single- and double-precision instructions with two and three operands. "Any" means that all registers of that precision are available for the specified operand.

【０１３５】Table 2: Single-precision three-operand register usage
LEN field | Destination register | First source register | Second source register | Operation type
0 | Any | Any | Any | S = S op S or S = S * S + S
Non-0 | 0-7 | Any | Any | S = S op S or S = S * S + S
Non-0 | 8-31 | 0-7 | Any | V = S op V or V = S * V + V
Non-0 | 8-31 | 8-31 | Any | V = V op V or V = V * V + V

【０１３６】Table 3: Single-precision two-operand register usage
LEN field | Destination register | Source register | Operation type
0 | Any | Any | S = op S
Non-0 | 0-7 | Any | S = op S
Non-0 | 8-31 | 0-7 | V = op S
Non-0 | 8-31 | 8-31 | V = op V

【０１３７】Table 4: Double-precision three-operand register usage
LEN field | Destination register | First source register | Second source register | Operation type
0 | Any | Any | Any | S = S op S or S = S * S + S
Non-0 | 0-3 | Any | Any | S = S op S or S = S * S + S
Non-0 | 4-15 | 0-3 | Any | V = S op V or V = S * V + V
Non-0 | 4-15 | 4-15 | Any | V = V op V or V = V * V + V

【０１３８】Table 5: Double-precision two-operand register usage
LEN field | Destination register | Source register | Operation type
0 | Any | Any | S = op S
Non-0 | 0-3 | Any | S = op S
Non-0 | 4-15 | 0-3 | V = op S
Non-0 | 4-15 | 4-15 | V = op V
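The selection logic in the three-operand summary tables (Tables 2 and 4) can be sketched as a lookup on the LEN field and the register numbers. This is a reading aid under stated assumptions, not the decoder of any implementation; the function name `operation_type` and its return strings are hypothetical.

```python
def operation_type(len_field, dest, first_source, double_precision=False):
    """Classify a three-operand operation as in Tables 2 and 4.

    The first bank is registers 0-7 (single precision) or 0-3 (double
    precision). The second source may be any register, so it does not
    affect the classification.
    """
    first_bank_size = 4 if double_precision else 8
    if len_field == 0 or dest < first_bank_size:
        return "S = S op S"        # scalar operation
    if first_source < first_bank_size:
        return "V = S op V"        # scalar source combined with a vector
    return "V = V op V"            # vector-only operation


print(operation_type(0, 12, 20))                        # S = S op S
print(operation_type(3, 5, 20))                         # S = S op S
print(operation_type(3, 12, 2))                         # V = S op V
print(operation_type(3, 6, 9, double_precision=True))   # V = V op V
```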

【０１３９】4. Instruction Set
FPS instructions can be divided into three categories.
• MCR and MRC - transfer operations between the ARM and the FPS
• LDC and STC - load and store operations between the FPS and memory
• CDP - data processing operations

【０１４０】4.1 Instruction Concurrency
The FPS architectural specification is intended to allow concurrency at two levels: pipelined functional units, and load/store operations executing in parallel with CDP functions. Supporting load and store operations that have no register dependencies on the operations currently being processed, so that they can execute in parallel with those operations, yields a significant performance benefit.

【０１４１】4.2 Instruction Serialization
The FPS specifies a single instruction that causes the FPS to busy-wait the ARM until all currently executing instructions have completed and the exception status of each is known. If an exception is pending, the serializing instruction is aborted and exception processing begins in the ARM. The serializing instruction in the FPS is:
• FMOVX - read or write of a floating-point system register

【０１４２】Any read or write of a floating-point system register is stalled until the current instructions have completed. An FMOVX to the system ID register (FPSID) triggers any exception caused by preceding floating-point instructions. A read/modify/write of the user status and control register (FPSCR), performed with FMOVX, can be used to clear the exception status bits (FPSCR[4:0]).

【０１４３】4.3 Conversions Involving Integer Data
Conversion between floating-point data and integer data is a two-step process in the FPS, consisting of a data transfer instruction involving the integer data and a CDP instruction that performs the conversion. If any arithmetic operation is attempted on integer data held in an FPS register while it is still in integer format, the result is UNPREDICTABLE, and any such operation should be avoided.

【０１４４】4.3.1 Conversion of Integer Data to Floating-Point Data in an FPS Register
Integer data can be loaded from any ARM register into a floating-point single-precision register using the MCR FMOVS instruction. The integer data in the FPS register can then be converted to a single- or double-precision floating-point value by the integer-to-float family of operations and written to the destination FPS register. The destination register may be the source register if the integer value is no longer needed. The integer may be a signed or unsigned 32-bit number.

【０１４５】4.3.2 Conversion of Floating-Point Data in an FPS Register to Integer Data
The float-to-integer family of instructions converts the value in an FPS single- or double-precision register to a signed or unsigned 32-bit integer format. The resulting integer is placed in a single-precision destination register. The integer data can then be stored to an ARM register using the MRC FMOVS instruction.

【０１４６】4.4 Register File Addressing
Instructions that operate in the single-precision space (S = 0) use the 5 bits available in the instruction field for operand access. The upper 4 bits are contained in the operand fields denoted Fn, Fm, or Fd. The least significant bit of the address is in N, M, or D, respectively.

【０１４７】Instructions that operate in the double-precision space (S = 1) use only the upper 4 bits of the operand address. These 4 bits are contained in the Fn, Fm, and Fd fields. The N, M, and D bits must contain 0 when the corresponding operand field contains an operand address.
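The addressing rule of Section 4.4 amounts to a small bit-field computation. The sketch below is a minimal illustration; the function name `register_number` and the choice to raise an exception for a non-zero low bit in the double-precision case are assumptions made here for clarity.

```python
def register_number(field, low_bit, double_precision):
    """Form a register number from a 4-bit field (Fn, Fm, or Fd) and its
    companion bit (N, M, or D).

    Single precision (S = 0): 5-bit address, field is the upper 4 bits.
    Double precision (S = 1): the 4-bit field is the whole address and the
    companion bit must be 0.
    """
    if not 0 <= field <= 0xF or low_bit not in (0, 1):
        raise ValueError("field must be 4 bits and low_bit must be 0 or 1")
    if double_precision:
        if low_bit != 0:
            raise ValueError("N, M, or D must be 0 for double precision")
        return field
    return (field << 1) | low_bit


print(register_number(0b0101, 1, double_precision=False))  # 11, i.e. S11
print(register_number(0b0101, 0, double_precision=True))   # 5, i.e. D5
```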

【０１４８】4.5 MCR (Move from ARM Register to Coprocessor)
MCR operations involve the transfer or use of data in ARM registers by the FPS. This includes moving data in single-precision format from an ARM register, or in double-precision format from a pair of ARM registers, to an FPS register; loading a signed or unsigned integer value from an ARM register into a single-precision FPS register; and loading a control register with the contents of an ARM register.

【０１４９】The format of the MCR instruction is shown in Illustration 5.

【外５】Illustration 5: MCR instruction format

【０１５０】Table 6: MCR bit field definitions
Bit field | Definition
Opcode | 3-bit operation code (see Table 7)
Rd | ARM source register encoding
S | Operation operand size: 0 - single-precision operand; 1 - double-precision operand
N | Single-precision operation: destination register least significant bit. Double-precision operation: must be set to 0, otherwise the operation is UNDEFINED. System register move: reserved
Fn | Single-precision operation: destination register address, upper 4 bits. Double-precision operation: destination register address. System register move: 0000 - FPID (coprocessor ID number); 0001 - FPSCR (user status and control register); 0100 - FPREG (register file content register). Other register encodings are reserved and may differ between implementations.
R | Reserved bit

【０１５１】Table 7: MCR opcode field definitions
Opcode field | Name | Operation
000 | FMOVS | Fn = Rd (32 bits, coprocessor 10)
000 | FMOVLD | Low(Fn) = Rd (double-precision lower 32 bits, coprocessor 11)
001 | FMOVHD | High(Fn) = Rd (double-precision upper 32 bits, coprocessor 11)
010-110 | Reserved |
111 | FMOVX | System register = Rd (coprocessor 10 space)

【０１５２】Note: Only 32-bit data operations are supported by the FMOV[S, HD, LD] instructions.

【０１５３】Only the data in an ARM register or a single-precision register is moved by an FMOVS operation. To transfer a double-precision operand from two ARM registers, the FMOVLD and FMOVHD instructions move the lower half and the upper half, respectively.
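The split of a double-precision operand into two 32-bit halves can be illustrated as follows. This sketch assumes the register holds a standard IEEE 754 binary64 bit pattern; the helper names `split_double` and `join_double` are hypothetical and simply mirror what an FMOVLD/FMOVHD pair would carry in each direction.

```python
import struct


def split_double(value):
    """Return (low, high): the lower and upper 32 bits of a binary64 value."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return low, high


def join_double(low, high):
    """Rebuild a binary64 value from its lower and upper 32-bit halves."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]


low, high = split_double(1.0)
print(hex(low), hex(high))        # 0x0 0x3ff00000
print(join_double(low, high))     # 1.0
```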

【０１５４】4.6 MRC (Move from Coprocessor to ARM Register / Compare Floating Registers)
MRC operations involve the transfer of data in an FPS register to an ARM register. This includes moving a single-precision value, or the result of converting a floating-point value to an integer, to one ARM register; moving a double-precision FPS register to two ARM registers; and modifying the status bits of the CPSR with the result of a previous floating-point compare operation. The format of the MRC instruction is shown in Illustration 6.

【０１５５】[0155]

【外６】Illustration 6: MRC instruction format

【０１５６】Table 8: MRC bit field definitions
Bit field | Definition
Opcode | 3-bit FPS operation code (see Table 9)
Rd | ARM destination* register encoding
S | Operation operand size: 0 - single-precision operand; 1 - double-precision operand
N | Single-precision operation: destination register least significant bit. Double-precision operation: must be set to 0, otherwise the operation is UNDEFINED. System register move: reserved
M | Reserved
Fn | Single-precision operation: destination register address, upper 4 bits. Double-precision operation: destination register address. System register move: 0000 - FPID (coprocessor ID number); 0001 - FPSCR (user status and control register); 0100 - FPREG (register file content register). Other register encodings are reserved and may differ between implementations.
Fm | Reserved
R | Reserved

【０１５７】* For the FMOVX FPSCR instruction, if the Rd field contains R15 (1111), the upper 4 bits of the CPSR are updated with the resulting condition codes.

【０１５８】Table 9: MRC opcode field definitions
Opcode field | Name | Operation
000 | FMOVS | Rd = Fn (32 bits, coprocessor 10)
000 | FMOVLD | Rd = Low(Fn). The lower 32 bits of Dn are transferred (double-precision lower 32 bits, coprocessor 11)
001 | FMOVHD | Rd = High(Fn). The upper 32 bits of Dn are transferred (double-precision upper 32 bits, coprocessor 11)
010-110 | Reserved |
111 | FMOVX | Rd = system register

【０１５９】Note: See the note for the MCR FMOV instructions.

【０１６０】4.7 LDC/STC (Load/Store FPS Registers)
LDC and STC operations transfer data between the FPS and memory. Floating-point data can be transferred in either precision, in a single data transfer or in multiple data transfers, with the ARM address register either updated or left unchanged. Both full descending stacks and empty ascending stacks are supported, along with multiple-operand access to data structures in the move multiple operations. See Table 11 for a description of the various options for LDC and STC. The format of the LDC and STC instructions is shown in Illustration 7.

【０１６１】[0161]

【外７】Illustration 7: LDC/STC instruction format

【０１６２】Table 10: LDC/STC bit field definitions
Bit field | Definition
P | Pre/post indexing (0 = post, 1 = pre)
U | Up/down bit (0 = down, 1 = up)
D | Single-precision operation: source/destination register least significant bit. Double-precision operation: must be set to 0
W | Write-back bit (0 = no write-back, 1 = write-back)
L | Direction bit (0 = store, 1 = load)
Rn | ARM base register encoding
Fd | Single-precision operation: source/destination register address, upper 4 bits. Double-precision operation: source/destination register address
S | Operation operand size: 0 - single-precision operand; 1 - double-precision operand
Offset/transfer count | Unsigned 8-bit offset, or the number of single-precision registers to transfer for FLDM(IA/DB) and FSTM(IA/DB) (twice the count of double-precision registers). The maximum number of words in a transfer is 16, allowing for 16 single-precision values or 8 double-precision values.

【０１６３】4.7.1 General Notes on Load and Store Operations
Loads and stores of multiple registers proceed linearly through the register file, without wrapping across the four- or eight-register boundaries used by vector operations. Attempting to load past the end of the register file is UNPREDICTABLE.

【０１６４】If the offset for a double-precision load multiple or store multiple contains an odd register count of 17 or less, the implementation may write or read one additional 32-bit data item, but is not required to do so. The additional data item can be used to identify the contents of the registers as they are loaded or stored. This is useful in implementations in which the register file format differs from the IEEE 754 format for that precision and each register carries type information that is needed to identify it in memory. If the offset is odd and the count is greater than the number of single-precision registers, this may be used to initiate a context switch of the registers and all system registers.

【０１６５】Table 11: Load and store addressing mode options

Type 0 transfer: load/store multiple with no write-back.
P = 0, W = 0. Offset/transfer count field: number of registers to transfer. Name: load/store multiple.
Addressing mode: FLDM<cond><S/D> Rn, <register list>
FSTM<cond><S/D> Rn, <register list>
Loads or stores multiple registers starting at the address in Rn, with no modification of Rn. The number of registers may be 1 to 16 for single precision and 1 to 8 for double precision. The offset field contains the number of 32-bit transfers. This mode can be used to load a transformation matrix for graphics operations and a point to be transformed.
Examples: FLDMEQS r12, {f8-f11} loads four single-precision values from the address in r12 into the four floating-point registers s8, s9, s10, and s11. r12 is unchanged. FSTMEQD r4, {f0} stores one double-precision value from d0 to the address in r4. r4 is unchanged.

Type 1 transfer: load/store multiple with post-index of Rn and write-back.
P = 0, W = 1. Offset/transfer count field: number of registers to transfer. Name: load/store multiple.
Addressing mode: FLDM<cond>IA<S/D> Rn!, <register list>
FSTM<cond>IA<S/D> Rn!, <register list>
Loads or stores multiple registers starting at the address in Rn, and writes back to Rn the next address after the last transfer. The offset field is the number of 32-bit transfers. The write-back to Rn is Offset*4. The maximum number of words transferred in a load multiple is 16. The U bit must be set to 1. This mode is used to store to an empty ascending stack or load from a full descending stack, to store a transformed point and advance the pointer to the next point, and to load and store multiple data items in filter operations.
Example: FLDMEQIAS r13!, {f12-f15} loads four single-precision values from the address in r13 into the four floating-point registers s12, s13, s14, and s15, and updates r13 with the address pointing to the next data in the series.

Type 2 transfer: load/store one register with pre-index of Rn and no write-back.
P = 1, W = 0. Offset/transfer count field: offset. Name: load/store with offset.
Addressing mode: FLD<cond><S/D> [Rn, #+/-offset], Fd
FST<cond><S/D> [Rn, #+/-offset], Fd
Loads or stores a single register with pre-increment of the address in Rn and no write-back. The offset value is Offset*4, which is added to (U = 1) or subtracted from (U = 0) Rn to generate the address. This is useful for operand access into a structure and is the typical method used to access memory for floating-point data.
Example: FSTEQD f4, [r8, #+8] stores the double-precision value in d4 to the address in r8 offset by 32 (8*4). r8 is unchanged.

Type 3 transfer: load/store multiple registers with pre-index and write-back.
P = 1, W = 1. Offset/transfer count field: number of registers to transfer. Name: load/store multiple with pre-decrement.
Addressing mode: FLDM<cond>DB<S/D> Rn!, <register list>
FSTM<cond>DB<S/D> Rn!, <register list>
Loads or stores multiple registers with pre-decrement of the address in Rn and write-back of the new target address to Rn. The offset field contains the number of 32-bit transfers. The write-back value is Offset*4, subtracted from Rn. This mode is used to store to a full descending stack or to load from an empty ascending stack.
Example: FSTMEQDBS r9!, {f27-f29} stores three single-precision values from s27, s28, and s29 to a full descending stack whose last entry address is held in r9. r9 is updated to point to the new last entry.
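The address arithmetic of the four transfer types in Table 11 can be summarized in a short sketch. It models only the starting address and the final value of Rn, not the data movement; the function name `transfer_address` is hypothetical, and the sketch does not reject the P/W/U combinations that the specification marks as UNDEFINED.

```python
def transfer_address(p, w, u, offset, rn):
    """Return (first_address, rn_after) for an LDC/STC transfer.

    `offset` is the 8-bit offset/transfer-count field; addresses move in
    units of Offset * 4 bytes, as described in Table 11.
    """
    step = offset * 4
    if (p, w) == (0, 0):          # Type 0: multiple, no write-back
        return rn, rn
    if (p, w) == (0, 1):          # Type 1: post-index with write-back (IA)
        return rn, rn + step
    if (p, w) == (1, 0):          # Type 2: single register with offset
        address = rn + step if u else rn - step
        return address, rn
    start = rn - step             # Type 3: pre-decrement with write-back (DB)
    return start, start


# FLDMEQIAS r13!, {f12-f15}: four words, r13 advances by 16 bytes.
print(transfer_address(0, 1, 1, 4, 0x1000))   # (4096, 4112)
# FSTEQD f4, [r8, #+8]: address is r8 + 32, r8 unchanged.
print(transfer_address(1, 0, 1, 8, 0x2000))   # (8224, 8192)
# FSTMEQDBS r9!, {f27-f29}: r9 drops by 12 bytes before the store.
print(transfer_address(1, 1, 0, 3, 0x3000))   # (12276, 12276)
```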

【０１６６】4.7.2 Summary of LDC/STC Operations
Table 12 lists the allowable combinations of the P, W, and U bits of the LDC/STC operation code, and the function of the offset field for each valid operation.

【０１６７】Table 12: Summary of LDC/STC operations
P | W | U | Offset field | Operation
0 | 0 | 0 | | UNDEFINED
0 | 0 | 1 | Register count | FLDM/FSTM
0 | 1 | 0 | | UNDEFINED
0 | 1 | 1 | Register count | FLDMIA/FSTMIA
1 | 0 | 0 | Offset | FLD/FST
1 | 0 | 1 | Offset | FLD/FST
1 | 1 | 0 | Register count | FLDMDB/FSTMDB
1 | 1 | 1 | | UNDEFINED
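Table 12 maps directly onto a lookup keyed by the P, W, and U bits. The following is a minimal sketch of such a lookup; the names `LDC_STC_OPERATIONS` and `decode_ldc_stc` are hypothetical and the table contents are taken from Table 12 only.

```python
# (P, W, U) -> (operation, meaning of the offset field), from Table 12.
LDC_STC_OPERATIONS = {
    (0, 0, 1): ("FLDM/FSTM", "register count"),
    (0, 1, 1): ("FLDMIA/FSTMIA", "register count"),
    (1, 0, 0): ("FLD/FST", "offset"),
    (1, 0, 1): ("FLD/FST", "offset"),
    (1, 1, 0): ("FLDMDB/FSTMDB", "register count"),
}


def decode_ldc_stc(p, w, u):
    """Return (operation, offset_field_meaning); UNDEFINED if not listed."""
    return LDC_STC_OPERATIONS.get((p, w, u), ("UNDEFINED", None))


print(decode_ldc_stc(0, 1, 1))  # ('FLDMIA/FSTMIA', 'register count')
print(decode_ldc_stc(1, 1, 1))  # ('UNDEFINED', None)
```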

【０１６８】4.8 CDP (Coprocessor Data Processing)
CDP instructions include all data processing operations that take operands from the floating-point register file and produce a result that is written back to the register file. Of particular interest is the FMAC (multiply-accumulate chained) operation, which multiplies two of its operands and adds the third. This operation differs from a fused multiply-accumulate operation in that an IEEE rounding operation is performed on the product before the third operand is added. This allows Java code to use the FMAC operation to speed up multiply-accumulate operations compared with performing a separate multiply followed by an add.
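The difference between a chained and a fused multiply-accumulate can be seen numerically. The sketch below emulates single-precision rounding in Python by packing values through `struct`; it is an illustration of the rounding behavior only, and the function names are hypothetical. The operand values are chosen here so that the product falls exactly halfway between two single-precision values.

```python
import struct


def round_to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def chained_mac(a, b, c):
    """Multiply, round the product, then add and round again (chained)."""
    product = round_to_single(a * b)
    return round_to_single(product + c)


def fused_mac(a, b, c):
    """Multiply and add with a single rounding at the end (fused).

    For binary32 inputs the binary64 product is exact, and the sum is exact
    for the values used below, so only the final rounding applies.
    """
    return round_to_single(a * b + c)


a = b = 1.0 + 2.0 ** -12      # exactly representable in binary32
c = -(1.0 + 2.0 ** -11)       # exactly representable in binary32
print(chained_mac(a, b, c))   # 0.0: the 2 ** -24 term is lost when the product is rounded
print(fused_mac(a, b, c) == 2.0 ** -24)  # True: the term survives
```

With these inputs the chained form returns zero while the fused form retains the low-order term, which is the distinction the paragraph above draws.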

【０１６９】Two instructions in the CDP group are useful for converting a floating-point value in an FPS register to its integer value. FFTOUI[S/D] converts the contents of a single- or double-precision register to an unsigned integer in an FPS register, using the current rounding mode in the FPSCR. FFTOSI[S/D] performs the conversion to a signed integer. FFTOUIZ[S/D] and FFTOSIZ[S/D] perform the same functions but ignore the FPSCR rounding mode for the conversion and truncate any fraction bits. The behavior of FFTOSIZ[S/D] is what C, C++, and Java require for conversion from floating point to integer. The FFTOSIZ[S/D] instructions provide this capability without requiring the rounding mode bits in the FPSCR to be changed to RZ for the conversion. The cycle count for the conversion is thereby reduced to that of the FFTOSIZ[S/D] operation alone, saving 4 to 6 cycles.
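The distinction between converting with the current rounding mode and converting with truncation can be sketched as follows. This is a rough illustration under assumptions: round-to-nearest-even is used as an example of a "current rounding mode", out-of-range and NaN handling are not modeled because they are not described here, and the function names are hypothetical.

```python
import math


def to_signed_truncate(x):
    """Truncate toward zero, the behavior described for FFTOSIZ[S/D]."""
    return math.trunc(x)


def to_signed_round_nearest(x):
    """Round to nearest, ties to even (an assumed current rounding mode)."""
    return round(x)


print(to_signed_truncate(2.7), to_signed_truncate(-2.7))             # 2 -2
print(to_signed_round_nearest(2.7), to_signed_round_nearest(-2.7))   # 3 -3
print(to_signed_round_nearest(2.5), to_signed_round_nearest(3.5))    # 2 4
```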

【０１７０】[0170] Compare operations are performed using a CDP CMP instruction followed by an MRC FMOVX FPSCR instruction, which loads the ARM CPSR flag bits with the resulting FPS flag bits. The compare operations are provided with and without the potential for an INVALID exception when one of the compare operands is a NaN. FCMP and FCMP0 do not signal INVALID if one of the compare operands is a NaN, whereas FCMPE and FCMPE0 do signal the exception. FCMP0 and FCMPE0 compare the operand in the Fm field with 0 and set the FPS flags accordingly. The ARM flags N, Z, C, and V are defined as follows after an FMOVX FPSCR operation.

【０１７１】[0171]
N  Less than
Z  Equal
C  Greater than or equal, or unordered
V  Unordered
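The flag definitions can be written out as a short Python sketch. The function name `compare_flags` and the dictionary return type are illustrative assumptions; the sketch models only the flag values, not the exception signaling that distinguishes FCMP from FCMPE.

```python
import math


def compare_flags(fd: float, fm: float) -> dict:
    """Return the N, Z, C, V flags for a comparison of fd with fm."""
    unordered = math.isnan(fd) or math.isnan(fm)
    return {
        "N": (not unordered) and fd < fm,   # less than
        "Z": (not unordered) and fd == fm,  # equal
        "C": unordered or fd >= fm,         # greater than or equal, or unordered
        "V": unordered,                     # unordered (a NaN operand)
    }


print(compare_flags(1.0, 2.0))
print(compare_flags(float("nan"), 2.0))
```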

【０１７２】[0172] The format of the CDP instruction is shown in Illustration 8.

【０１７３】[0173]

【外８】Illustration 8: Format of the CDP instruction

【０１７４】[0174] Table 13. CDP bit field definitions
Bit field  Definition
Opcode     4-bit FPS operation code (see Table 14)
D          Single-precision operations: least significant bit of the destination register
           Double-precision operations: must be set to 0
Fn         Single-precision operations: upper 4 bits of the source A register, or extension of the opcode (most significant 4 bits)
           Double-precision operations: source A register address, or extension of the opcode (most significant 4 bits)
Fd         Single-precision operations: upper 4 bits of the destination register
           Double-precision operations: destination register address
S          Operation operand size: 0 - single-precision operands; 1 - double-precision operands
N          Single-precision operations: least significant bit of the source A register, or extension of the opcode (least significant bit)
           Double-precision operations: must be set to 0, or extension of the opcode (least significant bit)
M          Single-precision operations: least significant bit of the source B register
           Double-precision operations: must be set to 0
Fm         Single-precision operations: upper 4 bits of the source B register address
           Double-precision operations: source B register address
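The way a register number is assembled from a 4-bit field and its companion bit (Fd with D, Fn with N, Fm with M) can be sketched in Python as follows. The function name `register_index` is hypothetical, and the sketch assumes the 4-bit field supplies the upper bits and the single bit supplies the least significant bit, as the table describes for single-precision operations.

```python
def register_index(field4: int, low_bit: int, double_precision: bool) -> int:
    """Combine a 4-bit register field with its companion bit (D, N, or M)."""
    if not 0 <= field4 <= 0xF:
        raise ValueError("register field must fit in 4 bits")
    if low_bit not in (0, 1):
        raise ValueError("companion bit must be 0 or 1")
    if double_precision:
        # For double-precision operations the companion bit must be 0 and
        # the 4-bit field alone is the register address.
        if low_bit != 0:
            raise ValueError("companion bit must be 0 for double precision")
        return field4
    # For single-precision operations the field holds the upper 4 bits and
    # the companion bit is the least significant bit: {field[3:0], bit}.
    return (field4 << 1) | low_bit


print(register_index(0b0100, 1, double_precision=False))  # single-precision register 9
print(register_index(0b0100, 0, double_precision=True))   # double-precision register 4
```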

【０１７５】[0175] 4.8.1 Opcode
Table 14 lists the primary opcodes for the CDP instructions. All mnemonics have the form [OPERATION][COND][S/D].

【０１７６】[0176] Table 14. CDP opcode specification
Opcode     Operation name  Operation
0000       FMAC            Fd = Fn * Fm + Fd
0001       FNMAC           Fd = -(Fn * Fm + Fd)
0010       FMSC            Fd = Fn * Fm - Fd
0011       FNMSC           Fd = -(Fn * Fm - Fd)
0100       FMUL            Fd = Fn * Fm
0101       FNMUL           Fd = -(Fn * Fm)
0110       FSUB            Fd = Fn - Fm
0111       FNSUB           Fd = -(Fn - Fm)
1000       FADD            Fd = Fn + Fm
1001-1011  Reserved
1100       FDIV            Fd = Fn / Fm
1101       FRDIV           Fd = Fm / Fn
1110       FRMD            Fd = Fn % Fm (the fraction remaining after Fd = Fn / Fm)
1111       Extend          Use the Fn register field to specify the operation for two-operand operations (see Table 15)

【０１７７】[0177] 4.8.2 Extended operations
Table 15 lists the extended operations available using the Extend value in the opcode field. With the exception of the serializing and FLSCB instructions, all instructions have the form [OPERATION][COND][S/D]. The instruction encoding for an extended operation is formed in the same way as the index into the register file for the Fn operand, that is, {Fn[3:0], N}.

【０１７８】[0178] Table 15. CDP extended operations
Fn|N         Name            Operation
00000        FCPY            Fd = Fm
00001        FABS            Fd = abs(Fm)
00010        FNEG            Fd = -(Fm)
00011        FSQRT           Fd = sqrt(Fm)
00100-00111  Reserved
01000        FCMP*           Flags := Fd ⇔ Fm
01001        FCMPE*          Flags := Fd ⇔ Fm, with exception reporting
01010        FCMP0*          Flags := Fd ⇔ 0
01011        FCMPE0*         Flags := Fd ⇔ 0, with exception reporting
01100-01110  Reserved
01111        FCVTD<cond>S*   Fd (double-precision register encoding) = Fm (single-precision register encoding) converted from single to double precision (coprocessor 10)
01111        FCVTS<cond>D*   Fd (single-precision register encoding) = Fm (double-precision register encoding) converted from double to single precision (coprocessor 11)
10000        FUITO*          Fd = unsigned integer (Fm) converted to single/double
10001        FSITO*          Fd = signed integer (Fm) converted to single/double
10010-10111  Reserved
11000        FFTOUI*         Fd = (Fm) converted to unsigned integer {current RMODE}
11001        FFTOUIZ*        Fd = (Fm) converted to unsigned integer {RZ mode}
11010        FFTOSI*         Fd = (Fm) converted to signed integer {current RMODE}
11011        FFTOSIZ*        Fd = (Fm) converted to signed integer {RZ mode}
11100-11111  Reserved

【０１７９】[0179] * Non-vectorizable operation. The LEN field is ignored and a scalar operation is performed on the specified registers.
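The {Fn[3:0], N} encoding of the extended operations lends itself to a simple lookup. The following Python sketch is illustrative only: the names `EXTENDED_OPS` and `extended_op_name` are hypothetical, and the 01111 entry is shown as a combined label because the two conversions share that code and are distinguished by coprocessor number.

```python
# Extended operation names keyed by the 5-bit value {Fn[3:0], N}.
EXTENDED_OPS = {
    0b00000: "FCPY",
    0b00001: "FABS",
    0b00010: "FNEG",
    0b00011: "FSQRT",
    0b01000: "FCMP",
    0b01001: "FCMPE",
    0b01010: "FCMP0",
    0b01011: "FCMPE0",
    0b01111: "FCVTD/FCVTS",  # selected by coprocessor number (10 or 11)
    0b10000: "FUITO",
    0b10001: "FSITO",
    0b11000: "FFTOUI",
    0b11001: "FFTOUIZ",
    0b11010: "FFTOSI",
    0b11011: "FFTOSIZ",
}


def extended_op_name(fn: int, n: int) -> str:
    """Look up an extended operation from the Fn field and the N bit."""
    code = ((fn & 0xF) << 1) | (n & 1)  # form {Fn[3:0], N}
    return EXTENDED_OPS.get(code, "reserved")


print(extended_op_name(0b1101, 1))  # code 11011
print(extended_op_name(0b0010, 0))  # code 00100 lies in a reserved range
```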

【０１８０】[0180] 5. System registers

【０１８１】[0181] 5.1 System ID register (FPSID)
The FPSID contains the FPS architecture and an implementation-defined identification value. This word can be used to determine the model, feature set, and revision of the FPS, as well as the mask set number. The FPSID is read-only; writes to the FPSID are ignored. See Illustration 9 for the layout of the FPSID register.

【０１８２】[0182]

【外９】Illustration 9: FPSID register encoding

【０１８３】[0183] 5.2 User status and control register (FPSCR)
The FPSCR register contains user-accessible configuration bits and exception status bits. The configuration options include the exception enable bits, rounding control, vector stride and length, the handling of denormal operands and results, and the use of debug mode. This register allows user and operating system code to configure the FPS and to interrogate the status of completed operations. It must be saved and restored during a context switch. Bits 31 to 28 contain the flag values from the most recent compare instruction and can be accessed using a read of the FPSCR. The FPSCR is shown in Illustration 10.

【０１８４】[0184]

【外１０】Illustration 10: User status and control register (FPSCR)

【０１８５】[0185] 5.2.1 Compare status and processing control byte
This byte contains the result of the most recent compare operation in bits 31 to 28, together with control bits useful in specifying the arithmetic response of the FPS in special circumstances. The format of the compare status and processing control byte is shown in Illustration 11.

【０１８６】[0186]

【外１１】Illustration 11: FPSCR compare status and processing control byte

【０１８７】[0187] Table 16. Field definitions of the FPSCR compare status and processing control byte
Register bit  Name      Function
31            N         Compare result was less than
30            Z         Compare result was equal
29            C         Compare result was greater than or equal, or unordered
28            V         Compare result was unordered
27:25         Reserved
24            FZ        Flush to zero
                        0: IEEE 754 underflow handling (default)
                        1: Flush tiny results to zero. If a result is smaller than the normal range for the destination precision, zero is written to the destination. The UNDERFLOW exception trap is not taken.

【０１８８】[0188] 5.2.2 System control byte
The system control byte controls the rounding mode, vector stride, and vector length fields. The bits are specified as shown in Illustration 12. The VFPv1 architecture incorporates a register file striding mechanism for use with vector operations. If the STRIDE bits are set to 00, the next register selected in a vector operation is the register immediately following the previous register in the register file. The normal register file wrapping mechanism is unaffected by the stride value. A STRIDE of 11 increments all of the input registers and the output register by 2.

【０１８９】[0189] For example, with STRIDE set to 11 and, as the four operations imply, a vector length of 4, FMULEQS F8, F16, F24 performs the following non-vector operations:
FMULEQS F8, F16, F24
FMULEQS F10, F18, F26
FMULEQS F12, F20, F28
FMULEQS F14, F22, F30
In effect, the operands of the multiply "stride" through the register file by two registers rather than by one.
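The expansion of a strided vector instruction into its scalar steps can be sketched in Python. The function name `expand_vector_op` and its parameters are hypothetical, and the register file wrapping mentioned in the text is not modeled, so the sketch only applies while every register number stays inside its bank.

```python
def expand_vector_op(mnemonic: str, fd: int, fn: int, fm: int,
                     length: int, stride: int) -> list:
    """List the scalar operations performed by one strided vector operation."""
    steps = []
    for i in range(length):
        offset = i * stride  # every register advances by the stride each iteration
        steps.append(f"{mnemonic} F{fd + offset}, F{fn + offset}, F{fm + offset}")
    return steps


for step in expand_vector_op("FMULEQS", 8, 16, 24, length=4, stride=2):
    print(step)
```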

【０１９０】[0190]

【外１２】Illustration 12: FPSCR system control byte

【０１９１】[0191] Table 17. Field definitions of the FPSCR system control byte
Register bit  Name      Function
23:22         RMODE     Rounding mode setting
                        00: RN (round to nearest, default)
                        01: RP (round toward plus infinity)
                        10: RM (round toward minus infinity)
                        11: RZ (round toward zero)
21:20         STRIDE    Vector register access stride
                        00: 1 (default)
                        01: Reserved
                        10: Reserved
                        11: 2
19            Reserved (R)
18:16         LEN       Vector length. Specifies the length for vector operations. (Not all encodings are available in each implementation.)
                        000: 1 (default)
                        001: 2
                        010: 3
                        011: 4
                        100: 5
                        101: 6
                        110: 7
                        111: 8
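Decoding the system control byte from a 32-bit FPSCR value can be sketched as follows. This is a hedged illustration: the function name `decode_system_control` is hypothetical, and the sketch assumes that RMODE occupies bits 23:22 (a two-bit field directly above STRIDE in bits 21:20), with LEN in bits 18:16.

```python
RMODE_NAMES = ("RN", "RP", "RM", "RZ")  # encodings 00, 01, 10, 11
STRIDE_VALUES = {0b00: 1, 0b11: 2}      # encodings 01 and 10 are reserved


def decode_system_control(fpscr: int) -> dict:
    """Extract the rounding mode, stride, and vector length from an FPSCR value."""
    rmode_bits = (fpscr >> 22) & 0b11    # assumed position: bits 23:22
    stride_bits = (fpscr >> 20) & 0b11   # bits 21:20
    len_bits = (fpscr >> 16) & 0b111     # bits 18:16
    if stride_bits not in STRIDE_VALUES:
        raise ValueError("reserved STRIDE encoding")
    return {
        "rmode": RMODE_NAMES[rmode_bits],
        "stride": STRIDE_VALUES[stride_bits],
        "len": len_bits + 1,             # encoding 000 means a length of 1
    }


example = (0b11 << 22) | (0b11 << 20) | (0b011 << 16)
print(decode_system_control(example))
```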

【０１９２】[0192] 5.2.3 Exception enable byte
The exception enable byte occupies bits 15:8 and contains the enables for the exception traps. The bits are specified as shown in Illustration 13. The exception enable bits conform to the requirements of the IEEE 754 specification for the handling of floating-point exception conditions. If a bit is set, the exception is enabled, and the FPS signals a user-visible trap to the operating system if the exception condition occurs on the current instruction. If a bit is cleared, the exception is not enabled; when the exception condition occurs, the FPS does not signal a user-visible trap to the operating system but produces a mathematically reasonable result. The exception enable bits are disabled by default. See the IEEE 754 standard for more information on exception handling.

【０１９３】[0193] Some implementations bounce to the support code to handle exception conditions that are beyond the capability of the hardware, even when the exception is disabled. This is generally invisible to user code.

【０１９４】[0194]

【外１３】Illustration 13: FPSCR exception enable byte

【０１９５】[0195] Table 18. Fields of the FPSCR exception enable byte
Register bit  Name      Function
15:13         Reserved
12            IXE       Inexact enable bit. 0: disabled (default); 1: enabled
11            UFE       Underflow enable bit. 0: disabled (default); 1: enabled
10            OFE       Overflow enable bit. 0: disabled (default); 1: enabled
9             DZE       Divide-by-zero enable bit. 0: disabled (default); 1: enabled
8             IOE       Invalid operand enable bit. 0: disabled (default); 1: enabled

【０１９６】[0196] 5.2.4 Exception status byte
The exception status byte occupies bits 7:0 of the FPSCR and contains the exception status flag bits. There are five exception status flag bits, one for each floating-point exception. These bits are "sticky": once set by a detected exception, they must be cleared by an FMOVX write to the FPSCR or by an FSERIALCL instruction. The bits are specified as shown in Illustration 14. In the case of an enabled exception, the corresponding exception status bit is not set automatically; it is the task of the support code to set the appropriate exception status bit as needed. Some exceptions may be automatic; that is, when the exception condition is detected, the FPS bounces on the subsequent floating-point instruction regardless of how the exception enable bit is set. This allows some of the more involved exception handling required by the IEEE 754 standard to be performed in software rather than in hardware. An example is an underflow condition with the FZ bit set to 0. In this case, the correct result may be a denormalized number, depending on the exponent of the result and the rounding mode. The FPS allows implementers to select the response, including the option to bounce and use the support code to produce the correct result and write this value to the destination register. If the underflow exception enable bit is set, the user's trap handler is called after the support code has completed the operation. This code may alter the state of the FPS and return, or it may terminate the process.

【０１９７】[0197]

【外１４】Illustration 14: FPSCR exception status byte

【０１９８】[0198] Table 19. Field definitions of the FPSCR exception status byte
Register bit  Name      Function
7:5           Reserved
4             IXC       Inexact exception detected
3             UFC       Underflow exception detected
2             OFC       Overflow exception detected
1             DZC       Divide-by-zero exception detected
0             IOC       Invalid operation exception detected
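The "sticky" behavior of the exception status bits can be modeled with a few bit operations. This Python sketch is an assumption-laden illustration: the helper names `record_exception`, `pending_exceptions`, and `clear_status` are hypothetical, and the explicit clear stands in for the software write that the text requires to reset the bits.

```python
# Bit positions of the exception status flags within the FPSCR.
STATUS_BITS = {"IXC": 4, "UFC": 3, "OFC": 2, "DZC": 1, "IOC": 0}


def record_exception(fpscr: int, name: str) -> int:
    """Set a status flag; once set it stays set until explicitly cleared."""
    return fpscr | (1 << STATUS_BITS[name])


def pending_exceptions(fpscr: int) -> list:
    """List the names of the status flags that are currently set."""
    return [name for name, bit in STATUS_BITS.items() if (fpscr >> bit) & 1]


def clear_status(fpscr: int) -> int:
    """Clear all five status flags, as an explicit software write would."""
    return fpscr & ~0b11111


fpscr = 0
fpscr = record_exception(fpscr, "DZC")
fpscr = record_exception(fpscr, "IXC")  # earlier flags remain set
print(pending_exceptions(fpscr))
print(pending_exceptions(clear_status(fpscr)))
```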

【０１９９】[0199] 5.3 Register file content register (FPREG)
The register file content register is a privileged register. A debugger can use the information it contains to present the contents of the registers properly, as interpreted by the currently executing program. The FPREG contains 16 bits, one bit for each double-precision register in the register file. If a bit is set, the physical register pair represented by that bit is to be displayed as a double-precision register. If the bit is clear, the physical register is uninitialized or contains one or two single-precision data values.

【０２００】[0200]

【外１５】Illustration 15: FPREG register encoding

【０２０１】[0201] Table 20. FPREG bit field definitions
FPREG bit  Bit set    Bit clear
C0         D0 valid   S1 and S0 valid, or uninitialized
C1         D1 valid   S3 and S2 valid, or uninitialized
C2         D2 valid   S5 and S4 valid, or uninitialized
C3         D3 valid   S7 and S6 valid, or uninitialized
C4         D4 valid   S9 and S8 valid, or uninitialized
C5         D5 valid   S11 and S10 valid, or uninitialized
C6         D6 valid   S13 and S12 valid, or uninitialized
C7         D7 valid   S15 and S14 valid, or uninitialized
C8         D8 valid   S17 and S16 valid, or uninitialized
C9         D9 valid   S19 and S18 valid, or uninitialized
C10        D10 valid  S21 and S20 valid, or uninitialized
C11        D11 valid  S23 and S22 valid, or uninitialized
C12        D12 valid  S25 and S24 valid, or uninitialized
C13        D13 valid  S27 and S26 valid, or uninitialized
C14        D14 valid  S29 and S28 valid, or uninitialized
C15        D15 valid  S31 and S30 valid, or uninitialized
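A debugger's use of the FPREG bits can be sketched in Python. The function name `describe_register_file` is hypothetical, and the sketch assumes that bit Cn sits at bit position n of the register value, which the text does not confirm because the register layout illustration is not reproduced here.

```python
def describe_register_file(fpreg: int) -> list:
    """Return, for each physical register pair, how it should be displayed."""
    views = []
    for n in range(16):
        if (fpreg >> n) & 1:
            # Bit set: the pair holds one double-precision value.
            views.append(f"D{n}")
        else:
            # Bit clear: one or two single-precision values, or uninitialized.
            views.append(f"S{2 * n + 1}:S{2 * n}")
    return views


print(describe_register_file(0b101)[:3])
```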

【０２０２】[0202] 6. Exception handling
The FPS operates in one of two modes, debug mode and normal mode. If the DM bit is set in the FPSCR, the FPS operates in debug mode. In this mode the FPS executes one instruction at a time, and the ARM is made to wait until the execution status of the instruction is known. This guarantees that the register file and memory are precise with respect to the instruction flow, but at the cost of a much longer execution time. In normal mode, the FPS accepts new instructions from the ARM when resources allow and signals an exception when it detects an exception condition. Exception reporting to the ARM is always precise with respect to the floating-point instruction stream, except in the case of a load or store operation that follows a vector operation and executes in parallel with it. In that case the contents of the register file (for load operations) or of memory (for store operations) may not be precise.

【０２０３】[0203] 6.1 Support code
An implementation of the FPS may elect to comply with IEEE 754 through a combination of hardware and software support. For unsupported data types and automatic exceptions, the support code performs the function of compliant hardware, returns the result to the destination register when appropriate, and returns to the user's code without calling the user's trap handler or otherwise modifying the flow of the user's code. To the user it appears as if the hardware alone were responsible for processing the floating-point code. Bouncing to the support code to handle these features significantly increases the time needed to perform or process them, but such situations typically occur rarely in user code, embedded applications, and well-written numeric applications.

【０２０４】[0204] The support code is intended to have two components: a library of routines and a set of exception handlers. The library of routines performs operations that are beyond the scope of the hardware, such as transcendental computations, as well as supported functions, such as division, with unsupported inputs or with inputs that may generate an exception. The set of exception handlers processes exception traps in order to conform to IEEE 754. The support code is required to perform implemented functions in order to emulate the proper handling of any unsupported data type or data representation (for example, denormal values). The routines may be written to use the FPS in their intermediate calculations, provided care is taken to restore the user's state at the exit of the routine.

【０２０５】[0205] 6.2 Exception reporting and handling
Exceptions in normal mode are reported to the ARM on the next floating-point instruction issued after the exception condition is detected. The state of the ARM processor, the FPS register file, and memory may not be precise with respect to the offending instruction at the time the exception is taken. Sufficient information is available to the support code to emulate the instruction correctly and to process any exception resulting from it.

【０２０６】[0206] In some implementations, the support code may be used to process some or all operations on IEEE 754 special data, including infinities, NaNs, denormal data, and zeros. Implementations that do so refer to these data as unsupported, bounce to the support code in a manner generally invisible to user code, and return with the IEEE 754 specified result in the destination register. Any exceptions resulting from the operation follow the IEEE 754 rules for exceptions. This may include trapping to user code if the corresponding exception enable bit is set.

【０２０７】[0207] The IEEE 754 standard defines the response to exception conditions both for exceptions that are enabled in the FPSCR and for exceptions that are disabled. The VFPv1 architecture does not specify the boundary between the hardware and the software used to comply properly with the IEEE 754 specification.

【０２０８】[0208] 6.2.1 Unsupported operations and formats
The FPS does not support operations on decimal data, or conversions to or from decimal data. These operations are required by the IEEE 754 standard and must be provided by the support code. Any attempt to use decimal data requires library routines for the desired functions. The FPS has no decimal data type and cannot be used to trap instructions that use decimal data.

【０２０９】[0209] 6.2.2 Use of FMOVX when the FPS is disabled or exceptional
An FMOVX instruction executed in SUPERVISOR or UNDEFINED mode may read and write the FPSCR, or read the FPSID or FPREG, when the FPS is in an exceptional state or is disabled (if the implementation supports a disable option), without causing an exception to be signaled to the ARM.

【０２１０】[0210] Returning to the embodiment described with reference to FIG. 2, the incrementer 52 generates the register addresses used to address the register bank 38. The register address applied to the register bank 38 is stored in the register address latch 400 of FIG. 18. Between each vector operation, this register address is fed back to a 3-bit adder 402, where it is added to an amount selected by the multiplexer 404. The multiplexer 404 selects one of the values 0, 1, 2, and 4. The multiplexer 404 is switched by a multiplexer controller 406, which is responsive to the stride value stored in the control register 408 and to the data size and register number in the instruction 410 being executed. The initial register value from the instruction 410 is loaded into the adder 402 via the multiplexer 403.

【０２１１】[0211] If the register number in the instruction 410 indicates that the register is to be treated as a scalar, the value 0 is selected as the increment regardless of the other parameters. If the data size indicates single precision, the stride value (0 or 1) selects an increment of 1 or 2, respectively. If the data size indicates double precision, the stride value selects an increment of 2 or 4.
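The selection rule applied by the multiplexer controller can be summarized in a small Python sketch. The function name `select_increment` and its parameters are hypothetical; the sketch models only the choice among the increments 0, 1, 2, and 4 described above, not the adder or the address latch.

```python
def select_increment(is_scalar: bool, double_precision: bool,
                     stride_value: int) -> int:
    """Choose the register address increment applied between iterations."""
    if is_scalar:
        return 0  # a scalar register is reused on every iteration
    if stride_value not in (0, 1):
        raise ValueError("stride value must be 0 or 1")
    base = 2 if double_precision else 1  # a double-precision value spans two registers
    return base * 2 if stride_value else base


print(select_increment(False, False, 1))  # single precision, stride value 1
print(select_increment(False, True, 1))   # double precision, stride value 1
```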

【０２１２】[0212] FIGS. 19A to 19D show a vector complex multiplication of length 2 that uses a stride value of 2 to take account of the interleaving of the real and imaginary parts of complex numbers in the memory address space. A block load operation loads the two complex numbers representing the first operand A into registers s12 to s15. A second block load operation loads the two complex numbers of the second operand into registers s20 to s23. The first instruction, shown in FIG. 19A, computes the two imaginary inner products (imaginary part times imaginary part) for A1*B1 and A2*B2, using the stride value of 2 to move between the registers holding the required values (as do the following instructions). The second instruction, shown in FIG. 19B, computes the first two of the four cross products for A1*B1 and A2*B2. The third instruction, shown in FIG. 19C, computes the two real inner products for A1*B1 and A2*B2 and subtracts from each the respective imaginary inner product computed earlier. The last instruction, shown in FIG. 19D, computes the remaining two cross products for A1*B1 and A2*B2. The result stored in s29 to s31 has the real and imaginary parts of the two complex products arranged adjacent to each other.
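The four-instruction sequence can be simulated in Python on a plain list standing in for the register file. This is a sketch under several assumptions that the text does not confirm: the destination registers are taken to be s28 to s31, the second instruction is taken to compute the real-by-imaginary cross products, and the last instruction is taken to accumulate into the cross products already computed. The names `vector_op` and `complex_multiply_pair` are hypothetical.

```python
def vector_op(regs, op, fd, fn, fm, length=2, stride=2):
    """Apply op(Fn, Fm, Fd) to successive registers, advancing by the stride."""
    for i in range(length):
        d, n, m = fd + i * stride, fn + i * stride, fm + i * stride
        regs[d] = op(regs[n], regs[m], regs[d])


def complex_multiply_pair(a1, a2, b1, b2):
    """Multiply two pairs of complex numbers using stride-2 vector operations."""
    regs = [0.0] * 32
    # Block loads: real and imaginary parts are interleaved, as in memory.
    regs[12:16] = [a1.real, a1.imag, a2.real, a2.imag]
    regs[20:24] = [b1.real, b1.imag, b2.real, b2.imag]
    # 1: imaginary inner products, s28 and s30 = A.im * B.im
    vector_op(regs, lambda n, m, d: n * m, 28, 13, 21)
    # 2: first two cross products, s29 and s31 = A.re * B.im (assumed order)
    vector_op(regs, lambda n, m, d: n * m, 29, 12, 21)
    # 3: real inner products minus imaginary ones, s28 and s30 = A.re * B.re - Fd
    vector_op(regs, lambda n, m, d: n * m - d, 28, 12, 20)
    # 4: remaining cross products accumulated, s29 and s31 = A.im * B.re + Fd
    vector_op(regs, lambda n, m, d: n * m + d, 29, 13, 20)
    return complex(regs[28], regs[29]), complex(regs[30], regs[31])


print(complex_multiply_pair(1 + 2j, 2 - 1j, 3 + 4j, 0.5 + 3j))
```

Because every register number advances by 2 on the second iteration, each instruction touches only real parts or only imaginary parts, which is what lets the interleaved layout produced by the block loads be used directly.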

【０２１３】[0213] Although a particular embodiment of the invention has been described, it will be apparent that the invention is not limited to it and that many modifications and additions may be made within the scope of the invention. For example, the features of the dependent claims may be combined in various ways with the features of the independent claims without departing from the scope of the invention.

[Brief description of the drawings]

【図１】データ処理システムの概略図である。FIG. 1 is a schematic diagram of a data processing system.

【図２】スカラレジスタとベクトルレジスタの両方をサ
ポートする浮動小数点ユニットを示す図である。FIG. 2 shows a floating point unit that supports both scalar registers and vector registers.

【図３】単精度オペレーションの場合に、与えられたレ
ジスタがベクトルレジスタであるか、スカラレジスタで
あるかをどのように判定するかを示す流れ図である。FIG. 3 is a flow chart showing how to determine whether a given register is a vector register or a scalar register for single precision operation.

【図４】倍精度オペレーションの場合に、与えられたレ
ジスタがベクトルレジスタであるか、スカラレジスタで
あるかをどのように判定するかを示す流れ図である。FIG. 4 is a flow chart showing how to determine whether a given register is a vector register or a scalar register for double precision operation.

【図５】単精度オペレーションの際の、レジスタバンク
のサブセットへの分割と、各サブセット内のラッピング
を示す図である。FIG. 5 illustrates the division of register banks into subsets and wrapping within each subset during single precision operation.

【図６】倍精度オペレーションの際の、レジスタバンク
のサブセットへの分割と、各サブセット内のラッピング
を示す図である。FIG. 6 illustrates the division of register banks into subsets and the wrapping within each subset during double precision operation.

【図７】コプロセッサ命令がコプロセッサからどのよう
に見えるかを示す図であって、Ａはコプロセッサ命令が
主コプロセッサからどのように見えるかを示し、Ｂはコ
プロセッサ命令が単精度(single precision)と倍精度の
コプロセッサからどのように見えるかを示し、Ｃはコプ
ロセッサ命令が単精度のコプロセッサからどのように見
えるかを示す図である。FIG. 7 shows how coprocessor instructions look from the coprocessor, where A shows how the coprocessor instructions look from the main coprocessor, and B shows how the coprocessor instructions appear in single precision ( FIG. 4C shows how a single-precision coprocessor looks like from a single-precision coprocessor, and C shows how a coprocessor instruction looks from a single-precision coprocessor.

FIG. 8 shows a main coprocessor controlling a single- and double-precision coprocessor.

FIG. 9 shows a main coprocessor controlling a single-precision coprocessor.

FIG. 10 shows the circuit within a single- and double-precision coprocessor that determines whether an accept signal should be returned to the main coprocessor for a received coprocessor instruction.

FIG. 11 shows the circuit within a single-precision coprocessor that determines whether an accept signal should be returned to the main coprocessor for a received coprocessor instruction.

FIG. 12 shows undefined instruction exception handling in the main coprocessor.

FIG. 13 is a block diagram illustrating the elements of a coprocessor according to a preferred embodiment of the present invention.

FIG. 14 is a flow chart illustrating the operation of the register control and instruction dispatch logic according to a preferred embodiment of the present invention.

FIG. 15 illustrates an example of the contents of the floating point registers according to a preferred embodiment of the present invention.

FIG. 16 shows the register bank in a Cray 1 processor.

FIG. 17 shows the register bank in a MultiTitan processor.

FIG. 18 shows the register address incrementing circuit portion of FIG. 2 in more detail.

FIG. 19 illustrates a vector processing operation in the form of a complex multiplication using a stride value of 2: A shows the result of the first instruction, B shows the result of the second instruction, C shows the result of the third instruction, and D shows the result of the fourth instruction.

[Explanation of symbols]

22 Data processing system
30 Main memory
38 Register bank
48 Register control and instruction dispatch unit
50 Vector control unit
52 Incrementer
402 3-bit adder
406 Multiplexer controller
408 Control register
410 Instruction

Claims

[Claims]

1. A data processing apparatus comprising: a register bank having a plurality of registers for holding data values to be manipulated, each of the registers having a register address; a memory access circuit that, in response to at least one block memory access instruction, performs memory accesses between a plurality of consecutively addressed memory locations in a memory and a plurality of consecutively addressed registers in the register bank; and an instruction decoder that, in response to at least one vector processing instruction, executes a data processing operation a plurality of times in succession on operands stored in a predetermined sequence of the registers; wherein, during each execution of the data processing operation, the instruction decoder, in response to a stride value, increments the register address of a register storing an operand used by the data processing operation by an amount specified by the stride value.
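
The interaction between the block memory access and the stride-controlled increment recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names `block_load` and `vector_add`, the 32-entry register list, the register numbers, and the choice of an add as the repeated operation are all assumptions made for illustration. It also assumes, as in claim 2, that the same stride applies to every register operand.

```python
def block_load(memory, mem_addr, registers, first_reg, count):
    """Copy `count` consecutive memory words into consecutive registers."""
    for i in range(count):
        registers[first_reg + i] = memory[mem_addr + i]


def vector_add(registers, dest, src_a, src_b, stride, iterations):
    """Repeat an add, advancing every register address by `stride` each time."""
    for _ in range(iterations):
        registers[dest] = registers[src_a] + registers[src_b]
        dest, src_a, src_b = dest + stride, src_a + stride, src_b + stride


# Interleaved pairs stored at consecutive memory addresses (made-up data).
memory = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0]
registers = [0.0] * 32  # hypothetical register bank size

# One block access fills registers 8..15 from consecutive memory locations.
block_load(memory, 0, registers, first_reg=8, count=8)

# A stride of 2 lets each iteration pick up the next interleaved pair.
vector_add(registers, dest=16, src_a=8, src_b=9, stride=2, iterations=4)
print(registers[16:24:2])  # [11.0, 22.0, 33.0, 44.0]
```

With a stride of 1 the same loop would walk adjacent registers, and with a stride of 0 it would reuse the same registers on every iteration.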

2. The data processing apparatus according to claim 1, wherein the stride value is applied to all registers operating as vector registers when the vector processing instruction is executed.

3. The data processing apparatus according to claim 1, wherein the stride value is set independently of the vector processing instruction.

4. The data processing apparatus according to claim 3, wherein the stride value is stored in a control register and specifies the amount of the register address increment for all executed vector processing instructions.

5. The data processing apparatus according to claim 1, wherein the instruction decoder includes a register address adder that increments the register address by adding a current register address supplied to a first adder input to the amount supplied to a second adder input.

6. The data processing apparatus according to claim 5, wherein the stride value selects the amount to be applied to the second adder input.

7. The data processing apparatus according to claim 1, wherein the stride value is an encoded representation of the amount.

8. The data processing apparatus according to claim 7, wherein the amount is two raised to the power of a natural number.
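
Claims 5 to 8 together describe an encoded stride value that selects a power-of-two increment amount supplied to the second input of a register address adder. The Python sketch below shows one way this could work. The encoding itself (a small integer code `n` standing for the amount `2**n`) and the names `decode_stride` and `register_address_adder` are assumptions for illustration; the claims do not fix a particular encoding.

```python
def decode_stride(code):
    """Map a hypothetical stride code n to the increment amount 2**n."""
    if code < 0:
        raise ValueError("stride code must be non-negative")
    return 1 << code


def register_address_adder(current_address, amount):
    """Add the current register address (first adder input) to the
    selected amount (second adder input)."""
    return current_address + amount


# Codes 0, 1 and 2 select increments of 1, 2 and 4 registers.
for code in (0, 1, 2):
    amount = decode_stride(code)
    print(code, amount, register_address_adder(8, amount))
```

In hardware the decode step would correspond to a multiplexer choosing the value driven onto the second adder input; here it is simply a shift.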

9. The data processing apparatus according to claim 1, wherein the register address is also incremented depending on which subset of the registers in the register bank contains the register address.
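
One reading of the subset dependence in claim 9, taken together with the wrapping within each subset shown in FIG. 5 and FIG. 6, is that an incremented address wraps around inside its own subset rather than crossing into the next one. The Python sketch below assumes subsets of eight registers, which is suggested by the 3-bit adder in the list of reference signs but is not stated in this section; the function name `increment_with_wrap` is hypothetical.

```python
SUBSET_SIZE = 8  # assumption: eight registers per subset


def increment_with_wrap(address, amount, subset_size=SUBSET_SIZE):
    """Advance `address` by `amount`, wrapping within its own subset."""
    base = address - (address % subset_size)                  # first register of the subset
    offset = (address % subset_size + amount) % subset_size   # wrapped position in the subset
    return base + offset


print(increment_with_wrap(9, 2))   # 11
print(increment_with_wrap(14, 4))  # 10
```

In the second call, register 14 lies in the subset spanning registers 8 to 15, so an increment of 4 wraps to register 10 instead of moving on to register 18.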

10. The data processing apparatus according to claim 1, wherein the successive executions are performed in an execution pipeline.

11. The data processing apparatus according to claim 1, wherein the increment adds a positive number to the register address.

12. A data processing method comprising the steps of: holding data values to be manipulated in a register bank having a plurality of registers, each having a register address; in response to at least one block memory access instruction, performing memory accesses between a plurality of consecutively addressed memory locations in a memory and a plurality of consecutively addressed registers in the register bank; in response to at least one vector processing instruction, executing a data processing operation a plurality of times on operands stored in a predetermined sequence of the registers; and, during each execution of the data processing operation, in response to a stride value, incrementing the register address of a register storing an operand used in the data processing operation by an amount specified by the stride value.