JPH10143494A

JPH10143494A - Single-instruction multiple-data processing combining scalar and vector operations

Info

Publication number: JPH10143494A
Application number: JP9222417A
Authority: JP
Inventors: Moataz A Mohamed; Heon Chul Park; Le Trong Nguyen; Roney Sau Don Wong
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 1996-08-19
Filing date: 1997-08-19
Publication date: 1998-05-29
Also published as: CN1152300C; KR19980018065A; DE19735349A1; FR2752629A1; CN1188275A; KR100267089B1; DE19735349B4; TW346595B; FR2752629B1

Abstract

PROBLEM TO BE SOLVED: To improve program efficiency and speed in multimedia applications by allowing instructions to take vector registers or scalar registers as operands and by operating on the multiple data elements of a vector register in parallel, thereby increasing computing power. SOLUTION: A multimedia processor 100 has a processing core 105 comprising a general-purpose processor 110 and a vector processor 120. The processing core 105 is connected to the rest of the multimedia processor 100 through a cache subsystem 130 that includes SRAMs 160 and 190, a ROM 170, and a cache control 180. The general-purpose processor 110 is implemented with a conventional general-purpose processor architecture. In response to a single instruction, the processing core 105 performs a plurality of arithmetic operations in parallel, each of which combines one of the data elements from a vector register with a scalar value from a scalar register.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to digital signal processors, and in particular to processors that process multiple data elements in parallel for each instruction, which is advantageous for multimedia functions such as encoding and decoding video and audio signals.

【０００２】[0002]

[Prior Art] Programmable digital signal processors (DSPs) for multimedia applications such as real-time video encoding and decoding need high processing power because a large amount of data must be processed within a limited time. Several architectures for digital signal processors are known, for example those shown in JP-A-6-309349 and JP-A-6-266860. A general-purpose architecture of the kind employed in most microprocessors requires a high operating frequency to give a DSP enough computing power for real-time video encoding or decoding, which makes such a DSP expensive.

[0003] A very long instruction word (VLIW) processor is a DSP with many functional units, most of which differ from one another and perform relatively simple tasks. A single instruction for a VLIW DSP can be 128 bytes or longer and has separate parts that the separate functional units execute in parallel. A VLIW DSP has high computing power because many functional units can operate in parallel. A VLIW DSP is also relatively inexpensive because each functional unit is relatively small and simple.

【０００４】[0004]

[Problems to Be Solved by the Invention] A drawback of VLIW DSPs is their inefficiency at input/output control, communication with a host computer, and other functions that do not lend themselves to parallel execution on the functional units of the VLIW DSP. In addition, VLIW software differs from conventional software and is hard to develop, because programmers and programming tools familiar with VLIW software architectures are scarce.

[0005] Multimedia applications call for a DSP that offers reasonable cost, high computing power, and a familiar programming environment.

【０００６】[0006]

[Means for Solving the Problems] According to one aspect of the invention, a multimedia digital signal processor includes a vector processor that manipulates vector data (that is, multiple data elements per operand) to provide high computing power. The processor uses a single-instruction-multiple-data (SIMD) architecture with a RISC-type instruction set. Because the programming environment resembles that of the general-purpose processors programmers already know, programmers can adapt easily to the programming environment of the vector processor.

[0007] The DSP includes a set of general-purpose vector registers. Each vector register has a fixed size but is divided into separate data elements of a user-selectable size. The number of data elements stored in a vector register therefore depends on the size selected for the elements. For example, a 32-byte register can be divided into thirty-two 8-bit data elements, sixteen 16-bit data elements, or eight 32-bit data elements. The data size and type are selected by the instruction that processes the data associated with the vector register, and the execution data path for the instruction performs a number of parallel operations that depends on the data size the instruction indicates.
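The relationship between element size and element count can be illustrated with a minimal Python sketch. The function names and the little-endian byte order are assumptions made for illustration only; the text specifies only the 32-byte register size and the 8-, 16-, and 32-bit element sizes.

```python
REGISTER_BYTES = 32  # fixed register size used in the example in the text


def element_count(element_bits):
    """Number of data elements one register holds for a given element size."""
    if element_bits not in (8, 16, 32):
        raise ValueError("element size must be 8, 16, or 32 bits")
    return (REGISTER_BYTES * 8) // element_bits


def split_elements(register, element_bits):
    """View the same register bytes as unsigned elements of the chosen size.

    Little-endian byte order is an assumption of this sketch.
    """
    if len(register) != REGISTER_BYTES:
        raise ValueError("register must be exactly 32 bytes")
    step = element_bits // 8
    return [
        int.from_bytes(register[i * step:(i + 1) * step], "little")
        for i in range(element_count(element_bits))
    ]
```

The same 32 bytes yield 32, 16, or 8 elements depending only on the size the caller selects, which mirrors how the instruction, not the register, determines the element size.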

[0008] Instructions for the vector processor can take vector registers or scalar registers as operands and manipulate the multiple data elements of a vector register in parallel, which increases computing power. An exemplary instruction set for a vector processor according to the invention includes coprocessor interface operations, flow control operations, load/store operations, and logic/arithmetic operations. The logic/arithmetic operations include operations that combine the data elements from one vector register with the corresponding data elements from one or more other vector registers to produce a resulting data vector. Other logic/arithmetic operations mix the various data elements from one or more vector registers, or combine the data elements from a vector register with a scalar quantity.

[0009] An extension of the vector processor architecture adds scalar registers, each of which holds a single scalar data element. Combining scalar and vector registers makes it easy to extend the instruction set of the vector processor with instructions that combine each data element of a vector with a scalar value in parallel. For example, one instruction multiplies the data elements of a vector by a scalar value. Scalar registers also provide a place to store a single data element that is extracted from, or is to be stored into, a vector register. Scalar registers are further convenient for passing information between the vector processor and a coprocessor whose architecture provides only scalar registers, and for calculating effective addresses for load/store operations.
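A combined scalar/vector operation can be modeled in a few lines of Python. This is only a behavioral sketch: the function names are hypothetical, and a list comprehension stands in for hardware that would form all products in parallel within one instruction.

```python
def vector_scalar_multiply(vector, scalar):
    """Multiply every data element of a vector by a single scalar value."""
    return [element * scalar for element in vector]


def extract_element(vector, index):
    """Copy one data element out of a vector into a scalar value."""
    return vector[index]


def insert_element(vector, index, scalar):
    """Return a copy of the vector with one element replaced by a scalar."""
    updated = list(vector)
    updated[index] = scalar
    return updated
```

For instance, `vector_scalar_multiply([1, 2, 3, 4], 3)` scales all four elements by the same scalar, which is the pattern used when applying a gain or a filter coefficient to a block of samples.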

[0010] According to another aspect of the invention, the vector registers of the vector processor are organized into banks. Each bank can be selected as the "current" bank, while the other bank is the "alternate" bank. A "current bank" bit in a control register of the vector processor indicates the current bank. To reduce the number of bits needed to identify a vector register, some instructions provide only a register number, which identifies a vector register in the current bank. Load/store instructions have an additional bit so that they can identify a vector register in either bank. Load/store instructions can therefore fetch data into the alternate bank while data in the current bank is being operated on. This facilitates software pipelining for image processing and graphics procedures, and it reduces processor delays while data is fetched, because logic/arithmetic operations can execute out of order with the load/store operations that access the alternate register bank. For other instructions, the alternate bank enables the use of double-size vector registers, each comprising a vector register from the current bank and the corresponding vector register from the alternate bank. Such double-size registers can be identified from the instruction syntax. A control bit in the vector processor can be set so that the default vector size is either one or two vector registers. The alternate bank also allows fewer explicitly identified operands in the syntax of complex instructions such as shuffle, unshuffle, saturate, and conditional move, which have two source and two destination registers.
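The software-pipelining benefit of two banks can be sketched as a double-buffered loop. The class, its method names, and the way the banks are swapped once per iteration are assumptions made for illustration; the text states only that a control bit selects the current bank and that load/store instructions can address either bank.

```python
class BankedRegisterFile:
    """Two banks of vector registers with a 'current bank' control bit."""

    def __init__(self, registers_per_bank=32):
        self.banks = [[None] * registers_per_bank for _ in range(2)]
        self.current_bank = 0  # models the "current bank" control bit

    @property
    def alternate_bank(self):
        return self.current_bank ^ 1

    def read_current(self, number):
        # Instructions that carry only a register number use the current bank.
        return self.banks[self.current_bank][number]

    def load(self, bank, number, value):
        # Load/store instructions carry an extra bit, so they name the bank.
        self.banks[bank][number] = value

    def swap_banks(self):
        self.current_bank ^= 1


def process_blocks(blocks, compute):
    """Prefetch block i+1 into the alternate bank while computing block i."""
    regs = BankedRegisterFile()
    results = []
    if not blocks:
        return results
    regs.load(regs.current_bank, 0, blocks[0])
    for i in range(len(blocks)):
        if i + 1 < len(blocks):
            regs.load(regs.alternate_bank, 0, blocks[i + 1])  # prefetch
        results.append(compute(regs.read_current(0)))
        regs.swap_banks()
    return results
```

In this model the load of the next block and the computation on the current block touch different banks, so on real hardware they would not have to wait for each other.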

[0011] In addition, the vector processor implements novel instructions such as average quad, shuffle, unshuffle, pairwise maximum and exchange, and saturate. These instructions perform operations common in multimedia functions such as video encoding and decoding, and each replaces two or more instructions that other instruction sets would need to implement the same function. The vector processor instruction set therefore improves program efficiency and speed in multimedia applications.
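To give a feel for what such instructions do, the sketch below models three of them using their common meanings in SIMD programming. The exact semantics in this patent are defined elsewhere in the specification, so the interleaving order, the default clamp range, and the function names here are all assumptions.

```python
def saturate(vector, low=-128, high=127):
    """Clamp each element to [low, high]; the default range is signed 8-bit."""
    return [min(max(element, low), high) for element in vector]


def shuffle(a, b):
    """Interleave two equal-length vectors: a0, b0, a1, b1, ..."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def unshuffle(interleaved):
    """Split an interleaved vector back into its even and odd elements."""
    return interleaved[0::2], interleaved[1::2]
```

Without a dedicated instruction, each of these would take several compare, select, or move instructions per element, which is the saving the paragraph above refers to.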

【００１２】[0012]

[Embodiments of the Invention] Preferred embodiments of the invention are described in more detail below with reference to the accompanying drawings. Identical parts in the drawings carry identical reference numerals.

[0013] FIG. 1 is a block diagram of a multimedia signal processor (MSP) 100 according to an embodiment of the invention. The multimedia processor 100 includes a processing core 105 that contains a general-purpose processor 110 and a vector processor 120. The processing core 105 is connected to the rest of the multimedia processor 100 through a cache subsystem 130 that includes SRAMs 160 and 190, a ROM 170, and a cache control 180. The cache control 180 can configure the SRAM 160 as an instruction cache 162 and a data cache 164 for the processor 110, and can configure the SRAM 190 as an instruction cache 192 and a data cache 194 for the vector processor 120.

[0014] The on-chip ROM 170 contains data and instructions for the processors 110 and 120 and can be configured as a cache. In the preferred embodiment, the ROM 170 contains reset and initialization procedures; self-test diagnostic procedures; interrupt and exception handlers; and subroutines for Sound Blaster emulation, subroutines for V.34 modem signal processing, general telephony functions, 1-D and 3-D graphics subroutine libraries, and subroutine libraries for audio and video standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.

[0015] The cache subsystem 130 connects the processors 110 and 120 to two system buses 140 and 150, and acts as a cache and switching station for the processors 110 and 120 and for the devices coupled to the buses 140 and 150. The system bus 150 operates at a higher clock frequency than the bus 140 and is connected to a memory controller 158, a local bus interface 156, a DMA controller 154, and a device interface 152, which provide interfaces to an external local memory, the local bus of a host computer, direct memory access (DMA), and various analog-to-digital (A/D) and digital-to-analog (D/A) converters, respectively. A system timer 142, a UART (universal asynchronous receiver-transmitter) 144, a bitstream processor 146, and an interrupt controller 148 are connected to the bus 140. The patent application entitled "Multiprocessor Operation of a Multimedia Signal Processor and Method and Apparatus for Processing Video Data", which is incorporated into the present application, describes in more detail the operation of the cache subsystem 130 and of the preferred devices that the processors 110 and 120 access through the cache subsystem 130 and the buses 140 and 150.

[0016] The processors 110 and 120 execute separate program threads and differ structurally so that each executes its assigned tasks more efficiently. The processor 110 is mainly for control functions, such as running a real-time operating system, and for similar functions that do not require large numbers of repetitive calculations. The processor 110 therefore does not need high computing power and can be implemented with a conventional general-purpose processor architecture. The vector processor 120 performs the number crunching, which involves repetitive operations on the blocks of data common in most multimedia processing. For high computing power and relatively simple programming, the vector processor 120 has a SIMD (single instruction, multiple data) architecture, and in the illustrated embodiment most data paths in the vector processor 120 are either 288 or 576 bits wide to support vector data manipulation. The instruction set for the vector processor 120 also includes instructions particularly suited to multimedia problems.

[0017] In the embodiment described above, the processor 110 is a 32-bit RISC processor that operates at 40 MHz and conforms to the ARM7 processor architecture, including the register set defined by the ARM7 standard. The architecture and instruction set of the ARM7 RISC processor are described in the "ARM7DM Data Sheet", document number ARM DDI 0010G, available from Advanced RISC Machines Ltd. The ARM7DM Data Sheet is incorporated into this application by reference. Appendix A, below, describes the extensions to the ARM7 instruction set in the preferred embodiment.

[0018] The vector processor 120 operates on both vector and scalar quantities. In the preferred embodiment, the vector processor 120 consists of a pipelined RISC engine that operates at 80 MHz. The registers of the vector processor 120 include 32-bit scalar registers, 32-bit special-purpose registers, two banks of 288-bit vector registers, and two double-size (for example, 576-bit) vector accumulator registers. Appendix C, below, describes the register set for the preferred embodiment of the vector processor 120. In the preferred embodiment, the processor 120 includes 32 scalar registers, which instructions identify by 5-bit register numbers from 0 to 31. There are also 64 288-bit vector registers, organized as two banks of 32 vector registers each. Each vector register is identified by a 1-bit bank number (0 or 1) and a 5-bit vector register number from 0 to 31. Most instructions access only vector registers in the current bank, which is indicated by the default bank bit CBANK stored in the control register VCSR of the vector processor 120. A second control bit, VEC64, indicates whether a register number by default identifies a double-size vector register that comprises one register from each bank. The instruction syntax distinguishes register numbers that identify vector registers from register numbers that identify scalar registers.
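The register-identification scheme can be sketched as a small resolver that maps an operand to the physical register or registers it names. The function names, the optional explicit-bank argument, and the ordering of the two halves of a double-size register are assumptions for illustration; only the 1-bit bank number, the 5-bit register number, CBANK, and VEC64 come from the text.

```python
def resolve_vector_register(number, cbank, vec64=False, bank=None):
    """Return the (bank, number) pairs that a vector operand refers to.

    number: 5-bit register number (0-31) taken from the instruction.
    cbank:  value of the CBANK bit (0 or 1).
    vec64:  value of the VEC64 bit; when set, a number names one register
            from each bank (current bank listed first, by assumption).
    bank:   explicit bank bit, for instructions that carry one.
    """
    if not 0 <= number <= 31:
        raise ValueError("register number must fit in 5 bits")
    if cbank not in (0, 1) or bank not in (None, 0, 1):
        raise ValueError("bank values must be 0 or 1")
    if bank is not None:
        return [(bank, number)]
    if vec64:
        return [(cbank, number), (cbank ^ 1, number)]
    return [(cbank, number)]


def flat_index(bank, number):
    """Combine the bank bit and register number into one index from 0 to 63."""
    return (bank << 5) | number
```

Two banks of 32 registers give the 64 vector registers mentioned above, and `flat_index` shows that six bits are enough to name any of them, whereas an instruction that relies on CBANK needs only five.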

[0019] Each vector register can be partitioned into data elements of programmable size. Table 1 shows the data types supported for data elements within a 288-bit vector register.

【００２０】[0020]

【表１】 [Table 1]

[0021] Appendix D, below, provides further description of the data sizes and data types supported in the preferred embodiment of the invention.

[0022] For the int9 data type, 9-bit bytes are packed contiguously in a 288-bit vector register; for the other data types, every ninth bit of the 288-bit vector register is unused, so that 256 of the 288 bits carry data. A 288-bit vector register can hold thirty-two 8-bit or 9-bit integer data elements, sixteen 16-bit integer data elements, or eight 32-bit integer or floating-point elements. In addition, two vector registers can be combined to pack data elements into a double-size vector. In the preferred embodiment of the invention, setting the control bit VEC64 in the control and status register VCSR places the vector processor 120 in mode VEC64, in which double size (576 bits) is the default size of vector registers.
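The arithmetic behind the 288-bit width can be checked with a short sketch that packs 8-bit values into 9-bit lanes of a Python integer. Placing the unused bit at the top of each lane and treating the values as unsigned are assumptions of this sketch, as are the function names.

```python
REGISTER_BITS = 288
LANE_BITS = 9                            # one 9-bit byte per lane
LANES = REGISTER_BITS // LANE_BITS       # 32 lanes


def pack_int8(values):
    """Pack 32 unsigned 8-bit values, one per 9-bit lane.

    The ninth (top) bit of every lane is left at zero.
    """
    if len(values) != LANES:
        raise ValueError("expected exactly 32 values")
    register = 0
    for lane, value in enumerate(values):
        if not 0 <= value <= 0xFF:
            raise ValueError("value does not fit in 8 bits")
        register |= value << (lane * LANE_BITS)
    return register


def unpack_int8(register):
    """Recover the 32 8-bit values from their 9-bit lanes."""
    return [(register >> (lane * LANE_BITS)) & 0xFF for lane in range(LANES)]


# Bits left unused when only 8 of every 9 bits carry data.
UNUSED_BITS = REGISTER_BITS - LANES * 8
```

Thirty-two lanes of 9 bits account for all 288 bits, and the 8-, 16-, and 32-bit types occupy 256 of them, leaving 32 bits (one per lane) unused.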

[0023] The multimedia processor 100 also includes a set of 32-bit extended registers 115 that both processors 110 and 120 can access. Appendix B, below, describes the set of extended registers and their functions in the preferred embodiment of the invention. The extended registers and the scalar and special-purpose registers of the vector processor 120 are accessible to the processor 110 in some circumstances. Two special "user" extended registers have two read ports so that the processors 110 and 120 can read them simultaneously. The other extended registers cannot be accessed simultaneously.

[0024] The vector processor 120 has two alternative states, VP_RUN and VP_IDLE, which indicate whether the vector processor is running or idle. The processor 110 can read or write the scalar or special-purpose registers of the vector processor 120 while the vector processor 120 is in state VP_IDLE, but the result of the processor 110 reading or writing a register of the vector processor 120 while the vector processor 120 is in state VP_RUN is undefined.

[0025] The extensions of the ARM7 instruction set for the processor 110 include instructions that access the extended registers and the scalar and special-purpose registers of the vector processor 120. The instructions MFER and MFEP move data from an extended register and from a scalar or special-purpose register of the vector processor 120, respectively, to a general register of the processor 110. The instructions MTER and MTEP move data from a general register of the processor 110 to an extended register and to a scalar or special-purpose register of the vector processor 120, respectively. The TESTSET instruction reads an extended register and sets bit 30 of the extended register to 1. The TESTSET instruction facilitates user/producer synchronization: setting bit 30 signals to the processor 120 that the processor 110 has read (or used) the result that was produced. Other instructions for the processor 110, such as STARTVP and INTVP, control the operating state of the vector processor 120.
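The read-then-set behavior described for TESTSET can be modeled as follows. The class and method names are hypothetical, and the sketch assumes that the instruction returns the register value as it was before bit 30 was set; on hardware the read and the set would happen as one indivisible operation, which plain Python does not model.

```python
BIT_30 = 1 << 30


class ExtendedRegister:
    """A 32-bit extended register shared by the two processors."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    def test_and_set(self):
        """Return the current value, then set bit 30 to 1."""
        old = self.value
        self.value = old | BIT_30
        return old

    def clear_flag(self):
        """Clear bit 30, as a producer might do before writing a new result."""
        self.value &= ~BIT_30 & 0xFFFFFFFF
```

In a handshake built on this model, the reader calls `test_and_set()` to pick up a result and mark it consumed in the same step, and the producer checks bit 30 before overwriting the register.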

[0026] The processor 110 serves as a master processor that controls the operation of the vector processor 120. Dividing control asymmetrically between the processors 110 and 120 simplifies the problem of synchronizing them. The processor 110 initializes the vector processor 120 by writing an instruction address to the program counter of the vector processor 120 while the vector processor 120 is in state VP_IDLE. The processor 110 then executes a STARTVP instruction, which changes the vector processor 120 to state VP_RUN. In state VP_RUN, the vector processor 120 fetches instructions through the cache subsystem 130 and executes them in parallel with the processor 110, which continues to execute its own program. Once started, the vector processor 120 keeps executing until it encounters an exception, executes a VCJOIN or VCINT instruction whose condition is satisfied, or is interrupted by the processor 110. The vector processor 120 can pass the results of program execution to the processor 110 by writing the results to an extended register, by writing the results to the address space shared by the processors 110 and 120, or by leaving the results in a scalar or special-purpose register that the processor 110 accesses when the vector processor 120 re-enters state VP_IDLE.

[0027] The vector processor 120 does not handle its own exceptions. When it executes an instruction that causes an exception, the vector processor 120 enters state VP_IDLE and sends an interrupt request to the processor 110 over a direct line. The vector processor 120 remains in state VP_IDLE until the processor 110 executes another STARTVP instruction. The processor 110 is responsible for reading the register VISRC of the vector processor 120 to determine the nature of the exception, for handling the exception where possible by reinitializing the vector processor 120, and then, if desired, for directing the vector processor 120 to resume execution.

[0028] An INTVP instruction executed by the processor 110 interrupts the vector processor 120 and causes it to enter the idle state VP_IDLE. The INTVP instruction can be used, for example, in a multitasking system to switch the vector processor from one task, such as video decoding, to another task, such as sound card emulation.
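The master/slave control flow described above amounts to a two-state machine, sketched below. The class and method names are hypothetical. Raising an error on a register write in VP_RUN is a modeling choice for a case the text leaves undefined, and clearing the pending interrupt request on STARTVP is likewise an assumption.

```python
from enum import Enum


class VPState(Enum):
    VP_IDLE = "VP_IDLE"
    VP_RUN = "VP_RUN"


class VectorProcessorModel:
    """Behavioral model of the vector processor as seen by the master."""

    def __init__(self):
        self.state = VPState.VP_IDLE
        self.program_counter = 0
        self.interrupt_requested = False

    def write_program_counter(self, address):
        # The text leaves access in VP_RUN undefined; this model rejects it.
        if self.state is not VPState.VP_IDLE:
            raise RuntimeError("register access while in VP_RUN")
        self.program_counter = address

    def startvp(self):
        """STARTVP: the master puts the vector processor in VP_RUN."""
        self.interrupt_requested = False
        self.state = VPState.VP_RUN

    def intvp(self):
        """INTVP: the master forces the vector processor into VP_IDLE."""
        self.state = VPState.VP_IDLE

    def raise_exception(self):
        """An exception idles the vector processor and interrupts the master."""
        if self.state is VPState.VP_RUN:
            self.state = VPState.VP_IDLE
            self.interrupt_requested = True
```

A typical sequence in this model is `write_program_counter(...)`, `startvp()`, then either `raise_exception()` from the vector side or `intvp()` from the master, after which the master may reinitialize and call `startvp()` again.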

[0029] The vector processor instructions VCINT and VCJOIN, when the condition indicated by the instruction is satisfied, halt execution by the vector processor 120, place the vector processor 120 in state VP_IDLE, and issue an interrupt request to the processor 110 unless such requests are masked. The program counter of the vector processor 120 (special-purpose register VPC) then holds the address of the instruction that follows the VCINT or VCJOIN instruction. The processor 110 can check the interrupt source register VISRC of the vector processor 120 to determine whether a VCINT or VCJOIN instruction caused the interrupt request. Because the vector processor 120 has wide data buses and saves and restores its registers more efficiently, software executed by the vector processor 120 saves and restores the registers during context switching. Another application related to the present application, entitled "Efficient Context Saving and Restoring in a Multiprocessor", describes a preferred system for context switching.

[0030] FIG. 2 shows the main functional blocks of a preferred embodiment of the vector processor 120. The vector processor 120 includes an instruction fetch unit (IFU) 210, a decoder 220, a scheduler 230, an execution data path 240, and a load/store unit (LSU) 250. The IFU 210 fetches instructions and processes flow control instructions such as branches. The instruction decoder 220 decodes one instruction per cycle in the order in which the instructions arrive from the IFU 210, and writes the field values decoded from each instruction to a FIFO in the scheduler 230. The scheduler 230 selects the field values that are issued to the execution control registers as needed for the execution stages of an operation. Which instruction is selected for issue depends on operand dependencies and on the availability of processing resources such as the execution data path 240 or the load/store unit 250. The execution data path 240 executes logic/arithmetic instructions that manipulate vector or scalar data. The load/store unit 250 executes load/store instructions that access the address space of the vector processor 120.

[0031] FIG. 3 is a block diagram of an embodiment of the IFU 210, which includes an instruction buffer divided into a main instruction buffer 310 and a secondary instruction buffer 312. The main buffer 310 holds eight consecutive instructions, including the instruction that corresponds to the current program count. The secondary buffer 312 holds the eight instructions that follow those in the buffer 310. The IFU 210 also has a branch target buffer 314, which holds eight consecutive instructions including the target of the next flow control instruction in the buffer 310 or 312. In the preferred embodiment, the vector processor 120 uses a RISC-type instruction set in which each instruction is 32 bits long, and the buffers 310, 312, and 314 are 8×32-bit buffers connected to the cache subsystem 130 through a 256-bit instruction bus. The IFU 210 can load eight instructions from the cache subsystem 130 into any one of the buffers 310, 312, and 314 in a single clock cycle. Registers 340, 342, and 344 hold the base addresses of the instructions loaded in the buffers 310, 312, and 314, respectively.

[0032] Multiplexer (MUX) 332 selects the current instruction from main instruction buffer 310. If the current instruction is not a flow control instruction and the instruction stored in instruction register 330 is advancing to the decode stage, the current instruction is stored in instruction register 330 and the program count is incremented. After incrementing the program count selects the last instruction in buffer 310, the next set of eight instructions is loaded into buffer 310. If buffer 312 contains the desired eight instructions, the contents of buffer 312 and register 342 are immediately moved to buffer 310 and register 340, and eight more instructions are prefetched from cache subsystem 130 into secondary instruction buffer 312. Adder 350 determines the address of the next set of instructions from the base address in register 342 and an offset selected by multiplexer (MUX) 352. The resulting address from adder 350 is stored in register 342 when, or after, the address from register 342 is moved to register 340. The computed address is also sent to cache subsystem 130 with the request for eight instructions. If a previous call to cache subsystem 130 has not yet provided the next eight instructions to buffer 312 by the time buffer 310 requires them, the previously requested instructions are stored in buffer 310 as soon as they are received from cache subsystem 130.

[0033] If the current instruction is a flow control instruction, IFU 210 processes the instruction by evaluating the condition for the flow control instruction and updating the program count in accordance with the flow control instruction. IFU 210 is held if the condition cannot yet be determined because a previous instruction that may change the condition has not completed. If a branch is not taken, the program count is incremented and the next instruction is selected as described above. If a branch is taken and branch target buffer 314 contains the target of the branch, the contents of buffer 314 and register 344 are moved to buffer 310 and register 340, and IFU 210 continues to provide instructions to decoder 220 without waiting for instructions from cache subsystem 130.

[0034] To prefetch instructions for branch target buffer 314, a scanner 320 scans buffers 310 and 312 to locate the next flow control instruction following the current program count. If a flow control instruction is found in buffer 310 or 312, scanner 320 determines an offset from the base address of the buffer (310 or 312) containing the instruction to an aligned set of eight instructions that includes the target address of the flow control instruction. Multiplexers 352 and 354 provide the offset from the flow control instruction and the base address from register 340 or 342 to adder 350, which generates a new base address for buffer 314. The new base address is passed to cache subsystem 130, which subsequently provides the eight instructions for branch target buffer 314.
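
By way of illustration only, the address arithmetic for the branch target prefetch can be sketched in Python as follows. The sketch assumes byte addressing and groups of eight 32-bit instructions aligned on 32-byte boundaries; the function names are hypothetical and do not appear in the embodiment.

```python
INSTRUCTION_BYTES = 4                    # each instruction is 32 bits long
GROUP_BYTES = 8 * INSTRUCTION_BYTES      # eight instructions = 256 bits


def aligned_group_base(address: int) -> int:
    """Base address of the aligned eight-instruction group containing address.

    Assumption: groups are aligned on 32-byte boundaries.
    """
    return address - (address % GROUP_BYTES)


def branch_target_prefetch(buffer_base: int, target: int) -> tuple:
    """Return (offset, new_base) for the branch target buffer.

    offset plays the role of the value produced by the scanner; new_base is
    the sum an adder would form from the buffer base address and the offset.
    """
    offset = aligned_group_base(target) - buffer_base
    new_base = buffer_base + offset
    return offset, new_base
```

Under these assumptions, a branch found in a buffer based at 0x100 whose target is 0x1A8 gives an offset of 0xA0 and a new base address of 0x1A0; a backward branch simply gives a negative offset.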

[0035] When processing flow control instructions such as the "decrement and conditional branch" instructions (VD1CBR, VD2CBR, VD3CBR) and the "change control register" instruction VCHGCR, IFU 210 can change register values in addition to the program count. When IFU 210 finds an instruction that is not a flow control instruction, that instruction is passed to instruction register 330 and from there to decoder 220.

[0036] Decoder 220 decodes an instruction by writing control values to the fields of a FIFO buffer 410 in scheduler 230, as shown in FIG. 4. FIFO buffer 410 contains four rows of flip-flops, and each row can hold five fields of information for controlling the execution of one instruction. Rows 0 to 3 hold the information for the oldest to the newest instructions, respectively, and the information in FIFO buffer 410 shifts down to lower rows as older information is removed when instructions complete. Scheduler 230 issues an instruction to an execution stage by selecting the necessary fields of the instruction to be loaded into a control pipe 420 that includes execution registers 421 to 427. Most instructions can be scheduled for out-of-order issue and execution. In particular, the order of load/store operations relative to logic/arithmetic operations is arbitrary unless there are operand dependencies between them. Comparisons of the field values in FIFO buffer 410 indicate whether any operand dependencies exist.
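
By way of illustration only, the issue selection described above can be modeled in Python as follows. The entry layout, the unit names, and the exact dependency rules are assumptions made for this sketch; the embodiment states only that field values in the FIFO are compared to detect operand dependencies and that issue also depends on resource availability.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str        # mnemonic, for display only
    unit: str        # hypothetical unit name: "datapath" or "lsu"
    dest: str        # destination register
    sources: tuple   # source registers


def depends_on(younger: Entry, older: Entry) -> bool:
    """True if younger must wait for older (assumed dependency rules)."""
    return (older.dest in younger.sources      # read after write
            or older.dest == younger.dest      # write after write
            or younger.dest in older.sources)  # write after read


def select_issue(fifo: list, free_units: set):
    """Pick the oldest entry whose unit is free and that is independent
    of every older entry still in the FIFO; return None if none qualifies."""
    for i, entry in enumerate(fifo):           # row 0 holds the oldest entry
        if entry.unit not in free_units:
            continue
        if any(depends_on(entry, older) for older in fifo[:i]):
            continue
        return entry
    return None
```

In this model, an arithmetic operation that does not touch the registers of an older pending load can issue ahead of that load when only the data path is free.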

[0037] FIG. 5 shows a six-stage execution pipeline for an instruction that performs a register-to-register operation without accessing the address space of vector processor 120. In instruction fetch stage 511, IFU 210 fetches an instruction as described above. The fetch stage requires one clock cycle unless IFU 210 is held up by a pipeline delay, an unresolved branch condition, or a delay in cache subsystem 130 providing prefetched instructions. In decode stage 512, decoder 220 decodes the instruction from IFU 210 and writes the information for the instruction to scheduler 230. Decode stage 512 likewise requires one clock cycle unless no row in FIFO 410 is available for a new operation. An operation can be issued to control pipe 420 during its first cycle in FIFO 410, but it may be delayed by the issue of older operations.

[0038] Execution data path 240 performs register-to-register operations and provides addresses for load/store operations. FIG. 8 is a block diagram of an embodiment of execution data path 240 and is described in conjunction with execution stages 514, 515, and 516. Execution register 421 provides signals identifying two registers in register file 610 that are read in one clock cycle during read stage 514. Register file 610 includes 32 scalar registers and 64 vector registers. FIG. 9 is a block diagram of the register file. Register file 610 has two read ports and two write ports to accommodate two reads and two writes in each clock cycle. Each port includes a select circuit 612, 614, 616, or 618 and a 288-bit data bus 613, 615, 617, or 619. Select circuits such as circuits 612, 614, 616, and 618 are well known in the art. They use an address signal WRADDR1, WRADDR2, RDADDR1, or RDADDR2 that decoder 220 derives from a 5-bit register number typically extracted from the instruction, a bank bit from the instruction or from control and status register VCSR, and the instruction syntax that indicates whether the register is a vector register or a scalar register. Data that is read can be routed through multiplexer 656 to load/store unit 250, or through multiplexers 622 and 624 to multiplier 620, arithmetic logic unit 630, or accumulator 640. Most operations read two registers, and read stage 514 then completes in one cycle. However, some instructions, such as the multiply-and-add instruction VMAD and instructions that manipulate double-size vectors, require data from more than two registers, so for them read stage 514 takes longer than one clock cycle.

[0039] During execution stage 515, multiplier 620, arithmetic logic unit 630, and accumulator 640 process the data previously read from register file 610. Execution stage 515 can overlap read stage 514 if multiple cycles are required to read the necessary data. The duration of execution stage 515 depends on the type of the data elements (integer or floating point) and on the amount of data processed (the number of read cycles). Signals from execution registers 422, 423, and 425 control the input data to arithmetic logic unit 630, accumulator 640, and multiplier 620 for the first operations performed during the execution stage. Signals from execution registers 432, 433, and 435 control the second operations performed during execution stage 515.

[0040] FIG. 10 is a block diagram of an embodiment of multiplier 620 and arithmetic logic unit (ALU) 630. Multiplier 620 is an integer multiplier containing eight independent 36×36-bit multipliers 626. Each multiplier 626 contains four 9×9-bit multipliers interconnected by control circuitry. For 8-bit and 9-bit data element sizes, control signals from scheduler 230 disconnect the four 9×9-bit multipliers from one another so that each multiplier 626 performs four multiplications, and multiplier 620 performs 32 independent multiplications per cycle. For 16-bit data elements, the control circuitry connects pairs of 9×9-bit multipliers to operate together, and multiplier 620 performs 16 parallel multiplications. For the 32-bit integer data element type, the eight multipliers 626 perform eight parallel multiplications per clock cycle. The results of the multiplications provide 576 bits for the 9-bit data element size and 512 bits for the other data sizes.
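
By way of illustration only, the lane counts and result widths stated above can be checked with the following Python sketch. It assumes that each data element occupies a whole number of 9-bit storage locations in a 288-bit vector and that each product is twice as wide as its operands; the function names are hypothetical.

```python
VECTOR_BITS = 288   # width of a vector register and of each data bus
SLOT_BITS = 9       # assumption: elements occupy whole 9-bit storage locations


def lanes(element_bits: int) -> int:
    """Number of data elements processed in parallel for one element size."""
    slots_per_element = -(-element_bits // SLOT_BITS)   # ceiling division
    return VECTOR_BITS // (slots_per_element * SLOT_BITS)


def product_bits(element_bits: int) -> int:
    """Total width of all products, each product being twice the operand width."""
    return lanes(element_bits) * 2 * element_bits


for size in (8, 9, 16, 32):
    print(size, lanes(size), product_bits(size))
```

With these assumptions the sketch gives 32, 32, 16, and 8 parallel multiplications for 8-, 9-, 16-, and 32-bit elements, and a result width of 576 bits for 9-bit elements and 512 bits for the other sizes, in agreement with the figures given for multiplier 620.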

[0041] ALU 630 can process the 576-bit or 512-bit result from multiplier 620 within two clock cycles. ALU 630 contains eight independent 36-bit ALUs 636. Each ALU 636 contains a 32×32-bit floating-point unit for floating-point addition and multiplication. For integer operations, each ALU 636 contains four units that can perform independent 8-bit and 9-bit operations and that can be linked together in sets of two or four for 16-bit and 32-bit integer data elements.

[0042] Accumulator 640 accumulates results and includes two 576-bit registers to provide higher precision for intermediate results.

[0043] During write stage 516, the results from the execution stage are stored in register file 610. Two registers can be written in a single clock cycle, and input multiplexers 602 and 605 select the two data values to be written. The duration of write stage 516 for an operation depends on the amount of data to be written as a result of the operation and on whether LSU 250 is at the same time completing a load instruction by writing to register file 610. Signals from execution registers 426 and 427 select the registers to which the data from logic unit 630, accumulator 640, and multiplier 620 is written.

[0044] FIG. 6 shows an execution pipeline 520 for the execution of a load instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 520 are the same as described for register-to-register operations. Read stage 514 is also the same as described above, except that execution data path 240 uses the data from register file 610 to determine an address for a call to cache subsystem 130. In address stage 525, multiplexers 652, 654, and 656 select the address that is provided to load/store unit 250 for execution stages 526 and 527. The information for the load operation remains in FIFO 410 during stages 526 and 527 while load/store unit 250 handles the operation.

[0045] FIG. 11 shows an embodiment of load/store unit 250. During stage 526, a call is made to cache subsystem 130 for the data at the address determined in stage 525. The preferred embodiment uses transaction-based cache calls, in which multiple devices, including processors 110 and 120, can access the local address space through cache subsystem 130. The requested data may not be available for several cycles after a call to cache subsystem 130, but load/store unit 250 can make calls to the cache subsystem while other calls are pending. Accordingly, load/store unit 250 is not held up. The number of clock cycles that cache subsystem 130 requires to provide the requested data depends on whether there is a hit or a miss in data cache 194.

[0046] In drive stage 527, cache subsystem 130 asserts a data signal for load/store unit 250. Cache subsystem 130 can provide 256 bits (32 bytes) of data per cycle to load/store unit 250. Byte aligner 710 aligns each of the 32 bytes in a corresponding 9-bit storage location to provide a 288-bit value. The 288-bit format is convenient for multimedia applications such as MPEG encoding and decoding, which sometimes employ 9-bit data elements. The 288-bit value is written to read data buffer 720. In write stage 528, scheduler 230 transfers field 4 from FIFO buffer 410 to execution register 426 or 427 to write the 288-bit quantity from data buffer 720 to register file 610.
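
By way of illustration only, the alignment of 32 bytes into 9-bit storage locations can be modeled in Python as follows. The sketch assumes that byte 0 occupies the least significant 9-bit location and that the ninth bit of each location is cleared; neither detail is specified for byte aligner 710, and the function names are hypothetical.

```python
SLOT_BITS = 9
BYTES_PER_TRANSFER = 32   # 256 bits per cycle from the cache subsystem


def align_bytes(data: bytes) -> int:
    """Place each of 32 bytes in its own 9-bit slot, giving a 288-bit value."""
    if len(data) != BYTES_PER_TRANSFER:
        raise ValueError("expected 32 bytes (256 bits)")
    value = 0
    for index, byte in enumerate(data):
        # Assumption: byte 0 goes to the least significant slot and the
        # ninth bit of every slot is left at zero.
        value |= byte << (SLOT_BITS * index)
    return value


def slot(value: int, index: int) -> int:
    """Read back the 9-bit storage location at the given index."""
    return (value >> (SLOT_BITS * index)) & 0x1FF
```

Under these assumptions every slot of the aligned value holds the original byte unchanged, and the whole value fits in 32 × 9 = 288 bits.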

[0047] FIG. 7 shows an execution pipeline 530 for the execution of a store instruction. Fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 530 are the same as described above. Read stage 514 is also the same as described above, except that it reads both the data to be stored and the data for the address calculation. The data to be stored is written to write data buffer 730 in load/store unit 250. Multiplexers 740 convert the data from the format that provides 9-bit bytes into a conventional format having 8-bit bytes. The converted data from buffer 730 and the associated address from address calculation stage 525 are sent in parallel to cache subsystem 130 during SRAM stage 536.

[0048] In the preferred embodiment of vector processor 120, each instruction is 32 bits long and has one of the nine formats shown in FIG. 8, which are labeled REAR, REAI, RRRM5, RRRR, RI, CT, RRRM9, RRRM9*, and RRRM9**. Appendix E describes the instruction set of vector processor 120.

[0049] Some load, store, and cache operations that use scalar registers when determining an effective address have the REAR format. REAR-format instructions are identified by bits 29 to 31 being 000b and have three operands, identified by two register numbers SRb and SRi for scalar registers and a register number Rn for a register that may be a scalar or a vector register depending on bit D. Bank bit B identifies the bank for register Rn or, if the default vector register size is double size, indicates whether vector register Rn is a double-size vector register. Op-code field Opc identifies the operation performed on the operands, and field TT indicates the transfer type, such as load or store. A typical REAR-format instruction is instruction VL, which loads register Rn from an address determined by adding the contents of scalar registers SRb and SRi. If bit A is set, the calculated address is stored in scalar register SRb.
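
By way of illustration only, the REAR format test and the effective address calculation can be sketched in Python as follows. The dictionary model of the scalar registers, the 32-bit wraparound of the address, and the function names are assumptions made for this sketch.

```python
def is_rear_format(instruction: int) -> bool:
    """REAR-format instructions have bits 29 to 31 equal to 000b."""
    return ((instruction >> 29) & 0b111) == 0


def rear_effective_address(scalars: dict, srb: str, sri: str, a_bit: bool) -> int:
    """Effective address = contents of SRb + contents of SRi.

    If bit A is set, the calculated address is written back to SRb.
    Assumption: addresses wrap around at 32 bits.
    """
    address = (scalars[srb] + scalars[sri]) & 0xFFFFFFFF
    if a_bit:
        scalars[srb] = address
    return address
```

For example, with SRb holding 0x1000 and SRi holding 0x20, the effective address is 0x1020, and SRb is updated to 0x1020 only when bit A is set.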

[0050] REAI-format instructions are the same as REAR-format instructions except that an 8-bit immediate value from field IMM is used in place of the contents of scalar register SRi. The REAR and REAI formats do not have a data element size field.

[0051] The RRRM5 format is for instructions having two source operands and one destination operand. These instructions have either three register operands or two register operands and a 5-bit immediate value. As shown in Appendix E, the encoding of fields D, S, and M determines whether the first source operand Ra is a scalar or a vector register; whether the second source operand Rb/IM5 is a scalar register, a vector register, or a 5-bit immediate value; and whether the destination register Rd is a scalar or a vector register.

[0052] The RRRR format is for instructions having four register operands. Register numbers Ra and Rb indicate source registers. Register number Rd indicates a destination register, and register number Rc indicates either a source or a destination register depending on field Opc. All of the operands are vector registers unless bit S is set to indicate that register Rb is a scalar register. Field DS indicates the data element size for the vector registers. Field Opc selects the data type for 32-bit data elements.

[0053] RI-format instructions load an immediate value into a register. Field IMM contains an immediate value of up to 18 bits. Register number Rd indicates the destination register, which is either a vector register in the current bank or a scalar register depending on bit D. Fields DS and F indicate the data element size and type, respectively. For 32-bit integer data elements, the 18-bit immediate value is sign-extended before being loaded into register Rd. For floating-point data elements, bit 18, bits 17 to 10, and bits 9 to 0 indicate the sign, exponent, and mantissa, respectively, of a 32-bit floating-point value.
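
By way of illustration only, the handling of the RI-format immediate value can be sketched in Python as follows. The function names are hypothetical, and the sketch does not show how the exponent and mantissa fields are expanded into a full 32-bit floating-point value, since that expansion is not described here.

```python
def sign_extend_18(imm: int) -> int:
    """Sign-extend an 18-bit immediate value to a Python integer."""
    imm &= 0x3FFFF                    # keep bits 17 to 0
    if imm & 0x20000:                 # bit 17 is the sign of the 18-bit value
        return imm - 0x40000
    return imm


def split_float_immediate(imm: int) -> tuple:
    """Split the immediate into (sign, exponent, mantissa) fields:
    bit 18, bits 17 to 10, and bits 9 to 0, respectively."""
    sign = (imm >> 18) & 0x1
    exponent = (imm >> 10) & 0xFF
    mantissa = imm & 0x3FF
    return sign, exponent, mantissa
```

For example, the 18-bit pattern 0x3FFFF sign-extends to -1, while 0x1FFFF remains 131071.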

[0054] The CT format is for flow control instructions and includes an op-code field Opc, a condition field Cond, and a 23-bit immediate value IMM. A branch is taken when the condition indicated by the condition field is true. The possible condition codes are "always", "less than", "equal", "less than or equal", "greater than", "not equal", "greater than or equal", and "overflow". Bits GT, EQ, LT, and SO in status and control register VCSR are used to evaluate the conditions.
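
By way of illustration only, the evaluation of a branch condition from the VCSR bits can be sketched in Python as follows. The mapping of each condition code onto bits GT, EQ, LT, and SO is an assumed, natural reading of the condition names, and the numeric encodings of field Cond are deliberately not modeled.

```python
def branch_taken(cond: str, gt: bool, eq: bool, lt: bool, so: bool) -> bool:
    """Evaluate a condition code against the GT, EQ, LT, and SO bits."""
    conditions = {
        "always": True,
        "less than": lt,
        "equal": eq,
        "less than or equal": lt or eq,
        "greater than": gt,
        "not equal": not eq,
        "greater than or equal": gt or eq,
        "overflow": so,
    }
    return conditions[cond]
```

Under this reading, "less than or equal" is true when either LT or EQ is set, and "not equal" is simply the complement of EQ.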

[0055] The RRRM9 format provides either three register operands or two register operands and a 9-bit immediate value. The combination of bits D, S, and M indicates which of the operands are vector registers, scalar registers, or 9-bit immediate values. Field DS indicates the data element size. The RRRM9* and RRRM9** formats are special cases of the RRRM9 format and are distinguished by op-code field Opc. The RRRM9* format replaces source register number Ra with a condition code Cond and an ID field. The RRRM9** format replaces the most significant bits of the immediate value with a condition code Cond and a bit K. Further description of RRRM9* and RRRM9** is given in Appendix E in connection with the conditional move instruction VCMOV, the conditional move with element mask instruction CMOVM, and the compare and set mask instruction CMPV.

[0056] Although the invention has been illustrated and described with reference to specific preferred embodiments, it will be apparent to those of ordinary skill in the art that the invention can be modified and adapted in various ways without departing from the spirit and scope of the invention as defined by the claims.

[0057] [Appendix A] In the exemplary embodiment, processor 110 is a general-purpose processor that complies with the standard for the ARM7 processor. For a description of the registers in the ARM7 processor, refer to the ARM architecture document or the ARM7 data sheet (document number ARM DDI 0020C, issued December 1994).

[0058] To interact with vector processor 120, processor 110 starts and stops the vector processor; tests the vector processor state, including for synchronization; transfers data from a scalar/special register in vector processor 120 to a general register in processor 110; and transfers data from a general register to a vector processor scalar/special register. Transfers between a general register and a vector register of the vector processor, by contrast, require memory as an intermediary.

[0059] Table 2 describes the extensions to the ARM7 instruction set for vector processor interaction.

【００６０】[0060]

【表２】 [Table 2]

【００６１】[0061]

【表３】 [Table 3]

[0062] Table 3 lists the ARM7 exceptions, which are detected and reported before the faulting instruction is executed. The exception vector addresses are given in hexadecimal notation.

【００６３】[0063]

【表４】 [Table 4]

[0064] The syntax of the extensions to the ARM7 instruction set is described next. For terminology and instruction formats, refer to the ARM architecture document or the ARM7 data sheet (document number ARM DDI 0020C, issued December 1994).

[0065] The ARM architecture provides three instruction formats for the coprocessor interface.

[0066]
1. Coprocessor data operations (CDP)
2. Coprocessor data transfers (LDC, STC)
3. Coprocessor register transfers (MRC, MCR)
The MSP architecture extension uses all three formats. The coprocessor data operation format (CDP) is used for operations that need not return anything to the ARM7.

【００６７】[0067]

【表５】 [Table 5]

【００６８】[0068]

【表６】 [Table 6]

[0069] The coprocessor data transfer format (LDC, STC) is used to load or store a subset of the vector processor's registers directly to or from memory. The ARM7 processor is responsible for supplying the word address, and the vector processor supplies or accepts the data and controls the number of words transferred. Refer to the ARM7 data sheet for further details.

【００７０】[0070]

【表７】 [Table 7]

【００７１】[0071]

【表８】 [Table 8]

[0072] The coprocessor register transfer format (MRC, MCR) is used to communicate information directly between the ARM7 and the vector processor. This format is used to move data between an ARM7 register and a vector processor scalar or special register.

【００７３】[0073]

【表９】 [Table 9]

【００７４】[0074]

【表１０】 [Table 10]

[0075] Extended ARM Instruction Descriptions
The extended ARM instructions are described in alphabetical order.

[0076] CACHE: Cache operation

【００７７】[0077]

【表１１】 [Table 11]

【００７８】Assembler syntax
STC{cond} p15, cOpc, (Address)
CACHE{cond} Opc, (Address)
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv} and Opc = {0, 1, 3}. Note that because the CRn field of the LDC/STC format is used to specify Opc, the decimal representation of the opcode must be preceded by the letter "c" in the first syntax (that is, use c0 instead of 0). Refer to the ARM7 data sheet for the address mode syntax.

【００７９】[0079]

【表１２】 [Table 12]

【００８０】Operation
Refer to the ARM7 data sheet for how the EA is calculated.

【００８１】Exception
ARM7 protection violation.

INTVP: Interrupt vector processor

【００８２】[0082]

【表１３】 [Table 13]

【００８３】Assembler syntax
CDP{cond} p7, 1, c0, c0, c0
INTVP{cond}
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.

Description
This instruction is executed only when cond is true.

【００８４】This instruction signals the vector processor to halt.

【００８５】The ARM7 does not wait for the vector processor to halt and continues with the next instruction.

【００８６】An MFER busy-wait loop should be used to check whether the vector processor has halted after this instruction is executed. This instruction has no effect if the vector processor is already in the VP_IDLE state.

【００８７】Bits 19:12, 7:5 and 3:0 are reserved.

【００８８】Exception
Vector processor unavailable.

【００８９】MFER: Move from extension register

【００９０】[0090]

【表１４】 [Table 14]

【００９１】Assembler syntax
MRC{cond} p7, 1, Rd, cP, cER, 0
MFER{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME is an architecturally specified register mnemonic (for example, PER0 or CSR).

【００９２】[0092]

【表１５】 [Table 15]

【００９３】Bits 19:17 and 7:5 are reserved.

【００９４】Exception
Protection violation on an attempt to access PERx in user mode.

MFVP: Move from vector processor

【００９５】[0095]

【表１６】 [Table 16]

【００９６】Assembler syntax
MRC{cond} p7, 1, Rd, CRn, CRm, 0
MFVP{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME is an architecturally specified register mnemonic (for example, SPO or VCS).

【００９７】[0097]

【表１７】 [Table 17]

【００９８】SR0 always reads as 32 bits of zeros, and writes to it are ignored.

【００９９】Exception
Vector processor unavailable.

MTER: Move to extension register

【０１００】[0100]

【表１８】 [Table 18]

【０１０１】Assembler syntax
MRC{cond} p7, 1, Rd, cP, cER, 0
MTER{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME is an architecturally specified register mnemonic (for example, PER0 or CSR).

【０１０２】[0102]

【表１９】 [Table 19]

【０１０３】Bits 19:17 and 7:5 are reserved.

【０１０４】Exception
Protection violation on an attempt to access PERx in user mode.

MTVP: Move to vector processor

【０１０５】[0105]

【表２０】 [Table 20]

【０１０６】Assembler syntax
MRC{cond} p7, 1, Rd, CRn, CRm, 0
MTVP{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME is an architecturally specified register mnemonic (for example, SPO or VCS).

【０１０７】[0107]

【表２１】 [Table 21]

【０１０８】Exception
Vector processor unavailable.

PFTCH: Prefetch

【０１０９】[0109]

【表２２】 [Table 22]

【０１１０】Assembler syntax
MRC{cond} p15, 2, (Address)
PFTCH{cond} (Address)
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Refer to the ARM7 data sheet for the address mode syntax.

【０１１１】Description
This instruction is executed only when cond is true. The cache line specified by the EA is prefetched into the ARM7 data cache.

【０１１２】Operation
Refer to the ARM7 data sheet for how the EA is calculated.

【０１１３】Exception
None.

STARTVP: Start vector processor

【０１１４】[0114]

【表２３】 [Table 23]

【０１１５】Assembler syntax
CDP{cond} p7, 2, c0, c0, c0
STARTVP{cond}
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.

Description
This instruction is executed only when cond is true. It signals the vector processor to start execution and automatically clears VISRC(vjp) and VISRC(vip). The ARM7 does not wait for the vector processor to start execution and continues with the next instruction. The state of the vector processor must be initialized to the desired state before this instruction is executed. This instruction has no effect if the vector processor is already in the VP_RUN state.

【０１１６】Bits 19:12, 7:5 and 3:0 are reserved.

【０１１７】Exception
Vector processor unavailable.

TESTSET: Test and set

【０１１８】[0118]

【表２４】 [Table 24]

【０１１９】Assembler syntax
MRC{cond} p7, 0, Rd, c0, cER, 0
TESTSET{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, ER = {0, ..., 15}, and RNAME is an architecturally specified register mnemonic (for example, UER1 or VASYNC).

【０１２０】Description
This instruction is executed only when cond is true. It returns the contents of UERx to Rd and sets UERx(30) to 1. If ARM7 register 15 is specified as the destination register, UERx(30) is returned in the Z bit of the CPSR, so that a short busy-wait loop can be implemented. Currently, only UER1 is defined to work with this instruction.

【０１２１】Bits 19:12 and 7:5 are reserved.

【０１２２】Exception
None.

[Appendix B]
The architecture of the multimedia processor 100 defines extension registers that the processor 110 accesses with the MFER and MTER instructions. The extension registers comprise privileged extension registers and user extension registers.

【０１２３】The privileged extension registers are mainly used to control the operation of the multimedia signal processor. They are listed in Table 12.

【０１２４】[0124]

【表２５】 [Table 25]

【０１２５】The control register CTR controls the operation of the MSP 100. All bits of CTR are cleared on reset. The register definition is shown in Table 2B.

【０１２６】[0126]

【表２６】 [Table 26]

【０１２７】[0127]

【表２７】 [Table 27]

【０１２８】The status register STR indicates the status of the MSP 100. All bits of STR are cleared on reset. The register definition is shown in Table 14.

【０１２９】[0129]

【表２８】 [Table 28]

【０１３０】The processor version register identifies the particular version of a particular processor within the multimedia signal processor family of processors.

【０１３１】The vector processor interrupt mask register VIMSK controls the reporting of vector processor exceptions to the processor 110. Each bit of VIMSK, when set together with the corresponding bit of the VISRC register, enables that exception to interrupt the ARM7. VIMSK has no effect on how vector processor exceptions are detected; it only affects whether an exception interrupts the ARM7. All bits of VIMSK are cleared on reset. The register definition is shown in Table 15.

【０１３２】[0132]

【表２９】 [Table 29]

【０１３３】The ARM7 instruction address breakpoint register assists in debugging ARM7 programs. The register definition is shown in Table 16.

【０１３４】[0134]

【表３０】 [Table 30]

【０１３５】The ARM7 data address breakpoint register assists in debugging ARM7 programs. The register definition is shown in Table 17.

【０１３６】[0136]

【表３１】 [Table 31]

【０１３７】The scratch pad register configures the address and size of the scratch pad formed using the SRAM of the cache subsystem 130. The register definition is shown in Table 18.

【０１３８】[0138]

【表３２】 [Table 32]

【０１３９】The user extension registers are mainly used for synchronization of the processors 110 and 120. The user extension registers are currently defined to have only one bit, mapped to bit 30, and an instruction such as "MFER R15, UERx" returns the bit value in the Z flag. Bits UERx(31) and UERx(29:0) always read as zero. The user extension registers are described in Table 19.

【０１４０】[0140]

【表３３】 [Table 33]

【０１４１】Table 20 shows the state of the extension registers at power-on reset.

【０１４２】[0142]

【表３４】 [Table 34]

【０１４３】[Appendix C]
The architectural state of the vector processor 120 comprises: 32 32-bit scalar registers; two banks of 32 288-bit vector registers; a pair of 576-bit vector accumulator registers; and a set of 32-bit special registers. The scalar, vector and accumulator registers are intended for general-purpose programming and support many different data types.

【０１４４】This section and the following sections use the following notation: VR denotes a vector register; VRi denotes the i-th vector register (zero-based); VR[i] denotes the i-th data element of vector register VR; VR(a:b) denotes bits a through b of vector register VR; and VR[i](a:b) denotes bits a through b of the i-th data element of vector register VR.

【０１４５】The vector architecture has the added dimension of data types and sizes for the multiple elements within one vector register. Because a vector register has a fixed size, the number of data elements it can hold depends on the size of the elements. The MSP architecture defines five element sizes, as shown in Table 21.

【０１４６】[0146]

【表３５】 [Table 35]

【０１４７】The MSP architecture interprets vector data according to the data type and size specified in the instruction. Currently, two's complement (integer) formats are supported for the byte, byte9, halfword and word element sizes on most arithmetic instructions. In addition, the IEEE 754 single-precision format is supported for the word element size on most arithmetic instructions.

【０１４８】As long as the instruction sequence produces a meaningful result, the programmer is free to interpret the data in any desired way. For example, the programmer is free to use the byte9 size to store unsigned 8-bit numbers, is equally free to store unsigned 8-bit numbers in byte-size data elements, and is free to operate on them using the provided two's complement arithmetic instructions, as long as the program can handle the "false" overflow results.

【０１４９】There are 32 scalar registers, referred to as SR0 through SR31. The scalar registers are 32 bits wide and can hold one data element of any one of the defined sizes. Scalar register SR0 is special in that it always reads as 32 bits of zeros and writes to it are ignored. The byte, byte9 and halfword data types are stored in the least significant bits of a scalar register, with the most significant bits having undefined values.
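
The SR0 behaviour can be sketched as a small register-file model. This is a minimal illustration only; the class name `ScalarRegisters` and its `read`/`write` methods are hypothetical and do not come from the specification.

```python
class ScalarRegisters:
    """Toy model of SR0..SR31 in which SR0 is hard-wired to zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        # SR0 always reads as 32 bits of zeros.
        return 0 if n == 0 else self._regs[n]

    def write(self, n, value):
        # Writes to SR0 are ignored; other registers keep 32 bits.
        if n != 0:
            self._regs[n] = value & 0xFFFFFFFF
```

A register that always reads as zero gives instructions a constant-zero operand without spending an encoding on an immediate.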

【０１５０】Since the registers do not carry data type indicators, the programmer must know the data type of the registers used by each instruction. This differs from other architectures, in which a 32-bit register is assumed to contain a 32-bit value. The MSP architecture specifies that a result of data type A modifies only the bits defined for data type A. For example, the result of a byte9 addition modifies only the lower 9 bits of the 32-bit destination scalar register. The values of the upper 23 bits are undefined unless otherwise noted for an instruction.

【０１５１】The 64 vector registers are organized into two banks of 32 registers each. Bank 0 contains the first 32 registers and bank 1 contains the second 32 registers. The two banks are used such that one is set as the current bank and the other as the alternate bank. All vector instructions use the registers in the current bank by default, except for the load/store and register move instructions, which can access vector registers in the alternate bank. The CBANK bit of the vector control and status register VCSR is used to set bank 0 or bank 1 as the current bank (the other bank becomes the alternate bank). The vector registers in the current bank are referred to as VR0 through VR31, and those in the alternate bank as VRA0 through VRA31.

【０１５２】Alternatively, the two banks can be conceptually merged to provide 32 double-size vector registers of 576 bits each. The VEC64 bit of control register VCSR specifies this mode. In VEC64 mode there are no current and alternate banks, and a vector register number denotes the corresponding pair of 288-bit vector registers from the two banks. That is,
VRi(575:0) = VR1i(287:0) : VR0i(287:0)
where VR0i and VR1i denote the vector registers with register number VRi in bank 0 and bank 1, respectively. The double-size vector registers are referred to as VR0 through VR31.
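
The VEC64 concatenation and the VR(a:b) bit-range notation can be illustrated with arbitrary-precision integers. This is a sketch under the assumption that each register is modelled as a non-negative Python integer; the helper names `vec64_register` and `bits` are hypothetical.

```python
VR_BITS = 288  # width of one vector register in a single bank


def vec64_register(vr0i, vr1i):
    """Form VRi(575:0) with bank 1 in the high half and bank 0 in the low half."""
    mask = (1 << VR_BITS) - 1
    return ((vr1i & mask) << VR_BITS) | (vr0i & mask)


def bits(value, a, b):
    """Return value(a:b), that is, bits a down to b inclusive (a >= b)."""
    return (value >> b) & ((1 << (a - b + 1)) - 1)
```

For example, `bits(vec64_register(x, y), 287, 0)` recovers the bank 0 register `x`, and `bits(vec64_register(x, y), 575, 288)` recovers the bank 1 register `y`.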

【０１５３】A vector register can hold multiple elements of byte, byte9, halfword or word size, as shown in Table 22.

【０１５４】[0154]

【表３６】 [Table 36]

【０１５５】Mixing of element sizes within one vector register is not supported. Except for the byte9 element size, only 256 of the 288 bits are used. Specifically, every ninth bit is unused. The 32 unused bits in the byte, halfword and word sizes are reserved, and the programmer should make no assumptions about their values.

The vector accumulator registers provide storage for intermediate results that have higher precision than the results in the destination registers. The vector accumulator registers consist of four 288-bit registers: VAC1H, VAC1L, VAC0H and VAC0L. The VAC0H, VAC0L pair is used by three instructions by default. Only in VEC64 mode is the VAC1H, VAC1L pair used, to emulate vector operations on 64 byte9 elements. To produce an extended-precision result with the same number of elements as the source vector register, the extended-precision elements are stored across a pair of registers, as shown in Table 23.

【０１５６】[0156]

【表３７】 [Table 37]

【０１５７】The VAC1H, VAC1L pair can be used only in VEC64 mode, in which the number of elements can be 64, 32 or 16 for byte9 (and byte), halfword and word, respectively.
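
The element counts quoted above follow from the register width and the element size. The sketch below reproduces that arithmetic for the four element sizes named in the text, assuming 8-, 9-, 16- and 32-bit elements for byte, byte9, halfword and word; the dictionary and function names are hypothetical, and the fifth element size mentioned for Table 21 is not modelled.

```python
# byte9 uses all 288 bits; the other sizes leave every ninth bit unused,
# so only 256 bits of the register hold data.
USABLE_BITS = {"byte9": 288, "byte": 256, "halfword": 256, "word": 256}
ELEMENT_BITS = {"byte9": 9, "byte": 8, "halfword": 16, "word": 32}


def element_count(size, vec64=False):
    """Return how many elements of the given size fit in one vector register."""
    count = USABLE_BITS[size] // ELEMENT_BITS[size]
    # VEC64 mode pairs two 288-bit registers, doubling the element count.
    return 2 * count if vec64 else count
```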

【０１５８】There are 33 special registers that can be loaded directly from memory or stored directly to memory. Sixteen of them, referred to as RASR0 through RASR15, form an internal return address stack and are used by the subroutine call and subroutine return instructions. The remaining seventeen 32-bit special registers are shown in Table 24.

【０１５９】[0159]

【表３８】 [Table 38]

【０１６０】The definition of the vector control and status register (VCSR) is shown in Table 25.

【０１６１】[0161]

【表３９】 [Table 39]

【０１６２】[0162]

【表４０】 [Table 40]

【０１６３】[0163]

【表４１】 [Table 41]

【０１６４】The vector program counter register VPC holds the address of the next instruction to be executed by the vector processor 120. The ARM7 processor 110 must load register VPC before issuing the STARTVP instruction to start operation of the vector processor 120.

【０１６５】The vector exception program counter VEPC specifies the address of the instruction most likely to have caused the most recent exception. The MSP 100 does not support precise exceptions, hence the term "most likely".

【０１６６】The vector interrupt source register VISRC indicates the interrupt sources to the ARM7 processor 110. The appropriate bits are set by hardware upon detection of an exception. Software must clear register VISRC before the vector processor 120 can resume execution. Any bit set in register VISRC causes the vector processor 120 to enter state VP_IDLE. If the corresponding interrupt enable bit is set in VIMSK, an interrupt to processor 110 is signaled. Table 26 defines the contents of register VISRC.
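
The relationship between VISRC and VIMSK can be expressed as two bitwise tests. The following is a minimal sketch that treats both registers as 32-bit integers; the function name `vector_exception_effects` is hypothetical.

```python
def vector_exception_effects(visrc, vimsk):
    """Return (enters_vp_idle, interrupts_arm7) for given register values."""
    # Any bit set in VISRC sends the vector processor to VP_IDLE.
    enters_vp_idle = visrc != 0
    # The ARM7 is interrupted only for sources whose mask bit is also set.
    interrupts_arm7 = (visrc & vimsk) != 0
    return enters_vp_idle, interrupts_arm7
```

Because VIMSK is cleared on reset, an exception in this model halts the vector processor without interrupting the ARM7 until software enables the corresponding mask bit.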

【０１６７】[0167]

【表４２】 [Table 42]

【０１６８】The vector interrupt instruction register VIINS is updated with the VCINT or VCJOIN instruction when a VCINT or VCJOIN instruction is executed to interrupt the ARM7 processor 110.

【０１６９】The vector count registers VCR1, VCR2 and VCR3 are used by the decrement-and-branch instructions VD1CBR, VD2CBR and VD3CBR, and are initialized with the number of loop iterations to be executed. When instruction VD1CBR is executed, register VCR1 is decremented by one. If the count value is not zero and the condition specified in the instruction matches VFLAG, the branch is taken; otherwise the branch is not taken. Register VCR1 is decremented by one in both cases. Registers VCR2 and VCR3 are used in the same manner.
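
The decrement-and-branch behaviour can be sketched as follows. The text does not state whether the zero test applies to the count before or after the decrement; this sketch assumes the decremented value is tested and that the counter wraps as a 32-bit quantity, and both points are assumptions. The function name `vdcbr` is hypothetical.

```python
def vdcbr(vcr, condition_matches):
    """Model one VDnCBR step and return (new_vcr, branch_taken).

    The counter is decremented whether or not the branch is taken.
    Assumption: the zero test uses the decremented count.
    """
    new_vcr = (vcr - 1) & 0xFFFFFFFF
    branch_taken = new_vcr != 0 and condition_matches
    return new_vcr, branch_taken
```

Under these assumptions, a loop whose counter starts at 3 and whose condition always matches branches back twice and falls through on the third execution.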

【０１７０】The vector global mask register VGMR0 indicates the elements of the destination vector register that are affected in VEC32 mode, and the elements within VR(287:0) that are affected in VEC64 mode. Each bit of VGMR0 controls the update of 9 bits of the vector destination register. Specifically, VGMR0(i) controls the update of VRd(9i+8:9i) in VEC32 mode and of VR0d(9i+8:9i) in VEC64 mode. VR0d denotes the destination register in bank 0 in VEC64 mode, and VRd denotes the destination register in the current bank, which can be bank 0 or bank 1, in VEC32 mode. VGMR0 is used in the execution of all instructions except the VCMOVM instruction.
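
The per-field masking can be sketched as a loop over the 32 mask bits, each guarding one 9-bit field of a 288-bit destination. Registers are modelled as non-negative Python integers; the function name `masked_update` and the constant names are hypothetical.

```python
FIELD_BITS = 9   # each mask bit guards one 9-bit field
VR_BITS = 288    # width of one vector register


def masked_update(old, new, mask):
    """Return the destination value after a masked write.

    Bit i of mask enables the update of bits 9i+8..9i; fields whose mask
    bit is clear keep their old contents.
    """
    result = old
    for i in range(VR_BITS // FIELD_BITS):
        if (mask >> i) & 1:
            field = 0x1FF << (FIELD_BITS * i)
            result = (result & ~field) | (new & field)
    return result
```

With byte or byte9 elements one mask bit covers one element; wider elements span several 9-bit fields and therefore several mask bits.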

【０１７１】The vector global mask register VGMR1 indicates the elements within VR(575:288) that are affected in VEC64 mode. Each bit of VGMR1 controls the update of 9 bits of the vector destination register in bank 1. Specifically, VGMR1(i) controls the update of VR1d(9i+8:9i). Register VGMR1 is not used in VEC32 mode, but in VEC64 mode it affects the execution of all instructions except the VCMOVM instruction.

【０１７２】The vector overflow register VOR0 indicates the elements within VR(287:0) in VEC64 mode that contain overflow results after a vector arithmetic operation. This register is not modified by scalar arithmetic operations. A set bit VOR0(i) indicates that the i-th element of a byte or byte9 operation, the (i idiv 2)-th element of a halfword operation, or the (i idiv 4)-th element of a word data type operation contains an overflow result. For example, bit 1 and bit 3 would be set to indicate overflow of the first halfword element and the first word element, respectively. The mapping of bits in VOR0 differs from the mapping of bits in VGMR0 or VGMR1.
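
The mapping from overflow-register bits to element indices can be sketched with integer division. The register is modelled as a 32-bit integer; the names `BITS_PER_ELEMENT` and `overflowed_elements` are hypothetical.

```python
# Number of overflow-register bits that correspond to one element.
BITS_PER_ELEMENT = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def overflowed_elements(vor, size):
    """Return the sorted element indices flagged by a 32-bit overflow register."""
    step = BITS_PER_ELEMENT[size]
    return sorted({i // step for i in range(32) if (vor >> i) & 1})
```

This matches the example in the text: bit 1 maps to halfword element 1 idiv 2 = 0, and bit 3 maps to word element 3 idiv 4 = 0.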

【０１７３】The vector overflow register VOR1 indicates the elements within VR(575:288) in VEC64 mode that contain overflow results after a vector arithmetic operation. Register VOR1 is not used in VEC32 mode and is not modified by scalar arithmetic operations. A set bit VOR1(i) indicates that the i-th element of a byte or byte9 operation, the (i idiv 2)-th element of a halfword operation, or the (i idiv 4)-th element of a word data type operation contains an overflow result. For example, bit 1 and bit 3 would be set to indicate overflow of the first halfword element and the first word element, respectively, in VR(575:288). The mapping of bits in VOR1 differs from the mapping of bits in VGMR0 or VGMR1.

【０１７４】The vector instruction address breakpoint register VIABR assists in debugging vector programs. The register definition is shown in Table 27.

【０１７５】[0175]

【表４３】 [Table 43]

【０１７６】The vector data address breakpoint register VDABR assists in debugging vector programs. The register definition is shown in Table 28.

【０１７７】[0177]

【表４４】 [Table 44]

【０１７８】The vector move mask register VMMR0 is always used by the VCMOVM instruction, and is also used by all instructions when VCSR(SMM) = 1. Register VMMR0 indicates the elements of the destination vector register that are affected in VEC32 mode, and the elements within VR(287:0) that are affected in VEC64 mode. Each bit of VMMR0 controls the update of 9 bits of the vector destination register. Specifically, VMMR0(i) controls the update of VRd(9i+8:9i) in VEC32 mode and of VR0d(9i+8:9i) in VEC64 mode. VR0d denotes the destination register in bank 0 in VEC64 mode, and VRd denotes the destination register in the current bank, which can be bank 0 or bank 1, in VEC32 mode.

【０１７９】The vector move mask register VMMR1 is always used by the VCMOVM instruction, and is also used by all instructions when VCSR(SMM) = 1. Register VMMR1 indicates the elements within VR(575:288) that are affected in VEC64 mode. Each bit of VMMR1 controls the update of 9 bits of the vector destination register in bank 1. Specifically, VMMR1(i) controls the update of VR1d(9i+8:9i). Register VMMR1 is not used in VEC32 mode.

【０１８０】The vector and ARM7 synchronization register VASYNC provides producer/consumer-style synchronization between the processors 110 and 120. Currently, only bit 30 is defined. The ARM7 processor can access register VASYNC using the MFER, MTER and TESTSET instructions while the vector processor 120 is in state VP_RUN or state VP_IDLE. Register VASYNC is not accessible to the ARM7 processor through the MTVP or MFVP instructions, because those instructions cannot access the first 16 special registers of the vector processor. The vector processor can access register VASYNC through the VMOV instruction.

【０１８１】Table 29 shows the state of the vector processor at power-on reset.

【０１８２】[0182]

【表４５】 [Table 45]

【０１８３】The special registers must be initialized by the ARM7 processor 110 before the vector processor can execute instructions.

【０１８４】[Appendix D]
Each instruction either implies or specifies the data types of its source and destination operands. Some instructions have semantics that take one data type for the source and produce a different data type for the result. This appendix describes the data types supported in the preferred embodiment. Table 30 of this application described the supported data types int8, int9, int16, int32 and float. Unsigned integer formats are not supported, and an unsigned integer value must first be converted to two's complement format before use. The programmer is free to use the arithmetic instructions with unsigned integers or any other format of choice, as long as overflow is handled appropriately. The architecture defines overflow only for the two's complement integer and 32-bit floating-point data types. The architecture does not detect the carry-out of 8-, 9-, 16- or 32-bit operations that would be needed to detect unsigned overflow.

【０１８５】Table 30 shows the data sizes supported by the load operations.

【０１８６】[0186]

【表４６】 [Table 46]

【０１８７】The architecture specifies that memory addresses be aligned on data type boundaries. That is, bytes have no alignment requirement, halfwords must be aligned on halfword boundaries, and words must be aligned on word boundaries.

【０１８８】Table 31 shows the data sizes supported by the store operations.

【０１８９】[0189]

【表４７】 [Table 47]

【０１９０】Because more than one data type maps onto a register, whether scalar or vector, the destination register can contain bits for which no result is defined for some data types. In fact, except for byte9 data size operations on a vector destination register and word data size operations on a scalar destination register, there are bits in the destination register whose values are not defined by the operation. For these bits, the architecture specifies that their values are undefined. Table 32 shows the undefined bits for each data size.

【０１９１】[0191]

【表４８】 [Table 48]

【０１９２】The programmer must be aware of the data types of the source and destination registers or memory when programming. Converting from one element size to another can change the number of elements stored in a vector register. For example, converting a vector register of halfword elements to the word data type requires two vector registers to hold the same number of converted elements. Conversely, converting from the word data type, which may have a user-defined format in the vector register, to the halfword format produces the same number of elements in one half of a vector register and leftover bits in the other half. In either case, data type conversion raises an architectural issue with the arrangement of converted elements whose size differs from that of the source elements.

【０１９３】In principle, the MSP architecture does not provide operations that implicitly change the number of elements in the result. The architecture assumes that the programmer is aware of the consequences of changing the number of elements in the destination register. The architecture provides only operations that convert from one data type to another data type of the same size, and requires the programmer to adjust for the difference in data size when converting to a data type of a different size.

【０１９４】Special instructions such as VSHFLL and VUNSHFLL, described in Appendix E, simplify the conversion from a vector with a first data size to a second vector with a second data size. The basic steps for converting a two's complement data type in a vector VRa from a smaller element size, for example int8, to a larger one, for example int16, are as follows.

【０１９５】1. Shuffle the elements in VRa with another vector VRb into two vectors (VRc:VRd) using the byte data type. The elements in VRa are moved to the lower bytes of the int16 data elements in the double size register (VRc:VRd), and the elements of VRb, whose values are irrelevant, are moved to the upper bytes of VRc:VRd. This operation effectively moves half of the elements of VRa into VRc and the other half into VRd, while doubling the size of each element from byte to halfword.

【０１９６】2. Arithmetically shift the elements in VRc:VRd by 8 bits to sign-extend them.
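The two widening steps above can be sketched in a few lines of Python. This is only an illustrative model, not the VSHFLL instruction itself: the function names are hypothetical, the placement of the first half of the elements in VRc is an assumption, and because the text says only "shift by 8 bits", the sketch assumes a left shift by 8 followed by an arithmetic right shift by 8 as one way to sign-extend a value held in the low byte.

```python
def shuffle_bytes(vra, vrb):
    """Step 1: pair each VRa byte (low) with a VRb byte (high).

    Returns two lists of 16-bit values standing in for VRc and VRd.
    Assumption: the first half of the elements lands in VRc.
    """
    assert len(vra) == len(vrb)
    wide = [((hi & 0xFF) << 8) | (lo & 0xFF) for lo, hi in zip(vra, vrb)]
    half = len(wide) // 2
    return wide[:half], wide[half:]


def sign_extend_low_byte(halfword):
    """Step 2: sign-extend the low byte of a 16-bit value.

    Assumption: shift left by 8 (discarding the filler byte), then shift
    right arithmetically by 8.
    """
    shifted = (halfword << 8) & 0xFFFF
    if shifted & 0x8000:          # reinterpret as a signed 16-bit value
        shifted -= 0x10000
    return shifted >> 8           # Python's >> is arithmetic for ints


def widen_int8_to_int16(vra):
    """Convert raw int8 bytes (0..255) to signed int16 values."""
    filler = [0] * len(vra)       # VRb: its contents do not matter
    vrc, vrd = shuffle_bytes(vra, filler)
    return [sign_extend_low_byte(h) for h in vrc + vrd]
```

For example, `widen_int8_to_int16([0x01, 0xFF, 0x80, 0x7F])` yields the signed values 1, -1, -128, and 127, regardless of what the filler vector contains.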

【０１９７】The basic steps for converting a two's complement data type in a vector VRa from a larger element size, for example int16, to a smaller one, for example int8, are as follows.

【０１９８】1. Check that each element of the int16 data type can be represented in byte size. If necessary, saturate the elements at both extremes so that they fit the smaller size.

【０１９９】2. Unshuffle the elements in VRa with another vector VRb into two vectors VRc:VRd. The upper half of each element in VRa and VRb is moved to VRc, and the lower half is moved to VRd. This effectively collects the lower halves of all the elements of VRa in the lower half of VRd.
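The narrowing steps can be modeled in the same way. The sketch below is a hedged illustration rather than the VUNSHFLL instruction: the function names are hypothetical, and the ordering of bytes inside VRc and VRd (all VRa elements first, then all VRb elements) is an assumption chosen to match the statement that the low halves of VRa end up in the lower half of VRd.

```python
def saturate_to_int8(value):
    """Step 1: clamp a signed value to the int8 range."""
    return max(-128, min(127, value))


def unshuffle_halfwords(vra, vrb):
    """Step 2: split each 16-bit element into upper and lower bytes.

    Returns (VRc, VRd): VRc collects the upper bytes, VRd the lower bytes.
    Assumption: VRa's elements come first, followed by VRb's.
    """
    elements = [e & 0xFFFF for e in list(vra) + list(vrb)]
    vrc = [(e >> 8) & 0xFF for e in elements]
    vrd = [e & 0xFF for e in elements]
    return vrc, vrd


def narrow_int16_to_int8(vra):
    """Convert signed int16 values to raw int8 bytes (0..255)."""
    clamped = [saturate_to_int8(e) for e in vra]
    filler = [0] * len(vra)                  # VRb: contents are irrelevant
    _, vrd = unshuffle_halfwords(clamped, filler)
    return vrd[:len(vra)]                    # lower half of VRd
```

Calling `narrow_int16_to_int8([300, -200, 5, -1])` first clamps the inputs to 127, -128, 5, and -1, then keeps their low bytes.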

【０２００】Special instructions are provided for the following data type conversions: int32 to single precision floating point; single precision floating point to fixed point (X.Y notation); single precision floating point to int32; int8 to int9; int9 to int16; and int16 to int9.

【０２０１】To provide flexibility in vector programming, most vector instructions use an element mask so that they operate only on selected elements within a vector. The Vector Global Mask Registers (VGMR0, VGMR1) identify the elements that a vector instruction modifies in the destination register and the vector accumulator. For byte and byte9 data size operations, each of the 32 bits in VGMR0 (or VGMR1) identifies an element to be operated on. A set bit VGMR0(i) indicates that byte-size element i, where i is 0 to 31, is affected. For halfword data size operations, each pair of the 32 bits in VGMR0 (or VGMR1) identifies an element to be operated on. Set bits VGMR0(2i:2i+1) indicate that element i, where i is 0 to 15, is affected. If only one bit of a pair in VGMR0 is set for a halfword data size operation, only the bits in the corresponding byte are modified. For word data size operations, each set of four bits in VGMR0 (or VGMR1) identifies an element to be operated on. Set bits VGMR0(4i:4i+3) indicate that element i, where i is 0 to 7, is affected. If not all bits of a four-bit set in VGMR0 are set for a word data size operation, only the bits in the corresponding bytes are modified.
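Because a partially set pair or four-bit group modifies only the corresponding bytes, the mask can be read as one enable bit per byte of a 32-byte register. The following sketch models that reading; it is an assumption-laden illustration with hypothetical helper names, it treats mask bit k as controlling byte k of the register, and it does not model the ninth bit of byte9 elements.

```python
def element_mask_bits(element_index, bytes_per_element):
    """Return the mask bits that cover one element.

    bytes_per_element is 1 (byte), 2 (halfword), or 4 (word).
    """
    first_bit = element_index * bytes_per_element
    return ((1 << bytes_per_element) - 1) << first_bit


def apply_element_mask(dest, result, vgmr):
    """Merge result bytes into dest bytes under a 32-bit mask.

    dest and result are lists of 32 byte values; a set bit k in vgmr
    lets result[k] replace dest[k], otherwise dest[k] is kept.
    """
    return [r if (vgmr >> k) & 1 else d
            for k, (d, r) in enumerate(zip(dest, result))]


# Usage: enable only halfword element 1 (bytes 2 and 3).
old = [0x00] * 32
new = [0xAA] * 32
merged = apply_element_mask(old, new, element_mask_bits(1, 2))
```

With a halfword mask for element 1, only bytes 2 and 3 of `merged` take the new value; setting just one bit of the pair would update a single byte, which matches the partial-update behavior described above.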

【０２０２】VGMR0 and VGMR1 can be set by comparing a vector register with a vector register, a scalar register, or an immediate value using the VCMPV instruction. This instruction sets the mask appropriately for the specified data size. Because a scalar register is defined to contain only one data element, scalar operations (that is, operations whose destination register is scalar) are not affected by the element mask.

【０２０３】To provide flexibility in vector programming, most MSP instructions support three forms of vector and scalar operations. They are as follows:
1. vector = vector op vector
2. vector = vector op scalar
3. scalar = scalar op scalar
In case 2, where a scalar register is specified as the B operand, the single element in the scalar register is replicated as many times as needed to match the number of elements in the vector A operand. The replicated elements have the same value as the element in the specified scalar operand. The scalar operand can come from a scalar register or from the instruction, in the form of an immediate operand. For an immediate operand, appropriate sign extension is applied if the specified data type uses a data size larger than the immediate field provides.
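The "vector op scalar" form can be illustrated with a short sketch. The helper names and the 5-bit immediate width in the usage lines are assumptions made for the example; overflow and element masks are deliberately left out.

```python
import operator


def sign_extend(value, from_bits):
    """Interpret the low from_bits of value as a two's complement number."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return value - (1 << from_bits) if value & sign_bit else value


def vector_op_scalar(op, vector_a, scalar_b):
    """Case 2: replicate the scalar to match vector A, then apply op."""
    replicated = [scalar_b] * len(vector_a)
    return [op(a, b) for a, b in zip(vector_a, replicated)]


# Usage: add a 5-bit immediate of all ones (that is, -1) to every element.
immediate = sign_extend(0b11111, 5)
decremented = vector_op_scalar(operator.add, [10, 20, 30], immediate)
```

Here the immediate field is widened by sign extension before replication, so every element of the vector sees the same signed operand.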

【０２０４】Many multimedia applications require special attention to the precision of source, intermediate, and final results. In addition, the integer multiply instructions produce "double precision" intermediate results that can be stored in two vector registers.

【０２０５】The MSP architecture currently supports the two's complement integer format for 8, 9, 16, and 32-bit elements and the IEEE 754 single precision format for 32-bit elements. Overflow is defined as a result beyond the most positive or most negative value that the specified data type can represent. When overflow occurs, the value written to the destination register is not a valid number. Underflow is defined only for floating point operations.

【０２０６】Unless stated otherwise, all floating point operations use one of the four rounding modes specified by the bits VCSR(RMODE). Some instructions use what is known as the round away from zero (round even) rounding mode.

【０２０７】Saturation is an important function in many multimedia applications. The MSP architecture supports saturation for all four integer data types and for floating point operations. Bit ISAT in register VCSR specifies the integer saturation mode. The floating point saturation mode, also known as fast IEEE mode, is specified by bit FSAT in VCSR. When saturation mode is enabled, a result that goes beyond the most positive or most negative value is set to the most positive or most negative value, respectively. Overflow cannot occur in this case, and the overflow bit cannot be set.
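A small model of integer saturation may help make the two modes concrete. In this sketch the function names are hypothetical, and the wraparound result returned when saturation is disabled is purely an assumption, since the text states only that the written value is not a valid number.

```python
def int_limits(bits):
    """Most negative and most positive two's complement values."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def add_with_mode(a, b, bits, isat):
    """Add two signed values of the given width.

    Returns (result, overflow_flag). With isat set, out-of-range results
    are clamped and the overflow flag stays clear.
    """
    lo, hi = int_limits(bits)
    total = a + b
    if lo <= total <= hi:
        return total, False
    if isat:
        return (hi if total > hi else lo), False
    # Assumption: model the invalid result as two's complement wraparound.
    wrapped = ((total - lo) % (1 << bits)) + lo
    return wrapped, True
```

For instance, adding 100 and 100 as int8 with saturation enabled gives 127 with no overflow reported, while the same addition without saturation reports overflow.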

【０２０８】Table 33 lists the precise exceptions, which are detected and reported before the faulting instruction is executed.

【０２０９】[0209]

【表４９】 [Table 49]

【０２１０】Table 34 lists the imprecise exceptions, which are detected and reported after some number of instructions that follow the faulting instruction in program order have been executed.

【０２１１】[0211]

【表５０】 [Table 50]

【０２１２】[Appendix E] The instruction set for the vector processor includes 11 classes, as shown in Table 35.

【０２１３】[0213]

【表５１】 [Table 51]

【０２１４】[0214]

【表５２】 [Table 52]

【０２１５】Table 36 lists the flow control instructions.

【０２１６】[0216]

【表５３】 [Table 53]

【０２１７】The logical class supports Boolean data types and is affected by the element mask. Table 37 lists the logical instructions.

【０２１８】[0218]

【表５４】 [Table 54]

【０２１９】The shift/rotate class instructions operate on the int8, int9, int16, and int32 data types (not the float data type) and are affected by the element mask. Table 38 lists the shift/rotate class instructions.

【０２２０】[0220]

【表５５】 [Table 55]

【０２２１】The arithmetic class instructions generally support the int8, int9, int16, int32, and float data types and are affected by the element mask. For specific restrictions on unsupported data types, see the detailed description of each instruction below. The VCMPV instruction is not affected by the element mask, because it operates on the element mask itself. Table 39 lists the arithmetic instructions.

【０２２２】[0222]

【表５６】 [Table 56]

【０２２３】The MPEG instructions are a class of instructions particularly suited to MPEG encoding and decoding, but they can be used in a variety of ways. The MPEG instructions support the int8, int9, int16, and int32 data types and are affected by the element mask. Table 40 lists the MPEG instructions.

【０２２４】[0224]

【表５７】 [Table 57]

【０２２５】Each data type conversion instruction supports specific data types and is not affected by the element mask, because the architecture does not support more than one data type in a register. Table 41 lists the data type conversion instructions.

【０２２６】[0226]

【表５８】 [Table 58]

【０２２７】The inter-element arithmetic class instructions support the int8, int9, int16, int32, and float data types. Table 42 lists the inter-element arithmetic class instructions.

【０２２８】[0228]

【表５９】 [Table 59]

【０２２９】The inter-element move class instructions support the byte, byte9, halfword, and word data sizes. Table 43 lists the inter-element move class instructions.

【０２３０】[0230]

【表６０】 [Table 60]

【０２３１】The load/store instructions support the byte, halfword, and word data sizes, as well as special byte9-related data size operations, and are not affected by the element mask. Table 44 lists the load/store class instructions.

【０２３２】[0232]

【表６１】 [Table 61]

【０２３３】Most register move instructions support the int8, int9, int16, int32, and float data types and are not affected by the element mask. The VCMOVM instruction, however, is affected by the element mask. Table 45 lists the register move class instructions.

【０２３４】[0234]

【表６２】 [Table 62]

【０２３５】Table 46 lists the cache operation class instructions, which control the cache subsystem 130.

【０２３６】[0236]

【表６３】 [Table 63]

【０２３７】Instruction description nomenclature
To simplify the description of the instruction set, special terminology is used throughout this appendix. For example, instruction operands are signed two's complement integers of byte, byte9, halfword, or word size unless otherwise noted. The term "register" refers to a general purpose (scalar or vector) register; other types of registers are described explicitly. In the assembly language syntax, the suffixes b, b9, h, and w denote both the data sizes (byte, byte9, halfword, and word) and the integer data types (int8, int9, int16, and int32). The terms and symbols used to describe instruction operands, operations, and the assembly language syntax are as follows.

【０２３８】
Rd: destination register (vector, scalar, or special purpose)
Ra, Rb: source registers a and b (vector, scalar, or special purpose)
Rc: source or destination register c (vector or scalar)
Rs: store data source register (vector or scalar)
S: 32-bit scalar or special purpose register
VR: current bank vector register
VRA: alternate bank vector register
VR0: bank 0 vector register
VR1: bank 1 vector register
VRd: vector destination register (defaults to the current bank unless VRA is specified)
VRa, VRb: vector source registers a and b
VRc: vector source or destination register c
VRs: vector store data source register
VAC0H: vector accumulator register 0 high
VAC0L: vector accumulator register 0 low
VAC1H: vector accumulator register 1 high
VAC1L: vector accumulator register 1 low
SRd: scalar destination register
SRa, SRb: scalar source registers a and b
SRb+: update of the base register with the effective address
SRs: scalar store data source register
SP: special purpose register
VR[i]: i-th element in vector register VR
VR[i](a:b): bits a to b of the i-th element in vector register VR
VR[i](msb): most significant bit of the i-th element in vector register VR
EA: effective address for a memory access
MEM: memory
BYTE[EA]: one byte of memory addressed by EA
HALF[EA]: halfword of memory addressed by EA. Bits (15:8) are addressed by EA+1.

【０２３９】WORD[EA]: word of memory addressed by EA. Bits (31:24) are addressed by EA+3.
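The HALF[EA] and WORD[EA] definitions place the most significant byte at the highest address. A minimal sketch of that layout follows; the function names are hypothetical, and the placement of the middle bytes of a word (bits 15:8 at EA+1 and bits 23:16 at EA+2) is an assumption extrapolated from the two stated cases.

```python
def half(mem, ea):
    """HALF[EA]: bits 7:0 come from EA, bits 15:8 from EA+1."""
    return mem[ea] | (mem[ea + 1] << 8)


def word(mem, ea):
    """WORD[EA]: bits 31:24 come from EA+3.

    Assumption: the remaining bytes follow the same ascending order.
    """
    return (mem[ea]
            | (mem[ea + 1] << 8)
            | (mem[ea + 2] << 16)
            | (mem[ea + 3] << 24))


# Usage with a tiny byte-addressed memory image.
memory = bytes([0x78, 0x56, 0x34, 0x12])
low_half = half(memory, 0)
full_word = word(memory, 0)
```

With the sample memory, `low_half` is 0x5678 and `full_word` is 0x12345678.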

【０２４０】NumElem: the number of elements for a given data type. In VEC32 mode it is 32, 16, or 8 for the byte and byte9, halfword, or word data sizes, respectively. In VEC64 mode it is 64, 32, or 16 for the byte and byte9, halfword, or word data sizes, respectively. For scalar operations, NumElem is 0.
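The NumElem values follow directly from dividing the vector register size in bytes by the element size. The sketch below encodes that relationship; the function name, the suffix-keyed table, and the `scalar` flag are hypothetical conveniences for the example.

```python
# Bytes occupied by one element for each data size suffix.
# byte9 elements are counted like bytes, as in the text.
ELEMENT_BYTES = {"b": 1, "b9": 1, "h": 2, "w": 4}


def num_elem(data_size, vecsize, scalar=False):
    """Return NumElem for a data size suffix and a register size in bytes.

    vecsize is 32 in VEC32 mode and 64 in VEC64 mode.
    Scalar operations return 0, as the definition states.
    """
    if scalar:
        return 0
    return vecsize // ELEMENT_BYTES[data_size]
```

For example, `num_elem("h", 32)` gives 16 and `num_elem("w", 64)` gives 16.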

【０２４１】EMASK[i]: the element mask for the i-th element. It represents 1, 2, or 4 bits in VGMR0/1 or ~VGMR0/1 for the byte and byte9, halfword, or word data sizes, respectively. For scalar operations, the element mask is assumed to be set even if EMASK[i] = 0.

【０２４２】MMASK[i]: the element mask for the i-th element. It represents 1, 2, or 4 bits in VMMR0 or VMMR1 for the byte and byte9, halfword, or word data sizes, respectively.

【０２４３】VCSR: vector control and status register.
VCSR(x): one or more bits in VCSR, where "x" is the field name.

【０２４４】VPC: vector processor program counter.
VECSIZE: vector register size, which is 32 in VEC32 mode and 64 in VEC64 mode.

【０２４５】SPAD: scratch pad.
C programming constructs are used to describe the control flow of the operations. The exceptions are summarized as follows:

【０２４６】
=: assignment
:: concatenation
{x || y}: selection between x and y (not a logical OR)
sex: sign-extend to the specified data size
sex-dp: sign-extend to double precision of the specified data size
zex: zero-extend to the specified data size
right shift, zero-extended (logical)
left shift (zero fill)
trnc7: truncate the leading 7 bits (from a halfword)
trnc1: truncate the leading 1 bit (from a byte9)
%: modulo operator
|expression|: absolute value of the expression
/: divide (for the float data type, uses one of the four IEEE rounding modes)
//: divide (uses the round away from zero rounding mode)
saturate: for integer data types, saturate to the most negative or most positive value instead of generating an overflow. For the float data type, saturation can be to positive infinity, positive zero, negative zero, or negative infinity.
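A few of these operators can be written out explicitly to make their intent concrete. The following is only one possible reading: the Python function names mirror the notation but are otherwise hypothetical, and the interpretation of trnc7 and trnc1 as "keep the low 9 bits of a halfword" and "keep the low 8 bits of a byte9" is an assumption.

```python
def zex(value, bits):
    """Zero-extend: keep the low bits as an unsigned value."""
    return value & ((1 << bits) - 1)


def sex(value, bits):
    """Sign-extend: read the low bits as a two's complement value."""
    value = zex(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def trnc7(halfword):
    """Drop the leading 7 bits of a 16-bit value, leaving 9 bits."""
    return halfword & 0x1FF


def trnc1(byte9):
    """Drop the leading bit of a 9-bit value, leaving 8 bits."""
    return byte9 & 0xFF


def concat(high, low, low_bits):
    """Concatenation (high : low), with low occupying low_bits bits."""
    return (high << low_bits) | zex(low, low_bits)
```

As a quick check of the reading, `sex(0x1FF, 9)` is -1 while `zex(-1, 8)` is 255, and `concat(0x12, 0x34, 8)` is 0x1234.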

【０２４７】The general instruction formats are shown in FIG. 12 and are described below.

【０２４８】The REAR format is used by the load, store, and cache operation instructions. The fields in the REAR format have the meanings given in Table 47.

【０２４９】[0249]

【表６４】 [Table 64]

【０２５０】Bits 17:15 are reserved and should be zero to ensure compatibility with future extensions of the architecture. Some encodings of the B:D and TT fields are undefined.

【０２５１】Programmers should not use such encodings, because the architecture does not specify the expected result when they are used. Table 48 shows the scalar load operations (encoded in the TT field as LT) supported in both VEC32 and VEC64 modes.

【０２５２】[0252]

【表６５】 [Table 65]

【０２５３】Table 49 shows the vector load operations (encoded in the TT field as LT) supported in VEC32 mode, when bit VCSR(0) is clear.

【０２５４】[0254]

【表６６】 [Table 66]

【０２５５】The B bit is used to indicate the current or alternate bank.

【０２５６】Table 50 shows the vector load operations (encoded in the TT field as LT) supported in VEC64 mode, when bit VCSR(0) is set.

【０２５７】[0257]

【表６７】 [Table 67]

【０２５８】Because the concept of current and alternate banks does not exist in VEC64 mode, bit B is used to indicate a 64-byte vector operation.

【０２５９】Table 51 lists the scalar store operations (encoded in the TT field as LT) supported in both VEC32 and VEC64 modes.

【０２６０】[0260]

【表６８】 [Table 68]

【０２６１】Table 52 lists the vector store operations (encoded in the TT field as LT) supported in VEC32 mode, when bit VCSR(0) is clear.

【０２６２】[0262]

【表６９】 [Table 69]

【０２６３】Table 53 lists the vector store operations (encoded in the TT field as LT) supported in VEC64 mode, when bit VCSR(0) is set.

【０２６４】[0264]

【表７０】 [Table 70]

【０２６５】Because the concept of current and alternate banks does not exist in VEC64 mode, bit B is used to indicate a 64-byte vector operation.

【０２６６】The REAI format is used by the load, store, and cache operation instructions. The fields in the REAI format have the meanings given in Table 54.

【０２６７】[0267]

【表７１】 [Table 71]

【０２６８】The REAR and REAI formats use the same encodings for the transfer types. See the REAR format for details of the encodings.

【０２６９】The RRRM5 format provides three register operands, or two register operands and a 5-bit immediate operand. Table 55 defines the fields of the RRRM5 format.

【０２７０】[0270]

【表７２】 [Table 72]

【０２７１】Bits 19:15 are reserved and should be zero to ensure compatibility with future extensions of the architecture.

【０２７２】All vector register operands refer to the current bank (which can be either bank 0 or bank 1) unless stated otherwise. Table 56 shows the D:S:M encodings when DS(1:0) is 00, 01, or 10.

【０２７３】[0273]

【表７３】 [Table 73]

【０２７４】When DS(1:0) is 11, the D:S:M encodings have the meanings shown in Table 57.

【０２７５】[0275]

【表７４】 [Table 74]

【０２７６】The RRRR format provides four register operands.

【０２７７】Table 58 shows the fields of the RRRR format.

【０２７８】[0278]

【表７５】 [Table 75]

【０２７９】All vector register operands refer to the current bank (which can be either bank 0 or bank 1) unless stated otherwise.

【０２８０】The RI format is used only by the load immediate instruction. Table 59 shows the fields of the RI format.

【０２８１】[0281]

【表７６】 [Table 76]

【０２８２】Some encodings of the F:DS(1:0) field are undefined. Programmers should not use such encodings, because the architecture does not specify the expected result when they are used. The value loaded into Rd depends on the data type, as shown in Table 60.

【０２８３】[0283]

【表７７】 [Table 77]

【０２８４】The CT format includes the fields shown in Table 61.

【０２８５】[0285]

【表７８】 [Table 78]

【０２８６】The branch conditions use the VCSR[GT:EQ:LT] field.

【０２８７】The overflow condition uses the VCSR[S0] bit, which, when set, takes precedence over the GT, EQ, and LT bits. VCCS and VCBARR interpret the Cond(2:0) field differently from the above; see the detailed instruction descriptions.

【０２８８】The RRRM9 format specifies three register operands, or two register operands and a 9-bit immediate operand. Table 62 shows the fields of the RRRM9 format.

【０２８９】[0289]

【表７９】 [Table 79]

【０２９０】When the D:S:M encoding does not specify an immediate operand, bits 19:15 are reserved and should be zero to ensure future compatibility.

【０２９１】All vector register operands refer to the current bank (which can be either bank 0 or bank 1) unless stated otherwise. The D:S:M encodings are identical to those shown in Tables 56 and 57 for the RRRM5 format, except that the immediate value extracted from the immediate field depends on the DS(1:0) encoding, as shown in Table 63.

【０２９２】[0292]

【表８０】 [Table 80]

【０２９３】The immediate format is not available for the float data type.

【０２９４】The MSP vector instructions are listed below in alphabetical order. Notes:
1. Instructions are affected by the element mask unless stated otherwise. CT format instructions are not affected by the element mask. REAR and REAI format instructions, which comprise the load, store, and cache instructions, are also not affected by the element mask.

【０２９５】2. The 9-bit immediate operand is not available for the float data type.

【０２９６】3. In the operation descriptions, only the vector form is given. For scalar operations, assume that only one element, the 0th, is defined.

【０２９７】4. For the RRRM5 and RRRM9 formats, the encodings shown in Table 64 are used for the integer data types (b, b9, h, w).

【０２９８】[0298]

【表８１】 [Table 81]

【０２９９】5. For the RRRM5 and RRRM9 formats, the encodings shown in Table 65 are used for the float data type.

【０３００】[0300]

【表８２】 [Table 82]

【０３０１】6. For all instructions that may cause overflow, saturation to the int8, int9, int16, or int32 maximum or minimum limit is applied when the VCSR(ISAT) bit is set. Correspondingly, a floating point result is saturated to -infinity, -zero, +zero, or +infinity when the VCSR(FSAT) bit is set.

【０３０２】7. Syntactically, .n can be used instead of .b9 to denote the byte9 data size.

【０３０３】８．全ての命令に対して目的レジスタ或い
はベクトルアキュムレータに帰還する浮動小数点結果は
ＩＥＥＥ７５４単精度フォーマットからなる。浮動小数
点結果はアキュムレータの下位部分に記録され、上位部
分は修正されない。[0303] 8. The floating point result returned to the destination register or vector accumulator for all instructions is in IEEE 754 single precision format. Floating point results are recorded in the lower part of the accumulator and the upper part is not modified.

【０３０４】VAAS3 Add and Add Sign of (-1, 0, 1)

【０３０５】[0305]

【表８３】 [Table 83]

【０３０６】Assembler syntax
VAAS3.dt VRd, VRa, VRb
VAAS3.dt VRd, VRa, SRb
VAAS3.dt SRd, SRa, SRb
where dt = {b, b9, h, w}.

【０３０７】[0307]

【表８４】 [Table 84]

【０３０８】説明ベクトル／スカラレジスタＲａの内容はＲｂに加算され
て中間結果を発生し、その後中間結果にＲａの符号が加
算されて得られた最終結果はベクトル／スカラレジスタ
Ｒｄに記憶される。Description The contents of the vector / scalar register Ra are added to Rb to generate an intermediate result, and the final result obtained by adding the sign of Ra to the intermediate result is stored in the vector / scalar register Rd.
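The VAAS3 description can be sketched as follows. This is only an illustration under stated assumptions: the names sign3 and vaas3 are hypothetical, registers are modeled as Python lists of signed integers, and overflow and saturation are not modeled.

```python
def sign3(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def vaas3(ra: list[int], rb: list[int]) -> list[int]:
    """Element-wise Ra + Rb + sign(Ra); overflow handling is omitted."""
    return [a + b + sign3(a) for a, b in zip(ra, rb)]
```

For example, vaas3([3, -2, 0], [1, 1, 5]) adds +1, -1, and 0 to the three intermediate sums.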

【０３０９】Exceptions: Overflow.

VADAC Add and Accumulate

【０３１０】[0310]

【表８５】 [Table 85]

【０３１１】Assembler syntax
VADAC.dt VRc, VRd, VRa, VRb
VADAC.dt SRc, SRd, SRa, SRb
where dt = {b, b9, h, w}.

【０３１２】[0312]

【表８６】 [Table 86]

【０３１３】Description
Each element of the Ra and Rb operands is added to the corresponding double-precision element of the vector accumulator, and the double-precision sum of each element is stored in both the vector accumulator and the destination registers Rc and Rd. Ra and Rb use the specified data type, whereas VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper part of each double-precision element is stored in VACH and Rc. If Rc = Rd, the result in Rc is undefined.
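The accumulate step described above can be illustrated with a small sketch. It assumes that the accumulator halves VACH and VACL are held as unsigned bit patterns of `bits` bits each; the function name vadac and its parameters are hypothetical, and the element mask is not modeled.

```python
def vadac(ra, rb, vach, vacl, bits=8):
    """Add ra[i] + rb[i] into the 2*bits-wide accumulator VACH[i]:VACL[i].

    vach and vacl are updated in place; returns (rc, rd), the high and
    low halves of each double-width sum.
    """
    mask = (1 << bits) - 1
    wide_mask = (1 << (2 * bits)) - 1
    for i, (a, b) in enumerate(zip(ra, rb)):
        acc = (vach[i] << bits) | vacl[i]   # unsigned view of VACH:VACL
        acc = (acc + a + b) & wide_mask     # wrap to the double width
        vach[i], vacl[i] = acc >> bits, acc & mask
    return list(vach), list(vacl)
```

With bits=8, adding 10 and -3 to an accumulator holding 250 carries into the high half, which is what Rc receives.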

【０３１４】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Aop[i] = {VRa[i] || SRa};
    Bop[i] = {VRb[i] || SRb};
    VACH[i]:VACL[i] = sex(Aop[i] + Bop[i]) + VACH[i]:VACL[i];
    Rc[i] = VACH[i];
    Rd[i] = VACL[i];
}

VADACL Add and Accumulate Low

【０３１５】[0315]

【表８７】 [Table 87]

【０３１６】Assembler syntax
VADACL.dt VRd, VRa, VRb
VADACL.dt VRd, VRa, SRb
VADACL.dt VRd, VRa, #IMM
VADACL.dt SRd, SRa, SRb
VADACL.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０３１７】[0317]

【表８８】 [Table 88]

【０３１８】Description
Each element of the Ra and Rb/immediate operands is added to the corresponding extended-precision element of the vector accumulator, and the lower-precision part of the sum is returned to the destination register Rd. Ra and Rb/immediate use the specified data type, whereas VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper part of each extended-precision element is stored in VACH.

【０３１９】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};
    VACH[i]:VACL[i] = sex(Ra[i] + Bop[i]) + VACH[i]:VACL[i];
    Rd[i] = VACL[i];
}

VADD Add

【０３２０】[0320]

【表８９】 [Table 89]

【０３２１】Assembler syntax
VADD.dt VRd, VRa, VRb
VADD.dt VRd, VRa, SRb
VADD.dt VRd, VRa, #IMM
VADD.dt SRd, SRa, SRb
VADD.dt SRd, SRa, #IMM
where dt = {b, b9, h, w, f}.

【０３２２】[0322]

【表９０】 [Table 90]

【０３２３】説明ＲａとＲｂ／即値オペランドを加算し、その和を目的レ
ジスタＲｄにリターンさせる。Description Adds Ra and Rb / immediate operands and returns the sum to destination register Rd.

【０３２４】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};
    Rd[i] = Ra[i] + Bop[i];
}
Exceptions: Overflow, floating-point invalid operand.

VADDH Add Adjacent Elements

【０３２５】[0325]

【表９１】 [Table 91]

【０３２６】Assembler syntax
VADDH.dt VRd, VRa, VRb
VADDH.dt VRd, VRa, SRb
where dt = {b, b9, h, w}.

【０３２７】[0327]

【表９２】 [Table 92]

【０３２８】[0328]

【表９３】 [Table 93]

【０３２９】Operation
for (i = 0; i < NumElem - 1; i++) {
    Rd[i] = Ra[i] + Ra[i+1];
}
Rd[NumElem-1] = Ra[NumElem-1] + {VRb[0] || SRb};
Exceptions: Overflow, floating-point invalid operand.
Programming note: This instruction is not affected by the element mask.

【０３３０】ＶＡＮＤＡＮＤ [0330] VAND AND

【０３３１】[0331]

【表９４】 [Table 94]

【０３３２】Assembler syntax
VAND.dt VRd, VRa, VRb
VAND.dt VRd, VRa, SRb
VAND.dt VRd, VRa, #IMM
VAND.dt SRd, SRa, SRb
VAND.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

【０３３３】[0333]

【表９５】 [Table 95]

【０３３４】説明ＲａとＲｂ／即値オペランドを論理的にＡＮＤし、その
結果を目的レジスタＲｄにリターンさせる。Description Logically AND the Ra and Rb / immediate operands and return the result to the destination register Rd.

【０３３５】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};
    Rd[i]<k> = Ra[i]<k> & Bop[i]<k>, for all bits k in element i;
}
Exceptions: None.

VANDC AND Complement

【０３３６】[0336]

【表９６】 [Table 96]

【０３３７】Assembler syntax
VANDC.dt VRd, VRa, VRb
VANDC.dt VRd, VRa, SRb
VANDC.dt VRd, VRa, #IMM
VANDC.dt SRd, SRa, SRb
VANDC.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

【０３３８】[0338]

【表９７】 [Table 97]

【０３３９】Description
Logically ANDs Ra with the complement of the Rb/immediate operand and returns the result to the destination register Rd.

【０３４０】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};
    Rd[i]<k> = Ra[i]<k> & ~Bop[i]<k>, for all bits k in element i;
}
Exceptions: None.

VASA Arithmetic Shift Accumulator

【０３４１】[0341]

【表９８】 [Table 98]

【０３４２】Assembler syntax
VASAL.dt
VASAR.dt
where dt = {b, b9, h, w}, and the L or R suffix denotes the shift direction, left or right.

【０３４３】[0343]

【表９９】 [Table 99]

【０３４４】Description
Each data element of the vector accumulator register is shifted left by one bit position with zero fill from the right (if R = 0), or shifted right by one bit position with sign extension (if R = 1). The result is stored in the vector accumulator.

【０３４５】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    if (R == 1)
        VACOH[i]:VACOL[i] = VACOH[i]:VACOL[i] sign>> 1;
    else
        VACOH[i]:VACOL[i] = VACOH[i]:VACOL[i] << 1;
}
Exceptions: Overflow.

VASL Arithmetic Shift Left

【０３４６】[0346]

【表１００】 [Table 100]

【０３４７】Assembler syntax
VASL.dt VRd, VRa, SRb
VASL.dt VRd, VRa, #IMM
VASL.dt SRd, SRa, SRb
VASL.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０３４８】[0348]

【表１０１】 [Table 101]

【０３４９】Description
Each data element of the vector/scalar register Ra is shifted left, with zero fill from the right, by the shift amount given in scalar register Rb or the IMM field, and the result is stored in the vector/scalar register Rd. For elements that overflow, the result saturates to the maximum positive or maximum negative value, depending on the sign. The shift amount is defined to be an unsigned integer.

【０３５０】Operation
shift_amount = {SRb % 32 || IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = saturate(Ra[i] << shift_amount);
}
Exceptions: None.
Programming note: The shift amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is larger than the specified data size, the elements are filled with zeros.
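A saturating left shift of this kind might be modeled as below. This is a sketch only: the name vasl and the bits parameter are hypothetical, elements are signed Python integers, the element mask is ignored, and the shift amount is assumed not to exceed the element width (the oversized-shift case in the programming note is not modeled).

```python
def vasl(ra, shift_amount, bits=16):
    """Shift each signed element left, saturating on overflow."""
    shift_amount &= 0x1F                       # 5-bit amount, as in SRb % 32
    hi = (1 << (bits - 1)) - 1                 # maximum positive value
    lo = -(1 << (bits - 1))                    # maximum negative value
    return [max(lo, min(hi, a << shift_amount)) for a in ra]
```

Positive elements that overflow clamp to the maximum positive value and negative ones to the maximum negative value, matching the sign-dependent saturation in the description.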

【０３５１】VASR Arithmetic Shift Right

【０３５２】[0352]

【表１０２】 [Table 102]

【０３５３】Assembler syntax
VASR.dt VRd, VRa, SRb
VASR.dt VRd, VRa, #IMM
VASR.dt SRd, SRa, SRb
VASR.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０３５４】[0354]

【表１０３】 [Table 103]

【０３５５】Description
Each data element of the vector/scalar register Ra is arithmetically shifted right, with sign extension at the most significant bit position, by the shift amount given in the least significant bits of scalar register Rb or the IMM field, and the result is stored in the vector/scalar register Rd. The shift amount is defined to be an unsigned integer.

【０３５６】Operation
shift_amount = {SRb % 32 || IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] sign>> shift_amount;
}
Exceptions: None.
Programming note: The shift amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is larger than the specified data size, the elements are filled with the sign bit.
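As a rough illustration of the sign-extending shift, the following sketch relies on the fact that Python's >> operator is already arithmetic for negative integers. The name vasr is hypothetical and the element mask is not modeled.

```python
def vasr(ra, shift_amount):
    """Arithmetic right shift of each signed element by a 5-bit amount."""
    shift_amount &= 0x1F            # 5-bit amount, as in SRb % 32
    return [a >> shift_amount for a in ra]
```

Because the sign bit is replicated, a negative element never becomes positive, and a sufficiently large shift leaves -1 (all sign bits).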

【０３５７】VASS3 Add and Subtract Sign of (-1, 0, 1)

【０３５８】[0358]

【表１０４】 [Table 104]

【０３５９】Assembler syntax
VASS3.dt VRd, VRa, VRb
VASS3.dt VRd, VRa, SRb
VASS3.dt SRd, SRa, SRb
where dt = {b, b9, h, w}.

【０３６０】[0360]

【表１０５】 [Table 105]

【０３６１】Description
The contents of the vector/scalar register Ra are added to Rb to produce an intermediate result; the sign of Ra is then subtracted from the intermediate result, and the final result is stored in the vector/scalar register Rd.

【０３６２】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    if (Ra[i] > 0) extsgn3 = 1;
    else if (Ra[i] < 0) extsgn3 = -1;
    else extsgn3 = 0;
    Rd[i] = Ra[i] + Rb[i] - extsgn3;
}
Exceptions: Overflow.

VASUB Absolute Value of Subtraction

【０３６３】[0363]

【表１０６】 [Table 106]

【０３６４】Assembler syntax
VASUB.dt VRd, VRa, VRb
VASUB.dt VRd, VRa, SRb
VASUB.dt VRd, VRa, #IMM
VASUB.dt SRd, SRa, SRb
VASUB.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０３６５】[0365]

【表１０７】 [Table 107]

【０３６６】説明ベクトル／スカラレジスタＲｂ或いはＩＭＭフィールド
の内容はベクトル／スカラレジスタＲａの内容から減算
されてその絶対値がベクトル／スカラレジスタＲｄに記
憶される。Description The contents of the vector / scalar register Rb or the IMM field are subtracted from the contents of the vector / scalar register Ra, and the absolute value is stored in the vector / scalar register Rd.

【０３６７】Exceptions: Overflow, floating-point invalid operand.
Programming note: If the result of the subtraction is the most negative number, an overflow occurs after the absolute value operation. If saturation mode is enabled, the result of the absolute value operation is the most positive number.
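The corner case in the programming note can be seen in a small model of the saturating variant. This is a sketch under assumptions: the name vasub is hypothetical, only integer elements with saturation enabled are modeled, and the element mask is ignored.

```python
def vasub(ra, rb, bits=8):
    """Absolute difference |Ra - Rb| per element, saturated to the
    most positive signed value representable in `bits` bits."""
    hi = (1 << (bits - 1)) - 1      # e.g. 127 for 8-bit elements
    return [min(hi, abs(a - b)) for a, b in zip(ra, rb)]
```

For 8-bit elements, a difference of -128 has no positive counterpart, so the saturated result is 127.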

【０３６８】ＶＡＶＧ２エレメント平均 VAVG Two Element Average

【０３６９】[0369]

【表１０８】 [Table 108]

【０３７０】Assembler syntax
VAVG.dt VRd, VRa, VRb
VAVG.dt VRd, VRa, SRb
VAVG.dt SRd, SRa, SRb
where dt = {b, b9, h, w, f}. Use VAVGT to specify the "truncate" rounding mode for integer data types.

【０３７１】[0371]

【表１０９】 [Table 109]

【０３７２】Description
The contents of the vector/scalar register Ra are added to the contents of the vector/scalar register Rb to produce an intermediate result; the intermediate result is then divided by 2, and the final result is stored in the vector/scalar register Rd. For integer data types, the rounding mode is truncate if T = 1 and round away from zero if T = 0 (the default). For the float data type, the rounding mode is specified by VCSR (RMODE).
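The two integer rounding modes can be compared with a short sketch. It assumes that "truncate" means rounding toward zero, which the text does not state explicitly; the name vavg and the truncate flag (standing in for the T bit) are hypothetical.

```python
def vavg(ra, rb, truncate=False):
    """Average two integer vectors element-wise.

    truncate=True  models T = 1 (assumed: round toward zero).
    truncate=False models T = 0 (round away from zero, the default).
    """
    out = []
    for a, b in zip(ra, rb):
        s = a + b
        q = abs(s) // 2                 # magnitude, rounded toward zero
        if not truncate and s % 2:      # odd sum: bump magnitude by one
            q += 1
        out.append(q if s >= 0 else -q)
    return out
```

The modes differ only when the sum is odd: a sum of 3 averages to 2 by default and to 1 with truncation.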

【０３７３】Exceptions: None.

VAVGH Average Two Adjacent Elements

【０３７４】[0374]

【表１１０】 [Table 110]

【０３７５】Assembler syntax
VAVGH.dt VRd, VRa, VRb
VAVGH.dt VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Use VAVGHT to specify the "truncate" rounding mode for integer data types.

【０３７６】[0376]

【表１１１】 [Table 111]

【０３７７】[0377]

【表１１２】 [Table 112]

【０３７８】Operation
for (i = 0; i < NumElem - 1; i++) {
    Rd[i] = (Ra[i] + Ra[i+1]) // 2;
}
Rd[NumElem-1] = (Ra[NumElem-1] + {VRb[0] || SRb}) // 2;
Exceptions: None.
Programming note: This instruction is not affected by the element mask.
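The adjacent-element pattern of VAVGH, in which the last element pairs with element 0 of Rb (or the scalar SRb), can be sketched as below. The name vavgh is hypothetical, and Python floor division is used only as a stand-in for the instruction's rounding mode, so the sketch is meaningful for non-negative inputs.

```python
def vavgh(ra, rb0):
    """Average each element of ra with its right-hand neighbour.

    The last element is averaged with rb0, which stands for VRb[0] or SRb.
    """
    ext = list(ra) + [rb0]          # append the wrap-in operand
    return [(ext[i] + ext[i + 1]) // 2 for i in range(len(ra))]
```

Chaining the result of one vector into rb0 of the previous one lets a long row of samples be averaged pairwise across register boundaries.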

【０３７９】ＶＡＶＧＱ４重平均 VAVGQ Quadruple Average

【０３８０】[0380]

【表１１３】 [Table 113]

【０３８１】Assembler syntax
VAVGQ.dt VRd, VRa, VRb
where dt = {b, b9, h, w}. Use VAVGQT to specify the "truncate" rounding mode for integer data types.

【０３８２】[0382]

【表１１４】 [Table 114]

【０３８３】[0383]

【表１１５】 [Table 115]

【０３８４】Operation
for (i = 0; i < NumElem - 1; i++) {
    Rd[i] = (Ra[i] + Rb[i] + Ra[i+1] + Rb[i+1]) // 4;
}
Exceptions: None.

VCACHE Cache Operation

【０３８５】[0385]

【表１１６】 [Table 116]

【０３８６】Assembler syntax
VCACHE.fc SRb, SRi
VCACHE.fc SRb, #IMM
VCACHE.fc SRb+, SRi
VCACHE.fc SRb+, #IMM
where fc = {0, 1}.

【０３８７】[0387]

【表１１７】 [Table 117]

【０３８８】演算例外無しプログラミング注意この命令はエレメントマスクによって影響を受けない。Operation Exceptions None Programming Notes This instruction is not affected by the element mask.

【０３８９】VCAND Complement AND

【０３９０】[0390]

【表１１８】 [Table 118]

【０３９１】Assembler syntax
VCAND.dt VRd, VRa, VRb
VCAND.dt VRd, VRa, SRb
VCAND.dt VRd, VRa, #IMM
VCAND.dt SRd, SRa, SRb
VCAND.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

【０３９２】[0392]

【表１１９】 [Table 119]

【０３９３】Description
Logically ANDs the complement of Ra with the Rb/immediate operand and returns the result to the destination register Rd.

【０３９４】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};
    Rd[i]<k> = ~Ra[i]<k> & Bop[i]<k>, for all bits k in element i;
}
Exceptions: None.

VCBARR Conditional Barrier

【０３９５】[0395]

【表１２０】 [Table 120]

【０３９６】Assembler syntax
VCBARR.cond
where cond = {0-7}. Each condition will be given a mnemonic later.

【０３９７】[0397]

【表１２１】 [Table 121]

【０３９８】Operation
While Cond is true, all subsequent instructions are stalled.

【０３９９】Exceptions: None.
Programming note: This instruction is provided so that software can enforce serialization of instruction execution. It is used to force accurate reporting of imprecise exceptions. For example, if this instruction is used immediately after an arithmetic instruction that could cause an exception, the exception is reported with the program counter addressing this instruction.

【０４００】ＶＣＢＲ条件付ブランチ VCBR Conditional Branch

【０４０１】[0401]

【表１２２】 [Table 122]

【０４０２】Assembler syntax
VCBR.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４０３】説明Ｃｏｎｄが真であれば、ブランチする。これは遅延した
ブランチでない。Description If Cond is true, branch. This is not a delayed branch.

【０４０４】Exceptions: Invalid instruction address.

VCBRI Conditional Branch Indirect

【０４０５】[0405]

【表１２３】 [Table 123]

【０４０６】Assembler syntax
VCBRI.cond SRb
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４０７】説明Ｃｏｎｄが真であれば、ブランチする。これは遅延した
ブランチでない。Description If Cond is true, branch. This is not a delayed branch.

【０４０８】Exceptions: Invalid instruction address.

VCCS Conditional Context Switch

【０４０９】[0409]

【表１２４】 [Table 124]

【０４１０】Assembler syntax
VCCS #Offset
Description
If VIMSK (cse) is true, jump to the context switch subroutine. This is not a delayed branch. If VIMSK (cse) is true, VPC + 4 (the return address) is saved on the return address stack. Otherwise, execution continues at VPC + 4.

【０４１１】Exceptions: Return address stack overflow.

VCHGCR Change Control Register

【０４１２】[0412]

【表１２５】 [Table 125]

【０４１３】アセンブラ構文ＶＣＨＧＣＲＭｏｄｅ[0413] Assembler syntax VCHGCR Mode

【０４１４】[0414]

【表１２６】 [Table 126]

【０４１５】Operation
Exceptions: None.
Programming note: This instruction is provided so that the hardware can change the control bits in VCSR more efficiently than is possible with the VMOV instruction.

【０４１６】VCINT Conditional Interrupt ARM7

【０４１７】[0417]

【表１２７】 [Table 127]

【０４１８】Assembler syntax
VCINT.cond #CODE
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４１９】説明もしＣｏｎｄが真であれば、実行を停止し、イネーブル
された場合にＡＲＭ７をインタラプトする。Description If Cond is true, stop execution and interrupt ARM7 if enabled.

【０４２０】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    VISRC<vip> = 1;
    VIINS = [VCINT.cond #ICODE instruction];
    VEPC = VPC;
    if (VIMSK<vie> == 1) signal ARM7 interrupt;
    VP_STATE = VP_IDLE;
}
else VPC = VPC + 4;
Exceptions: VCINT interrupt.

VCJOIN Conditional Join with ARM7 Task

【０４２１】[0421]

【表１２８】 [Table 128]

【０４２２】Assembler syntax
VCJOIN.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４２３】説明もしＣｏｎｄが真であれば、実行を停止し、イネーブル
された場合にＡＲＭ７をインタラプトする。Description If Cond is true, stop execution and interrupt ARM7 if enabled.

【０４２４】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    VISRC<vjp> = -1;
    VIINS = [VCJOIN.cond #Offset instruction];
    VEPC = VPC;
    if (VIMSK<vje> == 1) signal ARM7 interrupt;
    VP_STATE = VP_IDLE;
}
else VPC = VPC + 4;
Exceptions: VCJOIN interrupt.

VCJSR Conditional Jump to Subroutine

【０４２５】[0425]

【表１２９】 [Table 129]

【０４２６】Assembler syntax
VCJSR.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４２７】説明もしＣｏｎｄが真であれば、サブルーチンにジャンプす
る。これは遅延したブランチでない。Description If Cond is true, jump to subroutine. This is not a delayed branch.

【０４２８】If Cond is true, VPC + 4 (the return address) is saved on the return address stack. Otherwise, execution continues at VPC + 4.
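The return address stack behavior can be sketched with a small model. The 16-entry depth follows from the VSP<3:0> index and the "greater than 15" check in the operation listing for this instruction; the class name ReturnStack, the method jsr, and the use of a Python exception in place of the RASO signal to the ARM7 are assumptions made for illustration.

```python
class ReturnStack:
    """Toy model of the return address stack used by VCJSR."""

    DEPTH = 16

    def __init__(self):
        self.slots = [0] * self.DEPTH
        self.vsp = 0                      # models the 5-bit VSP<4:0>

    def jsr(self, vpc, target, cond):
        """Return the next VPC for a conditional jump to subroutine."""
        if not cond:
            return vpc + 4                # fall through
        if self.vsp > self.DEPTH - 1:     # stack full: overflow exception
            raise OverflowError("return address stack overflow")
        self.slots[self.vsp & 0xF] = vpc + 4
        self.vsp += 1
        return target
```

A taken jump stores VPC + 4 and transfers control; a jump whose condition is false simply continues at VPC + 4 without touching the stack.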

【０４２９】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    if (VSP<4:0> > 15) {
        VISRC<RASO> = 1;
        signal ARM7 with RASO exception;
        VP_STATE = VP_IDLE;
    } else {
        RSTACK[VSP<3:0>] = VPC + 4;
        VSP<4:0> = VSP<4:0> + 1;
        VPC = VPC + sex(Offset<22:0> * 4);
    }
}
else VPC = VPC + 4;
Exceptions: Return address stack overflow.

VCJSRI Conditional Jump to Subroutine Indirect

【０４３０】[0430]

【表１３０】 [Table 130]

【０４３１】Assembler syntax
VCJSRI.cond SRb
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４３２】説明もしＣｏｎｄが真であれば、サブルーチンに間接ジャン
プする。これは遅延したブランチでない。Description If Cond is true, an indirect jump is made to a subroutine. This is not a delayed branch.

【０４３３】If Cond is true, VPC + 4 (the return address) is saved on the return address stack. Otherwise, execution continues at VPC + 4.

【０４３４】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    if (VSP<4:0> > 15) {
        VISRC<RASO> = 1;
        signal ARM7 with RASO exception;
        VP_STATE = VP_IDLE;
    } else {
        RSTACK[VSP<3:0>] = VPC + 4;
        VSP<4:0> = VSP<4:0> + 1;
        VPC = SRb<31:2>:b'00;
    }
}
else VPC = VPC + 4;
Exceptions: Return address stack overflow.

VCMOV Conditional Move

【０４３５】[0435]

【表１３１】 [Table 131]

【０４３６】Assembler syntax
VCMOV.dt Rd, Rb, cond
VCMOV.dt Rd, #IMM, cond
where dt = {b, b9, h, w, f} and cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type is not supported with the 9-bit immediate operand.

【０４３７】[0437]

【表１３２】 [Table 132]

【０４３８】[0438]

【表１３３】 [Table 133]

【０４３９】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))
    for (i = 0; i < NumElem; i++)
        Rd[i] = {Rb[i] || SRb || sex(IMM<8:0>)};
Exceptions: None.
Programming note: This instruction is not affected by the element mask, whereas VCMOVM is. The extended floating-point precision representation in the vector accumulator uses all 576 bits for the 8 elements. Therefore, a vector register move involving the accumulator must specify the .b9 data size.

【０４４０】VCMOVM Conditional Move with Element Mask

【０４４１】[0441]

【表１３４】 [Table 134]

【０４４２】Assembler syntax
VCMOVM.dt Rd, Rb, cond
VCMOVM.dt Rd, #IMM, cond
where dt = {b, b9, h, w, f} and cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type is not supported with the 9-bit immediate operand.

【０４４３】[0443]

【表１３５】 [Table 135]

【０４４４】[0444]

【表１３６】 [Table 136]

【０４４５】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))
    for (i = 0; i < NumElem && MMASK[i]; i++)
        Rd[i] = {Rb[i] || SRb || sex(IMM<8:0>)};
Exceptions: None.
Programming note: This instruction is affected by the VMMR element mask, whereas VCMOV is not affected by the element mask. The extended floating-point precision representation in the vector accumulator uses all 576 bits for the 8 elements. Therefore, a vector register move involving the accumulator must specify the .b9 data size.

【０４４６】VCMPV Compare and Set Mask

【０４４７】[0447]

【表１３７】 [Table 137]

【０４４８】Assembler syntax
VCMPV.dt VRd, VRb, cond.mask
VCMPV.dt VRd, SRb, cond.mask
where dt = {b, b9, h, w, f}, cond = {un, lt, eq, le, gt, ne, ge, ov}, and mask = {VGMR, VMMR}. If no mask is specified, VGMR is assumed.

【０４４９】[0449]

【表１３８】 [Table 138]

【０４５０】Description
The contents of the vector registers VRa and VRb are compared element by element by performing a subtraction (VRa[i] - VRb[i]), and the corresponding bit #i in the VGMR (if K = 0) or VMMR (if K = 1) register is set if the result of the comparison matches the Cond field of the VCMPV instruction. For example, if the Cond field is less-than (LT), VGMR[i] (or VMMR[i]) is set when VRa[i] < VRb[i].
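The element-wise comparison described above might be sketched as follows. The function vcmpv and the CONDS table are hypothetical names; only the six relational conditions are modeled (un and ov are omitted), and the returned list stands for the bits written to VGMR or VMMR.

```python
import operator

# Relational condition codes mapped to Python comparison functions.
CONDS = {
    "lt": operator.lt, "eq": operator.eq, "le": operator.le,
    "gt": operator.gt, "ne": operator.ne, "ge": operator.ge,
}


def vcmpv(vra, vrb, cond):
    """Return one mask bit per element: True where vra[i] cond vrb[i]."""
    compare = CONDS[cond]
    return [compare(a, b) for a, b in zip(vra, vrb)]
```

With cond set to "lt", a mask bit is set exactly where VRa[i] < VRb[i], as in the example given in the description.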

【０４５１】Operation
for (i = 0; i < NumElem; i++) {
    Bop[i] = {Rb[i] || SRb || sex(IMM<8:0>)};
    relationship[i] = Ra[i] ? Bop[i];
    if (K == 1)
        MMASK[i] = (relationship[i] == Cond) ? True : False;
    else
        EMASK[i] = (relationship[i] == Cond) ? True : False;
}
Exceptions: None.
Programming note: This instruction is not affected by the element mask.

【０４５２】ＶＣＮＴＬＺ先行ゼロカウント VCNTLZ Leading zero count

【０４５３】[0453]

【表１３９】 [Table 139]

【０４５４】Assembler syntax
VCNTLZ.dt VRd, VRb
VCNTLZ.dt SRd, SRb
where dt = {b, b9, h, w}.

【０４５５】[0455]

【表１４０】 [Table 140]

【０４５６】説明Ｒｂの各エレメントに対して先行ゼロの数をカウントし
て、Ｒｄにカウントをリターンする。Description The number of leading zeros is counted for each element of Rb, and the count is returned to Rd.

【０４５７】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = number of leading zeroes (Rb[i]);
}
Exceptions: None.
Programming note: If all bits of an element are zero, the result equals the element size (8, 9, 16, or 32 for byte, byte9, halfword, or word, respectively). The count of leading zeros is inversely related to the index of the element position (when used after a VCMPR instruction). To convert to the element position, subtract the VCNTLZ result from NumElem for the given data type.
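A leading-zero count per element can be illustrated as below, using Python's int.bit_length. The name vcntlz and the bits parameter are hypothetical, elements are treated as unsigned bit patterns of the given width, and the element mask is not modeled.

```python
def vcntlz(rb, bits=8):
    """Count leading zero bits in each element of the given width."""
    mask = (1 << bits) - 1
    # bit_length() is 0 for zero, so an all-zero element yields `bits`.
    return [bits - (x & mask).bit_length() for x in rb]
```

An all-zero element returns the element size, which matches the first sentence of the programming note.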

【０４５８】VCOR Complement OR

【０４５９】[0459]

【表１４１】 [Table 141]

【０４６０】Assembler syntax
VCOR.dt VRd, VRa, VRb
VCOR.dt VRd, VRa, SRb
VCOR.dt VRd, VRa, #IMM
VCOR.dt SRd, SRa, SRb
VCOR.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

【０４６１】[0461]

【表１４２】 [Table 142]

【０４６２】Description
Logically ORs the complement of Ra with the Rb/immediate operand and returns the result to the destination register Rd.

【０４６３】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};
    Rd[i]<k> = ~Ra[i]<k> | Bop[i]<k>, for all bits k in element i;
}
Exceptions: None.

VCRSR Conditional Return from Subroutine

【０４６４】[0464]

【表１４３】 [Table 143]

【０４６５】Assembler syntax
VCRSR.cond
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４６６】Description: If Cond is true, returns from the subroutine. This is not a delayed branch.

【０４６７】If Cond is true, execution continues from the return address saved on the return address stack. Otherwise, execution continues at VPC + 4.

【０４６８】Operation
If ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
  if (VSP<4:0> == 0) { VISRC<RASU> = 1; signal ARM7 with RASU exception; VP_STATE = VP_IDLE; }
  else { VSP<4:0> = VSP<4:0> - 1; VPC = RSTACK[VSP<3:0>]; VPC<1:0> = b'00; }
} else VPC = VPC + 4;
Exceptions: Invalid instruction address, return address stack overflow.
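The control flow of the conditional return can be modeled with a short Python sketch. The function name `vcrsr`, the tuple it returns, and representing the RASU exception as a boolean flag are assumptions made for illustration; only the stack-pointer arithmetic follows the operation given above.

```python
def vcrsr(cond_true, vpc, vsp, rstack):
    """Model of a conditional return. Returns (new_vpc, new_vsp, rasu_exception)."""
    if not cond_true:
        return vpc + 4, vsp, False        # fall through to the next instruction
    if vsp & 0x1F == 0:
        return vpc, vsp, True             # empty stack: exception is signaled
    vsp = (vsp - 1) & 0x1F                # pop one entry
    new_vpc = rstack[vsp & 0xF] & ~0x3    # force word alignment (VPC<1:0> = 0)
    return new_vpc, vsp, False
```

The low two bits of the popped address are cleared, mirroring `VPC<1:0> = b'00` in the operation.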

【０４６９】VCVTB9 Byte9 Data Type Conversion

【０４７０】[0470]

【表１４４】 [Table 144]

【０４７１】Assembler syntax
VCVTB9.md VRd, VRb
VCVTB9.md SRd, SRb
where md = {bb9, b9h, hb9}.

【０４７２】[0472]

【表１４５】 [Table 145]

【０４７３】Description: Each element of Rb is converted from byte to byte9 (bb9), from byte9 to halfword (b9h), or from halfword to byte9 (hb9).

【０４７４】Operation
if (md<1:0> == 0) { // bb9: byte to byte9 conversion
  VRd = VRb; VRd<9i+8> = VRb<9i+7>, i = 0 to 31 (or 63 in VEC64 mode) }
else if (md<1:0> == 2) { // b9h: byte9 to halfword conversion
  VRd = VRb; VRd<18i+16:18i+9> = VRb<18i+8>, i = 0 to 15 (or 31 in VEC64 mode) }
else if (md<1:0> == 3) { // hb9: halfword to byte9 conversion
  VRd<18i+8> = VRb<18i+9>, i = 0 to 15 (or 31 in VEC64 mode) }
else VRd = undefined;
Exceptions: None.
Programming note: Before using this instruction in b9h mode, the programmer must adjust for the reduced number of elements in the vector register with a shuffle operation. After using this instruction in hb9 mode, the programmer must adjust for the increased number of elements in the destination vector register with an unshuffle operation. This instruction is not affected by the element mask.

【０４７５】VCVTFF Convert Floating Point to Fixed Point

【０４７６】[0476]

【表１４６】 [Table 146]

【０４７７】Assembler syntax
VCVTFF VRd, VRa, SRb
VCVTFF VRd, VRa, #IMM
VCVTFF SRd, SRa, SRb
VCVTFF SRd, SRa, #IMM

【０４７８】[0478]

【表１４７】 [Table 147]

【０４７９】Description: The contents of vector/scalar register Ra are converted from 32-bit floating point to a fixed-point real number of format <X, Y>, where the width of Y is specified by Rb (modulo 32) or the IMM field, and the width of X is defined as (32 - width of Y).

【０４８０】Operation
Y_size = {SRb % 32 ‖ IMM<4:0>};
for (i = 0; i < NumElem; i++) { Rd[i] = convert to <32 - Y_size.Y_size> format(Ra[i]); }
Exceptions: Overflow.
Programming note: This instruction supports only the word data size. It does not use the element mask, because the architecture does not support multiple data types within a register. This instruction uses the round-away-from-zero rounding mode for integer data types.

【０４８１】VCVTIF Convert Integer to Floating Point

【０４８２】[0482]

【表１４８】 [Table 148]

【０４８３】Assembler syntax
VCVTIF VRd, VRb
VCVTIF VRd, SRb
VCVTIF SRd, SRa

【０４８４】[0484]

【表１４９】 [Table 149]

【０４８５】Description: The contents of vector/scalar register Rb are converted from the int32 data type to the float data type, and the result is stored in vector/scalar register Rd.

【０４８６】Operation
for (i = 0; i < NumElem; i++) { Rd[i] = convert to floating point format(Rb[i]); }
Exceptions: None.
Programming note: This instruction supports only the word data size. It does not use the element mask, because the architecture does not support multiple data types within a register.

【０４８７】VD1CBR Decrement VCR1 and Conditional Branch

【０４８８】[0488]

【表１５０】 [Table 150]

【０４８９】Assembler syntax
VD1CBR.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４９０】Description: Decrements VCR1 and branches if Cond is true. This is not a delayed branch.

【０４９１】Operation
VCR1 = VCR1 - 1;
If ((VCR1 > 0) & ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))) VPC = VPC + sex(Offset<22:0> * 4);
else VPC = VPC + 4;
Exceptions: Invalid instruction address.
Programming note: VCR1 is decremented before the branch condition is checked. Executing this instruction when VCR1 is 0 effectively sets the loop count to 2^32 - 1.
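The decrement-then-test order, and the wraparound when the counter starts at 0, can be illustrated with a small Python model. The names `sign_extend` and `vdcbr` are hypothetical, and treating the counter as an unsigned 32-bit value is an assumption inferred from the 2^32 - 1 remark in the programming note.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def vdcbr(vcr: int, cond_true: bool, vpc: int, offset: int):
    """Decrement the loop counter, then branch if it is nonzero and cond holds."""
    vcr = (vcr - 1) & 0xFFFFFFFF             # decrement happens before the test
    if vcr > 0 and cond_true:
        vpc += sign_extend(offset, 23) * 4   # 23-bit word offset, scaled to bytes
    else:
        vpc += 4                             # fall through
    return vcr, vpc
```

Starting from a counter of 0, the decrement wraps to 0xFFFFFFFF, so the branch is taken and the loop would run another 2^32 - 1 times.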

【０４９２】VD2CBR Decrement VCR2 and Conditional Branch

【０４９３】[0493]

【表１５１】 [Table 151]

【０４９４】Assembler syntax
VD2CBR.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０４９５】Description: Decrements VCR2 and branches if Cond is true. This is not a delayed branch.

【０４９６】Operation
VCR2 = VCR2 - 1;
If ((VCR2 > 0) & ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))) VPC = VPC + sex(Offset<22:0> * 4);
else VPC = VPC + 4;
Exceptions: Invalid instruction address.
Programming note: VCR2 is decremented before the branch condition is checked. Executing this instruction when VCR2 is 0 effectively sets the loop count to 2^32 - 1.

【０４９７】VD3CBR Decrement VCR3 and Conditional Branch

【０４９８】[0498]

【表１５２】 [Table 152]

【０４９９】Assembler syntax
VD3CBR.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.

【０５００】Description: Decrements VCR3 and branches if Cond is true. This is not a delayed branch.

【０５０１】Operation
VCR3 = VCR3 - 1;
If ((VCR3 > 0) & ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))) VPC = VPC + sex(Offset<22:0> * 4);
else VPC = VPC + 4;
Exceptions: Invalid instruction address.
Programming note: VCR3 is decremented before the branch condition is checked. Executing this instruction when VCR3 is 0 effectively sets the loop count to 2^32 - 1.

【０５０２】VDIV2N Divide by 2^n

【０５０３】[0503]

【表１５３】 [Table 153]

【０５０４】Assembler syntax
VDIV2N.dt VRd, VRa, VRb
VDIV2N.dt VRd, VRa, #IMM
VDIV2N.dt SRd, SRa, SRb
VDIV2N.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０５０５】[0505]

【表１５４】 [Table 154]

【０５０６】Description: The contents of vector/scalar register Ra are divided by 2^n, where n is a positive integer held in scalar register Rb or the IMM field, and the final result is stored in vector/scalar register Rd. This instruction uses truncation (round toward zero) as its rounding mode.

【０５０７】Exceptions: None.
Programming note: N is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a value of N that is less than or equal to the precision of the data size. If N is greater than the precision of the specified data size, the elements are filled with sign bits. This instruction uses round toward zero as its rounding mode.
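Round toward zero differs from a plain arithmetic right shift for negative values, which a short Python sketch makes concrete. The function name `vdiv2n` is hypothetical, and rejecting an N larger than the element width (rather than modeling the sign-fill behavior) is a simplifying assumption.

```python
def vdiv2n(a: int, n: int, width: int) -> int:
    """Divide a signed width-bit value by 2**n, rounding toward zero."""
    n &= 0x1F                       # N is a 5-bit quantity
    if n > width:
        raise ValueError("N exceeds the precision of the data size")
    magnitude = abs(a) >> n         # shift the magnitude, not the signed value
    return -magnitude if a < 0 else magnitude
```

For example, `vdiv2n(-7, 1, 8)` gives -3, whereas the arithmetic shift `-7 >> 1` gives -4 because it rounds toward negative infinity.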

【０５０８】VDIV2N.F Divide Float by 2^n

【０５０９】[0509]

【表１５５】 [Table 155]

【０５１０】Assembler syntax
VDIV2N.f VRd, VRa, VRb
VDIV2N.f VRd, VRa, #IMM
VDIV2N.f SRd, SRa, SRb
VDIV2N.f SRd, SRa, #IMM

【０５１１】[0511]

【表１５６】 [Table 156]

【０５１２】Description: The contents of vector/scalar register Ra are divided by 2^n, where n is a positive integer held in scalar register Rb or the IMM field, and the final result is stored in vector/scalar register Rd.

【０５１３】Exceptions: None.
Programming note: N is taken as a 5-bit number from SRb or IMM<4:0>.

【０５１４】VDIVI Divide Initialize (Incomplete)

【０５１５】[0515]

【表１５７】 [Table 157]

【０５１６】Assembler syntax
VDIVI.ds VRb
VDIVI.ds SRb
where ds = {b, b9, h, w}.

【０５１７】[0517]

【表１５８】 [Table 158]

【０５１８】Description: Performs the initialization step of a non-restoring signed integer division. The dividend is a double-precision signed integer in the accumulator. If the dividend is single precision, it must be sign-extended to double precision and stored in VACOH and VACOL. The divisor is a single-precision signed integer in Rb.

【０５１９】If the sign of the dividend is the same as the sign of the divisor, Rb is subtracted from the upper part of the accumulator; otherwise, Rb is added to the upper part of the accumulator.

【０５２０】Exceptions: None.
Programming note: The programmer is responsible for detecting overflow and divide-by-zero cases before the division steps.

【０５２１】VDIVS Divide Step (Incomplete)

【０５２２】[0522]

【表１５９】 [Table 159]

【０５２３】Assembler syntax
VDIVS.ds VRb
VDIVS.ds SRb
where ds = {b, b9, h, w}.

【０５２４】[0524]

【表１６０】 [Table 160]

【０５２５】Description: Performs one iteration step of a non-restoring signed division. This instruction should be executed as many times as the data size (that is, 8 times for the int8 data type, 9 times for int9, 16 times for int16, and 32 times for int32). The VDIVI instruction should be used once before the division steps to generate the initial partial remainder in the accumulator. The divisor is a single-precision signed integer in Rb. One quotient bit is extracted per step and shifted into the least significant bit of the accumulator. If the sign of the partial remainder is the same as the sign of the divisor in Rb, Rb is subtracted from the upper part of the accumulator; otherwise, Rb is added to the upper part of the accumulator. The quotient bit is 1 if the sign of the resulting partial remainder (after the addition or subtraction) in the accumulator is the same as the sign of the divisor; otherwise, the quotient bit is 0. The accumulator is shifted left by one bit position with the quotient bit filled in. At the conclusion of the division steps, the remainder is in the upper part of the accumulator and the quotient is in the lower part. The quotient is in one's complement form.

【０５２６】Operation

VESL Element Shift Left by 1

【０５２７】[0527]

【表１６１】 [Table 161]

【０５２８】Assembler syntax
VESL.dt SRc, VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０５２９】[0529]

【表１６２】 [Table 162]

【０５３０】Description: Shifts the elements of vector register Ra left by one position, filling from scalar register Rb. The leftmost element shifted out is returned in scalar register Rc, and the remaining elements are returned in vector register Rd.
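The element shift described above can be pictured with a short Python sketch. The function name `vesl` is hypothetical, and the sketch assumes that "left" means toward higher element indices, so that index 0 receives the fill value; the text of this section does not state the index direction.

```python
def vesl(vra, srb):
    """Shift elements one position toward higher indices (assumed to be 'left').

    Returns (src, vrd): the element shifted out and the resulting vector.
    """
    src = vra[-1]              # element shifted out of the top position
    vrd = [srb] + vra[:-1]     # scalar fill enters at index 0
    return src, vrd
```

Under the same assumption, the right shift (VESR) would be the mirror image: the fill enters at the highest index and element 0 is shifted out.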

【０５３１】[0531]

【表１６３】 [Table 163]

【０５３２】Exceptions: None.
Programming note: This instruction is not affected by the element mask.

【０５３３】VESR Element Shift Right by 1

【０５３４】[0534]

【表１６４】 [Table 164]

【０５３５】Assembler syntax
VESR.dt SRc, VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０５３６】[0536]

【表１６５】 [Table 165]

【０５３７】Description: Shifts the elements of vector register Ra right by one position, filling from scalar register Rb. The rightmost element shifted out is returned in scalar register Rc, and the remaining elements are returned in vector register Rd.

【０５３８】[0538]

【表１６６】 [Table 166]

【０５３９】Exceptions: None.
Programming note: This instruction is not affected by the element mask.

【０５４０】VEXTRT Extract One Element

【０５４１】[0541]

【表１６７】 [Table 167]

【０５４２】Assembler syntax
VEXTRT.dt SRd, VRa, SRb
VEXTRT.dt SRd, VRa, #IMM
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０５４３】[0543]

【表１６８】 [Table 168]

【０５４４】Description: Extracts the element of vector register Ra whose index is specified by scalar register Rb or the IMM field, and stores it in scalar register Rd.

【０５４５】Operation
index32 = {SRb % 32 ‖ IMM<4:0>};
index64 = {SRb % 64 ‖ IMM<5:0>};
index = (VCSR<vec64>) ? index64 : index32;
SRd = VRa[index];
Exceptions: None.
Programming note: This instruction is not affected by the element mask.
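The index selection in the operation above can be written as a few lines of Python. The function name `vextrt` and the boolean `vec64` parameter (standing in for the VCSR<vec64> bit) are assumptions for illustration; the sketch also assumes a vector with at least 32 (or 64) elements, as for the byte data type.

```python
def vextrt(vra, srb: int, vec64: bool = False):
    """Extract one element; the index wraps modulo 32, or modulo 64 in VEC64 mode."""
    index = srb % 64 if vec64 else srb % 32
    return vra[index]
```

An index of 33 in 32-element mode therefore selects element 1 rather than raising an error.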

【０５４６】VEXTSNG2 Extract Sign of (1, -1)

【０５４７】[0547]

【表１６９】 [Table 169]

【０５４８】Assembler syntax
VEXTSNG2.dt VRd, VRa
VEXTSNG2.dt SRd, SRa
where dt = {b, b9, h, w}.

【０５４９】[0549]

【表１７０】 [Table 170]

【０５５０】Description: The sign value of the contents of vector/scalar register Ra is computed element-wise, and the result is stored in vector/scalar register Rd.

【０５５１】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) { Rd[i] = (Ra[i] < 0) ? -1 : 1; }
Exceptions: None.

VEXTSNG3 Extract Sign of (1, 0, -1)

【０５５２】[0552]

【表１７１】 [Table 171]

【０５５３】Assembler syntax
VEXTSNG3.dt VRd, VRa
VEXTSNG3.dt SRd, SRa
where dt = {b, b9, h, w}.

【０５５４】[0554]

【表１７２】 [Table 172]

【０５５５】Description: The sign value of the contents of vector/scalar register Ra is computed element-wise, and the result is stored in vector/scalar register Rd.

【０５５６】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
  if (Ra[i] > 0) Rd[i] = 1;
  else if (Ra[i] < 0) Rd[i] = -1;
  else Rd[i] = 0;
}
Exceptions: None.

VINSRT Insert One Element

【０５５７】[0557]

【表１７３】 [Table 173]

【０５５８】Assembler syntax
VINSRT.dt VRd, SRa, SRb
VINSRT.dt SRd, SRa, #IMM
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０５５９】[0559]

【表１７４】 [Table 174]

【０５６０】Description: Inserts the element in scalar register Ra into vector register Rd at the index specified by scalar register Rb or the IMM field.

【０５６１】Operation
index32 = {SRb % 32 ‖ IMM<4:0>};
index64 = {SRb % 64 ‖ IMM<5:0>};
index = (VCSR<vec64>) ? index64 : index32;
VRd[index] = SRa;
Exceptions: None.
Programming note: This instruction is not affected by the element mask.

【０５６２】VL Load

【０５６３】[0563]

【表１７５】 [Table 175]

【０５６４】Assembler syntax
VL.lt Rd, SRb, SRi
VL.lt Rd, SRb, #IMM
VL.lt Rd, SRb+, SRi
VL.lt Rd, SRb+, #IMM
where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .w and .f specify the same operation, and that .64 and VRAd cannot be specified together. Use VLOFF for a cache-off load.

【０５６５】Description: Loads a vector register in the current or alternate bank, or a scalar register.

【０５６６】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
R_d = see table below;

【０５６７】[0567]

【表１７６】 [Table 176]

【０５６８】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０５６９】VLCB Load from Circular Buffer

【０５７０】[0570]

【表１７７】 [Table 177]

【０５７１】Assembler syntax
VLCB.lt Rd, SRb, SRi
VLCB.lt Rd, SRb, #IMM
VLCB.lt Rd, SRb+, SRi
VLCB.lt Rd, SRb+, #IMM
where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLCBOFF for a cache-off load.

【０５７２】Description: Loads a vector register or a scalar register from the circular buffer bounded by the BEGIN pointer in SR_{b+1} and the END pointer in SR_{b+2}.

【０５７３】The effective address is adjusted if it is greater than the END address, both before the load and before the address update operation. In addition, the circular buffer bounds must be aligned on halfword and word boundaries for .h and .w scalar loads, respectively.
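The address adjustment for a circular buffer can be sketched in Python. The wrap rule used here, BEGIN + (EA - END), is an assumption: the table that defines the adjustment is not reproduced in this text. The function name `wrap_address` is hypothetical, and the range check reflects the condition BEGIN < EA < 2*END - BEGIN given in the programming note for this instruction.

```python
def wrap_address(ea: int, begin: int, end: int) -> int:
    """Wrap an effective address back into a circular buffer (assumed rule)."""
    if not (begin < ea < 2 * end - begin):
        raise ValueError("EA outside the range in which a single wrap suffices")
    # An address past END re-enters the buffer the same distance past BEGIN.
    return begin + (ea - end) if ea > end else ea
```

The upper bound 2*END - BEGIN guarantees that one subtraction is enough: the overshoot EA - END is then smaller than the buffer length END - BEGIN.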

【０５７４】[0574]

【表１７８】 [Table 178]

【０５７５】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０５７６】The programmer must ensure that the following condition holds for this instruction to work as expected:

【０５７７】BEGIN < EA < 2*END - BEGIN; that is, EA > BEGIN and EA - END < END - BEGIN.

VLD Load Double

【０５７８】[0578]

【表１７９】 [Table 179]

【０５７９】Assembler syntax
VLD.lt Rd, SRb, SRi
VLD.lt Rd, SRb, #IMM
VLD.lt Rd, SRb+, SRi
VLD.lt Rd, SRb+, #IMM
where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLDOFF for a cache-off load.

【０５８０】Description: Loads two vector registers in the current or alternate bank, or two scalar registers.

【０５８１】[0581]

【表１８０】 [Table 180]

【０５８２】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０５８３】VLI Load Immediate

【０５８４】[0584]

【表１８１】 [Table 181]

【０５８５】Assembler syntax
VLI.dt VRd, #IMM
VLI.dt SRd, #IMM
where dt = {b, b9, h, w, f}.

【０５８６】Description: Loads an immediate value into a scalar or vector register.

【０５８７】For a scalar register load, a byte, byte9, halfword, or word is loaded according to the data type. For the byte, byte9, and halfword data types, the unaffected bytes (byte9s) are not modified.

【０５８８】Operation
Rd = see table below:

【０５８９】[0589]

【表１８２】 [Table 182]

【０５９０】Exceptions: None.

VLQ Load Quad

【０５９１】[0591]

【表１８３】 [Table 183]

【０５９２】Assembler syntax
VLQ.lt Rd, SRb, SRi
VLQ.lt Rd, SRb, #IMM
VLQ.lt Rd, SRb+, SRi
VLQ.lt Rd, SRb+, #IMM
where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLQOFF for a cache-off load.

【０５９３】Description: Loads four vector registers in the current or alternate bank, or four scalar registers.

【０５９４】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
R_d : R_{d+1} : R_{d+2} : R_{d+3} = see table below;

【０５９５】[0595]

【表１８４】 [Table 184]

【０５９６】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０５９７】VLR Load Reverse

【０５９８】[0598]

【表１８５】 [Table 185]

【０５９９】Assembler syntax
VLR.lt Rd, SRb, SRi
VLR.lt Rd, SRb, #IMM
VLR.lt Rd, SRb+, SRi
VLR.lt Rd, SRb+, #IMM
where lt = {4, 8, 16, 32, 64} and Rd = {VRd, VRAd}. Note that .64 and VRAd cannot be specified together. Use VLROFF for a cache-off load.

【０６００】Description: Loads a vector register in reverse element order. This instruction does not support a scalar destination register.

【０６０１】[0601]

【表１８６】 [Table 186]

【０６０２】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０６０３】VLSL Logical Shift Left

【０６０４】[0604]

【表１８７】 [Table 187]

【０６０５】Assembler syntax
VLSL.dt VRd, VRa, SRb
VLSL.dt VRd, VRa, #IMM
VLSL.dt SRd, SRa, SRb
VLSL.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０６０６】[0606]

【表１８８】 [Table 188]

【０６０７】Description: Each element of vector/scalar register Ra is logically shifted left by the shift amount given in scalar register Rb or the IMM field, with zeros filled into the least significant bit (LSB) positions, and the result is stored in vector/scalar register Rd.

【０６０８】Exceptions: None.
Programming note: The shift amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with zeros.
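The per-element shift and the 5-bit shift amount can be sketched as follows. The function name `vlsl` is hypothetical and elements are modeled as unsigned integers of the given width, which is an assumption of this sketch.

```python
def vlsl(elements, shift: int, width: int):
    """Logical left shift of each width-bit element; zeros enter at the LSB."""
    shift &= 0x1F                   # only the low 5 bits of the amount are used
    mask = (1 << width) - 1         # discard bits shifted past the element's MSB
    return [(e << shift) & mask for e in elements]
```

Shifting an 8-bit element by 9 yields 0, in line with the zero-fill behavior described in the programming note, while an amount of 33 acts as a shift by 1 because only five bits are kept.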

【０６０９】VLSR Logical Shift Right

【０６１０】[0610]

【表１８９】 [Table 189]

【０６１１】Assembler syntax
VLSR.dt VRd, VRa, SRb
VLSR.dt VRd, VRa, #IMM
VLSR.dt SRd, SRa, SRb
VLSR.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０６１２】[0612]

【表１９０】 [Table 190]

【０６１３】Description: Each element of vector/scalar register Ra is logically shifted right by the shift amount given in scalar register Rb or the IMM field, with zeros filled into the most significant bit (MSB) positions, and the result is stored in vector/scalar register Rd.

【０６１４】Exceptions: None.
Programming note: The shift amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with zeros.

【０６１５】VLWS Load with Stride

【０６１６】[0616]

【表１９１】 [Table 191]

【０６１７】Assembler syntax
VLWS.lt Rd, SRb, SRi
VLWS.lt Rd, SRb, #IMM
VLWS.lt Rd, SRb+, SRi
VLWS.lt Rd, SRb+, #IMM
where lt = {4, 8, 16, 32, 64} and Rd = {VRd, VRAd}. Note that .64 and VRAd cannot be specified together. Use VLWSOFF for a cache-off load.

【０６１８】Description: Starting at the effective address, 32 bytes are loaded from memory into vector register VRd, using scalar register SRb+1 as the stride control register. LT specifies the block size, that is, the number of consecutive bytes to load for each block. SRb+1 specifies the stride, that is, the number of bytes separating the starts of two consecutive blocks. The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and the block size must be multiples of the data size.
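The relationship between block size and stride can be illustrated with a Python sketch that gathers bytes from a flat memory image. The function name `vlws`, the `memory` argument, and the `vecsize` default of 32 bytes are assumptions for illustration; the sign extension of each byte into a 9-bit element is omitted here.

```python
def vlws(memory: bytes, ea: int, block_size: int, stride: int, vecsize: int = 32):
    """Gather vecsize bytes as blocks of block_size bytes spaced stride bytes apart."""
    if stride < block_size:
        raise ValueError("stride must be at least the block size")
    out = []
    for i in range(vecsize // block_size):      # one iteration per block
        for j in range(block_size):             # consecutive bytes within a block
            out.append(memory[ea + i * stride + j])
    return out
```

With a block size of 4 and a stride of 8, the load picks up bytes 0 to 3, skips 4 to 7, picks up 8 to 11, and so on, which is how a column of a two-dimensional array could be fetched.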

【０６１９】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
Block_size = {4 ‖ 8 ‖ 16 ‖ 32};
stride = SR_{b+1}<31:0>;
for (i = 0; i < VECSIZE/Block_size; i++)
  for (j = 0; j < Block_size; j++)
    VRd[i*Block_size+j]<8:0> = sex BYTE[EA + i*Stride + j];
Exceptions: Invalid data address, unaligned access.

VMAC Multiply and Accumulate

【０６２０】[0620]

【表１９２】 [Table 192]

【０６２１】Assembler syntax
VMAC.dt VRa, VRb
VMAC.dt VRa, SRb
VMAC.dt VRa, #IMM
VMAC.dt SRa, SRb
VMAC.dt SRa, #IMM
where dt = {b, h, w, f}.

【０６２２】[0622]

【表１９３】 [Table 193]

【０６２３】Description: Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision element of the intermediate result is added to the corresponding double-precision element of the vector accumulator; and the double-precision sum of each element is stored in the vector accumulator. Ra and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.

【０６２４】For the float data type, all operands and results are single precision.

【０６２５】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
  Aop[i] = {VRa[i] ‖ SRa};
  Bop[i] = {VRb[i] ‖ SRb};
  if (dt == float) VACL[i] = Aop[i] * Bop[i] + VACL[i];
  else VACH[i]:VACL[i] = Aop[i] * Bop[i] + VACH[i]:VACL[i];
}
Exceptions: Overflow, floating-point invalid operand.
Programming note: This instruction does not support the int9 data type; use the int16 data type instead.
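The integer path of the multiply-accumulate can be modeled in Python, with each double-precision accumulator element held as one integer rather than as a VACH:VACL pair. The function name `vmac`, the use of `OverflowError` for the overflow exception, and leaving masked-off elements unchanged are assumptions of this sketch.

```python
def vmac(acc, ra, rb, emask, width: int):
    """Accumulate ra[i] * rb[i] into a signed accumulator of 2*width bits."""
    limit = 1 << (2 * width - 1)          # signed range of the double-width element
    out = []
    for a, x, y, m in zip(acc, ra, rb, emask):
        if m:
            a = a + x * y                 # double-precision product plus old sum
            if not -limit <= a < limit:
                raise OverflowError("accumulator overflow")
        out.append(a)
    return out
```

Calling the function repeatedly over successive input vectors builds up a dot product per element lane, which is the usual use of a vector accumulator.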

[0626] VMACF Multiply and Accumulate Fraction

[0627]

[Table 194]

[0628] Assembler syntax
    VMACF.dt VRa, VRb
    VMACF.dt VRa, SRb
    VMACF.dt VRa, #IMM
    VMACF.dt SRa, SRb
    VMACF.dt SRa, #IMM
where dt = {b, h, w}.

[0629]

[Table 195]

[0630] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, which is then shifted left by one bit. Each double-precision element of the shifted intermediate result is added to the corresponding double-precision element of the vector accumulator, and the double-precision sum of each element is stored in the vector accumulator. VRa and Rb use the specified data type, while VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.

[0631] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        VACH[i]:VACL[i] = ((VRa[i] * Bop[i]) << 1) + VACH[i]:VACL[i];
    }
Exceptions
Overflow.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.
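The one-bit left shift is what one would expect for signed fixed-point fractions. Reading the operands as fractions with `bits - 1` fraction bits (for example Q15 for halfwords) is an assumption of this sketch, suggested by the instruction name rather than stated in the text. Under that reading, the doubled product lines up with a double-width fraction (Q31). The helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vmacf_element(a, b, acc, bits=16):
    """One element of a fractional multiply-accumulate.

    a and b are signed integers read as fractions with bits-1 fraction bits;
    acc is the signed double-width accumulator (2*bits-1 fraction bits).
    """
    shifted = (a * b) << 1  # shift the double-precision product left by one
    return to_signed(shifted + acc, 2 * bits)  # wrap; no overflow exception
```

For example, 0.5 (16384 in Q15) times 0.25 (8192 in Q15) accumulates to 2**28, which is 0.125 when read as Q31.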

[0632] VMACL Multiply and Accumulate Low

[0633]

[Table 196]

[0634] Assembler syntax
    VMACL.dt VRa, VRb
    VMACL.dt VRa, SRb
    VMACL.dt VRa, #IMM
    VMACL.dt SRa, SRb
    VMACL.dt SRa, #IMM
where dt = {b, h, w, f}.

[0635]

[Table 197]

[0636] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each double-precision element of the intermediate result is added to the corresponding double-precision element of the vector accumulator, the double-precision sum of each element is stored in the vector accumulator, and the lower portion is returned to destination register VRd.

[0637] Ra and Rb use the specified data type, while VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.

[0638] For the float data type, all operands and results are single precision.

[0639] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb};
        if (dt == float) VACL[i] = VRa[i] * Bop[i] + VACL[i];
        else VACH[i]:VACL[i] = VRa[i] * Bop[i] + VACH[i]:VACL[i];
        VRd[i] = VACL[i];
    }
Exceptions
Overflow, floating-point invalid operand.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0640] VMAD Multiply and Add

[0641]

[Table 198]

[0642] Assembler syntax
    VMAD.dt VRc, VRd, VRa, VRb
    VMAD.dt SRc, SRd, SRa, SRb
where dt = {b, h, w}.

[0643]

[Table 199]

[0644] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each double-precision element of the intermediate result is added to the corresponding element of Rc, and the double-precision sum of each element is stored in the destination registers (Rd+1:Rd).

[0645] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Aop[i] = {VRa[i] ‖ SRa};
        Bop[i] = {VRb[i] ‖ SRb};
        Cop[i] = {VRc[i] ‖ SRc};
        Rd+1[i]:Rd[i] = Aop[i] * Bop[i] + sex_dp(Cop[i]);
    }
Exceptions
None.

VMADL Multiply and Add Low

[0646]

[Table 200]

[0647] Assembler syntax
    VMADL.dt VRc, VRd, VRa, VRb
    VMADL.dt SRc, SRd, SRa, SRb
where dt = {b, h, w, f}.

[0648]

[Table 201]

[0649] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each double-precision element of the intermediate result is added to the corresponding element of Rc, and the lower portion of the double-precision sum of each element is stored in destination register Rd.

[0650] For the float data type, all operands and results are single precision.

[0651] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Aop[i] = {VRa[i] ‖ SRa};
        Bop[i] = {VRb[i] ‖ SRb};
        Cop[i] = {VRc[i] ‖ SRc};
        if (dt == float) Lo[i] = Aop[i] * Bop[i] + Cop[i];
        else Hi[i]:Lo[i] = Aop[i] * Bop[i] + sex_dp(Cop[i]);
        Rd[i] = Lo[i];
    }
Exceptions
Overflow, floating-point invalid operand.
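For the integer data types, keeping only the lower portion of the multiply-add result amounts to masking the double-width sum to the element width. The following is a minimal sketch under that reading; the function name and list-based registers are hypothetical, and the element mask and overflow exception are not modelled.

```python
def vmadl(va, vb, vc, bits=16):
    """Return the low `bits` bits of a*b + c for each element (integer types)."""
    mask = (1 << bits) - 1
    # Python integers are unbounded, so a*b + c is the exact wide sum;
    # masking keeps the part that would be written to Rd.
    return [(a * b + c) & mask for a, b, c in zip(va, vb, vc)]
```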

[0652] VMAS Multiply and Subtract from Accumulator

[0653]

[Table 202]

[0654] Assembler syntax
    VMAS.dt VRa, VRb
    VMAS.dt VRa, SRb
    VMAS.dt VRa, #IMM
    VMAS.dt SRa, SRb
    VMAS.dt SRa, #IMM
where dt = {b, h, w, f}.

[0655]

[Table 203]

[0656] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each double-precision element of the intermediate result is subtracted from the corresponding double-precision element of the vector accumulator, and the double-precision result for each element is stored in the vector accumulator.

[0657] Ra and Rb use the specified data type, while VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.

[0658] For the float data type, all operands and results are single precision.

[0659] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb};
        if (dt == float) VACL[i] = VACL[i] - VRa[i] * Bop[i];
        else VACH[i]:VACL[i] = VACH[i]:VACL[i] - VRa[i] * Bop[i];
    }
Exceptions
Overflow, floating-point invalid operand.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0660] VMASF Multiply and Subtract from Accumulator Fraction

[0661]

[Table 204]

[0662] Assembler syntax
    VMASF.dt VRa, VRb
    VMASF.dt VRa, SRb
    VMASF.dt VRa, #IMM
    VMASF.dt SRa, SRb
    VMASF.dt SRa, #IMM
where dt = {b, h, w}.

[0663]

[Table 205]

[0664] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, which is then shifted left by one bit. Each double-precision element of the shifted intermediate result is subtracted from the corresponding double-precision element of the vector accumulator, and the double-precision result for each element is stored in the vector accumulator.

[0665] Ra and Rb use the specified data type, while VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.

[0666] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        VACH[i]:VACL[i] = VACH[i]:VACL[i] - VRa[i] * Bop[i];
    }
Exceptions
Overflow.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0667] VMASL Multiply and Subtract from Accumulator Low

[0668]

[Table 206]

[0669] Assembler syntax
    VMASL.dt VRd, VRa, VRb
    VMASL.dt VRd, VRa, SRb
    VMASL.dt VRd, VRa, #IMM
    VMASL.dt SRd, SRa, SRb
    VMASL.dt SRd, SRa, #IMM
where dt = {b, h, w, f}.

[0670]

[Table 207]

[0671] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each double-precision element of the intermediate result is subtracted from the corresponding double-precision element of the vector accumulator, the double-precision result for each element is stored in the vector accumulator, and the lower portion is returned to destination register VRd.

[0672] Ra and Rb use the specified data type, while VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.

[0673] For the float data type, all operands and results are single precision.

[0674] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb};
        if (dt == float) VACL[i] = VACL[i] - VRa[i] * Bop[i];
        else VACH[i]:VACL[i] = VACH[i]:VACL[i] - VRa[i] * Bop[i];
        VRd[i] = VACL[i];
    }
Exceptions
Overflow, floating-point invalid operand.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0675] VMAXE Pairwise Maximum and Exchange

[0676]

[Table 208]

[0677] Assembler syntax
    VMAXE.dt VRd, VRb
where dt = {b, b9, h, w, f}.

[0678]

[Table 209]

[0679] Description
VRa and VRb must be the same register. If VRa differs from VRb, the result is undefined.

[0680] The data elements of vector register Rb are compared in even/odd pairs. The larger value of each pair is stored in the even position of vector register Rd, and the smaller value of each pair is stored in the odd position.
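The pairwise compare-and-exchange can be sketched as follows. This is a minimal illustration with a hypothetical function name and a plain list standing in for the vector register; the element mask is not modelled.

```python
def vmaxe(vb):
    """Place the larger of each even/odd pair at the even index, the smaller at the odd."""
    vd = list(vb)
    for i in range(0, len(vb) - 1, 2):
        larger = max(vb[i], vb[i + 1])
        smaller = min(vb[i], vb[i + 1])
        vd[i], vd[i + 1] = larger, smaller
    return vd
```

Applied to [1, 5, 7, 2], the first pair is swapped and the second pair is already in order.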

[0681] Operation
    for (i = 0; i < NumElem && EMASK[i]; i = i + 2) {
        VRd[i] = (VRb[i] > VRb[i+1]) ? VRb[i] : VRb[i+1];
        VRd[i+1] = (VRb[i] > VRb[i+1]) ? VRb[i+1] : VRb[i];
    }
Exceptions
None.

VMOV Move

[0682]

[Table 210]

[0683] Assembler syntax
    VMOV.dt Rd, Rb
where dt = {b, b9, h, w, f}, and Rd and Rb denote architecturally specified register names.

[0684]

[Table 211]

[0685]

[Table 212]

[0686]

[Table 213]

[0687] Operation
    Rd = Rb
Exceptions
Setting an exception status bit in VCSR or VISRC causes the corresponding exception.

[0688] Programming note
This instruction is not affected by the element mask. Note that, because the alternate-bank concept does not exist in VEC64 mode, this instruction cannot be used to move to an alternate-bank register in VEC64 mode.

[0689] VMUL Multiply

[0690]

[Table 214]

[0691] Assembler syntax
    VMUL.dt VRc, VRd, VRa, VRb
    VMUL.dt SRc, SRd, SRa, SRb
where dt = {b, h, w}.

[0692]

[Table 215]

[0693] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision result, and the double-precision result for each element is returned to the destination registers Rc:Rd.

[0694] Ra and Rb use the specified data type, while Rc:Rd uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in Rc.

[0695] Exceptions
None.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead. It also does not support the float data type, because the extended result is not a supported data type.

[0696] VMULA Multiply to Accumulator

[0697]

[Table 216]

[0698] Assembler syntax
    VMULA.dt VRa, VRb
    VMULA.dt VRa, SRb
    VMULA.dt VRa, #IMM
    VMULA.dt SRa, SRb
    VMULA.dt SRa, #IMM
where dt = {b, h, w, f}.

[0699]

[Table 217]

[0700] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, and the result is written to the accumulator.

[0701] For the float data type, all operands and results are single precision.

[0702] Exceptions
None.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0703] VMULAF Multiply to Accumulator Fraction

[0704]

[Table 218]

[0705] Assembler syntax
    VMULAF.dt VRa, VRb
    VMULAF.dt VRa, SRb
    VMULAF.dt VRa, #IMM
    VMULAF.dt SRa, SRb
    VMULAF.dt SRa, #IMM
where dt = {b, h, w}.

[0706]

[Table 219]

[0707] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. The double-precision intermediate result is shifted left by one bit, and the result is written to the accumulator.

[0708] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        VACH[i]:VACL[i] = (VRa[i] * Bop[i]) << 1;
    }
Exceptions
None.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0709] VMULF Multiply Fraction

[0710]

[Table 220]

[0711] Assembler syntax
    VMULF.dt VRa, VRb
    VMULF.dt VRa, SRb
    VMULF.dt VRa, #IMM
    VMULF.dt SRa, SRb
    VMULF.dt SRa, #IMM
where dt = {b, h, w}.

[0712]

[Table 221]

[0713] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. The double-precision intermediate result is shifted left by one bit; the upper portion of the result is returned to destination register VRd+1, and the lower portion is returned to destination register VRd. VRd must be an even-numbered register.

[0714] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        Hi[i]:Lo[i] = (VRa[i] * Bop[i]) << 1;
        VRd+1[i] = Hi[i];
        VRd[i] = Lo[i];
    }
Exceptions
None.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0715] VMULFR Multiply Fraction and Round

[0716]

[Table 222]

[0717] Assembler syntax
    VMULFR.dt VRd, VRa, VRb
    VMULFR.dt VRd, VRa, SRb
    VMULFR.dt VRd, VRa, #IMM
    VMULFR.dt SRd, SRa, SRb
    VMULFR.dt SRd, SRa, #IMM
where dt = {b, h, w}.

[0718]

[Table 223]

[0719] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. The double-precision intermediate result is shifted left by one bit, the shifted result is rounded into its upper portion, and the upper portion is returned to destination register VRd.

[0720] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        Hi[i]:Lo[i] = (VRa[i] * Bop[i]) << 1;
        if (Lo[i]<msb> == 1) Hi[i] = Hi[i] + 1;
        VRd[i] = Hi[i];
    }
Exceptions
None.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.
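The rounding step in the VMULFR operation (add one to the upper half when the most significant bit of the lower half is set) can be illustrated with a short sketch. The function names and the default 16-bit element width are assumptions made for illustration only.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vmulfr_element(a, b, bits=16):
    """Fractional multiply with rounding for one element."""
    half = (1 << bits) - 1
    wide = ((a * b) << 1) & ((1 << (2 * bits)) - 1)  # Hi:Lo bit pattern
    hi, lo = wide >> bits, wide & half
    if lo >> (bits - 1):        # Lo<msb> == 1: round the upper half up
        hi = (hi + 1) & half
    return to_signed(hi, bits)
```

For instance, 3 times 0.5 (16384 as a 16-bit fraction) yields 1.5 before rounding, so the returned upper half is 2.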

[0721] VMULL Multiply Low

[0722]

[Table 224]

[0723] Assembler syntax
    VMULL.dt VRd, VRa, VRb
    VMULL.dt VRd, VRa, SRb
    VMULL.dt VRd, VRa, #IMM
    VMULL.dt VRd, SRa, SRb
    VMULL.dt VRd, SRa, #IMM
where dt = {b, h, w, f}.

[0724]

[Table 225]

[0725] Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, and the lower portion of the result is returned to destination register VRd.

[0726] For the float data type, all operands and results are single precision.

[0727] Exceptions
Overflow, floating-point invalid operand.
Programming note
This instruction does not support the int9 data type; use the int16 data type instead.

[0728] VNAND NAND

[0729]

[Table 226]

[0730] Assembler syntax
    VNAND.dt VRd, VRa, VRb
    VNAND.dt VRd, VRa, SRb
    VNAND.dt VRd, VRa, #IMM
    VNAND.dt VRd, SRa, SRb
    VNAND.dt VRd, SRa, #IMM
where dt = {b, h, w, f}. Note that .w and .f specify the same operation.

[0731]

[Table 227]

[0732] Description
Each bit of each element in Ra is logically NANDed with the corresponding bit of the Rb/immediate operand, and the result is returned to Rd.
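A bitwise NAND over elements of a fixed width can be sketched as below. Python integers are unbounded, so the complement is masked back to the element width; the function name and the default 8-bit width are assumptions for illustration only.

```python
def vnand(va, vb, bits=8):
    """Element-wise bitwise NAND, truncated to `bits` bits per element."""
    mask = (1 << bits) - 1
    return [~(a & b) & mask for a, b in zip(va, vb)]
```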

[0733] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        Rd[i]<k> = ~(Ra[i]<k> & Bop[i]<k>), for k = all bits in element i;
    }
Exceptions
None.

VNOR NOR

[0734]

[Table 228]

[0735] Assembler syntax
    VNOR.dt VRd, VRa, VRb
    VNOR.dt VRd, VRa, SRb
    VNOR.dt VRd, VRa, #IMM
    VNOR.dt SRd, SRa, SRb
    VNOR.dt SRd, SRa, #IMM
where dt = {b, b9, w, f}. Note that .w and .f specify the same operation.

[0736]

[Table 229]

[0737] Description
Each bit of each element in Ra is logically NORed with the corresponding bit of the Rb/immediate operand, and the result is returned to Rd.

[0738] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        Rd[i]<k> = ~(Ra[i]<k> | Bop[i]<k>), for k = all bits in element i;
    }
Exceptions
None.

VOR OR

[0739]

[Table 230]

[0740] Assembler syntax
    VOR.dt VRd, VRa, VRb
    VOR.dt VRd, VRa, SRb
    VOR.dt VRd, VRa, #IMM
    VOR.dt SRd, SRa, SRb
    VOR.dt SRd, SRa, #IMM
where dt = {b, b9, w, f}. Note that .w and .f specify the same operation.

[0741]

[Table 231]

[0742] Description
Each bit of each element in Ra is logically ORed with the corresponding bit of the Rb/immediate operand, and the result is returned to Rd.

[0743] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        Rd[i]<k> = Ra[i]<k> | Bop[i]<k>, for k = all bits in element i;
    }
Exceptions
None.

VORC OR Complement

[0744]

[Table 232]

[0745] Assembler syntax
    VORC.dt VRd, VRa, VRb
    VORC.dt VRd, VRa, SRb
    VORC.dt VRd, VRa, #IMM
    VORC.dt SRd, SRa, SRb
    VORC.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

[0746]

[Table 233]

[0747] Description
Each bit of each element in Ra is logically ORed with the complement of the corresponding bit of the Rb/immediate operand, and the result is returned to Rd.

[0748] Operation
    for (i = 0; i < NumElem && EMASK[i]; i++) {
        Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
        Rd[i]<k> = Ra[i]<k> | ~Bop[i]<k>, for k = all bits in element i;
    }
Exceptions
None.

VPFTCH Prefetch

[0749]

[Table 234]

[0750] Assembler syntax
    VPFTCH.ln SRb, SRi
    VPFTCH.ln SRb, #IMM
    VPFTCH.ln SRb+, SRi
    VPFTCH.ln SRb+, #IMM
where ln = {1, 2, 4, 8}.

[0751] Description
Multiple vector data cache lines, starting at the effective address, are prefetched. The number of cache lines is specified as follows:
LN<1:0> = 00: one 64-byte cache line is prefetched.
LN<1:0> = 01: two 64-byte cache lines are prefetched.

[0752] LN<1:0> = 10: four 64-byte cache lines are prefetched.

[0753] LN<1:0> = 11: eight 64-byte cache lines are prefetched.

[0754] If the effective address is not on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary.
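The LN encoding and the alignment rule can be sketched together. This sketch assumes that the quantity being truncated is the effective address and that the two-bit LN field selects 1, 2, 4, or 8 lines as a power of two; the function and constant names are hypothetical.

```python
LINE_BYTES = 64  # cache line size stated in the description


def prefetch_lines(ea, ln):
    """Return the start addresses of the cache lines a prefetch would touch.

    ea is the effective byte address; ln is the 2-bit LN field (0..3).
    """
    count = 1 << ln                   # 00, 01, 10, 11 -> 1, 2, 4, 8 lines
    base = ea & ~(LINE_BYTES - 1)     # truncate down to a 64-byte boundary
    return [base + k * LINE_BYTES for k in range(count)]
```

For an effective address of 0x1234 and LN = 01, the sketch yields the two lines starting at 0x1200 and 0x1240.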

[0755] Operation
Exceptions
Invalid data address exception.
Programming note
EA<31:0> specifies a byte address in local memory.

[0756] VPFTCHSP Prefetch to Scratch Pad

[0757]

[Table 235]

[0758] Assembler syntax
    VPFTCHSP.ln SRp, SRb, SRi
    VPFTCHSP.ln SRp, SRb, #IMM
    VPFTCHSP.ln SRp, SRb+, SRi
    VPFTCHSP.ln SRp, SRb+, #IMM
where ln = {1, 2, 4, 8}. VPFTCH and VPFTCHSP have the same opcode.
Description
Multiple 64-byte blocks are transferred from memory to the scratch pad. The effective address provides the starting address in memory, and SRp provides the starting address in the scratch pad. The number of 64-byte blocks is specified as follows:

[0759] LN<1:0> = 00: one 64-byte block is transferred.

[0760] LN<1:0> = 01: two 64-byte blocks are transferred.

[0761] LN<1:0> = 10: four 64-byte blocks are transferred.

[0762] LN<1:0> = 11: eight 64-byte blocks are transferred.

[0763] If the effective address is not on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary. If the scratch pad pointer address in SRp is not on a 64-byte boundary, it is likewise first truncated so that it is aligned to a 64-byte boundary. The aligned scratch pad pointer address is then incremented by the number of bytes transferred.
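The address handling for the scratch pad transfer can be sketched in the same spirit. The function name, the tuple return value, and treating addresses as plain integers are assumptions for illustration; the data movement itself is not modelled.

```python
def prefetch_to_scratch_pad(ea, srp, ln, block_bytes=64):
    """Compute the aligned source, aligned destination, and updated pointer.

    ea is the memory effective address, srp the scratch pad pointer,
    ln the 2-bit LN field selecting 1, 2, 4, or 8 blocks.
    """
    count = 1 << ln
    src = ea & ~(block_bytes - 1)     # align the memory address
    dst = srp & ~(block_bytes - 1)    # align the scratch pad pointer
    new_srp = dst + count * block_bytes  # advance by the bytes transferred
    return src, dst, new_srp
```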

[0764] Exceptions
Invalid data address exception.

VROL Rotate Left

[0765]

[Table 236]

[0766] Assembler syntax
    VROL.dt VRd, VRa, SRb
    VROL.dt VRd, VRa, #IMM
    VROL.dt VRd, SRa, SRb
    VROL.dt VRd, SRa, #IMM
where dt = {b, b9, h, w}.

[0767]

[Table 237]

[0768] Description
Each data element of vector/scalar register Ra is rotated left by the number of bits given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.

[0769] Exceptions
None.
Programming note
Note that the rotate amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a rotate amount that is less than or equal to the number of bits in the data size. If the rotate amount is larger than the specified data size, the result is undefined. Rotating left by n is equivalent to rotating right by ElemSize - n, where ElemSize is the number of bits in the given data size.
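The equivalence between a left rotation by n and a right rotation by ElemSize - n can be checked with a small sketch. The helper names are hypothetical. Reducing the amount modulo the element size is an assumption of the sketch so that an amount equal to the element size behaves like zero; amounts larger than the element size are undefined per the note above.

```python
def rol(x, n, size):
    """Rotate the low `size` bits of x left by n bits."""
    mask = (1 << size) - 1
    x &= mask
    n %= size  # assumption: an amount equal to size acts like 0
    return ((x << n) | (x >> (size - n))) & mask


def ror(x, n, size):
    """Rotate the low `size` bits of x right by n bits."""
    mask = (1 << size) - 1
    x &= mask
    n %= size
    return ((x >> n) | (x << (size - n))) & mask
```

For an 8-bit element, rol(0x96, 3, 8) and ror(0x96, 5, 8) both give 0xB4.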

[0770] VROR Rotate Right

[0771]

[Table 238]

[0772] Assembler syntax
    VROR.dt VRd, VRa, SRb
    VROR.dt VRd, VRa, #IMM
    VROR.dt VRd, SRa, SRb
    VROR.dt VRd, SRa, #IMM
where dt = {b, b9, h, w}.

[0773]

[Table 239]

[0774] Description
Each data element of vector/scalar register Ra is rotated right by the number of bits given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.

[0775] Exceptions
None.
Programming note
Note that the rotate amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a rotate amount that is less than or equal to the number of bits in the data size. If the rotate amount is larger than the specified data size, the result is undefined. Rotating right by n is equivalent to rotating left by ElemSize - n, where ElemSize is the number of bits in the given data size.

[0776] VROUND Round Floating Point to Integer

[0777]

[Table 240]

[0778] Assembler syntax
    VROUND.rm VRd, VRb
    VROUND.rm SRd, SRb
where rm = {ninf, zero, near, pinf}.

[0779]

[Table 241]

[0780] Description
The contents of vector/scalar register Rb, in floating-point data format, are rounded to the nearest 32-bit integer (word), and the result is stored in vector/scalar register Rd. The rounding mode is defined by RM.
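The four rounding-mode mnemonics can be illustrated with Python's standard rounding functions. The mapping below is an assumption inferred from the mnemonic names (toward minus infinity, toward zero, to nearest, toward plus infinity), since the table that defines RM is not reproduced in this text. Ties in "near" are resolved to even here because that is how Python's `round` behaves; the patent text does not specify tie handling. Saturation to the 32-bit range is not modelled.

```python
import math

# Hypothetical mapping from rm mnemonic to a rounding function.
ROUNDERS = {
    "ninf": math.floor,  # toward minus infinity
    "zero": math.trunc,  # toward zero
    "near": round,       # to nearest, ties to even (Python behaviour)
    "pinf": math.ceil,   # toward plus infinity
}


def vround(values, rm):
    """Round each floating-point element to an integer using mode rm."""
    return [ROUNDERS[rm](v) for v in values]
```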

【０７８１】[0781]

【表２４２】 [Table 242]

【０７８２】Exceptions: None.
Programming note: This instruction is not affected by the element mask.

【０７８３】VSATL Saturation to Lower Boundary

【０７８４】[0784]

【表２４３】 [Table 243]

【０７８５】Assembler syntax
VSATL.dt VRd, VRa, VRb
VSATL.dt VRd, VRa, SRb
VSATL.dt VRd, VRa, #IMM
VSATL.dt SRd, SRa, SRb
VSATL.dt SRd, SRa, #IMM
where dt = {b, b9, h, w, f}. Note that the .f data type is not supported with the 9-bit immediate.

【０７８６】[0786]

【表２４４】 [Table 244]

【０７８７】Description: Each data element of the vector/scalar register Ra is checked against its corresponding lower limit in the vector/scalar register Rb or in the IMM field. If the value of the data element is less than the lower limit, it is set equal to the lower limit. The final result is stored in the vector/scalar register Rd.

【０７８８】Exceptions: None.

VSATU Saturation to Upper Boundary

【０７８９】[0789]

【表２４５】 [Table 245]

【０７９０】Assembler syntax
VSATU.dt VRd, VRa, VRb
VSATU.dt VRd, VRa, SRb
VSATU.dt VRd, VRa, #IMM
VSATU.dt SRd, SRa, SRb
VSATU.dt SRd, SRa, #IMM
where dt = {b, b9, h, w, f}. Note that the .f data type is not supported with the 9-bit immediate.

【０７９１】[0791]

【表２４６】 [Table 246]

【０７９２】Description: Each data element of the vector/scalar register Ra is checked against its corresponding upper limit in the vector/scalar register Rb or in the IMM field. If the value of the data element is greater than the upper limit, it is set equal to the upper limit. The final result is stored in the vector/scalar register Rd.
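Saturation to a lower and to an upper boundary amounts to an element-wise maximum and minimum, respectively. A minimal sketch follows; the function names simply mirror the mnemonics, and representing a scalar register or immediate operand as a plain integer that is applied to every element is an assumption made for illustration.

```python
def _per_element(bound, length):
    """Expand a scalar bound to one bound per element; pass lists through."""
    return bound if isinstance(bound, list) else [bound] * length


def vsatl(ra, rb):
    """Raise every element of ra that is below its lower bound to that bound."""
    return [max(a, lo) for a, lo in zip(ra, _per_element(rb, len(ra)))]


def vsatu(ra, rb):
    """Lower every element of ra that is above its upper bound to that bound."""
    return [min(a, hi) for a, hi in zip(ra, _per_element(rb, len(ra)))]
```

Applying both in sequence clamps each element to a range, for example `vsatu(vsatl(x, -128), 127)` for signed 8-bit saturation.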

【０７９３】Exceptions: None.

VSHFL Shuffle

【０７９４】[0794]

【表２４７】 [Table 247]

【０７９５】Assembler syntax
VSHFL.dt VRc, VRd, VRa, VRb
VSHFL.dt VRc, VRd, VRa, SRb
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

【０７９６】[0796]

【表２４８】 [Table 248]

【０７９７】Description: The contents of the vector register Ra are shuffled with Rb as shown below, and the result is stored in the vector registers Rc:Rd.
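The exact element ordering of the shuffle is given by a table that is not reproduced here. Purely as an illustration, the sketch below assumes that "shuffle" means interleaving the elements of Ra and Rb into a double-length result whose upper half goes to Rc and whose lower half goes to Rd. Both the interleaving order and the half assignment are assumptions, not taken from the source.

```python
def shuffle(ra, rb):
    """Interleave ra and rb; return (high_half, low_half) of the result.

    Assumption: elements alternate ra[0], rb[0], ra[1], rb[1], ...
    """
    if len(ra) != len(rb):
        raise ValueError("both operands must have the same element count")
    interleaved = []
    for a, b in zip(ra, rb):
        interleaved.extend((a, b))
    half = len(interleaved) // 2
    return interleaved[half:], interleaved[:half]
```

Under these assumptions, a high shuffle would keep only the first returned list and a low shuffle only the second.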

【０７９８】[0798]

【表２４９】 [Table 249]

【０７９９】Operation
Exceptions: None.
Programming note: This instruction does not use the element mask.

【０８００】VSHFLH Shuffle High

【０８０１】[0801]

【表２５０】 [Table 250]

【０８０２】Assembler syntax
VSHFLH.dt VRd, VRa, VRb
VSHFLH.dt VRd, VRa, SRb
where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

【０８０３】[0803]

【表２５１】 [Table 251]

【０８０４】Description: The contents of the vector register Ra are shuffled with Rb as shown below, and the upper part of the result is stored in the vector register Rd.

【０８０５】[0805]

【表２５２】 [Table 252]

【０８０６】Operation
Exceptions: None.
Programming note: This instruction does not use the element mask.

【０８０７】VSHFLL Shuffle Low

【０８０８】[0808]

【表２５３】 [Table 253]

【０８０９】Assembler syntax
VSHFLL.dt VRd, VRa, VRb
VSHFLL.dt VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０８１０】[0810]

【表２５４】 [Table 254]

【０８１１】Description: The contents of the vector register Ra are shuffled with Rb as shown below, and the lower part of the result is stored in the vector register Rd.

【０８１２】[0812]

【表２５５】 [Table 255]

【０８１３】Operation
Exceptions: None.
Programming note: This instruction does not use the element mask.

【０８１４】VST Store

【０８１５】[0815]

【表２５６】 [Table 256]

【０８１６】Assembler syntax
VST.st Rs, SRb, SRi
VST.st Rs, SRb, #IMM
VST.st Rs, SRb+, SRi
VST.st Rs, SRb+, #IMM
where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTOFF for a cache-off store.

【０８１７】Description: Stores a vector or scalar register.

【０８１８】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
MEM[EA] = see table below;
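The effective-address computation above can be sketched as follows. The 8-bit sign extension follows sex(IMM<7:0>) in the operation. The function and parameter names, the 32-bit wrap of the address, and the reading that the "+" suffix on SRb corresponds to A == 1 are assumptions made for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(sr_b, offset, *, is_immediate, auto_update):
    """Return (EA, new SR_b) for a store with base SR_b and index or IMM."""
    if is_immediate:
        offset = sign_extend(offset, 8)  # sex(IMM<7:0>)
    ea = (sr_b + offset) & 0xFFFFFFFF    # assumed 32-bit address space
    new_sr_b = ea if auto_update else sr_b  # if (A == 1) SR_b = EA
    return ea, new_sr_b
```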

【０８１９】[0819]

【表２５７】 [Table 257]

【０８２０】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０８２１】VSTCB Store to Circular Buffer

【０８２２】[0822]

【表２５８】 [Table 258]

【０８２３】Assembler syntax
VSTCB.st Rs, SRb, SRi
VSTCB.st Rs, SRb, #IMM
VSTCB.st Rs, SRb+, SRi
VSTCB.st Rs, SRb+, #IMM
where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTCBOFF for a cache-off store.

【０８２４】Description: Stores a vector or scalar register to the circular buffer bounded by the BEGIN pointer in SRb+1 and the END pointer in SRb+2.

【０８２５】The effective address is adjusted if it is greater than the END address, before the store as well as before the address update operation. In addition, the circular buffer bounds must be aligned on halfword and word boundaries for .h and .w scalar stores, respectively.

【０８２６】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
BEGIN = SR_{b+1};
END = SR_{b+2};
cbsize = END - BEGIN;
if (EA > END) EA = BEGIN + (EA - END);
if (A == 1) SR_b = EA;
MEM[EA] = see table below;
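The wrap-around adjustment in the operation above can be sketched as follows. The function names are illustrative assumptions; the range check reflects the condition BEGIN < EA < 2*END - BEGIN that the programmer is expected to guarantee for this instruction.

```python
def precondition_holds(ea: int, begin: int, end: int) -> bool:
    """BEGIN < EA < 2*END - BEGIN, i.e. at most one wrap is ever needed."""
    return begin < ea < 2 * end - begin


def circular_store_address(ea: int, begin: int, end: int) -> int:
    """Wrap an effective address that has run past END back into the buffer."""
    if ea > end:
        ea = begin + (ea - end)
    return ea
```

When the precondition holds, an address past END lands below END after a single adjustment, so no loop is needed.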

【０８２７】[0827]

【表２５９】 [Table 259]

【０８２８】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask. The programmer must ensure that the following condition holds for this instruction to work as expected:
BEGIN < EA < 2*END - BEGIN
that is, EA > BEGIN and EA - END < END - BEGIN.

VSTD Double Store

【０８２９】[0829]

【表２６０】 [Table 260]

【０８３０】Assembler syntax
VSTD.st Rs, SRb, SRi
VSTD.st Rs, SRb, #IMM
VSTD.st Rs, SRb+, SRi
VSTD.st Rs, SRb+, #IMM
where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTDOFF for a cache-off store.

【０８３１】Description: Stores two vector registers from the current or alternate bank, or two scalar registers.

【０８３２】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
MEM[EA] = see table below;

【０８３３】[0833]

【表２６１】 [Table 261]

【０８３４】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０８３５】VSTQ Quad Store

【０８３６】[0836]

【表２６２】 [Table 262]

【０８３７】Assembler syntax
VSTQ.st Rs, SRb, SRi
VSTQ.st Rs, SRb, #IMM
VSTQ.st Rs, SRb+, SRi
VSTQ.st Rs, SRb+, #IMM
where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTQOFF for a cache-off store.

【０８３８】Description: Stores four vector registers from the current or alternate bank, or four scalar registers.

【０８３９】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
MEM[EA] = see table below;

【０８４０】[0840]

【表２６３】 [Table 263]

【０８４１】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０８４２】VSTR Store Reverse

【０８４３】[0843]

【表２６４】 [Table 264]

【０８４４】Assembler syntax
VSTR.st Rs, SRb, SRi
VSTR.st Rs, SRb, #IMM
VSTR.st Rs, SRb+, SRi
VSTR.st Rs, SRb+, #IMM
where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .64 and VRAs cannot be specified together. Use VSTROFF for a cache-off store.

【０８４５】Description: Stores a vector register in reverse element order. This instruction does not support a scalar data source register.

【０８４６】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
MEM[EA] = see table below;

【０８４７】[0847]

【表２６５】 [Table 265]

【０８４８】Exceptions: Invalid data address, unaligned access.
Programming note: This instruction is not affected by the element mask.

【０８４９】VSTWS Store with Stride

【０８５０】[0850]

【表２６６】 [Table 266]

【０８５１】Assembler syntax
VSTWS.st Rs, SRb, SRi
VSTWS.st Rs, SRb, #IMM
VSTWS.st Rs, SRb+, SRi
VSTWS.st Rs, SRb+, #IMM
where st = {4, 8, 16, 32} and Rs = {VRs, VRAs}. Note that the .64 mode is not supported; use VST instead. Use VSTWSOFF for a cache-off store.

【０８５２】Description: Starting at the effective address, 32 bytes are stored from the vector register VRs to memory, using the scalar register SRb+1 as the stride control register.

【０８５３】ST specifies the block size, that is, the number of consecutive bytes stored from each block. SRb+1 specifies the stride, that is, the number of bytes separating the starts of two consecutive blocks.

【０８５４】The stride must be greater than or equal to the block size. The EA must be aligned to the data size. The stride and the block size must be multiples of the data size.
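The strided store described above can be sketched as follows, under the assumptions that the vector register holds 32 bytes (VECSIZE = 32, matching the 32 bytes stored) and that memory is modeled as a Python bytearray. The function name and the explicit argument checks are illustrative, not part of the instruction definition.

```python
VECSIZE = 32  # bytes per vector register (assumed from "32 bytes are stored")


def strided_store(memory: bytearray, ea: int, vr_s, block_size: int, stride: int):
    """Write vr_s to memory as consecutive blocks whose starts are `stride` apart."""
    if block_size not in (4, 8, 16, 32):
        raise ValueError("block size must be 4, 8, 16 or 32")
    if stride < block_size:
        raise ValueError("stride must be at least the block size")
    for i in range(VECSIZE // block_size):   # one iteration per block
        for j in range(block_size):          # bytes within the block
            memory[ea + i * stride + j] = vr_s[i * block_size + j] & 0xFF
```

With a block size of 16 and a stride of 32, the two 16-byte halves of the register land at EA and EA + 32, leaving the 16 bytes in between untouched.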

【０８５５】Operation
EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
Block_size = {4 ‖ 8 ‖ 16 ‖ 32};
Stride = SR_{b+1}<31:0>;
for (i = 0; i < VECSIZE/Block_size; i++)
    for (j = 0; j < Block_size; j++)
        BYTE[EA + i*Stride + j] = VR_s[i*Block_size + j]<7:0>;
Exceptions: Invalid data address, unaligned access.

VSUB Subtract

【０８５６】[0856]

【表２６７】 [Table 267]

【０８５７】Assembler syntax
VSUB.st VRd, VRa, VRb
VSUB.st VRd, VRa, SRb
VSUB.st VRd, VRa, #IMM
VSUB.st SRd, SRa, SRb
VSUB.st SRd, SRa, #IMM
where st = {b, b9t, h, w, f}.

【０８５８】[0858]

【表２６８】 [Table 268]

【０８５９】Description: The contents of the vector/scalar register Rb are subtracted from the contents of the vector/scalar register Ra, and the result is stored in the vector/scalar register Rd.

【０８６０】Exceptions: Overflow, floating-point invalid operand.

VSUBS Subtract and Set

【０８６１】[0861]

【表２６９】 [Table 269]

【０８６２】Assembler syntax
VSUBS.dt SRd, SRa, SRb
VSUBS.dt SRd, SRa, #IMM
where dt = {b, b9, h, w, f}.

【０８６３】[0863]

【表２７０】 [Table 270]

【０８６４】Description: SRb is subtracted from SRa, the result is stored in SRd, and the VFLAG bits in VCSR are set.
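A minimal sketch of a subtract that also records comparison flags follows. The flag names lt, eq, and gt follow the VCSR<lt,eq,gt> field named in this instruction's operation. Deriving them from the sign of an unbounded integer result is an assumption; overflow and floating-point invalid operands are not modeled.

```python
def vsubs(sr_a: int, b_op: int):
    """Return (SRa - Bop, flags), where flags describes the sign of the result."""
    result = sr_a - b_op
    flags = {"lt": result < 0, "eq": result == 0, "gt": result > 0}
    return result, flags
```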

【０８６５】Operation
Bop = {SRb ‖ sex(IMM<8:0>)};
SRd = SRa - Bop;
VCSR<lt,eq,gt> = status(SRa - Bop);
Exceptions: Overflow, floating-point invalid operand.

VUNSHFL Unshuffle

【０８６６】[0866]

【表２７１】 [Table 271]

【０８６７】Assembler syntax
VUNSHFL.dt VRc, VRd, VRa, VRb
VUNSHFL.dt VRc, VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０８６８】[0868]

【表２７２】 [Table 272]

【０８６９】Description: The contents of the vector register VRa are unshuffled with Rb into the vector registers VRc:VRd as shown below.

【０８７０】[0870]

【表２７３】 [Table 273]

【０８７１】Operation
Exceptions: None.

【０８７２】Programming note: This instruction does not use the element mask.

【０８７３】VUNSHFLH Unshuffle High

【０８７４】[0874]

【表２７４】 [Table 274]

【０８７５】Assembler syntax
VUNSHFLH.dt VRd, VRa, VRb
VUNSHFLH.dt VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０８７６】[0876]

【表２７５】 [Table 275]

【０８７７】Description: The contents of the vector register Ra are unshuffled with Rb as shown below, and the upper part of the result is returned in the vector register Rd.

【０８７８】[0878]

【表２７６】 [Table 276]

【０８７９】Operation
Exceptions: None.
Programming note: This instruction does not use the element mask.

【０８８０】VUNSHFLL Unshuffle Low

【０８８１】[0881]

【表２７７】 [Table 277]

【０８８２】Assembler syntax
VUNSHFLL.dt VRd, VRa, VRb
VUNSHFLL.dt VRd, VRa, SRb
where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

【０８８３】[0883]

【表２７８】 [Table 278]

【０８８４】Description: The contents of the vector register Ra are unshuffled with Rb as shown below, and the lower part of the result is returned in the vector register Rd.

【０８８５】[0885]

【表２７９】 [Table 279]

【０８８６】Operation
Exceptions: None.
Programming note: This instruction does not use the element mask.

【０８８７】VWBACK Write Back

【０８８８】[0888]

【表２８０】 [Table 280]

【０８８９】Assembler syntax
VWBACK.ln SRb, SRi
VWBACK.ln SRb, #IMM
VWBACK.ln SRb+, SRi
VWBACK.ln SRb+, #IMM
where ln = {1, 2, 4, 8}.

【０８９０】Description: The cache line in the vector data cache whose index is specified by the EA (as opposed to a line whose tag matches the EA) is written back to memory if it contains modified data. If more than one cache line is specified, the subsequent sequential cache lines are also written back to memory if they contain modified data. The number of cache lines is specified as follows:
LN(1:0) = 00: one 64-byte cache line is written.

【０８９１】LN(1:0) = 01: two 64-byte cache lines are written.

【０８９２】LN(1:0) = 10: four 64-byte cache lines are written.

【０８９３】LN(1:0) = 11: eight 64-byte cache lines are written.

【０８９４】If the effective address is not on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary.
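The LN encoding and the 64-byte truncation can be sketched together. The sketch only computes which line-aligned addresses would be covered. The function name and the 32-bit address mask are assumptions, and the cache lookup by index and the modified-data check are deliberately left out.

```python
LINE_BYTES = 64  # cache line size stated in the description


def writeback_lines(ea: int, ln: int):
    """Return the start addresses of the 1, 2, 4 or 8 lines selected by LN(1:0)."""
    count = 1 << (ln & 0b11)                        # 00->1, 01->2, 10->4, 11->8
    base = ea & ~(LINE_BYTES - 1) & 0xFFFFFFFF      # truncate to a 64-byte boundary
    return [base + k * LINE_BYTES for k in range(count)]
```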

【０８９５】Operation
Exceptions: Invalid data address exception.
Programming note: EA(31:0) indicates a byte address in local memory.

【０８９６】VWBACKSP Write Back from Scratch Pad

【０８９７】[0897]

【表２８１】 [Table 281]

【０８９８】Assembler syntax
VWBACKSP.ln SRp, SRb, SRi
VWBACKSP.ln SRp, SRb, #IMM
VWBACKSP.ln SRp, SRb+, SRi
VWBACKSP.ln SRp, SRb+, #IMM
where ln = {1, 2, 4, 8}. Note that VWBACK and VWBACKSP use the same operation code.

【０８９９】Description: Transfers a number of 64-byte blocks from the scratch pad to memory. The effective address provides the starting address in memory, and SRp provides the starting address in the scratch pad. The number of 64-byte blocks is specified as follows:
LN(1:0) = 00: one 64-byte block is written.

【０９００】LN(1:0) = 01: two 64-byte blocks are written.

【０９０１】LN(1:0) = 10: four 64-byte blocks are written.

【０９０２】LN(1:0) = 11: eight 64-byte blocks are written.
If the effective address is not on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary. Likewise, if the scratch pad pointer address in SRp is not on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary. The aligned scratch pad pointer address is then incremented by the number of bytes transferred.

【０９０３】Exceptions: Invalid data address exception.

VXNOR XNOR (Exclusive NOR)

【０９０４】[0904]

【表２８２】 [Table 282]

【０９０５】Assembler syntax
VXNOR.dt VRd, VRa, VRb
VXNOR.dt VRd, VRa, SRb
VXNOR.dt VRd, VRa, #IMM
VXNOR.dt SRd, SRa, SRb
VXNOR.dt SRd, SRa, #IMM
where dt = {b, b9, h, w, f}.

【０９０６】[0906]

【表２８３】 [Table 283]

【０９０７】Description: The contents of the vector/scalar register Ra are logically XNORed with the contents of the vector/scalar register Rb, and the result is stored in the vector/scalar register Rd.
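The element-wise XNOR can be sketched as follows. The sketch reads the EMASK term in this instruction's operation as a per-element predicate, so that elements whose mask bit is clear keep the previous value of Rd. That reading, the explicit `rd` and `elem_bits` parameters, and the function name are assumptions made for illustration.

```python
def vxnor(rd, ra, rb, emask, elem_bits):
    """Return the new Rd: ~(Ra ^ Bop) per enabled element, old Rd otherwise."""
    width_mask = (1 << elem_bits) - 1
    bop = rb if isinstance(rb, list) else [rb] * len(ra)  # scalar applies to all
    out = list(rd)
    for i, (a, b) in enumerate(zip(ra, bop)):
        if emask[i]:
            out[i] = ~(a ^ b) & width_mask  # keep only the element's bits
    return out
```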

【０９０８】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
    Rd[i]<k> = ~(Ra[i]<k> ^ Bop[i]<k>), for k = all bits in element i;
}
Exceptions: None.

VXOR XOR (Exclusive OR)

【０９０９】[0909]

【表２８４】 [Table 284]

【０９１０】Assembler syntax
VXOR.dt VRd, VRa, VRb
VXOR.dt VRd, VRa, SRb
VXOR.dt VRd, VRa, #IMM
VXOR.dt SRd, SRa, SRb
VXOR.dt SRd, SRa, #IMM
where dt = {b, b9, h, w}.

【０９１１】[0911]

【表２８５】 [Table 285]

【０９１２】Description: The contents of the vector/scalar register Ra are logically XORed with the contents of the vector/scalar register Rb, and the result is stored in the vector/scalar register Rd.

【０９１３】Operation
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
    Rd[i]<k> = Ra[i]<k> ^ Bop[i]<k>, for k = all bits in element i;
}
Exceptions: None.

VXORALL XOR (Exclusive OR) of All Elements

【０９１４】[0914]

【表２８６】 [Table 286]

【０９１５】Assembler syntax
VXORALL.dt SRd, VRb
where dt = {b, b9, h, w, f}. Note that .b and .b9 specify the same operation.

【０９１６】[0916]

【表２８７】 [Table 287]

【０９１７】Description: The least significant bits of all elements in VRb are XORed together, and the 1-bit result is returned in the least significant bit of SRd. This instruction is not affected by the element mask.
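XORing the least significant bit of every element yields the parity of those bits. A minimal sketch, with the function name chosen only to mirror the mnemonic:

```python
from functools import reduce
from operator import xor


def vxorall(vr_b):
    """XOR together the least significant bit of each element; returns 0 or 1."""
    return reduce(xor, (element & 1 for element in vr_b), 0)
```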

【０９１８】Operation
Exceptions: None.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram of a multimedia signal processor according to an embodiment of the present invention.

【図２】FIG. 2 is a block diagram of the vector processor of the multimedia signal processor shown in FIG. 1.

【図３】FIG. 3 is a block diagram of the instruction fetch unit in the vector processor shown in FIG. 2.

【図４】FIG. 4 is a block diagram of the instruction fetch unit in the vector processor shown in FIG. 2.

【図５】FIG. 5 is a diagram showing the stages of the execution pipeline for a register-to-register instruction in the vector processor shown in FIG. 2.

【図６】FIG. 6 is a diagram showing the stages of the execution pipeline for a load instruction in the vector processor shown in FIG. 2.

【図７】FIG. 7 is a diagram showing the stages of the execution pipeline for a store instruction in the vector processor shown in FIG. 2.

【図８】FIG. 8 is a block diagram of the execution data path in the vector processor shown in FIG. 2.

【図９】FIG. 9 is a block diagram of the register file in the execution data path shown in FIG. 8.

【図１０】FIG. 10 is a block diagram of the parallel processing logic unit in the execution data path shown in FIG. 8.

【図１１】FIG. 11 is a block diagram of the load/store unit in the vector processor shown in FIG. 2.

【図１２】FIG. 12 is a format diagram of the instruction set of the vector processor according to an embodiment of the present invention.

[Explanation of symbols]

100 Multimedia processor
105 Processing core
110 Main processor
115 Extension register
120 Vector processor
130 Cache subsystem
140 System bus
142 System timer
144 Full-duplex UART
146 Bitstream processor
148 Interrupt controller
150 System bus
152 Device interface
154 DMA controller
156 Local bus controller
158 Memory controller
160, 190 SRAM
162, 192 Instruction cache
164, 194 Data cache
170 ROM
180 Cache control
210 Instruction fetch unit (IFU)
220 Decoder
230 Scheduler
240 Execution data path
250 Load/store unit (LSU)
610 Register file

Continued from the front page
(72) Inventor: Le Trong Nguyen, 15095 Daniel Place, Monte Sereno, California 95030, USA
(72) Inventor: Roney Sau Don Wong, 946 Larkspur Avenue, Sunnyvale, California 94086, USA

Claims

[Claims]

1. A processor comprising: a scalar register adapted to store a single scalar value; a vector register adapted to store a plurality of data elements; and a processing circuit coupled to the scalar register and the vector register, wherein, in response to a single instruction, the processing circuit performs a plurality of operations in parallel, each operation combining one of the data elements from the vector register with the scalar value from the scalar register.

2. A method of operating a processing circuit to execute an instruction, comprising: reading from a register data elements that form components of a vector value; and performing in parallel operations that combine a scalar value with the respective data elements to generate a vector result.

3. The method as claimed in claim 2, wherein the parallel performing step multiplies the scalar value by the respective data elements to generate a vector data result.

4. The method of claim 2, wherein the parallel performing step adds the scalar value to each of the data elements to generate a vector data result.

5. The method of claim 2, further comprising reading the scalar value from a second register for combination with the data elements, wherein the second register is adapted to store a single scalar value.

6. The method of claim 2, further comprising extracting the scalar value from the instruction to combine with the data element.

7. A method of operating a processor, comprising: providing scalar registers and vector registers in the processor, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements forming components of a vector; assigning to each scalar register a register number that is distinct from the register numbers assigned to the other scalar registers; assigning to each vector register a register number that is distinct from the register numbers assigned to the other vector registers, wherein at least some of the register numbers assigned to the vector registers are the same as register numbers assigned to the scalar registers; forming an instruction that includes a first operand that is a register number identifying a scalar register and a second operand that is a register number identifying a vector register; and executing the instruction by moving data between the scalar register identified by the first operand and a data element in the vector register identified by the second operand.

8. The method of claim 7, wherein the formed instruction further includes a third operand identifying a data element in the vector register, and executing the instruction comprises moving data between the scalar register identified by the first operand and the data element, identified by the third operand, in the vector register identified by the second operand.

9. The method of claim 7, wherein the formed instruction further includes a third operand identifying a second scalar register, and executing the instruction comprises moving data between the scalar register identified by the first operand and a data element, in the vector register identified by the second operand, that is identified by a value stored in the second scalar register.