JP3317985B2

JP3317985B2 - Pseudo vector processor

Info

Publication number: JP3317985B2
Application number: JP30447691A
Authority: JP
Inventors: 喜三郎中澤; 宏中村; 弘充位守; 英夫和田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-11-20
Filing date: 1991-11-20
Publication date: 2002-08-26
Anticipated expiration: 2017-08-26
Also published as: JPH07114534A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a data processing apparatus, and more particularly to a technique that enables a data processing apparatus to access more registers than the number of registers addressable by an instruction. In particular, it relates to a technique that, even in so-called vector processing, which continuously processes large-scale data for which a cache is not very effective, causes almost no performance degradation due to transfers from main memory, so that efficient pseudo vector processing can be realized on an ordinary data processing apparatus.

[0002]

2. Description of the Related Art

(Prior Art 1) A technique for enabling a data processing apparatus to access more registers than the number addressable by an instruction is described in Japanese Patent Laid-Open No. 57-166649. In that scheme, a group of registers called hardware registers, more numerous than the general-purpose registers addressable by a program, is provided. When several load instructions from different main memory addresses target the same general-purpose register, the loaded data is held in as many hardware registers as there are load instructions. (For example, if a program can address 16 general-purpose registers, 16 hardware registers are provided per general-purpose register, 256 in total, and hardware registers 0 to 15 are assigned to general-purpose register 0. When load instructions specifying 16 different main memory addresses are executed for general-purpose register 0, the data from those 16 load instructions is held in hardware registers 0 to 15.) A storage mechanism is also provided that records the main memory address of each previously executed load instruction together with the number of the hardware register holding the data loaded at that time. When the main memory address of a load instruction issued by the program matches an address registered in the storage mechanism, the data is read from the corresponding hardware register instead of from main memory. This scheme reduces the number of main memory references and prevents the performance degradation caused by register conflicts between instructions.
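The lookup described for Prior Art 1 can be sketched in a few lines of Python. This is only an illustrative model, not the mechanism of the cited publication: the class name, the dictionary-based tag store, the extension of the "registers 0 to 15 for general-purpose register 0" example to the other general-purpose registers, and the round-robin reuse of slots are all assumptions.

```python
class HardwareRegisterFile:
    """Toy model of the address-tagged hardware registers described for Prior Art 1."""

    def __init__(self, memory, regs_per_gpr=16):
        self.memory = memory              # dict: main memory address -> value
        self.regs_per_gpr = regs_per_gpr  # hardware registers per general-purpose register
        self.hw_regs = {}                 # hardware register number -> loaded value
        self.tags = {}                    # main memory address -> hardware register number
        self.next_slot = {}               # general-purpose register -> next slot to fill
        self.memory_reads = 0             # number of real main memory accesses

    def load(self, gpr, address):
        """Load `address` into general-purpose register `gpr` and return the value."""
        if address in self.tags:
            # Address matches a registered load: read the hardware register instead.
            return self.hw_regs[self.tags[address]]
        self.memory_reads += 1
        slot = self.next_slot.get(gpr, 0)
        hw = gpr * self.regs_per_gpr + slot
        # Assumption: slots are reused round-robin once all of them are full.
        self.next_slot[gpr] = (slot + 1) % self.regs_per_gpr
        for old in [a for a, h in self.tags.items() if h == hw]:
            del self.tags[old]            # drop the tag of any value being overwritten
        self.hw_regs[hw] = self.memory[address]
        self.tags[address] = hw
        return self.hw_regs[hw]
```

In this model a second load from an address that is still tagged does not increase `memory_reads`, which is the saving the scheme relies on.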

[0003] (Prior Art 2) Another technique for enabling a data processing apparatus to access more registers than the number addressable by an instruction is described in J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, Inc. (1990). In that scheme, registers called physical registers, more numerous than the registers addressable by a program, are provided and divided into several parts called windows; that is, each window consists of a number of physical registers. For example, suppose a program numbers its registers 1 to n and there are n*m physical registers numbered 1 to n*m. With m windows numbered 1 to m, window 1 can be assigned physical registers 1 to n, window 2 physical registers n+1 to 2n, and so on. In practice it is usual to provide physical registers common to all windows and physical registers shared by adjacent windows, but the simpler arrangement is used here for illustration. Each window holds the registers used by one program. That is, a reference to a register addressable by a program is in fact a reference to a physical register belonging to some window. In the example above, if window 2 is assigned to a program and the program specifies register k, the physical register referenced is physical register n+k.
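The simplified window layout in this example amounts to a single arithmetic rule. The following Python sketch restates it; the function name and the argument checks are illustrative additions, and the 1-based numbering follows the example in the text.

```python
def physical_register(window, k, n):
    """Physical register for program register k (1..n) in the given window (1..m).

    Window 1 covers physical registers 1..n, window 2 covers n+1..2n, and so on.
    """
    if window < 1:
        raise ValueError("window numbers start at 1")
    if not 1 <= k <= n:
        raise ValueError("register number must be in 1..n")
    return (window - 1) * n + k
```

For window 2 the function returns n + k, matching the case given in the text.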

[0004] The windows are used as follows. Suppose window j is assigned to a program. When that program calls another program, the called program is assigned window j+1. When a program that is assigned window j returns to the program that called it, the program returned to is assigned window j-1. This usage has the following effect. In a system that has only as many registers as a program can address, every program call requires the data held in the registers to be stored to main memory in order to preserve the state at the time of the call, and every program return requires that data to be written back from main memory to the registers so that the calling program can resume. In a system with the window mechanism, programs that are assigned different windows reference different physical registers, so these stores to main memory and write-backs to the registers become unnecessary, and processing is faster.

[0005] However, a system with this window mechanism requires the following control: when a program call is issued from the program with the largest window number, a window overflow interrupt is raised, and when a program return is issued from the program with the smallest window number, a window underflow interrupt is raised.
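The call and return behaviour of Prior Art 2, including the overflow and underflow conditions, can be modelled as a small state machine. The sketch below is a hedged illustration: the class and exception names are invented here, and Python exceptions merely stand in for the interrupts mentioned in the text.

```python
class WindowOverflow(Exception):
    """Stands in for the window overflow interrupt."""


class WindowUnderflow(Exception):
    """Stands in for the window underflow interrupt."""


class WindowPointer:
    """Tracks which window (1..num_windows) the running program uses."""

    def __init__(self, num_windows):
        self.num_windows = num_windows
        self.current = 1

    def call(self):
        # A call from the largest window number cannot move to window j+1.
        if self.current == self.num_windows:
            raise WindowOverflow("call issued from the largest window")
        self.current += 1

    def ret(self):
        # A return from the smallest window number cannot move to window j-1.
        if self.current == 1:
            raise WindowUnderflow("return issued from the smallest window")
        self.current -= 1
```

A program that never calls or returns leaves `current` unchanged, so it neither benefits from the extra windows nor triggers either interrupt.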

[0006]

[Problems to be Solved by the Invention] Most scientific and engineering computations are vector operations such as the following.

[0007]

    C(i) = A(i) + B(i),   i = 1, ..., N        (1)

Here A, B, and C are vectors with N elements.

[0008] When expression (1) is executed on a general-purpose computer, it becomes a program such as that in Table 1.

[0009] In the following description, floating-point registers, which hold floating-point numbers, are taken as the example registers, and the data width of a floating-point register is assumed to be 8 bytes.

[0010] The function of each instruction in Table 1 is described below.

[0011] FLDM a(GRm), FRn
(Function) Reads 8 bytes of data from the main memory address given by the value of general-purpose register m and stores it in floating-point register n.

[0012] Then a is added to the value of general-purpose register m.

[0013] FADD FRj, FRm, FRn
(Function) Adds the value of floating-point register m and the value of floating-point register n and stores the result in floating-point register j.

[0014] FSTM a(GRm), FRn
(Function) Stores the value (8 bytes) of floating-point register n at the main memory address given by the value of general-purpose register m.

[0015] Then a is added to the value of general-purpose register m.

[0016] BCNT GRm, t
(Function) Decrements the value of GRm by 1. If the result is not zero, branches to address t. If it is zero, does not branch.

[0017]

[Table 1]

[0018] Before the program in Table 1 is executed, vector A is assumed to be stored in a contiguous area starting at main memory address ad1; that is, A(1) is at main memory address ad1, A(2) at ad1+8, and so on. Likewise, vector B is assumed to be stored in a contiguous area starting at main memory address ad2, and vector C is to be stored in a contiguous area starting at main memory address ad3. General-purpose register 1 is assumed to hold ad1, general-purpose register 2 to hold ad2, general-purpose register 3 to hold ad3, and general-purpose register 4 to hold N.

[0019] As Table 1 shows, the FLDM instructions No. 1 and No. 2 load A(i) and B(i) into floating-point registers 12 and 13, respectively; the values of these two registers are added and the sum is placed in floating-point register 20; and the contents of that register are stored to C(i).

[0020] That is, one execution of the five-instruction loop yields the result for one element, and executing the loop N times computes all elements.
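Because the body of Table 1 is not reproduced in this text, the following Python sketch reconstructs the five-instruction loop only from the instruction definitions and the register usage described above. The operand order in the comments, the increment a = 8, and the dictionary-based memory are assumptions made for illustration.

```python
def run_scalar_loop(memory, ad1, ad2, ad3, n):
    """Simulate the one-element-per-iteration loop; returns instructions executed.

    `memory` maps byte addresses to 8-byte floating-point values; n must be >= 1.
    """
    gr = {1: ad1, 2: ad2, 3: ad3, 4: n}   # initial general-purpose register values
    fr = {}                               # floating-point registers
    executed = 0
    while True:
        fr[12] = memory[gr[1]]            # No. 1: FLDM 8(GR1), FR12  (load A(i))
        gr[1] += 8
        fr[13] = memory[gr[2]]            # No. 2: FLDM 8(GR2), FR13  (load B(i))
        gr[2] += 8
        fr[20] = fr[12] + fr[13]          # No. 3: FADD FR20, FR12, FR13
        memory[gr[3]] = fr[20]            # No. 4: FSTM 8(GR3), FR20  (store C(i))
        gr[3] += 8
        gr[4] -= 1                        # No. 5: BCNT GR4, loop
        executed += 5
        if gr[4] == 0:
            break
    return executed
```

Each iteration issues five instructions and produces one element, so N elements cost 5N instructions in this model.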

[0021] The issue here is the execution time of one loop iteration. First, the FLDM instructions No. 1 and No. 2 load data from main memory into floating-point registers 12 and 13. If the data is in the cache, an FLDM instruction completes in a small number of cycles; if it is not, the data must be read from main memory, which is considerably slower than the cache, and the load takes much longer than it does on a cache hit. Next, the FADD instruction No. 3 uses the values of floating-point registers 12 and 13, but those values are not determined until the two preceding FLDM instructions finish, that is, until the data reads complete, so the FADD instruction cannot start until then. Furthermore, the FSTM instruction No. 4 uses the value of floating-point register 20, which is not determined until the preceding FADD instruction finishes, so the FSTM instruction cannot start until then. Thus two performance-degrading factors lengthen the loop execution time: (1) data read time and (2) register conflicts. Factor (1) is especially serious in computations on very long data, because the required data often does not fit in the cache and the performance loss is large.

[0022] One way to address this problem is loop unrolling, shown in Table 2. In this method one loop iteration processes several elements (say n), so that the number of iterations is reduced to 1/n of that needed when each iteration processes one element. Table 2 processes four elements per iteration.

[0023]

[Table 2]

[0024] Before the program in Table 2 is executed, vector A is assumed to be stored in a contiguous area starting at main memory address ad1; that is, A(1) is at main memory address ad1, A(2) at ad1+8, and so on. Likewise, vector B is assumed to be stored in a contiguous area starting at main memory address ad2, and vector C is to be stored in a contiguous area starting at main memory address ad3. General-purpose register 1 is assumed to hold ad1, general-purpose register 2 to hold ad2, general-purpose register 3 to hold ad3, and general-purpose register 4 to hold N/4.

[0025] As Table 2 shows, one execution of the 17-instruction loop yields the results for four elements, and executing the loop N/4 times computes all elements.

[0026] As Table 2 shows, for the i-th element the loads are performed by the FLDM instructions No. 1 and No. 2, the addition by the FADD instruction No. 9, and the store by the FSTM instruction No. 13. Likewise, for the (i+1)-th element the loads are performed by the FLDM instructions No. 3 and No. 4, the addition by the FADD instruction No. 10, and the store by the FSTM instruction No. 14. For the (i+2)-th element the loads are performed by the FLDM instructions No. 5 and No. 6, the addition by the FADD instruction No. 11, and the store by the FSTM instruction No. 15. For the (i+3)-th element the loads are performed by the FLDM instructions No. 7 and No. 8, the addition by the FADD instruction No. 12, and the store by the FSTM instruction No. 16. Compared with Table 1, therefore, the load, add, and store for any one element are farther apart in the instruction sequence, which reduces the effect of the two performance-degrading factors, (1) data read time and (2) register conflicts. For example, A(i) and B(i) are loaded by the FLDM instructions No. 1 and No. 2 and the loaded values are first used seven instructions later, so if the data read time is at most seven cycles, the FADD instruction No. 9 that uses them does not have to wait. Similarly, the sum A(i)+B(i) produced by the FADD instruction No. 9 is used four instructions later, so if the addition takes at most four cycles, the FSTM instruction No. 13 does not have to wait.
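The instruction numbering described for the unrolled loop can be checked mechanically. The Python sketch below rebuilds the positions of the loads, add, and store for each element from the numbers quoted in the text, and estimates the wait a consumer would see. The function names are illustrative, and the stall formula assumes one instruction is issued per cycle, which the text implies but does not state outright.

```python
UNROLL = 4  # elements processed per loop iteration in the Table 2 description


def schedule(unroll=UNROLL):
    """Instruction numbers (1-based) of the loads, add and store for each element."""
    slots = []
    for k in range(unroll):
        slots.append({
            "loads": (2 * k + 1, 2 * k + 2),   # FLDM for A(i+k) and B(i+k)
            "add": 2 * unroll + 1 + k,         # FADD for element i+k
            "store": 3 * unroll + 1 + k,       # FSTM for element i+k
        })
    return slots


def stall_cycles(latency, distance):
    """Cycles a consumer waits when its producer is `distance` instructions earlier.

    Assumes one instruction issues per cycle.
    """
    return max(0, latency - distance)
```

For element i this gives a load-to-add distance of 7 and an add-to-store distance of 4, the figures used in the text. The load-to-add distance shrinks for later elements in the same iteration (4 for element i+3), so the seven-cycle margin applies to the first element only.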

[0027] Loop unrolling thus improves performance, but its drawback is that it requires many registers. The program in Table 1 needs three floating-point registers, whereas the program in Table 2 needs twelve. If reading data or performing the operation takes even longer, still more elements must be processed per iteration, and still more registers are needed.

[0028] In general, registers are built from active elements (that is, not memory elements) and can be given many read/write ports (data entry and exit points), so they are far faster than a so-called storage device, which can read or write only one item of data per operating cycle. For high speed it is therefore essential to have sufficient register capacity, in comparison not only with main memory but also with the cache. Nevertheless, the number of registers has traditionally been relatively small, because the cost per bit was high and because, as described below, the length of the register number field in the instruction format is limited. The cost problem is being solved by LSI technology, but the latter problem has remained unsolved.

[0029] The number of registers addressable by a program is limited by the architecture. For example, if the register specification field in the instruction word is 5 bits, the number of addressable registers is 32 (2 to the 5th power). Widening the register specification field would increase the number of registers a program can address, but it would change the instruction format and require existing programs to be modified, which is impractical.

[0030] What is needed, therefore, is a scheme that lets a data processing apparatus access more registers than are addressable by an instruction without changing the architecture of the apparatus. Prior Art 1 does speed up execution when a new load instruction is issued to a main memory address from which a load has already been executed. In vector computations such as expression (1), however, a load request is in most cases issued only once for each item of data in main memory, as in the programs of Tables 1 and 2, so that prior art gives no speedup.

[0031] In Prior Art 2, one program can use only the physical registers belonging to a single window, whose number equals the number of registers addressable by the program, so the operations performed within one program cannot be accelerated. That is, the window mechanism speeds up processing only when program calls and returns occur; it gives no speedup when processing completes within a single program, as in the vector computation of expression (1). In addition, the window overflow and window underflow interrupts are unnecessary in such a case, where processing completes within one program and no program calls or returns occur.

[0032] An object of the present invention is to provide a scheme that, without changing the architecture of a data processing apparatus, enables the apparatus to access more registers than are addressable by an instruction and executes the vector computations of scientific and engineering calculations at high speed.

[0033]

[Means for Solving the Problems] To achieve the above object, floating-point registers called physical floating-point registers, more numerous than the floating-point registers addressable by an instruction, are provided, and these registers are referenced by physical floating-point register number. The physical floating-point registers as a whole are also partitioned into subgroups called windows, each consisting of several registers, so that a register can also be referenced by the combination of a window number and a register number within the window. A floating-point register number in a program instruction is called a logical floating-point register number. To determine the correspondence between the two kinds of number, the following are provided. A current floating-point window pointer register holds the current floating-point window pointer, which indicates the pattern by which a logical floating-point register number is converted to a physical floating-point register number. A current floating-point window pointer valid register holds a value indicating that the current floating-point window pointer is valid. A conversion circuit converts a logical floating-point register number to a physical floating-point register number using the current floating-point window pointer. A current floating-point window pointer change instruction changes the current floating-point window pointer. A floating-point register preload instruction stores main memory data into the physical floating-point register obtained by converting the logical floating-point register number with a value that is determined from the current floating-point window pointer but differs from it, used in place of the current floating-point window pointer. A floating-point register post-store instruction stores data to main memory from the physical floating-point register obtained by converting the logical floating-point register number in the same way, again with a value that is determined from the current floating-point window pointer but differs from it.

[0034]

[Operation] For every instruction that references a floating-point register, if the value of the current floating-point window pointer valid register is 1, logical-to-physical floating-point register number conversion is performed and the floating-point register access uses the resulting physical floating-point register number. If the value of the current floating-point window pointer valid register is 0, the logical floating-point register number is equal to the physical floating-point register number.

[0035] The logical-to-physical floating-point register number conversion is performed as follows. Several alternatives are provided for the range of physical floating-point registers that the logical floating-point registers designate. Each such range designation is a window as described above.

[0036] FIG. 1 shows one example of how the windows may be arranged. In this example there are 32 logical floating-point registers, with logical floating-point register numbers 0 to 31, and 88 physical floating-point registers, with physical floating-point register numbers 0 to 87. Four windows, w0 to w3, are provided and are selected by current floating-point window pointer values 0 to 3, respectively.

[0037] Here the current floating-point window pointer is written w and the logical floating-point register number is written r. Since the physical floating-point register number is determined by w and r, it is written <w, r>.

[0038] Instructions increment or decrement w, and this is done modulo 4. For example, when w = 3, the value of w + 1 is 0.

[0039] In the example of FIG. 1, the logical-to-physical floating-point register number conversion is performed as follows.

[0040]

1. When 0 ≤ r ≤ 7, regardless of w:

    <w, r> = r                              (2)

2. When 8 ≤ r ≤ 31:

    <0, r> = r                              (3)
    <1, r> = r + 20                         (4)
    <2, r> = r + 40                         (5)
    <3, r> = r + 60    (8 ≤ r ≤ 27)
           = r - 20    (28 ≤ r ≤ 31)        (6)

The following two points are characteristic of this conversion.

[0041] 1. Physical floating-point registers 0 to 7 are shared by all windows. These registers serve as global registers, holding data common to the computation loops that use the respective windows.

[0042] 2. Logical floating-point registers 28 to 31 of each window designate the same physical floating-point registers as logical floating-point registers 8 to 11 of the window whose current floating-point window pointer is larger by one. These registers serve as overlap registers, used to pass data between computation loops that use adjacent windows.
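The conversion rules (2) to (6) and the two properties just listed can be written out and checked directly. The Python sketch below follows the piecewise definition for the FIG. 1 example; the function names and the range checks are illustrative additions, not part of the described circuit.

```python
NUM_WINDOWS = 4          # windows w0..w3
NUM_LOGICAL = 32         # logical floating-point registers 0..31


def to_physical(w, r):
    """Physical register <w, r> for window pointer w and logical register r."""
    if not 0 <= w < NUM_WINDOWS:
        raise ValueError("window pointer must be 0..3")
    if not 0 <= r < NUM_LOGICAL:
        raise ValueError("logical register number must be 0..31")
    if r <= 7:
        return r                 # (2): global registers, same in every window
    if w == 3 and r >= 28:
        return r - 20            # (6), second case: wraps to physical 8..11
    return r + 20 * w            # (3), (4), (5) and the first case of (6)


def next_window(w):
    """Window pointer increment, performed modulo 4."""
    return (w + 1) % NUM_WINDOWS


def overlap_holds():
    """True if logical 28..31 of each window alias logical 8..11 of the next one."""
    return all(
        to_physical(w, 28 + i) == to_physical(next_window(w), 8 + i)
        for w in range(NUM_WINDOWS)
        for i in range(4)
    )
```

Enumerating every (w, r) pair with this function touches exactly the 88 physical registers 0 to 87. The wrap in rule (6) is what makes window 3 overlap window 0.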

[0043] The mnemonics and functions of the new instructions are defined, as one example, as follows.

[0044] 1. Current floating-point window pointer change instruction
(Mnemonic) CFRWPS GRm
(Function) Sets the value of general-purpose register m in the current floating-point window pointer register.

2. Floating-point register preload instruction
(Mnemonic) FLDPRM a(GRm), FRn
(Function) Reads 8 bytes of data from the main memory address given by the value of general-purpose register m and stores it in floating-point register n. For this access, the logical-to-physical floating-point register number conversion is performed using (current floating-point register window pointer + 1) as the current floating-point register window pointer.

[0045] Then a is added to the value of general-purpose register m.

[0046] 3. Floating-point register post-store instruction
(Mnemonic) FSTPOM a(GRm), FRn
(Function) Stores the value (8 bytes) of floating-point register n at the main memory address given by the value of general-purpose register m. For this access, the logical-to-physical floating-point register number conversion is performed using (current floating-point register window pointer - 1) as the current floating-point register window pointer.

[0047] Then a is added to the value of general-purpose register m.

[0048] For ordinary floating-point instructions (that is, floating-point instructions other than 2 and 3 above), the logical-to-physical floating-point register number conversion is performed using the current floating-point register window pointer itself.
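The only difference between the preload, post-store, and ordinary floating-point instructions, as far as register selection is concerned, is which window value feeds the conversion. The Python sketch below captures that choice; the function name is illustrative, and applying the modulo-4 wrap to the +1 and -1 offsets is an assumption based on the statement that window pointer arithmetic is performed modulo 4.

```python
def effective_window(mnemonic, w, num_windows=4):
    """Window value used for register number conversion by the given instruction."""
    if mnemonic == "FLDPRM":
        return (w + 1) % num_windows   # preload targets the next window
    if mnemonic == "FSTPOM":
        return (w - 1) % num_windows   # post-store reads the previous window
    return w                           # ordinary instructions use w itself
```

With w = 0, a preload therefore addresses window 1, while an ordinary load in the same instruction stream addresses window 0.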

[0049] Using the new functions above, expression (1) becomes a program such as that in Table 3.

[0050] Before the program in Table 3 is executed, vector A is assumed to be stored in a contiguous area starting at main memory address ad1; that is, A(1) is at main memory address ad1, A(2) at ad1+8, and so on. Likewise, vector B is assumed to be stored in a contiguous area starting at main memory address ad2, and vector C is to be stored in a contiguous area starting at main memory address ad3. General-purpose register 4 is assumed to hold N-2, general-purpose register 5 to hold 1, general-purpose register 6 to hold 0, and the current floating-point register window pointer register to hold 0.

【００５１】Table 3 contains an ADD instruction that has not been described above, so its function is given below.

【００５２】ADD GRj, GRm. (Function) Adds the value of general-purpose register j and the value of general-purpose register m, and stores the result in general-purpose register j.

【００５３】[0053]

【表３】 [Table 3]

【００５４】Table 3 is explained below. For the FLDM instruction No. 1, the logical-to-physical floating-point register number conversion uses the current floating-point register window pointer, so A(1) is stored in physical floating-point register <0, 12>. Likewise, the FLDM instruction No. 2 stores B(1) in physical floating-point register <0, 13>. For the FLDPRM instruction No. 3, the conversion uses the value "current floating-point register window pointer + 1", so A(2) is stored in physical floating-point register <1, 12>. Likewise, the FLDPRM instruction No. 4 stores B(2) in physical floating-point register <1, 13>. For the FADD instruction No. 5, the conversion uses the current floating-point register window pointer, so the values of physical floating-point registers <0, 12> and <0, 13> are added and the sum is stored in physical floating-point register <0, 20>. Because the FLDM instructions No. 1 and No. 2 placed A(1) and B(1) in physical floating-point registers <0, 12> and <0, 13>, respectively, A(1) + B(1) is stored in physical floating-point register <0, 20>. The ADD instruction No. 5 and the CFRWPS instruction No. 6 increment the current floating-point register window pointer by 1, making it 1. The instructions from the FLDPRM instruction No. 8 through the BCNT instruction No. 14 form a loop that is executed N-2 times. In what follows, w denotes the value of the current floating-point register window pointer inside the loop. Consider the (i-1)-th execution of the loop. As the values of general-purpose registers 1 and 2 show, the data added by the FADD instruction No. 10 are A(i) and B(i), and the data loaded into physical floating-point registers <w+1, 12> and <w+1, 13> by the FLDPRM instructions No. 8 and No. 9 are A(i+1) and B(i+1), respectively. As the value of general-purpose register 3 shows, the FSTPOM instruction No. 11 stores the value of physical floating-point register <w-1, 20> at the main memory location of C(i-1). The ADD instruction No. 12 and the CFRWPS instruction No. 13 increment the current floating-point register window pointer w by 1, and control returns to the top of the loop. In other words, a single loop iteration does three things: it stores A(i+1) and B(i+1), the data to be added in the next iteration, in physical floating-point registers <w+1, 12> and <w+1, 13>, respectively; it adds A(i) and B(i), which the previous iteration stored in physical floating-point registers <w, 12> and <w, 13>, respectively, and stores the sum in physical floating-point register <w, 20>; and it stores A(i-1) + B(i-1), which the previous iteration left in physical floating-point register <w-1, 20>, at the main memory location of C(i-1).
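The three-stage overlap described in this paragraph can be simulated in a few lines. The sketch below is not the program of Table 3, which is not reproduced here; it only follows the order of operations given in the text. The function name `pseudo_vector_add`, the `num_windows` parameter, the wrap-around of the window number, and the dictionary used as a register file are all assumptions made for illustration, and Python list indices stand in for the main memory addresses held in general-purpose registers.

```python
def pseudo_vector_add(a, b, num_windows=4):
    """Compute c[i] = a[i] + b[i] with load, add and store in separate windows."""
    n = len(a)
    assert n == len(b) and n >= 2 and num_windows >= 3
    regs = {}   # (window, logical register) -> value; hypothetical register file
    w = 0       # current floating-point register window pointer
    c = [None] * n

    def phys(offset, r):
        # Assumed mapping <w + offset, r>, wrapping over the available windows.
        return ((w + offset) % num_windows, r)

    # Prologue: FLDM x2, FLDPRM x2, FADD, then advance the window pointer.
    regs[phys(0, 12)], regs[phys(0, 13)] = a[0], b[0]
    regs[phys(1, 12)], regs[phys(1, 13)] = a[1], b[1]
    regs[phys(0, 20)] = regs[phys(0, 12)] + regs[phys(0, 13)]
    w += 1

    load_idx, store_idx = 2, 0
    for _ in range(n - 2):
        # FLDPRM: preload the operands of the next iteration into window w+1.
        regs[phys(1, 12)], regs[phys(1, 13)] = a[load_idx], b[load_idx]
        load_idx += 1
        # FADD: add the operands preloaded by the previous iteration (window w).
        regs[phys(0, 20)] = regs[phys(0, 12)] + regs[phys(0, 13)]
        # FSTPOM: store the sum produced by the previous iteration (window w-1).
        c[store_idx] = regs[phys(-1, 20)]
        store_idx += 1
        w += 1  # ADD + CFRWPS

    # Epilogue: the last add, then the two stores still outstanding.
    regs[phys(0, 20)] = regs[phys(0, 12)] + regs[phys(0, 13)]
    c[store_idx] = regs[phys(-1, 20)]
    w += 1
    c[store_idx + 1] = regs[phys(-1, 20)]
    return c
```

Only logical registers 12, 13 and 20 appear in the code, yet three windows are live at any moment, which is why the sketch requires at least three windows.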

【００５５】Instructions No. 15 to No. 19, executed after the loop exits, process the elements not yet handled. The FADD instruction No. 15 computes A(N) + B(N), and the FSTPOM instructions No. 16 and No. 19 store A(N-1) + B(N-1) and A(N) + B(N), respectively, in main memory.

【００５６】As the processing inside the loop shows, the FLDPRM instructions No. 8 and No. 9 specify logical floating-point registers 12 and 13, and the FADD instruction No. 10 immediately after them also uses logical floating-point registers 12 and 13, but the physical floating-point registers accessed are different. Likewise, the FADD instruction No. 10 stores its result in logical floating-point register 20 and the FSTPOM instruction immediately after it uses logical floating-point register 20, but the physical floating-point registers accessed are different. Consequently, the stalls that occurred in the program of Table 1, where subsequent instructions had to wait for a data read or an operation to finish, do not occur. In other words, a data read or an operation need only complete before the next loop iteration executes, so the program runs fast. Moreover, the program specifies only three logical floating-point registers, so there is no need to use many floating-point registers as in the program of Table 2.

【００５７】The program of Table 3 does include updating of the current floating-point register window pointer, which the programs of Tables 1 and 2 do not, and this is an overhead. For example, the loop in the program of Table 1 consists of 5 instructions, whereas the loop in the program of Table 3 consists of 7 instructions. However, the overhead in the program of Table 1, where subsequent instructions wait for a data read or an operation to finish, is far larger. In addition, the loop unrolling technique used in the program of Table 2 cannot be applied once the registers that a program can specify are used up. The scheme of the present invention is therefore considered superior even with the overhead of updating the current floating-point register window pointer.

【００５８】As described above, with the scheme of the present invention, in vector computations of scientific and engineering calculation, which consist mainly of repeated loops of an instruction sequence, the current floating-point register window pointer, and hence the window in use, is changed on every loop iteration. The i-th element is processed by loading the i-th elements of the operand vectors with floating-point register preload instructions in loop iteration i-1, performing the operation in loop iteration i, and storing the result in the i-th element of the result vector with a floating-point register post-store instruction in loop iteration i+1. This increases the distance in the instruction sequence between the load, the operation, and the store for a given data item, which prevents the performance loss caused by data read time and operation execution time and so speeds up execution.

【００５９】[0059]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows the data processing apparatus of this embodiment. The data processing apparatus consists of an instruction processing unit 10 that issues and executes instructions, a main memory 30 that stores the instructions and data used by the instruction processing unit, and a storage control unit 20 that controls the exchange of data between the instruction processing unit and the main memory.

【００６０】The instruction processing unit 10 consists of: an instruction register 101 that holds the instruction to be executed; an instruction control unit 102 that decodes the contents of instruction register 101 and controls instruction execution; a general-purpose register group 103 that holds the data needed for general-purpose operations and address calculation; a general-purpose arithmetic unit 104 that executes the general-purpose operation specified by an instruction; a physical floating-point register group 105 that holds the data needed for floating-point operations; a floating-point arithmetic unit 106 that executes the floating-point operation specified by an instruction; an address adder 107 that calculates the main memory address used to access main memory data; a cache 108 that holds main memory data read from storage control unit 20; a main memory access control unit 109 that performs control such as reading main memory data from storage control unit 20 according to the result of searching cache 108; a current floating-point window pointer register 110 that stores the current floating-point register window pointer; a current floating-point window pointer valid register 111 that indicates whether the current floating-point register window pointer is valid; and conversion logic 112 that converts the logical floating-point register number specified in an instruction to a physical floating-point register number according to expressions (2) to (6).

【００６１】As shown in FIG. 3, four instructions are newly added to this data processing apparatus: (a) the current floating-point register window pointer change instruction, (b) the floating-point register preload instruction, (c) the extended floating-point register preload instruction, and (d) the floating-point register post-store instruction. The current floating-point register window pointer change instruction changes the current floating-point register window pointer. The mnemonics and functions of instructions (a), (b), and (d) were given in the "Operation" section. In FIG. 3(a), the instruction code indicates that the instruction is a current floating-point register window pointer change instruction. The general-purpose register number specifies the general-purpose register that holds the value to be set as the current floating-point register window pointer. The floating-point register preload instruction stores main memory data in a floating-point register belonging to the window (current floating-point register window pointer + 1). In FIG. 3(b), the instruction code indicates that the instruction is a floating-point register preload instruction. The floating-point register number is the logical floating-point register number (call it r) in which the main memory data is stored, and the corresponding physical floating-point register number is <w+1, r>, where w is the current floating-point register window pointer. The value of the general-purpose register is the main memory address from which the data is read. The increment value is the value added to the general-purpose register after the read is performed. The extended floating-point register preload instruction stores main memory data in a floating-point register belonging to the window (current floating-point register window pointer + 2). In FIG. 3(c), the instruction code indicates that the instruction is an extended floating-point register preload instruction. The floating-point register number is the logical floating-point register number (call it r) in which the main memory data is stored, and the corresponding physical floating-point register number is <w+2, r>, where w is the current floating-point register window pointer. The value of the general-purpose register is the main memory address from which the data is read. The increment value is the value added to the general-purpose register after the read is performed. The function of the extended floating-point register preload instruction was not given in the "Operation" section, but it is simply an extension of the floating-point register preload instruction and is self-evident from the description above. The floating-point register post-store instruction stores data in main memory from a floating-point register belonging to the window (current floating-point register window pointer - 1). In FIG. 3(d), the instruction code indicates that the instruction is a floating-point register post-store instruction. The floating-point register number is the logical floating-point register number (call it r) from which the data is read, and the corresponding physical floating-point register number is <w-1, r>, where w is the current floating-point register window pointer. The value of the general-purpose register is the main memory address at which the data is stored. The increment value is the value added to the general-purpose register after the data has been read out. The operation of these instructions is described below with reference to FIG. 2. First, the current floating-point register window pointer change instruction is described. When the instruction has been fetched into instruction register 101, it is decoded by instruction control unit 102. Once it is identified as a current floating-point register window pointer change instruction, the general-purpose register specified in the instruction is read from general-purpose register group 103, and the value held in that register is set in current floating-point window pointer register 110.

【００６２】Next, the floating-point register preload instruction is described. When the instruction has been fetched into instruction register 101, it is decoded by instruction control unit 102. Once it is identified as a floating-point register preload instruction, address adder 107 takes the contents of the general-purpose register indicated by the general-purpose register number specified in the instruction as the main memory address from which the data is read. Main memory access control unit 109 searches cache 108 using this main memory address. If the desired data is in the cache, the data is transferred from the cache; if not, the data is transferred from main memory 30 via storage control unit 20. The transferred data is stored in floating-point register group 105, and the physical floating-point register number of the destination register is obtained by conversion circuit 112 as follows. The floating-point register number specified in the instruction is a logical floating-point register number (call it r); with w denoting the value of current floating-point register window pointer register 110, <w+1, r> is the physical floating-point register number. After the data transfer has started, general-purpose arithmetic unit 104 adds the increment value to the value of the general-purpose register.

【００６３】Next, the extended floating-point register preload instruction is described. When the instruction has been fetched into instruction register 101, it is decoded by instruction control unit 102. Once it is identified as an extended floating-point register preload instruction, address adder 107 takes the contents of the general-purpose register indicated by the general-purpose register number specified in the instruction as the main memory address from which the data is read. Main memory access control unit 109 searches cache 108 using this main memory address. If the desired data is in the cache, the data is transferred from the cache; if not, the data is transferred from main memory 30 via storage control unit 20. The transferred data is stored in floating-point register group 105, and the physical floating-point register number of the destination register is obtained by conversion circuit 112 as follows. The floating-point register number specified in the instruction is a logical floating-point register number (call it r); with w denoting the value of current floating-point register window pointer register 110, <w+2, r> is the physical floating-point register number. After the data transfer has started, general-purpose arithmetic unit 104 adds the increment value to the value of the general-purpose register.

【００６４】Next, the floating-point register post-store instruction is described. When the instruction has been fetched into instruction register 101, it is decoded by instruction control unit 102. Once it is identified as a floating-point register post-store instruction, address adder 107 takes the contents of the general-purpose register indicated by the general-purpose register number specified in the instruction as the main memory address at which the data is to be stored. The data is read from floating-point register group 105, and the physical floating-point register number of the source register is obtained by conversion circuit 112 as follows. The floating-point register number specified in the instruction is a logical floating-point register number (call it r); with w denoting the value of current floating-point register window pointer register 110, <w-1, r> is the physical floating-point register number. Main memory access control unit 109 searches cache 108 using the main memory address. If the cache holds a copy of the data stored at that address of main memory 30, the copy is replaced with the data read from the register; if not, the cache is left untouched. In addition, main memory access control unit 109 stores the data read from the register at that address of main memory 30 via storage control unit 20. After the data transfer has started, general-purpose arithmetic unit 104 adds the increment value to the value of the general-purpose register.
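The cache handling on a post-store can be illustrated with a small sketch. The policy described, updating a cached copy only if one exists and always writing main memory, corresponds to what is commonly called write-through without write allocation; the patent itself does not use those terms. The function name `post_store` and the use of dictionaries for the cache and main memory are assumptions for illustration only.

```python
def post_store(cache, main_memory, address, value):
    """Store `value` at `address`, touching the cache only on a hit."""
    if address in cache:
        cache[address] = value    # keep the existing cached copy consistent
    # On a miss the cache is left as it is: no new line is allocated.
    main_memory[address] = value  # main memory is written in every case
```

Because a miss allocates nothing, streaming a long result vector to memory does not evict data that is still useful in the cache.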

【００６５】For ordinary floating-point instructions (operation instructions, load instructions, and store instructions), the logical floating-point register number r given in the instruction is converted by conversion logic 112 to the physical floating-point register number <w, r>, where w is the value of current floating-point register window pointer register 110, and the floating-point register indicated by that physical floating-point register number is accessed.

【００６６】When the value of current floating-point window pointer valid register 111 is "1", the current floating-point register window pointer is valid; that is, the logical-to-physical register number conversion in conversion circuit 112 is performed. When the value is "0", the logical-to-physical register number conversion is not performed: the logical floating-point register number specified in the instruction is used unchanged as the physical floating-point register number, and the physical floating-point register indicated by that number is accessed. For current floating-point window pointer valid register 111, it suffices to allocate an unused bit of an existing register that holds control information of the data processing system, and its value is set using the existing instruction that stores a value in that register.
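The effect of the valid bit can be sketched as a guard in front of the conversion. The actual conversion follows expressions (2) to (6), which are not reproduced here, so the window-major numbering below is only a stand-in. `NUM_WINDOWS`, `REGS_PER_WINDOW`, and the function name `translate` are hypothetical values and names chosen for illustration.

```python
NUM_WINDOWS = 8        # hypothetical number of windows
REGS_PER_WINDOW = 32   # hypothetical number of registers per window


def translate(r, w, valid, offset=0):
    """Map logical register r to a physical register number."""
    if not valid:
        return r  # no conversion: the logical number is used as the physical number
    # Stand-in for expressions (2)-(6): window-major numbering with wrap-around.
    window = (w + offset) % NUM_WINDOWS
    return window * REGS_PER_WINDOW + r
```

With the bit cleared, existing programs see the registers exactly as before, which is consistent with the statement that the architecture need not change.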

【００６７】The current floating-point register window pointer change instruction, the floating-point register preload instruction, the extended floating-point register preload instruction, the floating-point register post-store instruction, and ordinary floating-point instructions under control of the current floating-point window pointer operate as described above.

【００６８】As stated in the "Operation" section, the above embodiment makes it possible to realize a program such as that in Table 3 and thereby speeds up vector computation.

【００６９】Thus, according to the scheme of the present invention, changing the current floating-point window pointer causes a given floating-point register number in an instruction to be converted to a different physical floating-point register number. More physical registers than the number of registers addressable by an instruction can therefore be accessed without changing the architecture of the data processing apparatus, and a program such as that in Table 3 can be realized. This prevents the performance loss that arises when instruction execution has to wait because of data reads or register conflicts, and allows programs to be executed at high speed.

【００７０】In particular, as the program of Table 3 shows, in vector computations of scientific and engineering calculation, which consist mainly of repeated loops of an instruction sequence, the window in use is changed on every loop iteration. The i-th element is processed by loading the i-th elements of the operand vectors with floating-point preload instructions in loop iteration i-1, performing the operation in loop iteration i, and storing the result in the i-th element of the result vector with a floating-point post-store instruction in loop iteration i+1. This increases the distance in the instruction sequence between the load, the operation, and the store for a given data item, which prevents the performance loss caused by data read time and operation execution time and so speeds up execution.

【００７１】The embodiment introduces an extended floating-point preload instruction that does not appear in Table 3. The reason is that, in a system where reading data from main memory takes a long time, an attempt to operate on loaded data in the loop iteration immediately after the one that issued the preload instruction, as in the program of Table 3, may find that the load has not yet completed, so that the operation instruction has to wait. The extended floating-point preload instruction was therefore devised so that a program can operate on loaded data in the second loop iteration after the one that issued the preload instruction.

【００７２】[0072]

[Effects of the Invention] According to the present invention, changing the current floating-point window pointer causes a given floating-point register number in an instruction to be converted to a different physical floating-point register number, so more physical registers than the number of registers addressable by an instruction can be accessed without changing the architecture of the data processing apparatus. This prevents the performance loss that arises when instruction execution has to wait because of data reads or register conflicts, and allows programs to be executed at high speed.

【００７３】In particular, in vector computations of scientific and engineering calculation, which consist mainly of repeated loops of an instruction sequence, the window in use is changed on every loop iteration. The i-th element is processed by loading the i-th elements of the operand vectors with floating-point preload instructions in loop iteration i-1, performing the operation in loop iteration i, and storing the result in the i-th element of the result vector with a floating-point post-store instruction in loop iteration i+1. This increases the distance in the instruction sequence between the load, the operation, and the store for a given data item, which prevents the performance loss caused by data read time and operation execution time and so speeds up execution.

[Brief description of the drawings]

FIG. 1 shows one embodiment of the conversion from logical floating-point register numbers to physical floating-point register numbers according to the present invention.

FIG. 2 is a configuration diagram showing one embodiment of a data processing apparatus according to the present invention that executes the instructions shown in FIG. 3.

FIG. 3 is a diagram showing one embodiment of the current floating-point register window pointer change instruction, the floating-point register preload instruction, the extended floating-point register preload instruction, and the floating-point register post-store instruction according to the present invention.

[Explanation of symbols]

10 is an instruction processing unit, 20 is a storage control unit, 30 is a main memory, 101 is an instruction register, 102 is an instruction control unit, 103 is a general-purpose register group, 104 is a general-purpose arithmetic unit, 105 is a physical floating-point register group, 106 is a floating-point arithmetic unit, 107 is an address adder, 108 is a cache, 109 is a main memory access control unit, 110 is a current floating-point window pointer register, 111 is a current floating-point window pointer valid register, and 112 is conversion logic.

Continuation of front page

(72) Inventor: Kisaburo Nakazawa, 6-29-10 Sagamidai, Sagamihara-shi, Kanagawa
(72) Inventor: Hiroshi Nakamura, 4-1043 Namiki, Tsukuba-shi, Ibaraki
(72) Inventor: Hiromitsu Imori, 2-23-6 Amakubo, Tsukuba-shi, Ibaraki
(72) Inventor: Hideo Wada, c/o Kanagawa Works, Hitachi, Ltd., 1 Horiyamashita, Hadano-shi, Kanagawa

(56) References:
JP-A-62-98434 (JP, A)
JP-A-1-161477 (JP, A)
JP-A-62-17873 (JP, A)
JP-A-61-267134 (JP, A)
JP-A-61-241870 (JP, A)
JP-A-61-136131 (JP, A)
JP-A-60-129838 (JP, A)
JP-A-5-233279 (JP, A)
Daniel Tabak, translated by Kenji Omori, RISC System, Kaibundo Publishing Co., Ltd., Japan, November 1, 1991, first edition, pp. 110-116
D. R. Miller, D. J. Quammen, Exploiting large register sets, Microprocessors and Microsystems, Butterworth-Heinemann Ltd., UK, 1990, Vol. 14, No. 6, pp. 333-340

(58) Fields searched (Int. Cl.7, DB name): G06F 17/16; G06F 9/30 - 9/355; G06F 9/38; G06F 9/40 - 9/42; G06F 9/46 - 9/54

Claims

(57) [Claims]

1. A data processing apparatus comprising a main memory that holds instructions and data, and an instruction processing unit that executes instructions using main memory data held in the main memory, the instructions including a load instruction that reads main memory data from the main memory and stores it in a register numbered in the instruction, a store instruction that stores data from a register numbered in the instruction into the main memory, and an operation instruction that performs an operation and stores the operation result in a register numbered in the instruction, the data processing apparatus comprising: registers called physical registers, greater in number than the registers addressable by an instruction; a register called a window pointer register, consisting of a plurality of bits; a 1-bit register called a window pointer valid register; and a conversion circuit that, when the value of the window pointer valid register is 1, converts a register number in an instruction into a physical register number and changes the conversion pattern according to the value of the window pointer register,
wherein the instructions executed by the instruction processing unit include: a window pointer change instruction that changes the value of the window pointer register; a register preload instruction that converts the register number in the instruction into a physical register number by the conversion circuit, regarding as the value of the window pointer register a value that is determined from, but differs from, the value of the window pointer register, and stores main memory data in the physical register indicated by that physical register number; and a register post-store instruction that converts the register number in the instruction into a physical register number by the conversion circuit, regarding as the value of the window pointer register a value that is determined from, but differs from, the value of the window pointer register, and stores data read from the physical register indicated by that physical register number into the main memory,
and wherein the instruction processing unit: when executing the load instruction, the store instruction, or the operation instruction, excluding the register preload instruction and the register post-store instruction, converts the register number in the instruction into a physical register number by the conversion circuit using the value of the window pointer register and refers to the physical register indicated by that physical register number; when executing the register preload instruction, converts the register number in the instruction into a physical register number by the conversion circuit, regarding as the value of the window pointer register a value that is determined from, but differs from, the value of the window pointer register, and stores main memory data in the physical register indicated by that physical register number; and when executing the register post-store instruction, converts the register number in the instruction into a physical register number by the conversion circuit, regarding as the value of the window pointer register a value that is determined from, but differs from, the value of the window pointer register, and stores data read from the physical register indicated by that physical register number into the main memory.

2. The data processing apparatus according to claim 1, wherein said instruction processing unit, when executing said register preload instruction, uses a value obtained by adding an arbitrary integer to the value of said window pointer register.

3. The data processing apparatus according to claim 1, wherein said instruction processing unit, when executing said register preload instruction, does not change the contents of a cache, which is a buffer memory device that temporarily stores a part of the contents of said main memory, if the main memory data to be read is not registered in the cache; and, when executing said register post-store instruction, does not change the contents of the cache if the main memory data at the corresponding main memory address is not registered in the cache when the data is written to the main memory.
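The cache behavior in claim 3 can be sketched in Python, reading the claim to mean that neither a preload nor a post-store allocates a cache entry on a miss. The `Memory` class, its dictionary-based cache, the allocating `load` shown for contrast, and the write-through update on a post-store hit are all assumptions made for this sketch; the claim itself only addresses the miss case.

```python
class Memory:
    """Toy main memory with a fully associative, unbounded cache (illustration only)."""

    def __init__(self, data):
        self.main = dict(data)  # address -> value
        self.cache = {}         # address -> cached copy

    def load(self, addr):
        """Ordinary load, shown for contrast: a miss allocates a cache entry."""
        if addr not in self.cache:
            self.cache[addr] = self.main[addr]
        return self.cache[addr]

    def preload(self, addr):
        """Preload: on a miss, read main memory and leave the cache untouched."""
        if addr in self.cache:
            return self.cache[addr]
        return self.main[addr]

    def poststore(self, addr, value):
        """Post-store: write main memory; on a miss, leave the cache untouched."""
        self.main[addr] = value
        if addr in self.cache:
            # Assumed hit behavior: keep the cached copy consistent.
            self.cache[addr] = value
```

Under this reading, streaming through a long vector with `preload` and `poststore` does not evict data that ordinary loads have placed in the cache.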

4. The data processing apparatus according to claim 1, wherein, in converting said register number in said instruction into said physical register number: the set of physical register numbers into which register numbers in an instruction can be converted under one value of said window pointer register and the set of physical register numbers into which register numbers in an instruction can be converted under a different value of said window pointer register include one or more identical register numbers, called overlap register numbers; one or more of the register numbers in the instruction are converted into the same physical register number, called a global register number, irrespective of the value of said window pointer register; and each physical register number other than the overlap register numbers and the global register numbers results from conversion under only one value of said window pointer register.
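One conversion pattern with the three kinds of register described in claim 4 (global, overlap, and window-private) can be sketched in Python as follows. Every constant here is an assumption: the claim fixes no register counts, no window stride, and no wrap-around rule, and treating a valid bit of 0 as an identity mapping is likewise a guess at the unconverted case.

```python
NUM_LOGICAL = 8    # assumed number of registers addressable by an instruction
NUM_GLOBAL = 2     # assumed: logical registers 0 and 1 are global
STRIDE = 4         # assumed distance between the starts of adjacent windows
NUM_WINDOWS = 4    # assumed number of window pointer values
WINDOWED_PHYS = STRIDE * NUM_WINDOWS  # physical registers shared among the windows


def to_physical(reg, wp, valid=1):
    """Convert a logical register number to a physical one (hypothetical pattern)."""
    if not 0 <= reg < NUM_LOGICAL:
        raise ValueError("register number out of range")
    if valid == 0 or reg < NUM_GLOBAL:
        # No conversion when the valid bit is 0 (assumed); globals ignore the pointer.
        return reg
    offset = (wp * STRIDE + (reg - NUM_GLOBAL)) % WINDOWED_PHYS
    return NUM_GLOBAL + offset


def window_set(wp):
    """Physical register numbers reachable through window wp, excluding globals."""
    return {to_physical(r, wp) for r in range(NUM_GLOBAL, NUM_LOGICAL)}
```

Because each window spans six windowed registers but windows start only four apart, adjacent windows share two physical registers. For example, `to_physical(6, 0)` and `to_physical(2, 1)` name the same physical register, so a value written through one window can be read through the next.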

5. The data processing apparatus according to claim 1, wherein the registers numbered in the instruction and the physical registers are dedicated registers for storing floating-point numbers, called floating-point registers.

6. The data processing apparatus according to claim 1, wherein said instruction processing unit, when executing said window pointer change instruction, sets a value in said window pointer register or increases or decreases the value of said window pointer register.
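The two forms of window pointer update named in claim 6 can be modeled with a small Python class. The class name, the default 2-bit width, and the wrap-around on overflow or underflow are assumptions for illustration; the claim states only that the register is set, increased, or decreased.

```python
class WindowPointer:
    """Toy model of a multi-bit window pointer register (illustration only)."""

    def __init__(self, bits=2):
        self.mask = (1 << bits) - 1  # assumed register width
        self.value = 0

    def set(self, value):
        """Set form: load a value, truncated to the register width."""
        self.value = value & self.mask

    def add(self, delta):
        """Increase or decrease form: add a signed amount, wrapping around (assumed)."""
        self.value = (self.value + delta) & self.mask
```

A loop body would typically call `add(1)` once per iteration so that each iteration works in a fresh window.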