JP2001331475A

JP2001331475A - Vector instruction processor and vector instruction processing method

Info

Publication number: JP2001331475A
Application number: JP2000151813A
Authority: JP
Inventors: Tadaaki Miyata; 忠明宮田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2000-05-23
Filing date: 2000-05-23
Publication date: 2001-11-30

Abstract

PROBLEM TO BE SOLVED: To solve two problems. First, the speed of vector instructions is held down because cache flush processing is slower than the throughput of the vector store processing itself. Second, because an ordinary cache registers several bytes of data in one cache line, data that is not a flush target is flushed as well. SOLUTION: The processor comprises main storage means for storing data; instruction determining means for determining whether an instruction to be executed is a vector instruction; address generation means for generating, when the instruction determining means determines that the instruction is a vector instruction, an address in the main storage means for the processing data of the vector instruction; a register for storing the processing data; and instruction execution means for transferring data between the register and the address generated by the address generation means without performing a cache flush.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a vector instruction processing device and a vector instruction processing method.

[0002]

[Description of the Related Art] In conventional vector instruction processing devices and methods, a vector instruction transfers a large amount of data at once. For this reason it accesses the main storage unit directly, without going through the cache. In practice, however, most computations mix scalar and vector operations, and scalar operations execute using cached data. Scalar and vector operations may handle the same data. To keep the cache and the main storage unit consistent in that case, all store data is written to the cache and to the main storage unit at the same time.

[0003]

[Problems to be Solved by the Invention] Maintaining this consistency causes two problems in the conventional devices and methods described above.

The first problem is that cache flush processing is slower than the throughput of the vector store processing itself. The flush therefore becomes a bottleneck that holds down the speed of vector instructions.

The second problem is that a cache normally registers several bytes of data in each cache line, so data that is not a flush target is flushed as well. Flushing is performed in units of cache lines. Even if only one byte in a line needs to be flushed, all the other data in that line is flushed with it.
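The line-granularity problem can be illustrated with a small Python sketch. It assumes a 64-byte cache line purely for illustration, since the description gives no line size. The sketch reports which byte addresses would be invalidated when a flush targets given addresses.

```python
# Assumption: 64-byte cache lines (illustrative only; not stated in the source).
LINE_SIZE = 64


def line_of(addr: int) -> int:
    """Index of the cache line that contains byte address addr."""
    return addr // LINE_SIZE


def bytes_invalidated(flush_addrs: list[int]) -> set[int]:
    """Byte addresses dropped from the cache when flush_addrs are flushed.

    Flushing works on whole lines, so every byte sharing a line with a
    flush target is lost, whether or not it was ever written.
    """
    lost: set[int] = set()
    for line in {line_of(a) for a in flush_addrs}:
        start = line * LINE_SIZE
        lost.update(range(start, start + LINE_SIZE))
    return lost


if __name__ == "__main__":
    lost = bytes_invalidated([130])  # only one byte is really a flush target
    print(len(lost), min(lost), max(lost))  # 64 128 191
```

In this model, a single-byte flush target at address 130 costs the whole line from 128 to 191.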

[0004] As techniques for solving the first problem, Japanese Patent Application Laid-Open Nos. Hei 5-20190 and Hei 9-251424 disclose methods of speeding up flush processing in hardware. These methods, however, leave the second problem unsolved.

[0005] The present invention has been made in view of the above problems. It provides a vector instruction processing device and a vector instruction processing method that solve both the first and the second problem.

[0006]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 comprises:

- main storage means for storing data;
- instruction determining means for determining whether an instruction to be executed is a vector instruction;
- address generation means for generating, when the instruction determining means determines that the instruction is a vector instruction, an address in the main storage means for the processing data of that vector instruction;
- a register for storing the processing data; and
- instruction execution means for transferring data between the register and the address generated by the address generation means without performing a cache flush.

[0007] Processing for the various instructions is carried out by exchanging data between registers and the data held in the cache or the main storage unit. Within this scheme, the present invention is configured not to execute a cache flush when processing a vector instruction. To this end, data is stored in the main storage means, and the instruction determining means determines whether the instruction to be executed is a vector instruction.

[0008] The processing data is stored in a register. When the instruction determining means determines that the instruction is a vector instruction, the address generation means generates an address in the main storage means for the processing data of that instruction. The instruction execution means then transfers data between the register and the address generated by the address generation means without performing a cache flush. As a result, the throughput of vector instruction execution can be improved.

[0009] According to the invention of claim 2, in the vector instruction processing device of claim 1, the instruction determining means determines whether an instruction is a vector instruction from a code indicating a vector instruction. The present invention newly provides this code in order to omit the cache flush processing performed in the conventional example described above. The device can be configured to perform the vector processing of the present invention according to this code. The processing of the present invention can then easily be started within the processing of a series of programs.

[0010] More specifically, the invention newly provides the following at the instruction set level:

- a vector store and a vector scatter that do not perform a cache flush; and
- a scalar load instruction and a scalar store instruction that do not register data in the data cache.

These instructions remove the need to consider cache consistency, which improves throughput and speeds up processing. Because they are added as new instructions, the conventional processing method involving a cache flush can still be executed where necessary.

[0011] The processing of the present invention is selected by such codes, and various configurations are conceivable for generating the codes at the high-level language level. As one example, the invention according to claim 3 is the vector instruction processing device of claim 1 or claim 2. In this device, a high-level language can declare that a variable handled by the instruction to be executed is a variable for vector processing. With this configuration, the programmer only needs to specify a variable type to have the vector instruction processing of the present invention performed. The programmer can therefore very easily have processing executed without a cache flush.

[0012] According to the invention of claim 4, in the vector instruction processing device of claim 2 or claim 3, the code indicating a vector instruction is obtained by compilation. This applies when the instruction to be executed is a vector instruction that handles a variable declared in this way. An instruction set that performs this vector processing can thus be generated, while the programmer is given an environment in which processing without a cache flush is easy to choose.

[0013] According to the invention of claim 5, in the vector instruction processing device of any one of claims 1 to 4, the instruction execution means avoids performing a cache flush. It does so by not transferring the address generated by the address generation means for cache flushing.

[0014] In ordinary scalar instruction processing and the like, the generated address is transferred for cache flushing. When a search for the transferred address hits, the cache is flushed by clearing the valid bit of the hit entry. In the present invention, this transfer of the address for cache flushing is not performed, so the cache flush itself does not occur. Vector processing without a cache flush can therefore be executed simply by adding the configuration of the present invention to existing hardware.
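The following is a toy Python model of this mechanism. It assumes one valid bit per cache line and a 64-byte line, and the class and function names are illustrative only. On the conventional path, each store address is forwarded to a flush-address search that clears the valid bit on a hit. On the claim 5 path, the address is simply never forwarded.

```python
LINE_SIZE = 64  # assumed line size, for illustration only


class DataCache:
    """Toy data cache that tracks only a valid bit per line."""

    def __init__(self) -> None:
        self.valid: dict[int, bool] = {}

    def fill(self, addr: int) -> None:
        self.valid[addr // LINE_SIZE] = True

    def is_valid(self, addr: int) -> bool:
        return self.valid.get(addr // LINE_SIZE, False)

    def flush_search(self, addr: int) -> bool:
        """Flush-address search: on a hit, clear the line's valid bit."""
        line = addr // LINE_SIZE
        if self.valid.get(line, False):
            self.valid[line] = False
            return True
        return False


def vector_store(cache: DataCache, memory: dict[int, float],
                 addr: int, value: float, forward_flush_address: bool) -> None:
    memory[addr] = value  # vector stores always write main storage directly
    if forward_flush_address:
        cache.flush_search(addr)  # conventional VST-style behaviour
    # Otherwise (VSTN/VSCN-style) the address is never forwarded: no flush.


if __name__ == "__main__":
    mem: dict[int, float] = {}
    c = DataCache()
    c.fill(0)
    vector_store(c, mem, 8, 1.0, forward_flush_address=False)
    print(c.is_valid(0))  # True: line left untouched
    vector_store(c, mem, 8, 2.0, forward_flush_address=True)
    print(c.is_valid(0))  # False: line invalidated
```

In this model, the only difference between the two store paths is the single `if` that forwards the address. This mirrors the statement that the change can be added on top of existing hardware.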

[0015] The invention according to claim 6 is a vector instruction processing method that uses vector instructions to transfer data between a main storage unit and a register. The method comprises the following steps:

1. Determine whether an instruction to be executed is a vector instruction.
2. If it is, secure an address for the processing data in the main storage unit.
3. Transfer data between the register and that address without performing a cache flush.

In other words, the invention is not limited to a physical device and is equally effective as a method.

[0016]

[Embodiments of the Invention] An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a schematic block diagram of a vector instruction processing device according to an embodiment of the present invention. It shows two units:

- a scalar unit 120, which mainly handles the processing of scalar instructions; and
- a vector unit 130, which mainly handles the processing of vector instructions.

The main storage unit 110 stores instructions, the processing data of the instructions to be executed, and the like.

[0017] The scalar unit 120 includes:

- an instruction cache 100;
- an instruction fetch unit 101;
- an instruction decode unit 102;
- a scalar register 103;
- a data cache unit 104; and
- a flush address search unit 105.

The instruction cache 100 is a buffer for caching instructions. The instruction fetch unit 101 reads the instruction to be executed from the instruction cache 100 or the main storage unit 110. The instruction decode unit 102 decodes the instruction read by the instruction fetch unit 101. Whether the decoded instruction is a vector instruction is determined from its code; if it is, the instruction is sent to the vector unit 130.

[0018] If, on the other hand, the decoded instruction is a scalar load instruction or a scalar store instruction, it is sent to the data cache unit 104. The data cache unit 104 is a cache serving the scalar register. The flush address search unit 105 determines whether an input cache flush address hits or misses in the cache. On a hit, data is transferred from the data cache unit 104 to the scalar register 103. On a miss, data is transferred from the main storage unit 110 to the scalar register 103. The scalar register 103 is used to execute scalar operations (operations other than vector operations) and to exchange data when scalar instructions are executed.
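The decode-time routing in [0017] and the scalar-load path in [0018] can be sketched as follows. This is a simplification, and several names and behaviours are assumptions:

- VLD, VFAD, VST, VSTN and VSCN are mnemonics from the description.
- LD and ST are hypothetical names for the conventional scalar load and store.
- Registering the data on a miss is assumed; the text only implies it by contrast with LDSN, whose data is not registered.

```python
# Mnemonics from the description; "LD"/"ST" are hypothetical placeholders
# for the conventional scalar load/store, whose names are not given.
VECTOR_OPS = {"VLD", "VFAD", "VST", "VSTN", "VSCN"}
SCALAR_MEMORY_OPS = {"LD", "ST"}


def route(opcode: str) -> str:
    """Return where a decoded instruction is sent."""
    if opcode in VECTOR_OPS:
        return "vector unit 130"
    if opcode in SCALAR_MEMORY_OPS:
        return "data cache unit 104"
    return "scalar execution"  # e.g. ADD on the scalar register 103


def scalar_load(addr: int, cache: dict[int, int],
                memory: dict[int, int]) -> tuple[int, str]:
    """Conventional scalar load: serve from the cache on a hit, else memory."""
    if addr in cache:
        return cache[addr], "data cache unit 104"
    cache[addr] = memory[addr]  # assumption: a normal load registers the data
    return memory[addr], "main storage unit 110"
```

In this sketch, the first load of an address is served from main storage and a repeat load hits the cache.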

[0019] The vector unit 130 includes:

- an address generation unit 106;
- a vector load/store control unit 107;
- a vector register 108; and
- a vector arithmetic unit 109.

The address generation unit 106 generates the address of the processing data in the main storage unit 110, according to the instruction output from the instruction decode unit 102. The vector register 108 is used to execute vector operations, and each register can hold multiple elements. The vector load/store control unit 107 controls the execution of vector loads and vector stores between the vector register 108 and the main storage unit 110. These transfers use the main storage addresses generated by the address generation unit 106.

[0020] Thus, in this embodiment, the claimed means correspond to the following units:

- the main storage unit 110 constitutes the main storage means;
- the instruction decode unit 102 constitutes the instruction determining means;
- the address generation unit 106 constitutes the address generation means;
- the vector register 108 constitutes the register; and
- the vector load/store control unit 107 constitutes the instruction execution means.

[0021] Next, processing in this configuration will be described using a concrete FORTRAN program as an example. FIG. 2 shows an example of a FORTRAN program for executing vector processing according to the present invention. In the figure, lines 200 and 201 declare variable types. Line 200 is a DIMENSION statement that declares the arrays B(I), C(I), and IA(I) as ordinary arrays. To perform vector processing according to the present invention in this embodiment, it is sufficient to declare a variable as a vector type. The statement for this is the VDIMENSION statement on line 201.

[0022] The VDIMENSION statement 201 is an array declaration statement newly introduced by the present invention. Accesses to an array declared by this statement are compiled into different instructions depending on whether the accessing instruction is a scalar instruction or a vector instruction:

- For a scalar load or store instruction, the compiler outputs a scalar load instruction (LDSN) or a scalar store instruction (STSN) that does not register data in the data cache.
- For a store instruction from a vector register, the compiler outputs a vector store instruction (VSTN) that does not perform cache flush processing.
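The compiler's choice of instruction for an array access can be summarized as a lookup, sketched below in Python. The no-flush mnemonics (LDSN, STSN, VSTN, VSCN) come from the description. LD, ST and VSC are hypothetical names used here only for the conventional instructions.

```python
# Instructions for arrays declared with VDIMENSION (names from the description).
NO_FLUSH = {
    "scalar_load": "LDSN",     # load that does not register data in the cache
    "scalar_store": "STSN",    # store that does not register data in the cache
    "vector_store": "VSTN",    # vector store without cache flush
    "vector_scatter": "VSCN",  # vector scatter without cache flush
}
# Conventional instructions; LD, ST and VSC are hypothetical placeholder names.
CONVENTIONAL = {
    "scalar_load": "LD",
    "scalar_store": "ST",
    "vector_store": "VST",
    "vector_scatter": "VSC",
}


def select_opcode(access: str, declared_vdimension: bool) -> str:
    """Pick the memory instruction the compiler emits for one array access."""
    table = NO_FLUSH if declared_vdimension else CONVENTIONAL
    return table[access]
```

Keeping the conventional table alongside the no-flush one reflects the point that the old flush-based instructions remain available.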

[0023] Vector store and vector load instructions transfer a large amount of data at once. They therefore access the main storage unit 110 directly, without accessing the data cache unit 104.

An ordinary scalar store instruction (one that is not a vector instruction) raises no consistency concern when the data cache unit 104 is configured as a store-through cache. A vector store instruction, however, rewrites the contents of the main storage unit 110 directly. The contents of the data cache unit 104 and of the main storage unit 110 may then become inconsistent, so cache consistency must be taken into account.
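The inconsistency can be reproduced with a word-addressed toy model; word addressing is an assumption, since real caches work on lines. Scalar accesses go through a store-through cache, while a vector store writes only main storage. A scalar load that hits afterwards returns the old value.

```python
class Machine:
    """Toy model: store-through data cache plus main storage, word addressed."""

    def __init__(self, memory: dict[int, float]) -> None:
        self.memory = dict(memory)
        self.cache: dict[int, float] = {}

    def scalar_load(self, addr: int) -> float:
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]  # normal load registers the data
        return self.cache[addr]

    def scalar_store(self, addr: int, value: float) -> None:
        # Store-through: update the cached copy (if any) and main storage together.
        if addr in self.cache:
            self.cache[addr] = value
        self.memory[addr] = value

    def vector_store(self, addr: int, value: float) -> None:
        self.memory[addr] = value  # bypasses the data cache completely


if __name__ == "__main__":
    m = Machine({0x100: 1.0})
    m.scalar_load(0x100)         # 1.0 is now cached
    m.vector_store(0x100, 2.0)   # main storage now holds 2.0
    print(m.scalar_load(0x100))  # 1.0: stale value from the cache
```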

[0024] To prevent inconsistency in the cached data, the conventional approach works as follows. When a vector store instruction (VST) or a vector scatter instruction is executed, the data cache unit 104 is searched for each address to be stored to in the main storage unit 110. If the address is present in the data cache unit 104, the cache line for that address is invalidated.

The VSTN and VSCN instructions newly introduced by the present invention are a vector store instruction and a vector scatter instruction that omit this cache invalidation. The LDSN and STSN instructions are a scalar load instruction and a scalar store instruction that do not go through the cache.
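The throughput difference described here comes from the per-element search. The Python sketch below is a toy model, not the actual hardware; it assumes a 64-byte line and represents the valid lines as a set of line indices. It counts how many flush-address searches a VST-style store performs compared with a VSTN-style store.

```python
LINE_SIZE = 64  # assumed line size


def store_vector(memory: dict[int, float], valid_lines: set[int],
                 addrs: list[int], values: list[float],
                 invalidate: bool) -> int:
    """Store elements to main storage; return the flush-address searches made.

    invalidate=True models VST (search the data cache for each address and
    invalidate the line on a hit); invalidate=False models VSTN.
    """
    searches = 0
    for addr, value in zip(addrs, values):
        memory[addr] = value
        if invalidate:
            searches += 1
            valid_lines.discard(addr // LINE_SIZE)
    return searches
```

For an n-element vector, the VST-style path performs n searches, while the VSTN-style path performs none.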

[0025] The loop shown at 202 in FIG. 2 initializes the arrays B(I), C(I), and IA(I), and the loop shown at 203 performs the actual computation. In loop 202, I is the loop variable and runs from 1 to 3. The loop makes these assignments:

- B(I) is assigned I;
- C(I) is assigned I+3; and
- IA(I) is assigned I*I*100.

After this loop, each array holds the values shown in FIG. 3(a).
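Loop 202 can be replayed in Python to confirm the values it produces. Dictionaries keyed from 1 stand in for the FORTRAN arrays so that the subscripts match the text.

```python
N = 3  # loop 202 runs I = 1 .. 3

B = {i: i for i in range(1, N + 1)}             # B(I) = I
C = {i: i + 3 for i in range(1, N + 1)}         # C(I) = I + 3
IA = {i: i * i * 100 for i in range(1, N + 1)}  # IA(I) = I*I*100

print(B, C, IA)
# {1: 1, 2: 2, 3: 3} {1: 4, 2: 5, 3: 6} {1: 100, 2: 400, 3: 900}
```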

[0026] The loop shown at 203 adds each element of B(I) to the corresponding element of C(I). It assigns the sum to the array A declared by the VDIMENSION statement, using the value stored in IA(I) as the subscript. In this example, after leaving loop 203, the program assigns the sum of A(100) and A(400) to the variable E. FIG. 4 shows part of the assembler code output by compiling this FORTRAN program. FIG. 5 shows the values stored in the main storage unit 110 and in the vector register 108 by the program.
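The arithmetic of loop 203 and the final sum can be checked the same way. This replays only the FORTRAN-level semantics, A(IA(I)) = B(I) + C(I) followed by E = A(100) + A(400), using the initial values from loop 202. It says nothing about the generated instructions.

```python
B = {1: 1, 2: 2, 3: 3}
C = {1: 4, 2: 5, 3: 6}
IA = {1: 100, 2: 400, 3: 900}

A = {}  # the VDIMENSION array, keyed by subscript
for i in range(1, 4):
    A[IA[i]] = B[i] + C[i]  # A(IA(I)) = B(I) + C(I)

E = A[100] + A[400]
print(A, E)  # {100: 5, 400: 7, 900: 9} 12
```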

[0027] The VLD instruction is a vector load instruction with three operands:

- the third operand gives the start address in main storage;
- the second operand gives the stride (distance) between elements; and
- the first operand names the vector register 108 into which the elements are stored.

The VFAD instruction adds each element of the vector registers of the second and third operands and stores the results in the vector register of the first operand.

The VSCN instruction stores the elements of the vector register specified by the second operand, in order. Each element goes to the memory location whose effective address is the corresponding element of the vector register specified by the first operand.
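The three vector instructions can be modelled as Python functions that operate on lists (vector registers) and a dictionary (main storage). The operand order follows the description. The memory layout in the usage lines is a hypothetical stand-in for FIG. 5, which is not reproduced here: B, C and IA sit at base addresses 0, 10 and 20, addressed in elements.

```python
def vld(memory: dict[int, int], stride: int, start: int, length: int) -> list[int]:
    """VLD: load `length` elements from `start`, `stride` elements apart."""
    return [memory[start + k * stride] for k in range(length)]


def vfad(x: list[int], y: list[int]) -> list[int]:
    """VFAD: element-wise addition of two vector registers."""
    return [a + b for a, b in zip(x, y)]


def vscn(memory: dict[int, int], addr_reg: list[int], data_reg: list[int]) -> None:
    """VSCN: store data_reg[k] at address addr_reg[k], without any cache flush."""
    for addr, value in zip(addr_reg, data_reg):
        memory[addr] = value


if __name__ == "__main__":
    # Hypothetical layout: B at 0.., C at 10.., IA at 20.. (element addresses).
    mem = {0: 1, 1: 2, 2: 3, 10: 4, 11: 5, 12: 6, 20: 100, 21: 400, 22: 900}
    v0 = vld(mem, 1, 0, 3)   # B
    v1 = vld(mem, 1, 10, 3)  # C
    v2 = vld(mem, 1, 20, 3)  # IA
    v3 = vfad(v0, v1)        # step A2
    vscn(mem, v2, v3)        # step A3
    print(v3, mem[100], mem[400], mem[900])  # [5, 7, 9] 5 7 9
```

The scatter writes main storage only; in this model there is no step corresponding to forwarding a flush address.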

[0028] The LDSN instruction stores the data at the memory location of the second operand into the scalar register of the first operand. This instruction always loads data directly from the main storage unit 110, and the loaded data is not registered in the data cache. The ADD instruction adds the values of the scalar registers of the second and third operands and stores the result in the scalar register of the first operand. When the program is executed, each instruction is read from the instruction cache 100 or the main storage unit 110 of FIG. 1 and sent to the instruction decode unit 102.

[0029] If decoding in the instruction decode unit 102 shows that an instruction is a vector instruction, the instruction is sent to the vector unit 130. The VLD instructions of FIG. 4 are therefore sent to the vector unit 130. As described above, a VLD instruction stores data from a given position in main storage into the vector register 108. The three VLD instructions work as follows (step A1):

- The first VLD instruction takes values in order starting from position S50 of the main storage unit 110 shown in FIG. 5 and stores them in V0 of the vector register 108.
- The second VLD instruction stores the values from position S51 onward in V1.
- The third VLD instruction stores the values from position S52 onward in V2.

The values shown in FIG. 3(a) are thus stored in the vector register 108.

[0030] Next, the VFAD instruction of FIG. 4 adds V0 and V1 and stores the result in V3 (step A2).

The VSCN instruction of FIG. 4 is then executed. It corresponds to the assignment to the left-hand side A(IA(I)) in loop 203 of FIG. 2. Because this array access is to array A of FIG. 2, the VSCN instruction 202 is executed, which is a vector scatter instruction that does not access the cache. When this instruction reaches the address generation unit 106 of the vector unit 130 in FIG. 1, no flush address is transferred, unlike conventional instructions. Only the transfer between the vector register 108 and the main storage unit 110 is performed (step A3). As shown in FIG. 3(b), each sum is stored at the address in the main storage unit 110 that corresponds to the subscript of array A.

[0031] Loop 203 of FIG. 2 has now been executed. Finally, the sum shown on line 204 is computed; reference numeral 303 in FIG. 4 denotes the instructions corresponding to this addition. Because this array access is again to array A of FIG. 2, the compiler has output LDSN instructions, which do not access the cache.

When an LDSN instruction is output to the address generation unit 106, data is loaded directly from the main storage unit 110 without accessing the cache and stored in a scalar register. No data is registered in the cache at this point. Cache consistency therefore need not be considered, even if a vector store or vector scatter instruction is later executed again on data at the same address. Using this result, the ADD instruction adds the scalar registers, and the sum is assigned to the variable E.
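LDSN removes the consistency concern, and a word-addressed toy model can show why. Word addressing is an assumption, as are the method names. A conventional load, written here as the hypothetical `ld`, registers the value in the cache, so it later returns a stale copy after a scatter writes main storage. `ldsn`, by contrast, always reads main storage and leaves the cache empty.

```python
class ScalarSide:
    """Toy scalar unit with a data cache in front of main storage."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory  # shared with the vector side
        self.cache: dict[int, int] = {}

    def ld(self, addr: int) -> int:
        """Hypothetical conventional load: registers the data in the cache."""
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def ldsn(self, addr: int) -> int:
        """LDSN: read main storage directly; nothing is registered."""
        return self.memory[addr]


def vscn(memory: dict[int, int], addrs: list[int], values: list[int]) -> None:
    """No-flush vector scatter: writes main storage only."""
    for a, v in zip(addrs, values):
        memory[a] = v


if __name__ == "__main__":
    mem = {100: 5, 400: 7}
    s = ScalarSide(mem)
    e = s.ldsn(100) + s.ldsn(400)    # E = A(100) + A(400) = 12; cache stays empty
    vscn(mem, [100, 400], [50, 70])  # a later scatter to the same addresses
    print(e, s.ldsn(100), s.cache)   # 12 50 {}
    s.ld(400)                        # conventional load caches 70 ...
    vscn(mem, [400], [700])
    print(s.ld(400))                 # ... and now returns the stale 70
```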

[0032] As described above, the present invention provides a vector instruction processing method that uses vector instructions to transfer data between the main storage unit and a register. The method determines whether the instruction to be executed is a vector instruction. When it is, an address for the processing data is secured in the main storage unit, and data is transferred between the register and that address without a cache flush. Vector instructions can therefore be processed without cache flushes holding down their speed and without unnecessary flushing.

[0033]

[Effects of the Invention] As described above, according to the present invention, instructions are executed without a cache flush, so the throughput of vector instruction execution can be improved. According to the invention of claim 2, the processing of the present invention can easily be started within the processing of a series of programs. According to the invention of claim 3, the programmer can very easily have processing executed without a cache flush.

[0034] According to the invention of claim 4, an instruction set that performs this vector processing can be generated, while the programmer is given an environment in which processing without a cache flush is easy to choose. According to the invention of claim 5, vector processing without a cache flush can be executed simply by adding the configuration of the present invention to existing hardware. According to the invention of claim 6, instructions are executed without a cache flush, so the throughput of vector instruction execution can be improved.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of a vector instruction processing device according to an embodiment of the present invention.

FIG. 2 is an example of a FORTRAN program for executing vector processing according to the present invention.

FIG. 3 is a diagram showing the values of the array variables.

FIG. 4 shows part of the assembler code output by compiling the FORTRAN program.

FIG. 5 is a diagram showing the values stored in the main storage unit and in the vector register.

[Explanation of symbols]

100 instruction cache
101 instruction fetch unit
102 instruction decode unit
103 scalar register
104 data cache unit
105 flush address search unit
106 address generation unit
107 vector load/store control unit
108 vector register
109 vector arithmetic unit
110 main storage unit
120 scalar unit
130 vector unit

Continued on the front page: (51) Int.Cl.7 / identification code / FI / theme code (reference): G06F 12/08 G06F 12/08 310A 9/30 340A 310 9/44 322G

Claims

[Claims]

1. A vector instruction processing device, comprising:
- main storage means for storing data;
- instruction determining means for determining whether an instruction to be executed is a vector instruction;
- address generation means for generating, when the instruction determining means determines that the instruction is a vector instruction, an address in the main storage means for the processing data of the vector instruction;
- a register for storing the processing data; and
- instruction execution means for transferring data between the register and the address generated by the address generation means without performing a cache flush.

2. The vector instruction processing device according to claim 1, wherein the instruction determining means determines whether the instruction is a vector instruction from a code indicating a vector instruction.

3. The vector instruction processing device according to claim 1 or 2, wherein a high-level language can declare that a variable handled by the instruction to be executed is a variable for vector processing.

4. The vector instruction processing device according to claim 2 or 3, wherein, when the instruction to be executed is a vector instruction that handles the declared variable, the code indicating a vector instruction is obtained by compilation.

5. The vector instruction processing device according to any one of claims 1 to 4, wherein the instruction execution means avoids performing a cache flush by not transferring the address generated by the address generation means for cache flushing.

6. A vector instruction processing method that uses vector instructions to transfer data between a main storage unit and a register, the method comprising:
- determining whether an instruction to be executed is a vector instruction;
- securing an address for processing data in the main storage unit when the instruction is determined to be a vector instruction; and
- transferring data between the register and the address without performing a cache flush.