JPH08241302A

JPH08241302A - Vector processor and multiplier

Info

Publication number: JPH08241302A
Application number: JP7046763A
Authority: JP
Inventors: Koji Kuroda; 浩二黒田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1995-03-07
Filing date: 1995-03-07
Publication date: 1996-09-17
Anticipated expiration: 2020-09-07
Also published as: JP3691538B2

Abstract

PURPOSE: To provide a vector processor that performs vector addition at high speed, and a multiplier suited to vector multiplication carried out using the vector addition of that vector processor. CONSTITUTION: The vector processor, which has at least a vector register 12, a mask register 13, and an adder 14, is configured so that data from the mask register 13 is input to the adder 14 in addition to the addend and augend of the vector operands, and the carry-out data calculated by the adder 14 is output to the mask register 13. The multiplier 15 calculates 2m-bit data as the product of two m-bit inputs and has a selector that, in response to an instruction, selects and outputs either the high-order m bits or the low-order m bits of the product.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a vector processing device that can execute vector addition at high speed, and to a multiplier suited to vector multiplication carried out using the vector addition of that vector processing device.

[0002] A vector processing device executes vector addition and vector multiplication. These vector operations need to be executed at high speed.

[0003]

[Prior Art] In a conventional vector processing device, vector addition is executed by issuing vector add instructions interleaved with vector shift instructions, in order to account for possible carries.

[0004] Next, this prior art is described in detail, taking the addition of a 160-bit augend and a 160-bit addend as an example. When the adder adds 64-bit values, the conventional approach, as shown in FIG. 13, prepares an augend register group consisting of three 64-bit registers (vr00, vr01, vr02) and an addend register group consisting of three 64-bit registers (vr03, vr04, vr05). The 160-bit augend is stored in the augend registers and the 160-bit addend in the addend registers, for example in the format shown in FIG. 14, which is illustrated schematically in FIG. 15.

[0005] The addition of the 160-bit augend and the 160-bit addend is then executed by issuing the vector instruction sequence shown in FIG. 16. Here, "VA vr1, vr2, vr3" is a vector add instruction that stores the sum of vector register vr1 and vector register vr2 in vector register vr3; "VSR vr1, SC, vr3" is a vector shift instruction that shifts the data in vector register vr1 right by SC bits and stores it in vector register vr3; and "VSL vr1, SC, vr3" is a vector shift instruction that shifts the data in vector register vr1 left by SC bits and stores it in vector register vr3.

[0006] That is, following the vector instruction sequence shown in FIG. 16, vector add instruction VA of (1) first adds the augend portion in vector register vr02 to the addend portion in vector register vr05 and stores the result in vector register vr10. Because a carry may occur here, vector shift instruction VSR of (2) then shifts the data in vector register vr10 right by 60 bits to extract the carry value (carry-out data) and stores it in vector register vr15.

[0007] Next, vector add instruction VA of (3) adds the augend portion in vector register vr01 to the addend portion in vector register vr04 and stores the result in vector register vr20.

[0008] Next, to add in the carry-out data produced by the addition of the lower portion, vector add instruction VA of (4) adds the carry-out data held in vector register vr15 to the data held in vector register vr20 and stores the result in vector register vr20. Because a carry may occur here, vector shift instruction VSR of (5) then shifts the data in vector register vr20 right by 60 bits to extract the carry-out data and stores it in vector register vr25.

[0009] Next, vector add instruction VA of (6) adds the augend portion in vector register vr00 to the addend portion in vector register vr03 and stores the result in vector register vr30.

[0010] Next, to add in the carry-out data produced by the addition of the lower portion, vector add instruction VA of (7) adds the carry-out data held in vector register vr25 to the data held in vector register vr30 and stores the result in vector register vr6.

[0011] Next, to extract the 60 bits of valid data held in vector register vr10, vector shift instruction VSL of (8) shifts the data in vector register vr10 left by 4 bits and stores it back in vector register vr10, and vector shift instruction VSR of (9) shifts the data in vector register vr10 right by 4 bits and stores it in vector register vr8. As a result, vector register vr8 holds the 60 bits of valid data with zeros in the upper 4 bits.

[0012] Next, to extract the 60 bits of valid data held in vector register vr20, vector shift instruction VSL of (10) shifts the data in vector register vr20 left by 4 bits and stores it back in vector register vr20, and vector shift instruction VSR of (11) shifts the data in vector register vr20 right by 4 bits and stores it in vector register vr7. As a result, vector register vr7 holds the 60 bits of valid data with zeros in the upper 4 bits.

[0013] Thus, in a conventional vector processing device, vector addition is executed by issuing vector add instructions interleaved with vector shift instructions, in order to account for possible carries.
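The eleven-step conventional sequence of paragraphs [0006] to [0012] can be sketched in Python as follows. This is an illustrative model only, not the hardware of the cited figures. Vector registers are modeled as lists of 64-bit integers, and the names `va`, `vsr`, `vsl`, `split60`, `join60`, and `add160_conventional` are hypothetical. The operand layout (60 valid bits in each of the two lower words, the remaining 40 bits in the top word) is an assumption inferred from the 60-bit and 4-bit shift counts, because FIG. 14 and FIG. 15 are not reproduced here.

```python
MASK64 = (1 << 64) - 1
MASK60 = (1 << 60) - 1

def va(a, b):
    """VA: element-wise 64-bit addition (wraps modulo 2**64)."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def vsr(a, sc):
    """VSR: element-wise logical right shift by sc bits."""
    return [x >> sc for x in a]

def vsl(a, sc):
    """VSL: element-wise left shift by sc bits within 64 bits."""
    return [(x << sc) & MASK64 for x in a]

def split60(n):
    """Assumed layout: (top 40 bits, middle 60 bits, low 60 bits)."""
    return n >> 120, (n >> 60) & MASK60, n & MASK60

def join60(hi, mid, lo):
    """Inverse of split60; hi may carry one extra bit after an addition."""
    return (hi << 120) | (mid << 60) | lo

def add160_conventional(vr00, vr01, vr02, vr03, vr04, vr05):
    vr10 = va(vr02, vr05)   # (1)  low words
    vr15 = vsr(vr10, 60)    # (2)  carry out of the low words
    vr20 = va(vr01, vr04)   # (3)  middle words
    vr20 = va(vr15, vr20)   # (4)  add carry from below
    vr25 = vsr(vr20, 60)    # (5)  carry out of the middle words
    vr30 = va(vr00, vr03)   # (6)  top words
    vr6 = va(vr25, vr30)    # (7)  add carry from below
    vr10 = vsl(vr10, 4)     # (8)  drop the carry bit ...
    vr8 = vsr(vr10, 4)      # (9)  ... leaving 60 valid bits
    vr20 = vsl(vr20, 4)     # (10)
    vr7 = vsr(vr20, 4)      # (11)
    return vr6, vr7, vr8    # top, middle, low words of the sum
```

Under these assumptions, `join60` applied to the three result registers reproduces the ordinary integer sum, which makes the cost of the scheme visible: seven of the eleven instructions are shifts that only move or strip carry bits.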

[0014] Meanwhile, the multiplier provided in a conventional vector processing device has, even when its input specification is something like 64 bits × 64 bits, been built with a multiplication function of fewer bits, such as 64 bits × 16 bits, in order to reduce the amount of hardware.

[0015] To realize the 64-bit by 64-bit multiplication of the input specification, the multiplier operand is divided into four 16-bit parts. Using the 64-bit × 16-bit multiplication function, each 16-bit part of the multiplier is multiplied by the 64-bit multiplicand to obtain a partial product. The partial products are added together while being shifted by 16 bits, and the 64-bit portion indicated by the sum is output as the multiplication result.

[0016] For example, when the upper 64 bits of a 64-bit by 64-bit multiplication are required, the four parts of the multiplier are selected in order from the low side, as shown in FIG. 17, in accordance with the instruction. The partial products are obtained and added together while being shifted left by 16 bits, and the upper 64 bits of the product are thereby obtained and output. When the lower 64 bits of a 64-bit by 64-bit multiplication are required, the four parts of the multiplier are selected in order from the high side in accordance with the instruction. The partial products are obtained and added together while being shifted right by 16 bits, and the lower 64 bits of the product are thereby obtained and output.
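The idea of building a 64-bit by 64-bit product from four 64-bit × 16-bit partial products can be sketched as below. This is a simplified arithmetic model, assuming an accumulator wide enough to hold the full product. It does not reproduce the looped datapath of FIG. 17, which according to the text keeps only the upper or the lower 64-bit portion. The function name `mul64_via_16bit_slices` is hypothetical.

```python
MASK64 = (1 << 64) - 1

def mul64_via_16bit_slices(multiplicand, multiplier):
    """Form a 64x64 product from four 64x16 partial products."""
    acc = 0
    for i in range(4):
        # Select the i-th 16-bit part of the multiplier, low side first.
        part = (multiplier >> (16 * i)) & 0xFFFF
        # One pass through the narrow 64-bit x 16-bit multiplier.
        partial = multiplicand * part
        # Align the partial product to its weight and accumulate.
        acc += partial << (16 * i)
    return acc

def upper64(product):
    return product >> 64

def lower64(product):
    return product & MASK64
```

Each call needs four passes through the narrow multiplier plus the shift-and-add step, which is the overhead that paragraph [0018] identifies as the problem.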

[0017]

[Problems to Be Solved by the Invention] However, when vector addition is executed as in the prior art, by issuing vector add instructions interleaved with vector shift instructions to account for possible carries, the number of instructions grows, so the vector addition cannot be executed at high speed.

[0018] In addition, the prior-art multiplier performs a large number of internal multiplications and must contain an internal function (implemented with a loop structure) for shifting and adding the partial products.

[0019] The present invention has been made in view of these circumstances. Its objects are to provide a vector processing device that can execute vector addition at high speed, and to provide a multiplier suited to vector multiplication carried out using the vector addition of that vector processing device.

[0020]

[Means for Solving the Problems] FIG. 1 shows the principle configuration of the present invention. In the figure, reference numeral 1 denotes a vector processing device embodying the present invention, which comprises a CPU 10, a vector instruction control mechanism 11, a vector register 12, a mask register 13, an adder 14, and a multiplier 15.

[0021] The CPU 10 issues vector instructions. The vector instruction control mechanism 11 controls the execution of vector instructions. The vector register 12 stores vector data. The mask register 13 stores mask data used in vector processing. The adder 14 executes vector add instructions. The multiplier 15 executes vector multiply instructions.

[0022] To realize the present invention, the adder 14 is configured to receive, in addition to the addend and augend of the vector operands, the carry-out data stored in the mask register 13, and to output the carry-out data of its result to the mask register 13. If the additional mask operands increase the number of registers that an instruction must specify, it is preferable to use the same mask register for both input and output so as to limit the number of registers specified by the instruction.

[0023] Meanwhile, to realize the present invention, the multiplier 15 has a function of calculating the 2m-bit data that is the product of two m-bit inputs, and has a selector that, in response to an instruction, selects and outputs either the upper m bits or the lower m bits of the product.

[0024] Furthermore, the vector processing device 1 of the present invention is configured as follows. When the register number specified by an instruction that reads from a register designates a specific vector register 12 or a specific mask register 13, the input is processed with the data from that register treated as zero. When the register number specified by an instruction that writes to a register designates a specific vector register 12 or a specific mask register 13, the data output to that register is not performed.

[0025] With this configuration, designating the specific register allows the same instruction to be executed in different ways, so the number of instructions can be kept down. Register initialization also becomes unnecessary, and the number of registers in use can be reduced.
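The special-register behavior of paragraph [0024] can be sketched as a small register file model. The class name `RegisterFile`, its methods, and the choice of register number 0 as the special register are all hypothetical; the text only says that "a specific" register is treated this way and does not state which one.

```python
class RegisterFile:
    """Register file in which one register number reads as zero
    and silently discards writes (number chosen here is an assumption)."""

    def __init__(self, count, length, special=0):
        self._regs = [[0] * length for _ in range(count)]
        self._length = length
        self._special = special

    def read(self, number):
        # Reading the special register always yields zero data.
        if number == self._special:
            return [0] * self._length
        return list(self._regs[number])

    def write(self, number, values):
        # Writing to the special register is not performed.
        if number == self._special:
            return
        self._regs[number] = list(values)
```

With such a register, an add-with-carry instruction can serve as a plain add by naming the special mask register as carry input, and an unwanted carry output can be discarded by naming it as the destination, without a separate instruction to clear a register first.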

[0026]

[Operation] The adder 14 of the present invention performs addition taking as input, in addition to the augend and addend of the vector operands, the carry-out data written in the mask register 13, and writes the carry-out data produced by that addition into the mask register 13.

[0027] Because the carry-out data is stored in the mask register 13 in this way, it is no longer necessary to issue vector add instructions interleaved with vector shift instructions as in the prior art, and vector addition can be executed at high speed with fewer instructions.

[0028] The multiplier 15 of the present invention has a function of calculating the 2m-bit data that is the product of two m-bit inputs. That is, whereas the conventional multiplier obtains partial products and adds them while shifting in order to calculate m-bit data as the product of two m-bit inputs, the multiplier 15 of the present invention directly calculates the 2m-bit product without obtaining partial products. For example, it multiplies 64-bit data by 64-bit data and calculates a 128-bit result.

[0029] Accordingly, whereas the conventional multiplier performs a large number of internal multiplications and must contain an internal function for shifting and adding partial products, the multiplier 15 of the present invention resolves these problems.

[0030] However, when the data input to the multiplier 15 is m bits wide, the data input to the adder 14 is also m bits wide, so consistency would be lost if the multiplier 15 output 2m-bit data. The multiplier 15 of the present invention addresses this by having a selector that, in response to an instruction, selects and outputs either the upper m bits or the lower m bits of the product.

[0031]

[Embodiments] The present invention is described in detail below with reference to embodiments. As explained with FIG. 1, the adder 14 of the present invention performs addition taking as input, in addition to the augend and addend of the vector operands, the carry-out data written in the mask register 13, and writes the carry-out data produced by that addition into the mask register 13.

[0032] That is, as shown in FIG. 2, the adder receives the augend read from the vector register 12, the addend read from the vector register 12, and the carry-out data read from the mask register 13, and performs the addition. It writes the sum to the vector register 12 and writes the carry-out data produced by the addition to the mask register 13.
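The data flow of FIG. 2 can be sketched as an element-wise add with carry-in and carry-out. This is a minimal model assuming 64-bit vector elements and one mask bit per element; the function name `vac_element_add` and the list-based register representation are hypothetical.

```python
MASK64 = (1 << 64) - 1

def vac_element_add(augend, addend, carry_in):
    """Element-wise 64-bit add with a 1-bit carry-in per element.

    Returns (sums, carry_out): sums go to a vector register,
    carry_out (one bit per element) goes to a mask register.
    """
    sums, carry_out = [], []
    for x, y, c in zip(augend, addend, carry_in):
        total = x + y + c
        sums.append(total & MASK64)   # low 64 bits of the result
        carry_out.append(total >> 64) # 0 or 1
    return sums, carry_out
```

For instance, adding `MASK64` and `1` with a carry-in of `0` gives a sum of `0` and a carry-out of `1` for that element.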

[0033] Because the carry-out data read from the mask register 13 is 1-bit data, this addition can be realized with a simple hardware configuration.

[0034] Meanwhile, as explained with FIG. 1, the multiplier 15 of the present invention has a function of calculating the 2m-bit data that is the product of two m-bit inputs, and has a selector that, in response to an instruction, selects and outputs either the upper m bits or the lower m bits of the product.

[0035] That is, as shown in FIG. 3, the multiplier receives the m-bit multiplicand read from the vector register 12 and the m-bit multiplier read from the vector register 12, performs the multiplication, and latches the 2m-bit product. It has a selector that, in response to an instruction, selects and outputs either the upper m bits or the lower m bits of that product.
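The multiply-then-select behavior of FIG. 3 can be sketched as follows. This is a functional model only; the function name `multiply_select` and the boolean `select_upper` parameter standing in for the instruction's control of the selector are assumptions made for illustration.

```python
def multiply_select(multiplicand, multiplier, select_upper, m=64):
    """Compute the full 2m-bit product, then output one m-bit half."""
    mask = (1 << m) - 1
    # The full 2m-bit product is formed directly (the latched value).
    product = (multiplicand & mask) * (multiplier & mask)
    # The selector picks the upper or lower m bits per the instruction.
    if select_upper:
        return product >> m
    return product & mask
```

Calling the function twice on the same operands, once with `select_upper=False` and once with `select_upper=True`, yields the two m-bit halves that the adder can then combine.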

[0036] Next, vector addition using the adder 14 of the present invention is described, taking the addition of a 160-bit augend and a 160-bit addend as an example. When the adder 14 adds 64-bit values, an augend register group consisting of three 64-bit registers (vr00, vr01, vr02) and an addend register group consisting of three 64-bit registers (vr03, vr04, vr05) are prepared, as shown in FIG. 10. The 160-bit augend is stored in the augend registers and the 160-bit addend in the addend registers in the format shown in FIG. 4, which is illustrated schematically in FIG. 5.

[0037] The addition of the 160-bit augend and the 160-bit addend is then executed by issuing the vector instruction sequence shown in FIG. 6. Here, "VAC vr1, vr2, mr1, vr3, mr2" is a vector add instruction that stores the sum of vector register vr1, vector register vr2, and mask register mr1 in vector register vr3, and stores the carry-out data produced at that time in mask register mr2.

[0038] That is, following the vector instruction sequence shown in FIG. 6, vector add instruction VAC of (1) first adds the augend portion in vector register vr02, the addend portion in vector register vr05, and the data in mask register mr00, which holds zero as its initial value. The sum is stored in vector register vr08, and the carry-out data produced at that time is stored in mask register mr01.

[0039] Next, vector add instruction VAC of (2) adds the augend portion in vector register vr01, the addend portion in vector register vr04, and the carry-out data stored in mask register mr01. The sum is stored in vector register vr07, and the carry-out data produced at that time is stored in mask register mr02.

[0040] Finally, vector add instruction VAC of (3) adds the augend portion in vector register vr00, the addend portion in vector register vr03, and the carry-out data stored in mask register mr02. The sum is stored in vector register vr06, and the carry-out data produced at that time is stored in mask register mr00.

[0041] Thus, in the vector processing device 1 using the adder 14 of the present invention, the sum of the 160-bit augend and the 160-bit addend can be calculated by issuing only three vector add instructions while using mask registers mr00, mr01, and mr02, as shown in FIG. 7. By contrast, the prior art requires eleven vector add and vector shift instructions, as explained with FIG. 13.
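The three-instruction sequence of paragraphs [0038] to [0040] can be sketched as below. This is an illustrative model, not the device itself: registers are lists of integers, the names `vac` and `add160_with_carry` are hypothetical, and the operand layout (64 bits in each of the two lower words, the remaining 32 bits in the top word) is an assumption, since FIG. 4 and FIG. 5 are not reproduced here.

```python
MASK64 = (1 << 64) - 1

def vac(vr1, vr2, mr1):
    """VAC: element-wise vr1 + vr2 + mr1, returning (sum, carry-out)."""
    totals = [x + y + c for x, y, c in zip(vr1, vr2, mr1)]
    return [t & MASK64 for t in totals], [t >> 64 for t in totals]

def add160_with_carry(vr00, vr01, vr02, vr03, vr04, vr05):
    mr00 = [0] * len(vr00)               # mask register holding zero initially
    vr08, mr01 = vac(vr02, vr05, mr00)   # (1) low words
    vr07, mr02 = vac(vr01, vr04, mr01)   # (2) middle words plus carry
    vr06, mr00 = vac(vr00, vr03, mr02)   # (3) top words plus carry
    return vr06, vr07, vr08, mr00        # top, middle, low, final carry

def split64(n):
    """Assumed layout: (top word, middle 64 bits, low 64 bits)."""
    return n >> 128, (n >> 64) & MASK64, n & MASK64
```

Compared with the conventional sequence, the carries travel through the mask registers, so no shift instructions are needed either to extract a carry or to strip it from the stored word.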

[0042] Next, vector multiplication using the multiplier 15 of the present invention is described. As shown in FIG. 8, quadruple-precision data has a 112-bit mantissa. Consequently, quadruple-precision multiplication requires the operand multiplication shown in FIG. 9 in order to obtain the mantissa of the result.

[0043] Accordingly, vector multiplication using the multiplier 15 of the present invention is described below, taking the multiplication of a 128-bit multiplicand by a 128-bit multiplier as an example.

[0044] When the multiplier 15 multiplies 64-bit values, a 128-bit multiplicand register (upper 64 bits labeled "01", lower 64 bits labeled "02") and a 128-bit multiplier register (upper 64 bits labeled "03", lower 64 bits labeled "04") are prepared, as shown in FIG. 10. The 128-bit multiplicand (upper 64 bits denoted A1, lower 64 bits denoted A2) is stored in the multiplicand register, and the 128-bit multiplier (upper 64 bits denoted B1, lower 64 bits denoted B2) is stored in the multiplier register.

[0045] The multiplication of the 128-bit multiplicand by the 128-bit multiplier is then executed by issuing the vector instruction sequence shown in FIG. 11, following the multiplication process illustrated schematically in FIG. 12. Here, "VML vr1, vr2, vr3" is a vector multiply instruction that stores the lower 64 bits of the product of vector register vr1 and vector register vr2 in vector register vr3. "VMU vr1, vr2, vr3" is a vector multiply instruction that stores the upper 64 bits of the product of vector register vr1 and vector register vr2 in vector register vr3. "VAC vr1, vr2, mr1, vr3, mr2" is a vector add instruction that stores the sum of vector register vr1, vector register vr2, and mask register mr1 in vector register vr3, and stores the carry-out data produced at that time in mask register mr2.

[0046] That is, following the vector instruction sequence shown in FIG. 11, vector multiply instruction VML of (1) first multiplies the multiplicand portion A2 in vector register vr02 by the multiplier portion B2 in vector register vr04. The selector of the multiplier 15 is controlled so that the lower 64 bits of the product, A2B2L, are output and stored in vector register vr23.

[0047] Next, vector multiply instruction VMU of (2) multiplies the multiplicand portion A2 in vector register vr02 by the multiplier portion B2 in vector register vr04. The selector of the multiplier 15 is controlled so that the upper 64 bits of the product, A2B2U, are output and stored in vector register vr05.

[0048] Next, vector multiply instruction VML of (3) multiplies the multiplicand portion A1 in vector register vr01 by the multiplier portion B2 in vector register vr04. The selector of the multiplier 15 is controlled so that the lower 64 bits of the product, A1B2L, are output and stored in vector register vr06.

[0049] Next, vector add instruction VAC of (4) adds A2B2U stored in vector register vr05, A1B2L stored in vector register vr06, and the data in mask register mr00, which holds zero as its initial value. The sum is stored in vector register vr07, and the carry-out data produced at that time is stored in mask register mr01.

[0050] Next, vector multiply instruction VML of (5) multiplies the multiplicand portion A2 in vector register vr02 by the multiplier portion B1 in vector register vr03. The selector of the multiplier 15 is controlled so that the lower 64 bits of the product, A2B1L, are output and stored in vector register vr08.

[0051] Next, vector add instruction VAC of (6) adds the data in vector register vr07, A2B1L stored in vector register vr08, and the data in mask register mr00, which holds zero as its initial value. The sum is stored in vector register vr22, and the carry-out data produced at that time is stored in mask register mr02.

[0052] Next, vector multiply instruction VMU of (7) multiplies the multiplicand portion A1 in vector register vr01 by the multiplier portion B2 in vector register vr04. The selector of the multiplier 15 is controlled so that the upper 64 bits of the product, A1B2U, are output and stored in vector register vr10.

[0053] Next, in accordance with the vector multiplication instruction VMU of (8), the multiplicand part A2 in vector register vr02 is multiplied by the multiplier part B1 in vector register vr03, and the upper 64 bits of the product, A2B1U, which are output by controlling the selector of the multiplier 15, are stored in vector register vr11.

[0054] Next, in accordance with the vector addition instruction VAC of (9), A1B2U stored in vector register vr10, A2B1U stored in vector register vr11, and the carry-out data stored in mask register mr01 are added and the sum is stored in vector register vr12; the carry-out data generated by this addition is stored in mask register mr00.

[0055] Next, in accordance with the vector multiplication instruction VML of (10), the multiplicand part A1 in vector register vr01 is multiplied by the multiplier part B1 in vector register vr03, and the lower 64 bits of the product, A1B1L, which are output by controlling the selector of the multiplier 15, are stored in vector register vr13.

[0056] Next, in accordance with the vector addition instruction VAC of (11), the data stored in vector register vr12, A1B1L stored in vector register vr13, and the carry-out data stored in mask register mr02 are added and the sum is stored in vector register vr21; the carry-out data generated by this addition is stored in mask register mr03.

[0057] Next, in accordance with the vector multiplication instruction VMU of (12), the multiplicand part A1 in vector register vr01 is multiplied by the multiplier part B1 in vector register vr03, and the upper 64 bits of the product, A1B1U, which are output by controlling the selector of the multiplier 15, are stored in vector register vr15.

[0058] Finally, in accordance with the vector addition instruction VAC of (13), the data held in vector register vr00, which stores a zero value as its initial value, A1B1U stored in vector register vr15, and the carry-out data stored in mask register mr03 are added and the sum is stored in vector register vr20; the carry-out data generated by this addition is stored in mask register mr00.

[0059] In this way, the vector processing device 1 using the multiplier 15 of the present invention extracts either the upper 64 bits or the lower 64 bits of each 128-bit product obtained by multiplying two 64-bit values and, using the adder 14 of the present invention, builds up the product of a 128-bit multiplicand and a 128-bit multiplier.
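The sequence of steps (4) to (13) can be sketched for a single vector element as follows. Steps (1) to (3) are not restated above, so their form here is an assumption: they are taken to produce A2B2L, A2B2U (vr05), and A1B2L (vr06), with A2B2L placed in a register called `vr23` purely for illustration. The function names are also hypothetical. Note that the carry from step (9) is written to mr00 and never read, so this sketch matches the full product only when that carry is zero, for example when the upper parts A1 and B1 are well below 64 bits wide.

```python
MASK64 = (1 << 64) - 1  # keeps a value within one 64-bit word


def vml(a, b):
    return (a * b) & MASK64           # lower 64 bits of the 128-bit product


def vmu(a, b):
    return ((a * b) >> 64) & MASK64   # upper 64 bits of the 128-bit product


def vac(a, b, carry_in):
    total = a + b + carry_in
    return total & MASK64, total >> 64  # (sum, carry-out)


def mul128(a, b):
    """Multiply two 128-bit values with the VML/VMU/VAC steps, one element."""
    vr01, vr02 = a >> 64, a & MASK64  # A1 (upper part), A2 (lower part)
    vr03, vr04 = b >> 64, b & MASK64  # B1 (upper part), B2 (lower part)
    zero = 0                          # vr00 and mr00 read as zero
    # Steps (1)-(3): assumed form, not restated in the text above.
    vr23 = vml(vr02, vr04)            # A2B2L (register name is hypothetical)
    vr05 = vmu(vr02, vr04)            # A2B2U
    vr06 = vml(vr01, vr04)            # A1B2L
    vr07, mr01 = vac(vr05, vr06, zero)     # (4)
    vr08 = vml(vr02, vr03)                 # (5) A2B1L
    vr22, mr02 = vac(vr07, vr08, zero)     # (6)
    vr10 = vmu(vr01, vr04)                 # (7) A1B2U
    vr11 = vmu(vr02, vr03)                 # (8) A2B1U
    vr12, _unused = vac(vr10, vr11, mr01)  # (9) carry goes to mr00, unused
    vr13 = vml(vr01, vr03)                 # (10) A1B1L
    vr21, mr03 = vac(vr12, vr13, mr02)     # (11)
    vr15 = vmu(vr01, vr03)                 # (12) A1B1U
    vr20, _unused = vac(zero, vr15, mr03)  # (13)
    return (vr20 << 192) | (vr21 << 128) | (vr22 << 64) | vr23


a = (0x1234 << 64) | 0xFFFF_FFFF_FFFF_FFFF
b = (0xABCD << 64) | 0x0123_4567_89AB_CDEF
assert mul128(a, b) == a * b  # upper parts are small, so no carry is lost
```

The result words vr20, vr21, and vr22 follow the register assignments in steps (6), (11), and (13); only the lowest word relies on the assumed placement.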

[0060] In this configuration, zero values need not actually be stored in mask register mr00 or vector register vr00; instead, the device may be configured so that designating such a register number as an input is treated as designating a zero-valued input. In addition, the carry-out data written to mr00 is never actually used afterward. The device may therefore also be configured so that, when such a register number is designated as a destination, the actual write is not performed and the original data is left intact. Furthermore, the vector addition instruction VAC must designate five registers, but if the input and the output share the same mask register, designating four registers suffices.
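The zero-register convention described here can be sketched as a small register file in which register number 0 reads as zero and ignores writes. The class and method names are hypothetical, and choosing number 0 as the special register simply mirrors mr00 and vr00 in the text.

```python
class RegisterFile:
    """Register file in which register 0 reads as zero and ignores writes."""

    def __init__(self, count):
        self._regs = [0] * count

    def read(self, number):
        # Designating register 0 as an input is treated as a zero-valued input.
        return 0 if number == 0 else self._regs[number]

    def write(self, number, value):
        # Designating register 0 as an output suppresses the write,
        # so whatever the physical register holds is left intact.
        if number != 0:
            self._regs[number] = value

    def raw(self, number):
        """Peek at the physical contents, bypassing the convention."""
        return self._regs[number]


mr = RegisterFile(4)
mr._regs[0] = 0b1011      # pretend mr00 physically holds live mask data
mr.write(0, 0b0001)       # an unused carry-out aimed at mr00 is dropped
assert mr.read(0) == 0    # as an input, mr00 still reads as zero
assert mr.raw(0) == 0b1011
```

With this convention, the physical register 0 remains available for other uses, since neither the zero-valued reads nor the discarded writes touch its contents.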

[0061]

[Effects of the Invention] As described above, the vector processing device of the present invention stores carry-out data in the mask register, so it no longer needs to execute vector addition instructions interleaved with vector shift instructions as in the prior art, and it can perform vector addition at high speed with a small number of instructions.
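To illustrate the effect, the sketch below adds vectors of 128-bit values held as separate upper and lower 64-bit words, using two carry-chained additions and no shift step. The function names, the list representation of vector and mask registers, and the two-instruction sequence itself are assumptions made for illustration; the prior-art shift-based sequence is not modeled.

```python
MASK64 = (1 << 64) - 1  # keeps a value within one 64-bit word


def vac(va, vb, mask_in):
    """Element-wise add with carry-in; return (sum vector, carry-out mask)."""
    sums, carries = [], []
    for a, b, c in zip(va, vb, mask_in):
        total = a + b + c
        sums.append(total & MASK64)
        carries.append(total >> 64)
    return sums, carries


def add128(a_hi, a_lo, b_hi, b_lo):
    """Add vectors of 128-bit values stored as (upper, lower) 64-bit words."""
    zero_mask = [0] * len(a_lo)
    s_lo, carry = vac(a_lo, b_lo, zero_mask)  # lower words, carry-in of zero
    s_hi, _ = vac(a_hi, b_hi, carry)          # upper words consume the carries
    return s_hi, s_lo


s_hi, s_lo = add128([0, 1], [MASK64, 5], [0, 2], [1, 6])
print(s_hi, s_lo)  # prints [1, 3] [0, 11]
```

Each element's carry travels from the lower-word addition to the upper-word addition through the mask, which is the role the text assigns to the mask register.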

[0062] The multiplier of the present invention calculates 2m-bit data as the product of two input m-bit data and has a selector that, in response to an instruction, selects and outputs either the upper m-bit data or the lower m-bit data of the product. This configuration solves the problems of the prior art and also maintains compatibility with the m-bit registers of the vector processing device and with an adder designed for m-bit inputs.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle configuration of the present invention.

FIG. 2 shows an embodiment of the adder of the present invention.

FIG. 3 shows an embodiment of the multiplier of the present invention.

FIG. 4 is an explanatory diagram of the process of storing an augend and an addend.

FIG. 5 is an explanatory diagram of the process of storing an augend and an addend.

FIG. 6 is an explanatory diagram of the vector addition instructions issued in the present invention.

FIG. 7 is an explanatory diagram of the addition process of the present invention.

FIG. 8 is an explanatory diagram of the data format of quadruple-precision data.

FIG. 9 is an explanatory diagram of the operands of quadruple-precision multiplication.

FIG. 10 is an explanatory diagram of the process of storing a multiplicand and a multiplier.

FIG. 11 is an explanatory diagram of the vector multiplication instructions issued in the present invention.

FIG. 12 is an explanatory diagram of the multiplication process of the present invention.

FIG. 13 is an explanatory diagram of the prior art.

FIG. 14 is an explanatory diagram of the prior art.

FIG. 15 is an explanatory diagram of the prior art.

FIG. 16 is an explanatory diagram of the prior art.

FIG. 17 is an explanatory diagram of the prior art.

[Explanation of symbols]

1 Vector processing device
10 CPU
11 Vector instruction control mechanism
12 Vector register
13 Mask register
14 Adder
15 Multiplier

Claims

[Claims]

1. A vector processing device that comprises at least a vector register, a mask register, and an adder and that executes vector processing, characterized in that data of the mask register is input to the adder in addition to the addend and the augend of the vector operands.

2. A vector processing device that comprises at least a vector register, a mask register, and an adder and that executes vector processing, characterized in that carry-out data calculated by the adder is output to the mask register.

3. A vector processing device that comprises at least a vector register, a mask register, and an adder and that executes vector processing, characterized in that data of the mask register is input to the adder in addition to the addend and the augend of the vector operands, carry-out data calculated by the adder is output to the mask register, and the same mask register is used as both the adder-input mask register and the adder-output mask register.

4. A multiplier that calculates 2m-bit data as the product of two input m-bit data, characterized by having a selector that, in response to an instruction, selects and outputs either the upper m-bit data or the lower m-bit data of the product.

5. A vector processing device that comprises at least a vector register, a mask register, an adder, and a multiplier and that executes vector processing, characterized in that the multiplier used is one that calculates 2m-bit data as the product of two input m-bit data and has a selector that, in response to an instruction, selects and outputs either the upper m-bit data or the lower m-bit data of the product, and in that the adder receives data output by the selector as the addend and the augend, additionally receives data of the mask register, and outputs the carry-out data it calculates to the mask register.

6. The vector processing device according to claim 5, characterized in that the same mask register is used as both the adder-input mask register and the adder-output mask register.

7. A vector processing device that comprises a vector register, a mask register, and a vector arithmetic unit and that executes vector processing, characterized in that, when a register number specified by an instruction directing input from a register designates a specific vector register or a specific mask register, input processing is performed by treating the data from that register as a zero value, and, when a register number specified by an instruction directing output to a register designates a specific vector register or a specific mask register, the data output processing to that register is not executed.