JP2908231B2 - Vector arithmetic unit

Info

Publication number: JP2908231B2
Application number: JP6059530A
Authority: JP
Inventors: 徹三井
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 1994-03-29
Filing date: 1994-03-29
Publication date: 1999-06-21
Anticipated expiration: 2014-06-21
Also published as: JPH07271765A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a computer device and, more particularly, to a vector arithmetic unit included in a computer device.

[0002]

[Prior Art] Mask data holds one bit for each word of a vector register and indicates whether the operation on the corresponding vector data is enabled or disabled. Conventionally, an arithmetic unit that counts the bits whose value is "1" among the leading bits of the mask data, where the number of bits examined is specified by the effective vector length, computes its result by feeding the mask data to an addition counter one bit at a time from the head bit and accumulating for as many steps as the effective vector length specifies.
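The conventional bit-serial counting described above can be sketched as follows. This is only a software illustration in Python; the function name `count_ones_serial` and the representation of the mask as a list of 0/1 integers with the head bit at index 0 are assumptions made for this sketch, not part of the patent.

```python
def count_ones_serial(mask_bits, vector_length):
    """Count "1" bits among the first vector_length mask bits, one bit per step.

    mask_bits: list of 0/1 integers, head bit first (assumed representation).
    vector_length: number of leading bits to examine.
    """
    count = 0
    for i in range(vector_length):  # one accumulation step per mask bit
        count += mask_bits[i]
    return count


# Mask value used in the later worked example, with an effective vector length of 8.
print(count_ones_serial([1, 0, 1, 1, 1, 0, 1, 0, 1], 8))  # prints 5
```

The loop runs once per examined bit, which is the source of the slowness that the invention addresses.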

[0003]

[Problems to Be Solved by the Invention] Because the conventional vector arithmetic unit described above accumulates the mask data with an addition counter one bit at a time from the head bit, for as many steps as the effective vector length specifies, it has the drawback that the operation is slow.

[0004]

[Means for Solving the Problems] The vector arithmetic unit of the present invention comprises: a vector register; a mask register that outputs mask data, corresponding to the number of words of the vector register, indicating whether the operation on the vector data is enabled or disabled; a valid bit discrimination unit that, for all bits of the mask data output from the mask register, outputs unchanged the mask data from the head bit up to the bit designated by the effective vector length, and outputs the mask data from the bit designated by the effective vector length up to the maximum effective length bit replaced with "0"; and a bit "1" count operation unit that calculates the number of bits whose value is "1" in all of the valid-bit-discriminated mask data output from the valid bit discrimination unit.

[0005]

[Embodiments] Next, the present invention is described with reference to the drawings.

[0006] FIGS. 1 and 3 are a configuration diagram and an operation explanatory diagram of a first embodiment of the present invention.

[0007] The operation when mask register 01 is 9 bits wide is described. Mask register 01, which stores the mask data that determines whether the vector data stored in the vector register is valid or invalid, outputs all of its mask data as mask register output data 02, 03, 04, 05, 06, 07, 08, 09, 10 to bit replacement circuit 23 in valid bit discrimination unit 11.

[0008] Bit pattern generation circuit 13 in valid bit discrimination unit 11 generates a pattern of the same bit width as mask register 01, with "1" from bit 0 of mask register 01 up to the bit whose number is given by the value of effective vector length 12 and "0" for the bits after that, and outputs it as bit pattern generation circuit outputs 14, 15, 16, 17, 18, 19, 20, 21, 22 to bit replacement circuit 23.
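The pattern produced by bit pattern generation circuit 13 can be illustrated with a short Python sketch. The function name `bit_pattern` and the list representation (head bit at index 0) are assumptions for illustration. The sketch also assumes that the first `vector_length` bits are set to 1, which is how the FIG. 3 example behaves (a value of 8 gives the 9-bit pattern "111111110").

```python
def bit_pattern(width, vector_length):
    """Return a width-bit pattern: 1 for the first vector_length bits, 0 afterwards.

    Index 0 is the head bit (assumed representation).
    """
    return [1 if i < vector_length else 0 for i in range(width)]


pattern = bit_pattern(9, 8)
print("".join(str(b) for b in pattern))  # prints 111111110
```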

[0009] Bit replacement circuit 23 replaces with "0" each bit of mask register output data 02, 03, 04, 05, 06, 07, 08, 09, 10 whose same-numbered bit in the input bit pattern generation circuit outputs 14, 15, 16, 17, 18, 19, 20, 21, 22 is "0", and leaves unchanged each bit of the mask register output data whose corresponding pattern bit is "1". It outputs the result as bit replacement circuit output data 24, 25, 26, 27, 28, 29, 30, 31, 32 to bit "1" count operation unit 33, which calculates the number of bits whose value is "1".
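Functionally, the replacement performed by bit replacement circuit 23 is a bitwise AND of the mask data with the generated pattern. A minimal Python sketch follows; the name `replace_bits` and the list-of-bits representation are illustrative assumptions, and the sample values are those of the FIG. 3 example.

```python
def replace_bits(mask_bits, pattern_bits):
    """Force a mask bit to 0 wherever the pattern bit is 0; keep it otherwise.

    Both arguments are equal-length lists of 0/1 integers, head bit first.
    """
    return [m & p for m, p in zip(mask_bits, pattern_bits)]


mask = [1, 0, 1, 1, 1, 0, 1, 0, 1]      # "101110101"
pattern = [1, 1, 1, 1, 1, 1, 1, 1, 0]   # "111111110"
print("".join(str(b) for b in replace_bits(mask, pattern)))  # prints 101110100
```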

[0010] In first operation unit group 43 of bit "1" count operation unit 33, the 9 bits of bit replacement circuit output data 24, 25, 26, 27, 28, 29, 30, 31, 32 are divided into three groups of 3 bits each. Bit replacement circuit output data 24 and 25 are input to 1-bit full adder 34 as its two input data, with bit replacement circuit output data 26 as its carry data; bit replacement circuit output data 27 and 28 are input to 1-bit full adder 35 as its two input data, with bit replacement circuit output data 29 as its carry data; and bit replacement circuit output data 30 and 31 are input to 1-bit full adder 36 as its two input data, with bit replacement circuit output data 32 as its carry data. From the 3-bit groups of bit replacement circuit output data 24 to 32, 1-bit full adders 34, 35, 36 output carry output data 37, 38, 39, each with a decimal weight of 2, and sum output data 40, 41, 42, each with a decimal weight of 1.

In second operation unit group 46, of carry output data 37, 38, 39, which all have a decimal weight of 2, data 37 and 38 are input to 1-bit full adder 44 as its two input data and carry output data 39 as its carry data. Likewise, of sum output data 40, 41, 42, which all have a decimal weight of 1, data 40 and 41 are input to 1-bit full adder 45 as its two input data and sum output data 42 as its carry data. 1-bit full adder 44 thus outputs carry output data 47 with a decimal weight of 4 and sum output data 49 with a decimal weight of 2, and 1-bit full adder 45 outputs carry output data 48 with a decimal weight of 2 and sum output data with a decimal weight of 1. Because the sum output of 1-bit full adder 45 is the only bit with a decimal weight of 1, it is output as operation result bit 3 output data 50.

In third operation unit group 59, two data bits have a decimal weight of 2, namely carry output data 48 and sum output data 49, so these are input to 1-bit full adder 52 as its two input data, with "0" signal input 51 as its carry data. 1-bit full adder 52 outputs the carry of this addition as carry output data 53, with a decimal weight of 4; its sum output is the only bit with a decimal weight of 2 and is output as operation result bit 2 output data 54. Similarly, two data bits have a decimal weight of 4, namely carry output data 47 and carry output data 53, so these are input to 1-bit full adder 56 as its two input data, with "0" signal input 55 as its carry data. The carry output of 1-bit full adder 56 is the only bit with a decimal weight of 8 and is output as operation result bit 0 output data 57; its sum output is the only bit with a decimal weight of 4 and is output as operation result bit 1 output data 58.

Operation result 61, formed by arranging operation result bit 0 output data 57, operation result bit 1 output data 58, operation result bit 2 output data 54, and operation result bit 3 output data 50 in bit-number order, is the result of counting the bits whose value is "1".
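The adder network of the first embodiment can be modeled in software as follows. This is a minimal Python sketch rather than the circuit itself: the function names `full_adder` and `count_ones_9bit` are illustrative, the nine inputs are taken as a list with data 24 at index 0, and variable names borrow the reference numerals (for example `c37` for carry output data 37 and `s40` for sum output data 40).

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (carry_out, sum)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (b & carry_in) | (a & carry_in)
    return carry_out, s


def count_ones_9bit(d):
    """Count the 1 bits in nine bits d[0..8] using seven full adders.

    Returns [bit0, bit1, bit2, bit3] with weights 8, 4, 2, 1.
    """
    # First operation unit group: three bits per adder.
    c37, s40 = full_adder(d[0], d[1], d[2])   # adder 34
    c38, s41 = full_adder(d[3], d[4], d[5])   # adder 35
    c39, s42 = full_adder(d[6], d[7], d[8])   # adder 36
    # Second group: combine bits of equal weight.
    c47, s49 = full_adder(c37, c38, c39)      # adder 44: weights 4 and 2
    c48, s50 = full_adder(s40, s41, s42)      # adder 45: weights 2 and 1
    # Third group: reduce the remaining pairs of equal weight.
    c53, s54 = full_adder(c48, s49, 0)        # adder 52: weights 4 and 2
    c57, s58 = full_adder(c47, c53, 0)        # adder 56: weights 8 and 4
    return [c57, s58, s54, s50]


print(count_ones_9bit([1, 0, 1, 1, 1, 0, 1, 0, 0]))  # prints [0, 1, 0, 1]
```

The signal path passes through at most four adders regardless of the effective vector length, whereas a bit-serial counter needs one step per examined bit.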

[0011] Next, FIG. 3 illustrates the operation of this embodiment when the value of mask register 01 is "101110101" and the value of effective vector length 12 is 8, that is, "1000" in binary.

[0012] From effective vector length 12, "111111110" is output as the bit pattern generation circuit output data.

[0013] "101110101" is output as the mask register output data.

[0014] The bit replacement circuit replaces with "0" the bits of the mask register output data that correspond to bits whose value is "0" in the bit pattern generation circuit output data, and outputs "101110100" as the bit replacement circuit output data, divided into groups of 3 bits, to 1-bit full adders 34, 35, 36.

[0015] Each of 1-bit full adders 34, 35, 36 operates on its 3 input bits. As a result, 1-bit full adder 34 outputs carry output "1" and sum output "0", 1-bit full adder 35 outputs carry output "1" and sum output "0", and 1-bit full adder 36 outputs carry output "0" and sum output "1".

[0016] 1-bit full adder 44 outputs carry output "1" and sum output "0" from its 3 input bits. Similarly, 1-bit full adder 45 outputs carry output "0" and sum output "1" from its inputs.

[0017] 1-bit full adder 52 outputs carry output "0" and sum output "0" from the sum output "0" of 1-bit full adder 44, the carry output "0" of 1-bit full adder 45, and the "0" signal.

[0018] 1-bit full adder 56 outputs carry output "0" and sum output "1" from the carry output "1" of 1-bit full adder 44, the carry output "0" of 1-bit full adder 52, and the "0" signal.

[0019] Arranging the carry output and sum output of 1-bit full adder 56, the sum output of 1-bit full adder 52, and the sum output of 1-bit full adder 45 in order of bit weight gives the operation result "0101" in binary, that is, "5" in decimal. This is the number of "1" bits in the bit replacement circuit output.

[0020] In the first embodiment described above, valid bit discrimination is applied to all the mask data. The valid-bit-discriminated input data of a given bit width is divided equally into groups of an arbitrary bit width, and a plurality of operation units each calculate the number of bits whose value is "1" in one group. The bits of equal weight in the results output by those operation units are then separately divided equally and input to further operation units that output the number of "1" bits in each group. By repeating this processing hierarchically, the operation can be performed faster than the conventional processing, which accumulates with an addition counter for as many steps as the effective vector length specifies.
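The hierarchical scheme summarized above extends to other bit widths. The following Python sketch is one possible software model under stated assumptions: it groups bits of equal weight into columns and reduces each column with 1-bit full adders until a single bit of each weight remains. The function names, the choice of 3-bit groups, and the order in which bits within a column are combined are assumptions of this sketch; the patent does not prescribe them, and in hardware the adders of one level would operate in parallel.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (carry_out, sum)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (b & carry_in) | (a & carry_in)
    return carry_out, s


def count_ones_tree(bits):
    """Count 1 bits by reducing columns of equal-weight bits with full adders."""
    columns = {0: list(bits)}  # columns[w] holds bits of weight 2**w
    result_bits = []           # result_bits[w] is the final bit of weight 2**w
    w = 0
    while w in columns:
        col = columns[w]
        while len(col) > 1:
            # Take up to three bits of this weight, padding with 0 if needed.
            a, b, c = (col[:3] + [0, 0])[:3]
            col = col[3:]
            carry, s = full_adder(a, b, c)
            col.append(s)                                # same weight
            columns.setdefault(w + 1, []).append(carry)  # next higher weight
        result_bits.append(col[0] if col else 0)
        w += 1
    return sum(bit << i for i, bit in enumerate(result_bits))


print(count_ones_tree([1, 0, 1, 1, 1, 0, 1, 0, 0]))  # prints 5
```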

[0021] FIGS. 2 and 4 are a configuration diagram and an operation explanatory diagram of a second embodiment of the present invention.

[0022] The operation when mask register 01 is 9 bits wide is described; description of the parts identical to the first embodiment is omitted. All the mask data of mask register 01, the register that stores the mask data that determines whether the vector data stored in the vector register is valid or invalid, is output as mask register output data 02, 03, 04, 05, 06, 07, 08, 09, 10 to bit replacement circuit 23 in valid bit discrimination unit 11.

[0023] Bit pattern generation circuit 13 in valid bit discrimination unit 11 generates a pattern of the same bit width as mask register 01, with "1" from bit 0 of mask register 01 up to the bit whose number is given by the value of effective vector length 12 and "0" for the bits after that, and outputs it as bit pattern generation circuit outputs 14, 15, 16, 17, 18, 19, 20, 21, 22 to bit replacement circuit 23.

[0024] Bit replacement circuit 23 replaces with "0" each bit of mask register output data 02, 03, 04, 05, 06, 07, 08, 09, 10 whose same-numbered bit in the input bit pattern generation circuit outputs 14, 15, 16, 17, 18, 19, 20, 21, 22 is "0", and leaves unchanged each bit of the mask register output data whose corresponding pattern bit is "1". It outputs the result as bit replacement circuit output data 24, 25, 26, 27, 28, 29, 30, 31, 32 to bit "1" count operation unit 33, which calculates the number of bits whose value is "1".

[0025] In first operation unit group 43 of bit "1" count operation unit 33, the 9 bits of bit replacement circuit output data 24, 25, 26, 27, 28, 29, 30, 31, 32 are divided into three groups of 3 bits each. Bit replacement circuit output data 24 and 25 are input to 1-bit full adder 34 as its two input data, with bit replacement circuit output data 26 as its carry data; bit replacement circuit output data 27 and 28 are input to 1-bit full adder 35 as its two input data, with bit replacement circuit output data 29 as its carry data; and bit replacement circuit output data 30 and 31 are input to 1-bit full adder 36 as its two input data, with bit replacement circuit output data 32 as its carry data. From the 3-bit groups of bit replacement circuit output data 24 to 32, 1-bit full adders 34, 35, 36 output carry output data 37, 38, 39, each with a decimal weight of 2, and sum output data 40, 41, 42, each with a decimal weight of 1.

In second operation unit group 46, of carry output data 37, 38, 39, which all have a decimal weight of 2, data 37 and 38 are input to 1-bit full adder 44 as its two input data and carry output data 39 as its carry data. Likewise, of sum output data 40, 41, 42, which all have a decimal weight of 1, data 40 and 41 are input to 1-bit full adder 45 as its two input data and sum output data 42 as its carry data. 1-bit full adder 44 thus outputs carry output data 47 with a decimal weight of 4 and sum output data 49 with a decimal weight of 2, and 1-bit full adder 45 outputs carry output data 48 with a decimal weight of 2 and sum output data with a decimal weight of 1. Because the sum output of 1-bit full adder 45 is the only bit with a decimal weight of 1, it is output as operation result bit 3 output data 50.

Carry output data 47 (decimal weight 4) and sum output data 49 (decimal weight 2), both output from 1-bit full adder 44, are arranged in order of bit weight to form a 2-bit binary number, and carry output data 48 (decimal weight 2), output from 1-bit full adder 45, is treated as a 1-bit binary number; both are input to two-input adder 60. The 3-bit output of two-input adder 60 forms the upper 3 bits of the operation result and is output as operation result bit 0 output data 56, operation result bit 1 output data 57, and operation result bit 2 output data 53.

Operation result 61, formed by arranging operation result bit 0 output data 56, operation result bit 1 output data 57, operation result bit 2 output data 53, and operation result bit 3 output data 50 in bit-number order, is the result of counting the bits whose value is "1".
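The second embodiment keeps the first two adder levels and replaces the last two full adders with a single two-input adder. A minimal Python sketch, with illustrative function names and with the two-input adder 60 modeled by ordinary integer addition (an assumption about its behavior, not a description of its circuit):

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (carry_out, sum)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (b & carry_in) | (a & carry_in)
    return carry_out, s


def count_ones_9bit_v2(d):
    """Count the 1 bits in nine bits d[0..8]; final stage is a two-input adder.

    Returns [bit0, bit1, bit2, bit3] with weights 8, 4, 2, 1.
    """
    c37, s40 = full_adder(d[0], d[1], d[2])   # adder 34
    c38, s41 = full_adder(d[3], d[4], d[5])   # adder 35
    c39, s42 = full_adder(d[6], d[7], d[8])   # adder 36
    c47, s49 = full_adder(c37, c38, c39)      # adder 44: weights 4 and 2
    c48, s50 = full_adder(s40, s41, s42)      # adder 45: weights 2 and 1
    # Two-input adder 60: 2-bit number (c47, s49) plus 1-bit number c48,
    # both expressed in units of weight 2.
    upper = ((c47 << 1) | s49) + c48          # value 0..4, fits in 3 bits
    return [(upper >> 2) & 1, (upper >> 1) & 1, upper & 1, s50]


print(count_ones_9bit_v2([1, 0, 1, 1, 1, 0, 1, 0, 0]))  # prints [0, 1, 0, 1]
```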

[0026] Next, FIG. 4 illustrates the operation of the second embodiment when the value of mask register 01 is "101110101" and the value of effective vector length 12 is 8, that is, "1000" in binary. Description of the parts identical to the first embodiment is omitted.

[0027] Two-input adder 60 outputs the result "010" from its two inputs: the output "10" of 1-bit full adder 44 and the carry output "0" of 1-bit full adder 45.

[0028] Arranging the 3 output bits of two-input adder 60 and the sum output of 1-bit full adder 45 in order of bit weight gives the operation result "0101" in binary, that is, "5" in decimal. This is the number of "1" bits in the bit replacement circuit output.

[0029] In the second embodiment described above, valid bit discrimination is applied to all the mask data. The valid-bit-discriminated input data of a given bit width is divided equally into groups of an arbitrary bit width, and a plurality of operation units each calculate the number of bits whose value is "1" in one group. The bits of equal weight in the results output by those operation units are then separately divided equally and input to further operation units that output the number of "1" bits in each group. This processing is performed hierarchically, and at the level where the operation unit outputs contain two bits for each of the upper weights, the bits are arranged in order of weight and input to an adder, whose output is output as the operation result for those upper-weight bits. In this way the operation can be performed faster than the conventional processing, which accumulates with an addition counter for as many steps as the effective vector length specifies.

[0030]

[Effects of the Invention] As described above, the present invention uses a valid bit discrimination unit that, for all bits of the mask data output from the mask register, outputs unchanged the mask data from the head bit up to the bit designated by the effective vector length and outputs the mask data from the bit designated by the effective vector length up to the maximum effective length bit replaced with "0", together with an operation unit that calculates the number of bits whose value is "1" in all of the valid-bit-discriminated mask data output from the valid bit discrimination unit. This has the effect that the operation of counting the "1" bits among the bits of the mask data from the head up to the number of bits specified by the effective vector length can be processed at high speed.

[Brief description of the drawings]

FIG. 1 is a block diagram of the first embodiment of the present invention.

FIG. 2 is a block diagram of the second embodiment of the present invention.

FIG. 3 is a conceptual diagram illustrating the operation of the first embodiment of the present invention.

FIG. 4 is a conceptual diagram illustrating the operation of the second embodiment of the present invention.

[Explanation of symbols]

01 mask register
02 to 10 mask register output data
11 valid bit discrimination unit
12 effective vector length
13 bit pattern generation circuit
14 to 22 bit pattern generation circuit output data
23 bit replacement circuit
24 to 32 bit replacement circuit output data
33 bit "1" count operation unit
34 to 36 1-bit full adders
37 to 39 carry output data
40 to 42 sum output data
43 first operation unit group
44 to 45 1-bit full adders
46 second operation unit group
47 to 48 carry output data
49 sum output data
50 operation result bit 3 output data
51 "0" signal input
52 1-bit full adder
53 carry output data
54 operation result bit 2 output data
55 "0" signal input
56 1-bit full adder
57 operation result bit 0 output data
58 operation result bit 1 output data
59 third operation unit group
60 two-input adder
61 operation result

Claims

(57) [Claims]

1. A vector arithmetic unit comprising: a vector register; a mask register that outputs mask data, corresponding to the number of words of the vector register, indicating whether the operation on the vector data is enabled or disabled; a valid bit discrimination unit that, for all bits of the mask data output from the mask register, outputs unchanged the mask data from the head bit up to the bit designated by the effective vector length, and outputs the mask data from the bit designated by the effective vector length up to the maximum effective length bit replaced with "0"; and a bit "1" count operation unit that calculates the number of bits whose value is "1" in all of the valid-bit-discriminated mask data output from the valid bit discrimination unit.

2. The vector arithmetic unit according to claim 1, wherein the bit "1" count operation unit comprises: a first operation unit group having a plurality of addition operation units that receive input data of a given bit width divided equally into an arbitrary bit width and each calculate the number of bits whose value is "1" in its input data; and a second operation unit group that receives, separately and equally divided, the bits of equal weight in the operation result data output from the first operation unit group and outputs the number of bits whose value is "1" in each piece of equally divided data, the second operation unit group being configured hierarchically until only one bit of each weight remains in its operation result, and the output data of the hierarchically configured second operation unit group being arranged in order of bit weight and output as the operation result.

3. The vector arithmetic unit according to claim 1, wherein the bit "1" count operation unit comprises: a first operation unit group having a plurality of addition operation units that receive input data of a given bit width divided equally into an arbitrary bit width and each calculate the number of bits whose value is "1" in its input data; a second operation unit group that receives, separately and equally divided, the bits of equal weight in the operation result data output from the first operation unit group and outputs the number of bits whose value is "1" in each piece of equally divided data, the second operation unit group being configured hierarchically until the number of bits of equal weight in its operation result is two or fewer; and a two-input adder that receives the bits of equal weight in the operation result output from the second operation unit group, arranged in order of their weights, and outputs the operation result.