JP2928301B2

JP2928301B2 - Vector processing equipment

Info

Publication number: JP2928301B2
Application number: JP33594289A
Authority: JP
Inventors: 正守柏山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-12-25
Filing date: 1989-12-25
Publication date: 1999-08-03
Anticipated expiration: 2014-08-03
Also published as: JPH03196257A

Description

[Field of Industrial Application] The present invention relates to a vector data processing device, and more particularly to the compression and expansion of vector elements.

[Prior Art] A vector data processing device can use a plurality of arithmetic units with their operation overlapped in time by means of its chaining function, and thus has high data processing capability.

That is, to achieve high data processing capability, a vector data processing device processes several vector elements in parallel within one machine cycle.

Therefore, in the case of 4-element parallel processing, for example, the device consists of independent sets of vector registers and arithmetic units: a set that processes element 4n (n = 0, 1, 2, ...), a set that processes element 4n+1, a set that processes element 4n+2, and a set that processes element 4n+3. Each set can process one element per machine cycle, so four elements are processed per machine cycle.

A known example of such a vector element processing device is the one introduced in Nikkei Electronics, December 28, 1987 (No. 437), pp. 111-125.

In image processing and similar applications, the valid elements of vector data are sometimes compressed and expanded to enable high-speed operation. For example, in a matrix operation, component data that does not need to take part in the operation between matrices, or component data whose result can be predicted in advance, is excluded so that the matrix data is compressed. The operation is performed on the compressed data, and the resulting data is then restored (expanded) to the arrangement of the original matrix.
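The compress, operate, expand sequence described above can be sketched in software as follows. This is only an illustrative model, not the hardware of the invention: the function names `compress`, `expand`, and `masked_apply` are hypothetical, and leaving excluded positions at their original values is an assumption made for the example.

```python
from typing import Callable, List, Sequence


def compress(data: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Keep only the elements whose mask bit is 1, closing the gaps."""
    return [x for x, m in zip(data, mask) if m == 1]


def expand(packed: Sequence[int], mask: Sequence[int],
           original: Sequence[int]) -> List[int]:
    """Put packed elements back at the positions whose mask bit is 1.

    Assumption: positions whose mask bit is 0 keep their original value.
    """
    it = iter(packed)
    return [next(it) if m == 1 else keep for m, keep in zip(mask, original)]


def masked_apply(data: Sequence[int], mask: Sequence[int],
                 op: Callable[[int], int]) -> List[int]:
    """Compress, operate on the valid elements only, then expand."""
    packed = compress(data, mask)
    results = [op(x) for x in packed]
    return expand(results, mask, data)


row = [3, 0, 0, 5, 7, 0, 2, 0]
mask = [1, 0, 0, 1, 1, 0, 1, 0]
print(compress(row, mask))                       # [3, 5, 7, 2]
print(masked_apply(row, mask, lambda x: x * x))  # [9, 0, 0, 25, 49, 0, 4, 0]
```

Only four of the eight elements pass through the operation, which is the saving that compression is meant to provide.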

Such compression and expansion of vector data is easy when element parallel processing is not performed, that is, when elements are processed one at a time. However, the operation cannot then be speeded up.

Therefore, when compression and expansion are performed in image processing and the like, vector elements must be transferred between the mutually independent element-parallel sets so that element parallel processing remains possible.

For this reason, various efficient means of compressing and expanding vector data have been studied.

Known prior art includes, for example, the first prior art described in JP-A-60-263268 and the second prior art described in JP-A-59-140581.

In these prior arts, a mask bit indicating whether each vector element is valid or invalid is provided for each vector element. For example, when an operation between matrices of four elements each is performed with 4-element parallel processing, an aligner circuit extracts, from a first group of four elements, only the elements whose mask bits indicate valid (that is, elements on which the operation is to be performed), packs them together so that they are contiguous, and writes them into the first row of a data buffer in which one row holds four elements. The aligner circuit then extracts, from the next, second group of four elements, only the elements whose mask bits indicate valid, packs them together, and writes as many of them as there are free positions into the remaining part of the first row of the data buffer. Compression is realized in this way. In other words, to obtain one compressed row, the same row of the data buffer must be written several times. In addition, those elements of the second group extracted by the aligner circuit that were not written into the first row must be written into the row following the first row.

Expansion can likewise be realized with the hardware configuration described in the second prior art. The compressed data of the first row, four elements in this example, is read from the data buffer. Based on the mask bit information, the aligner circuit extracts from the four elements only those belonging to a first matrix and rearranges them into their original order to restore the first matrix. From the remaining elements, it extracts only those belonging to a second matrix and rearranges them into their original order. If elements are still missing, the second row is read from the data buffer, only the elements belonging to the second matrix are extracted and rearranged into their original order, and they are combined with the elements previously restored from the first row to restore the second matrix. In this case, the elements of the second row that do not belong to the second matrix must be used to restore subsequent matrices.

[Problems to be Solved by the Invention] With the prior art described above, control of the data buffer write address, the switching timing, and the like becomes complicated. As noted above, the control is particularly complicated when a row before compression or expansion spans different rows after processing.

Moreover, implementing such complicated control increases the amount of hardware.

Accordingly, an object of the present invention is to provide a vector data processing device that can compress and expand vector elements under simple control, without increasing the amount of hardware.

[Means for Solving the Problems] To achieve the above object, the present invention provides a first vector data processing device comprising: a first vector register that holds m vector data, each consisting of n vector elements, with the vector elements belonging to the same vector data held in parallel; a second vector register that holds one or more vector data, each consisting of n vector elements, with the vector elements belonging to the same vector data held in parallel; a vector mask register having n × m mask bits that indicate, for each vector element held in the first vector register, whether the data is valid or invalid; a data buffer capable of holding n vector elements in parallel; aligner means that takes the vector data in the order in which they are stored in the first vector register and, among the vector elements of each vector data, stores those whose corresponding mask bits indicate valid into the data buffer one after another, in the order of their parallel arrangement in the first vector register, so that they are contiguous in the storage space; and means that, at the point when n vector elements have been stored in the data buffer, stores the vector elements held in the data buffer into the second vector register in parallel.

To achieve the above object, the present invention also provides a second vector data processing device comprising: a first vector register that holds one or more vector data, each consisting of n vector elements, with the vector elements belonging to the same vector data held in parallel; a second vector register that holds m vector data, each consisting of n vector elements, with the vector elements belonging to the same vector data held in parallel; a vector mask register having n × m mask bits that indicate, for each vector element of the vector data to be stored in the second vector register, whether the data is valid or invalid; a data buffer capable of holding n vector elements; means that, taking one vector data to be stored in the second vector register as a unit, extracts from the first vector register as many vector elements as there are mask bits indicating valid among the mask bits corresponding to that vector data, in the order in which the vector data are stored in the first vector register and starting from the earliest position in the parallel arrangement in the first vector register, and stores them in the data buffer in parallel as the valid vector elements of that vector data; aligner means that rearranges the valid vector elements for the one vector data stored in the data buffer according to the arrangement of the mask bits indicating valid among the mask bits corresponding to that vector data; and means that stores, in parallel, only the rearranged valid vector elements for the one vector data into the storage areas of the second vector register that correspond to the mask bits indicating valid, based on the mask bits corresponding to that vector data.

To achieve the above object, the present invention further provides a third vector data processing device comprising: one or more sets of vector registers, each holding m vector data, each consisting of n vector elements, with the vector elements belonging to the same vector data held in parallel; a vector mask register having n × m mask bits that indicate, for each vector element to be processed, whether the data is valid or invalid; a first data buffer capable of holding n vector elements in parallel; a second data buffer capable of holding n vector elements in parallel; storage means that, during compression, extracts in parallel, in the order of the parallel arrangement in the vector register, those vector elements of each vector data being processed whose corresponding mask bits indicate valid, and stores them in the first data buffer, and that, during expansion, taking one vector data as a unit, extracts from the vector register as many vector elements as there are mask bits indicating valid among the mask bits corresponding to each vector data being processed, in the order in which the vector data are stored in the vector register and starting from the earliest position in the parallel arrangement, and stores them in the first data buffer as the valid vector elements of that vector data; aligner means that, depending on the processing, refers to the mask bits corresponding to the vector elements stored in the first data buffer, rearranges those vector elements one after another, and stores them in the second data buffer; and means that, depending on the processing, refers to the mask bits corresponding to the vector elements stored in the second data buffer and controls the parallel writing of those vector elements to the vector register.

That is, to achieve the above object, the present invention realizes the compression or expansion of vector data held in a plurality of vector registers provided per element-parallel unit by dividing it into three kinds of processing: 1. reading from the vector registers to a data buffer; 2. order conversion by the aligner circuit; and 3. writing from the data buffer to the vector registers.

Specifically, vector data compression is realized by the following:

1. Logic that reads vector elements from the vector registers.

2. Logic that sequentially sets the vector elements read in step 1 into data buffers, of which there are as many as the element-parallel count.

3. Logic that reads out the vector elements set in step 2, as many as the number of valid mask bits, and writes them to the vector registers.
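The three compression steps can be modeled in software roughly as below. This is a behavioral sketch only, assuming an element-parallel count of four as in the embodiment; the name `compress_rows` is hypothetical, and returning a final partial row separately is an assumption of this sketch rather than something this passage specifies.

```python
LANES = 4  # element-parallel count used in the embodiment


def compress_rows(rows, mask_rows):
    """Behavioral model of the three compression steps.

    rows:      4-element rows as read in parallel from the source registers
    mask_rows: matching rows of 0/1 mask bits
    Returns (rows written to the destination, leftover partial row).
    """
    written = []
    buffer = []
    for row, mask in zip(rows, mask_rows):   # step 1: read one row of elements
        for value, bit in zip(row, mask):    # elements are examined serially
            if bit != 1:
                continue
            buffer.append(value)             # step 2: set into the next free buffer slot
            if len(buffer) == LANES:         # step 3: a full row is written in one access
                written.append(buffer)
                buffer = []
    return written, buffer


rows = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
masks = [[1, 0, 0, 1], [1, 0, 1, 0], [0, 1, 1, 1]]
print(compress_rows(rows, masks))  # ([[0, 3, 4, 6]], [9, 10, 11])
```

Each destination row is written exactly once, when four valid elements have accumulated, in contrast to the repeated writes to one buffer row in the prior art.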

Vector data expansion is realized by the following:

1. Logic that reads from the vector registers only the vector elements for which a mask bit is valid.

2. Logic that sets each vector element read in step 1 into the data buffer, among the data buffers provided per element-parallel lane, that corresponds to an element whose mask bit indicates valid.

3. Logic that reads the contents of the data buffers sequentially, issues a write instruction only to the vector registers corresponding to elements whose mask bits indicate valid, and writes the valid data from step 2.
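The three expansion steps can be sketched in the same way. Again this is a hedged behavioral model with an assumed element-parallel count of four; `expand_rows` and its parameters are hypothetical names, and the destination rows are passed in so that lanes whose mask bit is invalid are visibly left unwritten.

```python
LANES = 4  # element-parallel count used in the embodiment


def expand_rows(packed, mask_rows, dest_rows):
    """Behavioral model of the three expansion steps.

    packed:    valid elements in storage order (the compressed form)
    mask_rows: one list of 0/1 mask bits per destination row
    dest_rows: current destination rows; lanes with mask bit 0 are not written
    """
    pos = 0
    out = []
    for mask, dest in zip(mask_rows, dest_rows):
        # Step 1: read only as many elements as there are valid mask bits.
        count = sum(mask)
        chunk = packed[pos:pos + count]
        pos += count
        # Step 2: set each element into the buffer slot of a valid lane.
        buffer = [None] * LANES
        valid_lanes = [lane for lane, bit in enumerate(mask) if bit == 1]
        for lane, value in zip(valid_lanes, chunk):
            buffer[lane] = value
        # Step 3: write only the lanes whose mask bit is valid.
        row = list(dest)
        for lane in valid_lanes:
            row[lane] = buffer[lane]
        out.append(row)
    return out


result = expand_rows([0, 3, 4, 6],
                     [[1, 0, 0, 1], [1, 0, 1, 0]],
                     [["-"] * 4, ["-"] * 4])
print(result)  # [[0, '-', '-', 3], [4, '-', 6, '-']]
```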

[Operation] In the first vector data processing device according to the present invention, compression proceeds as follows. First, the aligner means takes the vector data in the order in which they are stored in the first vector register and, among the vector elements of each vector data, stores those whose corresponding mask bits indicate valid into the data buffer one after another, in the order of their parallel arrangement in the first vector register, so that they are contiguous in the storage space.

Then, at the point when n vector elements have been stored in the data buffer, the vector elements held in the data buffer are stored into the second vector register in parallel.

In the second vector data processing device according to the present invention, first, taking one vector data to be stored in the second vector register as a unit, as many vector elements as there are mask bits indicating valid among the mask bits corresponding to that vector data are extracted from the first vector register, in the order in which the vector data are stored in the first vector register and starting from the earliest position in the parallel arrangement, and are stored in the data buffer in parallel as the valid vector elements of that vector data. The aligner means then rearranges the valid vector elements for the one vector data stored in the data buffer according to the arrangement of the mask bits indicating valid among the mask bits corresponding to that vector data. After that, only the rearranged valid vector elements for the one vector data are stored in parallel, based on the mask bits corresponding to that vector data, in the storage areas of the second vector register that correspond to the mask bits indicating valid.

In the third vector data processing device according to the present invention, during compression, those vector elements of each vector data being processed whose corresponding mask bits indicate valid are first extracted in parallel, in the order of the parallel arrangement in the vector register, and stored in the first data buffer. During expansion, taking one vector data as a unit, as many vector elements as there are mask bits indicating valid among the mask bits corresponding to each vector data being processed are first extracted, in the order in which the vector data are stored in the vector register and starting from the earliest position in the parallel arrangement, and stored in the first data buffer as the valid vector elements of that vector data.

The vector elements stored in the first data buffer are then rearranged one after another by the aligner means, which refers to the corresponding mask bits as required by the processing, and are stored in the second data buffer. The vector elements stored in the second data buffer are written to the vector register in parallel, based on the corresponding mask bits as required by the processing.

[Embodiment] An embodiment of the vector processing device according to the present invention is described below.

FIG. 1 shows the configuration of the vector processing device according to this embodiment.

The vector registers 1a to 1d, which are independent for each element-parallel lane, are composed of RAM. Vector register 1a holds element 4n of the vector data, vector register 1b holds element 4n+1, vector register 1c holds element 4n+2, and vector register 1d holds element 4n+3 (n = 0, 1, 2, 3, ...).
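The element-to-register assignment can be expressed as a simple index calculation. The following is an illustrative sketch only: the function names are hypothetical, and treating n as the RAM address within each register is an assumption made for the example.

```python
from typing import Dict, List, Tuple

LANES = 4  # vector registers 1a, 1b, 1c, 1d


def bank_and_address(element: int) -> Tuple[str, int]:
    """Map an element number to (register name, assumed RAM address n)."""
    bank = "1" + "abcd"[element % LANES]
    return bank, element // LANES


def distribute(vector: List[int]) -> Dict[str, List[int]]:
    """Split a vector across the four registers in element order."""
    banks: Dict[str, List[int]] = {"1a": [], "1b": [], "1c": [], "1d": []}
    for i, value in enumerate(vector):
        bank, _ = bank_and_address(i)
        banks[bank].append(value)
    return banks


print(bank_and_address(6))        # ('1c', 1), since 6 = 4*1 + 2
print(distribute(list(range(8))))
# {'1a': [0, 4], '1b': [1, 5], '1c': [2, 6], '1d': [3, 7]}
```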

Although not shown, each of the vector registers 1a to 1d consists of a plurality of sets, and the register whose number is specified by the program is selected for reading.

Read address generation circuits 2a to 2d, each of which indicates the address of the element to be read, are provided for the vector registers 1a to 1d, respectively. The read address generation circuits 2a to 2d are built from counter logic. In response to count-up instructions issued to each of them individually by the vector data read control circuit 4, they generate the RAM addresses at which vector data elements are stored, and the held data is output from the vector registers 1a to 1d.

In normal operation, the vector data read control circuit 4 sends count-up instructions to the read address generation circuits 2a to 2d simultaneously, so the read addresses are incremented together and four elements of vector data are read at the same time.

The vector mask registers 3a to 3d hold, for each vector element, a 1-bit mask bit indicating whether the element data of the vector data held in the vector registers is valid or invalid. The vector mask registers 3a to 3d are provided independently, one for each of the vector registers 1a to 1d.

The data buffers 5a to 5d are provided for the vector registers 1a to 1d, respectively, and hold the data read from each vector register.

The data buffers 6a to 6d are provided for the data buffers 5a to 5d, respectively, and receive the contents of the data buffers 5a to 5d.

The mask data buffers 9a to 9d are provided for the vector mask registers 3a to 3d, respectively, and hold the mask bits read from each vector mask register.

During expansion instruction processing, the contents of the mask data buffers 9a to 9d are copied to the mask data buffers 10a to 10d.

The aligner circuit 7 has a crossbar switch structure in this example. It can copy the vector element data held in any of the data buffers 6a to 6d to any of the data buffers 8a to 8d, and it operates according to the contents of the mask data buffers 9a to 9d and the mask data buffers 10a to 10d.

The data buffers 11a to 11d are provided for the data buffers 8a to 8d, respectively, and receive the contents of the data buffers 8a to 8d.

During compression instruction processing, the mask count circuit 14 counts the number of valid mask bits ('1') in the mask data buffers 9a to 9d and generates the write signals for the vector registers 12a to 12d via the selectors 17a to 17d.

The vector data write control circuit 15 sends count-up instructions to the write address generation circuits 13a to 13d in synchronization with the vector element data and the vector element data read instruction output from the vector data read control circuit 4. The write address generation circuits 13a to 13d generate the write addresses for the vector registers 12a to 12d accordingly. During expansion instruction processing, the selectors 16a to 16d and the selectors 17a to 17d select and output the contents of the mask data buffers 10a to 10d, thereby masking writes to the vector registers 12a to 12d.

The contents of the data buffers 11a to 11d are written to the vector registers 12a to 12d. What is written is the result of the compression or expansion processing.

The write address generation circuits 13a to 13d have the same configuration as the read address generation circuits, and the vector registers 12a to 12d have the same configuration as the vector registers 1a to 1d.

FIG. 2 shows an outline of the compression and expansion processing described below.

The operation of the inter-element compression instruction for vector data is described first.

The vector data stored in the vector registers 1a to 1d is read four elements in parallel in response to a read instruction from the vector data read control circuit 4. In synchronization with the vector data read instruction, four elements' worth of mask bits stored in the vector mask registers 3a to 3d are also read.

The data buffers 5a to 5d temporarily hold, in parallel, the vector element data read from the vector registers 1a to 1d, and are controlled by the vector data read control circuit 4 (the control path is not shown). Vector element data is read from the data buffers 5a to 5d serially, in ascending order of element number.

The data buffers 6a to 6d are set, one after another, with the vector element data read serially from the data buffers 5a to 5d. The set instruction is issued by the vector data read control circuit 4.

The aligner circuit 7 has a crossbar switch structure that can transfer the vector element data held in any of the data buffers 6a to 6d to any of the data buffers 8a to 8d.

The mask data buffers 9a to 9d can hold the mask bits read from the vector mask registers 3a to 3d for each element-parallel lane. The four elements' worth of mask bits are set in the mask data buffers 9a to 9d by an instruction from the vector data read control circuit 4 at the same time as the vector element data read from the data buffers 5a to 5d is serially set, four elements' worth, in the data buffers 6a to 6d. More precisely, all four mask bits are set simultaneously when the first of the four element data is set in the data buffer 6a.

During inter-element compression instruction processing, the aligner circuit examines the mask data buffers 9a to 9d and copies the contents of the data buffers 6a to 6d that correspond to elements whose valid mask bit '1' is set, one after another, into the data buffers 8a to 8d.

For example, if the contents of the mask data buffers 9a to 9d are '1001' in the first round of processing, the contents of data buffer 6a are copied to data buffer 8a and the contents of data buffer 6d are copied to data buffer 8b. If the contents of the mask data buffers 9a to 9d are '1010' in the next round, the contents of data buffer 6a are copied to data buffer 8c and the contents of data buffer 6c are copied to data buffer 8d.
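The '1001' and '1010' example can be reproduced with a small routing model of the aligner. This is a sketch under assumptions: the function `route` and the running `next_slot` pointer are hypothetical, and wrapping the pointer back to 8a after 8d is assumed for the case where a later round continues the sequence.

```python
from typing import List, Tuple

SRC = ["6a", "6b", "6c", "6d"]  # data buffers feeding the aligner
DST = ["8a", "8b", "8c", "8d"]  # data buffers filled by the aligner


def route(mask: str, next_slot: int) -> Tuple[List[Tuple[str, str]], int]:
    """Return the (source, destination) copies for one 4-bit mask.

    next_slot is the index of the next free destination buffer.
    The updated index is returned so the next round continues from it.
    """
    pairs = []
    for lane, bit in enumerate(mask):
        if bit == "1":
            pairs.append((SRC[lane], DST[next_slot]))
            next_slot = (next_slot + 1) % len(DST)  # assumed wrap-around
    return pairs, next_slot


first, slot = route("1001", 0)
second, slot = route("1010", slot)
print(first)   # [('6a', '8a'), ('6d', '8b')]
print(second)  # [('6a', '8c'), ('6c', '8d')]
```

The only state carried between rounds is the index of the next free destination buffer, which illustrates why the control can stay simple.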

The mask count circuit 14 counts the number of valid mask bits ('1') in the mask data buffers 9a to 9d and, when the count of '1's reaches four, generates a write instruction signal for the vector registers 12a to 12d. At that moment, four elements of compressed vector data are held in the data buffers 8a to 8d.
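The behavior of the mask count circuit can be modeled as a counter that raises a write signal on every fourth valid mask bit. The class below is a hypothetical illustration, not the circuit itself; feeding the bits one at a time and restarting the count from zero after each write signal are assumptions of this sketch.

```python
class MaskCounter:
    """Counts valid mask bits and signals a write when a full row is ready."""

    def __init__(self, lanes: int = 4) -> None:
        self.lanes = lanes
        self.count = 0

    def tick(self, bit: int) -> bool:
        """Feed one mask bit; return True when the write signal should fire."""
        if bit != 1:
            return False
        self.count += 1
        if self.count == self.lanes:
            self.count = 0  # assumption: counting restarts for the next row
            return True
        return False


counter = MaskCounter()
bits = [1, 0, 0, 1,  1, 0, 1, 0,  0, 1, 1, 1]
fired_at = [i for i, b in enumerate(bits) if counter.tick(b)]
print(fired_at)       # [6]: the fourth valid bit is at position 6
print(counter.count)  # 3 valid bits are pending for the next row
```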

The data held in the data buffers 8a to 8d is read out serially, starting from data buffer 8a, and set in the data buffers 11a to 11d. When four elements of vector data are ready, they are sent simultaneously to the destination vector registers 12a to 12d. The compressed data is then written, four elements at a time, to the vector registers 12a to 12d in synchronization with the write instruction signal output from the mask count circuit 14.

Meanwhile, the vector data write control circuit 15 sends count-up instructions to the write address generation circuits 13a to 13d in synchronization with the vector data and the vector data read instruction output from the vector data read control circuit 4, thereby incrementing the write addresses of the vector registers 12a to 12d one after another.

Compression instruction processing is realized by the processing described above.

The selectors 17a to 17d issue write instruction signals to the vector registers 12a to 12d, which are provided independently for each element-parallel lane, and select the output of the mask count circuit 14 only during inter-element compression of vector data.

As described above, according to this embodiment, in inter-element compression one compressed row is written with a single access to the vector register, and the control is achieved by the simple method of sequentially counting valid mask bits up to a fixed number. The vector processing device according to this embodiment can therefore be realized with simpler hardware than before.

In this embodiment the aligner circuit has a crossbar switch structure. However, as described above, in the compression processing of this embodiment the vector data is rearranged sequentially, one vector element at a time. The aligner circuit can therefore be given a time switch structure or the like, which further reduces the amount of hardware.

Next, the operation when processing an inter-element expansion instruction for vector data is described.

When executing an inter-element expansion instruction for vector data, the vector data read control circuit 4, in issuing the count-up instructions that serve as read instructions for the vector data held in the vector registers 1a to 1d, reads the mask bits from the vector mask registers 3a to 3d and examines them in order. When a mask bit is '1', that is, when the vector element data is valid, and that mask bit is the first valid bit, it sends a count-up instruction to the read address generation circuit 2a and reads vector data from the vector register 1a. When the next mask bit read is valid, it sends a count-up instruction to the read address generation circuit 2b, then to 2c, and so on. In this way, each time a mask bit is valid, a count-up instruction is sent independently to one of the read address generation circuits 2a to 2d in turn.
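The order in which count-up instructions are issued can be sketched as follows. This is an illustrative model only: the name `count_up_targets` is hypothetical, lanes 0 to 3 stand for the read address generation circuits 2a to 2d, and wrapping from the last lane back to the first is an assumption inferred from the four-lane layout rather than something the text states.

```python
def count_up_targets(mask_bits, lanes=4):
    """For each mask bit, return the lane that receives a count-up instruction.

    A valid bit ('1') selects the next lane in turn (0 = 2a, 1 = 2b, ...).
    An invalid bit ('0') selects no lane, represented here by None.
    """
    targets = []
    next_lane = 0
    for bit in mask_bits:
        if bit:
            targets.append(next_lane)
            next_lane = (next_lane + 1) % lanes   # assumed wrap-around to 2a
        else:
            targets.append(None)
    return targets


print(count_up_targets([1, 0, 0, 1, 0, 1, 1, 1]))
# [0, None, None, 1, None, 2, 3, 0]
```

Only the lanes that actually supply a valid element advance their read address, so the compressed source is consumed strictly in order.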

As a result, during inter-element expansion instruction processing, new vector element data is set in the data buffers 6a to 6d only when the corresponding mask bit is '1'.

The valid data in the data buffers 6a to 6d is routed through the aligner circuit 7 to those data buffers 8a to 8d for which the corresponding bit in the mask data buffers 10a to 10d is '1'. For example, if the contents of the mask data buffers 10a to 10d are '1001' and data from the vector registers 1a to 1d has been transferred only to the data buffers 6a and 6b, the contents of the data buffer 6a are routed to the data buffer 8a and the contents of the data buffer 6b to the data buffer 8d through the aligner circuit 7.
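The '1001' example can be reproduced with a short sketch of the routing step for a single row. The function name `route_row` is hypothetical, and `None` is used here merely to mark output positions that receive no data.

```python
def route_row(packed, mask):
    """Route packed elements to the lanes whose mask bit is 1.

    `packed` stands for the valid contents of data buffers 6a, 6b, ...,
    and the returned list for data buffers 8a-8d. Lanes with mask bit 0
    receive nothing and are shown as None.
    """
    out = [None] * len(mask)
    src = 0
    for lane, bit in enumerate(mask):
        if bit:
            out[lane] = packed[src]   # next packed element goes to this lane
            src += 1
    return out


print(route_row(["6a", "6b"], [1, 0, 0, 1]))
# ['6a', None, None, '6b']  (6a goes to 8a, 6b goes to 8d)
```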

The contents of the mask data buffers 9a to 9d are transferred to the mask data buffers 10a to 10d.

The selectors 16a to 16d generate write instruction signals only for those vector registers 12a to 12d that correspond to elements for which the contents of the mask data buffers 10a to 10d are valid. As a result, only valid vector data is written to the vector registers 12a to 12d.

The processing described above implements the expansion instruction.
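Taken together, the expansion steps can be modelled as a masked write over several rows. This is a minimal sketch under stated assumptions: the name `expand` is hypothetical, and leaving unmasked destination positions unchanged is how this sketch models the absence of a write instruction signal for those lanes.

```python
def expand(packed, mask_rows, dest_rows):
    """Write packed elements into `dest_rows` at positions whose mask bit is 1.

    `packed` is the compressed source in order, `mask_rows` the mask bits per
    destination row, and `dest_rows` the rows of vector registers 12a-12d.
    Positions with mask bit 0 are not written and keep their old contents.
    """
    pos = 0
    for dest, mask in zip(dest_rows, mask_rows):
        for lane, bit in enumerate(mask):
            if bit:                       # write enable only for valid lanes
                dest[lane] = packed[pos]
                pos += 1
    return dest_rows


result = expand(
    packed=[10, 12, 13, 15, 16],
    mask_rows=[[1, 0, 1, 1], [0, 1, 1, 0]],
    dest_rows=[[0, 0, 0, 0], [0, 0, 0, 0]],
)
print(result)  # [[10, 0, 12, 13], [0, 15, 16, 0]]
```

Because each destination row simply takes the next packed elements in order, a packed row that straddles two destination rows needs no special case in this model.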

As described above, according to this embodiment, before the element data is rerouted, the data belonging to one row of the expanded result is extracted from the compressed data, and processing then proceeds in units of one expanded row, which simplifies the subsequent processing. In particular, no special handling is needed even when a row before expansion spans different rows after expansion. This embodiment can therefore provide a vector processor that realizes expansion with little hardware.

Although this embodiment describes a vector processing device that implements both compression instruction processing and expansion instruction processing, only one of the two may be implemented on its own. Also, in this embodiment the transfers between data buffers (between 5a to c and 6a to c, and between 8a to c and 11a to c) are serial transfers, but this is shown only as an example for reducing the amount of hardware, and parallel transfers may be used instead.

[Effects of the Invention] As described above, the present invention provides a vector processing device that realizes compression and expansion of vector data with simple control and without increasing the physical amount of hardware.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a vector processing device according to one embodiment of the present invention, and FIG. 2 is an explanatory diagram showing the concept of compression and expansion of vector data.

1a to 1d ... vector registers, 2a to 2d ... read address generation circuits, 3a to 3d ... vector mask registers, 4 ... vector data read control circuit, 6a to 6d ... data buffers, 7 ... aligner circuit, 8a to 8d ... data buffers, 9a to 9d ... mask data buffers, 10a to 10d ... mask data buffers, 11a to 11d ... data buffers, 12a to 12d ... vector registers, 13a to 13d ... write address generation circuits, 14 ... mask count circuit, 15 ... vector data write control circuit, 16a to 16d ... selectors, 17a to 17d ... selectors.

Claims

(57) [Claims]

1. A vector processing device comprising: a first vector register that holds a plurality of rows of data strings, each composed of a plurality of vector element data constituting vector data; a second vector register that holds at least one row of a data string composed of a plurality of vector element data constituting the vector data; a data buffer capable of holding as many vector element data as constitute one data string; a vector mask register that holds, for each vector element data stored in the first vector register, a mask bit indicating whether that vector element data is valid or invalid; reading means for reading a plurality of vector element data from the first vector register in units of the data string; aligner means for extracting, from the plurality of vector element data read by the reading means, the vector element data whose mask bits stored in the vector mask register indicate that they are valid, and for storing the extracted vector element data in the data buffer in the order in which they are arranged in the read plurality of vector element data, earliest first; and writing means for, when the data buffer holds as many vector element data as constitute the data string, reading the plurality of vector element data stored in the data buffer and writing them to the second vector register as a data string composed of a plurality of vector element data constituting the vector data.