JP3583032B2 - Vector diffusion processor

Info

Publication number: JP3583032B2
Application number: JP22872499A
Authority: JP
Inventors: 英樹秋山
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 1999-08-12
Filing date: 1999-08-12
Publication date: 2004-10-27
Anticipated expiration: 2019-08-12
Also published as: JP2001051980A

Description

【０００１】
【発明の属する技術分野】
本発明は、並列計算機におけるべクトル拡散処理装置に関し、特にべクトル拡散処理の高速化を実現するべクトル拡散処理装置に関する。
【０００２】
【従来の技術】
従来、べクトル演算処理は、コンピュータの処理を高速化するために用いられる技術であり、特にスーパーコンピュータにおいてほぼ全ての機種に採用されている。
【０００３】
スーパーコンピュータ等の大型計算機においては、行列計算やＦｏｒｔｒａｎのＤＯループのように、各データに対し同一の手順の演算を繰返し行う処理が多くの時間を占めている。べクトル演算処理は、こうした繰返し行う処理を高速化するために、各行列やデータをそれぞれにベクトルとしてまとめて１命令で行列の演算を実行するものであり、行列の各要素毎に１つずつ演算を行う必要がないために高速に処理が可能となる。この高速の処理は、各要素に対する命令を、”命令の呼出”、”命令の解読”、”アドレス計算”等々の部分に分けパイプライン制御により同時に並列に実行することにより行われる。
【０００４】
また、このようにベクトルにまとめられた各データは、途中で各データそれぞれ個別の処理を行う場合が多い。これには、ＦｏｒｔｒａｎのＤＯループ中にＩＦ文等の条件付演算を含む場合等がある。
【０００５】
ベクトルとして各データがまとめられた状態では、こうした条件付演算の処理は複雑になるが、しかし、この演算の処理を行う方法である条件付ベクトル演算の方法には、ベクトルマスクを用いる方式や、圧縮（又は収集）・拡散（又は伸長、拡張）を用いる方式等の多くの様々な方法がある。
【０００６】
ベクトルマスクを用いる条件付ベクトル演算の方法とは、演算対象のベクトルの要素数と同数のビット数を持つ”０”又は”１”のビット列のベクトルをベクトルマスクとして用いるものであり、ＩＦ文等の演算時にはベクトルマスクに、条件が成立する要素に対しては”１”を不成立の要素に対しては”０”を設定し、このベクトルマスクを参照することで演算対象のベクトルの条件の成立する要素のみにＴＨＥＮ文の側の処理結果を格納する処理方法である。このベクトルマスクを構成する各ビットを、マスクビットと言う。
【０００７】
このベクトルマスクを用いる方式では、ＴＨＥＮ文の側の演算処理を実際には全ての要素に対し実行した後で、条件不成立の（マスクビットが”０”の）要素に対しては演算結果の格納処理のみを抑止する方式である。よって、この方式では、条件不成立の要素に対しては無駄な演算を行っており、条件の成立する確立が低い場合には処理に多くの無駄な時間を必要とすることになる。
【０００８】
圧縮・拡散を用いる条件付ベクトル演算の方法とは、図７に示す様に、上述のベクトルマスクを用いて、演算対象のベクトルから条件の成立する要素（ＴＨＥＮ文の側の処理を行う要素）のみを順次抽出して並べることで圧縮された新たなベクトルを作成し、この圧縮されたベクトルに対してＴＨＥＮ文の側の処理を実行し、その結果を再びベクトルマスクを参照して元の形式に戻す（拡散する）処理方法である。
【０００９】
圧縮・拡散を用いる方式では、条件の成立する要素のみに対しＴＨＥＮ文の側の演算処理を実行するため、条件の成立する確立が低い場合や、また疎行列（行列の要素の多くが”０”である行列）の演算に用いる場合に適している。
【００１０】
しかし、上述の圧縮・拡散を用いる条件付ベクトル演算の方法では、通常の演算の他に特別に圧縮と拡散の処理を実行するため、この圧縮・拡散が高速に少ないステップで処理されることが望まれる。
【００１１】
特開平０９−０５４７６９号公報には、拡散処理を高速に行うための技術として、演算結果の格納前にレジスタ内で圧縮されたベクトルを、圧縮前のベクトル長に展開する処理を経由せずに、ベクトルの各要素に対し拡散後のアドレスを指定し直接に演算結果を格納する技術が記載されている。
【００１２】
これは、図７の例で説明すると特開平０９−０５４７６９号公報よりも以前では、圧縮されたベクトルに対し演算を行いベクトル”Ｂ０、Ｂ４、Ｂ６、Ｂ８、・・・”を得た後、この結果を格納してベクトル”Ｂ０、Ａ１、Ａ２、Ａ３、Ｂ４、Ａ５、Ｂ６、Ａ７、Ｂ８、・・・”を得る前に、一旦”Ｂ０、０、０、０、Ｂ４、０、Ｂ６、０、Ｂ８、・・・”の圧縮前のベクトル長の形への変換を経由していたが、特開平０９−０５４７６９号公報ではこの変換を経由せずに直接に演算結果を格納するのである。
【００１３】
また、上述の圧縮・拡散を用いる条件付ベクトル演算の方法では、拡散の処理に関し冗長な処理を含んでいる。
【００１４】
この冗長な処理とは、この圧縮されたベクトルの各要素の復元位置であるベクトルマスクのマスクビット有効（つまり、マスクビットが”１”あるもの）の位置を調べるために、要素を拡散させて書込む時にベクトルマスクのマスクビットを有効無効に関わらず全てを読込む処理を行うことである。
【００１５】
ベクトル演算は、高速処理を行うことを目的とするため、処理時間の短縮のために各処理が必要とするステップの数を可能な限り削減を行いシステムを構成するのであり、この冗長な処理を解消することが求められる。
【００１６】
従来の、この冗長な処理を解消するための技術としては、特開平０３−００６６６３号公報に、マスクレジスタにベクトルマスクが格納されベクトルの圧縮が行われた後で、ベクトルの拡散が実行されるよりも前に、つまり圧縮されたベクトルに対する演算処理が実行されている間に同時進行で、予めベクトルマスクから有効なマスクビットの間隔のデータを生成し、これを用いて拡散処理を行う技術がある。
【００１７】
このため拡散処理の実行時には、多くの数のマスクビットを有効無効に関わらず読込む冗長な処理を行う必要がなく、圧縮されたベクトルの要素の読出し毎に、順次この有効なマスクビットの間隔のデータを加算する処理を行うことで書込先の位置を得ている。この技術は、前述の特開平０９−０５４７６９号公報においても用いられている。
【００１８】
【発明が解決しようとする課題】
上述したように従来のベクトル拡散処理装置では、以下に述べるような問題点があった。
【００１９】
特開平０９−０５４７６９号公報に開示された従来のベクトル拡散処理装置では、圧縮されたベクトルの各要素に対し直接に拡散後のアドレスを指定するのみであるという問題点がある。演算結果の格納位置の指定は、高速化のためより詳細に、格納先のパイプの番号や各パイプ内のポインタにより指定することが望まれる。
【００２０】
特開平０３−００６６６３号公報に開示された従来のベクトル拡散処理装置では、個々の圧縮されたベクトルの要素の読出し毎に、順次有効なマスクビットの間隔のデータを加算する処理を行うことで書込先の位置を得るため、各要素の拡散処理毎に加算処理のステップを有するという問題点がある。高速処理のためには、必要とするステップの数は可能な限り少ないことが望まれる。
【００２１】
本発明の第１の目的は、ベクトル計算機の条件付ベクトル演算における拡散処理を、最小のステップでかつ他の処理に負担を与えることなく実現することにより、高速な処理を実現するベクトル拡散処理装置を提供することにある。
【００２２】
本発明の第２の目的は、圧縮されたベクトルの要素の拡散位置への書込処理において、効率的で高速な処理を実現するベクトル拡散処理装置を提供することにある。
【００２３】
【課題を解決するための手段】
上記目的を達成するため本発明のベクトル拡散処理装置は、条件付ベクトル演算における、ベクトルの各要素に対する分岐処理の有効無効を示すベクトルマスクを格納するマスクレジスタと、前記マスクレジスタのデータを参照して、圧縮されたベクトルの各要素の拡散位置を算出する拡散位置算出部と、前記拡散位置算出部による前記拡散位置の算出結果を格納する拡散位置格納部とを備え、前記圧縮されたベクトルの拡散処理の実行前に、予め前記拡散位置算出部により前記拡散位置を算出して前記拡散位置格納部内に格納し、前記圧縮されたベクトルの拡散処理の実行時に、前記拡散位置格納部内に格納された前記拡散位置を参照し、前記圧縮されたベクトルの各要素を拡散先のベクトルレジスタへ書込むことを特徴とする。
【００２４】
請求項２の本発明のベクトル拡散処理装置は、前記拡散位置格納部に格納される前記拡散位置を表すデータとして、書込先のベクトルレジスタを備えるパイプのパイプ番号と、前記書込先ベクトルレジスタ内の書込先ポインタを含むことを特徴とする。
【００２５】
請求項３の本発明のベクトル拡散処理装置は、前記拡散位置格納部に格納される前記拡散位置を基に、前記ベクトルの各要素に対し、パイプ間データ移送を制御するデータ移送制御部を備えることを特徴とする。
【００２６】
請求項４の本発明のベクトル拡散処理装置は、前記拡散位置格納部に格納される前記拡散位置を基に、前記ベクトルの各要素に対し、書込先である前記ベクトルレジスタ内における書込制御を行う書込制御部を備えることを特徴とする。
【００２７】
請求項５の本発明のベクトル拡散処理装置の前記データ移送制御部は、前記パイプ番号を参照することにより、前記ベクトルのパイプ間データ移送を制御し、前記書込制御部は、前記書込先ポインタを参照することにより、前記書込先ベクトルレジスタ内における書込制御を行うことを特徴とする。
【００２８】
請求項６の本発明のベクトル拡散処理装置の前記データ移送制御部と、前記書込制御部は、互いに独立かつ並行して、拡散処理における個々の制御を実行することを特徴とする。
【００２９】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して詳細に説明する。図１は、本発明の一実施の形態によるベクトル拡散処理装置の構成を示すブロック図である。
【００３０】
図１を参照すると、本発明の一実施の形態によるベクトル拡散処理装置は、ベクトルレジスタ１０ａ、１０ｂ、１０ｃ、１０ｄ、２０ａ、２０ｂ、２０ｃ、２０ｄと、ベクトルレジスタ間のデータの移送を行うデータ移送部３０と、ベクトルマスクマスクを格納するマスクレジスタ５０と、マスクレジスタ５０内のベクトルマスクを参照して圧縮されたベクトルの拡散位置を検出する拡散位置算出回路６０と、拡散位置算出回路６０の出力を格納する拡散位置バッファ７０と、データ移送部３０を制御するデータ移送制御回路８０と、拡散データ書込先のベクトルレジスタの書込を制御する書込制御回路９０を備える。
【００３１】
図２は、本発明の一実施の形態のマスクレジスタ５０と拡散位置バッファ７０の構成を示すブロック図である。
【００３２】
図２を参照すると、本発明の一実施の形態のマスクレジスタ５０と拡散位置バッファ７０の構成は、パイプ数がパイプ０〜３迄の４パイプであり、ベクトル拡散処理の実行によりパイプ０〜３のベクトルレジスタ１０ａ、１０ｂ、１０ｃ、１０ｄ内の圧縮されたベクトルデータがマスクレジスタ５０を参照し各要素をマスクビット有効な要素番号位置に配置し圧縮前のベクトルデータの形式に復元する拡散処理を行い、パイプ０〜３のベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄに書込まれる。
【００３３】
パイプ０はベクトルレジスタ１０ａ、２０ａを有し、パイプ１はベクトルレジスタ１０ｂ、２０ｂを有し、パイプ２はベクトルレジスタ１０ｃ、２０ｃを有し、パイプ３はベクトルレジスタ１０ｄ、２０ｄを有する。
【００３４】
データ移送部３０は、データ移送制御回路８０によるパイプ間移送データの選択の制御により、各パイプが有するベクトルレジスタのパイプ間データ移送を行う。
【００３５】
マスクレジスタ５０は、ベクトルマスクを格納する。ベクトルマスクの要素である各マスクビットをベクトルレジスタの要素と対応させて圧縮・拡散の処理を実行するため、ベクトルレジスタと同数のベクトル要素数を持つ。
【００３６】
拡散位置算出回路６０は、マスクレジスタ５０に処理対象のベクトルマスクの格納後、ベクトル拡散の処理の実行前に、このベクトルマスクを参照してベクトルレジスタ１０ａ、１０ｂ、１０ｃ、１０ｄ内の圧縮されたベクトルの各要素の、書込先のベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄへの拡散位置を判定する。この拡散位置の判定は、書込先パイプ番号と書込ポインタを求めるものであり、この求められた書込先パイプ番号と書込ポインタは拡散位置バッファ７０に対し書込まれる。
【００３７】
拡散位置バッファ７０は、拡散位置算出回路６０から送信される拡散位置のデータである書込先パイプ番号と書込ポインタを格納し、拡散処理の終了までこれを保持する。
【００３８】
拡散位置バッファ７０は、ベクトル拡散の処理の実行時には拡散位置のデータを毎サイクル出力を行い、書込先パイプ番号をデータ移送制御回路８０に、書込ポインタを書込制御回路９０に送信する。
【００３９】
データ移送制御回路８０は、拡散位置バッファ７０から受信した書込先パイプ番号のデータからデータ移送制御信号を生成し、データ移送部３０におけるパイプ間移送データの選択の制御を行う。
【００４０】
書込制御回路９０は、拡散位置バッファ７０から受信した書込ポインタによりベクトルレジスタへの拡散データ書込ポインタと書込信号を生成し、これを各ベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄに対し送信し書込制御を実行する。各ベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄでは、書込制御回路９０からの書込信号の受信により、拡散データ書込ポインタが指定する位置にベクトルの要素を書込む。
【００４１】
なお、本実施の形態の一実施例を示す図１、図２では、パイプ数が４パイプの例を表しているが、本発明は４パイプに限らず任意の複数本のパイプの構成の場合に対し適応されるものである。
【００４２】
次に、本実施の形態の動作について図面を参照し詳細に説明する。図５は、本発明の一実施の形態の拡散位置算出の処理を説明するためのフローチャートであり、図３は、本発明の一実施の形態のベクトルレジスタの構成と、ベクトルマスクの一例とこれに対応し各レジスタに格納されるデータを示すブロック図である。
【００４３】
以下、本実施例の拡散位置算出の処理を図１、図２に示された４パイプ構成時の場合に、図３に示されたベクトルマスクの一例”１０００１０１０１０００・・・”に対する処理を説明する。
【００４４】
図５を参照すると、本実施の形態の拡散位置算出の処理は、予め拡散処理の前に行われるものであり、まず始めに拡散位置算出回路６０は、マスクレジスタ５０のマスクビットを毎サイクル４パイプ分、ワード０から順次読出を実行する（ステップ５０１）。
【００４５】
そして、拡散位置算出回路６０は、読出したマスクビットからマスクビット有効箇所に対応するパイプ番号とベクトルレジスタの書込ポインタ（つまり、圧縮されたベクトルの拡散位置）を算出し（ステップ５０２）、拡散位置バッファ７０にこの格差位置のデータを格納する（ステップ５０３）。
【００４６】
ここで、図３に示されたベクトルマスク”１０００１０１０１０００・・・”は、冒頭の１２個のマスクビットに対して”ワード０・パイプ０”、”ワード１・パイプ０”、”ワード１・パイプ２”、”ワード２・パイプ０”が拡散位置として判定される。
【００４７】
図４は、本発明の一実施の形態の拡散位置バッファのデータ格納の構成を示すブロック図である。
【００４８】
図４を参照すると、拡散位置バッファは、拡散位置算出回路６０が判定した各要素の拡散位置を、書込ポインタ（つまりワード０〜３の番号）とパイプ番号の情報により、”ワード１・パイプ２”のようにして格納する。
【００４９】
また、拡散位置バッファ７０内のこれら各拡散位置の情報の格納位置は、図５に示す様に、ワード０からワード１以降に順に、各ワード内ではパイプ０からパイプ１以降に順次格納する。つまり、ワード０のパイプ３の次はワード１のパイプ０に位置に、順次圧縮ベクトルの各要素の拡散位置を示す情報を、拡散位置バッファ７０に格納していく。
【００５０】
以上の拡散位置算出処理を、ベクトルマスクの全てのマスクビットに対し実行する（ステップ５０４）。
【００５１】
次に、本発明の一実施の形態の拡散処理の動作について、図面を参照して詳細に説明する。図６は、本発明の一実施の形態の拡散処理をの動作を説明するためのフローチャートである。
【００５２】
図６を参照すると、まず圧縮されたベクトルに対する拡散処理が開始されると、拡散位置バッファ７０のワード０から順次毎サイクル４パイプ分の書込先の位置情報であるパイプ番号と書込ポインタの読出が実行され（ステップ６０１）、パイプ番号をデータ移送制御回路８０に、書込ポインタを書込制御回路９０に対し送信する。
【００５３】
同時に、データ移送部３０は、パイプ０〜３の各ベクトルレジスタ１０ａ、１０ｂ、１０ｃ、１０ｄから合計４パイプ分の圧縮されたベクトルデータを読出す（ステップ６０４）。
【００５４】
データ移送制御回路８０は、拡散位置バッファ７０から受信した書込先のパイプを表す位置情報であるパイプ番号を基に、データ移送部３０に対しベクトルデータの各要素のパイプ間移送の移送先の選択等の制御を行うことで、データ移送部３０は、ベクトルレジスタ１０ａ、１０ｂ、１０ｃ、１０ｄから読出した圧縮されたベクトルデータを拡散データとしてパイプ０〜３の拡散データ書込先のベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄに対し送信する（ステップ６０５）。
【００５５】
書込制御回路９０は、拡散位置バッファ７０から受信した各パイプ内における書込み位置を表す位置情報である書込ポインタを基に、拡散データ書込先のベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄに対し、書込ポインタ、書込信号を送信することによる書込制御を行う。
【００５６】
拡散データ書込先のベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄは、書込制御回路９０の書込位置等の制御により、データ移送部３０から受信したベクトルの各要素を格納する（ステップ６０６）。
【００５７】
以上の拡散処理の動作を、１ベクトルマスクにより拡散位置バッファ７０内に格納された全ての位置情報に対し実行すると（ステップ６０７）、四本のパイプによる一組のベクトルの拡散処理が終了する。
【００５８】
以上のように、従来の技術ではマスクビットの有効無効に関わらずベクトルマスクを毎サイクル読出し、かつ、マスクレジスタ５０から直接毎サイクルパイプ数分のマスクビットを読出すことにより、データ移送部３０の制御とベクトルレジスタ２０ａ、２０ｂ、２０ｃ、２０ｄに対する書込ポインタの指定と、書込の制御を行うステップ数の多い処理を行っていたのに対して、本実施の形態によれば、書込処理の前に予め詳細な拡散位置のデータを算出し、かつこのデータを拡散位置バッファ７０内に書込先のパイプ番号と書込ポインタのデータとの組合せによる書込時に参照しやすい形式で格納し、さらに書込処理の実行時にはデータ移送制御回路８０と書込制御回路９０によりそれぞれ書込先のパイプの選択と、書込先のポインタの指定が行われるために少ないステップで高速に拡散処理が可能である。
【００５９】
以上好ましい実施の形態及び実施例をあげて本発明を説明したが、本発明は必ずしも上記実施の形態及び実施例に限定されるものではなく、その技術的思想の範囲内において様々に変形して実施することができる。
【００６０】
【発明の効果】
以上説明したように本発明のベクトル拡散処理装置によれば、以下のような効果が達成される。
【００６１】
第１に、ベクトル拡散の処理前に、圧縮されたベクトルの要素の拡散位置を算出し、かつその位置のデータを書込時に参照しやすい形式によりバッファ内に保存しておくことにより、拡散処理の実行時にはベクトルマスクの読込みや距離データの加算処理等の位置を算出するための処理が不要であり、少ないステップで高速に拡散処理が可能である。
【００６２】
第２に、圧縮されたベクトルの要素の拡散位置への書込処理において必要である２つの処理、つまり各要素に対し書込先のパイプの選択処理と、各パイプ内での書込先のポインタを指定する処理を、それぞれデータ移送制御回路８０と書込制御回路９０により、独立にかつ並行して処理を実行するために高速に拡散処理が可能である。
【図面の簡単な説明】
【図１】本発明の一実施の形態によるベクトル拡散処理装置の構成を示すブロック図である。
【図２】本発明の一実施の形態のマスクレジスタと拡散位置バッファの構成を示すブロック図である。
【図３】本発明の一実施の形態のベクトルレジスタの構成と、ベクトルマスクの一例とこれに対応し各レジスタに格納されるデータを示すブロック図である。
【図４】本発明の一実施の形態の拡散位置バッファのデータ格納の構成を示すブロック図である。
【図５】本発明の一実施の形態の拡散位置算出の処理を説明するためのフローチャートである。
【図６】本発明の一実施の形態の拡散処理の動作を説明するためのフローチャートである。
【図７】本発明の一実施の形態の圧縮・拡散の処理におけるベクトルの各要素への制御を説明するための図である。
【符号の説明】
１０ａ、１０ｂ、１０ｃ、１０ｄ読出側のベクトルレジスタ
２０ａ、２０ｂ、２０ｃ、２０ｄ書込側のベクトルレジスタ
３０データ移送部
５０マスクレジスタ
６０拡散位置算出回路
７０拡散位置バッファ
８０データ移送制御回路
９０書込制御回路
[0001]
[Technical field of the invention]
The present invention relates to a vector diffusion processing apparatus in a parallel computer, and more particularly to a vector diffusion processing apparatus that speeds up vector diffusion processing.
[0002]
[Prior art]
Vector operation processing is a technique used to speed up computer processing, and it is employed in almost all supercomputer models.
[0003]
In large-scale computers such as supercomputers, much of the time is spent on processing that repeatedly applies the same sequence of operations to each data item, as in matrix calculations or Fortran DO loops. To speed up such repetitive processing, vector operation processing groups each matrix or data set into a vector and executes the operation on it with a single instruction. Because there is no need to issue an operation for each element one at a time, processing is fast. This speed is achieved by dividing the instruction applied to each element into stages such as "instruction fetch", "instruction decode", and "address calculation", and executing those stages simultaneously in parallel under pipeline control.
[0004]
The data items grouped into a vector often require individual processing partway through. One example is a Fortran DO loop that contains a conditional operation such as an IF statement.
[0005]
When the data items are grouped into a vector, processing such conditional operations becomes complicated. Nevertheless, many methods of conditional vector operation exist for handling them, including a method using a vector mask and a method using compression (or gathering) and diffusion (or decompression, expansion).
[0006]
The conditional vector operation method using a vector mask uses, as the vector mask, a bit string of "0"s and "1"s with the same number of bits as the number of elements in the vector to be operated on. When an IF statement or the like is evaluated, the vector mask is set to "1" for elements that satisfy the condition and to "0" for elements that do not. By referring to this vector mask, the result of the THEN-side processing is stored only in the elements of the target vector for which the condition holds. Each bit of the vector mask is called a mask bit.
[0007]
In the method using the vector mask, the THEN-side operation is actually executed for all elements, and only the storing of the result is suppressed for elements that do not satisfy the condition (mask bit "0"). The method therefore performs wasted operations on the elements that do not satisfy the condition, and when the probability that the condition holds is low, much processing time is wasted.
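The vector-mask method described above can be sketched in software as follows. This is only an illustrative model under stated assumptions: the function name `masked_apply`, the list representation of the vector, and the doubling operation standing in for the THEN-side processing are all hypothetical and do not come from the patent.

```python
def masked_apply(values, mask, then_op):
    """Model of a masked vector operation (illustrative only)."""
    out = list(values)
    for i, (v, m) in enumerate(zip(values, mask)):
        result = then_op(v)   # computed for every element, even when m == 0
        if m == 1:
            out[i] = result   # the store is suppressed when the mask bit is 0
    return out


mask = [1, 0, 0, 0, 1, 0, 1, 0]
a = [10, 11, 12, 13, 14, 15, 16, 17]
b = masked_apply(a, mask, lambda x: x * 2)  # hypothetical THEN-side operation
```

Here `then_op` runs eight times although only three results are kept, which is the wasted work the paragraph refers to.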
[0008]
As shown in FIG. 7, the conditional vector operation method using compression and diffusion uses the vector mask described above to extract, in order, only the elements of the target vector that satisfy the condition (the elements to be processed on the THEN side) and pack them together into a new, compressed vector. The THEN-side processing is executed on this compressed vector, and the result is then returned to the original layout (diffused) by referring to the vector mask again.
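A minimal software sketch of the compress, operate, and diffuse sequence may help here. The function names `compress` and `diffuse` and the string labels for elements are assumptions made for illustration; the element names follow the "A"/"B" labelling used in the discussion of FIG. 7.

```python
def compress(values, mask):
    """Pack the elements whose mask bit is 1 into a shorter vector."""
    return [v for v, m in zip(values, mask) if m == 1]


def diffuse(packed, mask, original):
    """Write packed elements back to the positions whose mask bit is 1."""
    out = list(original)
    packed_iter = iter(packed)
    for i, m in enumerate(mask):  # scans every mask bit, valid or not
        if m == 1:
            out[i] = next(packed_iter)
    return out


mask = [1, 0, 0, 0, 1, 0, 1, 0, 1]
a = [f"A{i}" for i in range(9)]
packed = compress(a, mask)                     # ["A0", "A4", "A6", "A8"]
processed = ["B" + x[1:] for x in packed]      # stand-in for the THEN-side work
restored = diffuse(processed, mask, a)
```

Note that this naive `diffuse` inspects every mask bit at write time; it is a functional model only and says nothing about how the hardware schedules the work.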
[0009]
Because the method using compression and diffusion executes the THEN-side processing only on the elements that satisfy the condition, it is suitable when the probability that the condition holds is low, and for operations on sparse matrices (matrices in which most elements are "0").
[0010]
However, the conditional vector operation method using compression and diffusion performs the compression and diffusion steps in addition to the ordinary operation, so it is desirable that compression and diffusion be processed at high speed and in few steps.
[0011]
Japanese Patent Application Laid-Open No. 09-054769 describes a technique for speeding up diffusion. Instead of first expanding the vector compressed in a register to its pre-compression vector length before storing the operation result, the technique specifies the post-diffusion address for each element of the vector and stores the operation result directly.
[0012]
In terms of the example of FIG. 7: before Japanese Patent Application Laid-Open No. 09-054769, after operating on the compressed vector to obtain the vector "B0, B4, B6, B8, ...", and before storing this result to obtain the vector "B0, A1, A2, A3, B4, A5, B6, A7, B8, ...", the result was first converted to the pre-compression vector length, "B0, 0, 0, 0, B4, 0, B6, 0, B8, ...". In Japanese Patent Application Laid-Open No. 09-054769, the operation result is stored directly without going through this conversion.
[0013]
In addition, the conditional vector operation method using compression and diffusion described above includes redundant processing in the diffusion step.
[0014]
The redundant processing is that, when the elements are diffused and written, all mask bits of the vector mask are read, whether valid or not, in order to find the positions of the valid mask bits (that is, the mask bits that are "1"), which are the restoration positions of the elements of the compressed vector.
[0015]
Because the purpose of vector operation is high-speed processing, systems are designed to reduce the number of steps each process requires as far as possible in order to shorten processing time, and this redundant processing should therefore be eliminated.
[0016]
As a conventional technique for eliminating this redundant processing, Japanese Patent Application Laid-Open No. 03-006663 describes generating, in advance, data on the intervals between valid mask bits from the vector mask and using that data for the diffusion. The interval data is generated after the vector mask has been stored in the mask register and the vector has been compressed, and before the diffusion is executed, that is, concurrently with the operation on the compressed vector.
[0017]
Consequently, when the diffusion is executed there is no need for the redundant reading of many mask bits regardless of validity. Instead, the write destination position is obtained by adding the valid-mask-bit interval data each time an element of the compressed vector is read. This technique is also used in the above-mentioned Japanese Patent Application Laid-Open No. 09-054769.
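The interval-based approach can be modelled roughly as below. The exact encoding of the interval data in Japanese Patent Application Laid-Open No. 03-006663 is not given in this document, so the representation here (a list of distances between successive valid mask bits, with the first entry measured from position 0) and the names `mask_intervals` and `diffuse_with_intervals` are assumptions for illustration.

```python
def mask_intervals(mask):
    """Distances between successive valid mask bits (assumed encoding)."""
    intervals = []
    prev = 0
    for i, m in enumerate(mask):
        if m == 1:
            intervals.append(i - prev)
            prev = i
    return intervals


def diffuse_with_intervals(packed, intervals, original):
    """Place each packed element by accumulating the interval data."""
    out = list(original)
    pos = 0
    for value, gap in zip(packed, intervals):
        pos += gap        # one addition per element at diffusion time
        out[pos] = value
    return out


mask = [1, 0, 0, 0, 1, 0, 1, 0, 1]
intervals = mask_intervals(mask)
a = [f"A{i}" for i in range(9)]
restored = diffuse_with_intervals(["B0", "B4", "B6", "B8"], intervals, a)
```

The mask is no longer scanned during diffusion, but each element still costs one addition to find its destination.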
[0018]
[Problems to be solved by the invention]
As described above, the conventional vector diffusion processing apparatus has the following problems.
[0019]
The conventional vector diffusion processing apparatus disclosed in Japanese Patent Application Laid-Open No. 09-054769 has the problem that it merely specifies the post-diffusion address directly for each element of the compressed vector. For higher speed, it is desirable to specify the storage position of the operation result in more detail, by the number of the destination pipe and a pointer within each pipe.
[0020]
In the conventional vector diffusion processing apparatus disclosed in Japanese Patent Application Laid-Open No. 03-006663, the write destination position is obtained by adding the valid-mask-bit interval data each time an individual element of the compressed vector is read, so the diffusion of every element includes an addition step. For high-speed processing, it is desirable that the number of required steps be as small as possible.
[0021]
A first object of the present invention is to provide a vector diffusion processing apparatus that achieves high-speed processing by performing the diffusion step of conditional vector operations in a vector computer in the minimum number of steps and without burdening other processing.
[0022]
A second object of the present invention is to provide a vector diffusion processing device which realizes efficient and high-speed processing in writing a compressed vector element to a diffusion position.
[0023]
[Means for Solving the Problems]
To achieve the above objects, the vector diffusion processing apparatus of the present invention comprises: a mask register that stores a vector mask indicating, for each element of a vector in a conditional vector operation, whether the branch processing is valid or invalid; a diffusion position calculation unit that refers to the data in the mask register and calculates the diffusion position of each element of a compressed vector; and a diffusion position storage unit that stores the diffusion positions calculated by the diffusion position calculation unit. Before the diffusion of the compressed vector is executed, the diffusion positions are calculated in advance by the diffusion position calculation unit and stored in the diffusion position storage unit. When the diffusion of the compressed vector is executed, the diffusion positions stored in the diffusion position storage unit are referred to, and each element of the compressed vector is written to the destination vector register.
[0024]
In the vector diffusion processing apparatus of the present invention according to claim 2, the data representing the diffusion position stored in the diffusion position storage unit includes the pipe number of the pipe that has the destination vector register and a write destination pointer within the destination vector register.
[0025]
The vector diffusion processing apparatus of the present invention according to claim 3 comprises a data transfer control unit that controls inter-pipe data transfer for each element of the vector, based on the diffusion position stored in the diffusion position storage unit.
[0026]
The vector diffusion processing apparatus of the present invention according to claim 4 comprises a write control unit that performs, for each element of the vector, write control within the destination vector register, based on the diffusion position stored in the diffusion position storage unit.
[0027]
In the vector diffusion processing apparatus of the present invention according to claim 5, the data transfer control unit controls inter-pipe data transfer of the vector by referring to the pipe number, and the write control unit performs write control within the destination vector register by referring to the write destination pointer.
[0028]
In the vector diffusion processing apparatus of the present invention according to claim 6, the data transfer control unit and the write control unit execute their respective controls in the diffusion processing independently of each other and in parallel.
[0029]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a configuration of a vector diffusion processing device according to an embodiment of the present invention.
[0030]
Referring to FIG. 1, the vector diffusion processing apparatus according to an embodiment of the present invention comprises: vector registers 10a, 10b, 10c, 10d, 20a, 20b, 20c, and 20d; a data transfer unit 30 that transfers data between the vector registers; a mask register 50 that stores a vector mask; a diffusion position calculation circuit 60 that refers to the vector mask in the mask register 50 and detects the diffusion positions of the compressed vector; a diffusion position buffer 70 that stores the output of the diffusion position calculation circuit 60; a data transfer control circuit 80 that controls the data transfer unit 30; and a write control circuit 90 that controls writing to the vector registers to which the diffused data is written.
[0031]
FIG. 2 is a block diagram showing a configuration of the mask register 50 and the diffusion position buffer 70 according to one embodiment of the present invention.
[0032]
Referring to FIG. 2, the mask register 50 and the diffusion position buffer 70 according to an embodiment of the present invention are configured for four pipes, pipes 0 to 3. When vector diffusion is executed, the compressed vector data in the vector registers 10a, 10b, 10c, and 10d of pipes 0 to 3 is diffused: with reference to the mask register 50, each element is placed at an element number position whose mask bit is valid, restoring the layout of the vector data before compression, and the result is written to the vector registers 20a, 20b, 20c, and 20d of pipes 0 to 3.
[0033]
Pipe 0 has vector registers 10a and 20a, pipe 1 has vector registers 10b and 20b, pipe 2 has vector registers 10c and 20c, and pipe 3 has vector registers 10d and 20d.
[0034]
The data transfer unit 30 transfers data between the vector registers of the pipes, with the data transfer control circuit 80 controlling which data is selected for inter-pipe transfer.
[0035]
The mask register 50 stores the vector mask. Because compression and diffusion are executed with each mask bit, which is an element of the vector mask, placed in correspondence with an element of a vector register, the mask register holds the same number of elements as a vector register.
[0036]
After the vector mask to be processed has been stored in the mask register 50 and before vector diffusion is executed, the diffusion position calculation circuit 60 refers to this vector mask and determines, for each element of the compressed vector in the vector registers 10a, 10b, 10c, and 10d, its diffusion position in the destination vector registers 20a, 20b, 20c, and 20d. This determination yields a write destination pipe number and a write pointer, which are written to the diffusion position buffer 70.
[0037]
The diffusion position buffer 70 stores the write destination pipe numbers and write pointers, which are the diffusion position data sent from the diffusion position calculation circuit 60, and holds them until the diffusion processing ends.
[0038]
The diffusion position buffer 70 outputs the data of the diffusion position every cycle when the vector diffusion process is executed, and transmits the write destination pipe number to the data transfer control circuit 80 and the write pointer to the write control circuit 90.
[0039]
The data transfer control circuit 80 generates a data transfer control signal from the data of the write destination pipe number received from the diffusion position buffer 70, and controls the data transfer unit 30 to select the transfer data between pipes.
[0040]
The write control circuit 90 generates a diffused-data write pointer and a write signal for the vector registers from the write pointer received from the diffusion position buffer 70, and sends them to each of the vector registers 20a, 20b, 20c, and 20d to perform write control. On receiving the write signal from the write control circuit 90, each of the vector registers 20a, 20b, 20c, and 20d writes the vector element at the position designated by the diffused-data write pointer.
[0041]
Although FIGS. 1 and 2, which show an example of this embodiment, depict four pipes, the present invention is not limited to four pipes and applies to configurations with any plural number of pipes.
[0042]
Next, the operation of the present embodiment will be described in detail with reference to the drawings. FIG. 5 is a flowchart for explaining the diffusion position calculation processing according to an embodiment of the present invention, and FIG. 3 is a block diagram showing the configuration of the vector registers, an example of a vector mask, and the corresponding data stored in each register.
[0043]
The diffusion position calculation of this example is described below for the four-pipe configuration shown in FIGS. 1 and 2, using the example vector mask "100010101000..." shown in FIG. 3.
[0044]
Referring to FIG. 5, the diffusion position calculation of this embodiment is performed in advance, before the diffusion processing. First, the diffusion position calculation circuit 60 reads the mask bits of the mask register 50 sequentially from word 0, four pipes' worth per cycle (step 501).
[0045]
The diffusion position calculation circuit 60 then calculates, from the mask bits it has read, the pipe number and the vector register write pointer corresponding to each valid mask bit (that is, the diffusion position of the compressed vector) (step 502), and stores this diffusion position data in the diffusion position buffer 70 (step 503).
[0046]
For the vector mask "100010101000..." shown in FIG. 3, the first 12 mask bits yield the diffusion positions "word 0 / pipe 0", "word 1 / pipe 0", "word 1 / pipe 2", and "word 2 / pipe 0".
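The mapping from mask bits to diffusion positions in this example can be reproduced with a short sketch. It assumes, consistently with the positions listed above, that mask bit i belongs to pipe i mod 4 and word i div 4; the function name `diffusion_positions` and the string form of the mask are illustrative choices, not part of the patent.

```python
def diffusion_positions(mask_bits, num_pipes=4):
    """Return (word, pipe) for every valid mask bit, in mask order."""
    positions = []
    # Each iteration models one cycle: num_pipes mask bits are read (step 501).
    for start in range(0, len(mask_bits), num_pipes):
        word = start // num_pipes
        group = mask_bits[start:start + num_pipes]
        for pipe, bit in enumerate(group):
            if bit == "1":                      # valid mask bit (step 502)
                positions.append((word, pipe))  # would be stored in step 503
    return positions


positions = diffusion_positions("100010101000")
# [(0, 0), (1, 0), (1, 2), (2, 0)]
```

The four tuples correspond to "word 0 / pipe 0", "word 1 / pipe 0", "word 1 / pipe 2", and "word 2 / pipe 0".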
[0047]
FIG. 4 is a block diagram showing a configuration of data storage in the diffusion position buffer according to one embodiment of the present invention.
[0048]
Referring to FIG. 4, the diffusion position buffer stores the diffusion position of each element determined by the diffusion position calculation circuit 60 as a write pointer (that is, a word number from 0 to 3) and a pipe number, in the form "word 1 / pipe 2".
[0049]
Further, as shown in FIG. 5, the diffusion positions are stored in the diffusion position buffer 70 in order from word 0 to word 1 and onward, and within each word from pipe 0 to pipe 1 and onward. That is, the entry after pipe 3 of word 0 goes to pipe 0 of word 1, and the information indicating the diffusion position of each element of the compressed vector is stored in the diffusion position buffer 70 in this sequence.
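The storage order described above can be pictured with the following sketch, which models the buffer as a list of words with one slot per pipe. The name `fill_position_buffer`, the use of `None` for unused slots, and the fifth example entry `(3, 1)` are assumptions added for illustration.

```python
def fill_position_buffer(positions, num_pipes=4):
    """Store positions word by word, and pipe 0..num_pipes-1 within a word."""
    buffer = []
    for k, pos in enumerate(positions):
        if k % num_pipes == 0:
            buffer.append([None] * num_pipes)  # start a new buffer word
        buffer[k // num_pipes][k % num_pipes] = pos
    return buffer


# Four positions from the FIG. 3 example plus one hypothetical extra entry,
# to show the wrap from pipe 3 of word 0 to pipe 0 of word 1.
entries = [(0, 0), (1, 0), (1, 2), (2, 0), (3, 1)]
buffer = fill_position_buffer(entries)
```

With this layout, the k-th buffer slot lines up with the k-th element of the compressed vector, so one buffer word can be read per cycle alongside one word from each source register.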
[0050]
The above diffusion position calculation processing is executed for all the mask bits of the vector mask (step 504).
[0051]
Next, the operation of the diffusion process according to one embodiment of the present invention will be described in detail with reference to the drawings. FIG. 6 is a flowchart for explaining the operation of the diffusion process according to the embodiment of the present invention.
[0052]
Referring to FIG. 6, when diffusion of the compressed vector starts, the write destination position information, namely the pipe numbers and write pointers, is read from the diffusion position buffer 70 sequentially from word 0, four pipes' worth per cycle (step 601). The pipe numbers are sent to the data transfer control circuit 80 and the write pointers to the write control circuit 90.
[0053]
At the same time, the data transfer unit 30 reads compressed vector data for a total of four pipes from the vector registers 10a, 10b, 10c, and 10d of the pipes 0 to 3 (step 604).
[0054]
Based on the pipe numbers, which are the position information identifying the destination pipes received from the diffusion position buffer 70, the data transfer control circuit 80 controls the data transfer unit 30, for example by selecting the destination of the inter-pipe transfer of each element of the vector data. The data transfer unit 30 thereby sends the compressed vector data read from the vector registers 10a, 10b, 10c, and 10d, as diffused data, to the destination vector registers 20a, 20b, 20c, and 20d of pipes 0 to 3 (step 605).
[0055]
Based on the write pointers, which are the position information indicating the write position within each pipe received from the diffusion position buffer 70, the write control circuit 90 performs write control by sending a write pointer and a write signal to the destination vector registers 20a, 20b, 20c, and 20d.
[0056]
The destination vector registers 20a, 20b, 20c, and 20d store each element of the vector received from the data transfer unit 30, under the write control circuit 90's control of the write position and related signals (step 606).
[0057]
When the diffusion operation above has been executed for all the position information stored in the diffusion position buffer 70 for one vector mask (step 607), the diffusion of one set of vectors across the four pipes is complete.
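The flow of steps 601 to 607 can be modelled functionally as below. This is a sketch under several assumptions that the text does not state: registers are modelled as `regs[pipe][word]` lists, the destination registers are taken to already hold the uncompressed "A" values, unused buffer slots are `None`, and several elements written to the same destination pipe in one cycle are simply applied in turn, ignoring any hardware port conflicts. The name `run_diffusion` is hypothetical.

```python
NUM_PIPES = 4


def run_diffusion(src_regs, position_buffer, dst_regs):
    """Functional model of the diffusion loop (not cycle accurate)."""
    for cycle, entries in enumerate(position_buffer):
        # Step 604: read one compressed element from each source pipe.
        elements = [src_regs[pipe][cycle] for pipe in range(NUM_PIPES)]
        # Step 601: one buffer word gives a destination per source pipe.
        for src_pipe, entry in enumerate(entries):
            if entry is None:
                continue
            dst_word, dst_pipe = entry
            # Step 605 selects dst_pipe; step 606 writes at dst_word.
            dst_regs[dst_pipe][dst_word] = elements[src_pipe]
    return dst_regs  # step 607: all buffered positions processed


words = 3
dst = [[f"A{w * NUM_PIPES + p}" for w in range(words)] for p in range(NUM_PIPES)]
src = [["B0"], ["B4"], ["B6"], ["B8"]]          # compressed vector, one word
buf = [[(0, 0), (1, 0), (1, 2), (2, 0)]]        # positions from the FIG. 3 mask
run_diffusion(src, buf, dst)
flat = [dst[i % NUM_PIPES][i // NUM_PIPES] for i in range(words * NUM_PIPES)]
```

During the loop no mask bit is read and no position arithmetic is performed; the destination pipe and the write pointer are taken directly from the buffer entry.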
[0058]
As described above, the conventional technique reads the vector mask every cycle regardless of whether the mask bits are valid, reading mask bits for the number of pipes directly from the mask register 50 each cycle, and from them controls the data transfer unit 30, specifies the write pointers for the vector registers 20a, 20b, 20c, and 20d, and controls the writes, which takes many steps. In contrast, according to this embodiment, detailed diffusion position data is calculated in advance, before the write processing, and stored in the diffusion position buffer 70 as a combination of destination pipe number and write pointer, a format that is easy to reference at write time. When the write processing is executed, the data transfer control circuit 80 selects the destination pipe and the write control circuit 90 specifies the destination pointer, so diffusion can be performed at high speed in few steps.
[0059]
Although the present invention has been described with reference to preferred embodiments and examples, the present invention is not necessarily limited to those embodiments and examples, and it can be implemented with various modifications within the scope of its technical idea.
[0060]
[Effects of the Invention]
As described above, according to the vector diffusion processing device of the present invention, the following effects can be achieved.
[0061]
First, before the vector diffusion processing, the diffusion position of each element of the compressed vector is calculated, and the position data is stored in a buffer in a format that is easy to refer to when writing. The diffusion processing itself therefore requires no position calculation, such as reading the vector mask and adding distance data, so that high-speed diffusion processing can be performed with few steps.
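The separation between precalculation and writing can be sketched in software as follows. This is a hedged illustration under stated assumptions, not the calculation circuit of the embodiment: the function names `diffusion_positions` and `write_elements` are hypothetical, and the layout in which element i of a vector sits in pipe i mod 4 at pointer i div 4 is an assumed interleaving chosen for the example.

```python
NUM_PIPES = 4  # four pipes, as in the embodiment


def diffusion_positions(mask):
    """Precompute one (pipe number, write pointer) entry per set mask bit.

    Assumed layout: element i lives in pipe i % NUM_PIPES at
    pointer i // NUM_PIPES.
    """
    return [(i % NUM_PIPES, i // NUM_PIPES)
            for i, bit in enumerate(mask) if bit]


def write_elements(compressed, positions, vector_length):
    """Write phase: consumes only the buffered positions, never the mask."""
    result = [None] * vector_length
    for value, (pipe, pointer) in zip(compressed, positions):
        result[pointer * NUM_PIPES + pipe] = value
    return result


mask = [0, 1, 0, 1, 1, 0, 0, 0]           # assumed sample vector mask
positions = diffusion_positions(mask)      # done once, before diffusion
expanded = write_elements(["a", "b", "c"], positions, len(mask))
```

In this model the mask is read only inside `diffusion_positions`; `write_elements` performs no mask lookup and no position arithmetic beyond indexing with the stored pair, which mirrors the first effect stated above.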
[0062]
Second, writing the elements of the compressed vector to their diffusion positions requires two processes: selecting the write-destination pipe for each element, and designating the write-destination pointer within each pipe. The data transfer control circuit 80 and the write control circuit 90 execute these two processes independently and in parallel, so that the diffusion processing can be performed at high speed.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a vector diffusion processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of a mask register and a diffusion position buffer according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating a configuration of a vector register according to an embodiment of the present invention, an example of a vector mask, and data stored in each register corresponding thereto.
FIG. 4 is a block diagram showing a configuration of data storage in a diffusion position buffer according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a process of calculating a diffusion position according to the embodiment of the present invention.
FIG. 6 is a flowchart illustrating an operation of a diffusion process according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining control of each element of a vector in compression/diffusion processing according to an embodiment of the present invention.
[Explanation of symbols]
10a, 10b, 10c, 10d Read-side vector registers
20a, 20b, 20c, 20d Write-side vector registers
30 Data transfer unit
50 Mask register
60 Diffusion position calculation circuit
70 Diffusion position buffer
80 Data transfer control circuit
90 Write control circuit

Claims

1. A vector diffusion processing device for diffusing vector data in a vector computer, comprising:
a mask register that stores a vector mask indicating, for each element of a vector, whether branch processing in a conditional vector operation is enabled or disabled;
a diffusion position calculation unit that calculates the diffusion position of each element of a compressed vector with reference to the data in the mask register; and
a diffusion position storage unit that stores the diffusion positions calculated by the diffusion position calculation unit,
wherein, before the diffusion processing of the compressed vector is performed, the diffusion positions are calculated in advance by the diffusion position calculation unit and stored in the diffusion position storage unit, and
when the diffusion processing of the compressed vector is performed, each element of the compressed vector is written to the write-destination vector register with reference to the diffusion positions stored in the diffusion position storage unit.

2. The vector diffusion processing device according to claim 1, wherein the data representing the diffusion position stored in the diffusion position storage unit comprises:
a pipe number of the pipe having the write-destination vector register; and
a write-destination pointer within the write-destination vector register.

3. The vector diffusion processing device according to claim 2, further comprising a data transfer control unit that controls, for each element of the vector, data transfer between pipes based on the diffusion position stored in the diffusion position storage unit.

4. The vector diffusion processing device according to claim 3, further comprising a write control unit that performs, for each element of the vector, write control in the write-destination vector register based on the diffusion position stored in the diffusion position storage unit.

5. The vector diffusion processing device according to claim 4, wherein
the data representing the diffusion position includes the pipe number and the write-destination pointer,
the data transfer control unit controls the transfer of the vector data between pipes by referring to the pipe number, and
the write control unit performs the write control in the write-destination vector register by referring to the write-destination pointer.

6. The vector diffusion processing device according to claim 5, wherein the data transfer control unit and the write control unit execute their respective controls in the diffusion processing independently and in parallel with each other.