JP3760995B2

JP3760995B2 - Shared memory vector processing system, control method therefor, and storage medium storing vector processing control program

Info

Publication number: JP3760995B2
Application number: JP2002205719A
Authority: JP
Inventors: 聡中里
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-07-15
Filing date: 2002-07-15
Publication date: 2006-03-29
Anticipated expiration: 2018-12-15
Also published as: JP2003099420A

Description

【０００１】
【発明の属する技術分野】
本発明は、主記憶メモリを共有する複数のＣＰＵを備え、各ＣＰＵがスカラ処理部と複数のベクトルパイプラインを構成するベクトル処理部を有してなる共有メモリ型ベクトル処理システムに関する。
【０００２】
【従来の技術】
図９に、従来のベクトル処理装置におけるＣＰＵを用いた共有メモリ型並列処理システムの構成を示す。このシステムでは、複数のＣＰＵ１００ａ〜１００ｎが１つの主記憶装置２００を共有して接続されている。
【０００３】
各ＣＰＵ１００ａ〜１００ｎの詳細構成図を図１０に示す。各ＣＰＵ１００ａ〜１００ｎは、図示のように、スカラ処理部１０１、命令制御部１０２、ベクトル処理部１０４ａ〜１０４ｎ及びメモリアクセスネットワーク部１０５を備えて構成される。
【０００４】
スカラ処理部１０１から発行された外部処理命令「ＥＸ−ＲＱ」は、命令制御部１０２へと転送される。命令制御部１０２では、自ＣＰＵ内部にのみ存在するベクトル処理部１０４ａ〜１０４ｎのリソースを管理することによりベクトル処理命令「Ｖ−ＲＱ」を発行する。
【０００５】
従って、各ＣＰＵ１００ａ〜１００ｎ内部におけるスカラ処理部１０１とベクトルパイプラインとの構成は、常に一定であり変更することができない。
【０００６】
従来のベクトル処理装置の例が例えば特開昭６３−１２７３６８号及び特開昭６３−１０２６３号等に開示されている。これらの公報に開示されるベクトル処理装置においては、何れもスカラ処理部とベクトルパイプラインとの構成が一定であり、スカラ処理部に付随するベクトルパイプライン数を用途に応じて柔軟に変更できる構成とはなっていない。
【０００７】
【発明が解決しようとする課題】
上述した従来のベクトル処理装置においては、以下のような問題点がある。
【０００８】
第１の問題点は、実行するアプリケーションによってベクトル化率等が異なるのに対して、適切なベクトル処理リソースを割り当てられないことである。
【０００９】
その理由は、各ＣＰＵにおけるベクトルパイプライン数は常に一定であるため、想定しているよりも低いベクトル化率のアプリケーションを実行した場合、ベクトルリソースの余剰が発生してしまう。また逆に、より高いベクトル化率やより長いベクトル長のアプリケーションを実行しても、予め固定的に構成されたベクトルパイプライン構成によりベクトル処理性能の上限が固定されているため、更なる処理性能向上を行なうことができないからある。
【００１０】
第２の問題点は、ＬＳＩの集積度が向上してもスカラ処理部とベクトルパイプラインとを別のＬＳＩとして開発する必要があることである。
【００１１】
その理由は、ＬＳＩの集積度が向上し、スカラ処理部とベクトルパイプラインの１本分程度が１チップ化できるようになってきたが、従来の多重ベクトルパイプライン構成では、このようなＬＳＩを複数接続する際に各ＬＳＩに存在するスカラ処理部は全く利用できずハードウェア量を無駄に使用することになるため、従来通りスカラ処理部とベクトルパイプラインとを別のＬＳＩとして開発することになってしまう。ところが、この方法ではＬＳＩの開発工数が増大すること、ＬＳＩの開発品種数が増加すること、１種のＬＳＩ当たりの生産個数が減少することなどコスト増となる項目が多くなる。
【００１２】
本発明の目的は、スカラ処理部に付随するベクトルパイプライン数を用途に応じて柔軟に変更できるベクトル処理システムを提供することにある。
【００１３】
本発明の他の目的は、独立する各プロセッサのスカラ処理部からの単一のベクトルパイプラインを共有しているように動作するベクトル処理システムを提供することにある。
【００１４】
【課題を解決するための手段】
上記目的を達成する請求項１の本発明は、主記憶メモリを共有する複数のＣＰＵを備え、各ＣＰＵがスカラ処理手段とベクトル処理手段を有してなる共有メモリ型ベクトル処理システムにおいて、前記各ＣＰＵ相互を、前記各ＣＰＵから生成するベクトル処理命令を各ＣＰＵに転送するためのパスによって接続し、前記各ＣＰＵは、発行元のＣＰＵを識別する発行元ＣＰＵ情報を付加したベクトル処理命令を発行し、前記パスを介して自ＣＰＵを含む全てのＣＰＵに対して転送する発行手段と、転送された前記ベクトル処理命令を、前記発行元ＣＰＵ情報に基づいて各ＣＰＵ毎に対応した複数の命令スタックに格納し、前記複数の命令スタック毎の優先順位と前記ベクトル処理手段のリソース情報に基づいて、前記ベクトル処理命令に基づく命令発行を制御するベクトル処理命令制御手段とを備えることを特徴とする。
【００１５】
請求項２の本発明による共有メモリ型ベクトル処理システムは、前記各ＣＰＵのベクトル処理命令制御手段は、各ＣＰＵ毎に対応した複数の命令スタックと、転送された前記ベクトル処理命令に含まれる前記発行元ＣＰＵ情報を検出し、前記ベクトル処理命令を対応する前記命令スタックに格納する命令発行元検出手段と、複数の前記複数の命令スタック毎に、何れの命令スタックのベクトル処理命令に基づく命令発行を優先するか決定する調停手段と、前記調停手段による決定内容と前記ベクトル処理手段のリソース情報に基づいて、前記ベクトル処理命令に基づく命令発行を前記ベクトル処理手段に対して行なう命令発行処理手段とを備えることを特徴とする。
【００１６】
請求項３の本発明は、主記憶メモリを共有する複数のＣＰＵを備え、各ＣＰＵがスカラ処理手段とベクトル処理手段を有してなる共有メモリ型ベクトル処理システムの制御方法であって、前記各ＣＰＵ相互を、前記各ＣＰＵから生成するベクトル処理命令を各ＣＰＵに転送するためのパスによって接続し、前記各ＣＰＵにおいて、発行元のＣＰＵを識別する発行元ＣＰＵ情報を付加したベクトル処理命令を発行し、前記パスを介して自ＣＰＵを含む全てのＣＰＵに対して転送し、転送された前記ベクトル処理命令を、前記発行元ＣＰＵ情報に基づいて各ＣＰＵ毎に対応した複数の命令スタックに格納し、前記複数の命令スタック毎の優先順位と前記ベクトル処理手段のリソース情報に基づいて、前記ベクトル処理命令に基づく命令発行を制御することを特徴とする。
【００１７】
請求項４の本発明による共有メモリ型ベクトル処理システムの制御方法は、前記各ＣＰＵにおいて、転送された前記ベクトル処理命令に含まれる前記発行元ＣＰＵ情報を検出し、前記ベクトル処理命令を対応する命令スタックに格納し、複数の前記複数の命令スタック毎に、何れの命令スタックのベクトル処理命令に基づく命令発行を優先するか決定し、前記決定内容と前記ベクトル処理手段のリソース情報に基づいて、前記ベクトル処理命令に基づく命令発行を前記ベクトル処理手段に対して行なうことを特徴とする。
【００１８】
請求項５の本発明は、主記憶メモリを共有する複数のＣＰＵを備え、各ＣＰＵがスカラ処理手段とベクトル処理手段を有してなる共有メモリ型ベクトル処理システムの前記ＣＰＵに、発行元のＣＰＵを識別する発行元ＣＰＵ情報を付加したベクトル処理命令を発行し、相互に接続されるパスを介して自ＣＰＵを含む全てのＣＰＵに対して転送する処理と、転送された前記ベクトル処理命令を、前記発行元ＣＰＵ情報に基づいて各ＣＰＵ毎に対応した複数の命令スタックに格納し、前記複数の命令スタック毎の優先順位と前記ベクトル処理手段のリソース情報に基づいて、前記ベクトル処理命令に基づく命令発行を制御する処理とを実行させるためのプログラムを格納した記憶媒体である。
【００１９】
請求項６の本発明は、前記ＣＰＵに、転送された前記ベクトル処理命令に含まれる前記発行元ＣＰＵ情報を検出し、前記ベクトル処理命令を対応する命令スタックに格納する処理と、
複数の前記複数の命令スタック毎に、何れの命令スタックのベクトル処理命令に基づく命令発行を優先するか決定する処理と、前記決定内容と前記ベクトル処理手段のリソース情報に基づいて、前記ベクトル処理命令に基づく命令発行を前記ベクトル処理手段に対して行なう処理とを実行させることを特徴とする。
【００２０】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して詳細に説明する。図１は、本発明の第１の実施の形態に係るベクトル処理システム全体の構成図である。
【００２１】
本実施の形態に係るベクトル処理システムは、複数のＣＰＵ１０ａ〜１０ｎを備え、これらのＣＰＵ１０ａ〜１０ｎが単一の主記憶装置２０を共有する共有メモリ型並列処理システムを構成している。各ＣＰＵ１０ａ〜１０ｎは、ベクトルリクエストバス３０を介して互いに接続され、相互にベクトル処理に関するリクエストやリプライの送受信を行なうことができる。
【００２２】
上記各ＣＰＵ１０ａ〜１０ｎの詳細な構成を図２に基づいて説明する。
【００２３】
ＣＰＵ１０ａ〜１０ｎは、１つのスカラ処理部１１、メモリアクセス命令制御部１２、ベクトル処理命令制御部１３、複数のベクトル処理部１４ａ〜１４ｎ、そしてメモリアクセスネットワーク部１５から構成されている。
【００２４】
スカラ処理部１１から外部に発行される外部処理命令「ＥＸ−ＲＱ」は、メモリアクセス制御部１２を通して、メモリアクセスネットワーク部１５経由で主記憶装置２０へと転送されるか、ベクトルリクエストバス３０を経由して全てのＣＰＵ１０ａ〜１０ｎのベクトル処理命令制御部１３へと送られ、ベクトル処理部１４ａ〜１４ｎへと発行される。
【００２５】
ここで、図３は、ベクトル処理命令制御部１３の詳細な構成を示すブロック図である。
【００２６】
ベクトル処理命令制御部１３は、２つのレジスタ４２、４３と、それらのレジスタ４２、４３の内容の比較を行なう比較器４５と、ベクトルリクエストバス３０経由で転送されてきたベクトル処理命令の内容を分離する命令発行元情報抽出部４１により得られたベクトル処理命令の発行元情報と一方のレジスタ４２の内容を比較するための比較器４４、命令無効化処理部４６、命令スタック４７、リソース管理／命令発行処理部４８を有している。
【００２７】
命令発行元情報抽出部４１から得られたベクトル処理命令と比較器４４の出力は、無効化処理部４６に入力された後に、命令スタック４７に格納される。命令スタック４７に格納されたベクトル処理命令は、ベクトル処理部１４ａ〜１４ｎからのリソース情報等と共に、リソース管理／命令発行処理部４８に入力されて、ベクトル処理部１４ａ〜１４ｎに対して命令発行が行なわれる。
【００２８】
次いで、上記のように構成される第１の実施の形態によるベクトル処理システムの動作について説明する。
【００２９】
図２において、スカラ処理部１１では命令の解読を行ないスカラ処理命令の実行処理を行なう。ここで、主記憶装置２０へのアクセス命令やベクトル処理命令などといった、スカラ処理部１１では実行処理できない命令が出現した場合、スカラ処理部１１はこれらの命令を外部処理命令「ＥＸ−ＲＱ」としてメモリアクセス命令制御部１２へと転送する。
【００３０】
メモリアクセス命令制御部１２では、スカラ処理部１１から受け取った外部処理命令「ＥＸ−ＲＱ」を解読し、主記憶アクセス系の命令「Ｍ−ＲＱ」であれば、そのままメモリアクセスネットワーク部１５に対して命令の発行を行なう。
【００３１】
一方、ベクトル処理命令であった場合には、これをベクトルリクエストバス３０に送出すると共に、自身のベクトル処理命令制御部１３に対してもベクトルリクエストバス３０を経由して発行を行なう。
【００３２】
ベクトル処理命令制御部１３では、メモリアクセス命令制御部１２から送られてきた自ＣＰＵが発行したベクトル処理命令と他のＣＰＵからベクトルリクエストバス３０を経由して転送されてきたベクトル処理命令とを受け取り、自ＣＰＵ内のベクトル処理部１４ａ〜１４ｎに対してそのリソース状態を管理しながら命令を発行する。
【００３３】
メモリアクセスネットワーク部１５は、メモリアクセス命令制御部１２からの主記憶アクセス命令「Ｍ−ＲＱ」を受け取り、主記憶装置２０に対して命令を発行すると共に、主記憶装置２０からの読み出しデータを受け取り、命令種別に応じてデータをスカラ処理部１１もしくはベクトル処理部１４ａ〜１４ｎに対して戻す。
【００３４】
次いで、図３と図４のフローチャートを参照して、ベクトル処理命令制御部１３の動作について述べる。
【００３５】
ベクトル処理命令制御部１３に送られてきたベクトル処理命令は、命令発行元情報抽出部４１にて命令を送出したＣＰＵに関する情報とベクトル処理命令本体とに分離される（ステップ４０１）。
【００３６】
ベクトル処理命令制御部１３には、自ＣＰＵに対してマスターとして外部より設定されたＣＰＵの番号を記憶しているレジスタ４２と、自ＣＰＵ番号を記憶しているレジスタ４３とが備えられている。これら２つのレジスタ４２、４３に対しては、システムの起動前に初期化動作として上記番号をそれぞれ設定しておくものとする。
【００３７】
本実施の形態では、共有メモリ型並列処理システムの各ＣＰＵ１０ａ〜１０ｎをマスターＣＰＵとスレーブＣＰＵとに分けて設定する。マスターＣＰＵはスカラ処理を実行すると共に、ベクトル処理命令を他のＣＰＵに対して発行することができる。これに対してスレーブＣＰＵは、マスターＣＰＵから転送されてきたベクトル処理命令を受け取り、マスターＣＰＵ内のベクトル処理部１４ａ〜１４ｎと同期して多重ベクトルパイプラインとして動作することになる。この時、スレーブＣＰＵのスカラ処理部１１は休止状態として、そのベクトル処理部１４ａ〜１４ｎとベクトル処理命令制御部１３、及びメモリアクセスネットワーク部１５のみが有効に機能することになる。
【００３８】
マスターＣＰＵ番号を格納したレジスタ４２と自ＣＰＵ番号を格納したレジスタ４３の内容は、比較器４５で比較され（ステップ４０２）、不一致の場合には、自ＣＰＵがスレーブＣＰＵであると判断され、自ＣＰＵのスカラ処理部１１に対して動作を停止するよう制御する（ステップ４０３）。
【００３９】
一方、ベクトルリクエストバス３０を通して転送されたベクトル処理命令から命令発行元情報抽出部４１において取り出された命令発行元ＣＰＵ番号と、マスターＣＰＵ番号を格納しているレジスタ４２の内容とが、もう１つの比較器４４で比較される（ステップ４０４）。この際の比較結果と、命令発行元情報抽出部４１にて分離されたベクトル処理命令は、無効化処理部４６に入力される（ステップ４０５）。
【００４０】
比較器４４による比較結果が不一致となっていた場合（ステップ４０６）、入力されたベクトル処理命令はスレーブとして動作している自ＣＰＵのマスターＣＰＵから発行されたベクトル処理命令ではないため、無効化処理部４６にて無効化される（ステップ４０７）。具体的には、比較器４４による比較結果に応じて、ベクトル処理命令に有効又は無効を示すフラグを付し、無効化処理部４６では、そのフラグによって有効なベクトル処理命令のみを命令スタック４７に格納する。無効なベクトル処理命令は命令スタック４７に格納しない。
【００４１】
当然のことながら、自ＣＰＵがマスターＣＰＵとして動作しており、転送されてきたベクトル処理命令が自ＣＰＵの発行した命令であれば、比較器４４の結果は一致を示すので無効化されることはない。
【００４２】
無効化処理部４６で無効化されなかったベクトル処理命令は、自ＣＰＵ内のベクトル処理部１４ａ〜１４ｎで処理すべき命令であることから、命令スタック４７に受付順に格納される（ステップ４０８）。無効化されたベクトル処理命令は、命令スタック４７に格納されずに破棄される。
【００４３】
リソース管理／命令発行制御部４８では、自ＣＰＵ内のベクトル処理部のリソース１４ａ〜１４ｎを管理している。命令スタック４７に格納された命令は、このリソース管理／命令発行処理部４８において優先順位、ならびにベクトル処理部１４ａ〜１４ｎのリソース状況に応じて、発行が可能な順に自ＣＰＵのベクトル処理部１４ａ〜１４ｎに対して命令発行が行なわれる（ステップ４０９）。ここでは、命令スタック４７への格納順には従わずに命令の追い越し発行も可能である。
【００４４】
なお、各スレーブＣＰＵにおけるベクトル処理が終了した時点で、処理の終了がマスタＣＰＵに通知される。マスタＣＰＵでは、全てのスレーブＣＰＵからの終了通知を受け取ったことを確認した上で次のベクトル処理命令の発行を行なうことになる。
【００４５】
以上により、マスターＣＰＵに設定されたＣＰＵと、このＣＰＵ番号をマスターＣＰＵ番号として記憶している複数のＣＰＵは、一体となって動作する多重ベクトルパイプラインのプロセッサと見なすことができるようになる。
【００４６】
この時、マスターＣＰＵのスカラ処理部１１だけが機能しており、スレーブＣＰＵのスカラ処理部１１はベクトル処理命令制御部１３からの制御信号「ＨＡＬＴ」により機能が停止されている。
【００４７】
マスターＣＰＵのスカラ処理部１１から発行されたベクトル処理命令は、ベクトルリクエストバス３０を通して自ＣＰＵを含めてスレーブＣＰＵの各ベクトル処理命令制御部１３において有効と判断され、これら複数のＣＰＵにおけるベクトル処理部１４ａ〜１４ｎにおいて並列動作によって処理されることになる。
【００４８】
仮に、１つのＣＰＵに１つのベクトル処理部が存在する場合、主記憶装置２０を共有しているＣＰＵ数を３２とすると、通常は「１スカラ処理部＋１ベクトル処理部」のＣＰＵが３２個存在するシステムとして固定化されてしまうが、１つのマスターＣＰＵに１つのスレーブＣＰＵを対応させた場合、「１スカラ処理部＋２ベクトル処理部」のＣＰＵが１６個存在するシステムのように動作させることが可能となる。
【００４９】
また、マスターＣＰＵとスレーブＣＰＵの設定内容により、「１スカラ処理部＋１ベクトル処理部」のＣＰＵと「１スカラ処理部＋４ベクトル処理部」のＣＰＵを１つのシステムの中に混在させるような構成も取ることができる。すなわち、マスターＣＰＵとスレーブＣＰＵの設定内容により、種々の構成を構築できるようになる。
【００５０】
次いで、第２の実施の形態に係るベクトル処理システムについて説明する。
【００５１】
図５は、第２の実施の形態によるベクトル処理システムのベクトル処理命令制御部１３の構成を示すブロック図である。ベクトル処理命令制御部１３以外の構成については、上述の第１の実施の形態と同様である。
【００５２】
図５に示すベクトル処理命令制御部１３では、マスターＣＰＵ番号を記憶しているレジスタ４２ａと自ＣＰＵ番号を記憶しているレジスタ４３ａを有しており、この２つのレジスタ４２ａ、４３ａの内容を比較する比較器４４ａの出力に基づいて自ＣＰＵのスカラ処理部１１の機能を停止させることについては、図３の構成の場合と同じである。また、このベクトル処理命令制御部１３には、レジスタ４２ａのマスターＣＰＵ番号とベクトル処理命令に含まれる発行元ＣＰＵ情報とを比較する機能と、比較結果に基づいて命令発行を制御する機能を有するリソース管理／命令発行制御部４８ａが備えられている。
【００５３】
このベクトル処理命令制御部１３の動作を図６のフローチャートを参照して説明する。この構成例では、図６に示すように、ベクトルリクエストバス３０を通って転送されてきたベクトル処理命令は、何も処理をせずそのまま命令スタック４７ａに順番に格納される（ステップ６０１）。従って、命令スタック４７ａには、ベクトル処理命令と共に、発行元ＣＰＵ情報を格納しておくレコードが設けられている。
【００５４】
また、マスターＣＰＵ番号を格納したレジスタ４２ａと自ＣＰＵ番号を格納したレジスタ４３ａの内容は、比較器４５ａで比較され（ステップ６０２）、不一致の場合には、自ＣＰＵがスレーブＣＰＵであると判断され、自ＣＰＵのスカラ処理部１１に対して動作を停止するよう制御する（ステップ６０３）。
【００５５】
次で、リソース管理／命令発行制御部４８ａでは、ベクトル処理部への命令発行を行なう際に、ベクトル処理命令に付随している発行元ＣＰＵ情報と、マスターＣＰＵ番号を記憶しているレジスタ４２ａの内容を比較することにより（ステップ６０４）、番号が不一致である場合には不適切なベクトル処理命令の発行を抑止すると共に、命令スタック４７ａ中の該当エリアを解放する（ステップ６０５）。すなわち、命令スタック４７ａに格納する前に無効化処理を行なうのではなく、実際に命令発行を行なう際に無効化処理を行なうように構成している。
【００５６】
また、番号が一致する場合には、第１の実施の形態におけるリソース管理／命令発行制御部４８と同様に、命令スタック４７ａに格納された命令が、優先順位、ならびにベクトル処理部１４ａ〜１４ｎのリソース状況に応じて、発行が可能な順に自ＣＰＵのベクトル処理部１４ａ〜１４ｎに対して命令発行が行なわれる（ステップ６０６）。
【００５７】
上述した第１の実施の形態においては、ベクトル処理命令の命令発行元ＣＰＵ番号を抽出する命令発行元情報抽出部４１、命令発行元ＣＰＵ番号とマスタＣＰＵ番号を比較する比較器４４及び比較結果によってベクトル処理命令の無効化を行なう無効化処理部４６を備えることで命令スタック４７に無効なベクトル処理命令を格納しない構成としたのに対して、この第２の実施の形態では、送られた発行元ＣＰＵ情報を含むベクトル処理命令を全て命令スタック４７ａに格納し、リソース管理命令発行処理部４８ａによる命令発行処理の段階で、適切なベクトル処理命令のみを発行し、不適切なベクトル処理命令について命令スタック４７ａのエリアを解放する構成としている。従って、第１の実施の形態と第２の実施の形態とを比較した場合、ハードウェア量については第２に実施の形態の方が少なくて済み、命令スタックの記憶容量については第１の実施の形態の方が小さくすることができる。
【００５８】
一方、よりスカラ性能を重視したシステム構成として複数の独立したスカラ処理部から共有される多重ベクトルパイプラインという構成も可能である。すなわち、複数のプロセッサ中に存在するベクトルパイプラインを全てまとめて１つの多重ベクトルパイプラインと見なし、独立する各プロセッサのスカラ処理部から単一のベクトルパイプラインを共有しているように動作するシステム構成である。
【００５９】
これを実現した第３の実施の形態に係るベクトル処理システムのベクトル処理命令制御部１３の構成を図７に示す。なお、ベクトル処理命令制御部１３以外の構成については、上述した第１の実施の形態と同一であるので共通の符号を付して説明を省略する。
【００６０】
第３の実施の形態に係るベクトル処理システムにおいて、ベクトル処理命令制御部１３は、命令発行元検出部６１と、各ＣＰＵ毎に設けられた命令スタック６３ａ〜６３ｎと、命令スタック６３ａ〜６３ｎに設定された優先順位に基づいて発行順の調停を行なう調停部６２と、リソース管理／命令発行処理部６４とで構成される。
【００６１】
以下、本実施の形態によるベクトル処理命令制御部１３の動作を図８のフローチャートを参照して説明する。
【００６２】
ベクトルリクエストバス３０を通して転送されてくるベクトル処理命令は、命令発行元検出部６１を経由して各ＣＰＵ毎に設けられた命令スタック６３ａ〜６３ｎに格納される。格納されたベクトル処理命令は、調停部６２による調停結果と各ベクトル処理部１４ａ〜１４ｎからのリソース情報「Ｖ−ＲＰ」と共に、リソース管理／命令発行制御部６４に入力された後に各ベクトル処理部１４ａ〜１４ｎに対して発行される。
【００６３】
ここでは、ベクトルリクエストバス３０を通して転送されてきたベクトル処理命令は、命令発行元検出部６１で発行元のＣＰＵ番号が検査される（ステップ８０１）。その後、ベクトル処理命令は、発行元ＣＰＵ毎に設けられた命令スタック６３ａ〜６３ｎに分かれて格納される（ステップ８０２）。
【００６４】
そして、何れかの命令スタック６３ａ〜６３ｎから命令を発行するかどうかを、優先順位によって競合調停する調停部６２により決定する（ステップ８０３）。調停部６２は、例えばラウンドロビン方式によって何れかの命令スタック６３ａ〜６３ｎから命令を発行するかを決定する。この調停部６２の出力と、各ベクトル処理部のリソース情報とを用いて、リソース管理／命令発行制御部６４で発行命令を決定する（ステップ８０４）。
【００６５】
この際、発行元ＣＰＵが同一のベクトル処理命令については、命令スタックへの格納順序を越えて追い越し発行することはできないが、発行元ＣＰＵが異なっていればリソースの状況によって追い越し発行を行なってもデータ競合は起こらないため問題にはならない。従って、各命令スタック間での格納順序については、特に記憶しておく必要はない。また、発行元ＣＰＵが同一のベクトル処理命令に関しても、アクセスアドレスを比較することで同一アドレスに対するアクセスを回避するための相応のリソース管理手段を準備すれば、追い越し発行が可能である。
【００６６】
以上により、各ＣＰＵから発行されたベクトル処理命令は全てのＣＰＵのベクトル処理命令制御部に転送され発行処理が行なわれることになる。この時ベクトル処理命令の発行元ＣＰＵ別に管理されるため、各ＣＰＵに存在するベクトル処理部は全ＣＰＵで統合された単一のベクトル処理部として、全ＣＰＵのスカラ処理部から共有されているように動作することになる。
【００６７】
上述したベクトル処理システムは、ハードウェア的に実現することは勿論として、図２に示すように、磁気ディスク、半導体メモリその他の記録媒体１８に記録された、上述した各機能を実現するための制御プログラムによってソフトウェア的に実現することも可能である。この制御プログラムは、記録媒体１８からＣＰＵに読み込まれ、ＣＰＵの動作を制御することにより、上述したベクトル処理命令制御の機能を実現する。すなわち、図４、図６及び図８に示される処理を実行する。
【００６８】
なお、本発明は上述した実施の形態に限定されるものではなく、その技術思想の範囲内において様々に変形して実施することができる。例えば、図１におけるシステム全体構成図において、各ＣＰＵ１０ａ〜１０ｎ間でベクトル処理命令を転送するためのベクトルリクエストバス３０は、単一バスとして記述されている。しかしながら、この転送手段は単一バスに限定されるものではなく、多重バスやクロスバスイッチなどあらゆる接続手段によって実現できることは明らかである。
【００６９】
【発明の効果】
以上説明したように本発明のベクトル処理システムとその制御方法によれば、以下に述べるような効果が得られる。
【００７０】
第１に、よりスカラ処理性能を重視したアプリケーション向けに、複数のスカラ処理手段から共有される単一のベクトル処理手段を有した共有メモリ型並列処理システムを提供することができる。
【００７１】
これは、各ＣＰＵのベクトル処理命令制御にＣＰＵ毎のベクトル処理命令スタックを準備し、各ＣＰＵ間で転送されてきたベクトル処理命令をその発行元ＣＰＵ毎に分類して命令スタックに格納して、各命令スタック内の命令の競合を調停しながら順次ベクトル処理部に対してベクトル処理命令を発行することで、全ＣＰＵ中に存在するベクトル処理手段があたかも単一のベクトル処理手段として全てのＣＰＵから共有されているように動作することが出来るためである。
【００７２】
これにより、ベクトル処理命令の出現頻度が極めて低いスカラスループット性能重視のアプリケーション分野に対しても、ベクトル処理リソースを有効利用してより効率の良い処理が可能なシステムを提供することができる。
【００７３】
第２に、スカラ処理手段とベクトル処理手段とを１チップに集積化したＬＳＩを開発することが可能になり、開発工数やコストを軽減することができる。
【００７４】
これは、スカラ処理手段に対する多重ベクトルパイプラインの構成を外部からの設定により柔軟に変更できるようになったために、これまでは困難であったスカラ処理手段とベクトル処理手段を同一のＬＳＩに集積化することが可能になり、その結果ＬＳＩの開発品種数が削減できることによる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態に係るベクトル処理システム全体の構成図である。
【図２】上記第１の実施の形態に係るベクトル処理システムの各ＣＰＵの詳細な構成を示すブロック図である。
【図３】上記第１の実施の形態に係るベクトル処理システムのベクトル処理命令制御部の詳細を示すブロック図である。
【図４】上記第１の実施の形態に係るベクトル処理システムのベクトル処理命令制御部の動作を説明するフローチャートである。
【図５】第２の実施の形態に係るベクトル処理システムのベクトル処理命令制御部の構成を示すブロック図である。
【図６】上記第２の実施の形態に係るベクトル処理システムのベクトル処理命令制御部の動作を説明するフローチャートである。
【図７】第３の実施の形態に係るベクトル処理システムのベクトル処理命令制御部の構成を示すブロック図である。
【図８】上記第３の実施の形態に係るベクトル処理システムのベクトル処理命令制御部の動作を説明するフローチャートである。
【図９】従来のベクトル処理装置におけるＣＰＵを用いた共有メモリ型並列処理システムの構成を示すブロック図である。
【図１０】図９に示すベクトル処理装置の各ＣＰＵ０の構成を示すブロック図である。
【符号の説明】
１０ａ〜１０ｎＣＰＵ
１１スカラ処理部
１２メモリアクセス命令制御部
１３ベクトル処理命令制御部
１４ａ〜１４ｎベクトル処理部
１５メモリアクセスネットワーク部
２０主記憶装置
３０ベクトルリクエストバス
４１命令発行元情報抽出部
４２，４３、４２ａ，４３ａレジスタ
４４，４５、４４ａ比較器
４６無効化処理部
４７、４７ａ命令スタック
４８、４８ａリソース管理命令発行処理部
６１命令発行元検出部
６２調停部
６３ａ〜６３ｎ命令スタック
６４リソース管理／命令発行処理部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a shared memory vector processing system that includes a plurality of CPUs sharing a main memory, each CPU having a scalar processing unit and vector processing units that form a plurality of vector pipelines.
[0002]
[Prior art]
FIG. 9 shows the configuration of a shared memory parallel processing system that uses CPUs of a conventional vector processing apparatus. In this system, a plurality of CPUs 100a to 100n are connected to, and share, a single main storage device 200.
[0003]
A detailed configuration diagram of each of the CPUs 100a to 100n is shown in FIG. 10. As illustrated, each of the CPUs 100a to 100n includes a scalar processing unit 101, an instruction control unit 102, vector processing units 104a to 104n, and a memory access network unit 105.
[0004]
The external processing instruction “EX-RQ” issued from the scalar processing unit 101 is transferred to the instruction control unit 102. The instruction control unit 102 issues vector processing instructions “V-RQ” while managing the resources of the vector processing units 104a to 104n, which exist only within its own CPU.
[0005]
Therefore, the configuration of the scalar processing unit 101 and the vector pipeline in each of the CPUs 100a to 100n is always constant and cannot be changed.
[0006]
Examples of conventional vector processing apparatuses are disclosed in, for example, Japanese Patent Laid-Open Nos. 63-127368 and 63-10263. In each of the vector processing apparatuses disclosed in these publications, the configuration of the scalar processing unit and the vector pipelines is fixed, and the number of vector pipelines attached to a scalar processing unit cannot be flexibly changed according to the application.
[0007]
[Problems to be solved by the invention]
The conventional vector processing apparatus described above has the following problems.
[0008]
The first problem is that appropriate vector processing resources cannot be allocated even though the vectorization rate and similar characteristics differ depending on the application being executed.
[0009]
The reason is that the number of vector pipelines in each CPU is always fixed, so when an application with a lower vectorization rate than expected is executed, vector resources are left idle. Conversely, even when an application with a higher vectorization rate or a longer vector length is executed, the upper limit of vector processing performance is fixed by the vector pipeline configuration that was fixed in advance, so no further improvement in processing performance is possible.
[0010]
The second problem is that the scalar processing unit and the vector pipeline need to be developed as separate LSIs even if the integration degree of the LSIs is improved.
[0011]
The reason is as follows. LSI integration density has improved to the point where a scalar processing unit and roughly one vector pipeline can be integrated on a single chip. In the conventional multi-vector-pipeline configuration, however, when a plurality of such LSIs are connected, the scalar processing unit in each LSI cannot be used at all and hardware is wasted, so the scalar processing unit and the vector pipeline end up being developed as separate LSIs as before. That approach, however, increases the number of cost-raising factors, such as more LSI development man-hours, more LSI types to develop, and fewer units produced per LSI type.
[0012]
An object of the present invention is to provide a vector processing system that can flexibly change the number of vector pipelines associated with a scalar processing unit according to the application.
[0013]
Another object of the present invention is to provide a vector processing system that operates as if a single vector pipeline were shared by the scalar processing units of the independent processors.
[0014]
[Means for Solving the Problems]
The present invention according to claim 1, which achieves the above object, is a shared memory vector processing system comprising a plurality of CPUs that share a main memory, each CPU having scalar processing means and vector processing means, wherein the CPUs are connected to one another by a path for transferring vector processing instructions generated by each CPU to each CPU, and each CPU comprises: issuing means for issuing a vector processing instruction to which issuer CPU information identifying the issuing CPU is added, and transferring it via the path to all CPUs including its own CPU; and vector processing instruction control means for storing the transferred vector processing instruction in one of a plurality of instruction stacks provided for the respective CPUs, based on the issuer CPU information, and for controlling instruction issue based on the vector processing instruction in accordance with the priority of each of the plurality of instruction stacks and resource information of the vector processing means.
[0015]
In the shared memory vector processing system of the present invention according to claim 2, the vector processing instruction control means of each CPU comprises: a plurality of instruction stacks provided for the respective CPUs; instruction issuer detection means for detecting the issuer CPU information included in the transferred vector processing instruction and storing the vector processing instruction in the corresponding instruction stack; arbitration means for determining, among the plurality of instruction stacks, which instruction stack's vector processing instructions are given priority for instruction issue; and instruction issue processing means for issuing instructions based on the vector processing instruction to the vector processing means, in accordance with the decision of the arbitration means and resource information of the vector processing means.
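The structure recited in claims 1 and 2 (per-issuer instruction stacks, issuer detection, arbitration, and resource-aware issue) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, each stack is modelled as a FIFO queue, the arbitration policy is round-robin (which the description mentions only as one example), and "resource information" is reduced to a count of idle vector units.

```python
from collections import deque


class VectorInstructionController:
    """Hypothetical model of one CPU's vector processing instruction control means."""

    def __init__(self, num_cpus, free_units):
        # One instruction stack (modelled as a FIFO queue) per issuing CPU.
        self.stacks = [deque() for _ in range(num_cpus)]
        # Resource information, simplified to the number of idle vector units.
        self.free_units = free_units
        # Round-robin pointer (assumed arbitration policy).
        self.next_stack = 0

    def receive(self, issuer_cpu, body):
        # Issuer detection: file the instruction under the CPU that issued it.
        self.stacks[issuer_cpu].append(body)

    def issue_one(self):
        # Arbitration plus resource check; returns (issuer_cpu, body) or None.
        if self.free_units == 0:
            return None
        n = len(self.stacks)
        for offset in range(n):
            idx = (self.next_stack + offset) % n
            if self.stacks[idx]:
                self.next_stack = (idx + 1) % n
                self.free_units -= 1
                return idx, self.stacks[idx].popleft()
        return None
```

Because instructions from the same issuer stay in one queue, their relative order is preserved, while instructions from different issuers can be interleaved freely by the arbiter.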
[0016]
The present invention according to claim 3 is a control method for a shared memory vector processing system comprising a plurality of CPUs that share a main memory, each CPU having scalar processing means and vector processing means, wherein the CPUs are connected to one another by a path for transferring vector processing instructions generated by each CPU to each CPU, and each CPU: issues a vector processing instruction to which issuer CPU information identifying the issuing CPU is added, and transfers it via the path to all CPUs including its own CPU; stores the transferred vector processing instruction in one of a plurality of instruction stacks provided for the respective CPUs, based on the issuer CPU information; and controls instruction issue based on the vector processing instruction in accordance with the priority of each of the plurality of instruction stacks and resource information of the vector processing means.
[0017]
In the control method for a shared memory vector processing system of the present invention according to claim 4, each CPU detects the issuer CPU information included in the transferred vector processing instruction and stores the vector processing instruction in the corresponding instruction stack; determines, among the plurality of instruction stacks, which instruction stack's vector processing instructions are given priority for instruction issue; and issues instructions based on the vector processing instruction to the vector processing means in accordance with that decision and resource information of the vector processing means.
[0018]
The present invention according to claim 5 is a storage medium storing a program that causes each CPU of a shared memory vector processing system, which comprises a plurality of CPUs sharing a main memory, each CPU having scalar processing means and vector processing means, to execute: a process of issuing a vector processing instruction to which issuer CPU information identifying the issuing CPU is added, and transferring it via a mutually connecting path to all CPUs including its own CPU; and a process of storing the transferred vector processing instruction in one of a plurality of instruction stacks provided for the respective CPUs, based on the issuer CPU information, and controlling instruction issue based on the vector processing instruction in accordance with the priority of each of the plurality of instruction stacks and resource information of the vector processing means.
[0019]
The present invention according to claim 6 causes the CPU to execute: a process of detecting the issuer CPU information included in the transferred vector processing instruction and storing the vector processing instruction in the corresponding instruction stack; a process of determining, among the plurality of instruction stacks, which instruction stack's vector processing instructions are given priority for instruction issue; and a process of issuing instructions based on the vector processing instruction to the vector processing means in accordance with that decision and resource information of the vector processing means.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a configuration diagram of the entire vector processing system according to the first embodiment of the present invention.
[0021]
The vector processing system according to the present embodiment includes a plurality of CPUs 10a to 10n, and constitutes a shared memory parallel processing system in which these CPUs 10a to 10n share a single main storage device 20. The CPUs 10a to 10n are connected to each other via the vector request bus 30 and can send and receive requests and replies related to vector processing.
[0022]
A detailed configuration of each of the CPUs 10a to 10n will be described with reference to FIG. 2.
[0023]
Each of the CPUs 10a to 10n includes a single scalar processing unit 11, a memory access instruction control unit 12, a vector processing instruction control unit 13, a plurality of vector processing units 14a to 14n, and a memory access network unit 15.
[0024]
An external processing instruction “EX-RQ” issued externally from the scalar processing unit 11 passes through the memory access instruction control unit 12 and is either transferred to the main storage device 20 via the memory access network unit 15, or sent via the vector request bus 30 to the vector processing instruction control units 13 of all the CPUs 10a to 10n and issued to the vector processing units 14a to 14n.
[0025]
Here, FIG. 3 is a block diagram showing a detailed configuration of the vector processing instruction control unit 13.
[0026]
The vector processing instruction control unit 13 has two registers 42 and 43; a comparator 45 that compares the contents of the registers 42 and 43; an instruction issuer information extraction unit 41 that separates the contents of a vector processing instruction transferred via the vector request bus 30; a comparator 44 that compares the issuer information of the vector processing instruction obtained by the instruction issuer information extraction unit 41 with the contents of one of the registers, register 42; an instruction invalidation processing unit 46; an instruction stack 47; and a resource management / instruction issue processing unit 48.
[0027]
The vector processing instruction obtained from the instruction issuer information extraction unit 41 and the output of the comparator 44 are input to the invalidation processing unit 46, after which the instruction is stored in the instruction stack 47. The vector processing instructions stored in the instruction stack 47 are input to the resource management / instruction issue processing unit 48 together with resource information and the like from the vector processing units 14a to 14n, and instructions are then issued to the vector processing units 14a to 14n.
[0028]
Next, the operation of the vector processing system according to the first embodiment configured as described above will be described.
[0029]
In FIG. 2, the scalar processing unit 11 decodes instructions and executes scalar processing instructions. When an instruction appears that the scalar processing unit 11 cannot execute, such as an access instruction to the main storage device 20 or a vector processing instruction, the scalar processing unit 11 transfers it to the memory access instruction control unit 12 as an external processing instruction “EX-RQ”.
[0030]
The memory access instruction control unit 12 decodes the external processing instruction “EX-RQ” received from the scalar processing unit 11 and, if it is a main memory access instruction “M-RQ”, issues it as is to the memory access network unit 15.
[0031]
On the other hand, if it is a vector processing instruction, the memory access instruction control unit 12 sends it out on the vector request bus 30 and also issues it, via the vector request bus 30, to the vector processing instruction control unit 13 of its own CPU.
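The routing performed by the memory access instruction control unit 12 can be sketched as a simple dispatch function. This is an illustrative model only: the function name, the `kind` encoding, and the use of Python lists to stand in for the memory access network unit 15 and the vector request bus 30 are all assumptions, not details from the specification.

```python
def dispatch_external_request(kind, payload, own_cpu, memory_network, vector_bus):
    """Route one EX-RQ; `kind` is "memory" or "vector" (hypothetical encoding)."""
    if kind == "memory":
        # Main memory access: forwarded as-is as an M-RQ.
        memory_network.append(("M-RQ", payload))
    elif kind == "vector":
        # Vector instruction: tagged with the issuing CPU number and placed on
        # the bus, which delivers it to every CPU including the issuer itself.
        vector_bus.append({"issuer": own_cpu, "body": payload})
    else:
        raise ValueError(f"unknown request kind: {kind!r}")


# Usage: CPU 3 issues one memory access and one vector instruction.
memory_network, vector_bus = [], []
dispatch_external_request("memory", "load r1", 3, memory_network, vector_bus)
dispatch_external_request("vector", "vadd v0, v1, v2", 3, memory_network, vector_bus)
```

After these two calls, `memory_network` holds one M-RQ entry and `vector_bus` holds one request tagged with issuer 3.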
[0032]
The vector processing instruction control unit 13 receives both the vector processing instructions issued by its own CPU, sent from the memory access instruction control unit 12, and the vector processing instructions transferred from other CPUs via the vector request bus 30, and issues instructions to the vector processing units 14a to 14n in its own CPU while managing their resource state.
[0033]
The memory access network unit 15 receives the main memory access instruction “M-RQ” from the memory access instruction control unit 12 and issues it to the main storage device 20. It also receives read data from the main storage device 20 and returns the data to the scalar processing unit 11 or to the vector processing units 14a to 14n, depending on the instruction type.
[0034]
Next, the operation of the vector processing instruction control unit 13 will be described with reference to FIG. 3 and the flowchart of FIG. 4.
[0035]
A vector processing instruction sent to the vector processing instruction control unit 13 is separated by the instruction issuer information extraction unit 41 into information about the CPU that sent the instruction and the body of the vector processing instruction (step 401).
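Step 401 amounts to splitting a tagged request into an issuer field and an instruction body. The specification does not give an encoding, so the following sketch assumes a hypothetical packed word with a 5-bit issuer field (enough to number 32 CPUs) above a 32-bit body; both widths and both function names are illustrative.

```python
ISSUER_BITS = 5    # assumption: 5 bits, enough to number 32 CPUs
BODY_BITS = 32     # assumption: width of the vector instruction body


def pack_request(issuer_cpu, body):
    """Attach the issuer CPU number to an instruction body (sender side)."""
    if not 0 <= issuer_cpu < (1 << ISSUER_BITS):
        raise ValueError("issuer CPU number out of range")
    if not 0 <= body < (1 << BODY_BITS):
        raise ValueError("instruction body out of range")
    return (issuer_cpu << BODY_BITS) | body


def split_request(word):
    """Separate a received word into (issuer CPU number, instruction body)."""
    issuer_cpu = word >> BODY_BITS
    body = word & ((1 << BODY_BITS) - 1)
    return issuer_cpu, body
```

With this layout, `split_request(pack_request(issuer, body))` returns the original pair for any in-range values.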
[0036]
The vector processing instruction control unit 13 includes a register 42 that stores the number of the CPU designated, by external setting, as the master for its own CPU, and a register 43 that stores its own CPU number. These numbers are set in the two registers 42 and 43 as an initialization operation before the system starts.
[0037]
In the present embodiment, each of the CPUs 10a to 10n of the shared memory parallel processing system is configured as either a master CPU or a slave CPU. A master CPU executes scalar processing and can issue vector processing instructions to other CPUs. A slave CPU, by contrast, receives the vector processing instructions transferred from its master CPU and operates as part of a multi-vector pipeline in synchronization with the vector processing units 14a to 14n in the master CPU. In this case the scalar processing unit 11 of the slave CPU is put in a dormant state, and only its vector processing units 14a to 14n, vector processing instruction control unit 13, and memory access network unit 15 function.
[0038]
The contents of the register 42, which stores the master CPU number, and the register 43, which stores the own CPU number, are compared by the comparator 45 (step 402). If they do not match, the own CPU is determined to be a slave CPU, and the scalar processing unit 11 of the own CPU is controlled so as to stop operating (step 403).
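Steps 402 and 403 reduce to a single equality test between the two registers. A minimal sketch, with a hypothetical function name and plain integers standing in for the contents of registers 42 and 43:

```python
def scalar_unit_halted(master_cpu_number, own_cpu_number):
    """Model of comparator 45: a mismatch means this CPU is a slave.

    Returns True when the scalar processing unit should be stopped.
    """
    return master_cpu_number != own_cpu_number


# Usage: CPU 4 is configured with CPU 1 as its master, so it is a slave;
# CPU 1 names itself as master, so its scalar unit keeps running.
slave_halted = scalar_unit_halted(master_cpu_number=1, own_cpu_number=4)
master_halted = scalar_unit_halted(master_cpu_number=1, own_cpu_number=1)
```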
[0039]
Meanwhile, the instruction issuer CPU number that the instruction issuer information extraction unit 41 extracts from the vector processing instruction transferred through the vector request bus 30 is compared with the contents of the register 42, which stores the master CPU number, by the other comparator 44 (step 404). The comparison result and the vector processing instruction separated by the instruction issuer information extraction unit 41 are input to the invalidation processing unit 46 (step 405).
[0040]
If the comparison by the comparator 44 shows a mismatch (step 406), the input vector processing instruction was not issued by the master CPU of the own CPU, which is operating as a slave, and it is therefore invalidated by the invalidation processing unit 46 (step 407). Specifically, a flag indicating valid or invalid is attached to the vector processing instruction according to the comparison result of the comparator 44, and the invalidation processing unit 46 uses that flag to store only valid vector processing instructions in the instruction stack 47. Invalid vector processing instructions are not stored in the instruction stack 47.
[0041]
Naturally, if the own CPU is operating as a master CPU and the transferred vector processing instruction was issued by the own CPU, the comparator 44 indicates a match and the instruction is not invalidated.
[0042]
Since the vector processing instruction that has not been invalidated by the invalidation processing unit 46 is an instruction to be processed by the vector processing units 14a to 14n in its own CPU, it is stored in the instruction stack 47 in the order of acceptance (step 408). The invalidated vector processing instruction is discarded without being stored in the instruction stack 47.
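The filtering in steps 404 to 408 can be sketched as follows. This is a simplified software model rather than the hardware described: the function name is hypothetical, each incoming request is represented as an `(issuer, body)` tuple, and the instruction stack 47 is a plain list.

```python
def filter_into_stack(incoming, master_cpu_number, stack):
    """Store only instructions issued by this CPU's master; drop the rest."""
    for issuer, body in incoming:
        # Comparator 44: issuer number versus the master CPU number (register 42).
        valid = issuer == master_cpu_number
        if valid:
            # Step 408: valid instructions are stored in order of acceptance.
            stack.append(body)
        # Invalid instructions are discarded and never reach the stack.
    return stack


# Usage: a CPU whose master is CPU 1 sees traffic from CPUs 1 and 2.
stack = filter_into_stack(
    [(1, "vadd"), (2, "vmul"), (1, "vsub")], master_cpu_number=1, stack=[]
)
```

Here `stack` ends up holding `"vadd"` and `"vsub"` in that order, and the request from CPU 2 is dropped.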
[0043]
The resource management / instruction issue processing unit 48 manages the resources of the vector processing units 14a to 14n in its own CPU. The instructions stored in the instruction stack 47 are issued by this unit to the vector processing units 14a to 14n of its own CPU in whatever order they become issuable, according to their priority and the resource status of the vector processing units 14a to 14n (step 409). Instructions may therefore overtake one another rather than being issued in the order in which they were stored in the instruction stack 47.
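Overtaking issue in step 409 can be illustrated with a small scheduler sketch. The assumptions here are significant and are not from the specification: each instruction needs exactly one named resource, storage order doubles as priority, and instructions are treated as mutually independent (no data-dependency check is modelled). The function name is hypothetical.

```python
def issue_ready(stack, busy_resources):
    """Issue every stacked instruction whose resource is free, in stack order.

    `stack` is a list of (name, required_resource) pairs. Returns the list of
    issued instruction names and the list of instructions left in the stack.
    """
    in_use = set(busy_resources)
    issued, remaining = [], []
    for name, resource in stack:
        if resource not in in_use:
            in_use.add(resource)      # the resource is now occupied
            issued.append(name)
        else:
            remaining.append((name, resource))
    return issued, remaining


# Usage: the adder is busy, so the multiply overtakes the older add.
issued, remaining = issue_ready(
    [("vadd", "adder"), ("vmul", "multiplier"), ("vadd2", "adder")],
    busy_resources={"adder"},
)
```

In this run only `"vmul"` is issued, while both add instructions stay in the stack in their original order.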
[0044]
When the vector processing in each slave CPU is completed, the master CPU is notified of the end of the processing. The master CPU issues the next vector processing instruction after confirming that end notifications have been received from all slave CPUs.
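The completion handshake described above behaves like a barrier: the master waits until every slave has reported. A minimal sketch, assuming a hypothetical `CompletionTracker` class and integer CPU numbers; the actual notification mechanism is not specified in the text.

```python
class CompletionTracker:
    """Tracks which slave CPUs still owe an end notification to the master."""

    def __init__(self, slave_cpus):
        self.slaves = set(slave_cpus)
        self.pending = set()

    def start(self):
        # A vector instruction has been issued: every slave must report back.
        self.pending = set(self.slaves)

    def notify_done(self, cpu):
        # End notification received from one slave CPU.
        self.pending.discard(cpu)

    def may_issue_next(self):
        # The master may proceed only when no notification is outstanding.
        return not self.pending


# Usage: a master with slaves 2 and 3.
tracker = CompletionTracker([2, 3])
tracker.start()
tracker.notify_done(2)
blocked = not tracker.may_issue_next()   # CPU 3 has not reported yet
tracker.notify_done(3)
ready = tracker.may_issue_next()
```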
[0045]
As described above, the CPU set as the master CPU and the plurality of CPUs that store that CPU's number as their master CPU number can be regarded as a single multi-vector-pipeline processor operating as one unit.
[0046]
At this time, only the scalar processing unit 11 of the master CPU is functioning, and the function of the scalar processing unit 11 of the slave CPU is stopped by the control signal “HALT” from the vector processing instruction control unit 13.
[0047]
The vector processing instruction issued from the scalar processing unit 11 of the master CPU is transferred through the vector request bus 30, is judged to be valid in the vector processing instruction control unit 13 of each slave CPU as well as of the master CPU itself, and is processed in parallel by the vector processing units 14a to 14n in the plurality of CPUs.
[0048]
If each CPU has one vector processing unit and 32 CPUs share the main storage device 20, the system normally consists of 32 CPUs of "1 scalar processing unit + 1 vector processing unit". If, however, one slave CPU is associated with each master CPU, the system can be operated as if it had 16 CPUs of "1 scalar processing unit + 2 vector processing units".
[0049]
In addition, depending on the settings of the master CPUs and slave CPUs, a single system can mix CPUs of "1 scalar processing unit + 1 vector processing unit" with CPUs of "1 scalar processing unit + 4 vector processing units". That is, various configurations can be constructed depending on the settings of the master CPU and slave CPU.
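The arithmetic behind these configurations can be checked with a short sketch. It assumes that the setting is expressed as a mapping from each CPU number to its master CPU number, mirroring the master CPU number register; the function name `effective_configuration` and the parameter `vector_units_per_cpu` are hypothetical.

```python
from collections import Counter


def effective_configuration(master_of, vector_units_per_cpu=1):
    """Return {master CPU number: vector units serving its scalar unit}.

    master_of maps every CPU number to the master CPU number stored in
    that CPU's master register; a master maps to itself.
    """
    group_sizes = Counter(master_of.values())
    return {
        master: size * vector_units_per_cpu
        for master, size in group_sizes.items()
    }


# 32 CPUs paired so that each even-numbered CPU is the master of the next one.
paired = {cpu: cpu - cpu % 2 for cpu in range(32)}
config = effective_configuration(paired)
```

Here `config` contains 16 masters with 2 vector processing units each. A mixed setting such as `{0: 0, 1: 1, 2: 1, 3: 1, 4: 1}` yields one CPU with 1 vector processing unit and one with 4.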
[0050]
Next, a vector processing system according to the second embodiment will be described.
[0051]
FIG. 5 is a block diagram showing a configuration of the vector processing instruction control unit 13 of the vector processing system according to the second embodiment. The configuration other than the vector processing instruction control unit 13 is the same as that in the first embodiment.
[0052]
The vector processing instruction control unit 13 shown in FIG. 5 has a register 42a storing the master CPU number, a register 43a storing its own CPU number, and a comparator 44a that compares the contents of the two registers 42a and 43a. As in the first embodiment, the function of the scalar processing unit 11 of the own CPU is stopped based on the output of the comparator 44a. The vector processing instruction control unit 13 is also provided with a resource management / instruction issue control unit 48a, which has a function of comparing the master CPU number in the register 42a with the issuer CPU information included in a vector processing instruction and a function of controlling instruction issuance based on the comparison result.
[0053]
The operation of the vector processing instruction control unit 13 will be described with reference to the flowchart of FIG. 6. In this configuration example, the vector processing instructions transferred through the vector request bus 30 are stored sequentially in the instruction stack 47a without any processing (step 601). For this purpose, each entry of the instruction stack 47a has a field for storing the issuer CPU information together with the vector processing instruction.
[0054]
Further, the contents of the register 42a storing the master CPU number and the register 43a storing the own CPU number are compared by the comparator 44a (step 602). If they do not match, the own CPU is determined to be a slave CPU, and the scalar processing unit 11 of the own CPU is controlled to stop operating (step 603).
[0055]
Next, when the resource management / instruction issue control unit 48a is about to issue an instruction to the vector processing units, it compares the issuer CPU information attached to the vector processing instruction with the contents of the register 42a storing the master CPU number (step 604). If the numbers do not match, issuance of the inappropriate vector processing instruction is suppressed and the corresponding area in the instruction stack 47a is released (step 605). That is, invalidation is performed not before storage in the instruction stack 47a but when the instruction is actually about to be issued.
[0056]
If the numbers match, the instructions stored in the instruction stack 47a are issued to the vector processing units 14a to 14n of the own CPU in the order in which they become issuable, according to their priority and the resource status of the vector processing units 14a to 14n, as with the resource management / instruction issue control unit 48 in the first embodiment (step 606).
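The difference from the first embodiment, storing everything and filtering at issue time, can be sketched as follows. This is a simplified model that ignores priority and resource status and drains the stack in storage order. The function name `issue_from_stack` and the `(issuer_cpu, opcode)` tuple layout are hypothetical.

```python
from collections import deque


def issue_from_stack(stack, master_cpu):
    """Model of the second embodiment: filter at issue time.

    stack holds (issuer_cpu, opcode) entries exactly as they arrived on
    the bus (step 601). Every entry is released from the stack; only
    those issued by the master CPU are actually issued.
    """
    issued = []
    while stack:
        issuer_cpu, opcode = stack.popleft()  # entry leaves the stack either way
        if issuer_cpu == master_cpu:          # comparison of step 604
            issued.append(opcode)             # step 606: issue
        # Otherwise step 605: issuance suppressed, area released.
    return issued


stack = deque([(0, "VADD"), (3, "VMUL"), (0, "VST")])
issued = issue_from_stack(stack, master_cpu=0)
```

After the call, `issued` holds the two instructions from CPU 0 and the stack is empty, which shows why this variant needs room in the stack for instructions that are never issued.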
[0057]
In the first embodiment described above, the instruction issuer information extraction unit 41 that extracts the issuer CPU number of a vector processing instruction, the comparator 44 that compares the issuer CPU number with the master CPU number, and the invalidation processing unit 46 that invalidates vector processing instructions according to the comparison result are provided, so that invalid vector processing instructions are never stored in the instruction stack 47. In the second embodiment, by contrast, all transferred vector processing instructions are stored in the instruction stack 47a together with their issuer CPU information; at the instruction issue stage, the resource management / instruction issue control unit 48a issues only the appropriate vector processing instructions and releases the areas of the instruction stack 47a occupied by the others. Comparing the two, the second embodiment requires less hardware, whereas the first embodiment can make do with a smaller instruction stack.
[0058]
On the other hand, as a system configuration that emphasizes scalar performance, a multi-vector pipeline shared by a plurality of independent scalar processing units is also possible. That is, all vector pipelines existing in the plurality of processors are collectively regarded as one multi-vector pipeline, and the system operates as if the scalar processing units of the independent processors shared this single vector pipeline.
[0059]
FIG. 7 shows the configuration of the vector processing instruction control unit 13 of the vector processing system according to the third embodiment that realizes this. Since the configuration other than the vector processing instruction control unit 13 is the same as that of the first embodiment described above, the same reference numerals are given and description thereof is omitted.
[0060]
In the vector processing system according to the third embodiment, the vector processing instruction control unit 13 comprises an instruction issuer detection unit 61, instruction stacks 63a to 63n provided for each CPU, an arbitration unit 62 that arbitrates the issue order among the instruction stacks 63a to 63n based on priority, and a resource management / instruction issue control unit 64.
[0061]
The operation of the vector processing instruction control unit 13 according to the present embodiment will be described below with reference to the flowchart of FIG. 8.
[0062]
The vector processing instructions transferred through the vector request bus 30 are stored, via the instruction issuer detection unit 61, in the instruction stacks 63a to 63n provided for each CPU. The stored vector processing instructions are input to the resource management / instruction issue control unit 64 together with the arbitration result from the arbitration unit 62 and the resource information "V-RP" from each of the vector processing units 14a to 14n, and are then issued to the vector processing units 14a to 14n.
[0063]
Here, for the vector processing instruction transferred through the vector request bus 30, the instruction issuer detection unit 61 checks the issuer CPU number (step 801). Thereafter, the vector processing instructions are divided and stored in instruction stacks 63a to 63n provided for each issuing CPU (step 802).
[0064]
Then, the arbitration unit 62, which arbitrates according to priority, determines from which of the instruction stacks 63a to 63n an instruction is to be issued (step 803). The arbitration unit 62 makes this determination by, for example, a round-robin method. Using the output of the arbitration unit 62 and the resource information of each vector processing unit, the resource management / instruction issue control unit 64 determines the instruction to issue (step 804).
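The per-issuer stacks and the round-robin arbitration mentioned as an example can be sketched as follows. This is a minimal model that omits the resource check of step 804. The class name `RoundRobinArbiter` and its methods are hypothetical, and round robin is only one possible arbitration policy.

```python
from collections import deque


class RoundRobinArbiter:
    """One instruction stack per issuer CPU, served in round-robin order."""

    def __init__(self, num_cpus):
        self.stacks = [deque() for _ in range(num_cpus)]
        self.next_cpu = 0  # stack that gets the first chance next time

    def store(self, issuer_cpu, opcode):
        # Steps 801-802: sort the instruction into its issuer's stack.
        self.stacks[issuer_cpu].append(opcode)

    def issue(self):
        # Step 803: starting from next_cpu, pick the first non-empty stack.
        n = len(self.stacks)
        for offset in range(n):
            cpu = (self.next_cpu + offset) % n
            if self.stacks[cpu]:
                self.next_cpu = (cpu + 1) % n
                # Within one stack, instructions leave in storage order.
                return cpu, self.stacks[cpu].popleft()
        return None  # all stacks are empty
```

Because each stack is a FIFO, instructions from the same issuer stay in order, while instructions from different issuers are interleaved by the arbiter.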
[0065]
At this time, vector processing instructions from the same issuer CPU cannot overtake one another; they are issued in the order of storage in the instruction stack. Instructions from different issuer CPUs, however, may overtake one another depending on the resource status. This causes no problem because there is no data race between them. It is therefore unnecessary to record the storage order across the instruction stacks. Furthermore, even vector processing instructions from the same issuer CPU can be issued out of order if a resource management means is provided that compares access addresses and thereby avoids conflicting accesses to the same address.
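The address comparison that would allow overtaking within one issuer's stack can be sketched as an overlap test. The text does not describe how access addresses are represented, so this sketch assumes that each instruction accesses one contiguous range given as a `(start, length)` pair; strided or indirect vector accesses would need a more conservative test. The function names are hypothetical.

```python
def ranges_overlap(a, b):
    """True if two (start, length) address ranges share any address."""
    a_start, a_len = a
    b_start, b_len = b
    return a_start < b_start + b_len and b_start < a_start + a_len


def may_overtake(younger, older_pending):
    """A younger instruction may pass the older pending ones only if its
    address range is disjoint from every one of theirs."""
    return all(not ranges_overlap(younger, older) for older in older_pending)
```

For example, an access to addresses 100 to 107 may pass a pending access to addresses 0 to 63, whereas an access to addresses 60 to 67 may not.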
[0066]
As described above, a vector processing instruction issued from any CPU is transferred to the vector processing instruction control units of all CPUs, where issuance processing is performed. Because the vector processing instructions are managed per issuing CPU, the vector processing units in the individual CPUs operate as a single integrated vector processing unit that appears to be shared by the scalar processing units of all CPUs.
[0067]
The above-described vector processing system can be realized not only in hardware but also in software, by a control program that implements the above-described functions and is recorded on a magnetic disk, semiconductor memory, or other recording medium 18, as shown in FIG. 2. This control program is read from the recording medium 18 into the CPU and controls the operation of the CPU, thereby realizing the above-described vector processing instruction control function. That is, the processing shown in FIGS. 4, 6 and 8 is executed.
[0068]
The present invention is not limited to the above-described embodiments, and can be implemented with various modifications within the scope of the technical idea. For example, in the overall system configuration diagram in FIG. 1, the vector request bus 30 for transferring vector processing instructions between the CPUs 10a to 10n is described as a single bus. However, this transfer means is not limited to a single bus and can obviously be realized by any connection means such as multiple buses or crossbar switches.
[0069]
[Effects of the invention]
As described above, according to the vector processing system and the control method of the present invention, the following effects can be obtained.
[0070]
First, it is possible to provide a shared memory type parallel processing system having a single vector processing unit shared by a plurality of scalar processing units for applications that place more importance on scalar processing performance.
[0071]
This is because a vector processing instruction stack is provided for each CPU in the vector processing instruction control of each CPU, the vector processing instructions transferred between the CPUs are classified by issuing CPU and stored in the corresponding instruction stack, and vector processing instructions are issued sequentially to the vector processing unit while contention among the instruction stacks is arbitrated, so that the vector processing means in all CPUs operate as if they were a single vector processing means shared by all CPUs.
[0072]
As a result, even in application fields that emphasize scalar throughput and in which vector processing instructions appear only rarely, vector processing resources are used effectively and a system capable of more efficient processing can be provided.
[0073]
Second, it is possible to develop an LSI in which scalar processing means and vector processing means are integrated on a single chip, thereby reducing development man-hours and costs.
[0074]
This is because the multi-vector pipeline configuration attached to the scalar processing means can be changed flexibly by external setting, so that the scalar processing means and the vector processing means can be integrated in the same LSI, which has been difficult until now, and the number of LSI types to be developed can be reduced.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an entire vector processing system according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a detailed configuration of each CPU of the vector processing system according to the first embodiment.
FIG. 3 is a block diagram showing details of a vector processing instruction control unit of the vector processing system according to the first embodiment.
FIG. 4 is a flowchart illustrating an operation of a vector processing instruction control unit of the vector processing system according to the first embodiment.
FIG. 5 is a block diagram showing a configuration of a vector processing instruction control unit of a vector processing system according to a second embodiment.
FIG. 6 is a flowchart illustrating an operation of a vector processing instruction control unit of the vector processing system according to the second embodiment.
FIG. 7 is a block diagram showing a configuration of a vector processing instruction control unit of a vector processing system according to a third embodiment.
FIG. 8 is a flowchart illustrating an operation of a vector processing instruction control unit of the vector processing system according to the third embodiment.
FIG. 9 is a block diagram showing the configuration of a shared memory parallel processing system using a CPU in a conventional vector processing apparatus.
FIG. 10 is a block diagram showing a configuration of each CPU of the vector processing device shown in FIG. 9.
[Explanation of symbols]
10a-10n CPU
11 Scalar processing unit
12 Memory access instruction control unit
13 Vector processing instruction control unit
14a-14n Vector processing unit
15 Memory access network section
20 Main memory
30 vector request bus
41 Instruction issuer information extraction unit
42, 43, 42a, 43a registers
44, 45, 44a comparator
46 Invalidation processing unit
47, 47a Instruction stack
48, 48a Resource management instruction issue processor
61 Instruction issuer detection unit
62 Arbitration unit
63a-63n instruction stack
64 Resource Management / Instruction Issue Processor

Claims

1. A shared memory type vector processing system comprising a plurality of CPUs sharing a main memory, each CPU having a scalar processing means and a vector processing means, wherein
the CPUs are connected to each other by a path for transferring a vector processing instruction issued from any of the CPUs to each of the CPUs, and
each of the CPUs comprises:
issuing means for issuing a vector processing instruction to which issuer CPU information identifying the issuing CPU is added and transferring the instruction to all CPUs including the own CPU via the path; and
vector processing instruction control means for storing the transferred vector processing instruction, based on the issuer CPU information, in a plurality of instruction stacks corresponding to the respective CPUs, and for controlling the issuance of instructions based on the vector processing instruction according to a priority for each of the plurality of instruction stacks and resource information of the vector processing means.

2. The shared memory type vector processing system according to claim 1, wherein the vector processing instruction control means of each CPU comprises:
the plurality of instruction stacks corresponding to the respective CPUs;
instruction issuer detection means for detecting the issuer CPU information included in the transferred vector processing instruction and storing the vector processing instruction in the corresponding instruction stack;
arbitration means for determining, among the plurality of instruction stacks, which instruction stack's vector processing instructions are given priority for instruction issuance; and
instruction issue processing means for issuing an instruction based on the vector processing instruction to the vector processing means, based on the determination by the arbitration means and the resource information of the vector processing means.

3. A control method of a shared memory type vector processing system comprising a plurality of CPUs sharing a main memory, each CPU having a scalar processing means and a vector processing means, wherein
the CPUs are connected to each other by a path for transferring a vector processing instruction issued from any of the CPUs to each of the CPUs, and
in each CPU,
a vector processing instruction to which issuer CPU information identifying the issuing CPU is added is issued and transferred to all CPUs including the own CPU via the path, and
the transferred vector processing instruction is stored, based on the issuer CPU information, in a plurality of instruction stacks corresponding to the respective CPUs, and the issuance of instructions based on the vector processing instruction is controlled according to a priority for each of the plurality of instruction stacks and resource information of the vector processing means.

4. The control method of a shared memory type vector processing system according to claim 3, wherein, in each CPU,
the issuer CPU information included in the transferred vector processing instruction is detected and the vector processing instruction is stored in the corresponding instruction stack,
it is determined, among the plurality of instruction stacks, which instruction stack's vector processing instructions are given priority for instruction issuance, and
an instruction based on the vector processing instruction is issued to the vector processing means based on the result of the determination and the resource information of the vector processing means.

5. A storage medium storing a program that causes each CPU of a shared memory type vector processing system, comprising a plurality of CPUs sharing a main memory, each CPU having a scalar processing means and a vector processing means, to execute:
a process of issuing a vector processing instruction to which issuer CPU information identifying the issuing CPU is added and transferring the instruction to all CPUs including the own CPU via a path connecting the CPUs to each other; and
a process of storing the transferred vector processing instruction, based on the issuer CPU information, in a plurality of instruction stacks corresponding to the respective CPUs, and controlling the issuance of instructions based on the vector processing instruction according to a priority for each of the plurality of instruction stacks and resource information of the vector processing means.

6. The storage medium storing the program according to claim 5, wherein the program causes the CPU to execute:
a process of detecting the issuer CPU information included in the transferred vector processing instruction and storing the vector processing instruction in the corresponding instruction stack;
a process of determining, among the plurality of instruction stacks, which instruction stack's vector processing instructions are given priority for instruction issuance; and
a process of issuing an instruction based on the vector processing instruction to the vector processing means based on the result of the determination and the resource information of the vector processing means.