JP2004206228A

JP2004206228A - Control method for execution of store instruction

Info

Publication number: JP2004206228A
Application number: JP2002371895A
Authority: JP
Inventors: Masashi Shinohara; 真史篠原
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-12-24
Filing date: 2002-12-24
Publication date: 2004-07-22
Anticipated expiration: 2022-12-24
Also published as: JP3800171B2

Abstract

PROBLEM TO BE SOLVED: To improve store performance (throughput or the like) by compressing the data to be stored, and thereby reducing the number of packets to be transferred, when the data pattern of the data to be stored is a specified data pattern.

SOLUTION: When performing a store (including store-type operations other than a store in the narrow sense) to a memory 200, a processor 100 determines the data pattern of the data to be stored. When the data to be stored is specific data, the processor 100 compresses the data pattern and changes the packet structure of the store instruction for the specific data from the packet structure used for store instructions for other data (for example, by reducing the number of packets). When the pattern representing the data to be stored in a store instruction transmitted from the processor 100 is a compressed one, the memory 200 decodes the pattern and then receives the store instruction.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、複数のプロセッサと複数のメモリとからなる並列コンピュータシステムであり、プロセッサからメモリへの情報の転送を行う際にパケット分割してその転送を行うコンピュータシステムにおいて、ストア命令（狭義のストア命令以外のストア系命令を含む）の実行を制御するストア命令実行制御方式に関する。
【０００２】
【従来の技術】
最初に、従来の技術における転送データリクエスト（ロード命令（狭義のロード命令以外のロード系命令を含む）やストア命令）の一般的なパケット構成について説明する。
【０００３】
図１２および図１３は、従来の技術における転送データリクエストのパケット構成を示す図である。図１２がストア命令のパケット構成を示す図であり、図１３がロード命令のパケット構成を示す図である。
【０００４】
以下に、図１２および図１３における各フィールドの意味を示す。
【０００５】
Ｖ（第１パケット（１‘ｓｔＰａｃｋｅｔ）のビット０）は、転送データリクエストのＶビットであり、本ビットが“１”の場合にはリクエストが有効であることを示す。
【０００６】
第１パケットのビット１は、転送データリクエストがストア命令であるかロード命令であるかを示すビットである。
【０００７】
ストア命令の場合（第１パケットのビット１が“０”の場合）には、ストア対象データが必要なため、パケット数が多くなっている。ストア対象データは、本フォーマットの場合に、第３パケット（３’ｒｄＰａｃｋｅｔ）と第４パケット（４‘ｔｈＰａｃｋｅｔ）とに割り当てられている（それぞれ４バイト（Ｂ）のデータがストア対象データとして転送される）。
【０００８】
第１パケット中の上位アドレス（ＡｄｄｒｅｓｓＵｐｐｅｒ）および第２パケット中の下位アドレス（ＡｄｄｒｅｓｓＬｏｗｅｒ）のフィールドは、ストアするアドレスを示すフィールドである（後述するロード命令の場合には、これらのフィールドはロードするアドレスを示すこととなる）。
【０００９】
Ｚｏｎｅフィールド（第２パケット（２‘ｎｄＰａｃｋｅｔ）のビット０〜７）は、ＲＭＷ（ＲｅａｄＭｏｄｉｆｙＷｒｉｔｅ）時の対応バイト位置を示すビット群である。ＲＭＷ時には、対応するＺｏｎｅビットが“１”の点灯しているバイトのみについて操作（ＯＲやＡＮＤ）が行われる。
【００１０】
Ｉ（第２パケットのビット８）は、例外ビットである。本ビットによって「リクエストにおいて例外が発生していること」が示されている場合には、メモリ側では動作が行われない（リードおよびライトの両方の動作が行われない）。
【００１１】
Ｅ（第２パケットのビット９）は、例外検出を行うか否かを示すビットである。本ビットが“１”の場合には当該リクエストに対して例外検出を行わないことを示し、本ビットが“０”の場合には当該リクエストに対して例外検出を行うことを示す。
【００１２】
ＥＬＥ（ＥＬＥｅｍｅｎｔ）フィールド（第２パケットのビット１０〜１８）は、転送データリクエストに対して付加されるリクエストの認識番号を示すフィールドである。この認識番号は、プロセッサ側で１つのリクエストに対して１つの番号をとり、メモリに転送される。メモリ側では、本認識番号（ＥＬＥ番号）が処理終了時にプロセッサに対して返送される。プロセッサは、ＥＬＥ番号を受け取ることにより、リクエストが正しく処理されたことを認識し、ＥＬＥ番号を開放する。当該ＥＬＥ番号は、開放されることにより、再利用が可能となる。
【００１３】
ロード命令の場合には、第１パケットのビット１が“１”であることと、パケット構成が２パケットからなる構成（ストア対象データのための第３パケットおよび第４パケットが存在しない構成）となっていることと以外は、上記のストア命令の場合と同じである。
【００１４】
従来のストア命令実行制御方式においては、転送データリクエストのパケット構成として上記のようなパケット構成が採用されており、プロセッサとメモリとの間の通信では規定のパケット数で転送が行われ、特殊データ（浮動小数点フォーマットにおける特殊数等）のストア命令に対して、パケット数の変更（ストア対象データの圧縮）のような考慮は行われていなかった。
【００１５】
なお、従来においても、転送対象のデータの圧縮を考慮する技術は存在していた（例えば、特許文献１および特許文献２参照）。
【００１６】
しかしながら、このような従来の技術は、図１中のデータパターン認識回路１０１やデータパターン認識＆データ生成回路２０１等の本願発明に特有の構成要素を有しておらず、本願発明とは明確に異なるものである。
【００１７】
具体的には、先に引用した特許文献１および特許文献２に記載された発明と本願発明のストア命令実行制御方式との差異は、以下のような点にある。
【００１８】
特許文献１記載の発明は、キャッシュへのデータ登録の際に、固定データパターンＡＬＬ０やＡＬＬ１の場合に、キャッシュへの登録を行わず、アドレスキャッシュ部にフラグを立てることにより、キャッシュメモリの使用効率を高めることが目的であり、本願発明とは目的（その目的に起因する構成）が異なる。また、当該公報記載のものは、データの圧縮パターンもＡＬＬ０とＡＬＬ１とだけであり、ＩＥＥＥ（ＩｎｓｔｉｔｕｔｅｏｆＥｌｅｃｔｒｉｃａｌａｎｄＥｌｅｃｔｒｏｎｉｃｓＥｎｇｉｎｅｅｒｓ）浮動小数点フォーマットにおける特殊数等には対応していない（対応させるとビット数が増える）。
【００１９】
特許文献２記載の発明は、マルチプロセッサ構成でのバスでのデータ転送において通常のプロセッサと同等な能力を持つプロセッサをＤＭＡコントローラとして持ち、当該ＤＭＡコントローラを使用して転送データの圧縮伸張を行う技術であり、本願発明とは構成が異なる。本願発明では、少ないハードウェア量でデータバスへの負荷を減らすことができる。
【００２０】
【特許文献１】
特開２０００−２８５０１９号公報（第３−４頁）
【００２１】
【特許文献２】
特開２００１−１１７８９３号公報（第４頁）
【００２２】
【発明が解決しようとする課題】
上述した従来の技術では、ストア命令の実行に際して、ストア対象データが特殊データである場合にパケット構成を変更するといった考慮が行われておらず、プロセッサとメモリとの間の通信において規定のパケット数で転送が行われているので、ストア動作の時間はデータの長さが決まってしまえば、一定となっていた。したがって、従来の技術においては、スループット等のストア性能を向上させる上で、パケット構成に基づく限界が存在するという問題点があった。
【００２３】
本発明の目的は、上述の点に鑑み、ストア対象データのデータパターンが特定のデータパターンの際には、ストア対象データを圧縮して転送対象のパケット数を減らすことにより、ストア性能（スループット等）を向上させることができるストア命令実行制御方式を提供することにある。
【００２４】
【課題を解決するための手段】
本発明のストア命令実行制御方式は、複数のプロセッサと複数のメモリとからなる並列コンピュータシステムにおいて、ストア命令である転送データリクエストの送信時に、ストア対象データのデータパターンが特殊データのデータパターンに該当するか否かを判定し、特殊データのデータパターンである場合に、どの特殊データのデータパターンに該当するかを判断し、その判断結果に基づき当該ストア対象データに対応する圧縮パターンを送出するプロセッサ内のデータパターン認識回路と、前記データパターン認識回路から送られてきた圧縮パターンに基づいてストア対象データが特殊データであるか否かを判定し、その判定結果が「ストア対象データが特殊データである」の場合に、特殊データストア命令識別ビットに特殊データストア命令であることを示す情報（例えば、図５中の第１パケットのビット３２における“１”）を有し、前記データパターン認識回路から送付されてきた圧縮パターンを保持し、特殊データストア命令以外のストア命令よりもパケット数が少ないパケット構成の転送データリクエストをメモリ側に出力（送信）するプロセッサ内のリクエスト出力部と、プロセッサ側からストア命令である転送データリクエストを受信すると、その転送データリクエスト中の特殊データストア命令識別ビットの内容に基づいてその転送データリクエストが特殊データストア命令であるか否かを判定し、その判定結果が「転送データリクエストが特殊データストア命令である」の場合にはその転送データリクエスト中の圧縮パターンの内容を判別し、その判別結果に基づいて当該圧縮パターンを復号したストア対象データのデータパターンを生成し出力するメモリ内のデータパターン認識＆データ生成回路と、プロセッサ側からストア命令である転送データリクエストを受信すると、その転送データリクエスト中の特殊データストア命令識別ビットの内容に基づいてその転送データリクエストが特殊データストア命令であるか否かを判定し、その判定結果が「転送データリクエストが特殊データストア命令である」の場合には前記データパターン認識＆データ生成回路により出力された圧縮パターンの復号結果をストア対象データとして当該転送データリクエストを受け取るメモリ内のリクエスト受信部とを有する。
【００２５】
ここで、上記のデータパターン認識回路は、ストア命令である転送データリクエスト中のストア対象データと各特殊データとのデータパターンの同一性を判定するための比較回路群と、データパターンが同一であることを示す信号を出力する比較回路（前記比較回路群を構成する比較回路）に対応する圧縮パターン（各特殊データに対応する各圧縮パターンおよび「特殊データに該当しないこと」を示す圧縮パターン）を選択出力する選択回路とを含む構成によって実現することが可能である。
【００２６】
また、上記のデータパターン認識＆データ生成回路は、特殊データストア命令である転送データリクエスト中の圧縮パターンを復号（伸張）するために当該圧縮パターンのビットパターンの判別を行うデコーダと、前記デコーダの判別結果に基づいて復号結果のデータパターンを選択出力する選択回路とを含む構成によって実現することが可能である。
【００２７】
なお、本発明のストア命令実行制御方式は、より一般的には、複数のプロセッサと複数のメモリとからなる並列コンピュータシステムにおいて（通常、浮動小数点フォーマットにおける特殊数を特殊データとし、浮動小数点演算を行う前記プロセッサを備えるコンピュータシステムに適用されることが、想定される）、前記メモリに対してストア（狭義のストア以外のストア系の動作を含む）を行う際に、ストア対象データのデータパターンを判断し、ストア対象データが特殊データである場合に、そのデータパターンを圧縮し、その特殊データのストア命令に関するパケット構成を特殊データ以外のデータのストア命令に関するパケット構成とは変化させる（パケット数の減少等を行う）前記プロセッサと、前記プロセッサから送信されてくるストア命令中のストア対象データを示すパターンが圧縮されたものである場合に、そのパターンを復号（伸張）した上で当該ストア命令を受信する前記メモリとを有すると表現することができる。
【００２８】
【発明の実施の形態】
次に、本発明について図面を参照して詳細に説明する。
【００２９】
（１）第１の実施の形態
【００３０】
図１は、本発明の第１の実施の形態に係るストア命令実行制御方式の構成を示すブロック図である。
【００３１】
図１に示すように、本実施の形態に係るストア命令実行制御方式は、複数のプロセッサ１００（ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）０，ＣＰＵ１，…，ＣＰＵｎ（ｎは正整数））と複数のメモリ２００（ＭＭＵ（ＭｅｍｏｒｙＭａｎａｇｅｍｅｎｔＵｎｉｔ）０，ＭＭＵ１，…，ＭＭＵｍ（ｍは正整数）といったメモリ管理部を含む）とからなる並列コンピュータシステムであり、プロセッサ１００からメモリ２００へのデータ（命令を含む）の転送が行われる際にデータがパケット分割されて転送されるコンピュータシステムにおいて実現される。
【００３２】
図１を参照すると、本実施の形態に係るストア命令実行制御方式は、プロセッサ１００内のデータパターン認識回路１０１と、リクエスト出力部１０２と、メモリ２００内のデータパターン認識＆データ生成回路２０１と、リクエスト受信部２０２とを含んで構成されている。
【００３３】
本実施の形態、ひいては本発明では、プロセッサ１００側において、リクエスト出力部１０２が、データパターン認識回路１０１の出力に基づき、転送データリクエスト（ストア命令）のパケット構成をストア対象データの内容によって変化させることを特徴としている。また、メモリ２００側において、リクエスト受信部２０２による転送データリクエストの受信の前段で、特殊データストア命令のストア対象データの生成（圧縮パターンの伸張（復号））を行うデータパターン認識＆データ生成回路２０１を設けることを特徴としている。
【００３４】
図２は、本実施の形態に係るストア命令実行制御方式のプロセッサ１００側における処理を示す流れ図である。この処理は、ストア対象データ受け取りステップＡ１と、データパターン判断ステップＡ２と、圧縮パターン送付ステップＡ３と、ストア対象データ判定ステップＡ４と、３パケット構成転送データリクエスト出力ステップＡ５と、４パケット構成転送データリクエスト出力ステップＡ６とからなる。
【００３５】
図３は、本実施の形態に係るストア命令実行制御方式のメモリ２００側における処理（データパターン認識＆データ生成回路２０１の処理）を示す流れ図である。この処理は、転送データリクエスト受信ステップＢ１１と、特殊データストア命令該当判定ステップＢ１２と、圧縮パターン内容判別ステップＢ１３と、８バイトデータパターン生成・出力ステップＢ１４とからなる。
【００３６】
図４は、本実施の形態に係るストア命令実行制御方式のメモリ２００側における処理（リクエスト受信部２０２の処理）を示す流れ図である。この処理は、転送データリクエスト受信ステップＢ２１と、特殊データストア命令該当判定ステップＢ２２と、圧縮パターン復号結果包含転送データリクエスト受け取りステップＢ２３と、受信転送データリクエスト受け取りステップＢ２４とからなる。
【００３７】
図５は、本実施の形態に係るストア命令実行制御方式で採用される特殊データストア命令のパケット構成（プロセッサ１００とメモリ２００との間の通信フォーマット）を示す図である。
【００３８】
なお、従来の技術に関して言及した図１２および図１３は、本実施の形態に係るストア命令実行制御方式における特殊データ以外のデータのストア命令のパケット構成およびロード命令のパケット構成を説明するための図でもある。ただし、本実施の形態においては、従来の技術とは異なり、図４のパケット構成と同様のビット３２が存在する（第１パケットのビット３２に「特殊データのストア命令ではないことを示す“０”が設定される）。
【００３９】
図６は、データパターン認識回路１０１の詳細な回路構成の具体例を示す図である。
【００４０】
データパターン認識回路１０１は、ストア命令である転送データリクエスト中のストア対象データと各特殊データとのデータパターンの同一性を判定するための比較回路群１０１１と、データパターンが同一であることを示す信号を出力する比較回路（比較回路群１０１１を構成する比較回路）に対応する４ビット（ｂ）の圧縮パターン（各特殊データに対応する各圧縮パターンおよび「特殊データに該当しないこと」を示す圧縮パターン（本実施の形態では００００（２進）））を選択出力する選択回路１０１２とを含んで構成されている。
【００４１】
図７は、データパターン認識＆データ生成回路２０１の詳細な回路構成（リクエスト受信部２０２内の選択回路２０２１を含む）の具体例を示す図である。
【００４２】
データパターン認識＆データ生成回路２０１は、特殊データストア命令である転送データリクエスト中のＰＴＮフィールドの４ビットの圧縮パターンを８バイト（Ｂ）のデータパターンに復号（伸張）するためにＰＴＮフィールド中のビットパターンの判別を行うデコーダ２０１１と、デコーダ２０１１の判別結果に基づいて復号結果の８バイトのデータパターンを選択出力する選択回路２０１２とを含んで構成されている。
【００４３】
また、リクエスト受信部２０２は、転送データリクエストの第１パケットの３２ビットが“１”である場合（パケット構成が３パケット構成である場合）にデータパターン認識＆データ生成回路２０１から出力されるデータをリクエスト対象のデータ（ストア対象データ）として選択する選択回路２０２１を含んで構成されている。
【００４４】
図８は、プロセッサ１００内のデータパターン認識回路１０１およびリクエスト出力部１０２にストア対象データを供給する部分（ストア対象データ供給部）の回路構成の一例を示す図である。
【００４５】
図９および図１０は、本実施の形態に係るストア命令実行制御方式の具体的な動作および効果を説明するための図である。
【００４６】
次に、上記のように構成された本実施の形態に係るストア命令実行制御方式の全体の動作について詳細に説明する。
【００４７】
第１に、図５を参照して、本発明の特徴の１つである特殊データストア命令のパケット構成について説明する。
【００４８】
本実施の形態では、特殊データのストア動作時におけるストア命令（特殊データストア命令）のパケット構成が、従来の技術における４パケット構成（図１２参照）から３パケット構成に変更されている。３パケット構成にするために、Ｉ／Ｆ（インタフェース）用のビットとして１パケット中のビット数を１ビットだけ増加させている。そのビットが、図５におけるビット３２である。図５において、第１パケット（１‘ｓｔＰａｃｋｅｔ）のビット３２（特殊データストア命令識別ビット）の“１”が、ストア対象データ（ストアするデータ）が特殊フォーマットのデータ（特殊データ）であることを示している。そして、その特殊データのデータパターンがどのようなパターンであるかは、第３パケット（３‘ｒｄＰａｃｋｅｔ）中のＰＴＮ（ＰａＴｔｅｒＮ）フィールドの内容（４ビットのビットパターン）によって示されている。なお、上記以外の第１パケットおよび第２パケット（２‘ｎｄＰａｃｋｅｔ）中の各フィールドの内容は、図１２中の各フィールドの内容と同様である。
【００４９】
ここで、特殊データのデータパターンとしてどのようなデータパターンを採用するかということや、各特殊データのデータパターンをどのような圧縮パターン（ＰＴＮフィールド中の４ビットパターン）に割り当てるかということは、例えば、以下のように規定することができる。以下では、データの幅を６４ビットとしたときを考えている。
【００５０】
ａ．ＦＦＦＦＦＦＦＦＦＦＦＦＦＦＦＦ（１６進（ｈｅｘ））というデータパターンを、ＰＴＮフィールドにおけるビットパターン（圧縮パターン）の０００１（２進（ｂｉｎ））に割り当てる。
【００５１】
ｂ．００００００００００００００００（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの００１０（２進）に割り当てる。
【００５２】
ｃ．７ＦＦ０００００００００００００（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの００１１（２進）に割り当てる。
【００５３】
ｄ．７ＦＦＦＦＦＦＦＦＦＦＦＦＦＦＦ（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの０１００（２進）に割り当てる。
【００５４】
ここで、ｃおよびｄの特殊データはＩＥＥＥフォーマットにおける浮動小数点数表示における無限大およびＮＡＮ（非数）を表しており、浮動小数点演算においては出てくる可能性が高いため、特殊データとして定められている。
【００５５】
また、上記のａ〜ｄの４つのデータパターンは６４ビットデータ（浮動小数点データでいうところの倍精度）の数字を示しているが、単精度の数字の特殊数についても、以下のｅ〜ｇのようなデータパターンが特殊データのデータパターンとして定められる。
【００５６】
ｅ．ＦＦＦＦＦＦＦＦ００００００００（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの０１０１（２進）に割り当てる。
【００５７】
ｆ．７Ｆ８０００００００００００００（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの０１１０（２進）に割り当てる。
【００５８】
ｇ．７ＦＢＦＦＦＦＦ００００００００（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの０１１１（２進）に割り当てる。
【００５９】
ここで、ｆおよびｇの特殊データは、浮動小数点表示の単精度における無限大およびＮＡＮを表している。
【００６０】
また、ストア対象データが４バイト単位であること（４バイト単位のインタフェースとなっていること）に鑑み、「時系列でみて連続する２つのクロックタイミングにおいて、３２ビット（４バイト）のインタフェースが全て同時に反転し、電流がＭａｘとなる動作」を少なくするという観点から、以下のｈ〜ｊに示すようなデータパターンも特殊データのデータパターンとして定義することが考えられる。このような特殊データのデータパターンを定義し、当該データパターンを流さないようにすることにより、電流の変化量を抑えることができ、ノイズの低減を期待することができる。
【００６１】
ｈ．００００００００ＦＦＦＦＦＦＦＦ（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの１００１（２進）に割り当てる。
【００６２】
ｉ．００００ＦＦＦＦＦＦＦＦ００００（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの１０１０（２進）に割り当てる。
【００６３】
ｊ．ＦＦＦＦ００００００００ＦＦＦＦ（１６進）というデータパターンを、ＰＴＮフィールドにおけるビットパターンの１０１１（２進）に割り当てる。
【００６４】
第２に、ストア動作実施時（ストア命令の実行時）におけるプロセッサ１００側の動作について説明する（図２参照）。
【００６５】
本実施の形態では、図８に示すような回路構成におけるソフトウェアビジブルレジスタ（ＳｏｆｔｗａｒｅＶｉｓｉｂｌｅＲｅｇ）からの出力として、プロセッサ１００内のデータパターン認識回路１０１およびリクエスト出力部１０２に対してストア命令である転送データリクエストのストア対象データ（８バイトのデータパターン）が供給される。
【００６６】
データパターン認識回路１０１は、ストア対象データを受け取ると（ステップＡ１）、そのストア対象データのデータパターンを判断する（ステップＡ２）。すなわち、そのデータパターンが特殊データのデータパターンに該当するか否かを判定した上で、特殊データのデータパターンである場合には、どの特殊データのデータパターンに該当するかを判断する。
【００６７】
本実施の形態では、図６中の比較回路群１０１１の各比較回路によって、その転送データリクエスト中のストア対象データのデータパターンと、上記のａ〜ｊに示す各特殊データのデータパターンとの同一性の判定が行われる。
【００６８】
さらに、データパターン認識回路１０１は、ステップＡ２の判断結果に基づき、当該ストア対象データに対応する圧縮パターンをリクエスト出力部１０２に送付する（ステップＡ３）。
【００６９】
本実施の形態では、図６中の選択回路１０１２によって、比較回路群１０１１の比較結果に基づく所定の圧縮パターンが、リクエスト出力部１０２に対して送付される。ここで、「所定の圧縮パターン」とは、先に述べたａ〜ｊの特殊データのデータパターンに割り当てられている圧縮パターンか、特殊データのデータパターン以外のデータパターンに対する圧縮パターン（ここでは、００００（２進））を意味する。
【００７０】
リクエスト出力部１０２は、データパターン認識回路１０１から送られてきた圧縮パターンに基づいて、ストア対象データが特殊データであるか否か（その圧縮パターンが００００（２進）以外であるかどうか）を判定する（ステップＡ４）。
【００７１】
リクエスト出力部１０２は、ステップＡ４で「ストア対象データが特殊データである」と判定した場合には、第１パケットのビット３２が“１”でありデータパターン認識回路１０１から送付されてきた圧縮パターンを第３パケット中のＰＴＮフィールドに有する３パケット構成の転送データリクエストをメモリ２００側に出力（送信）する（ステップＡ５）。
【００７２】
一方、リクエスト出力部１０２は、ステップＡ４で「ストア対象データが特殊データではない」と判定した場合には、第１パケットのビット３２が“０”であり８バイトのストア対象データのデータパターンを第３パケットおよび第４パケットに有する４パケット構成の転送データリクエストをメモリ２００側に出力（送信）する（ステップＡ６）。
【００７３】
第３に、ストア動作実施時（ストア命令の実行時）におけるメモリ２００側の動作について説明する（図３および図４参照）。
【００７４】
メモリ２００内のデータパターン認識＆データ生成回路２０１は、プロセッサ１００側からストア命令である転送データリクエストを受信すると（ステップＢ１１）、その転送データリクエスト中の第１パケットのビット３２の内容（“１”であるか“０”であるか）に基づいて、その転送データリクエストが特殊データストア命令であるか否かを判定する（ステップＢ１２）。
【００７５】
本実施の形態では、図７中のデコーダ２０１１によって、転送データリクエスト中の第１パケットのビット３２の内容の判定が行われる。
【００７６】
データパターン認識＆データ生成回路２０１は、ステップＢ１２で「転送データリクエストが特殊データストア命令である」と判定した場合には、その転送データリクエスト中のＰＴＮフィールドの４ビットの圧縮パターンの内容を判別する（ステップＢ１３）。この判別は、ＰＴＮフィールドの４ビットの圧縮パターンを８バイトのデータパターンに復号するために行われる。
【００７７】
本実施の形態では、図７中のデコーダ２０１１によって、転送データリクエスト中のＰＴＮフィールドのビットパターンの判別が行われる。
【００７８】
さらに、データパターン認識＆データ生成回路２０１は、ステップＢ１３の判別結果に基づいて、８バイトのデータパターン（当該４ビットの圧縮パターンに対応する８バイトのデータパターン）を生成し出力する（ステップＢ１４）。
【００７９】
本実施の形態では、図７中の選択回路２０１２によって、デコーダ２０１１の判別結果に基づき、復号結果の８バイトのデータパターン（当該ＰＴＮフィールド中の４ビットの圧縮パターンに対応する特殊データのデータパターン）が選択出力される。
【００８０】
メモリ２００内のリクエスト受信部２０２は、プロセッサ１００側からストア命令である転送データリクエストを受信すると（ステップＢ２１）、その転送データリクエスト中の第１パケットのビット３２の内容に基づいて、その転送データリクエストが特殊データストア命令であるか否かを判定する（ステップＢ２２）。
【００８１】
本実施の形態では、図７中の選択回路２０２１によって、転送データリクエスト中の第１パケットのビット３２の内容の判定が行われる。
【００８２】
リクエスト受信部２０２は、ステップＢ２２で「転送データリクエストが特殊データストア命令である」と判定した場合には、データパターン認識＆データ生成回路２０１の出力（圧縮パターンの復号結果）をストア対象データとして、当該転送データリクエストを受け取る（ステップＢ２３）。
【００８３】
一方、リクエスト受信部２０２は、ステップＢ２２で「転送データリクエストが特殊データストア命令ではない」と判定した場合には、ステップＢ２１で受信した転送データリクエストをそのまま受け取る（ステップＢ２４）。
【００８４】
本実施の形態では、上述のように、第１パケット（１‘ｓｔパケット）のビット３２が“１”である場合には、通信ストア時の転送データリクエスト（通信パケット）が３パケット構成となっている。したがって、本実施の形態によると、転送データリクエストが特殊データストア命令の場合における転送タイミングが、図１０に示したようなタイミングとなる（図１０中の１−Ｐ１〜１−Ｐ３および３−Ｐ１〜３−Ｐ３参照）。これによって、従来の技術による転送タイミング（図９参照。特に、１−Ｐ１〜１−Ｐ４および３−Ｐ１〜３−Ｐ４参照）よりも、転送にかかるクロック数が減ることととなる。すなわち、スループットの向上を実現することができる。
【００８５】
（２）第２の実施の形態
【００８６】
上記の第１の実施の形態では、特殊データ（特殊データストア命令のストア対象データ）のデータパターンとして、ａ〜ｊに示すようなデータパターンが採用される例を示した。
【００８７】
しかし、特殊データのデータパターンの種類は、ａ〜ｊの１０種には限られず、増減させることも可能である。
【００８８】
第１に、第１の実施の形態に対して特殊データのデータパターンの種類を減少させる場合としては、最も必要と考えられるａ〜ｇのデータパターンだけを、特殊データのデータパターンとして採用することが考えられる。
【００８９】
なお、この場合には、第３パケット中のＰＴＮフィールド（図５参照）におけるビット数を３に減少させることが可能になる。
【００９０】
第２に、第１の実施の形態に対して特殊データのデータパターンの種類を増加させる場合としては、ａ〜ｊのデータパターンに、さらに、他の特殊データのデータパターンを加えることが可能である。
【００９１】
このとき、第３パケット中のＰＴＮフィールドにおけるビット数は増加することが可能である（図５に示した４ビットというビット数に限定されない）ので、第１の実施の形態で示したＰＴＮフィールドにおけるビット数の４によってデータパターン数が制限されるということもない。
【００９２】
ここで、「他の特殊データのデータパターン」としては、例えば、電送線路上で時系列でみたときに連続する２つのタイミングで反転するデータパターンを回避するという観点から、以下のｋおよびｌに示すようなデータパターンを考えることができる。
【００９３】
ｋ．００ＦＦ００ＦＦＦＦ００ＦＦ００（１６進）という８バイトのデータパターン
【００９４】
ｌ．ＡＡＡＡＡＡＡＡ５５５５５５５５（１６進）という８バイトのデータパターン
【００９５】
上記のｋの例では、各４バイトのデータは、次のようなタイミング（ｔｉｍｉｎｇ１およびｔｉｍｉｎｇ２）になり、連続する２つのクロックタイミングで電送線路上のパターンが反転することとなる。
【００９６】
ｔｉｍｉｎｇ１００ＦＦ００ＦＦ
ｔｉｍｉｎｇ２ＦＦ００ＦＦ００
【００９７】
また、上記のｌの例では、各４バイトのデータは、次のようなタイミング（ｔｉｍｉｎｇ１およびｔｉｍｉｎｇ２）になり、連続する２つのクロックタイミングで電送線路上のパターンが反転することとなる。
【００９８】
ｔｉｍｉｎｇ１ＡＡＡＡＡＡＡＡ
ｔｉｍｉｎｇ２５５５５５５５５
【００９９】
このような考え方で、特殊データのデータパターンは、さらに増やすことが可能である。
【０１００】
（３）第３の実施の形態
【０１０１】
上記の第１の実施の形態では、プロセッサ１００内のデータパターン認識回路１０１およびリクエスト出力部１０２に対して、図８に示すような回路構成のストア対象データ供給部によって、ストア命令である転送データリクエストのストア対象データ（８バイトのデータパターン）が供給されていた。
【０１０２】
しかし、ストア対象データ供給部は、このような回路構成のものに限定されることはない。例えば、ストア対象データ供給部を、図１１に示すような回路構成とすることも可能である。
【０１０３】
すなわち、図１１は、プロセッサ１００内のデータパターン認識回路１０１およびリクエスト出力部１０２にストア対象データを供給する部分（ストア対象データ供給部）の回路構成の他の例を示す図である。この場合には、データパターン認識回路１０１がソフトウェアビジブルレジスタ（ＳｏｆｔｗａｒｅＶｉｓｉｂｌｅＲｅｇ）の前段に設置されることになる。
【０１０４】
以下に、第１の実施の形態（図８に示すストア対象データ供給部が採用される実施の形態）と第３の実施の形態（図１１に示すストア対象データ供給部が採用される実施の形態）との差異について、説明を加える。
【０１０５】
図８および図１１には、浮動小数点演算器と演算結果を書き込むソフトウェアビジブルレジスタと本発明に関するリクエスト出力装置（データパターン認識回路１０１およびリクエスト出力部１０２）とを組み合わせた回路構成が示されている。
【０１０６】
転送データリクエストに載せるデータ（ストア対象データ）は、ソフトウェアビジブルレジスタから読み出される。
【０１０７】
ここで、もともと、浮動小数点フォーマットの特殊数フォーマットは、演算器間の差し替え（ある演算器の演算結果をソフトウェアビジブルレジスタに書き込まずに直接演算器の入力にすること）で使用されている。図８において、最初にＦｌｏａｔ０にて演算が行われる。Ｆｌｏａｔ０の演算結果がＦｌｏａｔ０Ｒに入力され、その後、ソフトウェアビジブルレジスタに書き込まれる。その次のクロックタイミングで、Ｆｌｏａｔ１にてＦｌｏａｔ０の先ほどの演算結果がオペランドとして入力される場合、ソフトウェアビジブルレジスタからの読み出しは行われず、Ｆｌｏａｔ０Ｒから直接Ｆｌｏａｔ１の入力データとして差し替えられる。この際、特殊数に関しては特定のフォーマットを認識することにより、演算器の入力データが再生されている。
【０１０８】
図１１に示す回路構成は、このような「浮動小数点フォーマットの特殊数フォーマットが演算器間の差し替えで使用されていること」に注目して、演算器における特殊フォーマットをメモリ２００側に送信される転送データリクエストにも応用した例を示すものである。
【０１０９】
図１１に示す回路構成では、データパターン認識回路１０１がソフトウェアビジブルレジスタの前段に設置され、特殊数を示す４ビットパターンがソフトウェアビジブルレジスタ内のデータに付加されている（この４ビットパターンは図４中のＰＴＮフィールドにおける圧縮パターンと同様である）。これにより、浮動小数点演算結果の特殊数のフィールドがソフトウェアビジブルレジスタに同時に書き込まれる。
【０１１０】
リクエスト出力部１０２は、転送データリクエストをメモリ２００に転送する際に、このようなソフトウェアビジブルレジスタ内のデータを入力して、そのデータをストア対象データとして、メモリ２００側に転送データリクエストを送信・出力する（ソフトウェアビジブルレジスタの中にある特殊数フォーマットも同時に転送することとなる）。
【０１１１】
（４）その他の実施の形態
【０１１２】
さらに、上記の第１の実施の形態に対しては、以下のａ〜ｃに示すようなことが言える。すなわち、下記の限定がない変形形態（拡張形態）を考えることができる。
【０１１３】
ａ．ストア対象データの内容が、浮動小数点フォーマットのデータに限定されない。
【０１１４】
ｂ．パケット構成が、図５や図１２に示すものに限定されない。
【０１１５】
ｃ．先にも言及したように、ＰＴＮフィールドにおけるビット数が、４に限定されない。
【０１１６】
【発明の効果】
以上説明したように、本発明によると、ストア対象データのデータパターンを判断し、特定のデータパターンの際にはデータを圧縮してパケット数を減らすことにより、ストア性能（スループットの向上等）を実現することができるという効果が生じる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態に係るストア命令実行制御方式の構成を示すブロック図である。
【図２】図１に示すストア命令実行制御方式のプロセッサ側の処理を示す流れ図である。
【図３】図１に示すストア命令実行制御方式のメモリ側の処理を示す流れ図である。
【図４】図１に示すストア命令実行制御方式のメモリ側の処理を示す流れ図である。
【図５】図１に示すストア命令実行制御方式で採用される特殊データストア命令のパケット構成を示す図である。
【図６】図１中のデータパターン認識回路の詳細な回路構成の具体例を示す図である。
【図７】図１中のデータパターン認識＆データ生成回路の詳細な回路構成（リクエスト受信部内の選択回路を含む）の具体例を示す図である。
【図８】図１中のプロセッサ内のデータパターン認識回路およびリクエスト出力部にストア対象データを供給する部分（ストア対象データ供給部）の構成の一例を示す図である。
【図９】図１に示すストア命令実行制御方式の具体的な動作および効果を説明するための図である。
【図１０】図１に示すストア命令実行制御方式の具体的な動作および効果を説明するための図である。
【図１１】図１中のプロセッサ内のデータパターン認識回路およびリクエスト出力部にストア対象データを供給する部分（ストア対象データ供給部）の構成の他の例を示す図である。
【図１２】従来の技術におけるストア命令のパケット構成を示す図である。
【図１３】従来の技術におけるロード命令のパケット構成を示す図である。
【符号の説明】
１００プロセッサ
１０１データパターン認識回路
１０２リクエスト出力部
２００メモリ
２０１データパターン認識＆データ生成回路
２０２リクエスト受信部
１０１１比較回路群
１０１２，２０１２，２０２１選択回路
２０１１デコーダ

[0001]
[Technical field of the invention]
The present invention relates to a store instruction execution control method that controls the execution of store instructions (including store-type instructions other than store instructions in the narrow sense) in a parallel computer system that comprises a plurality of processors and a plurality of memories and that divides information into packets when transferring it from a processor to a memory.
[0002]
[Prior art]
First, the general packet configuration of a transfer data request in the prior art will be described. A transfer data request is either a load instruction (including load-type instructions other than load instructions in the narrow sense) or a store instruction.
[0003]
FIG. 12 and FIG. 13 are diagrams showing a packet configuration of a transfer data request in the conventional technology. FIG. 12 is a diagram illustrating a packet configuration of a store instruction, and FIG. 13 is a diagram illustrating a packet configuration of a load instruction.
[0004]
The meaning of each field in FIGS. 12 and 13 will be described below.
[0005]
V (bit 0 of the first packet (1'stPacket)) is the V bit of the transfer data request, and when this bit is "1", it indicates that the request is valid.
[0006]
Bit 1 of the first packet is a bit indicating whether the transfer data request is a store instruction or a load instruction.
[0007]
In the case of a store instruction (when bit 1 of the first packet is “0”), the number of packets is larger because the data to be stored must be carried. In this format, the data to be stored is allocated to the third packet (3'rdPacket) and the fourth packet (4'thPacket), each of which transfers 4 bytes (B) of the data to be stored.
[0008]
The upper address (Address Upper) field in the first packet and the lower address (Address Lower) field in the second packet indicate the address to store to (in the case of a load instruction, described later, these fields indicate the address to load from).
[0009]
The Zone field (bits 0 to 7 of the second packet (2'ndPacket)) is a group of bits indicating the affected byte positions in an RMW (Read Modify Write) operation. During RMW, the operation (OR or AND) is performed only on the bytes whose corresponding Zone bit is set to “1”.
[0010]
I (bit 8 of the second packet) is the exception bit. When this bit indicates that an exception has occurred in the request, the memory side performs no operation (neither the read nor the write is performed).
[0011]
E (bit 9 of the second packet) is a bit indicating whether or not to perform exception detection. When this bit is “1”, it indicates that no exception detection is performed for the request, and when this bit is “0”, it indicates that exception detection is performed for the request.
[0012]
The ELE (ELElement) field (bits 10 to 18 of the second packet) indicates the identification number attached to the transfer data request. The processor assigns one number to each request and transfers it to the memory. When processing finishes, the memory returns this identification number (the ELE number) to the processor. By receiving the ELE number, the processor recognizes that the request has been processed correctly and releases the ELE number. Once released, the ELE number can be reused.
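The allocate, return and release cycle of ELE numbers described above can be illustrated with a small Python sketch. This is not part of the disclosure: the class name `EleAllocator`, its methods, and the behavior when no number is free are all assumptions made for illustration. Only the 9-bit field width (bits 10 to 18) comes from the text.

```python
from collections import deque


class EleAllocator:
    """Hypothetical model of ELE number bookkeeping on the processor side."""

    def __init__(self, bits=9):
        # Bits 10 to 18 of the second packet form a 9-bit field: 512 numbers.
        self._free = deque(range(1 << bits))
        self._in_flight = set()

    def allocate(self):
        """Take one ELE number for a new transfer data request."""
        if not self._free:
            # Assumption: the source does not describe this case.
            raise RuntimeError("no free ELE number")
        number = self._free.popleft()
        self._in_flight.add(number)
        return number

    def release(self, number):
        """Free a number after the memory has returned it to the processor."""
        if number not in self._in_flight:
            raise ValueError(f"ELE number {number} is not in flight")
        self._in_flight.remove(number)
        self._free.append(number)  # the number becomes reusable
```

A request would call `allocate()` before transmission and `release()` when the memory returns the same number.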
[0013]
A load instruction is the same as the store instruction described above, except that bit 1 of the first packet is “1” and that the request consists of two packets (the third and fourth packets, which carry the data to be stored, do not exist).
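As a hedged illustration of the fields described in paragraphs [0009] to [0013], the control fields of the second packet can be packed and unpacked as below. The sketch assumes that bit 0 is the least significant bit of the packet word, which the text does not state. The names `SecondPacketControl`, `pack_control`, `unpack_control` and `packet_count` are hypothetical, and the address fields are omitted because their bit positions are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondPacketControl:
    zone: int  # bits 0-7: byte positions affected by a read-modify-write
    i: int     # bit 8: exception bit
    e: int     # bit 9: 1 = no exception detection, 0 = exception detection
    ele: int   # bits 10-18: request identification number


def pack_control(f):
    """Pack the control fields into one word (assumes bit 0 is the LSB)."""
    if not (0 <= f.zone < 1 << 8 and f.i in (0, 1)
            and f.e in (0, 1) and 0 <= f.ele < 1 << 9):
        raise ValueError("field out of range")
    return f.zone | (f.i << 8) | (f.e << 9) | (f.ele << 10)


def unpack_control(word):
    """Recover the control fields from a packed word."""
    return SecondPacketControl(
        zone=word & 0xFF,
        i=(word >> 8) & 1,
        e=(word >> 9) & 1,
        ele=(word >> 10) & 0x1FF,
    )


def packet_count(is_load):
    """Prior-art request length: 2 packets for a load, 4 for a store."""
    return 2 if is_load else 4
```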
[0014]
The conventional store instruction execution control method uses the packet configuration described above for transfer data requests. Communication between the processor and the memory always uses the prescribed number of packets, and no consideration is given to changing the number of packets (compressing the data to be stored) for store instructions whose data is special data (such as special numbers in a floating-point format).
[0015]
Note that techniques that consider compressing the data to be transferred have existed previously (see, for example, Patent Document 1 and Patent Document 2).
[0016]
However, such conventional techniques do not have the components specific to the present invention, such as the data pattern recognition circuit 101 and the data pattern recognition & data generation circuit 201 in FIG. 1, and are clearly different from the present invention.
[0017]
Specifically, the inventions described in Patent Documents 1 and 2 cited above differ from the store instruction execution control method of the present invention in the following respects.
[0018]
The invention described in Patent Document 1 aims to raise the utilization efficiency of a cache memory: when data to be registered in the cache has the fixed data pattern ALL0 or ALL1, the data is not registered in the cache and a flag is set in the address cache unit instead. Its purpose (and the configuration that follows from that purpose) differs from that of the present invention. In addition, the only compression patterns in that publication are ALL0 and ALL1, and it does not handle special numbers and the like in the IEEE (Institute of Electrical and Electronics Engineers) floating-point format (handling them would increase the number of bits).
[0019]
The invention described in Patent Document 2 is a technique in which, for data transfer over a bus in a multiprocessor configuration, a processor with capability equivalent to that of an ordinary processor serves as a DMA controller, and that DMA controller compresses and decompresses the transfer data. Its configuration differs from that of the present invention. The present invention can reduce the load on the data bus with a small amount of hardware.
[0020]
[Patent Document 1]
JP 2000-285019 A (pages 3-4)
[0021]
[Patent Document 2]
JP 2001-117893 A (page 4)
[0022]
[Problems to be solved by the invention]
In the conventional technique described above, no consideration is given to changing the packet configuration when the data to be stored is special data, and communication between the processor and the memory always uses the prescribed number of packets. The time taken by a store operation is therefore constant once the data length is fixed. Consequently, the conventional technique has the problem that the packet configuration imposes a limit on improving store performance such as throughput.
[0023]
In view of the above, an object of the present invention is to provide a store instruction execution control method that can improve store performance (throughput and the like) by compressing the data to be stored, and thereby reducing the number of packets to be transferred, when the data pattern of the data to be stored is a specific data pattern.
[0024]
[Means for Solving the Problems]
The store instruction execution control method of the present invention is used in a parallel computer system comprising a plurality of processors and a plurality of memories, and has the following four components.
A data pattern recognition circuit in the processor: when a transfer data request that is a store instruction is transmitted, this circuit determines whether the data pattern of the data to be stored corresponds to a data pattern of special data and, if it does, which special data pattern it corresponds to. Based on the result, it sends out the compression pattern corresponding to the data to be stored.
A request output unit in the processor: this unit determines, from the compression pattern sent by the data pattern recognition circuit, whether the data to be stored is special data. If the result is "the data to be stored is special data", it outputs (transmits) to the memory side a transfer data request that carries, in its special data store instruction identification bit, information indicating a special data store instruction (for example, "1" in bit 32 of the first packet in FIG. 5), that holds the compression pattern sent from the data pattern recognition circuit, and that has a packet configuration with fewer packets than store instructions other than special data store instructions.
A data pattern recognition & data generation circuit in the memory: on receiving a transfer data request that is a store instruction from the processor side, this circuit determines, from the content of the special data store instruction identification bit in the request, whether the request is a special data store instruction. If the result is "the transfer data request is a special data store instruction", it identifies the content of the compression pattern in the request and, based on that, generates and outputs the data pattern of the data to be stored obtained by decoding the compression pattern.
A request receiving unit in the memory: on receiving a transfer data request that is a store instruction from the processor side, this unit makes the same determination from the special data store instruction identification bit. If the result is "the transfer data request is a special data store instruction", it accepts the transfer data request with the decoding result output by the data pattern recognition & data generation circuit as the data to be stored.
[0025]
Here, the data pattern recognition circuit can be realized by a configuration including a group of comparison circuits, which determine whether the data pattern of the data to be stored in a transfer data request that is a store instruction is identical to the data pattern of each special data, and a selection circuit, which selects and outputs the compression pattern corresponding to the comparison circuit (one of the circuits in the group) that outputs a signal indicating a match. The selectable compression patterns are those corresponding to each special data and the one indicating "not special data".
[0026]
Further, the data pattern recognition & data generation circuit can be realized by a configuration including a decoder, which identifies the bit pattern of the compression pattern in a transfer data request that is a special data store instruction in order to decode (decompress) that compression pattern, and a selection circuit, which selects and outputs the decoded data pattern based on the result from the decoder.
[0027]
More generally, the store instruction execution control method of the present invention can be expressed as follows. In a parallel computer system comprising a plurality of processors and a plurality of memories (typically expected to be a computer system whose processors perform floating-point operations, with special numbers in the floating-point format treated as special data), it has the processor and the memory described next. When performing a store (including store-type operations other than a store in the narrow sense) to the memory, the processor determines the data pattern of the data to be stored and, if the data to be stored is special data, compresses that data pattern and makes the packet configuration of the store instruction for the special data differ from that of store instructions for other data (for example, by reducing the number of packets). When the pattern representing the data to be stored in a store instruction transmitted from the processor is compressed, the memory decodes (decompresses) the pattern and then receives the store instruction.
[0028]
[Embodiments of the invention]
Next, the present invention will be described in detail with reference to the drawings.
[0029]
(1) First embodiment
[0030]
FIG. 1 is a block diagram showing a configuration of a store instruction execution control method according to the first embodiment of the present invention.
[0031]
As shown in FIG. 1, the store instruction execution control method according to the present embodiment is realized in a parallel computer system comprising a plurality of processors 100 (CPU (Central Processing Unit) 0, CPU1, ..., CPUn (n is a positive integer)) and a plurality of memories 200 (including memory management units such as MMU (Memory Management Unit) 0, MMU1, ..., MMUm (m is a positive integer)), in which data (including instructions) transferred from a processor 100 to a memory 200 is divided into packets for transfer.
[0032]
Referring to FIG. 1, the store instruction execution control method according to the present embodiment comprises a data pattern recognition circuit 101 and a request output unit 102 in the processor 100, and a data pattern recognition & data generation circuit 201 and a request receiving unit 202 in the memory 200.
[0033]
A feature of the present embodiment, and by extension of the present invention, is that on the processor 100 side the request output unit 102 changes the packet configuration of a transfer data request (store instruction) according to the content of the data to be stored, based on the output of the data pattern recognition circuit 101. A further feature is that on the memory 200 side the data pattern recognition & data generation circuit 201, which generates the data to be stored for a special data store instruction (by decompressing (decoding) the compression pattern), is placed in the stage preceding reception of the transfer data request by the request receiving unit 202.
[0034]
FIG. 2 is a flowchart showing the processing on the processor 100 side in the store instruction execution control method according to the present embodiment. This processing consists of a store data receiving step A1, a data pattern determination step A2, a compression pattern sending step A3, a store data judgment step A4, a 3-packet transfer data request output step A5, and a 4-packet transfer data request output step A6.
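The choice between steps A5 and A6 can be sketched in Python as follows. This is an illustrative model only. It assumes that bit 0 is the least significant bit (so bit 32 is `1 << 32`), that the upper 4 bytes of the store data travel in the third packet, and that the compression pattern sits in the low bits of the third packet; none of these placements is stated in the text. The function name `build_store_request` is hypothetical, and the compression pattern produced by steps A1 to A3 is taken as an input.

```python
NOT_SPECIAL = 0b0000  # compression pattern meaning "not special data"


def build_store_request(first, second, store_data, ptn):
    """Return the list of packets for one store request (steps A4 to A6).

    first, second: header packets; store_data: 64-bit value to store;
    ptn: 4-bit compression pattern from the data pattern recognition circuit.
    """
    if ptn != NOT_SPECIAL:                # step A4: special data?
        first |= 1 << 32                  # bit 32 = 1: special data store
        return [first, second, ptn]       # step A5: 3-packet request
    first &= ~(1 << 32)                   # bit 32 = 0: ordinary store
    upper = (store_data >> 32) & 0xFFFFFFFF
    lower = store_data & 0xFFFFFFFF
    return [first, second, upper, lower]  # step A6: 4-packet request
```

Under these assumptions a store of special data occupies three transfer slots instead of four, which is the source of the throughput gain the description claims.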
[0035]
FIG. 3 is a flowchart showing processing (processing of the data pattern recognition & data generation circuit 201) on the memory 200 side of the store instruction execution control method according to the present embodiment. This processing includes a transfer data request receiving step B11, a special data store instruction matching determination step B12, a compression pattern content determination step B13, and an 8-byte data pattern generation / output step B14.
[0036]
FIG. 4 is a flowchart showing processing (processing of the request receiving unit 202) on the memory 200 side in the store instruction execution control method according to the present embodiment. This processing includes a transfer data request receiving step B21, a special data store instruction matching determination step B22, a compression pattern decoding result inclusion transfer data request receiving step B23, and a reception transfer data request receiving step B24.
[0037]
FIG. 5 is a diagram showing a packet configuration (a communication format between the processor 100 and the memory 200) of a special data store instruction employed in the store instruction execution control method according to the present embodiment.
[0038]
FIGS. 12 and 13, referred to in relation to the prior art, also serve to explain the packet configurations of a store instruction for data other than special data and of a load instruction in the store instruction execution control method according to the present embodiment. However, in the present embodiment, unlike the conventional technique, the first packet has a bit 32, as in the packet configuration of FIG. 5 (bit 32 of the first packet is set to “0”, indicating that the request is not a special data store instruction).
[0039]
FIG. 6 is a diagram showing a specific example of a detailed circuit configuration of the data pattern recognition circuit 101.
[0040]
The data pattern recognition circuit 101 includes a comparison circuit group 1011 that judges whether the data pattern of the storage target data in the transfer data request (store instruction) is identical to the data pattern of each special data, and a selection circuit 1012 that selects and outputs the 4-bit (b) compression pattern corresponding to whichever comparison circuit in the comparison circuit group 1011 outputs a signal indicating a match. The compression pattern is either the one assigned to the matching special data or the one indicating “not special data” (0000 (binary) in the present embodiment).
[0041]
FIG. 7 is a diagram showing a specific example of a detailed circuit configuration of the data pattern recognition & data generation circuit 201 (including the selection circuit 2021 in the request receiving unit 202).
[0042]
The data pattern recognition & data generation circuit 201 decodes (expands) the 4-bit compression pattern of the PTN field in a transfer data request that is a special data store instruction into an 8-byte (B) data pattern. It includes a decoder 2011 that determines the bit pattern of the compression pattern, and a selection circuit 2012 that selects and outputs the 8-byte data pattern of the decoding result based on the determination result of the decoder 2011.
[0043]
When bit 32 of the first packet of the transfer data request is “1” (when the packet configuration is a three-packet configuration), the request receiving unit 202 receives the data output from the data pattern recognition & data generation circuit 201 as the request target data (storage target data).
[0044]
FIG. 8 is a diagram illustrating an example of a circuit configuration of a portion (storage target data supply unit) that supplies storage target data to the data pattern recognition circuit 101 and the request output unit 102 in the processor 100.
[0045]
FIGS. 9 and 10 are diagrams for explaining specific operations and effects of the store instruction execution control method according to the present embodiment.
[0046]
Next, the overall operation of the store instruction execution control method according to the present embodiment configured as described above will be described in detail.
[0047]
First, the packet configuration of a special data store instruction, which is one of the features of the present invention, will be described with reference to FIG.
[0048]
In the present embodiment, the packet configuration of a store instruction at the time of a special data store operation (special data store instruction) is changed from the four-packet configuration of the related art (see FIG. 12) to a three-packet configuration. To make the three-packet configuration possible, the number of interface (I/F) bits in one packet is increased by one. That bit is bit 32 in FIG. 5. In FIG. 5, “1” in bit 32 (special data store instruction identification bit) of the first packet (1'stPacket) indicates that the data to be stored (storage target data) is data of a special format (special data). The pattern of the special data is indicated by the contents (a 4-bit pattern) of the PTN (PaTterN) field in the third packet (3'rdPacket). The contents of the other fields in the first packet and the second packet (2'ndPacket) are the same as those of the corresponding fields in FIG. 12.
[0049]
Which data patterns are adopted as special data, and which compression pattern (the 4-bit pattern in the PTN field) is assigned to each of them, can be defined, for example, as follows. In the following, the data width is assumed to be 64 bits.
[0050]
a. A data pattern of FFFFFFFFFFFFFFFF (hexadecimal (hex)) is assigned to 0001 (binary (bin)) of a bit pattern (compression pattern) in the PTN field.
[0051]
b. The data pattern 0000000000000000 (hex) is assigned to the bit pattern 0010 (binary) in the PTN field.
[0052]
c. A data pattern of 7FF0000000000000 (hexadecimal) is assigned to a bit pattern of 0011 (binary) in the PTN field.
[0053]
d. A data pattern of 7FFFFFFFFFFFFFFF (hexadecimal) is assigned to a bit pattern 0100 (binary) in the PTN field.
[0054]
Here, the special data c and d represent infinity and NaN (not a number) in the IEEE floating-point format; they are defined as special data because they are likely to appear in floating-point operations.
[0055]
The above four data patterns a to d are 64-bit data (double precision in the case of floating-point data); the following patterns e to g are likewise defined as special data patterns to cover the special numbers of single-precision data.
[0056]
e. The data pattern FFFFFFFF00000000 (hexadecimal) is assigned to the bit pattern 0101 (binary) in the PTN field.
[0057]
f. The data pattern 7F80000000000000 (hexadecimal) is assigned to the bit pattern 0110 (binary) in the PTN field.
[0058]
g. The data pattern 7FBFFFFF00000000 (hexadecimal) is assigned to the bit pattern 0111 (binary) in the PTN field.
[0059]
Here, the special data of f and g represent infinity and NAN in single precision of floating-point notation.
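The interpretation of c, d, f and g as IEEE infinity and NaN values can be checked with a short sketch. The helper names `as_double` and `as_single` are hypothetical, and the upper 4 bytes of pattern g are assumed here to be 7FBFFFFF, because the digits of g are garbled in the text.

```python
import math
import struct

def as_double(hex64: str) -> float:
    """Interpret 16 hex digits as a big-endian IEEE 754 double-precision value."""
    return struct.unpack(">d", bytes.fromhex(hex64))[0]

def as_single(hex32: str) -> float:
    """Interpret 8 hex digits as a big-endian IEEE 754 single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hex32))[0]

c = as_double("7FF0000000000000")  # pattern c
d = as_double("7FFFFFFFFFFFFFFF")  # pattern d
f = as_single("7F800000")          # upper 4 bytes of pattern f
g = as_single("7FBFFFFF")          # upper 4 bytes of pattern g (assumed value)

print(math.isinf(c), math.isnan(d), math.isinf(f), math.isnan(g))
```

Only the upper 4 bytes of f and g are decoded, since the lower 4 bytes of these single-precision patterns are zero padding.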
[0060]
In addition, given that the storage target data is transferred in 4-byte units (the interface is 4 bytes wide), the following data patterns h to j may also be defined as special data patterns, with a view to reducing the number of occasions on which all lines of the 32-bit (4-byte) interface invert simultaneously between two consecutive clock timings and the current reaches its maximum. By defining such data patterns as special data and keeping them off the interface, the amount of change in current can be suppressed, and a reduction in noise can be expected.
[0061]
h. The data pattern 00000000FFFFFFFF (hexadecimal) is assigned to the bit pattern 1001 (binary) in the PTN field.
[0062]
i. A data pattern of 0000FFFFFFFF0000 (hexadecimal) is assigned to a bit pattern of 1010 (binary) in the PTN field.
[0063]
j. The data pattern FFFF00000000FFFF (hexadecimal) is assigned to the bit pattern 1011 (binary) in the PTN field.
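The assignments a to j amount to a lookup table from 64-bit data patterns to 4-bit PTN values. The following is a minimal sketch of that table, not the circuit itself; the names `SPECIAL_PATTERNS`, `NOT_SPECIAL` and `compression_pattern` are hypothetical, and the value used for g is an assumption because its digits are garbled in the text.

```python
# Special data patterns a to j and their PTN compression patterns.
SPECIAL_PATTERNS = {
    0xFFFFFFFF_FFFFFFFF: 0b0001,  # a
    0x00000000_00000000: 0b0010,  # b
    0x7FF00000_00000000: 0b0011,  # c: double-precision infinity
    0x7FFFFFFF_FFFFFFFF: 0b0100,  # d: double-precision NaN
    0xFFFFFFFF_00000000: 0b0101,  # e
    0x7F800000_00000000: 0b0110,  # f: single-precision infinity
    0x7FBFFFFF_00000000: 0b0111,  # g: single-precision NaN (assumed value)
    0x00000000_FFFFFFFF: 0b1001,  # h
    0x0000FFFF_FFFF0000: 0b1010,  # i
    0xFFFF0000_0000FFFF: 0b1011,  # j
}
NOT_SPECIAL = 0b0000  # compression pattern meaning "not special data"

def compression_pattern(data: int) -> int:
    """Return the 4-bit compression pattern for 8-byte storage target data."""
    return SPECIAL_PATTERNS.get(data, NOT_SPECIAL)

print(format(compression_pattern(0x7FF00000_00000000), "04b"))
```

A dictionary lookup stands in for the parallel comparison that the hardware performs; any value absent from the table maps to 0000.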
[0064]
Second, the operation of the processor 100 when the store operation is performed (when the store instruction is executed) will be described (see FIG. 2).
[0065]
In the present embodiment, the storage target data (8-byte data pattern) of the transfer data request as a store instruction is supplied to the data pattern recognition circuit 101 and the request output unit 102 in the processor 100 as the output of a software visible register (SoftwareVisibleReg) having the circuit configuration shown in FIG. 8.
[0066]
When receiving the data to be stored (step A1), the data pattern recognition circuit 101 determines the data pattern of the data to be stored (step A2). That is, it determines whether the data pattern matches a data pattern of special data and, if so, which special data pattern it matches.
[0067]
In the present embodiment, each comparison circuit of the comparison circuit group 1011 in FIG. 6 judges whether the data pattern of the storage target data in the transfer data request is identical to the corresponding special data pattern among a to j above.
[0068]
Further, the data pattern recognition circuit 101 sends a compression pattern corresponding to the storage target data to the request output unit 102 based on the determination result of step A2 (step A3).
[0069]
In the present embodiment, a predetermined compression pattern based on the comparison result of the comparison circuit group 1011 is sent to the request output unit 102 by the selection circuit 1012 in FIG. 6. Here, the “predetermined compression pattern” is either a compression pattern assigned to one of the special data patterns a to j described above or the compression pattern for a data pattern other than special data (here, 0000 (binary)).
[0070]
The request output unit 102 determines whether the storage target data is special data (whether the compression pattern is other than 0000 (binary)) based on the compression pattern sent from the data pattern recognition circuit 101 (step A4).
[0071]
If the request output unit 102 determines in step A4 that “the storage target data is special data”, it outputs (transmits) to the memory 200 side a three-packet transfer data request in which bit 32 of the first packet is “1” and the compression pattern sent from the data pattern recognition circuit 101 is held in the PTN field of the third packet (step A5).
[0072]
On the other hand, if the request output unit 102 determines in step A4 that “the storage target data is not special data”, it outputs (transmits) to the memory 200 side a four-packet transfer data request in which bit 32 of the first packet is “0” and the 8-byte data pattern of the storage target data is carried in the third packet and the fourth packet (step A6).
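Steps A4 to A6 can be sketched as follows. This is an illustration only: packets are modeled as dictionaries with hypothetical keys (`bit32`, `fields`, `PTN`, `data`), the remaining header fields are treated as opaque integers, and placing the upper 4 bytes in the third packet is an assumption.

```python
def build_store_request(data: int, ptn: int, header1: int = 0, header2: int = 0) -> list:
    """Build the packet list of a store request (3 packets if special, else 4).

    `ptn` is the 4-bit compression pattern from the data pattern recognition
    circuit; 0b0000 means "not special data".
    """
    special = ptn != 0b0000                        # step A4
    first = {"bit32": 1 if special else 0, "fields": header1}
    second = {"fields": header2}
    if special:                                    # step A5: three-packet request
        return [first, second, {"PTN": ptn}]
    upper = (data >> 32) & 0xFFFFFFFF              # step A6: four-packet request
    lower = data & 0xFFFFFFFF
    return [first, second, {"data": upper}, {"data": lower}]

print(len(build_store_request(0, 0b0010)), len(build_store_request(0x1234, 0b0000)))
```

The function takes the compression pattern as an input rather than computing it, mirroring the split between the data pattern recognition circuit 101 and the request output unit 102.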
[0073]
Third, the operation of the memory 200 when the store operation is performed (when the store instruction is executed) will be described (see FIGS. 3 and 4).
[0074]
When the data pattern recognition & data generation circuit 201 in the memory 200 receives a transfer data request as a store instruction from the processor 100 side (step B11), it determines, based on the content of bit 32 of the first packet in the transfer data request (whether it is “1”), whether the transfer data request is a special data store instruction (step B12).
[0075]
In the present embodiment, the content of bit 32 of the first packet in the transfer data request is determined by the decoder 2011 in FIG. 7.
[0076]
If the data pattern recognition & data generation circuit 201 determines in step B12 that “the transfer data request is a special data store instruction”, it determines the contents of the 4-bit compression pattern of the PTN field in the transfer data request (step B13). This determination is performed in order to decode the 4-bit compression pattern of the PTN field into an 8-byte data pattern.
[0077]
In the present embodiment, the bit pattern of the PTN field in the transfer data request is determined by the decoder 2011 in FIG. 7.
[0078]
Further, the data pattern recognition & data generation circuit 201 generates and outputs an 8-byte data pattern (the 8-byte data pattern corresponding to the 4-bit compression pattern) based on the determination result in step B13 (step B14).
[0079]
In the present embodiment, the 8-byte data pattern of the decoding result (the data pattern of the special data corresponding to the 4-bit compression pattern in the PTN field) is selected and output by the selection circuit 2012 in FIG. 7.
[0080]
Upon receiving a transfer data request as a store instruction from the processor 100 side (step B21), the request receiving unit 202 in the memory 200 determines, based on the content of bit 32 of the first packet in the transfer data request, whether the transfer data request is a special data store instruction (step B22).
[0081]
In the present embodiment, the selection circuit 2021 in FIG. 7 determines the content of bit 32 of the first packet in the transfer data request.
[0082]
If the request receiving unit 202 determines in step B22 that “the transfer data request is a special data store instruction”, it receives the transfer data request with the output of the data pattern recognition & data generation circuit 201 (the result of decoding the compression pattern) as the storage target data (step B23).
[0083]
On the other hand, when it is determined in step B22 that the transfer data request is not a special data store instruction, the request receiving unit 202 receives the transfer data request received in step B21 as it is (step B24).
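The memory-side handling in steps B11 to B14 and B21 to B24 can be sketched as a single function. As before, this is a hypothetical model rather than the circuit: packets are dictionaries with assumed keys (`bit32`, `PTN`, `data`), the upper 4 bytes are assumed to travel in the third packet, and the entry for pattern g is an assumed value.

```python
# Decoding table: PTN compression pattern -> 8-byte special data pattern.
PTN_TO_DATA = {
    0b0001: 0xFFFFFFFF_FFFFFFFF,  # a
    0b0010: 0x00000000_00000000,  # b
    0b0011: 0x7FF00000_00000000,  # c
    0b0100: 0x7FFFFFFF_FFFFFFFF,  # d
    0b0101: 0xFFFFFFFF_00000000,  # e
    0b0110: 0x7F800000_00000000,  # f
    0b0111: 0x7FBFFFFF_00000000,  # g (assumed value)
    0b1001: 0x00000000_FFFFFFFF,  # h
    0b1010: 0x0000FFFF_FFFF0000,  # i
    0b1011: 0xFFFF0000_0000FFFF,  # j
}

def receive_store_request(packets: list) -> int:
    """Return the 8-byte storage target data carried by a store request."""
    if packets[0]["bit32"] == 1:       # steps B12 and B22: special data store?
        ptn = packets[2]["PTN"]        # step B13: read the compression pattern
        return PTN_TO_DATA[ptn]        # steps B14 and B23: decoded 8-byte pattern
    # step B24: ordinary four-packet request, data taken as received
    return (packets[2]["data"] << 32) | packets[3]["data"]

special = [{"bit32": 1}, {}, {"PTN": 0b0011}]
print(hex(receive_store_request(special)))
```

An undefined PTN value raises `KeyError` in this sketch; the text does not say how the hardware treats unassigned patterns.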
[0084]
In the present embodiment, as described above, when bit 32 of the first packet (1'stPacket) is “1”, the transfer data request (communication packet) for the store has a three-packet configuration. Therefore, according to the present embodiment, when the transfer data request is a special data store instruction, the transfer timing is as shown in FIG. 10 (see 1-P1 to 1-P3 and 3-P1 to 3-P3 in FIG. 10). As a result, the number of clocks required for the transfer is reduced compared with the transfer timing according to the conventional technique (see FIG. 9; in particular, 1-P1 to 1-P4 and 3-P1 to 3-P4). That is, an improvement in throughput can be realized.
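The saving can be illustrated with simple arithmetic, assuming one packet is transferred per clock and requests are sent back to back (both are assumptions made for this sketch; the function name `transfer_clocks` is hypothetical).

```python
def transfer_clocks(num_stores: int, num_special: int) -> tuple:
    """Return (conventional, proposed) clock counts for a run of store requests."""
    conventional = 4 * num_stores                               # every store uses 4 packets
    proposed = 4 * (num_stores - num_special) + 3 * num_special  # special stores use 3
    return conventional, proposed

# Three stores of which the first and third are special data stores.
print(transfer_clocks(3, 2))
```

Under these assumptions each special data store saves one clock, so the gain grows with the fraction of stores that carry special data.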
[0085]
(2) Second embodiment
[0086]
In the above-described first embodiment, an example has been described in which the data patterns a to j are employed as the data patterns of the special data (data to be stored by the special data store instruction).
[0087]
However, the types of the data patterns of the special data are not limited to the ten types a to j, and can be increased or decreased.
[0088]
First, as a case of reducing the types of special data patterns relative to the first embodiment, it is conceivable to adopt only the data patterns a to g, which are considered the most necessary, as the special data patterns.
[0089]
In this case, the number of bits in the PTN field (see FIG. 5) in the third packet can be reduced to three, since seven patterns can be distinguished with three bits.
[0090]
Second, as a case of increasing the types of special data patterns relative to the first embodiment, it is possible to add other special data patterns to the data patterns a to j.
[0091]
In this case, the number of bits in the PTN field in the third packet can be increased (it is not limited to the 4 bits shown in FIG. 5), so the number of special data patterns is not limited by the 4-bit PTN field width shown in the first embodiment.
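The relation between the number of special data patterns and the PTN field width can be sketched as follows, assuming codes 1 to n are assigned contiguously and code 0 is reserved for “not special data”. The first embodiment does not assign codes contiguously (1000 is unused), so this gives a lower bound only; the function name is hypothetical.

```python
def ptn_field_bits(num_special_patterns: int) -> int:
    """Minimum PTN width when codes 1..n are used and code 0 means 'not special'."""
    # int.bit_length() gives the number of bits needed to represent the largest code n.
    return max(1, num_special_patterns.bit_length())

print(ptn_field_bits(7), ptn_field_bits(10), ptn_field_bits(16))
```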
[0092]
Here, as the “data patterns of other special data”, the following data patterns k and l can be considered, for example, from the viewpoint of avoiding data patterns that invert between two consecutive timings when viewed in time series on the transmission line.
[0093]
k. 8-byte data pattern of 00FF00FFFF00FF00 (hexadecimal)
[0094]
l. 8-byte data pattern of AAAAAAAA55555555 (hexadecimal)
[0095]
In the above example of k, each 4-byte data has the following timing (timing1 and timing2), and the pattern on the transmission line is inverted at two consecutive clock timings.
[0096]
timing1 00FF00FF
timing2 FF00FF00
[0097]
Further, in the above example of l, each 4-byte data has the following timings (timing1 and timing2), and the pattern on the transmission line is inverted at two consecutive clock timings.
[0098]
timing1 AAAAAAAA
timing2 55555555
[0099]
With such a concept, the data pattern of the special data can be further increased.
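One way to test whether an 8-byte pattern has this property is to XOR its two 4-byte halves: if every bit differs, all 32 lines invert between timing1 and timing2. The following is a small sketch of that check, with the hypothetical name `inverts_all_lines` and the upper 4 bytes taken as timing1, as in the examples above.

```python
def inverts_all_lines(data: int) -> bool:
    """True if all 32 interface lines invert between the two 4-byte transfers."""
    timing1 = (data >> 32) & 0xFFFFFFFF  # upper 4 bytes, sent first
    timing2 = data & 0xFFFFFFFF          # lower 4 bytes, sent next
    return (timing1 ^ timing2) == 0xFFFFFFFF

for pattern in (0x00FF00FF_FF00FF00, 0xAAAAAAAA_55555555, 0x00000000_00000000):
    print(format(pattern, "016X"), inverts_all_lines(pattern))
```

Any 8-byte value whose lower half is the bitwise complement of its upper half passes this check, which is why the set of such candidate patterns can be extended freely.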
[0100]
(3) Third embodiment
[0101]
In the above-described first embodiment, the storage target data (8-byte data pattern) of the transfer data request was supplied by the storage target data supply unit having the circuit configuration shown in FIG. 8.
[0102]
However, the storage target data supply unit is not limited to such a circuit configuration. For example, the storage target data supply unit may have the circuit configuration shown in FIG. 11.
[0103]
That is, FIG. 11 is a diagram illustrating another example of a circuit configuration of a part (storage target data supply unit) that supplies storage target data to the data pattern recognition circuit 101 and the request output unit 102 in the processor 100. In this case, the data pattern recognizing circuit 101 is provided at a stage preceding the software visible register (SoftwareVisibleReg).
[0104]
Hereinafter, the first embodiment (the embodiment adopting the storage target data supply unit shown in FIG. 8) and the third embodiment (the embodiment adopting the storage target data supply unit shown in FIG. 11) will be described.
[0105]
FIGS. 8 and 11 show circuit configurations in which a floating-point arithmetic unit, a software visible register to which the operation result is written, and a request output device according to the present invention (the data pattern recognition circuit 101 and the request output unit 102) are combined.
[0106]
Data (storage target data) to be included in the transfer data request is read from the software visible register.
[0107]
Here, the special number format of the floating-point format is originally used for replacement between arithmetic units (directly inputting the arithmetic result of an arithmetic unit to the arithmetic unit without writing the result to a software visible register). In FIG. 8, an operation is first performed in Float0. The operation result of Float0 is input to Float0R, and then written to the software visible register. At the next clock timing, when the previous operation result of Float0 is input as an operand in Float1, reading from the software visible register is not performed, and the data is directly replaced by Float0R as input data of Float1. At this time, the input data of the arithmetic unit is reproduced by recognizing a specific format for the special number.
[0108]
The circuit configuration shown in FIG. 11 focuses on the fact that the special number format of the floating-point format is used for replacement between arithmetic units, and shows an example in which the special format used in the arithmetic unit is applied to the transfer data request sent to the memory 200 side.
[0109]
In the circuit configuration shown in FIG. 11, the data pattern recognition circuit 101 is provided at the stage preceding the software visible register, and a 4-bit pattern indicating a special number is added to the data in the software visible register (this 4-bit pattern is the same as the compression pattern in the PTN field in FIG. 5). As a result, the special number field of the floating-point operation result is written into the software visible register at the same time as the result.
[0110]
When transferring the transfer data request to the memory 200, the request output unit 102 reads the data in such a software visible register and outputs the transfer data request to the memory 200 with that data as the storage target data (the special number format in the software visible register is transferred at the same time).
[0111]
(4) Other embodiments
[0112]
Further, the following a to c apply to the first embodiment. That is, the following non-limiting modifications (extensions) are conceivable.
[0113]
a. The content of the data to be stored is not limited to the data in the floating-point format.
[0114]
b. The packet configuration is not limited to those shown in FIGS.
[0115]
c. As mentioned above, the number of bits in the PTN field is not limited to four.
[0116]
[Effects of the invention]
As described above, according to the present invention, store performance can be improved (for example, throughput can be increased) by judging the data pattern of the data to be stored and, when it is a specific data pattern, compressing the data to reduce the number of packets.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a store instruction execution control method according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing processing on the processor side in the store instruction execution control method shown in FIG. 1;
FIG. 3 is a flowchart showing processing on the memory side in the store instruction execution control method shown in FIG. 1;
FIG. 4 is a flowchart showing processing on the memory side in the store instruction execution control method shown in FIG. 1;
FIG. 5 is a diagram showing a packet configuration of a special data store instruction employed in the store instruction execution control method shown in FIG. 1;
FIG. 6 is a diagram showing a specific example of a detailed circuit configuration of a data pattern recognition circuit in FIG. 1;
FIG. 7 is a diagram showing a specific example of a detailed circuit configuration (including a selection circuit in a request receiving unit) of a data pattern recognition & data generation circuit in FIG. 1;
FIG. 8 is a diagram illustrating an example of a configuration of a portion (storage target data supply unit) that supplies storage target data to a data pattern recognition circuit and a request output unit in the processor in FIG. 1;
FIG. 9 is a diagram for explaining specific operations and effects of the store instruction execution control method shown in FIG. 1;
FIG. 10 is a diagram for explaining specific operations and effects of the store instruction execution control system shown in FIG. 1;
FIG. 11 is a diagram illustrating another example of the configuration of a portion (storage target data supply unit) that supplies storage target data to a data pattern recognition circuit and a request output unit in the processor in FIG. 1;
FIG. 12 is a diagram showing a packet configuration of a store instruction in the related art.
FIG. 13 is a diagram showing a packet configuration of a load instruction according to the related art.
[Explanation of symbols]
100 processor
101 Data pattern recognition circuit
102 Request output section
200 memory
201 Data pattern recognition & data generation circuit
202 Request receiving unit
1011 Comparison circuit group
1012, 2012, 2021 selection circuit
2011 Decoder

Claims

1. A store instruction execution control method in a parallel computer system including a plurality of processors and a plurality of memories, comprising:
the processor, which, when storing data in the memory, determines the data pattern of the data to be stored, compresses the data pattern if the data to be stored is special data, and changes the packet configuration of the store instruction for the special data from the packet configuration of a store instruction for data other than special data; and
the memory, which, when the pattern indicating the data to be stored in the store instruction transmitted from the processor is a compressed pattern, decodes the pattern and receives the store instruction.

2. The store instruction execution control method according to claim 1, wherein a special number in the floating-point format is used as the special data, and the method is applied to a computer system including a processor that performs floating-point operations.

3. A store instruction execution control method in a parallel computer system including a plurality of processors and a plurality of memories, comprising:
a data pattern recognition circuit in the processor that, when a transfer data request as a store instruction is transmitted, determines whether the data pattern of the storage target data corresponds to a data pattern of special data and, if so, which special data pattern it corresponds to, and sends a compression pattern corresponding to the storage target data based on the determination result;
a request output unit in the processor that determines whether the storage target data is special data based on the compression pattern sent from the data pattern recognition circuit and, if the determination result is “the storage target data is special data”, outputs to the memory side a transfer data request that has information indicating a special data store instruction in a special data store instruction identification bit, holds the compression pattern sent from the data pattern recognition circuit, and has a packet configuration with fewer packets than a store instruction other than the special data store instruction;
a data pattern recognition & data generation circuit in the memory that, when a transfer data request as a store instruction is received from the processor side, determines whether the transfer data request is a special data store instruction based on the content of the special data store instruction identification bit in the transfer data request and, if the determination result is “the transfer data request is a special data store instruction”, determines the content of the compression pattern in the transfer data request and generates and outputs the data pattern of the storage target data obtained by decoding the compression pattern based on that determination result; and
a request receiving unit in the memory that, when a transfer data request as a store instruction is received from the processor side, determines whether the transfer data request is a special data store instruction based on the content of the special data store instruction identification bit in the transfer data request and, if the determination result is “the transfer data request is a special data store instruction”, receives the transfer data request with the decoding result of the compression pattern output by the data pattern recognition & data generation circuit as the storage target data.

4. The store instruction execution control method according to claim 3, wherein the data pattern recognition circuit is realized by a configuration including a comparison circuit group that judges data pattern identity between the storage target data in the transfer data request as a store instruction and each special data, and a selection circuit that selects and outputs the compression pattern corresponding to the comparison circuit that outputs a signal indicating that the data patterns are identical.

5. The store instruction execution control method according to claim 3, wherein the data pattern recognition & data generation circuit is realized by a configuration including a decoder that determines the bit pattern of the compression pattern in order to decode the compression pattern in a transfer data request that is a special data store instruction, and a selection circuit that selects and outputs the data pattern of the decoding result based on the determination result of the decoder.

6. The store instruction execution control method according to claim 1, wherein a special data store instruction having a three-packet configuration, including a packet having a special data store instruction identification bit and a packet having a PTN field indicating a compression pattern obtained by compressing the data pattern of the storage target data, and a store instruction other than the special data store instruction having a four-packet configuration, including two packets carrying the 8 bytes of storage target data, are employed.

7. The store instruction execution control method according to claim 6, wherein the hexadecimal data patterns FFFFFFFFFFFFFFFF, 0000000000000000, 7FF0000000000000, 7FFFFFFFFFFFFFFF, FFFFFFFF00000000, 7F80000000000000, and 7FBFFFFF00000000 are used as the data patterns of the special data, and the compression patterns 0001, 0010, 0011, 0100, 0101, 0110, and 0111 (binary) are assigned to them, respectively.

The hexadecimal data patterns FFFFFFFFFFFFFFFF, 0000000000000000, 7FF0000000000000, 7FFFFFFFFFFFFFFF, FFFFFFFF00000000, 7F80000000000000, 7FBFFFFF00000000, 00000000FFFFFFFF, 0000FFFFFFFF0000, and FFFF00000000FFFF are used as the data patterns of the special data, and the compression patterns 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1001, 1010, and 1011 (binary) are assigned to them, respectively.