JP3638625B2 - Multiple processing equipment

Info

Publication number: JP3638625B2
Application number: JP27578793A
Authority: JP
Inventors: 忠雄天田; 研二是方
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-11-04
Filing date: 1993-11-04
Publication date: 2005-04-13
Anticipated expiration: 2020-04-13
Also published as: JPH07129525A

Description

【０００１】
【産業上の利用分野】
本発明は、特に、メモリ及びデータ処理部を各々有するとともに外部通信手段を介して相互に通信可能に接続された複数処理装置に関する。
【０００２】
【従来の技術】
従来、図１１に示すような複数処理装置があった。
従来例に係る各処理装置９０₀〜９０_nは、同図に示すように、メモリ９４_i及び種々のデータ処理を行うＣＰＵ９３_iを各々有する。
【０００３】
また、同図に示すように、各処理装置９０_iは、外部通信手段９７を介して転送されたパケットの受信を行う受信バッファ９２_iと、送信しようとするパケットを保持する送信バッファ９１_iと、前記外部通信手段９７を介する他の処理装置との間の送信処理及び受信処理を行う外部通信処理部９５_iと、メモリ９４_iへのデータ転送を前記ＣＰＵ９３_iを介さずに行うＤＭＡ９９_iとを有するものである。
【０００４】
従来例に係る複数処理装置にあっては、各処理装置間の通信については、外部通信処理部９５_iが行うが、１つの処理装置内の複数の異なる種別のメモリ空間のデータの転送については、外部通信処理部９５_iによらずに、ＣＰＵ９３_i自身が又はＤＭＡ９９_iにより行うようにしていた。
【０００５】
【発明が解決しようとする課題】
さて、従来例に係る複数処理装置にあっては、以上説明したように、外部処理部との間のデータの通信処理については、外部通信処理部９５_iが行い、自己の処理装置内の複数の異なる種別のメモリ空間のデータの転送については、ＣＰＵ９３_i自身又はＤＭＡ９９_iが処理を行っていた。
自己の処理装置内の複数の異なる種別のメモリ空間のデータの転送についてＣＰＵ９３_i自身が行う場合には、ＣＰＵの処理の負担が大きくなり、各処理装置の処理能力が低下するおそれがあるという問題点があった。
【０００６】
また、自己の処理装置内の複数の異なる種別のメモリ空間のデータの転送をＤＭＡが行うには、前記外部通信処理部の他に、このＤＭＡを設ける必要があるだけでなく、メモリ９４_iの該当する領域にパケット形式で表されたデータを転送するには、当該パケット形式からアドレスを抽出する処理又はアドレスを指示する処理が必要となり、処理が複雑になるおそれがあるという問題点があった。
【０００７】
そこで、本発明は、各処理装置内のメモリの複数の異なる種別間のデータの転送を、ネットワークを通じて各処理装置（ＰＥ；プロセッサ・エレメント）間のデータの転送を行う通信処理装置の機能を用いて行うことにより、ＣＰＵの負担を軽減し、各処理装置の処理性能を向上させ、ひいては、複数処理装置全体の性能を向上させることを目的としてなされたものである。
【０００８】
また、通信処理部による各処理装置（ＰＥ）内メモリのデータ転送に時間がかかると、各処理装置内のＣＰＵのデータ待ち状態が生じるが、各処理装置内のメモリのデータ転送のスループット（処理能力）を処理装置間のデータ転送のスループットより良くなる様に構成することにより、各処理装置の処理性能を低下させないことを目的とするものである。
【０００９】
【課題を解決するための手段】
以上の技術的課題を解決するため、第一の発明は、図１に示すように、メモリ１４_i；ｉ＝０〜ｎ及びデータ処理部１３_iを各々有する複数のデータ処理装置１０₀〜１０_nが外部通信手段１７を介して相互に通信可能に接続された複数処理装置である。
【００１０】
また、各処理装置１０_iには、外部通信手段１７を介して転送されたパケットを受信する受信バッファ１２_iと、送信しようとするパケットを保持し、内外通信処理部１５_iからの指示により、保持されたパケットを外部通信手段１７又は内部通信手段１６_iのどちらか一方に切り換えて送出する折返可能送信バッファ１１_iと、送信しようとするパケットの宛先が自己であるか又は他の処理装置であるかの判断、当該判断結果に基づく折返可能送信バッファ１１_iに対するパケット送出先の通信手段の切換え、及び、受信バッファ１２_iを介して受信されたパケット若しくは自己宛のパケットが示すメモリ１４_iの該当する格納領域１４_i,0〜１４_i,mへの書込みの指示を行うとともに、外部通信手段１７よりも、内部通信手段１６_iの並列通信路数を多く設け、又は通信処理の最小間隔を小さくすることにより、内部通信手段１６_iを介する通信処理の単位時間当たりのデータ量を大きくするように処理を行う内外通信処理部１５_iと、前記メモリ１４_iとデータ処理部１３_i、受信バッファ１２_i、折返可能送信バッファ１１_i、及び内外通信処理部１５_iとの間を接続するとともに外部通信手段１７に比較して並列通信路数の多い内部通信手段１６_iと、を有するものである。
【００１１】
ここで、「通信手段」は通信路を含む概念であり、バスにより構成されるもの、または、クロスバーなどのネットワーク等があり、片方向伝送に限られず、両方向の伝送が可能である場合も含む。
「パケット」とは、データや制御信号を含む２進数字の列であって、列全体を１つの単位として伝送したり交換したりされるものをいう。データ、制御信号、場合によっては誤り制御情報を含むこともあり、これらの情報が一定の形式で配列されているものである。
【００１２】
パケットには転送先、発信元、データの大きさ等の通信情報であるパケットヘッダと、相手に伝えたい情報であるボディからなる。
「処理装置」とは、計算機等において、命令を解読し、実行する機能単位をいい、実施例に説明するプロセッサ・エレメント（ＰＥ）がこれに相当する。
処理装置は、並列型の処理装置に限られず、分散型の処理装置にも適用される。
【００１３】
「格納領域」とは、前記メモリに設けられた、複数の異なる種別用に使用されるものであり、例えば、実施例に示すように、データの転送用に用いられるグローバル領域又はデータ処理用に用いられるローカル領域がある。
「データの転送」には、例えば、トークンをもらって送信を行う通信形式の外に、トークンを付加して送信したパケットに対してリターントークンをもらって、送信を行う通信形式等がある。
【００１４】
「外部通信手段よりも内部通信手段を介する通信処理の単位時間当たりのデータ量が大きい」とは、内部通信手段を介する処理能力（スループット）を外部通信手段を介する処理能力よりも高めることを意味し、そのために、外部通信手段１７に比較して内部通信手段１６_iの並列通信路数を多く設けたり、外部通信手段よりも内部通信手段を介する通信処理の最小間隔（秒）を小さくすること等により実現される。また、「スループット（処理能力）」とは、与えられた時間内に計算機システムによって遂行される仕事の量の測度をいう。
【００１５】
このようなハードウェアの変更が可能なのは以下の理由による。外部通信手段は、各種の通信機器が接続可能となるように、通信規則等で、その形式を統一して汎用性を満たすようにしているので、その通信手段数の変更等には大きな制約が加えられる。もし、外部通路の形式を変更するとすれば、その外部通信手段に接続されているすべての機器に影響を与える。一方、外部通信手段に接続されている各処理装置内での形式の変更による影響は当該各処理装置内に止まり変更が容易である。
【００１６】
第二の発明は、図２に示すように、データの記憶及び種々のデータの処理を各々行い外部通信手段を介して相互に通信可能に接続された複数の処理装置の中の任意の処理装置でパケット送信の指示があると（Ｓ１）、送信しようとするパケットについての宛先が自己又は他の処理装置であるか否かを判断し（Ｓ２）、外部通信手段について送信可能状態となるのを待ち（Ｓ３）、送信しようとするパケットの宛先が他の処理装置の場合には、送信可能状態となってから外部通信手段を介して、単位時間当たり所定データ量毎に当該パケットについて送信を行い（Ｓ４）、１つのパケットについての送信が完了するまで送信を繰り返し（Ｓ５）、送信しようとするパケットの宛先が自己の処理装置の場合には、単位時間当たり前記所定データ量よりも大きいデータ量で処理を行い、自己の処理装置の該当するメモリの領域への書込みを行う（Ｓ６）ことである。
ここで、「所定データ量」とは、外部通信手段を介して行われる単位時間当たりの処理データ量であり、所定データ量は、当該通信手段の並列通信路数や、外部通信手段の最小間隔（秒）等に依存する。
【００１７】
一方、第一の発明の実施態様は、図３に示すように、第一の発明の前記内外通信処理部２５_iは、送信しようとするパケットの宛先が自己の処理装置か、又は他の処理装置であるかを判断する自己宛判断部２０_iと、送信の指示があって送信可能状態となった場合には、前記自己宛判断部２０_iによる判断結果に基づき、前記折返可能送信バッファ１６_iに対するパケット送出先の通信手段の切換え、アドレス選択部２３_iに対するアドレスの選択、及び、当該アドレスに基づき前記メモリ１４_iに対するアクセスの指示を行う通信指示部２２_iとを有する。また、当該内外通信処理部２５_iは、指示に基づいて、送信しようとするパケットのアドレス又は受信されたパケットのアドレスの一方の選択を行うアドレス選択部２３_iと、前記アドレス選択部２３_iにより選択されたアドレスに基づいて前記メモリ１４_iの該当する格納領域１４_i,0〜１４_i,mに対するアクセスの指示を行うアクセス指示部２４_iとを有し、外部通信手段１７よりも内部通信手段１６_iを介する通信処理の単位時間当たりのデータ量を大きくするように処理を行うものである。
【００１８】
【作用】
続いて、第一の発明及び第二の発明の動作について説明する。
図１又は図２に示すように、ステップＳ１で前記内外通信処理部１５_iにパケットの送信の指示があると、ステップＳ２で、前記内外通信処理部１５_iは、送信すべきパケットが自己宛か否かを判断する。自己宛か否かの判断は、例えば、送信しようとするパケットの所定ビット位置にある自己宛か否かを示す自己宛ビットを見ることにより判断したり、パケットの転送先を示す処理装置の機番が自己の処理装置を示す機番と同一であるか否かを判断することにより行われる。
送信すべきパケットが自己宛でない場合には、ステップＳ３に進み、送信可能状態となるのを待つ。
【００１９】
ステップＳ４で、前記内外通信処理部１５_iは、送信可能状態となってから外部通信手段１７を介して送信するように指示を行う。
パケットの送信は、一度に転送可能なデータ量ずつ行われる。送信が行われた後は、次の送信はステップＳ３で送信可能状態になるのを待ってから行われ、ステップＳ５で１つのパケットの送信が完了するまで行われる。「送信可能状態」になるには、例えば、送信権を表すトークンをもらって送信を行う場合又は、パケットにトークンを付加して送信し、リターントークンがあった後に次の送信を行う方式等の場合がある。
【００２０】
一方、ステップＳ２で、送信すべきパケットが自己宛である場合には、ステップＳ６に進み、前記内外通信処理部１５_iは、前記折返可能送信バッファ１１_iに対し、内部通信手段１６_iを介してのパケットの転送を指示するとともに、送信しようとするパケットの宛先を示すデータ位置からアドレスを得て、当該メモリ１４_iの該当する格納領域１４_i,0〜１４_i,mにデータを書き込むようにアクセスの指示を行う。
即ち、処理装置１０₀〜１０_nの内部へのパケットの転送は、前記折返可能送信バッファ１１_iに保持されている送信データがあたかも、前記受信バッファ１２_iに保持されている如く扱われることにより行われる。
【００２１】
各処理装置１０₀〜１０_nの内部通信手段１６_iを介してパケットの転送を行う際、外部通信手段１７に比較して内部通信手段１６_iの方が並列通信路数を多く設けている。また、内外通信処理部１５_iは外部通信手段１７よりも内部通信手段１６_iを介する通信処理の単位時間当たりのデータ量を大きくするように処理をしている。従って、外部の処理装置へのパケットの送信よりも、自己の処理装置へのパケットの送信の方がスループットが良いように構成されていることになる。
前記内外通信処理部１５_iによる外部の通信は、内部の通信を可能にしたことにより悪影響を受けることはない。また、内部へのデータの転送時間を短縮することができるので、前記データ処理部１３_iにデータを揃えるまでのデータの待ち時間を短縮することができ、各処理装置の性能低下の防止、及び、処理装置全体としての処理性能の低下を防止することができることになる。
【００２２】
以上説明したように、本発明にあっては、内部通信手段１６_i及び前記内外通信処理部１５_iを設けることにより、外部通信手段１７を介するデータの転送と同じようなしくみで、各処理装置１０₀〜１０_n内部へのデータの転送を行うことができる。これにより、各処理装置１０₀〜１０_n内部でのデータの転送を行う際には、各処理装置１０₀〜１０_n内部に設けられた内外通信処理部１５_iが処理を行うので、各データ処理部１３_iへの負担をかけない。また、ＤＭＡ等の各処理装置１０₀〜１０_n内部でのデータの転送を行うためには、外部通信を行う機器を共用することができ、専用の機器を設ける必要がないので、構成が簡単になる。
【００２３】
【実施例】
続いて、本発明の実施例について説明する。
図４には、本実施例に係る複数の分散メモリ型の処理装置（プロセッサ・エレメント，ＰＥ）が外部通信手段３７を介して、相互に通信可能に接続された複数処理装置の全体ブロック図を示す。
ここで、「分散メモリ型の処理装置」とは、各処理装置に各々メモリが分散して設けられ、各処理装置とは別個に統一した共用メモリが設けられているものとは異なるものである。
【００２４】
各処理装置３０_o〜３０_nには、同図に示すように、各々主記憶装置等からなるメモリ３４_o〜３４_nと、種々のデータ処理を行うデータ処理部に相当し、命令の実行又は通信処理部の制御を司るＣＰＵ３３₀〜３３_nと、パケットの通信を司る通信部３８₀〜３８_nとが設けられている。さらに、各処理装置３０₀〜３０_n内で、当該各メモリ３４_o〜３４_n、ＣＰＵ３３₀〜３３_n、及び通信部３８₀〜３８_nは、前記外部通信手段３７よりも並列通信路数の多い内部通信手段３６₀〜３６_nにより接続されている。
本実施例では、外部通信手段３７の幅を４バイトとし、内部通信手段３６₀〜３６_nの幅を８バイトとする。
前記外部通信手段３７としては、例えばバスや、ネットワークがある。
【００２５】
図５には、第一の実施例に係る処理装置３０_iのブロック図を示す。
同図に示すように、前述したＣＰＵ３３_iは前記データ処理部に相当するとともに、ＣＰＵ３３_i内部に、キャッシュ３３ａ_iが設けられている。
また、各処理装置３０_iには、同図に示すように、外部通信手段３７を介して転送されたデータの受信を行う受信バッファ３２_iと、送信しようとするパケットを保持するとともに、内外通信処理部３５_iからの指示により、保持されたパケットを外部通信手段３７又は内部通信手段３６_iのどちらか一方に切り換えて送出する折返可能送信バッファ３１_iとが設けられている。
【００２６】
内部通信手段３６_iは、内部バスであり、前述したように、外部通信手段３７に比較して並列通信路数が多くなるように設けることによりスループットを良くしている。また、符号３６１_iは、折返可能送信バッファ３１_iからメモリ３４_iへの経路であり、この経路を用いて、通信処理部３５_iは、前記グローバル領域３４_i,1と、３４_i,2との間のパケットの転送を行う。これによりＣＰＵ３３_iへの負担が軽減され、各処理装置の性能を向上させることができる。
【００２７】
符号３９_iは処理装置３０_iと外部通信手段３７との間を接続する経路である。ここは一般的にいってハードウェアの制約によりスループットが良くない構成になりがちである。そのため、経路３９_iと内部通信手段３６_iとを同じスループットで構成した場合には、メモリ３４_iへのデータ転送速度が遅くなる。従って、内部通信手段３６_iは経路３９_iよりもスループットの良い構成とすることにより、処理に必要なデータを待っているＣＰＵ３３_iのアイドルタイムを減少させることができ処理装置の性能を向上させることができる。
【００２８】
また、図５において、内外通信処理部３５_iは送信しようとするパケットの宛先が自己であるか又は他の外部の処理装置３０₀〜３０_nであるかの判断、当該判断結果に基づく折返可能送信バッファ３１_iに対するパケット送出先の通信手段の切換え、及び、外部通信手段３７を介して受信されたパケット若しくは自己宛のパケットが示すメモリ３４_iの該当するグローバル領域３４_i,1又はローカル領域３４_i,2への書込みの指示を行うとともに、外部通信手段３７よりも内部通信手段３６_iを介する通信処理の単位時間当たりのデータ量を大きくするように処理を行う内外通信処理部３５_iとを有する。
【００２９】
さらに、本実施例に係る各処理装置３０_iの前記メモリ３４_iの主記憶装置に相当する部分には、パケットを記憶するとともに、パケットの移動が相互に可能な２つの異なる種別の格納領域であるグローバル領域３４_i,1及びローカル領域３４_i,2が設けられている。
ここで、グローバル領域３４_i,1は各処理内間のデータの転送のために用いられる領域であり、前記ローカル領域３４_i,2は各処理装置内処理のために用いられる領域である。
また、前記ＣＰＵ３３_iには、同図に示すように、キャッシュ３３ａ_iが設けられ、前記メモリ３４_iのローカル領域３４_i,2において処理を行う。
【００３０】
尚、図示していないが、前記メモリ３４_iには、主記憶装置の外にディスク等が設けられていても良い。
また、符号３１ａ_iはメモリ３４_iからのパケットと内外通信処理部３５_iからのパケットの選択を行う選択部であり、符号３２ａ_iは、折返可能送信バッファ３１_iからの送信パケット又は受信バッファ３２_iからの受信パケットの選択を行う選択部である。また、符号３１ｃ_i、３１ｂ_i及び３２ｂ_iはレジスタを表す。
ここで、内外通信処理部３５_iと、折返可能送信バッファ３１_iと、受信バッファ３２_iと、レジスタ３１ｂ_i，３１ｃ_i，３２ｂ_iと、選択部３１ａ_i，３２ａ_iとは、図４の通信部３８_iに相当する。
【００３１】
図６には、前記内外通信処理部３５_iの機能を詳細に示すものである。
同図に示すように、本実施例に係る内外通信処理部３５_iには、送信しようとするパケットの宛先が自己の処理装置か又は他の処理装置であるかを判断する自己宛判断部４０_iと、送信の指示があって送信可能状態となった場合には、前記自己宛判断部４０_iによる判断結果からの指示に基づき、前記折返可能送信バッファ３１_iが送信しようとするパケット送出先の通信手段を外部通信手段若しくは内部通信手段のどちらかへの切換えの指示、アドレス選択部４３_iに対するアドレスの選択、及び、当該アドレスに基づき前記メモリ３４_iに対するアクセスの指示を行う通信指示部４２_iとを有する。
【００３２】
さらに、当該内外通信処理部３５_iには、指示に基づいて、送信しようとするパケットのアドレス又は受信されたパケットのアドレスの選択を行うアドレス選択部４３_iと、当該アドレス選択部４３_iにより選択されたアドレスに基づいて前記メモリ３４_iの該当するグローバル領域３４_i,1又はローカル領域３４_i,2に対するアクセスの指示を行うアクセス指示部４４_iとを有し、外部通信手段３７を介するよりも内部通信手段３６_iを介する通信処理の単位時間当たりのデータ量を大きくするように処理を行うものである。
【００３３】
さらに、当該通信指示部４２_iには送信をしようとするパケットにトークンを付加するトークン付加部４２ａ_iと、相手側のバッファまたは、外部通信手段３７にあるバッファが空いていて送信が可能である場合に発行されるリターントークンの検出を行うリターントークン検出部４２ｂ_iと、送信すべきパケットについての送信処理が完了したことを表すターミネート出力部４２ｃ_iとを有する。
【００３４】
また、本実施例にあっては、前記自己宛判定部４０_iは、図６に示すように、パケットの所定位置に含まれる自己宛ビットを読み出す自己宛ビット読出部４０ａ_iと、読み出された当該ビットの判定を行うビット判定部４０ｂ_iとを有する。「自己宛ビット」とは、パケットの転送の宛て先が自己の処理装置であるか否かを示すビットをいう。
【００３５】
続いて、本実施例の送信時の動作について図７に基づいて説明する。
外部通信手段であるネットワークから到来したパケットは受信バッファに受信され、その後メモリアクセスのプライオリティ（図示せず）がとれた時点でメモリのグローバル領域３４_i,1に記憶される。ＣＰＵ３３_iはローカル領域において処理を行うため、グローバル領域３４_i,1からローカル領域３４_i,2への転送が必要となる。外部通信手段３７であるネットワークへ送信するパケットをローカル領域３４_i,2からグローバル領域３４_i,1に転送する必要がある。図７に示すように、ステップＳＪ１で、前記ＣＰＵ３３_iが送信しようとするパケットヘッダをメモリ３４_iの主記憶上に（グローバル領域とは限らない。）設けられた送信キューに入力する。この事実は前記通信処理部３５_iに通知され、内外通信処理部３５_iへの送信の指示が行われる。
【００３６】
ステップＳＪ２で、前記内外通信処理部３５_iは当該パケットヘッダを前記送信バッファ３１_iに取り込むために、前記アドレス選択部４３_iを介して、指示のあったパケットに関してアクセス指示部４４_iにより、例えば、８バイトのフェッチ要求を前記メモリ３４_iに出す。ここで、８バイトとは前記送信バッファ３１_iがあふれない単位である。
ステップＳＪ３で、当該メモリ３４_iに記憶されているパケットヘッダを全部完了するまで、数回にわたってフェッチ要求を出し、前記折返可能送信バッファ３１_iに書き込む。
【００３７】
ステップＳＪ４でパケットヘッダについてすべて読み出されて前記折返可能送信バッファ３１_iに保持される。その８バイトずつのパケットヘッダの読出しの際のステップＳＪ５に、前記自己宛判定部４０_iは送信しようとするパケットの宛先が自己の処理装置であるか又は外部の他の処理装置であるかの判断を行う。その判断は、自己宛判断部４０_iの自己宛ビット読出部４０ａ_iが送信しようとするパケットの宛先が自己の処理装置であるか“1 ”又は外部の他の処理装置であるか“0 ”であるかを表す前記パケットヘッダの中の所定位置にある自己宛ビットを読み出すことにより行う。読み出された自己宛ビットはビット判定部４０ｂ_iにより当該ビットが“1 ”であるか“0 ”であるかの判定がされ、その結果を前記通信指示部４２_iに通知する。
【００３８】
ここで、当該パケットヘッダには、少なくとも自己宛ビット及び受信先のアドレスを持っていることになる。詳しくは、パケットヘッダの形式は図８に示すものであり、当該パケットヘッダには、第１ワード目には、例えば、宛先の処理装置の識別子が含まれ、第ｍワード目には、そのパケットの大きさを表す情報が含まれ、第ｎ番目には、自己宛か否かを表す自己宛ビット、及び当該パケットが格納されるべき、宛先の処理装置のメモリの格納位置を表すアドレスが含まれる。これらのパケットヘッダの後の第ｐ番目からボディと呼ばれる実質的な内容を表すデータが含まれている。
【００３９】
前記自己宛判断部４０_iにより自己宛に送信するパケットでないと判断された場合には、ステップＳＪ６に進む。ステップＳＪ６で前記通信指示部４２_iのトークン付加部４２ａ_iは例えば外部通信手段への通信路数によって決まる１ワード４バイト毎に１個のトークンを付加して、当該通信処理部３５_iは前記折返送信バッファ３１_iがあふれないように制御し、前記折返可能バッファ３１_iに指示をして前記外部通信手段３７に送り出す。
【００４０】
その後、外部通信手段３７を介して前記リターントークン検出部４２ｂ_iによりリターントークンが送信先ＰＥから返却が検出されるまで待ち、ステップＳＪ７で、リターントークンが返ってきたことが検出された場合には、再びステップＳＪ６に戻り、１ワード毎にトークンを付加して外部通信手段３７にパケットヘッダを送出する。
以上の処理ステップＳＪ６及びステップＳＪ７はパケットヘッダだけでなくボディについても同様に繰り返し、ターミネート出力部４２ｃ_iがパケットの全データについての送信が終了したことを表すターミネートをステップＳＪ８で送出するまで繰り返される。
【００４１】
一方、ステップＳＪ５で、前記自己宛判断部４０_iにより自己宛であると判断された場合には、ステップＳＪ９に進む。
ステップＳＪ９では、前記通信指示部４２_iは前記アドレス選択部４３_iに指示を行い前記送信バッファ３１_iに保持されている送信しようとするパケットから抽出されたアドレスを前記アクセス指示部４４_iに送り、当該送信バッファ３１_iに格納されている送信しようとしたパケットを当該アドレスが示すメモリのローカル領域３４_i,1に書き込ませる。この場合には、外部通信手段３７を介してのパケットの転送とは異なり、トークンはパケットには付加されず、リターントークンの検出が必要がない。なぜならば、前記通信処理部３５_iは何バイトのパケットを前記メモリ３４_iのロードバスから受け取り、同量のバイト数のビットをメモリ３４_iのストアバスに送出すれば良いかを知っているからである。
【００４２】
この点をさらに詳しく説明すれば、トークンは前記送信バッファ３１_iから受信バッファ３２_iにデータをあふれないように送るために必要なものであり、自己に転送する場合には、あたかも前記送信バッファ３１_iが受信バッファ３２_iであるかのように通信処理部には見える。従って、トークンの役割は終わったところより処理が始まっているからである。
【００４３】
以上説明したように、本実施例にあっては、以上のような構成にしたため、前記ＣＰＵ３３_iは、キャッシュ内データを用いて処理を行うことができ、通信処理部がパケット転送を終えた時に前記転送データを開いた次の処理に移ることができる。処理を終えると、その処理データがネットワークへ転送するデータである場合でも通信処理部がデータ転送を行うため次の処理に移ることができる。
【００４４】
また、処理装置間（ＰＥ間）接続装置であるネットワークをクロスバー等で構成した場合、ハードウェアの観点からＰＥとネットワーク間のバス幅が制限される。あるいは転送時間が長くなるといったことが考えられるが、この場合に、ネットワークへの経路３９_iにより各処理装置３０₀〜３０_n内の内部通信手段のスループットが良くなるように構成することにより、例えば、外部通信手段が１バイト／２サイクルに対し、内部通信手段のデータの伝送が２バイト／１サイクルとなるように構成することにより処理データが揃うまでのＣＰＵの待ち時間を短縮することができる。
【００４５】
上述の説明ではＣＰＵの処理量が通信処理部の処理量よりも多いものとして述べたが通信処理部の処理量が多い場合には通信処理部を増やす等により対処できるものと考えられる。
【００４６】
続いて、第二の実施例について、図９に基づいて説明する。
第二の実施例に係る分散メモリ型の処理装置は、前述した第一の実施例に係る処理装置と異なり、前記自己宛判断部として、自己宛ビット読出部４０ａ_i及びビット判定部４０ｂ_iの代わりに、宛先読出部５０ａ_i及び自己の処理装置との機番の比較を行う比較部５０ｂ_iとを設けたものである。
【００４７】
即ち、第二の実施例にあっては、第一の実施例と異なり、そのパケットの形式は、図８ではなく、図１０に示すように、自己宛か否かを示すビットを設けるのではなく、宛先を示す処理装置の番号が自己の処理装置の機番と同一の場合には、自己宛の送信データであると判断するものである。
【００４８】
本例は第一の実施例と異なり自己宛を示すビットを前記パケットに設ける必要がなく、通常の処理装置の宛先を使用することができるので、パケットの形式を変更する必要がないので扱い易い。
尚、第一の実施例の処理装置の場合に用いた符号と同一の符号は同一のものを示す。
尚、以上の例では、トークン及びリターントークンを用いてデータ通信を行う方式について説明したが、当該例に限られることなく、ＩＥＥＥのＣＳＭＡ／ＣＤ（イーサネット）方式、トークンパッシングリング方式やトークンパッシングバス方式、ＭＡＮ方式、ディジタルＰＢＸ方式等に適用することもできる。
【００４９】
【発明の効果】
以上述べてきたように本発明によれば、外部通信手段に比較して伝送されるデータ幅の大きい内部通信手段を設けるとともに、送信しようとするパケットの宛先が他の処理装置であるか自己の処理装置であるかの判断を行い、自己宛のパケットの場合には、メモリの該当する領域への書込みを行うように指示することにより、自己折り返し機能を持つようにしている。
【００５０】
従って、ＣＰＵを介することなく又はＤＭＡ等の機器を別個に設けることなく、通信処理部の機能を用いて自己へのデータ転送を行うためにＣＰＵに負担をかけず、また、構成を簡単化することができる。
また、通信処理部が外部の処理装置との間の通信を行うよりも、自己宛への送信の方がスループットが良くなっている。従って、各処理装置のＣＰＵ等の待ち時間を短縮して複数処理装置及びそのデータ通信方法の性能を向上させることができる。
【図面の簡単な説明】
【図１】第一の発明の原理ブロック図
【図２】第二の発明の原理流れ図
【図３】第一の発明の実施態様を示すブロック図
【図４】第一の実施例に係る機器構成全体ブロック図
【図５】第一の実施例に係るブロック図
【図６】第一の実施例に係る内外通信処理部を示すブロック図
【図７】第一の実施例に係る流れ図
【図８】第一の実施例に係るパケットの形式例を示す図
【図９】第二の実施例に係る内外通信処理部を示すブロック図
【図１０】第二の実施例に係るパケットの形式例を示す図
【図１１】従来例に係るブロック図
【符号の説明】
１０₀〜１０_n，３０₀〜３０_n 処理装置
１１_i，３１_i 折返可能送信バッファ
１２_i，３２_i 受信バッファ
１３_i，２３_i（３３_i）データ処理部（ＣＰＵ）
１４_i，２４_i，３４_i メモリ
１５_i，２５_i，３５_i，４５_i 内外通信処理部
１６_i，２６_i，３６_i 内部通信手段
１７，３７ 外部通信手段
[0001]
[Industrial application fields]
The present invention relates, in particular, to a multiple processing apparatus in which processing devices, each having a memory and a data processing unit, are connected via external communication means so that they can communicate with one another.
[0002]
[Prior art]
Conventionally, there has been a multiple processing apparatus as shown in FIG. 11.
As shown in the figure, each of the processing devices 90_0 to 90_n according to the conventional example has a memory 94_i and a CPU 93_i that performs various kinds of data processing.
[0003]
Further, as shown in the figure, each processing device 90_i has a reception buffer 92_i that receives packets transferred via the external communication means 97, a transmission buffer 91_i that holds packets to be transmitted, an external communication processing unit 95_i that performs transmission and reception processing with other processing devices via the external communication means 97, and a DMA 99_i that transfers data to the memory 94_i without going through the CPU 93_i.
[0004]
In the multiple processing apparatus according to the conventional example, communication between processing devices is performed by the external communication processing unit 95_i, whereas transfers of data between a plurality of different types of memory space within one processing device are performed not by the external communication processing unit 95_i but by the CPU 93_i itself or by the DMA 99_i.
[0005]
[Problems to be solved by the invention]
As described above, in the multiple processing apparatus according to the conventional example, data communication with external processing units is handled by the external communication processing unit 95_i, while transfers of data between a plurality of different types of memory space within the device itself are handled by the CPU 93_i itself or by the DMA 99_i.
When the CPU 93_i itself transfers data between different types of memory space within its own processing device, the processing load on the CPU increases, and the processing capability of each processing device may be reduced.
[0006]
Further, for the DMA to transfer data between different types of memory space within its own processing device, the DMA must be provided in addition to the external communication processing unit. Moreover, to transfer data expressed in packet format to the corresponding area of the memory 94_i, processing to extract the address from the packet format or to specify the address is required, so the processing may become complicated.
[0007]
Therefore, an object of the present invention is to perform data transfers between a plurality of different types of memory area within each processing device by using the function of the communication processing unit that transfers data between processing devices (PE; processor elements) over the network, thereby reducing the load on the CPU, improving the processing performance of each processing device, and consequently improving the performance of the multiple processing apparatus as a whole.
[0008]
Further, if the transfer of data in the memory within each processing device (PE) by the communication processing unit takes time, the CPU in each processing device enters a data-wait state. A further object is therefore to avoid degrading the processing performance of each processing device by configuring the throughput (processing capacity) of data transfer within the memory of each processing device to be higher than the throughput of data transfer between processing devices.
[0009]
[Means for Solving the Problems]
To solve the above technical problem, the first invention is, as shown in FIG. 1, a multiple processing apparatus in which a plurality of data processing devices 10_0 to 10_n, each having a memory 14_i (i = 0 to n) and a data processing unit 13_i, are connected via external communication means 17 so that they can communicate with one another.
[0010]
Each processing device 10_i has: a reception buffer 12_i that receives packets transferred via the external communication means 17; a returnable transmission buffer 11_i that holds packets to be transmitted and, on instruction from the internal/external communication processing unit 15_i, sends a held packet to either the external communication means 17 or the internal communication means 16_i; an internal/external communication processing unit 15_i that determines whether the destination of a packet to be transmitted is its own device or another processing device, switches the communication means to which the returnable transmission buffer 11_i sends the packet according to that determination, instructs that a packet received via the reception buffer 12_i, or a self-addressed packet, be written to the corresponding storage area 14_{i,0} to 14_{i,m} of the memory 14_i indicated by the packet, and performs processing so that the amount of data per unit time of communication via the internal communication means 16_i is made larger, either by providing the internal communication means 16_i with more parallel communication paths than the external communication means 17 or by making the minimum interval of communication processing smaller; and internal communication means 16_i that connects the memory 14_i with the data processing unit 13_i, the reception buffer 12_i, the returnable transmission buffer 11_i, and the internal/external communication processing unit 15_i, and that has more parallel communication paths than the external communication means 17.
[0011]
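Here, “communication means” is a concept that includes a communication path. It may be constituted by a bus or by a network such as a crossbar, and it is not limited to one-way transmission but also includes cases in which transmission in both directions is possible.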
A “packet” is a binary digit string including data and control signals, which is transmitted or exchanged as a whole unit. Data, control signals, and possibly error control information may be included, and these pieces of information are arranged in a fixed format.
[0012]
A packet includes a packet header that is communication information such as a transfer destination, a transmission source, and a data size, and a body that is information to be transmitted to the other party.
The “processing device” refers to a functional unit that decodes and executes an instruction in a computer or the like, and corresponds to a processor element (PE) described in the embodiment.
The processing device is not limited to a parallel processing device, and is also applied to a distributed processing device.
[0013]
A “storage area” is an area provided in the memory and used for one of a plurality of different types of purpose. For example, as shown in the embodiment, there is a global area used for data transfer and a local area used for data processing.
“Data transfer” includes, for example, a communication format in which transmission is performed after a token is received, as well as a format in which a token is attached to a transmitted packet and the next transmission is performed after a return token is received for it.
[0014]
“The amount of data per unit time of communication processing via the internal communication means is larger than that via the external communication means” means that the processing capacity (throughput) via the internal communication means is made higher than the processing capacity via the external communication means. This is realized, for example, by providing the internal communication means 16_i with more parallel communication paths than the external communication means 17, or by making the minimum interval (in seconds) of communication processing via the internal communication means smaller than that via the external communication means. “Throughput (processing capacity)” refers to a measure of the amount of work performed by a computer system within a given time.
[0015]
Such a hardware change is possible for the following reason. The format of the external communication means is standardized by communication rules and the like so that various communication devices can be connected and generality is maintained; changing, for example, the number of its communication paths is therefore heavily constrained. If the format of the external path were changed, every device connected to that external communication means would be affected. In contrast, the effect of a format change inside a processing device connected to the external communication means is confined to that processing device, so the change is easy.
[0016]
As shown in FIG. 2, the second invention is a method in which, when a packet transmission instruction is given in any one of a plurality of processing devices that each store data and perform various data processing and that are connected via external communication means so as to communicate with one another (S1), it is determined whether the destination of the packet to be transmitted is the device itself or another processing device (S2); the device waits until the external communication means becomes ready for transmission (S3); when the destination is another processing device, the packet is transmitted via the external communication means, once it is ready, in units of a predetermined amount of data per unit time (S4); transmission is repeated until transmission of the one packet is complete (S5); and when the destination is the device's own processing device, processing is performed with an amount of data per unit time larger than the predetermined amount, and the packet is written to the corresponding area of the device's own memory (S6).
Here, the “predetermined amount of data” is the amount of data processed per unit time via the external communication means; it depends on the number of parallel communication paths of that communication means, the minimum interval (in seconds) of the external communication means, and so on.
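The flow of steps S1 to S6 can be sketched roughly as follows. This is only an illustrative model, not the claimed hardware: the names Packet, ProcessingDevice, external_ready and external_log are hypothetical, the readiness check is a placeholder that always succeeds, and the 4-byte and 8-byte widths are borrowed from the embodiment in [0024].

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dest_id: int   # machine number of the destination processing device
    address: int   # storage location in the destination memory
    body: bytes    # payload


@dataclass
class ProcessingDevice:
    own_id: int
    external_width: int = 4   # bytes per external transfer (the "predetermined amount")
    internal_width: int = 8   # bytes per internal transfer (larger by design)
    memory: bytearray = field(default_factory=lambda: bytearray(64))
    external_log: list = field(default_factory=list)  # stands in for the network

    def external_ready(self) -> bool:
        # S3 placeholder: a real device would wait for a token here.
        return True

    def send(self, packet: Packet) -> int:
        """Handle one transmission instruction (S1); return the number of transfers."""
        self_addressed = packet.dest_id == self.own_id            # S2
        width = self.internal_width if self_addressed else self.external_width
        transfers = 0
        for offset in range(0, len(packet.body), width):
            chunk = packet.body[offset:offset + width]
            if self_addressed:
                # S6: write directly into the device's own memory area.
                start = packet.address + offset
                self.memory[start:start + len(chunk)] = chunk
            else:
                while not self.external_ready():                   # S3
                    pass
                self.external_log.append(chunk)                    # S4
            transfers += 1                                         # S5: repeat until done
        return transfers
```

In this model a self-addressed packet needs fewer transfers than an externally addressed packet of the same size, which is the property the second invention relies on.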
[0017]
On the other hand, in an embodiment of the first invention, as shown in FIG. 3, the internal/external communication processing unit 25_i has a self-address determination unit 20_i that determines whether the destination of a packet to be transmitted is its own processing device or another processing device, and a communication instruction unit 22_i that, when a transmission instruction has been given and transmission has become possible, switches the communication means to which the returnable transmission buffer 16_i sends the packet, has the address selection unit 23_i select an address, and instructs access to the memory 14_i based on that address, all according to the determination result of the self-address determination unit 20_i. The internal/external communication processing unit 25_i further has an address selection unit 23_i that, on instruction, selects either the address of the packet to be transmitted or the address of a received packet, and an access instruction unit 24_i that instructs access to the corresponding storage area 14_{i,0} to 14_{i,m} of the memory 14_i based on the address selected by the address selection unit 23_i. It performs processing so that the amount of data per unit time of communication via the internal communication means 16_i is larger than that via the external communication means 17.
[0018]
[Action]
Subsequently, operations of the first invention and the second invention will be described.
As shown in FIG. 1 or FIG. 2, when an instruction to transmit a packet is given to the internal/external communication processing unit 15_i in step S1, the internal/external communication processing unit 15_i determines in step S2 whether the packet to be transmitted is addressed to its own device. This determination is made, for example, by examining a self-addressed bit, located at a predetermined bit position of the packet to be transmitted, that indicates whether the packet is self-addressed, or by checking whether the machine number of the processing device given as the packet's transfer destination is the same as the machine number of the device's own processing device.
If the packet to be transmitted is not addressed to itself, the process proceeds to step S3 and waits for a state where transmission is possible.
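The two ways of making the step S2 determination can be illustrated as below. The function names and the bit position 31 are assumptions made for the sketch; the source only says the bit sits at a predetermined position.

```python
SELF_BIT_POSITION = 31  # hypothetical position; the source does not specify it


def self_addressed_by_bit(header_word: int, bit_position: int = SELF_BIT_POSITION) -> bool:
    """Return True when the self-addressed bit in the header word is 1."""
    return ((header_word >> bit_position) & 1) == 1


def self_addressed_by_machine_number(dest_id: int, own_id: int) -> bool:
    """Return True when the destination machine number equals the device's own."""
    return dest_id == own_id
```

The first variant needs a dedicated bit in the packet format; the second reuses the ordinary destination field.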
[0019]
In step S4, the internal/external communication processing unit 15_i instructs transmission via the external communication means 17 once transmission has become possible.
A packet is transmitted in units of the amount of data that can be transferred at one time. After each transmission, the next transmission is performed after waiting in step S3 for the sendable state, and this continues until transmission of the one packet is complete in step S5. The “sendable state” is reached, for example, by receiving a token representing the right to transmit, or, in a scheme in which a token is attached to each packet sent, by receiving a return token before the next transmission.
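As a rough illustration of the return-token scheme, the following sketch sends one word at a time and proceeds only while the receiver reports room in its buffer. The Receiver class, its capacity, and the idea that the sender's wait loop also advances the receiver are simplifications assumed here for a single-threaded model.

```python
from collections import deque


class Receiver:
    """Receive buffer that returns a token only when it has room for another word."""

    def __init__(self, capacity_words: int) -> None:
        self.capacity = capacity_words
        self.buffer = deque()
        self.stored = []

    def return_token(self) -> bool:
        return len(self.buffer) < self.capacity

    def accept(self, word: bytes) -> None:
        if not self.return_token():
            raise OverflowError("word sent without a return token")
        self.buffer.append(word)

    def drain_one(self) -> None:
        # Models the receiver moving one word from its buffer into memory.
        if self.buffer:
            self.stored.append(self.buffer.popleft())


def send_with_tokens(body: bytes, receiver: Receiver, word_bytes: int = 4) -> int:
    """Send body word by word; return how many times the sender had to wait."""
    waits = 0
    for offset in range(0, len(body), word_bytes):
        while not receiver.return_token():   # wait for the sendable state (S3)
            receiver.drain_one()             # simulation shortcut: let the receiver progress
            waits += 1
        receiver.accept(body[offset:offset + word_bytes])   # S4
    return waits
```

The token keeps the sender from overrunning the receive buffer, which is why each external word costs a possible wait.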
[0020]
On the other hand, if the packet to be transmitted is found in step S2 to be self-addressed, the process proceeds to step S6. The internal/external communication processing unit 15_i instructs the returnable transmission buffer 11_i to transfer the packet via the internal communication means 16_i, obtains an address from the data position in the packet that indicates its destination, and instructs access so that the data is written to the corresponding storage area 14_{i,0} to 14_{i,m} of the memory 14_i.
In other words, a packet is transferred to the inside of a processing device 10_0 to 10_n by treating the transmission data held in the returnable transmission buffer 11_i as though it were held in the reception buffer 12_i.
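The loopback idea, in which the transmission buffer simply takes the place of the reception buffer as the data source for the memory write, might be modelled as follows. The function names and the use of a bytearray for the memory are illustrative assumptions.

```python
def write_packet(memory: bytearray, address: int, data: bytes) -> None:
    """Write packet data into memory starting at the given address."""
    memory[address:address + len(data)] = data


def deliver(memory: bytearray, send_buffer: bytes, receive_buffer: bytes,
            self_addressed: bool, address: int) -> None:
    # The selector picks the send buffer for self-addressed packets, so the
    # same write path serves both loopback and ordinary reception.
    source = send_buffer if self_addressed else receive_buffer
    write_packet(memory, address, source)
```

Because only the source selection differs, no separate transfer engine is needed for internal moves in this model.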
[0021]
When a packet is transferred via the internal communication means 16_i of each processing device 10_0 to 10_n, the internal communication means 16_i has more parallel communication paths than the external communication means 17. In addition, the internal/external communication processing unit 15_i performs processing so that the amount of data per unit time of communication via the internal communication means 16_i is larger than that via the external communication means 17. Transmission of a packet to the device's own processing device is therefore configured to have higher throughput than transmission of a packet to an external processing device.
External communication by the internal/external communication processing unit 15_i is not adversely affected by enabling internal communication. Further, because the time for transferring data internally can be shortened, the time the data processing unit 13_i waits for its data to be assembled can be shortened, which prevents degradation of the performance of each processing device and of the processing performance of the apparatus as a whole.
[0022]
As described above, in the present invention, providing the internal communication means 16_i and the internal/external communication processing unit 15_i allows data to be transferred to the inside of each processing device 10_0 to 10_n by the same mechanism as data transfer via the external communication means 17. As a result, when data is transferred inside a processing device 10_0 to 10_n, the internal/external communication processing unit 15_i provided in that device performs the processing, so no load is placed on the data processing unit 13_i. Also, the equipment that performs external communication can be shared for transferring data inside each processing device 10_0 to 10_n, so no dedicated device such as a DMA needs to be provided and the configuration is simplified.
[0023]
[Example]
Next, embodiments of the present invention will be described.
FIG. 4 is an overall block diagram of a multiple processing apparatus in which a plurality of distributed-memory processing devices (processor elements, PE) according to this embodiment are connected via external communication means 37 so that they can communicate with one another.
Here, a “distributed-memory processing device” means that a memory is provided separately in each processing device, as distinct from a configuration in which a unified shared memory is provided separately from the processing devices.
[0024]
As shown in the figure, the processing devices 30_0 to 30_n are each provided with a memory 34_0 to 34_n consisting of a main storage device or the like, a CPU 33_0 to 33_n that corresponds to the data processing unit performing various data processing and that executes instructions and controls the communication processing unit, and a communication unit 38_0 to 38_n that handles packet communication. Within each processing device 30_0 to 30_n, the memory 34_0 to 34_n, the CPU 33_0 to 33_n, and the communication unit 38_0 to 38_n are connected by internal communication means 36_0 to 36_n having more parallel communication paths than the external communication means 37.
In this embodiment, the external communication means 37 is 4 bytes wide and the internal communication means 36_0 to 36_n are 8 bytes wide.
Examples of the external communication unit 37 include a bus and a network.
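To make the effect of the 4-byte and 8-byte widths concrete, the number of transfers needed for a packet can be computed as below. The helper name is hypothetical, and the comparison assumes one transfer per bus cycle on both paths, which the source does not state.

```python
def transfers_needed(packet_bytes: int, width_bytes: int) -> int:
    """Number of bus transfers needed to move a packet over a path of the given width."""
    if packet_bytes < 0 or width_bytes <= 0:
        raise ValueError("packet_bytes must be >= 0 and width_bytes must be > 0")
    # Integer ceiling division: a partly filled final transfer still counts.
    return (packet_bytes + width_bytes - 1) // width_bytes


EXTERNAL_WIDTH = 4  # bytes, external communication means 37
INTERNAL_WIDTH = 8  # bytes, internal communication means 36_i

external = transfers_needed(64, EXTERNAL_WIDTH)
internal = transfers_needed(64, INTERNAL_WIDTH)
```

Under that assumption, a 64-byte packet takes half as many transfers on the internal path as on the external one.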
[0025]
FIG. 5 is a block diagram of the processing device 30_i according to the first embodiment.
As shown in the figure, the CPU 33_i described above corresponds to the data processing unit, and a cache 33a_i is provided inside the CPU 33_i.
Each processing device 30_i is also provided, as shown in the figure, with a reception buffer 32_i that receives data transferred via the external communication means 37, and a returnable transmission buffer 31_i that holds packets to be transmitted and, on instruction from the internal/external communication processing unit 35_i, sends a held packet to either the external communication means 37 or the internal communication means 36_i.
[0026]
The internal communication means 36_i is an internal bus and, as described above, achieves better throughput by having more parallel communication paths than the external communication means 37. Reference numeral 361_i denotes a path from the returnable transmission buffer 31_i to the memory 34_i; using this path, the communication processing unit 35_i transfers packets between the global area 34_{i,1} and the local area 34_{i,2}. This reduces the load on the CPU 33_i and improves the performance of each processing device.
[0027]
Reference numeral 39_i denotes a path connecting the processing device 30_i and the external communication means 37. Generally speaking, this path tends to have poor throughput because of hardware constraints. Consequently, if the path 39_i and the internal communication means 36_i were configured with the same throughput, data transfer to the memory 34_i would be slow. By giving the internal communication means 36_i better throughput than the path 39_i, the idle time of the CPU 33_i waiting for the data it needs for processing can be reduced, improving the performance of the processing device.
[0028]
In FIG. 5, the internal/external communication processing unit 35_i determines whether the destination of the packet to be transmitted is its own processing device or another external processing device 30₀ to 30_n; based on the determination result, instructs the returnable transmission buffer 31_i to switch the communication means to which the packet is sent; and instructs writing to the applicable global area 34_{i,1} or local area 34_{i,2} of the memory 34_i indicated by a packet received via the external communication means 37 or by a packet addressed to itself. It also performs processing so that the amount of data per unit time is larger for communication via the internal communication means 36_i than for communication via the external communication means 37.
[0029]
Furthermore, the portion of the memory 34_i of each processing device 30_i that corresponds to the main storage device is provided with two different types of storage area, a global area 34_{i,1} and a local area 34_{i,2}, in which packets are stored and between which packets can be moved.
Here, the global area 34_{i,1} is an area used for data transfer between the processing devices, and the local area 34_{i,2} is an area used for processing within each processing device.
Further, as shown in the figure, the CPU 33_i performs its processing in the local area 34_{i,2} of the memory 34_i.
[0030]
Although not shown, the memory 34_i may also include a disk or the like in addition to the main storage device.
Reference numeral 31a_i denotes a selection unit that selects between a packet from the memory 34_i and a packet from the internal/external communication processing unit 35_i, and reference numeral 32a_i denotes a selection unit that selects between a packet sent from the returnable transmission buffer 31_i and a packet received from the reception buffer 32_i. Reference numerals 31c_i, 31b_i, and 32b_i denote registers.
Here, the internal/external communication processing unit 35_i, the returnable transmission buffer 31_i, the reception buffer 32_i, the registers 31b_i, 31c_i, and 32b_i, and the selection units 31a_i and 32a_i together correspond to the communication unit 38_i.
[0031]
FIG. 6 shows the functions of the internal/external communication processing unit 35_i in detail.
As shown in the figure, the internal/external communication processing unit 35_i according to this embodiment has a self-addressed determination unit 40_i and a communication instruction unit 42_i. The self-addressed determination unit 40_i determines whether the destination of the packet to be transmitted is its own processing device or another processing device. When a transmission instruction has been given and transmission has become possible, the communication instruction unit 42_i, based on the determination result of the self-addressed determination unit 40_i, instructs the returnable transmission buffer 31_i to switch the communication means to which the packet is sent to either the external communication means or the internal communication means, instructs the address selection unit 43_i to select an address, and instructs access to the memory 34_i based on that address.
[0032]
Further, the internal/external communication processing unit 35_i has an address selection unit 43_i that, in response to the instruction, selects either the address of the packet to be transmitted or the address of the received packet, and an access instruction unit 44_i that instructs access to the applicable global area 34_{i,1} or local area 34_{i,2} of the memory 34_i based on the address selected by the address selection unit 43_i. Processing is performed so that the amount of data per unit time is larger for communication via the internal communication means 36_i than for communication via the external communication means 37.
[0033]
Further, the communication instruction unit 42_i has a token adding unit 42a_i that adds a token to a packet to be transmitted, a return token detection unit 42b_i that detects the return token issued when the destination buffer or a buffer in the external communication means 37 has become free and transmission is possible, and a termination output unit 42c_i that indicates that transmission processing for the packet has been completed.
[0034]
Further, in this embodiment, as shown in FIG. 6, the self-addressed determination unit 40_i has a self-addressed bit reading unit 40a_i that reads the self-addressed bit located at a predetermined position in the packet, and a bit determination unit 40b_i that evaluates the bit that was read. The "self-addressed bit" is a bit indicating whether or not the destination of the packet transfer is the device's own processing device.
[0035]
Next, the operation at the time of transmission according to this embodiment will be described with reference to the flowchart of FIG. 7.
A packet arriving from the network serving as the external communication means is received by the reception buffer and then, once memory access priority is obtained (arbitration not shown), is stored in the global area 34_{i,1} of the memory. Because the CPU 33_i performs its processing in the local area, the packet must be transferred from the global area 34_{i,1} to the local area 34_{i,2}. Conversely, a packet to be sent to the network serving as the external communication means 37 must be transferred from the local area 34_{i,2} to the global area 34_{i,1}. As shown in FIG. 7, in step SJ1 the CPU 33_i writes the header of the packet to be transmitted to a transmission queue provided in the main memory of the memory 34_i (not necessarily in the global area). The internal/external communication processing unit 35_i is notified of this, and a transmission instruction is thereby issued to the internal/external communication processing unit 35_i.
[0036]
In step SJ2, in order to read the packet header into the transmission buffer 31_i, the internal/external communication processing unit 35_i issues a fetch request of, for example, 8 bytes to the memory 34_i for the indicated packet, through the access instruction unit 44_i via the address selection unit 43_i. Here, 8 bytes is a unit that does not overflow the transmission buffer 31_i.
In step SJ3, the fetch request is issued repeatedly until the entire packet header stored in the memory 34_i has been read, and the fetched data is written to the returnable transmission buffer 31_i.
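As a rough illustration of steps SJ2 and SJ3, the following Python sketch reads a header from a byte string standing in for the memory 34_i in 8-byte fetch units and collects each unit in a list standing in for the returnable transmission buffer 31_i. The names `FETCH_UNIT` and `fetch_header`, and the modelling of memory as a `bytes` object, are assumptions made for illustration and do not appear in the specification.

```python
FETCH_UNIT = 8  # bytes per fetch request (the example unit given in the text)


def fetch_header(memory: bytes, start: int, header_len: int) -> list:
    """Issue fetch requests of FETCH_UNIT bytes until the whole header is read.

    Returns the fetched chunks in order, modelling what would be written
    into the returnable transmission buffer.
    """
    buffer = []
    offset = 0
    while offset < header_len:
        # The last request may be shorter if the header is not a multiple of 8.
        size = min(FETCH_UNIT, header_len - offset)
        buffer.append(memory[start + offset:start + offset + size])
        offset += size
    return buffer
```

For a 20-byte header, this sketch issues three requests of 8, 8, and 4 bytes.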
[0037]
In step SJ4, the entire packet header has been read and is held in the returnable transmission buffer 31_i. In step SJ5, which is performed as each 8 bytes of the packet header is read, the self-addressed determination unit 40_i determines whether the destination of the packet to be transmitted is its own processing device or another external processing device. The determination is made by the self-addressed bit reading unit 40a_i of the self-addressed determination unit 40_i reading the self-addressed bit at a predetermined position in the packet header; the bit is "1" if the destination is its own processing device and "0" if it is another external processing device. The bit determination unit 40b_i determines whether the bit that was read is "1" or "0" and notifies the communication instruction unit 42_i of the result.
[0038]
Here, the packet header contains at least the self-addressed bit and a destination address. Specifically, the packet header has the format shown in FIG. 8. For example, the first word contains the identifier of the destination processing device, the m-th word contains information indicating the size of the packet, and the n-th word contains the self-addressed bit indicating whether or not the packet is addressed to itself, together with an address indicating the storage location in the memory of the destination processing device where the packet is to be stored. Following the packet header, from the p-th word onward, comes the data representing the substantive content, called the body.
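The header fields described above can be sketched as follows. The layout is only a hypothetical instance of FIG. 8: it assumes 4-byte big-endian words, places the size in the second word and the self-addressed bit in the top bit of the third word, and uses the remaining 31 bits for the memory address. The specification leaves the word positions m and n open, and the names `pack_header` and `parse_header` are illustrative.

```python
import struct

SELF_BIT = 1 << 31        # assumed position of the self-addressed bit
ADDR_MASK = SELF_BIT - 1  # the remaining 31 bits carry the memory address


def pack_header(dest_id: int, size: int, addr: int, to_self: bool) -> bytes:
    """Build a three-word header: destination ID, size, self bit + address."""
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in 31 bits")
    word2 = addr | (SELF_BIT if to_self else 0)
    return struct.pack(">III", dest_id, size, word2)


def parse_header(header: bytes) -> dict:
    """Split the header back into its fields."""
    dest_id, size, word2 = struct.unpack(">III", header[:12])
    return {
        "dest_id": dest_id,
        "size": size,
        "addr": word2 & ADDR_MASK,
        "to_self": bool(word2 & SELF_BIT),
    }
```

In this sketch, reading `to_self` plays the role of the self-addressed bit reading unit 40a_i and the bit determination unit 40b_i combined.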
[0039]
If the self-addressed determination unit 40_i determines that the packet is not addressed to itself, the process proceeds to step SJ6. In step SJ6, the token adding unit 42a_i of the communication instruction unit 42_i adds one token per word, a word being, for example, 4 bytes as determined by the number of communication paths to the external communication means. The communication processing unit 35_i controls the returnable transmission buffer 31_i so that it does not overflow, and the data is sent from the returnable transmission buffer 31_i to the external communication means 37.
[0040]
Thereafter, the process waits until the return token detection unit 42b_i detects a return token sent from the destination PE via the external communication means 37. When it is detected in step SJ7 that the return token has come back, the process returns to step SJ6, a token is added for each word, and the packet header is sent to the external communication means 37.
Steps SJ6 and SJ7 are repeated in the same way not only for the packet header but also for the body, until the termination output unit 42c_i sends, in step SJ8, a terminator indicating that transmission of all data of the packet has been completed.
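The token exchange of steps SJ6 to SJ8 can be pictured with the simplified Python sketch below. It is an assumption-laden model rather than the circuit described: the `Receiver` class, its method names, and the synchronous call that returns the token immediately are all hypothetical, whereas real hardware would wait asynchronously for the return token.

```python
WORD = 4  # bytes per word on the external link (the example value in the text)


class Receiver:
    """Stand-in for the destination PE: accepts a word and returns a token."""

    def __init__(self):
        self.words = []
        self.terminated = False

    def accept(self, word: bytes, token: bool) -> bool:
        if not token:
            raise ValueError("every word is expected to carry a token")
        self.words.append(word)
        return True  # return token: the buffer slot is free again

    def terminate(self) -> None:
        self.terminated = True


def send_external(packet: bytes, receiver: Receiver) -> int:
    """Send word by word (SJ6), wait for each return token (SJ7),
    then emit a terminator (SJ8). Returns the number of words sent."""
    sent = 0
    for offset in range(0, len(packet), WORD):
        returned = receiver.accept(packet[offset:offset + WORD], token=True)
        if not returned:
            raise RuntimeError("no return token; a real sender would keep waiting")
        sent += 1
    receiver.terminate()
    return sent
```

Sending one word per token keeps at most one word in flight, which is the simplest way to guarantee that the receiving buffer cannot overflow.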
[0041]
On the other hand, if the self-addressed determination unit 40_i determines in step SJ5 that the packet is addressed to itself, the process proceeds to step SJ9.
In step SJ9, the communication instruction unit 42_i causes the address selection unit 43_i to extract the address from the packet to be transmitted that is held in the transmission buffer 31_i, and the access instruction unit 44_i uses that address to instruct that the packet stored in the transmission buffer 31_i be written to the applicable area of the memory 34_i (the global area 34_{i,1} or the local area 34_{i,2}) indicated by the address. In this case, unlike the transfer of a packet via the external communication means 37, no token is added to the packet and no return token needs to be detected. This is because the communication processing unit 35_i knows how many bytes of the packet it has received from the memory 34_i and knows that it only has to send the same number of bytes onto the store bus of the memory 34_i.
[0042]
To explain this point in more detail, the token is needed so that data can be sent from the transmission buffer 31_i to the reception buffer 32_i without overflowing the latter. When a packet is transferred to the device itself, the transmission buffer 31_i appears to the communication processing unit as if it were the reception buffer 32_i. Processing therefore starts from the state in which the token has already served its purpose.
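The branch taken at step SJ5 can be summarized by the following sketch, in which a self-addressed packet is written straight into local memory without any token exchange, while any other packet is handed to the external link. The function name `route_packet`, the `bytearray` memory model, and the list used as the external link are assumptions for illustration only.

```python
def route_packet(to_self: bool, addr: int, payload: bytes,
                 memory: bytearray, external_link: list) -> str:
    """Return which communication means carried the packet."""
    if to_self:
        # Loopback (SJ9): write to the address carried in the header.
        # No token is needed because the byte count is known in advance.
        if addr < 0 or addr + len(payload) > len(memory):
            raise IndexError("write would fall outside the memory")
        memory[addr:addr + len(payload)] = payload
        return "internal"
    # External path (SJ6 to SJ8): token handling would happen here.
    external_link.append(payload)
    return "external"
```

The bounds check is an addition of this sketch; the specification does not discuss out-of-range addresses.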
[0043]
Since this embodiment is configured as described above, the CPU 33_i can carry on processing using the data in its cache and, when the communication processing unit finishes a packet transfer, can move on to the next processing that uses the transferred data. When that processing is completed, even if the resulting data is to be transferred to the network, the communication processing unit performs the transfer, so the CPU can proceed to the next processing.
[0044]
Further, when the network that connects the processing devices (PEs) is configured with a crossbar or the like, hardware considerations may limit the bus width between a PE and the network, or may lengthen the transfer time. In this case, the internal communication means of each processing device 30₀ to 30_n is configured to have better throughput than the path 39_i to the network. For example, the internal communication means transfers 2 bytes per cycle while the external communication means transfers 1 byte per 2 cycles. The time the CPU waits until its processing data is ready can thereby be shortened.
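To make the example figures concrete, the short calculation below compares the cycles needed to move one packet at 2 bytes per cycle (internal) and at 1 byte per 2 cycles (external). The 64-byte packet size and the helper name `cycles_to_transfer` are assumptions chosen only for the arithmetic.

```python
def cycles_to_transfer(n_bytes: int, bytes_per_transfer: int,
                       cycles_per_transfer: int) -> int:
    """Cycles needed to move n_bytes, rounding up to whole transfers."""
    transfers = -(-n_bytes // bytes_per_transfer)  # ceiling division
    return transfers * cycles_per_transfer


PACKET_BYTES = 64  # assumed packet size for the example

internal_cycles = cycles_to_transfer(PACKET_BYTES, 2, 1)  # 2 bytes / 1 cycle
external_cycles = cycles_to_transfer(PACKET_BYTES, 1, 2)  # 1 byte / 2 cycles
```

Under these assumptions the internal transfer takes 32 cycles and the external transfer 128 cycles, a fourfold difference in throughput.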
[0045]
The above description assumes that the processing load of the CPU is larger than that of the communication processing unit. When the processing load of the communication processing unit is large, however, this can be handled by increasing the number of communication processing units.
[0046]
Next, a second embodiment will be described with reference to the drawings.
Unlike the processing device according to the first embodiment, the distributed-memory processing device according to the second embodiment has, as its self-addressed determination unit, an address reading unit 50a_i and a comparison unit 50b_i that compares the address read with the machine number of its own processing device, in place of the self-addressed bit reading unit 40a_i and the bit determination unit 40b_i.
[0047]
In other words, in the second embodiment, unlike the first embodiment, the packet format is not the one shown in FIG. 8 but the one shown in FIG. 10. If the number of the processing device given as the destination is the same as the machine number of the device's own processing device, the data to be transmitted is determined to be addressed to itself.
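The comparison used in the second embodiment can be sketched as below. The sketch assumes that the destination number occupies the first 4-byte big-endian word of the header; the text does not reproduce the layout of FIG. 10, and the function names are illustrative.

```python
import struct


def read_destination(header: bytes) -> int:
    """Address reading unit: take the destination number from the first word."""
    (dest,) = struct.unpack(">I", header[:4])
    return dest


def is_self_addressed(header: bytes, own_machine_no: int) -> bool:
    """Comparison unit: the packet is self-addressed when the numbers match."""
    return read_destination(header) == own_machine_no
```

Because the decision relies only on the ordinary destination field, no extra bit has to be reserved in the header.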
[0048]
Unlike the first embodiment, this example does not need to add a self-addressed bit to the packet and can use the ordinary destination processing device number, so the packet format does not have to be changed and handling is simple.
Reference numerals identical to those used for the processing device of the first embodiment denote the same elements.
In the above examples, data communication using a token and a return token has been described. However, the present invention is not limited to this and can also be applied to the IEEE CSMA/CD (Ethernet) method, the token-passing ring method, the token-passing bus method, MAN systems, digital PBX systems, and the like.
[0049]
[Effects of the invention]
As described above, according to the present invention, internal communication means having a larger data width than the external communication means is provided, it is determined whether the destination of a packet to be transmitted is another processing device or the device's own processing device, and, for a packet addressed to itself, a self-return function is provided by instructing that the packet be written to the corresponding area of the memory.
[0050]
Accordingly, data can be transferred within the device itself using the function of the communication processing unit, without using the CPU and without separately providing a device such as a DMA, so the CPU is not burdened and the configuration can be simplified.
In addition, the throughput of transfers to the device itself is better than that of communication between the communication processing unit and an external processing device. The waiting time of the CPU of each processing device can therefore be shortened, improving the performance of the multiple processing apparatus and of the data communication method.
[Brief description of the drawings]
FIG. 1 is a principle block diagram of the first invention.
FIG. 2 is a principle flowchart of the second invention.
FIG. 3 is a block diagram showing an embodiment of the first invention.
FIG. 4 is an overall block diagram of the device configuration according to the first embodiment.
FIG. 5 is a block diagram according to the first embodiment.
FIG. 6 is a block diagram showing an internal / external communication processing unit according to the first embodiment.
FIG. 7 is a flowchart according to the first embodiment.
FIG. 8 is a diagram showing a packet format example according to the first embodiment.
FIG. 9 is a block diagram showing an internal / external communication processing unit according to the second embodiment.
FIG. 10 is a diagram showing an example of a packet format according to the second embodiment.
FIG. 11 is a block diagram according to a conventional example.
[Explanation of symbols]
10₀ to 10_n, 30₀ to 30_n  Processing devices
11_i, 31_i  Returnable transmission buffer
12_i, 32_i  Reception buffer
13_i, 23_i (33_i)  Data processing unit (CPU)
14_i, 24_i, 34_i  Memory
15_i, 25_i, 35_i, 45_i  Internal/external communication processing unit
16_i, 26_i, 36_i  Internal communication means
17, 37  External communication means

Claims

A multiple processing apparatus in which a plurality of processing devices (10₀ to 10_n), each having a memory (14_i; i = 0 to n) and a data processing unit (13_i), are connected so as to be able to communicate with one another via external communication means (17), wherein
each processing device (10_i) comprises:
a reception buffer (12_i) that receives a packet transferred via the external communication means (17);
a returnable transmission buffer (11_i) that holds a packet to be transmitted and, in accordance with an instruction from an internal/external communication processing unit (15_i), sends out the held packet by switching to either the external communication means (17) or internal communication means (16_i);
the internal/external communication processing unit (15_i), which determines whether the destination of the packet to be transmitted is its own processing device or another processing device, switches, based on the determination result, the communication means to which the returnable transmission buffer (11_i) sends the packet, and instructs writing to the corresponding storage area (14_{i,0} to 14_{i,m}) of the memory (14_i) indicated by a packet received via the reception buffer (12_i) or by a packet addressed to itself, and which performs processing so as to increase the amount of data per unit time of communication via the internal communication means (16_i) by providing the internal communication means (16_i) with a larger number of parallel communication paths than the external communication means (17) or by making the minimum interval of communication processing smaller; and
the internal communication means (16_i), which connects the memory (14_i) with the data processing unit (13_i), the reception buffer (12_i), the returnable transmission buffer (11_i), and the internal/external communication processing unit (15_i), and which has a larger number of parallel communication paths than the external communication means (17).

The multiple processing apparatus according to claim 1, wherein the internal/external communication processing unit (25_i) comprises: a self-addressed determination unit (20_i) that determines whether the destination of a packet to be transmitted is its own processing device or another processing device;
a communication instruction unit (22_i) that, when a transmission instruction has been given and transmission has become possible, instructs, based on the determination result of the self-addressed determination unit (20_i), switching of the communication means to which the returnable transmission buffer (16_i) sends the packet, instructs an address selection unit (23_i) to select an address, and instructs access to the memory (14_i) based on that address;
the address selection unit (23_i), which, based on the instruction, selects either the address of the packet to be transmitted or the address of the received packet; and an access instruction unit (24_i) that instructs access to the corresponding storage area (14_{i,0} to 14_{i,m}) of the memory (14_i) based on the address selected by the address selection unit (23_i), and wherein processing is performed so that the amount of data per unit time is larger for communication via the internal communication means (16_i) than for communication via the external communication means (17).