JP3974078B2

JP3974078B2 - Interprocess communication method using distributed shared memory

Info

Publication number: JP3974078B2
Application number: JP2003162900A
Authority: JP
Inventors: 晶田中
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-06-06
Filing date: 2003-06-06
Publication date: 2007-09-12
Anticipated expiration: 2023-06-06
Also published as: JP2004362476A

Description

【０００１】
【発明の属する技術分野】
本発明は、分散共有メモリを用いたプロセス間通信システムに関するものである。特に、本発明は、OS或いはAP（Application）層に設けたローカルバッファでプロセス或いは転送メッセージに応じた規模のメッセージ送信を一時的に保留して送信優先度レベルの高いプロセスの送信をレベルの低いプロセスの送信に割込ませるスケジューリングにより、ノード間のメッセージ転送において実時間性の高いメッセージを実時間性の低いメッセージに妨げられることなく送信する優先度制御を行う分散共有メモリを用いたプロセス間通信システムに関するものである。
さらに、本発明は、分散共有メモリを用いたプロセス間通信に用いられる通信機能を有するデータ処理装置に関するものである。
【０００２】
【従来の技術】
ノード間ネットワークではソフトウェア負荷の少ないハードウェア組込みの通信手段が注目され、分散共有メモリを用いた通信方式もそのレイテンシの小ささから有効である。ところで、プロセッサ間を接続する高速通信方式は、プロセス（或いはAP）の特性にあわせて設計される場合が多く、特性が異なるプロセス（或いはAP）には、別の高速通信システムを用いることになる。即ち、各々の通信方式はその適用領域が比較的制限されている。例えば、分散共有メモリを用いた高速通信方式では、実時間性の高いメッセージを送信している際に大規模メッセージ送信との混在が生じるとネットワークが隘路となることで大規模メッセージ送信が完了するまでの間、実時間性の高いメッセージの送信が待たされ、実時間性の高いメッセージの送信が妨げられる場合がある。また、大規模メッセージを転送するためのメッセージ転送専用の通信方式の場合は、一般に実時間メッセージ送信を目的にしていないこと並びに通常送信装置に引渡すメッセージ全体を一旦バッファに格納するためバッファリングの時間を要すること等から、大規模メッセージ転送用のシステムを高頻度な実時間メッセージ通信にも利用することは困難であった。
【０００３】
従って、プロセスから高速のデータ転送を行う装置又はシステムに至る経路はプロセス（或いはAP）の特性毎に異なり、プロセス（或いはAP）毎に異なった高速通信方式を実装或いは切替えて使用する等で対処せざるを得ず、切替えのためのプログラム或いは装置の導入による費用の増加、ホストプロセッサ資源利用の増加等の負担がかかる。
【０００４】
【発明が解決しようとする課題】
このように、従来のノード間データ転送技術においては、高速通信装置を用いて一括して送信される大規模なメッセージと実時間性が高いメッセージの送信が混在すると、ネットワークが隘路になることで大規模メッセージ送信が継続する間は実時間性の高いメッセージの送信が妨げられる不都合が生じていた。特に、ファイル転送のような大規模転送専用の装置を用いると実時間性の高いメッセージの送信は困難であり、同一の通信装置を用いて実時間性の高いメッセージ送信と大規模メッセージの送信を共に行いにくい、という問題がある。分散共有メモリを用いた高速通信方式では、元来対象としていた制御情報のような小規模メッセージのみならず、大規模メッセージでも高速性が要求される。
【０００５】
本発明の目的は、一般に実時間性の高いメッセージ送信と比較的実時間性の低い大規模メッセージの送信を共に行いにくいという問題を、分散共有メモリを用いた高速通信方式のみを用いて比較的小規模の改造のみにより解決し、多様な実時間性や多様なデータ規模のメッセージを発生するプロセスの送信を行う際、複数の高速通信装置を実装し切替えて用いたり或いは他の通信方式に入れ替えることなく、プロセスの特性、特に転送メッセージの実時間性を損なわずに送信できる分散共有メモリを用いた高速通信方式を提供することにある。
さらに、本発明の目的は、複数のプロセスを有し分散共有メモリを用いたプロセス間通信方式に用いられるデータ処理装置において、実時間性を必要とするデータないしメッセージを優先的に転送することができるデータ処理装置を実現することにある。
さらに、本発明の別の目的は、分散共有メモリを用いた高速通信方式をより広範囲な分散実時間通信機能として提供することにある。
【０００６】
【課題を解決するための手段】
上記目的を達成するため、本発明による分散共有メモリを用いた通信システムは、複数のノードがネットワークを介して相互接続され、各ノードは、データないしメッセージを生成する1個又は複数個のプロセスと、他のノードとデータを共有するために用いられる分散共有メモリを有するメモリと、分散共有メモリへのデータの書込みを制御する制御手段と、分散共有メモリに書き込まれたデータのネットワークへの送信及びネットワークから送られてきたデータの受信を行うデータ転送装置とを具え、各ノードにおいて、自己の分散共有メモリにデータが書き込まれると、自己のデータ転送装置が、当該データを自動的に転送先のノードの分散共有メモリにそれぞれ転送する分散共有メモリを用いたプロセス間通信方法において、
各ノードは、プロセスが生成したデータを一時的に格納するローカルバッファを有し、前記制御手段は、プロセスにより生成されたデータのデータサイズが所定のデータ長以下の場合当該データを分散共有メモリに直接書込み、生成されたデータのデータサイズが前記所定のデータ長を超える場合当該データを前記ローカルバッファに一時的に書込んだ後分散共有メモリに書込むことを特徴とする。
【０００７】
本発明では、書込み制御手段は、プロセスから受け取った処理メッセージについて所定のメッセージ長又はデータ長と比較し、所定のデータ長ないしメッセージ長以下のデータないしメッセージについては直接分散共有メモリに書込み、所定のメッセージ長を超える場合当該処理メッセージについてはＯＳ又はＡＰ層に設けたローカルバッファに書込み、その後分散共有メモリに書込まれるように制御する。この制御により、実時間性レベルの高いメッセージは優先的に分散共有メモリに書込まれることになる。この結果、メッセージ長の異なる種々のメッセージが混在した場合であっても、複雑な優先制御機構を設けることなく、実時間性の高いメッセージを優先的に送信することができる。尚、本明細書において、「ノード」は、通信機能を有するデータ処理装置を意味し、アプリケーションとしてのプロセスを含む場合だけでなくＰＭ（プロセッサモジュール）及びプロセッサを含むものと理解される。
【０００８】
また、ローカルバッファに仮想番地を与え、実分散共有メモリとの番地変換機能を設けることでOS或いはAP層からはあたかも実分散共有メモリにアクセスするのと同じ過程で高速通信方式を利用することができる。また、ローカルバッファを割り当てる際、プロセス或いはメッセージの実時間性或いはメッセージ規模に応じてOSが規模を定めた一括転送単位毎に割り当て、プロセス或いはメッセージに求められるメッセージの保留規模即ち保留時間を設定し、優先度の高いプロセス或いはメッセージの送信が先行して行われやすくすることができる。この結果、実装形態の簡単な変更又は修正を行うだけで、分散共有メモリを用いたプロセス間通信におけるノード間のメッセージ転送の優先制御を実現することができる。ここで、一括転送単位とは、ローカルバッファにメッセージを一時的に蓄積する場合に、予め定めた一定のデータサイズ毎に区切って蓄積し、当該データサイズ単位でローカルバッファから分散共有メモリにメッセージを書込むが、その一定のデータサイズを持つデータの単位を意味するものとする。
【０００９】
また、一括転送単位内の全領域単位（MB）がアクセスされたことを契機に送信を行う場合、一括転送単位で実共有メモリの内容を上書き（受信の場合は実共有メモリにローカルバッファ内容を上書き）することによりプロセスが意識することなく転送が行われる。尚、ＭＢ（Message Buffer）は、データを書込む分散共有メモリ内の領域単位を意味する。
【００１０】
また、プロセス側から強制的に前記上書きを起動することにより、ローカルバッファと実共有メモリとの間の同期をとることにより、最新のメッセージを随時読み書きできる、即ち、転送先のノードと通信することができる。
【００１１】
また、OSが与える優先度を一括転送単位の規模を用いて示すことで、1個のパラメータでメッセージがネットワークに送出されるまでの保留時間とOSの優先度の両方を制御し、さらに、プロセスが間接的ながらこのパラメータの値を決定することによりプロセス自身がこれら優先度を制御できることを特徴とする。
【００１２】
【発明の実施の形態】
本発明においては、分散共有メモリを用いた高速通信方式を、プロセスの特性（特に、送受信するメッセージ規模とメッセージに求められる実時間性）に応じて如何に効率的に提供するかに関して、下記の事項を解決するものである（プロセスは、AP或いはOS内の何れのものでも良い）。
即ち、ネットワークで接続されるプロセス間通信においては、高速通信方式をより汎用的な通信手段として用いる際、送信元であるプロセスが常に特性の類似したメッセージを送信要求するとは限らず、また、受信するメッセージが常に類似した特性を持つとは限らない。つまり、メッセージの特性に応じて高速通信方式の提供形態や方式の種類を変更する必要が有る。しかし、複数の通信方式を備えてメッセージ特性に応じて通信方式を切替える方式では、装置規模の肥大化等が生じる不都合がある。そこで、本発明ではノード内のOS或いはAP層にローカルバッファを設け、大規模な或いは実時間性の低いメッセージの一時的な保留とOSによる分散共有メモリヘの書込み優先度付けを組み合わせ、これを、OS主体のローカルバッファ制御とプロセスからの強制的な動作、プロセス或いはメッセージの特性に応じたローカルバッファ内に割り当てる領域即ち一括転送単位の規模の設定、仮想的な番地によるローカルバッファと分散共有メモリの同一視、OSへのメッセージ規模の通知によりローカルバッファ規模を適切に確保する。これにより、しばしば高速通信システムにおいて生じるメッセージ特性による性能の差異を吸収し、常に同一の高速通信システムで多様なメッセージを通信できる。さらに、OS或いはAP層に当該機能を設けることで、従来の分散共有メモリを用いる通信装置が有する特徴、例えばアプリケーションやOSに負荷をかけることなく、メッセージを小さな遅延で通信する性能を併せて達成することができる。
【００１３】
尚、本発明はネットワークで接続されるノード各々の中にあるプロセス同士の通信にも適用することができる。従って、ノードは1つのPMから構成される場合だけでなく、複数のPMから構成される場合も適用され、各PM内に1つ或いは複数のプロセスが動作する場合もある。即ち、本発明の対象とするプロセス間通信は、異なるノード内のプロセス同士の通信のみならず、1つのノード内の異なるPM内のプロセス同士の通信や1つのノード内の1つのPM内のプロセス同士の通信も含み、動作は同様のものである。請求項1においてPMとノードをノードで総称するとしたが、このように対象とするプロセスは幾つかの形態を持つものの、これら形態全てを対象とし同様の動作を行うものである。
【００１４】
【実施例】
以下、本発明の実施例を、図面により詳細に説明する。図1は、本発明が適用される一般的な高速通信方式を用いたネットワークモデルの構成の配置と関係例を示す図である。図1では、高速通信装置である分散共有メモリ（DSM）を用いる通信装置が11A、ホストプロセッサに含まれるOS類が12A、及び、主として小規模メッセージを通信するプロセス類が13A、大規模メッセージ転送装置が11B、ホストプロセッサに含まれるOS類が12B、及び、大規模メッセージを通信するプロセス類が13Bである。高速通信装置はAPに依存して適用される場合が多く、AP内のプロセスのメッセージの特性にあわせて通信装置が設計或いは実装されるため、一般に比較的短いメッセージを通信するAP類からの通信装置アクセスは14A、大規模ファイルを通信するAP類からの通信装置アクセスは14Bの形態であるが、ローカルバッファ（１６Ｃ内）の導入により、主として小規模メッセージを通信するプロセス及び大規模メッセージ転送共にDSM通信装置11Cにアクセスする形態で通信を行う。図1では小規模メッセージを通信するAP類と大規模メッセージを通信するAP類の2種類のAP類が示されているが、その種類が3つ以上の場合でも全く同じである。即ち、APに依存せず短いメッセージ送信から大規模ファイル送信まで広範なAPに対して同一の通信装置で効率の良い通信が提供される。尚、本明細書において「ノード」は、通信機能を有し、1個又は複数のプロセスによりデータが生成されるデータ処理装置を意味するものとし、１つ又は複数個のプロセッサモジュール（PM：Processor Module）で構成することができる。すなわち、本発明は、複数のノード間での分散共有メモリへのデータないしメッセージの書込みが競合した場合及び同一ノード内において複数のプロセスにより生成されたデータの分散共有メモリへの書込みが競合した場合の両方について適用される。
【００１５】
図２は本発明が適用される分散共有メモリを用いる通信装置を有するノード及びネットワーク構成図である。図２では2個のノードの26Aと26Bが例示されているが、この数が3個以上であっても全く同じである。各ノードの主体はホストプロセッサ25A，25B、及び、ローカルメモリ23A，23B、及び、分散共有メモリ21A，21B、及び、例えば分散メモリカプラのような分散共有メモリ間のメッセージを転送する装置（分散メモリカプラと呼称）22A、22Bである。ホストプロセッサは、複数個のプロセスを有するものとする。ローカルメモリ23A，23Bには、ローカルバッファ24A，24Bが置かれる。符号27A及び27Bは、このローカルバッファの効果により優先度制御がなされる対象である分散共有メモリを用いる通信システムのプロセス-分散共有メモリ間の共通路である。
【００１６】
図2の21〜221Dは、本発明の分散共有メモリを用いる通信装置をソフトウェアの観点から送信動作を中心に説明する図である。21Dはノード、22Dはノードを構成するソフトウェア部分、23Dはノードを構成する装置を示し、各々AP（このAPの部分として複数個のプロセスがある）24D、OSは25D、ローカルメモリ26D内のローカルバッファ27Dと、プロセス或いはメッセージに応じて規模を可変に割り当てられる一括転送単位221D、ローカルバッファを管理するOSの機能222D、分散共有メモリ28Dとメッセージ送信毎に割り当てられる一定規模で区分された要素領域MB210D、分散メモリカプラ29D、ノードを接続するネットワーク211Dを示している。
【００１７】
従来、プロセス24Dで発生したメッセージ213Dは分散共有メモリ28Dに書込まれると（215D）、同時に分散メモリカプラ29Dが検出して自動的に取込み（216D）（その際、MB：210D単位で使用される）、ネットワーク211Dに送出する。例えば、IP通信に用いようとした場合は、IPoverATMのARPサーバを用いる等の方法によりIP毎に宛先ノードを決定でき、そのVCを割り当てればよいので、高速なIP通信への適用可能性も想定される。
【００１８】
本発明のローカルバッファを導入した場合、プロセス24Dで発生したメッセージ214Dは、実時間性が小さい場合等プロセス或いはメッセージ特性に応じてはローカルバッファ27D内に割り当てられた一括転送単位221Dに書込まれ（217D）、一括転送単位221D内の全ての領域がアクセスされた時点（一括転送単位は分散共有メモリ210DのMBと対応している）で分散共有メモリ28Dに書込まれ、同時に分散メモリカプラ29Dが検出して自動的に取込み（219D）（その際、MB：210D単位で使用される）、ネットワーク211Dに送出する。
【００１９】
図３に本発明によるメモリの一例の構成２１Ｅ〜２８Ｅを示す。このメモリ構成を有するノードをノードＡとする。符号２１Ｅはローカルメモリ上で本発明に用いられる範囲を示し、２２Ｅはローカルバッファを示し、２３Ｅは分散共有メモリを示す。分散共有メモリ２３Ｅは、符号２６Ｅで示すＭＢ毎に区切られており、ＭＢは各々自ノードから全てのネットワーク内の全ての（或いは、通信対象となる）ノード宛に分類され、例えばノードＡ＞ノードＢのＭＢ（複数あってもよい）にメッセージが書込まれると分散共有メモリカップラによりこのノードからノードＢにメッセージが送られる。ノードＢにおいても同様なメモリ構造がとられているので、ノードＢではノードＡ＞ノードＢに当該メッセージが書込まれる。２７Ｅは発信用のＭＢを示し、２６Ｅは受信用のＭＢを示す。このＭＢの構造は分散共有メモリを用いた通信の一例であり、本発明に必須の技術的事項ではないが、参考として記載したものである。符号２２Ｅで示すローカルバッファは一括転送単位に区切られ、ローカルバッファを介する各送信処理において一括転送単位は１個以上用いられ、複数個用いられる場合はその領域を更新しながら、例えば図中の矢印線のように順次更新しながら用いられる。ＡＰからの送信要求の際にＡＰから制御手段に通知されるメッセージサイズに基づいて、制御手段（ローカルバッファ管理機能）は当該送信で用いられるローカルバッファの容量を例えば図中二重線の範囲（２５Ｅ）のように予め決定できるので、ローカルバッファ全体２８Ｅの容量を無駄に確保することなく抑制することができる。
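The per-destination MB layout described above can be pictured with a small sketch. It assumes a hypothetical `DsmNode` class, a plain dictionary standing in for the network, and an instantaneous copy standing in for the distributed memory coupler; none of these names come from the specification.

```python
class DsmNode:
    """Toy node whose MBs are keyed by (source node, destination node)."""

    def __init__(self, name, network):
        self.name = name
        self.mbs = {}            # (src, dst) -> bytes, the node's view of shared memory
        self.network = network   # hypothetical registry: node name -> DsmNode
        network[name] = self

    def write(self, dest, data):
        """Write into the 'self > dest' MB; the coupler mirrors it to dest."""
        key = (self.name, dest)
        self.mbs[key] = data
        # Stand-in for the coupler: the same MB on the destination is updated.
        self.network[dest].mbs[key] = data


network = {}
node_a = DsmNode("A", network)
node_b = DsmNode("B", network)
node_a.write("B", b"hello")
```

After the write, both nodes hold the same content under the `("A", "B")` key, which mirrors the statement that node B has the message written into its own "node A > node B" MB.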
【００２０】
図４は、本発明の分散共有メモリを用いる通信装置の送信側におけるメッセージ通信方式を時間経過に従い説明する図である。符号31A、31Eはノードを示し、32A，32EはAP層を示し、33A，33Eは分散共有メモリ及び分散メモリカプラ層を示す。さらに、符号34A，34Eはネットワークを示し、31Bはローカルバッファを示し、32Bはローカルバッファを用いる場合の分散メモリカプラを示し、33Bはローカルバッファを用いる場合のネットワークを示す。まず、従来の送信を説明する。プロセスはメッセージ送信に先立ち分散共有メモリの書込み領域であるMBを確保する（OSに通知し確保され、番地が返される）。プロセスで発生するメッセージは分散共有メモリ内の該番地で示されるMBに書込まれ、並行して分散メモリカプラがこれを取込みネットワークに送出する。受信側でメッセージの受信完了後、受信完了通知をMB毎に送信側に返信し、送受ともに当該MBを解放する。
【００２１】
ところで、プロセスで発生する実時間性の高いメッセージ36Eは、大規模メッセージの送信35Eが混在する場合、メッセージ36Ｅが分散共有メモリに書込まれる間（37E）は分散共有メモリーに書き込めず、メッセージ37Eが書き込まれた後に書き込まれる（38E）。その結果、時刻311Eでネットワーク上に送信される（310E）。OSによる割込みを導入した場合、35Eより優先度の高い36Eが分散共有メモリに書き込む際は割込みにより35Eを中断し、優先的に36Eを書込む。この場合を示したのが31Dと32Dに分かれた大規模メッセージであり、実時間性の高いメッセージ（31C）は、32Cとして即時に分散共有メモリに書き込まれる。
【００２２】
この状況でいかにローカルバッファが用いられるかを以下に示す。プロセスは起動の際、転送メッセージの規模をローカルバッファ管理機能（図2ではローカルバッファを管理するOSの機能222D、これは分散共有メモリの管理機能を拡張したもの）に通知する。これは、プロセス起動の際でなく随時でも良い。プロセスからの送信のための分散共有メモリへの書込みに先立ち、MB確保をOSに通知するが、OSは従来の分散共有メモリ上のMB確保に代わりローカルバッファに一括転送単位を確保する（OS内の従来の該機能は、ローカルバッファ管理機能である）。一括転送単位の規模はローカルバッファ管理機能が先に通知されたメッセージ規模に応じて決定し、番地を返す。実時間性の高い場合は、実際には小規模メッセージの場合であるが、一括転送単位規模を0として、従来の経路と同様にメッセージ緩衝域は使用されず直接分散共有メモリに書込まれる（ローカルバッファ管理機能或いはOSが返す番地は従来通り分散共有メモリ上のMB番地となる）。プロセスで発生する実時間性の高いメッセージ31Cが発生と並行して33A及び33Eの分散共有メモリ及び分散メモリカプラ層で処理されてネットワーク34Aに送出される。大規模メッセージの送信である一連のメッセージ31Dと32Dと混在すると、33A及び33Eの分散共有メモリ及び分散メモリカプラ層で処理され並行してネットワーク34Aに送出されるが、ネットワーク上に送出されるメッセージ36Aは隘路となるネットワークにより、31Cから発生してネットワーク上に送出されるメッセージの送信を妨げ、31Cからのメッセージは本来時刻38Aからネットワーク上に送出されるところが、時刻39Aから送出される（メッセージ37A）。この遅延が、ローカルバッファの次の動作で解消されるので、その手法を説明する。尚、時間310Aは一連のメッセージ31Dと32D用に割り当てられたローカルバッファに割り当てられた一括転送単位が満たされるまでの時間である。
【００２３】
実時間性の高いメッセージ31Cと大規模メッセージの送信31Dと32Dとが混在すると、大規模メッセージの送信31Dと32Dは310Aの時間はローカルバッファ31Bに保留され時刻39Aからネットワーク上に送出されるため、実時間性の高いメッセージ31Cは大規模メッセージの送信31Dと32Dのネットワーク送出に妨げられることなく、本来の送信時刻38Aにネットワーク上の送出が可能になる（35A）。これは、大規模メッセージはMB確保の際、ローカルバッファ内の一括転送単位の番地をプロセスが得るので該番地に書込み、一括転送単位が一杯になるとこれをOSが検出し（例えば、分散メモリカプラのMB書込み長計数機能を拡張しローカルバッファ書込み長を計数させ、一括転送単位が一杯になると割込みによりOSがそれを検出する）、分散共有メモリ書込み待ちになっている小規模メッセージ即ち一括転送単位規模が0のメッセージがあるかを判定し、あれば、大規模メッセージのローカルバッファ書込みを中断し、小規模メッセージの分散共有メモリ書込みを先に行い、終了後大規模メッセージのローカルバッファ書込みを再開する（既に大規模メッセージを全て書込み終わっていたら、他のメッセージの書込みを始める）、といった動作である。一括転送単位規模を大きく設定しておけば、大規模メッセージが分散共有メモリに書き込まれるまでの時間が長くなり、即ち、大規模メッセージの送信が保留されている間により多くの実時間メッセージを先行して送信することができる。
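The ordering effect of this paragraph can be sketched as a tiny simulation. The function name `schedule`, the string labels, and the idea of keying waiting small messages by the index of the unit that has just filled are illustrative assumptions; real interrupt handling and timing are not modelled.

```python
def schedule(large_units, small_arrivals):
    """Return the order in which items reach distributed shared memory (DSM).

    large_units: labels, one per batch transfer unit of a large message.
    small_arrivals: dict mapping a unit index to the small (unit size 0)
        messages that are waiting at the moment that unit becomes full.
    """
    dsm_order = []
    for index, unit in enumerate(large_units):
        # The process fills unit `index` in the local buffer; nothing is
        # written to DSM while the unit is still filling.
        waiting = small_arrivals.get(index, [])
        # Unit full: waiting real-time messages are written to DSM first.
        dsm_order.extend(waiting)
        # Then the held batch transfer unit is written to DSM.
        dsm_order.append(unit)
    return dsm_order
```

For example, `schedule(["L0", "L1"], {0: ["rt1"]})` places `rt1` ahead of `L0`, which is the behaviour the paragraph attributes to holding the large message in the local buffer.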
【００２４】
バッファ管理や分散共有メモリカプラの動作も含めた例の概要を３１Ｆ〜３１９Ｆに示す。符号３１Ｆはプロセッサモジュールを示し、３２Ｆはアプリケーション層（メッセージ送信を行うプロセス）、３３Ｆはローカルバッファ管理ノード、３４Ｆはローカルバッファ、３５Ｆは分散共有メモリ、３６Ｆは分散メモリカプラ、３７Ｆはネットワークを示す。大規模メッセージ（３１８Ｆ）が先に送信要求し、その処理中に実時間メッセージ（３１９Ｆ）が送信要求した場合を示す。大規模メッセージについて送信要求された後、ローカルバッファの番地が示され、ローカルバッファに書込まれる（３８Ｆ）。この際一括転送単位で規定された領域が満杯となると、分散メモリカプラがそれを検出し（３１２Ｆ）、ローカルバッファ管理機能に通知し（３１３Ｆ）、実時間メッセージの送信要求があれば分散メモリカプラへの書込みを指示して送信要求しているプロセスから実時間メッセージの分散共有メモリへの書込みが開始される（３９Ｆ）。この書込みが終了すると、分散メモリカプラが終了を検出し（３１４Ｆ）ローカルバッファ管理機能に通知する。ローカルバッファ管理機能はローカルバッファに保留していた大規模メッセージ（３８Ｆによるもの）を分散共有メモリに書込み起動する（３１５Ｆ）。当該書込みが開始され（３１０Ｆ）、その書込みの完了を分散メモリカプラが検出すると（３１６Ｆ）、書込み完了をローカルバッファ管理機能に通知する。ローカルバッファ管理機能は、大規模メッセージの続きがある場合又は無い場合であっても新たなメッセージが大規模であれば、ローカルバッファへの管理を指示し（３１７Ｆ）、当該メッセージがローカルバッファに書込まれる（３１１Ｆ）。一方、小規模メッセージであれば、直接分散共有メモリへの書込みを指示する。これにより、実時間メッセージは送信処理開始から大規模メッセージの送信処理に妨げられることなく即時にネットワークに送信される。
【００２５】
図５は、本発明を適用した場合の分散共有メモリを用いる通信装置の受信側におけるメッセージ通信方式を説明する図である。本図においては矢印線で情報の流れと向きを示し、実線が通信されるメッセージを示し、破線が信号を示している。上段の符号41B〜46Bで示す部分は、本発明を用いない従来の分散共有メモリを用いる通信装置の受信側におけるメッセージ通信方式を説明する図である。43Bは分散メモリカプラを示し、44Bはプロセスを示し、41Bはネットワークからのメッセージの到着を示している。大規模メッセージがネットワークに送出される場合、ネットワーク側への転送単位は分散メモリカプラの送出単位に細分されて送出されるので、41Bの様に大規模メッセージが細分されて到着することになる。細分メッセージが到着すると直ちにプロセスが読み出せる状態でノード内のメモリの分散共有メモリとして割り当てられている領域に書き込まれ、プロセスは到着を認識し（45B）、順にプロセスが読込み或いはプロセスが管理するメモリ領域に転写する等といった読出しを行う（46B）。この際のプロセスの読出し処理は時間間隔42Bで行われるが、ネットワーク側の転送単位或いは分散メモリカプラの送出単位に応じた、即ち、プロセス側とは無関係な単位に細分された単位を扱うためのプロセスの処理が、42Bの間隔で必要になる。
【００２６】
図５の下段の符号41A〜47Aで示す図は、本発明を用いた場合の分散共有メモリを用いる通信装置の受信側におけるメッセージ通信方式を説明する図である。符号43Aは分散メモリカプラを示し、44Aはローカルバッファ（その中の一括転送単位）を示し、45Aはプロセスを示し、41Aはネットワークからのメッセージの到着を示している。大規模メッセージがネットワークに送出される場合、ネットワーク側の転送単位或いは、分散メモリカプラの送出単位に細分されて送出されるので、41Aの様に大規模メッセージが細分されて到着することになる。細分メッセージが到着すると直ちにノード内のメモリの分散共有メモリとして割り当てられている領域に書き込まれるが（48A）、プロセスでは無くローカルバッファ（その中の一括転送単位）に一旦読み込まれる。一括転送単位が一杯になる（分散共有メモリにおいて対応するMBに相当する領域全てに書き込まれる）とプロセスにメッセージ到着が初めて通知され（46A）、プロセスが読出しを行う（47A）。一括転送単位の規模はプロセス（或いはメッセージ特性）に応じて決められており、ネットワークや分散メモリカプラ依存の細分単位でなく、プロセス側の特性による細分単位（大規模メッセージ規模を幾つかに区切った規模を想定）で（42A）プロセスが読み出す。従って、従来方式のようにネットワークや分散メモリカプラ依存の細分単位間隔（42B）で、ネットワークや分散メモリカプラ依存の細分単位に応じた間隔でプロセスの処理起動の必要性が生じるのではなく、プロセスの処理の起動は、プロセス内のメッセージ処理時間等のプロセス側の特性によって決められる細分（大規模メッセージ規模を幾つかに区切った規模を想定）間隔（42A）で（即ち、42Bより広い間隔で）、即ちプロセスに依存した細分単位に応じて行うことができる。これは、通信に先立ちプロセスがMB確保のためにOSに通知するメッセージ規模に応じて一括転送単位の規模がOSにより割当てられるので、この規模（時間の観点からは読出し間隔）を間接的にプロセスが決定することになるからである。
【００２７】
ローカルバッファ（16C内、24A、24B、31B、44B）は分散共有メモリ（23A、23B）と対応した仮想番地が与えられており、ローカルバッファ管理機能222Dには仮想-実番地変換機能と、ローカルバッファに関するメッセージ直接アクセスコマンドが用意される。仮想番地によりメッセージ緩衝領はアプリケーションプログラム／OSからあたかも分散共有メモリとしてアクセスされる。プロセスがある分散共有メモリ領域即ちMBに書込みを行うと、ローカルバッファ内に割り当てられた一括転送単位に書き込まれ、一括転送単位の全てのMBに書込みがなされると、分散メモリカプラ（22B、29D）やOS（25D）内で関連する分散共有メモリアクセス機能（例えば［1］等の機能）により分散共有メモリの当該範囲へ従来システムでプロセスがDSMに対し書込みを行うのとほぼ同じ動作で書き込み更新することになるが、アプリケーションプログラム／OS或いはそのプロセスが直接書込みコマンドを発行すると、一括転送単位の全てのMBに書込みがなされなくても、一括転送単位により分散共有メモリの当該領域が上書きされる。
【００２８】
ここで、「仮想番地」とは、プロセスから見てローカルバッファを介する場合と介さない場合の何れの場合も同一のインタフェースとなるように設けるものであり、仮想番地を用いなくてもローカルバッファを導入することによる実時間メッセージの先行送信に関する処理は変わりなく行われるが、仮想番地を用いるとバッファ管理機能或いはＯＳからプロセスにより指定されるメモリ書込み番地は実番地ではなく仮想的な番地として指定される。一方、プロセスが書込む際は、書込み先として指定する番地を、バッファ管理機能或いはＯＳがバッファを介する場合はローカルバッファの番地に変換し、バッファを介さない場合には分散共有メモリの番地に変換して当該番地にメッセージを書込む。さらに、分散共有メモリを指定する場合は、仮想番地が実番地と同一となる適用方法もあり、この場合ローカルバッファの番地指定のみが仮想番地にて行われることになる。尚、分散共有メモリが全ノードで同一の番地を共有する仕組みである場合は、各ノードで実番地の代わりに仮想番地を用いて、実際の番地が各ノードで異なってもプロセス側からあたかも同一番地を共有しているように見えるようにすることも可能である。この仮想番地の付与手段は、ローカルバッファと分散共有メモリとが同一のローカルメモリ上にとられるなら、容易に本発明の仮想番地付与手段に用いることができるので、プロセスから見たインタフェースをローカルバッファを介するか否かにかかわらず容易に統一することができる。
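A minimal sketch of the virtual address idea follows. It assumes that the process addresses memory by MB number and offset, and that the buffer management function keeps a set of MBs currently routed through the local buffer. The class name `AddressTranslator` and the base addresses used in the example are hypothetical, not taken from the specification.

```python
class AddressTranslator:
    """Maps a process-visible (MB number, offset) pair to a real address."""

    def __init__(self, dsm_base, buffer_base, mb_size):
        self.dsm_base = dsm_base        # real base of distributed shared memory
        self.buffer_base = buffer_base  # real base of the local buffer
        self.mb_size = mb_size
        self.buffered = set()           # MB numbers routed via the local buffer

    def route_via_buffer(self, mb):
        self.buffered.add(mb)

    def to_real(self, mb, offset=0):
        if not 0 <= offset < self.mb_size:
            raise ValueError("offset outside the MB")
        base = self.buffer_base if mb in self.buffered else self.dsm_base
        return base + mb * self.mb_size + offset
```

The process always passes the same `(mb, offset)` pair; only the translator decides whether the write lands in the local buffer or in distributed shared memory, which is the single interface the paragraph describes.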
【００２９】
読出しの場合も同様に、プロセスが分散共有メモリ領域即ちMBの読出しを行うとローカルバッファ内に割り当てられた一括転送単位の内容が読み出されるが、アプリケーションプログラム／OS或いはそのプロセスが直接読出しコマンドを発行すると分散共有メモリの当該領域により一括転送単位が上書きされる。このメッセージ直接アクセスコマンドにより書込み／読出し各々において強制的に一括転送単位の内容と実分散共有メモリとの同期をとることができ、随時ローカルバッファ或いは実分散共有メモリは最新の状態に保つことが可能である。尚、分散共有メモリと同期をとり相手側にメッセージが到着し受信処理がなされる前に一括転送単位が後続のメッセージ内容により上書きされるのを防ぐため、ローカルバッファ内で一括転送単位はインクリメント即ち番地を更新しながら利用される。従って、一括転送単位のためには最大で、分散共有メモリ上に1ノード向けに取られる領域、即ち、1ノード向けのMB領域程度を、確保することになる。例えば、大規模メッセージを送るプロセスが1種類なら、例えば256Bのバッファを256面有している、等でも64KBで良い。しかし、大規模メッセージを送るプロセスにのみ割り当て、同時に送信がなされる場合にそのプロセスの数だけ割り当てれば良いので、分散共有メモリに比べても十分小さくて良い。従って、ローカルバッファの容量はメモリに比較して小さく、アプリケーションプログラムやOSに影響しない。
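The sizing example in this paragraph can be checked with a short calculation. It assumes the example means 256-byte buffers in 256 slots (the unit in the source text is garbled, so the byte reading is an assumption) and that one such region is allocated per process sending large messages at the same time. The function name is hypothetical.

```python
def local_buffer_bytes(concurrent_large_senders, slot_bytes=256, slot_count=256):
    """Local buffer size if each concurrent large sender gets its own region."""
    if concurrent_large_senders < 0:
        raise ValueError("sender count must be non-negative")
    return concurrent_large_senders * slot_bytes * slot_count


one_sender_kib = local_buffer_bytes(1) // 1024   # 256 B x 256 slots, in KiB
```

Under these assumptions a single large-message sender needs 64 KiB, matching the figure quoted in the paragraph, and the total grows linearly with the number of simultaneous large senders.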
【００３０】
プロセスからの1回のメッセージ送信要求により1個のMBが利用されるので、MB規模を想定されるメッセージ規模に比べ必要以上に大きくするとMB内で無駄に確保される部分が生じ分散共有メモリの使用率が低下する。そのため分散共有メモリのMBサイズは通常想定されるメッセージ規模にあわせて比較的小さく細分されており、またMB管理の複雑化を避けるためMB規模は何れも同一である。従って、例えば、大規模メッセージと小規模メッセージが混在する場合のために、MB規模に大小を設け、大規模メッセージの場合は大きなMBを用い、小規模メッセージの場合は小さなMBを用いるようにし、大きなMBを選択した場合は一時的に分散共有メモリヘの書込みを待たせる、といった方法で送信に優先度をつける方法は、MB管理の複雑化を招くことから望ましく無い。大規模メッセージ転送は一般に大きな実時間性を要求されないことが多く、ノード間制御メッセージのように小規模ではあるが実時間性の要求される高速転送と大規模メッセージ転送が混在すると、大規模メッセージ転送中において生じるネットワークでの送信待ちが実時間性の要求される高速転送を妨げることになる。図3では、大規模メッセージ31Dがネットワークでの遅延により送信に時間36Dを要するため、実時間性の高い小規模メッセージ31Cの送信が37Aにまで延びている。一括転送単位の規模を送信するメッセージ規模を基にして設定しておくと、時間310Aで一括転送単位が満たされるまで31Dの送信は待つことになり、その間に31Cが送信可能となり35Aで待つことなく送信される。受信側ではメッセージのある部分の到着により一括転送単位が満たされることになるため、メッセージのある部分の到着にあわせてOSの検出を契機に読み出しが行われることになる。
【００３１】
図５においては、従来では42B間隔でプロセス44Bがメッセージを読み込む処理を走行させる必要があり、しかも読込むサイズが分散共有メモリを用いる通信装置が1回で通信するメッセージ規模単位であったが、本発明により42A間隔で、即ちプロセス45Aは複数でメッセージ到着の間に1回の読み込む処理を走行させるだけで良く、さらに1回に読み込む規模は分散共有メモリを用いる通信装置が複数回で通信するサイズをあわせた規模となる。
【００３２】
図６〜図８は、本発明の一実施例を示す分散共有メモリを用いたプロセス間通信における優先度制御方式の共有メモリ書込み動作、或いは相手ノードヘの送信動作フローチャートである。システム構成要素は、ソフトウェアであるアプリケーションプログラム及びOS（オペレーションシステム）と、ハードウェアであるローカルメモリ及び送信装置とに分けて記述している。アプリケーションプログラムは大規模メッセージを送るプロセスと、小規模メッセージを送るプロセスがある場合を示しており、OSはローカルバッファ管理機能（BAMと略記）であり、ローカルメモリはローカルバッファ（BAと略記）と分散共有メモリ（DSMと略記）からなり、送信装置は例えば分散メモリカプラを想定する。
【００３３】
図６〜図８では、最初に送信ノードにおいてあるプロセスがメッセージの送信要求を行った場合を示している。小規模メッセージの送信要求が先に行われると、ローカルバッファを介することなく直ちに送信される。即ち、判定“メッセージ規模＞境界”後は※3となり、ローカルバッファ及びその管理機能等の本発明による機構が無い場合の送信形態とほぼ同じになるため、小規模メッセージの送信要求が先に行われる場合は図６で兼ねている。尚、図６では、プロセスからの送信要求及びOS（ローカルバッファ管理機能）がその要求を取出すキューの構造は少なくともプロセス名とメッセージ規模の情報を含むこととし、判定“メッセージ規模＞境界”での“境界”は、実時間メッセージとして扱うメッセージ規模の上限値を意味しており、この上限値は送信処理開始に先立ちOS（ローカルバッファ管理機能）において設定されている値であり、※3と※4を同一にする手法をとる場合もあり得る。
【００３４】
アプリケーションプログラム内のあるプロセスが大規模メッセージの送信要求をローカルバッファ管理機能に通知すると（図６では該プロセスを大規模メッセージプロセスと記述）、この通知にはメッセージ規模（生成されたデータサイズないしメッセージサイズ）が含まれており、プロセスからの送信要求待ちの状態であったローカルバッファ管理機能は、メッセージ規模が境界値を超えるかを判定し、超える場合は大規模メッセージとして扱うためメッセージ規模に基づきローカルバッファ内に割り当てる一括転送単位の規模（予め定めたバッファ容量）及び一括転送単位の番地を決定し、大規模メッセージプロセスに通知する。境界を超えない場合は小規模メッセージ即ち実時間メッセージとして扱い、図６※3から分散共有メモリ内のMBの番地を決定し小規模メッセージを送信要求したプロセス（小規模メッセージプロセスと記述）に通知し、該プロセスはメッセージをMBに書き込み、メッセージは分散メモリカプラにより書込みと並行して相手ノードの宛先プロセスへ送信され、ローカルバッファ管理機能はプロセスからの送信要求待ちの状態に戻る。大規模メッセージプロセスは通知された一括転送単位の番地にメッセージを書き込み、ローカルバッファでは該書込みが一括転送単位を一杯にするまで該メッセージの蓄積を続ける。一括転送単位規模は、プロセスから通知されたデータサイズに基づいて決定し、ローカルバッファ管理機能が決定した値を用いるが、分散メモリカプラではMB内のデータ長が所定の規模に達したかどうか（一般のメッセージ送信装置でも同様に送信バッファに相当するバッファ内のデータ長が所定の規模に達したかどうか）を監視し、規模に達したらOSに対し通知する機能があり、ローカルバッファ管理機能が決定した一括転送単位規模を分散メモリカプラが参照できるメモリ領域に設定する。分散メモリカプラは、この値に、分散共有メモリと同じくローカルメモリ上にあるローカルバッファ内に書き込まれたデータ規模が、達したかどうかを監視し、一括転送単位一杯を検出すると、ローカルバッファ管理機能に割込みにより通知しローカルバッファ管理機能に制御が移り大規模メッセージプロセスはローカルバッファへの書込みを中断する。
【００３５】
尚、プロセスからローカルバッファ管理機能に通知したメッセージ規模（メッセージサイズ）が所定のデータ長すなわち境界を超えるか否かに基づいてローカルバッファを介して転送するか否かを判定しているが、例えば、プロセスからローカルバッファ管理機能に転送要求したメッセージを優先メッセージとして取り扱うか否かを示すコマンドやビット列をプロセスからローカルバッファ管理機能に通知することもできる。さらに、生成された転送すべきメッセージサイズが所定のデータ長を超えるか否かを判定する手段は、ローカルバッファ管理機能すなわち制御手段が有する場合だけでなく、プロセス自身が有することもできる。従って、プロセス自身が生成したメッセージをローカルバッファに書込むか又は直接分散共有メモリに転送するかを判断することもできる。
【００３６】
ローカルバッファに書込まれたデータを分散共有メモリに転送する一括転送単位の規模、すなわち所定のバッファ容量は、ローカルバッファ管理機能（制御手段）が生成されたメッセージサイズに基づいて決定することができ、前記境界値（所定のデータ長）から独立した値をとるが、前記境界値（所定のデータ長）に依存して決定することもでき、例えばデータ境界値（所定のデータ長）に等しい値に設定することも可能である。
【００３７】
図６〜図８中の＊1のステップは、OSの持つ機能等により手法が異なる。図5〜図７では動作の表示が容易な場合を記述しており、大規模メッセージプロセスが書込みを中断した後はローカルバッファ管理機能は一定時間プロセスの送信要求待ちとなる。この時間内に送信要求のあったプロセスの中でメッセージ規模が境界値以下のメッセージを選択し、即ち、該メッセージは実時間メッセージと扱い、分散共有メモリ内のMBを割当てる。該MBの番地を該メッセージを送信するプロセス（図5では小規模メッセージプロセス）に通知し、該プロセスは分散共有メモリ内の指定MBにメッセージを書き込み、メッセージは分散メモリカプラにより書込みと並行して相手ノードの宛先プロセスへ送信される。小規模メッセージプロセスからの分散共有メモリへのメッセージ書込みが終了すると、ローカルバッファ管理機能は、ローカルバッファの一括転送単位に蓄積していた先の大規模メッセージを分散共有メモリに書き込み、メッセージは並行して相手ノードの宛先プロセスへ送信される。続いて、中断していた大規模メッセージプロセスの残りのメッセージがある場合は、大規模メッセージプロセスに続きのローカルバッファ書込みを指示し、無ければプロセスの送信要求待ちに戻る。
【００３８】
大規模メッセージのローカルバッファ書込みがあれば一括転送単位が一杯になるまで書込みが続けられ、一杯になると分散メモリカプラがそれを検出しローカルバッファ管理機能に割込みにより通知しローカルバッファ管理機能に制御が移り大規模メッセージプロセスはローカルバッファへの書込みを中断する。即ち図６の※2に戻る。ところで、図６の＊1は、上述の場合よりも下記の場合がより想定される。あるプロセスが送信処理用のプログラムやOS等に送信メッセージを送っている最中に、OSにおいてタイマ等により他のプロセスからの送信要求を受け付けるスケジューリングは一般的にOSが備えていることが想定され得る。この場合、大規模メッセージからのローカルバッファへのメッセージを書込み処理の最中、即ち図６〜図８の＊2において、他のプロセスからの送信要求がローカルバッファ管理機能に受け付けられ、このプロセスの中でメッセージ規模が境界値以下のメッセージを選択し、即ち、該メッセージは実時間メッセージと扱い、分散共有メモリ内のMBを割当てる。この後、上述の“［♯］”で示した箇所からの動作となる。尚、送信対象となるプロセスは、アプリケーションプロセスのみでなく、OS内のプロセスであっても動作は同様に行なわれる。
【００３９】
図６〜図８に示す実施例からも本発明の利点として次の（i）〜（iii）が示される。（i）所定のデータ容量の一括転送単位毎に一旦蓄積されることで大規模メッセージの送信はローカルバッファ内で待つことになり、その間に発生した小規模メッセージは大規模メッセージに先行して送られる。（ii）一括転送単位の規模が大きいと一杯になるまでの時間が長くなり、それだけ小規模メッセージが先行する率が増す。（iii）一括転送単位の規模はその大きさのみならず0か否かで実時間性の有無を示すことになり、優先度を一つのパラメータで制御できる。しかも、このパラメータ値は一度に送信するメッセージ規模により定められるため、アプリケーション側において優先度を間接的に制御できることになる。
【００４０】
図９は、本発明の一実施例を示す分散共有メモリを用いたプロセス間通信における優先度制御方式の共有メモリ読込み動作、或いは相手ノードからの受信動作フローチャートである。このフローの中、aの2ステップはプロセスの到着メッセージ読込みに関するものであり、bの9ステップはプロセスから独立したローカルバッファ管理機能が行うバッファ内への読込み等に関するものである。
【００４１】
予めプロセス起動時或いは送信開始時に、プロセスから受信するメッセージ規模をOS内のローカルバッファ管理機能（222D）に通知し、その値に基づきローカルバッファ管理機能が一括転送単位規模を決定する。送信ノードからメッセージが到着し始めると到着と並行して分散共有メモリにメッセージが書き込まれ、ローカルバッファ管理機能に到着が通知され、ローカルバッファ管理機能がローカルバッファに分散共有メモリからメッセージを読み込み、一括転送単位規模に達すると、規模に達したことと分散共有メモリに対応するMBの番地がローカルバッファ管理機能からプロセスに通知され、プロセスはメッセージが蓄積されているMBを指定し、指定されたMBに対応するローカルバッファの内容即ち一括転送単位が一括してプロセスに送られる。一括して送られる部分の実態は図７のｃのように繰返し動作になっても良いが、プロセスは一回の転送動作としてみなされる。一括転送単位が送られるとローカルバッファ管理機能は分散共有メモリから続きのメッセージを読み込むため一括転送単位の番地を更新する。MB内のメッセージが全てローカルバッファに読み込まれるまではローカルバッファへの読込みとプロセスヘの送付を繰返すが、MB内のメッセージ全てが読み込まれるとローカルバッファ管理機能の受信動作は終了する。
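The receive-side accumulation can be sketched as follows. The class name `ReceiveBuffer`, the `deliver` callback standing in for the notification to the process, and the `finish` method that hands over a trailing partial unit are assumptions made for illustration; the specification does not name them.

```python
class ReceiveBuffer:
    """Collects arriving fragments and hands them over one batch unit at a time."""

    def __init__(self, unit_size, deliver):
        if unit_size <= 0:
            raise ValueError("unit_size must be positive")
        self.unit_size = unit_size
        self.deliver = deliver       # called once per complete batch transfer unit
        self.pending = bytearray()   # fragments read from DSM, not yet delivered

    def on_fragment(self, fragment):
        """Called for each fragment that the coupler has written into DSM."""
        self.pending += fragment
        while len(self.pending) >= self.unit_size:
            unit = bytes(self.pending[:self.unit_size])
            del self.pending[:self.unit_size]
            self.deliver(unit)

    def finish(self):
        """End of message: deliver whatever is left (assumed behaviour)."""
        if self.pending:
            self.deliver(bytes(self.pending))
            self.pending.clear()
```

With a unit size of 4 and fragments of 2 bytes, the process is woken once for every two arrivals instead of once per fragment, which is the reduction in process activations that the receive-side description aims at.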
【００４２】
一括転送単位規模に達したことがローカルバッファ管理機能からプロセスに通知される以外に、プロセスが直接呼出しコマンド即ち、ローカルバッファ管理機能からの通知と無関係に一連の動作を開始する場合もある。これは、プロセス側からメッセージ読出しを自律的に指示することを可能にする。プロセスから分散共有メモリへの書込み動作、或いは相手プロセスへの送信動作はこれのほぼ逆の動作と言える。尚、送信側と受信側はローカルバッファに関しては独立であり、送信側のみ或いは受信側のみに本機能を設定しても動作は可能で、かつ、各々で効果がある。送信側では、ネットワークが隘路になることで大規模メッセージにより実時間メッセージの送信が妨げられることを回避できるが、受信側では、これと似た状況の、分散共有メモリ間のメッセージを転送する装置から分散共有メモリへの書込み速度に対するプロセス側の読出し速度が隘路になることで生じる、大規模メッセージが実時間メッセージの受信を妨げる現象を回避できる。
【００４３】
本発明は上述した実施例だけに限定されず種々の変更や変形が可能である。例えば、大規模メッセージについても一括転送単位規模を０とする判定を行う論理を適用してもよく或いは適用しないこともできる。この判定基準を変更する等により実時間メッセージか否かの判定を適宜変更することは可能であり、常に小規模メッセージが実時間メッセージとして取り扱われることに限定されるものではなく、或いは小規模メッセージのみが実時間メッセージとして扱われることに限定されるものではない。
【００４４】
【発明の効果】
以上説明したように、本発明によれば、ローカルバッファの使用領域規模を指定するのみでプロセスの発生するメッセージの特性に応じてノード間通信の実時間性を確保することができるので、従来システムに、（a）バッファ（及びバッファからのDSM書込み命令）の追加、（b）一括転送単位規模を通知するAPインタフェースの追加、（c）DMCの送信メッセージ長計数処理をローカルバッファへ拡張する、程度の簡易な修正で、プロセスに依存せず同一の分散共有メモリを用いる通信装置によるノード間通信の実時間性を確保できるという効果がある。
【図面の簡単な説明】
【図１】本発明が適用される一般的な高速通信方式を用いたネットワークモデルの構成の配置と関係例を示すもので、特にAPないしはプロセスの通信装置に対するアクセス形態を示す図である。
【図２】本発明による分散共有メモリを用いる通信装置を有するノード及びネットワーク構成を示す図である。
【図３】メモリ構造の一例を示す線図である。
【図４】本発明による分散共有メモリを用いる通信装置の送信側における、メッセージ通信方式を時間経過に従い説明する図である。
【図５】分散共有メモリを用いる通信装置の受信側における、メッセージ通信方式を示す図である。
【図６】本発明による分散共有メモリを用いる通信装置の送信側における、ローカルバッファを用いる共有メモリ書込み動作、或いは相手プロセスへの送信動作フローチャートの一部を示す図である。
【図７】本発明による分散共有メモリを用いる通信装置の送信側における、ローカルバッファを用いる共有メモリ書込み動作、或いは相手プロセスへの送信動作フローチャートの一部を示す図である。
【図８】本発明による分散共有メモリを用いる通信装置の送信側における、ローカルバッファを用いる共有メモリ書込み動作、或いは相手プロセスへの送信動作フローチャートの一部を示す図である。
【図９】本発明による分散共有メモリを用いる通信装置の受信側における、ローカルバッファを用いる共有メモリ読込み動作、或いは相手プロセスからの受信動作フローチャートである。
【符号の説明】
11A，11B，11C 通信装置
12A，12B，12C OS類
13A，13B，13C，15C AP類
14A，14B 従来方式によるアクセス形態
14C 本発明によるアクセス形態
16C ローカルバッファ
21A，21B，28D 分散共有メモリ
21C，211D ネットワーク
22A，22B，29D 分散共有メモリ間のメッセージを転送する装置（分散メモリカプラ）
23A，23B ローカルメモリ
24A，24B，27D ローカルバッファ
25A，25B プロセッサ
26A，26B，21D ノード（プロセッサモジュール）
27A，27B 共通路
22D ソフトウェア層
23D 装置
24D アプリケーションプログラム
25D OS
26D ローカルメモリ
210D メッセージバッファ（MB）
212D，220D ネットワークに送出されたメッセージ
213D，214D 24Dで発生したメッセージ
215D 28Dに書込まれたメッセージ
213D，216D 分散メモリカプラが検出して自動的に取り込まれたメッセージ
213D，217D 27Dに書込まれるメッセージ
214D，218D 28Dに書込まれたメッセージ
214D，219D 分散メモリカプラが検出して自動的に取り込まれたメッセージ
214D，221D 一括転送単位
222D ローカルバッファを管理するOSの機能（ローカルバッファ管理機能）
31A，31E プロセッサモジュール
32A，32E アプリケーション層
33A，33E 分散共有メモリ及び分散メモリカプラ層
34A，34E ネットワーク
35A 本来の送信時刻にネットワーク上の送出が可能になった実時間性の高いメッセージ
36A，39E ネットワーク上に送出される大規模メッセージ
37A ネットワークが隘路となり待ちが生じた実時間性の高いメッセージ
38A 実時間性の高いメッセージの本来の送信時刻
39A ネットワークが隘路となり待ちが生じた実時間性の高いメッセージの送信時刻
310A 大規模メッセージがローカルバッファに保留される時間
31B ローカルバッファ
32B ローカルバッファを用いる場合の分散メモリカプラ
33B ローカルバッファを用いる場合のネットワーク
31C，36E アプリケーションにおける実時間性の高いメッセージの送信
31D，32D，35E アプリケーションにおける大規模メッセージの送信
37E 分散共有メモリに書込まれた大規模メッセージ
38E 分散共有メモリに書込まれた実時間メッセージ
310E 大規模メッセージが分散共有メモリに書込まれるため待ちが生じたネットワーク上に送出された実時間性の高いメッセージ
311E 大規模メッセージが分散共有メモリに書込まれるため待ちが生じた実時間性の高いメッセージの送信時刻
41A 送信元ノードからのメッセージ到着間隔
41B 送信元ノードからのメッセージ到着間隔
42A メッセージの最初の到着通知からプロセスに読込まれるまでの間隔
42B メッセージの到着通知からプロセスに読込まれるまでの間隔
43A 分散共有メモリを用いる通信装置内の分散共有メモリ
43B 分散共有メモリを用いる通信装置内の分散共有メモリ
44A ローカルバッファ
44B，45A プロセス
45B メッセージが到着するとその度ごとに分散共有メモリを用いる通信装置からプロセスに送られる送信到着通知
46A 一連のメッセージの先頭が到着すると分散共有メモリを用いる通信装置からプロセスに送られる送信到着通知
46B 分散共有メモリを用いる通信装置内の分散共有メモリからプロセスに読込まれるメッセージ
47A ローカルバッファからプロセスに読み込まれるメッセージ

[0001]
[Technical field of the invention]
The present invention relates to an interprocess communication system using distributed shared memory. In particular, it relates to such a system that performs priority control through scheduling: message transmissions of a size that depends on the process or on the transferred message are temporarily held in a local buffer provided in the OS or AP (application) layer, so that transmissions by processes with a high transmission priority level can cut in ahead of transmissions by processes with a low level. In message transfer between nodes, messages with strict real-time requirements are thereby sent without being blocked by messages with loose real-time requirements.
Furthermore, the present invention relates to a data processing apparatus having a communication function used for interprocess communication using a distributed shared memory.
[0002]
[Prior art]
In inter-node networks, hardware-embedded communication mechanisms that impose little software load are attracting attention, and communication methods using distributed shared memory are also effective because of their low latency. High-speed communication methods that connect processors, however, are often designed to suit the characteristics of a particular process (or AP), so a process (or AP) with different characteristics ends up using a different high-speed communication system. In other words, each communication method has a relatively narrow range of application. For example, in a high-speed communication method using distributed shared memory, if a large-message transmission occurs while messages with strict real-time requirements are being sent, the network becomes a bottleneck; the real-time messages must wait until the large-message transmission completes, and their transmission may be obstructed. Conversely, a communication method dedicated to transferring large messages is generally not intended for real-time messaging, and it normally stores the entire message to be handed to the transmission device in a buffer first, which takes buffering time. For these reasons it has been difficult to use a large-message transfer system for high-frequency real-time message communication as well.
[0003]
Consequently, the path from a process to the device or system that performs high-speed data transfer differs according to the characteristics of the process (or AP). There is no choice but to implement a different high-speed communication method for each process (or AP), or to switch among them, which brings burdens such as higher cost from introducing switching programs or devices and greater use of host processor resources.
[0004]
[Problems to be solved by the invention]
As described above, in conventional inter-node data transfer technology, when large messages sent in bulk through a high-speed communication device are mixed with messages that have strict real-time requirements, the network becomes a bottleneck, and transmission of the real-time messages is obstructed for as long as the large-message transmission continues. In particular, with a device dedicated to large transfers such as file transfer, sending real-time messages is difficult, so it is hard to carry both real-time message transmission and large-message transmission on the same communication device. In a high-speed communication method using distributed shared memory, high speed is required not only for the small messages, such as control information, that were its original target, but also for large messages.
[0005]
An object of the present invention is to solve the problem that transmission of messages with strict real-time requirements and transmission of large messages with comparatively loose real-time requirements are generally hard to carry together, using only a high-speed communication method based on distributed shared memory and only relatively small modifications. The aim is to provide a high-speed communication method using distributed shared memory that, when sending on behalf of processes that generate messages with diverse real-time requirements and diverse data sizes, can transmit without impairing the characteristics of the process, in particular the real-time property of the transferred messages, and without installing and switching among multiple high-speed communication devices or replacing the method with another communication method.
A further object of the present invention is to realize a data processing apparatus, used in an interprocess communication scheme that has a plurality of processes and uses distributed shared memory, that can preferentially transfer data or messages requiring real-time performance.
Another object of the present invention is to offer the high-speed communication method using distributed shared memory as a distributed real-time communication function with a wider range of application.
[0006]
[Means for Solving the Problems]
To achieve the above objects, the present invention concerns a communication system using distributed shared memory in which a plurality of nodes are interconnected via a network. Each node comprises one or more processes that generate data or messages, a memory having a distributed shared memory used to share data with other nodes, control means that controls the writing of data to the distributed shared memory, and a data transfer device that transmits data written to the distributed shared memory to the network and receives data sent from the network. In each node, when data is written to the node's own distributed shared memory, its own data transfer device automatically transfers that data to the distributed shared memory of each destination node. The interprocess communication method of the invention, which uses this distributed shared memory, is characterized as follows.
Each node has a local buffer that temporarily stores data generated by a process. When the size of the data generated by a process is less than or equal to a predetermined data length, the control means writes the data directly to the distributed shared memory; when the size exceeds the predetermined data length, the control means first writes the data temporarily to the local buffer and then writes it to the distributed shared memory.
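The size-based routing stated above can be sketched in a few lines. The threshold value, the `Node` class, and the `drain` method are illustrative assumptions; the claim only specifies the comparison against a predetermined data length and the two write paths.

```python
from dataclasses import dataclass, field

THRESHOLD = 256  # hypothetical "predetermined data length", in bytes


@dataclass
class Node:
    dsm: list = field(default_factory=list)           # data that reached shared memory
    local_buffer: list = field(default_factory=list)  # large data held back for now

    def write(self, data: bytes) -> str:
        """Route one message by size and report which path it took."""
        if len(data) <= THRESHOLD:
            self.dsm.append(data)        # small: written directly to DSM
            return "dsm"
        self.local_buffer.append(data)   # large: parked in the local buffer first
        return "local_buffer"

    def drain(self) -> None:
        """Write held data to DSM afterwards, oldest first."""
        while self.local_buffer:
            self.dsm.append(self.local_buffer.pop(0))
```

A message of exactly `THRESHOLD` bytes takes the direct path, since the condition in the text is "less than or equal to"; only strictly larger messages are held.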
[0007]
In the present invention, the write control means compares each message received from a process with a predetermined message length (or data length). Data or messages no longer than the predetermined length are written directly to the distributed shared memory; a message that exceeds the predetermined length is written to a local buffer provided in the OS or AP layer and only afterwards written to the distributed shared memory. Under this control, messages with a high real-time level are written to the distributed shared memory preferentially. As a result, even when messages of various lengths are mixed, messages with strict real-time requirements can be sent preferentially without a complicated priority control mechanism. In this specification, "node" means a data processing apparatus having a communication function, and is understood to cover not only the case where it contains processes as applications but also a PM (processor module) and a processor.
[0008]
Furthermore, by giving the local buffer virtual addresses and providing a function that translates between them and addresses in the real distributed shared memory, the OS or AP layer can use the high-speed communication method through the same procedure as if it were accessing the real distributed shared memory. When the local buffer is allocated, it is allocated in batch transfer units whose size the OS determines according to the real-time requirement or message size of the process or message. This sets the amount of the message that is held, and hence the hold time, required for that process or message, and makes it easier for higher-priority processes or messages to be sent first. As a result, priority control of inter-node message transfer in interprocess communication using distributed shared memory can be achieved with only simple changes to the implementation. Here, when a message is temporarily accumulated in the local buffer, it is accumulated in segments of a predetermined fixed data size and written from the local buffer to the distributed shared memory one segment at a time; "batch transfer unit" means a unit of data of that fixed size.
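As a rough illustration of the batch transfer unit, the following sketch cuts a message into fixed-size segments. The function name `split_into_units` is hypothetical, and treating the last, shorter segment as its own unit is an assumption that the text does not settle.

```python
def split_into_units(message: bytes, unit_size: int) -> list:
    """Cut a message into consecutive segments of at most unit_size bytes."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [message[i:i + unit_size] for i in range(0, len(message), unit_size)]


units = split_into_units(b"abcdefghij", 4)
```

Each element of `units` would be accumulated in the local buffer and then written to distributed shared memory as one piece, so a larger `unit_size` means fewer, later writes.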
[0009]
When transmission is triggered by every area unit (MB) within a batch transfer unit having been accessed, the contents of the real shared memory are overwritten one batch transfer unit at a time (on reception, the corresponding overwrite takes place between the real shared memory and the local buffer contents), so the transfer is carried out without the process being aware of it. MB (Message Buffer) means the unit of area within the distributed shared memory into which data is written.
[0010]
In addition, by forcibly starting this overwriting from the process side and thereby synchronizing the local buffer with the real shared memory, the latest message can be read or written at any time, that is, communicated with the transfer destination node.
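The automatic overwrite of paragraph [0009] and the forced synchronization of paragraph [0010] can be sketched together as follows. This is a simplified model under stated assumptions: the class `BatchTransferUnit`, its method names, and the use of `bytearray` objects for the batch transfer unit and the shared memory area are hypothetical, and only the transmission direction is modeled.

```python
class BatchTransferUnit:
    """Hypothetical batch transfer unit made up of mb_count MBs of mb_size bytes."""

    def __init__(self, mb_count: int, mb_size: int):
        self.mb_count = mb_count
        self.mb_size = mb_size
        self.data = bytearray(mb_count * mb_size)
        self.accessed = set()              # indices of the MBs written so far

    def write_mb(self, index: int, payload: bytes, dsm: bytearray) -> bool:
        """Write one MB; return True if this write triggered the automatic overwrite."""
        if len(payload) > self.mb_size:
            raise ValueError("payload larger than one MB")
        start = index * self.mb_size
        self.data[start:start + len(payload)] = payload
        self.accessed.add(index)
        if len(self.accessed) == self.mb_count:   # all MBs accessed
            self.force_sync(dsm)
            return True
        return False

    def force_sync(self, dsm: bytearray) -> None:
        """Forced overwrite: copy the whole unit onto the shared memory area."""
        dsm[:len(self.data)] = self.data
        self.accessed.clear()
```

Calling `force_sync` before every MB has been written models the case where the process starts the overwrite itself so that the shared memory holds the latest contents.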
[0011]
In addition, by expressing the priority given by the OS through the scale of the batch transfer unit, both the hold time until a message is sent to the network and the OS priority are controlled with a single parameter. Because the process indirectly determines the value of this parameter, the process itself can control these priorities.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
The present invention addresses how to provide a high-speed communication method using a distributed shared memory efficiently according to the characteristics of a process, in particular the size of the messages it transmits and receives and the real-time requirements of those messages. (The process may be within either the AP or the OS.)
That is, in inter-process communication over a network, when the high-speed communication method is used as a more general communication means, the source processes do not always request transmission of messages with similar characteristics. It is therefore necessary to change the form in which the high-speed communication method is provided, and the type of method, according to the characteristics of the message. However, providing a plurality of communication methods and switching between them according to message characteristics has the disadvantage of enlarging the device scale. Therefore, in the present invention, a local buffer is provided in the OS or AP layer in the node, and the following are combined: temporary holding of large-scale or low real-time messages together with assignment by the OS of write priority to the distributed shared memory; local buffer control by the OS and forced operation from the process; setting of the area allocated in the local buffer, that is, the size of the batch transfer unit, according to the characteristics of the process or message; equal handling of the local buffer and the distributed shared memory through virtual addresses; and securing of the local buffer size by notifying the OS of the message size. As a result, it is possible to absorb the differences in performance due to message characteristics that often arise in high-speed communication systems, and to communicate various messages over the same high-speed communication system at all times. In addition, because this function is provided in the OS or AP layer, the features of conventional communication devices using distributed shared memory, such as the ability to communicate messages with a small delay without burdening the application or OS, are retained.
[0013]
Note that the present invention can also be applied to communication between processes in each node connected by a network. It therefore applies not only when a node is composed of one PM but also when it is composed of a plurality of PMs, and one or more processes may operate within each PM. In other words, the inter-process communication targeted by the present invention covers not only communication between processes in different nodes, but also communication between processes in different PMs within one node and communication between processes within one PM of one node, and the operation is the same in each case. In claim 1, PMs and nodes are collectively referred to as nodes. The targeted processes thus take several forms, but the same operations are performed for all of them.
[0014]
【Example】
Embodiments of the present invention will be described below in detail with reference to the drawings. FIG. 1 illustrates a general network model configuration using a high-speed communication method to which the present invention is applied, together with related examples. In FIG. 1, 11A is a communication device using distributed shared memory (DSM), which is a high-speed communication device, 12A is the OS included in the host processor, and 13A denotes processes that mainly communicate small messages. 11B is a large-scale message transfer device, 12B is the OS included in the host processor, and 13B denotes processes that communicate large-scale messages. High-speed communication devices are often applied on a per-AP basis, and the communication device is designed or implemented according to the message characteristics of the processes in the AP. Accordingly, 14A is communication device access from APs that generally communicate relatively short messages, and 14B is communication device access from APs that communicate large files. With the introduction of a local buffer (in 16C), processes that mainly perform small message communication and processes that perform large message transfer both communicate by accessing the DSM communication device 11C. FIG. 1 shows two types of AP, those that communicate small messages and those that communicate large messages, but the same holds when there are three or more types. In other words, efficient communication is provided by the same communication apparatus to a wide range of APs, from short message transmission to large-scale file transmission, without depending on the AP. In this specification, a “node” means a data processing apparatus that has a communication function and in which data is generated by one or more processes, and it includes one or more processor modules (PM: Processor Module).
That is, the present invention applies both when writes of data or messages to the distributed shared memory compete among a plurality of nodes and when writes to the distributed shared memory of data generated by a plurality of processes in the same node compete.
[0015]
FIG. 2 is a configuration diagram of nodes and a network having communication apparatus using a distributed shared memory to which the present invention is applied. Two nodes 26A and 26B are illustrated in FIG. 2, but the same holds for three or more. Each node mainly consists of a host processor (25A, 25B), a local memory (23A, 23B), a distributed shared memory (21A, 21B), and a device for transferring messages between distributed shared memories, such as a distributed memory coupler (22A, 22B). The host processor is assumed to run a plurality of processes. Local buffers 24A and 24B are placed in the local memories 23A and 23B. Reference numerals 27A and 27B denote the common paths between the processes and the distributed shared memory in the communication system using the distributed shared memory. These paths are where priority is controlled through the effect of the local buffer.
[0016]
Reference numerals 21D to 221D in FIG. 2 explain the communication apparatus using the distributed shared memory of the present invention from the viewpoint of software, focusing on the transmission operation. 21D is a node, 22D is the software part constituting the node, and 23D is the device part constituting the node. Also shown are the AP 24D (with a plurality of processes as part of the AP), the OS 25D, the local memory 26D, the local buffer 27D, the batch transfer unit 221D that can be variably allocated according to the process or message, the OS function 222D that manages the local buffer, the distributed shared memory 28D, the MB 210D, which is an element area of fixed scale allocated for each message transmission, the distributed memory coupler 29D, and the network 211D connecting the nodes.
[0017]
Conventionally, when the message 213D generated in the process 24D is written to the distributed shared memory 28D (215D), the distributed memory coupler 29D simultaneously detects this, automatically captures the message (216D) (in units of the MB 210D in this case), and sends it to the network 211D. For example, when this is used for IP communication, the destination node can be determined for each IP by a method such as using the ARP server of IP over ATM, and it is only necessary to assign a VC, so application to high-speed IP communication is also envisaged.
[0018]
When the local buffer of the present invention is introduced, the message 214D generated in the process 24D is, depending on the process or message characteristics (for example, when the real-time requirement is low), written to the batch transfer unit 221D allocated in the local buffer 27D (217D). When all the areas in the batch transfer unit 221D have been accessed (the batch transfer unit corresponds to the MB 210D of the distributed shared memory), its contents are written to the distributed shared memory 28D, and at the same time the distributed memory coupler 29D automatically captures them (219D) (in units of the MB 210D) and sends them to the network 211D.
[0019]
FIG. 3 shows an example memory configuration according to the present invention (21E to 28E). A node having this memory configuration is referred to as node A. Reference numeral 21E indicates the range of the local memory used in the present invention, 22E indicates the local buffer, and 23E indicates the distributed shared memory. The distributed shared memory 23E is divided into MBs indicated by reference numeral 26E, and the MBs are classified by direction from the own node to every node (or communication target) in the network. For example, when a message is written in the MB(s) for node A > node B, the message is sent from this node to node B by the distributed memory coupler. Since node B adopts the same memory structure, the message is written into the node A > node B MB in node B. 27E indicates an outgoing MB, and 26E indicates an incoming MB. This MB structure is one example of communication using a distributed shared memory. It is not a technical matter essential to the present invention and is described for reference. The local buffer indicated by reference numeral 22E is divided into batch transfer units, and at least one batch transfer unit is used in each transmission process that goes via the local buffer. The batch transfer unit is used while being updated sequentially, like a line. Based on the message size notified from the AP to the control means when the AP requests transmission, the control means (local buffer management function) determines the local buffer capacity used for that transmission, for example within the range of the double line in the figure (25E), so that the capacity of the entire local buffer 28E can be kept down without waste.
[0020]
FIG. 4 explains, along the time axis, the message communication method on the transmission side of the communication apparatus using the distributed shared memory of the present invention. Reference numerals 31A and 31E indicate nodes, 32A and 32E indicate AP layers, and 33A and 33E indicate the distributed shared memory and distributed memory coupler layers. Reference numerals 34A and 34E denote networks, 31B denotes the local buffer, 32B denotes the distributed memory coupler when the local buffer is used, and 33B denotes the network when the local buffer is used. First, conventional transmission will be described. Prior to sending a message, the process secures an MB, which is the write area of the distributed shared memory (it notifies the OS, the MB is secured, and the address is returned). The message generated in the process is written into the MB indicated by that address in the distributed shared memory, and the distributed memory coupler captures it in parallel and sends it to the network. After reception of the message is completed on the receiving side, a reception completion notification is returned to the transmitting side for each MB, and the MB is released on both the transmitting and receiving sides.
[0021]
Now, when the large-scale message transmission 35E is mixed in, the highly real-time message 36E generated in the process cannot be written to the distributed shared memory while the message 35E is being written (37E), and is written only after that write has finished (38E). As a result, it is transmitted on the network at time 311E (310E). When OS interrupts are introduced and 36E, which has a higher priority than 35E, writes to the distributed shared memory, the write of 35E is suspended by an interrupt and 36E is written preferentially. This case is shown by the large-scale message divided into 31D and 32D, where the highly real-time message (31C) is immediately written to the distributed shared memory as 32C.
[0022]
The following shows how the local buffer is used in this situation. When a process is activated, it notifies the local buffer management function (in FIG. 2, the OS function 222D that manages the local buffer, which is an extension of the distributed shared memory management function) of its message size. This notification may be made at any time, not only at process activation. Prior to writing to the distributed shared memory for transmission, the process asks the OS to secure an MB, but instead of reserving an MB on the distributed shared memory as in the conventional method, the OS reserves a batch transfer unit in the local buffer (this function within the OS is the local buffer management function). The local buffer management function determines the size of the batch transfer unit according to the message size notified in advance, and returns its address. When the real-time requirement is high, which in practice means a small message, the batch transfer unit size is set to 0, the message buffer area is not used, and the message is written directly to the distributed shared memory as in the conventional route (the address returned by the local buffer management function or the OS is the MB address on the distributed shared memory, as before). A highly real-time message 31C generated in the process is processed by the distributed shared memory and distributed memory coupler layers 33A and 33E in parallel with its generation and sent to the network 34A. When a series of messages 31D and 32D, which constitute a large-scale message transmission, is mixed in, they too are processed by the distributed shared memory and distributed memory coupler layers 33A and 33E and sent to the network 34A in parallel. 36A indicates the network becoming a bottleneck that obstructs transmission of the message generated from 31C. The message from 31C should be sent to the network from time 38A, but is instead sent from time 39A (message 37A).
This delay is eliminated by the following operation of the local buffer, which is described next. Time 310A is the time until the batch transfer unit allocated in the local buffer for the series of messages 31D and 32D becomes full.
[0023]
When the real-time message 31C and the large-scale message transmission 31D and 32D are mixed, the large-scale messages 31D and 32D are held in the local buffer 31B for time 310A and sent over the network from time 39A. The real-time message 31C can therefore be transmitted on the network at its original transmission time 38A without being obstructed by the network transmission of the large-scale messages 31D and 32D (35A). This works as follows. When an MB is allocated for a large message, the process obtains the address of a batch transfer unit in the local buffer and writes to that address. When the batch transfer unit becomes full, the OS detects this (for example, the MB write length counting function of the distributed memory coupler is extended to count the local buffer write length, and the OS is informed by an interrupt when the batch transfer unit becomes full). The OS then determines whether any small message, that is, a message whose batch transfer unit scale is 0, is waiting to be written to the distributed shared memory. If so, it suspends the local buffer writing of the large message, lets the small message be written to the distributed shared memory first, and resumes the local buffer writing of the large message after that completes (if the large message has already been written in full, writing of other messages is started). If the batch transfer unit size is set large, the time until the large message is written to the distributed shared memory becomes longer, that is, a larger number of real-time messages can be sent ahead while transmission of the large message is pending.
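The ordering that results from this control can be illustrated by a small simulation. The sketch below is written under simplifying assumptions: the function name `dsm_write_order` and the `small_arrivals` mapping are hypothetical, time is modeled only as "which batch transfer unit is currently being filled", and interrupts are not modeled.

```python
from collections import deque


def dsm_write_order(large_message: bytes, unit_size: int, small_arrivals: dict) -> list:
    """Return the order in which payloads are written to the shared memory.

    small_arrivals maps the index of a batch transfer unit to the small
    (real-time) messages that request transmission while that unit is
    being filled in the local buffer.
    """
    order = []
    waiting = deque()
    units = [large_message[i:i + unit_size]
             for i in range(0, len(large_message), unit_size)]
    for index, unit in enumerate(units):
        waiting.extend(small_arrivals.get(index, []))
        # The unit is now full: waiting small messages are written first.
        while waiting:
            order.append(("small", waiting.popleft()))
        # Only then is the held unit written from the local buffer.
        order.append(("large", unit))
    return order
```

With a larger `unit_size` the large message is held longer per unit, so more small messages can be written ahead of each held unit, which corresponds to the last sentence of the paragraph above.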
[0024]
An outline of an example including buffer management and the operation of the distributed memory coupler is shown in 31F to 319F. Reference numeral 31F indicates a processor module, 32F the application layer (the processes that transmit messages), 33F a local buffer management node, 34F the local buffer, 35F the distributed shared memory, 36F the distributed memory coupler, and 37F the network. The case shown is one where the large-scale message (318F) requests transmission first and the real-time message (319F) requests transmission while it is being processed. After the transmission request for the large message is made, the address of the local buffer is indicated and the message is written to the local buffer (38F). When the area defined as the batch transfer unit becomes full, the distributed memory coupler detects this (312F) and notifies the local buffer management function (313F). If there is a request to transmit a real-time message, the requesting process is instructed to write to the distributed shared memory, and the real-time message starts being written into the distributed shared memory (39F). When this writing is completed, the distributed memory coupler detects the completion (314F) and notifies the local buffer management function. The local buffer management function then starts writing the large-scale message held in the local buffer (by 38F) to the distributed shared memory (315F). The write is started (310F), and when the distributed memory coupler detects its completion (316F), it notifies the local buffer management function of the write completion. Whether the next message is a continuation of the large message or a new message, if it is large the local buffer management function directs it to the local buffer (317F), and the message is written to the local buffer (311F). If, on the other hand, it is a small message, a direct write to the distributed shared memory is instructed.
As a result, the real-time message is transmitted to the network immediately from the start of its transmission processing, without being obstructed by the transmission processing of the large-scale message.
[0025]
FIG. 5 explains the message communication method on the receiving side of a communication apparatus using a distributed shared memory when the present invention is applied. In this figure, the flow and direction of information are indicated by arrows, messages by solid lines, and signals by broken lines. The upper part, denoted by reference numerals 41B to 46B, explains the receiving side of a conventional communication apparatus using a distributed shared memory, without the present invention. 43B indicates the distributed memory coupler, 44B a process, and 41B the arrival of a message from the network. When a large-scale message is sent to the network, it is subdivided into the transfer unit of the network side or the transmission unit of the distributed memory coupler, so the large-scale message arrives subdivided as in 41B. As soon as a subdivided message arrives, it is written to the area of the memory in the node allocated as distributed shared memory, where the process can read and write it. The process recognizes the arrival (45B) and performs reading, such as reading the pieces in order or transferring them to an area managed by the process (46B). This reading is performed at time interval 42B, which corresponds to the transfer unit on the network side or the transmission unit of the distributed memory coupler. That is, the process must run at intervals of 42B in order to handle units subdivided in a way unrelated to the process side.
[0026]
The lower part of FIG. 5, denoted by reference numerals 41A to 47A, explains the message communication method on the receiving side of a communication apparatus using a distributed shared memory when the present invention is used. Reference numeral 43A denotes the distributed memory coupler, 44 the local buffer (a batch transfer unit within it), 45A a process, and 41A the arrival of a message from the network. When a large-scale message is transmitted to the network, it is divided into the transmission unit of the network side or of the distributed memory coupler, so the large-scale message arrives divided as in 41A. As soon as a subdivided message arrives, it is written into the area of the memory in the node allocated as distributed shared memory (48A), but it is first read into the local buffer (the batch transfer unit within it) rather than by the process. Only when the batch transfer unit is full (all the areas corresponding to the relevant MBs in the distributed shared memory have been written) is the process notified of message arrival (46A), and the process then reads the message (47A). The size of the batch transfer unit is determined according to the process (or message characteristics) and is not a subdivision unit dependent on the network or the distributed memory coupler, and the process reads at interval 42A (assuming a scale at which a large-scale message is divided into several parts). Therefore, unlike the conventional method, the process need not start processing at the subdivision interval (42B) dependent on the network or the distributed memory coupler. Processing starts at intervals (42A) determined by process-side characteristics such as the message processing time within the process, that is, at intervals wider than 42B and in process-dependent subdivision units. This is because the OS allocates the size of the batch transfer unit according to the message size that the process notifies to the OS when securing the area prior to communication, so that this size (the reading interval, in terms of time) is indirectly determined by the process.
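The difference between the reading intervals 42B and 42A can be illustrated by counting how often the receiving process is notified. The following Python sketch is illustrative only: the function name `process_notifications` is hypothetical, sizes are in bytes by assumption, and the delivery of a final partly filled unit is an assumption of the sketch rather than something stated in the specification.

```python
def process_notifications(fragment_sizes, unit_size=None):
    """Count message-arrival notifications given to the receiving process.

    fragment_sizes: sizes of the subdivided pieces arriving from the network.
    unit_size: size of the batch transfer unit, or None for the conventional
    scheme in which every arriving fragment is signaled to the process.
    """
    if unit_size is None:
        return len(fragment_sizes)       # conventional: one notification per fragment
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    count = 0
    filled = 0
    for size in fragment_sizes:
        filled += size
        while filled >= unit_size:       # a batch transfer unit has become full
            filled -= unit_size
            count += 1
    if filled:
        count += 1                       # assumption: the last partial unit is also delivered
    return count
```

For sixteen 256-byte fragments, the conventional scheme gives sixteen notifications, while a 1024-byte batch transfer unit gives four.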
[0027]
The local buffer (in 16C, 24A, 24B, 31B, 44B) is given virtual addresses corresponding to the distributed shared memory (23A, 23B), and the local buffer management function 222D is provided with a virtual-to-real address conversion function and a message direct access command for the local buffer. Through the virtual addresses, the application program or OS accesses the message buffer area as if it were the distributed shared memory. When a process writes to a distributed shared memory area, that is, an MB, it actually writes to the batch transfer unit allocated in the local buffer. When all MBs of the batch transfer unit have been written, the contents are written to the corresponding area of the distributed shared memory by the distributed memory coupler (22B, 29D) and the distributed shared memory access function of the OS (25D) (for example, function [1] etc.), with almost the same operation as when a process writes to the DSM in the conventional system. However, if the application program, the OS, or a process issues a direct write command, the entire corresponding area of the distributed shared memory is overwritten with the batch transfer unit even if not all MBs of the batch transfer unit have been written.
[0028]
Here, the “virtual address” is provided so that the process sees the same interface regardless of whether the local buffer is used. Even if virtual addresses are not used, the processing by which real-time messages are transmitted ahead through the introduction of the local buffer is unchanged. When virtual addresses are used, the memory write address that the buffer management function or the OS designates to the process is a virtual address instead of a real address. When the process writes, the buffer management function or the OS converts the address specified as the write destination to an address in the local buffer if the message goes via the buffer, and to an address in the distributed shared memory if it does not, and the message is written at that address. There is also a mode of use in which, when the distributed shared memory is designated, the virtual address is the same as the real address. In this case only local buffer addresses are designated by virtual address. If the distributed shared memory is to have the same addresses shared by all nodes, virtual addresses can be used instead of real addresses in each node, so that even if the real addresses differ from node to node, it appears from the process side as if the same addresses were shared. If the local buffer and the distributed shared memory are placed in the same local memory, such a virtual address assigning means can readily serve as the virtual address assigning means of the present invention, and addressing can easily be unified regardless of whether the local buffer is used.
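A minimal sketch of the virtual-to-real address conversion is given below. The function name `translate`, the flat integer address model, and the base addresses `buffer_base` and `dsm_base` are assumptions chosen for illustration. The specification does not define the address layout.

```python
def translate(virtual_addr: int, via_local_buffer: bool,
              buffer_base: int = 0x8000, dsm_base: int = 0x0) -> int:
    """Convert a virtual MB address into a real address.

    The process always uses the same virtual address. The buffer
    management function (or OS) maps it to the local buffer when the
    message goes via the buffer, and to the shared memory otherwise.
    """
    if virtual_addr < 0:
        raise ValueError("virtual_addr must be non-negative")
    base = buffer_base if via_local_buffer else dsm_base
    return base + virtual_addr
```

With the default `dsm_base` of 0, the virtual address equals the real address whenever the distributed shared memory is designated, which corresponds to the mode of use mentioned above in which only local buffer addresses are translated.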
[0029]
Similarly, for reading, when a process reads a distributed shared memory area, that is, an MB, the contents of the batch transfer unit allocated in the local buffer are read. However, when the application program, the OS, or a process issues a direct read command, the batch transfer unit is overwritten with the corresponding area of the distributed shared memory. This message direct access command can forcibly synchronize the contents of the batch transfer unit with the real distributed shared memory on each write or read, so that the local buffer or the real distributed shared memory can be brought up to date at any time. Note that, to prevent a batch transfer unit from being overwritten by the content of a subsequent message before its message has been synchronized with the distributed shared memory, has arrived at the other party, and has been processed on reception, the batch transfer units are used while incrementing the address within the local buffer. Therefore, the maximum area needed for batch transfer units is the area taken for one node on the distributed shared memory, that is, about the MB area for one node. For example, if there is only one type of process that sends large-scale messages, 256 buffers of 256 sources may be used, and 64 KB is sufficient. Moreover, since allocation is needed only for processes that send large-scale messages, and only for as many processes as transmit simultaneously, the size can be considerably smaller than the distributed shared memory. The capacity of the local buffer is therefore small relative to the memory and does not affect the application program or the OS.
[0030]
Since one MB is used for a single message transmission request from a process, making the MB larger than necessary relative to the assumed message size leaves part of each MB wasted and lowers the usage rate of the distributed shared memory. Therefore, the distributed shared memory is subdivided into MBs of a relatively small size matched to the normally assumed message size, and all MBs are given the same size to avoid complicating MB management. For a mix of large and small messages, one could, for example, set up large and small MBs, use large MBs for large messages and small MBs for small messages, and give small messages transmission priority by temporarily making writes to the distributed shared memory wait when a large MB is selected, but this is undesirable because it complicates MB management. Large-scale message transfer generally does not have strong real-time requirements. When it is mixed with small-scale message transfer that requires high speed and real-time behavior, such as inter-node control messages, the waiting for transmission in the network that occurs during the large-scale transfer obstructs the high-speed transfer that requires real-time performance. In FIG. 3, since the large-scale message 31D requires time 36D for transmission due to delay in the network, transmission of the small, highly real-time message 31C is pushed back to 37A. If the size of the batch transfer unit is set based on the size of the message to be sent, transmission of 31D waits until the batch transfer unit is filled at time 310A, and during that time 31C can be sent at 35A without waiting. On the receiving side, the batch transfer unit is filled by the arrival of a certain portion of the message, so reading is performed in response to detection by the OS each time such a portion arrives.
[0031]
In FIG. 5, the conventional process 44B needs to run message reading processing at intervals of 42B, and the size read is the message unit that the communication device using the distributed shared memory communicates at one time. With the present invention, the process 45A only needs to run a single reading process at intervals of 42A, that is, once per stretch of message arrival, and the size read equals what the communication device using the distributed shared memory communicates over a plurality of times.
[0032]
FIGS. 6 to 8 are flowcharts of the shared memory write operation, that is, the transmission operation to the partner node, of the priority control method in inter-process communication using the distributed shared memory according to the embodiment of the present invention. The system components are described separately as the application program and OS (operating system), which are software, and the local memory and transmission device, which are hardware. For the application program, a process that sends a large message and a process that sends a small message are shown. The OS is represented by the local buffer management function (abbreviated as BAM), the local memory consists of the local buffer (abbreviated as BA) and the distributed shared memory (abbreviated as DSM), and the transmission device is assumed to be, for example, a distributed memory coupler.
[0033]
FIGS. 6 to 8 show a case where a process in the transmitting node that sends a large message makes its transmission request first. When a small message transmission request is made first, the message is transmitted immediately without going through the local buffer. That is, after the determination “message size > boundary”, the flow proceeds to *3, which is almost the same as the transmission mode without the mechanisms of the present invention such as the local buffer and its management function. This case is also shown in FIG. 6. In FIG. 6, the transmission request from the process, and the queue structure from which the OS (local buffer management function) fetches requests, include at least the process name and message size information. “Boundary” means the upper limit of the message scale handled as a real-time message. This upper limit is a value set in the OS (local buffer management function) before transmission processing starts. * 4 The same method may be used.
[0034]
When a process in the application program notifies the local buffer management function of a transmission request for a large message (this process is labeled the large message process in FIG. 6), the notification includes the message size (the size of the generated data or message). The local buffer management function, which has been waiting for transmission requests from processes, determines whether the message size exceeds the boundary value. If it does, the message is handled as a large message: the size of the batch transfer unit to allocate in the local buffer (the predetermined buffer capacity) and the address of the batch transfer unit are determined and notified to the large message process. If the boundary is not exceeded, the message is handled as a small message, that is, a real-time message. In that case, from *3 in FIG. 6, the MB address in the distributed shared memory is determined and notified to the process that requested the small message (labeled the small message process), the process writes the message into the MB, the distributed memory coupler transmits the message to the destination process of the partner node in parallel with the writing, and the local buffer management function returns to waiting for transmission requests from processes. The large message process writes its message to the address of the notified batch transfer unit and continues storing the message in the local buffer until the writes fill the batch transfer unit. The batch transfer unit scale is the value determined by the local buffer management function based on the data size notified from the process.
In the distributed memory coupler, whether the data length in MB has reached a predetermined scale ( Similarly, in general message transmission devices, there is a function to monitor whether the data length in the buffer corresponding to the transmission buffer has reached a predetermined scale) and to notify the OS when it reaches the scale, and the local buffer management function The determined batch transfer unit scale is set in a memory area that can be referred to by the distributed memory coupler. The distributed memory coupler monitors whether the value of the data written in the local buffer on the local memory as in the distributed shared memory has reached this value, and if it detects that the batch transfer unit is full, the local buffer management function The local buffer management function is transferred to the local buffer management function, and the large-scale message process interrupts writing to the local buffer.
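The staging of a large message in batch transfer units can be illustrated with a minimal Python sketch. This is not part of the original disclosure: the function name `batch_units` and the modelling of a message as a byte string are assumptions made for illustration. Each yielded chunk corresponds to one point at which the batch transfer unit would be detected as full and the large message process would interrupt writing.

```python
from typing import Iterator


def batch_units(message: bytes, unit_size: int) -> Iterator[bytes]:
    """Split a large message into batch transfer units (hypothetical helper).

    Each yielded chunk models one fill of the batch transfer unit in the
    local buffer. The final chunk may be shorter than unit_size.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    for offset in range(0, len(message), unit_size):
        yield message[offset:offset + unit_size]


if __name__ == "__main__":
    for unit in batch_units(b"x" * 10, 4):
        print(len(unit))
```

In this model the points between yielded chunks are where transmission requests from other processes could be accepted.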
[0035]
Whether or not to transfer via the local buffer is determined based on whether the message size notified from the process to the local buffer management function exceeds a predetermined data length, that is, the boundary. The process can also notify the local buffer management function of a command or a bit string indicating whether the message it requests to transfer is to be handled as a priority message. Furthermore, the means for determining whether the size of the generated message to be transferred exceeds the predetermined data length can be provided not only in the local buffer management function, that is, the control means, but also in the process itself. The process itself can therefore determine whether the message it generates is written to the local buffer or transferred directly to the distributed shared memory.
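The routing decision described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `Route`, `SendRequest`, and `choose_route` are hypothetical, and the priority indication is modelled as a simple boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DIRECT_TO_DSM = "direct"        # write straight into a message buffer (MB)
    VIA_LOCAL_BUFFER = "buffered"   # stage in the local buffer first


@dataclass(frozen=True)
class SendRequest:
    """Transmission request; field names are assumptions for illustration."""
    process_name: str
    message_size: int        # size notified by the process
    priority: bool = False   # optional "handle as priority message" indication


def choose_route(request: SendRequest, boundary: int) -> Route:
    """Pick the write path for a request.

    Messages at or below the boundary, or flagged as priority, go directly
    to the distributed shared memory; all others go via the local buffer.
    """
    if request.priority or request.message_size <= boundary:
        return Route.DIRECT_TO_DSM
    return Route.VIA_LOCAL_BUFFER


if __name__ == "__main__":
    print(choose_route(SendRequest("control", 64), boundary=256))
    print(choose_route(SendRequest("file_copy", 1_000_000), boundary=256))
```

Because the function depends only on the request and the boundary, the same check could be placed in the control means or in the process itself, as the paragraph above allows.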
[0036]
The size of the batch transfer unit for transferring the data written in the local buffer to the distributed shared memory, that is, the predetermined buffer capacity, can be determined by the local buffer management function (control means) based on the generated message size. It can take a value independent of the boundary value (predetermined data length), but it can also be determined depending on the boundary value; for example, it can be set equal to the boundary value.
[0037]
The method of step *1 in FIGS. 6 to 8 differs depending on the functions of the OS. FIGS. 5 to 7 describe a case where the operation is easy to illustrate. After the large-scale message process interrupts writing, the local buffer management function waits for transmission requests from processes for a certain period of time. From the processes that request transmission within this time, a message whose size is less than or equal to the boundary value is selected, that is, the message is treated as a real-time message, and an MB in the distributed shared memory is allocated. The MB address is notified to the process that sends the message (small message process in FIG. 5), the process writes the message to the specified MB in the distributed shared memory, and the distributed memory coupler sends the message to the destination process of the partner node in parallel with the writing. When the small message process has finished writing to the distributed shared memory, the local buffer management function writes the large message data held in the batch transfer unit of the local buffer to the distributed shared memory, and the data is sent in parallel to the destination process of the partner node. Subsequently, if part of the suspended large-scale message remains, the large-scale message process is instructed to resume writing to the local buffer.
[0038]
If a large message is being written to the local buffer, the write continues until the batch transfer unit is full. When it is full, the distributed memory coupler detects this and notifies the local buffer management function by an interrupt, control passes to the local buffer management function, and the large message process stops writing to the local buffer. That is, the flow returns to *2 of FIG. 6. Incidentally, *1 of FIG. 6 also covers the following case in addition to the one described above. While a process is passing a transmission message to a program or the OS for transmission processing, the OS may in general have scheduling, driven by a timer or the like in the OS, that accepts transmission requests from other processes. In this case, while a large message process is writing a message to the local buffer, that is, at *2 of FIGS. 6 to 8, a transmission request from another process is accepted by the local buffer management function. A message whose size is equal to or smaller than the boundary value is selected, that is, the message is treated as a real-time message, and an MB in the distributed shared memory is allocated. Thereafter, the operation continues from the location indicated by the above-mentioned “[#]”. Note that the transmitting process is not limited to an application process; the operation is the same for a process in the OS.
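The resulting order of writes to the distributed shared memory can be sketched with a small Python simulation. It is a simplified model, not the flowchart of FIGS. 6 to 8: the function `transmission_order`, the `small_arrivals` mapping (unit index to the small messages requested while that unit was being filled or held), and the flushing of a final partial unit are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def transmission_order(
    large_message: bytes,
    unit_size: int,
    small_arrivals: Dict[int, List[bytes]],
) -> List[Tuple[str, bytes]]:
    """Return the order in which data reaches the distributed shared memory.

    For each batch transfer unit of the large message, small (real-time)
    messages requested while that unit was held in the local buffer are
    written first; only then is the held unit written.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    order: List[Tuple[str, bytes]] = []
    offsets = range(0, len(large_message), unit_size)
    for index, offset in enumerate(offsets):
        unit = large_message[offset:offset + unit_size]
        # Small messages go ahead of the unit waiting in the local buffer.
        for small in small_arrivals.get(index, []):
            order.append(("small", small))
        # Then the held batch transfer unit is written to the shared memory.
        order.append(("large", unit))
    return order


if __name__ == "__main__":
    for kind, data in transmission_order(b"AAAABBBBCC", 4, {0: [b"s1"]}):
        print(kind, data)
```

The sketch shows why a real-time message waits for at most one batch transfer unit rather than for the whole large message.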
[0039]
The following (i) to (iii) are also shown as advantages of the present invention from the embodiments shown in the figures. (i) A large message is stored in the local buffer one batch transfer unit of a predetermined data capacity at a time, and its transmission waits there; small messages generated in the meantime are sent ahead of the large message. (ii) If the batch transfer unit is large, the time until the unit becomes full increases, and the proportion of small messages that go ahead increases accordingly. (iii) The batch transfer unit size indicates not only a size but also, by whether or not it is zero, the presence or absence of the real-time property, so the priority can be controlled with one parameter. Moreover, since the parameter value is determined by the size of the message transmitted at one time, the priority can be controlled indirectly on the application side.
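Advantage (iii), control through a single parameter, can be illustrated with the following minimal Python sketch. The policy shown is only one possibility: the function names and the cap `max_unit` are hypothetical and do not appear in the original text, which leaves the exact rule for choosing the unit size open.

```python
def batch_unit_size(message_size: int, boundary: int, max_unit: int) -> int:
    """Choose a batch transfer unit size from the notified message size.

    A result of 0 marks the message as real-time (no local buffering).
    max_unit is an assumed cap on the local buffer area, for illustration.
    """
    if message_size <= boundary:
        return 0
    return min(message_size, max_unit)


def is_real_time(unit_size: int) -> bool:
    """A zero batch transfer unit size means the message bypasses the buffer."""
    return unit_size == 0


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        unit = batch_unit_size(size, boundary=256, max_unit=4096)
        print(size, unit, is_real_time(unit))
```

Since the unit size is derived from the message size the application notifies, the application influences the priority indirectly, as stated above.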
[0040]
FIG. 9 is a flowchart of the shared memory read operation, or the reception operation from the partner node, of the priority control method in interprocess communication using the distributed shared memory according to an embodiment of the present invention. In this flow, the 2 steps of a relate to the process reading arrived messages, and the 9 steps of b relate to reading into the buffer, which the local buffer management function performs independently of the process.
[0041]
The size of the message to be received is notified from the process in advance to the local buffer management function (222D) in the OS at process activation or at the start of transmission, and the local buffer management function determines the batch transfer unit size based on that value. When a message starts to arrive from the sending node, it is written to the distributed shared memory in parallel with its arrival, and the arrival is notified to the local buffer management function. The local buffer management function reads the message from the distributed shared memory into the local buffer, and when the batch transfer unit size is reached, it notifies the process that the size has been reached, together with the corresponding MB address in the distributed shared memory. The process specifies the MB in which the message is stored, and the contents of the local buffer corresponding to the specified MB, that is, the batch transfer unit, are sent to the process all at once. Although the batch sending may in practice be a repetitive operation as shown in FIG. 7c, the process sees it as a single transfer operation. When the batch transfer unit has been sent, the local buffer management function updates the address of the batch transfer unit in order to read the subsequent message data from the distributed shared memory. Reading into the local buffer and sending to the process are repeated until all messages in the MB have been read into the local buffer. When all the messages in the MB have been read, the reception operation of the local buffer management function ends.
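The receive-side batching can be sketched in Python as follows. This is a simplified model under stated assumptions: the generator name `deliver_in_batches` is hypothetical, arriving data is modelled as an iterable of byte chunks read from the distributed shared memory, and a final partial unit is delivered once everything has been read.

```python
from typing import Iterable, Iterator


def deliver_in_batches(
    arriving_chunks: Iterable[bytes], unit_size: int
) -> Iterator[bytes]:
    """Accumulate arriving data in a local buffer and hand it to the
    process one batch transfer unit at a time (hypothetical helper)."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    local_buffer = bytearray()
    for chunk in arriving_chunks:
        # Read newly arrived data from the shared memory into the buffer.
        local_buffer.extend(chunk)
        # Each time a full unit is available, send it to the process at once.
        while len(local_buffer) >= unit_size:
            yield bytes(local_buffer[:unit_size])
            del local_buffer[:unit_size]
    # All data has been read; deliver whatever remains.
    if local_buffer:
        yield bytes(local_buffer)


if __name__ == "__main__":
    for unit in deliver_in_batches([b"ab", b"cde", b"f", b"g"], 3):
        print(unit)
```

The process is interrupted once per batch transfer unit instead of once per arriving chunk, which is the effect the paragraph above describes.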
[0042]
In addition to the local buffer management function notifying the process that the batch transfer unit size has been reached, the process may start the series of operations by a direct call instruction, that is, regardless of notification from the local buffer management function side. This makes it possible to instruct message reading autonomously from the process side. The write operation from a process to the distributed shared memory, or the transmission operation to a partner process, can be regarded as almost the reverse operation. Note that the transmission side and the reception side are independent with respect to the local buffer; this function operates, and is effective, even if it is provided only on the transmission side or only on the reception side. On the sending side, it prevents the transmission of real-time messages from being hindered by a large-scale message when the network is a bottleneck. On the receiving side, it avoids the similar phenomenon in which a large-scale message hinders the reception of a real-time message, which occurs in a device that transfers messages between distributed shared memories when the reading speed on the process side becomes a bottleneck relative to the writing speed to the distributed shared memory.
[0043]
The present invention is not limited to the above-described embodiments, and various changes and modifications can be made. For example, the logic for determining that the batch transfer unit size is 0 may or may not be applied to a large-scale message. Whether a message is treated as a real-time message can be changed by changing the determination criterion, and the present invention is not limited to always handling a small message as a real-time message.
[0044]
[Effect of the invention]
As described above, according to the present invention, the real-time property of inter-node communication can be ensured according to the characteristics of the messages generated by a process merely by designating the size of the local buffer area to be used. With simple modifications, namely (a) adding a buffer (and a DSM write instruction from the buffer), (b) adding an AP interface for notifying the batch transfer unit size, and (c) extending the DMC's transmission message length counting to the local buffer, the real-time property of inter-node communication can be ensured by a single communication device using distributed shared memory, regardless of the process.
[Brief description of the drawings]
FIG. 1 is a diagram showing an arrangement of a network model configuration using a general high-speed communication method to which the present invention is applied and an example of a relationship, and particularly shows an access mode to an AP or process communication device.
FIG. 2 is a diagram showing a node and network configuration having a communication apparatus using a distributed shared memory according to the present invention.
FIG. 3 is a diagram showing an example of a memory structure.
FIG. 4 is a diagram for explaining a message communication method over time on a transmission side of a communication apparatus using a distributed shared memory according to the present invention.
FIG. 5 is a diagram showing a message communication method on the receiving side of a communication apparatus using a distributed shared memory.
FIG. 6 is a diagram showing a part of a flowchart of a shared memory write operation using a local buffer or a transmission operation to a partner process on the transmission side of a communication apparatus using a distributed shared memory according to the present invention.
FIG. 7 is a diagram showing a part of a flowchart of a shared memory write operation using a local buffer or a transmission operation to a partner process on the transmission side of a communication apparatus using a distributed shared memory according to the present invention.
FIG. 8 is a diagram showing a part of a flowchart of a shared memory write operation using a local buffer or a transmission operation to a partner process on the transmission side of a communication apparatus using a distributed shared memory according to the present invention.
FIG. 9 is a flowchart of a shared memory read operation using a local buffer or a reception operation from a partner process on the reception side of a communication device using a distributed shared memory according to the present invention.
[Explanation of symbols]
11A, 11B, 11C communication equipment
12A, 12B, 12C OS
13A, 13B, 13C, 15C APs
14A, 14B Conventional access mode
14C Access form according to the present invention
16C local buffer
21A, 21B, 28D Distributed shared memory
21C, 211D network
22A, 22B, 29D Device for transferring messages between distributed shared memories (distributed memory coupler)
23A, 23B local memory
24A, 24B, 27D local buffer
25A, 25B processor
26A, 26B, 21D node (processor module)
27A, 27B common road
22D software layer
23D equipment
24D application program
25D OS
26D local memory
210D Message buffer (MB)
212D, 220D Messages sent to the network
213D, 214D Messages generated by 24D
215D Message written to 28D
213D, 216D Messages automatically detected by the distributed memory coupler
213D, 217D Message written to 27D
214D, 218D Messages written to 28D
214D, 219D Messages automatically detected by the distributed memory coupler
214D, 221D Batch transfer unit
222D OS function to manage local buffers (local buffer management function)
31A, 31E processor module
32A, 32E application layer
33A, 33E Distributed shared memory and distributed memory coupler layer
34A, 34E network
35A Real-time messages that can be sent over the network at the original transmission time
36A, 39E Large messages sent over the network
37A Real-time message kept waiting because the network is a bottleneck
38A Original transmission time of highly real-time messages
39A Highly real-time message transmission time when the network becomes a bottleneck and waits
310A Time when a large message is held in the local buffer
31B Local buffer
32B Distributed memory coupler when using local buffer
33B Network when using local buffer
31C, 36E Transmission of real-time messages in the application
31D, 32D, 35E Transmission of large messages in the application
37E Large message written to distributed shared memory
38E Real-time message written to distributed shared memory
310E A real-time message sent over a network that has been waiting because a large message is written to distributed shared memory
311E Real-time message transmission time when a large message is written to distributed shared memory
41A Message arrival interval from source node
41B Message arrival interval from the source node
42A interval between the first notification of a message and the time it is read by the process
42B interval between message arrival notification and read into process
43A Distributed Shared Memory in Communication Equipment Using Distributed Shared Memory
43B Distributed shared memory in communication devices using distributed shared memory
44A local buffer
44B, 45A process
45B Message arrival notification sent to a process from a communication device that uses distributed shared memory each time a message arrives
46A Transmission arrival notification sent to a process from a communication device using distributed shared memory when the head of a series of messages arrives
46B Message read into the process from the distributed shared memory in the communication device using the distributed shared memory
47A Message read into process from local buffer

Claims

1. An interprocess communication method using a distributed shared memory, in which a plurality of nodes are interconnected via a network, each node comprising one or more processes that generate data or messages, a memory having a distributed shared memory used to share data with other nodes, control means for controlling writing of data to the distributed shared memory, and a data transfer device for transmitting data written to the distributed shared memory to the network and receiving data sent from the network, and in which, when data is written to the distributed shared memory of a node, the data transfer device of that node automatically transfers the data to the distributed shared memory of each destination node,
wherein each node has a local buffer for temporarily storing data generated by a process, and the control means writes the data directly to the distributed shared memory when the data size of the data generated by the process is equal to or less than a predetermined data length, and, when the data size of the generated data exceeds the predetermined data length, temporarily writes the data to the local buffer and then writes it to the distributed shared memory.

2. The interprocess communication method using a distributed shared memory according to claim 1, wherein the process notifies the control means of the data size of the generated data, and the control means determines whether the notified data size exceeds the predetermined data length and performs control so that the data is written to the local buffer or the distributed shared memory based on the determination result.

3. The interprocess communication method using a distributed shared memory according to claim 2, wherein, during a period in which data generated by a process is being written to the local buffer, the control means performs control so that small-scale data generated by another process and having a data size equal to or smaller than the predetermined data length is written directly to the distributed shared memory.

4. The interprocess communication method using a distributed shared memory according to claim 1, wherein, when the data generated by the process exceeds the predetermined data length, the control means writes the data to the local buffer, and, when the written data reaches a buffer capacity of a predetermined size, performs write control so that the written data of the predetermined buffer capacity is collectively written to the distributed shared memory.

5. The interprocess communication method using a distributed shared memory according to claim 4, wherein the control means determines the size of the predetermined buffer capacity to be transferred in a batch according to the data size notified from the process.

6. The interprocess communication method using a distributed shared memory according to any one of claims 1 to 5, wherein the control means determines the capacity of the local buffer used for writing based on the data size of the data to be transferred received from the process, and secures a write area in the local buffer.

7. The interprocess communication method using a distributed shared memory according to claim 1, wherein, when the control means receives a data write request from a process, it gives information specifying a write area in the distributed shared memory if the data size of the data is equal to or less than the predetermined data length, and gives information specifying a write area in the local buffer if the data size exceeds the predetermined data length.

8. The interprocess communication method using a distributed shared memory according to any one of claims 1 to 7, wherein the process has a function of generating a command indicating that data should be preferentially written to the distributed shared memory, and the control means performs control so that data designated by the command for preferential writing is written directly to the distributed shared memory regardless of its data size.

9. The interprocess communication method using a distributed shared memory according to any one of claims 1 to 8, wherein the local buffer is provided in an OS or an AP layer.

10. The interprocess communication method using a distributed shared memory according to any one of claims 1 to 9, wherein the control means is provided in an OS or an AP layer.

11. The interprocess communication method using a distributed shared memory according to any one of claims 1 to 10, wherein the data received by the data transfer device is stored in the local buffer and then transferred to the process.

12. A data processing apparatus having a communication function and used in an inter-node communication system using a distributed shared memory, comprising:
one or more processes that generate data, a memory having a distributed shared memory used to share data with other nodes, control means for controlling the writing of data to the distributed shared memory, a data transfer device for transmitting data written in the distributed shared memory to the network and receiving data sent from the network, and a local buffer for temporarily storing the data generated by the process,
wherein the control means performs control so that the data is written directly to the distributed shared memory when the data size of the data generated by the process is equal to or smaller than a predetermined data length, and, when the data size of the generated data exceeds the predetermined data length, the data is temporarily written to the local buffer and then written to the distributed shared memory.

13. The data processing apparatus according to claim 12, wherein the process notifies the control means of the data size of the generated data, and the control means determines whether or not the notified data size exceeds a predetermined data length. A data processing apparatus that controls to write data to a local buffer or a distributed shared memory based on the determination result.

14. The data processing apparatus according to claim 13, wherein, during a period in which data generated by a process is being written to the local buffer, the control means performs control so that small-scale data generated by another process and having a data size equal to or smaller than the predetermined data length is written directly to the distributed shared memory.

15. The data processing apparatus according to claim 12, wherein, when the data generated by the process exceeds the predetermined data length, the control means writes the data to the local buffer, and, when the written data reaches a buffer capacity of a predetermined size, performs control so that the written data of the predetermined buffer capacity is collectively written to the distributed shared memory.

16. The data processing apparatus according to claim 15, wherein the control means determines a size of the predetermined buffer capacity to be collectively transferred according to a data size notified from a process.

17. The data processing apparatus according to claim 12, wherein the control means determines the capacity of the local buffer used for writing based on the data size of the data to be transferred received from a process, and secures a write area in the local buffer.

18. The data processing apparatus according to claim 12, wherein, when the control means receives a data write request from a process, it gives information specifying a write area in the distributed shared memory if the data size of the data is equal to or less than the predetermined data length, and gives information specifying a write area in the local buffer if the data size exceeds the predetermined data length.

19. The data processing device according to claim 12, wherein the process has a function of generating a command indicating that the process should be preferentially written to the distributed shared memory, and the control unit includes: A data processing apparatus, wherein the data designated by the command to be preferentially written is controlled so that the data is directly written into the distributed shared memory regardless of the data size.

21. A data processing apparatus according to claim 12, wherein said control means is provided in an OS or an AP layer.

22. The data processing apparatus according to any one of claims 12 to 21, wherein the data received by the data transfer device is stored in the local buffer and then transferred to a process.