JP3872034B2 - Multiprocessor system, data processing method, data processing system, computer program, semiconductor device

Info

Publication number: JP3872034B2
Application number: JP2003106871A
Authority: JP
Inventors: 佐々木 伸夫 (Nobuo Sasaki)
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2000-09-27
Filing date: 2003-04-10
Publication date: 2007-01-24
Anticipated expiration: 2021-09-21
Also published as: JP2004005572A

Description

【０００１】
【発明の属する技術分野】
本発明は、複数のデータ処理手段によりデータ処理を行うデータ処理システム、例えばマルチプロセッサシステム及びデータ処理方法に関する。
【０００２】
【発明の背景】
高度情報化社会が進み、コンピュータ等のデータ処理装置によるデータ処理量は増大する傾向にある。また、データ処理の内容も複雑化、高度化している。従来、ＣＰＵ（Central Processing Unit）などのプロセッサの高性能化や、複数のプロセッサによるマルチプロセッサ化により、データ処理装置全体の処理能力の向上を図っている。
しかし、近年、要求されるデータ処理能力の増大のスピードは、プロセッサの高性能化のスピードを凌駕するまでになっている。プロセッサの高性能化は、その開発期間が長いこともあり一朝一夕に行えるものではない。
一方、例えばマルチプロセッサによるデータ処理能力は、使用するプロセッサの数や、その処理方法により決まり、個々のプロセッサの高性能化への依存度が小さい。そのために、データ処理装置の処理能力を向上させるための有効な手段の一つとなっている。
【０００３】
マルチプロセッサによるデータ処理方法を、一つのプロセッサがデータ処理時に必要とするデータの範囲により分類すると、以下のようになる。
（１）データ処理を行うプロセッサが、隣接して接続されるプロセッサにより処理されたデータのみを使用する
このような制御は、セル・オートマトン、画像フィルタ、布や波の運動の計算、曲面からのポリゴン生成の計算等に向いている。
（２）データ処理を行うプロセッサが、複数のプロセッサのうちの一部のプロセッサにより処理されたデータのみを使用する
このような制御は、多対多の衝突判定等に向いている。
【０００４】
上記の（１）の場合のデータ処理は、従来の並列プロセッサによって、効率よく実現可能である。しかし、（２）のデータ処理は、並列プロセッサ間の通信速度によりシステム全体の処理速度が制限されてしまい、各プロセッサの処理速度を十分に発揮できない。例えば、すべてのプロセッサ間をクロスバー接続することにより、（２）のデータ処理を高速に行うことも可能であるが、この場合、必要なハードウェアが膨大になり、現実的ではない。
【０００５】
本発明の課題は、例えば上記の（２）のデータ処理を従来よりも効率よく行うことのできるデータ処理システム及びデータ処理方法を提供することにある。
【０００６】
【課題を解決するための手段】
上記課題を解決するため、本発明のマルチプロセッサシステムは、それぞれが、所定の仮想空間内に分布する複数のオブジェクトを少なくとも一つ管理しており、自分が管理するオブジェクトの前記仮想空間内における位置を表す位置データを生成する複数のプロセッサと、すべてのプロセッサから前記位置データを取得可能であり、取得した位置データをすべてのプロセッサにブロードキャストするコントローラと、を備えており、前記複数のプロセッサの各々は、前記コントローラからブロードキャストされた位置データにより、この位置データで位置が表されるオブジェクトが自分が管理するオブジェクトが分布する範囲内にあるか否かを判定して、前記範囲内にある場合にのみ、当該オブジェクトが自分が管理するオブジェクトと衝突する位置にあるか否かを判定するようになっている。
前記複数のプロセッサの各々は、例えば二以上の前記オブジェクトを管理する。
【０００７】
前記複数のプロセッサは、自分の管理するオブジェクトについて、前記仮想空間内における速度を表す速度データを生成するようになっていてもよい。この場合、前記コントローラが、すべての前記プロセッサから前記位置データ及び前記速度データを取得可能であり、取得した位置データ及び速度データを一組ずつすべてのプロセッサにブロードキャストすれば、前記プロセッサは、ブロードキャストされた位置データで位置が表されるオブジェクトが自分の管理下にあるオブジェクトと衝突位置にある場合に、ブロードキャストされたオブジェクトの前記速度データにより、衝突による衝撃の強さを定量的に表す衝突強度データ及び衝突によるオブジェクトへの影響を表すデータを生成することができるようになる。
また、前記複数のプロセッサに、各々を識別するための識別データを割り当て、前記衝突強度データを生成したプロセッサからは衝突強度データをそのプロセッサの識別データと共に取り込み、前記衝突強度データを生成していないプロセッサからは衝突強度データの値よりも小さい値をそのプロセッサの衝突強度データとして識別データと共に取り込んで、最も大きい衝突強度データを生成したプロセッサを特定し、特定したプロセッサの識別データを前記コントローラに送る最大値検出機構をさらに備えるようにしてもよい。これにより前記コントローラは、前記最大値検出機構から送られた識別データにより表されるプロセッサから、衝突強度データ及び衝突によるオブジェクトへの影響を表すデータを取得することができるようになる。
【０００８】
また、このようなマルチプロセッサシステムにおいて、前記プロセッサは、例えば、位置データがブロードキャストされたオブジェクトと自分の管理下にあるオブジェクトとの距離を算出することにより、当該オブジェクトが自分の管理下にあるオブジェクトと衝突位置にあるか否かを判定するようになっている。
【０００９】
本発明のデータ処理方法は、所定の仮想空間内に分布しており複数のクラスタに分けられる複数のオブジェクトを、それぞれが異なる一つのクラスタ毎に管理する複数のデータ処理手段と、すべての前記オブジェクトの前記仮想空間内における位置を記憶するとともに、すべてのデータ処理手段に、前記オブジェクトの位置を表す位置データをブロードキャスト可能な制御手段と、を有する装置又はシステムにおいて実行される方法であって、前記制御手段が、すべてのオブジェクトの前記位置データを一つずつすべてのデータ処理手段にブロードキャストする第１段階と、前記複数のデータ処理手段の各々が、自分が管理するクラスタに属するオブジェクトが分布する範囲内に、前記第１段階で前記制御手段からブロードキャストされた位置データで位置が表されるオブジェクトがあるか否かを判定する第２段階と、前記複数のデータ処理手段の各々が、前記第２段階で前記制御手段からブロードキャストされた位置データで表されるオブジェクトが前記範囲内にある場合にのみ、当該オブジェクトが、自分が管理するクラスタに属するオブジェクトと衝突する位置にあるか否かを判定する第３段階と、を含む。
【００１０】
本発明のデータ処理システムは、各々を識別するための識別データが割り当てられており、各々が、所定の仮想空間内に分布する複数のオブジェクトの少なくとも一つについて、前記仮想空間内における当該オブジェクトの位置を表す位置データを生成するとともに、当該オブジェクトに他のオブジェクトが衝突する際の衝撃の強さを定量的に表す衝突強度データを生成する複数のデータ処理手段と、前記複数のデータ処理手段から前記識別データ及び前記衝突強度データを取得して、最も大きい衝突強度データを検出し、検出した衝突強度データとともに取得した識別データを出力する最大値検出手段と、前記複数のデータ処理手段の各々から前記位置データを取得することですべてのオブジェクトの位置データを取得し、前記複数のデータ処理手段に対してこの位置データをブロードキャストするとともに、前記最大値検出手段から送信された識別データに基づいていずれかのデータ処理手段から衝突強度データを取得する制御手段と、を備えている。前記複数のデータ処理手段の各々は、自身で生成した前記位置データにより位置が表されるオブジェクトが分布する範囲内に、前記制御手段からブロードキャストされた位置データで位置が表されるオブジェクトがあるか否かを判定し、前記範囲内にある場合にのみ、当該オブジェクトが自身で位置データを生成したオブジェクトと衝突する位置にあるか否かを判定するものである。
このシステムは、例えば二以上の前記オブジェクトの位置データを生成する。
【００１１】
本発明が提供するコンピュータプログラムは、各々を識別するための識別データが割り当てられており、各々が、所定の仮想空間内に分布する複数のオブジェクトの少なくとも一つについて、前記仮想空間内における当該オブジェクトの位置を表す位置データを生成するとともに、当該オブジェクトに他のオブジェクトが衝突する際の衝撃の強さを定量的に表す衝突強度データを生成する複数のデータ処理手段と、前記複数のデータ処理手段から前記識別データ及び前記衝突強度データを取得して、最も大きい衝突強度データを検出し、検出した衝突強度データとともに取得した識別データを出力する最大値検出手段と、前記複数のデータ処理手段の各々から前記位置データを取得することですべてのオブジェクトの位置データを取得し、前記複数のデータ処理手段に対してこの位置データをブロードキャストするとともに、前記最大値検出手段から出力された識別データに基づいていずれかのデータ処理手段から衝突強度データを取得する制御手段と、を備えたコンピュータ搭載の装置に於いて、前記コンピュータに、前記複数のデータ処理手段の各々が、自身で生成した前記位置データにより位置が表されるオブジェクトが分布する範囲内に、前記制御手段からブロードキャストされた位置データで位置が表されるオブジェクトがあるか否かを判定し、前記範囲内にある場合にのみ、当該オブジェクトが自身で位置データを生成したオブジェクトと衝突する位置にあるか否かを判定する処理、を実行させるためのコンピュータプログラムである。
本発明が提供する半導体デバイスは、各々を識別するための識別データが割り当てられており、各々が、所定の仮想空間内に分布する複数のオブジェクトの少なくとも一つについて、前記仮想空間内における当該オブジェクトの位置を表す位置データを生成するとともに、当該オブジェクトに他のオブジェクトが衝突する際の衝撃の強さを定量的に表す衝突強度データを生成する複数のデータ処理手段と、前記複数のデータ処理手段から前記識別データ及び前記衝突強度データを取得して、最も大きい衝突強度データを検出し、検出した衝突強度データとともに取得した識別データを出力する最大値検出手段と、前記複数のデータ処理手段の各々から前記位置データを取得することですべてのオブジェクトの位置データを取得し、前記複数のデータ処理手段に対してこの位置データをブロードキャストするとともに、前記最大値検出手段から出力された識別データに基づいていずれかのデータ処理手段から衝突強度データを取得する制御手段と、を備えたコンピュータ搭載の装置に組み込まれることにより、前記コンピュータに、前記複数のデータ処理手段の各々が、自身で生成した前記位置データにより位置が表されるオブジェクトが分布する範囲内に、前記制御手段からブロードキャストされた位置データで位置が表されるオブジェクトがあるか否かを判定し、前記範囲内にある場合にのみ、当該オブジェクトが自身で位置データを生成したオブジェクトと衝突する位置にあるか否かを判定する処理、を実行させる半導体デバイスである。
【００１２】
【発明の実施の形態】
以下に、本発明をデータ処理システムの一例となるマルチプロセッサシステムに適用した場合の実施の形態を説明する。
【００１３】
＜全体構成＞
図１は、マルチプロセッサシステムの構成例を示した図である。このマルチプロセッサシステム１は、データ処理及びデータ記録及び読み出しのための制御手段であるブロードキャストメモリコントローラ（以下、「ＢＣＭＣ（Broadcast Memory Controller）」という。）１０と、各々データ処理手段の一例となる複数のセルプロセッサ２０と、データ処理のための所要の機能を種々形成するための複数のＷＴＡ（Winner Take All）・総和回路３０と、を含んで構成されている。
ＢＣＭＣ１０とすべてのセルプロセッサ２０とは、ブロードキャストチャネル（一斉送出可能な通信チャネル）により接続されている。
【００１４】
このマルチプロセッサシステム１は、各セルプロセッサ２０によるデータ処理結果の一例となる状態変数値をＢＣＭＣ１０で管理し、ＢＣＭＣ１０からすべてのセルプロセッサ２０の状態変数値を、参照用数値の一例としてブロードキャストにより送出するものである。これにより、各セルプロセッサ２０は、高速に他のセルプロセッサ２０において発生した状態変数値を参照可能とする。
【００１５】
ブロードキャストチャネルは、ＢＣＭＣ１０と複数のセルプロセッサ２０との間の伝送経路であって、アドレスの受け渡しに使用されるアドレスバスと、状態変数値などのデータの受け渡しに使用されるデータバスとを含んで構成される。アドレスには、個々のセルプロセッサ２０を特定するためのセルアドレスと、すべてのセルプロセッサ２０を対象とするブロードキャストアドレスとがある。
セルアドレスは、メモリ上のアドレス（物理アドレス又は論理アドレス）に対応しており、セルプロセッサ２０からの状態変数値は、常に、当該セルプロセッサ２０を示すセルアドレスに対応するアドレスに記憶されるようになっている。各セルプロセッサ２０には、各々を識別するための識別情報として、ＩＤ（identification）が付されている。セルアドレスは、このＩＤにも対応するようになっている。これにより、状態変数値がどのセルプロセッサ２０から出力されたのかを、セルアドレスによって特定することができる。
【００１６】
ＷＴＡ・総和回路３０は、図１に示すように接続される。即ち、ＷＴＡ・総和回路３０は、セルプロセッサ２０側を一段目としてピラミッド状に接続される。一段目のＷＴＡ・総和回路３０の入力端には２つのセルプロセッサ２０が接続され、出力端は二段目のＷＴＡ・総和回路３０の入力端に接続される。
二段目以降は、入力端の各々に下位の段の２つのＷＴＡ・総和回路３０の出力端が接続され、出力端に上位の段のＷＴＡ・総和回路３０の入力端が接続される。最上段のＷＴＡ・総和回路３０は、入力端に下段の２つのＷＴＡ・総和回路３０の出力端が接続され、出力端はＢＣＭＣ１０に接続される。
【００１７】
なお、図示の接続形態の他に、ＷＴＡ・総和回路３０をカスケードに接続しても、本発明を実施することが可能である。この場合、一段目のＷＴＡ・総和回路３０の入力端には２つのセルプロセッサ２０を接続し、出力端を上位の段の入力端に接続する。二段目以降のＷＴＡ・総和回路３０の入力端には、下位の段のＷＴＡ・総和回路３０の出力端とセルプロセッサ２０が接続され、出力端は上位の段の入力端に接続される。最上段のＷＴＡ・総和回路３０は、入力端に下位の段のＷＴＡ・総和回路３０の出力端とセルプロセッサ２０とが接続され、出力端はＢＣＭＣ１０に接続される。
【００１８】
次に、ＢＣＭＣ１０、セルプロセッサ２０、ＷＴＡ・総和回路３０のそれぞれについて詳細に説明する。
【００１９】
＜ＢＣＭＣ＞
ＢＣＭＣ１０は、ブロードキャストチャネルによりすべてのセルプロセッサ２０にデータをブロードキャストするとともに、各セルプロセッサ２０からの状態変数値を取り込んで保持する。図２にＢＣＭＣ１０の構成例を示す。
ＢＣＭＣ１０は、マルチプロセッサシステム１全体の動作を制御するＣＰＵコア１０１と、ＳＲＡＭ（Static Random Access Memory）などの書き換え可能なメインメモリ１０２と、ＤＭＡＣ（Direct Memory Access Controller）１０３とがバスＢ１で接続されて構成される。ＣＰＵコア１０１は、メインメモリ１０２と協働し、所定のコンピュータプログラムを読み込んで実行することにより、本発明の特徴的なデータ処理を行うための機能を形成するコンピュータ搭載の半導体デバイスである。メインメモリ１０２は、システム全体の共有メモリとして使用されるようになっている。
バスＢ１には、最上段のＷＴＡ・総和回路３０の出力端及びハードディスクや可搬性メディア等の外部メモリも接続される。
【００２０】
ＣＰＵコア１０１は、起動時に上記の外部メモリから起動プログラムを読み込み、その起動プログラムを実行してオペレーティングシステムを動作させる。また、データ処理に必要となる種々のデータを上記の外部メモリから読み出し、これをメインメモリ１０２に展開する。メインメモリ１０２には、各セルプロセッサ２０の状態変数値などのデータも記憶されるようにする。状態変数値は、当該状態変数値を算出したセルプロセッサ２０のセルアドレスに応じたメインメモリ１０２のアドレスに記憶される。
ＣＰＵコア１０１は、また、メインメモリ１０２から読み出したデータに基づいて、各セルプロセッサ２０に対してブロードキャストするブロードキャストデータを生成する。ブロードキャストデータは、例えば、状態変数値と当該状態変数値を算出したセルプロセッサ２０を示すセルアドレスとの組からなるペア（組）データである。ペアデータは、１組又は複数組生成される。
【００２１】
ＤＭＡＣ１０３は、メインメモリ１０２と各セルプロセッサ２０との間のダイレクトメモリアクセス転送制御を行う半導体デバイスである。例えば、各セルプロセッサ２０に対しては、ブロードキャストチャネルを介して、ブロードキャストデータをブロードキャストする。また、各セルプロセッサ２０のデータ処理結果を個別に取得して、メインメモリ１０２に書き込む。
【００２２】
＜セルプロセッサ＞
各セルプロセッサ２０は、ブロードキャストデータの中から必要となるデータを取捨選択してデータ処理を行い、データ処理の終了時に、その旨をＷＴＡ・総和回路３０へ報告する。データ処理結果である状態変数値を、ＢＣＭＣ１０からの指示により、ＢＣＭＣ１０へ送出する。各セルプロセッサ２０間は、図示しない共有メモリを介してリング接続される。各セルプロセッサ２０は、データ処理を同期的なクロックで行ってもよく、各々異なるクロックで行ってもよい。図３にセルプロセッサ２０の構成例を示す。
セルプロセッサ２０は、セルＣＰＵ２０１と、入力バッファ２０２と、出力バッファ２０３と、ＷＴＡバッファ２０４と、プログラムコントローラ２０５と、命令メモリ２０６と、データメモリ２０７と、を含んで構成される。
【００２３】
セルＣＰＵ２０１は、プログラマブルな浮動小数点演算器を備えたプロセッサであり、セルプロセッサ２０内の動作を制御して、データ処理を行うものである。セルＣＰＵ２０１は、ＢＣＭＣ１０からブロードキャストされたブロードキャストデータを入力バッファ２０２を介して取得し、ペアデータのセルアドレスにより自己が行うべき処理に必要なデータか否かを判断し、必要であればデータメモリ２０７の対応するアドレスに状態変数値を書き込む。また、データメモリ２０７から状態変数値を読み出してデータ処理を行い、データ処理結果を出力バッファ２０３に書き込み、ＷＴＡ・総和回路３０にデータ処理の終了を示すデータを送る。
【００２４】
入力バッファ２０２は、ＢＣＭＣ１０からブロードキャストされたブロードキャストデータを保持するものである。保持されたブロードキャストデータは、セルＣＰＵ２０１からの要求により、セルＣＰＵ２０１へ送られる。
出力バッファ２０３は、セルＣＰＵ２０１の状態変数値を保持するものである。保持された状態変数値は、ＢＣＭＣ１０からの要求により、ＢＣＭＣ１０へ送信される。
入力バッファ２０２及び出力バッファ２０３は、この他に制御用のデータ等の送受を行ってもよい。
ＷＴＡバッファ２０４は、セルＣＰＵ２０１によるデータ処理の終了時に、セルＣＰＵ２０１からデータ処理の終了を示すデータを受信して、これをＷＴＡ・総和回路３０へ送信することにより、データ処理の終了をＷＴＡ・総和回路３０に報告するものである。データ処理の終了を示す終了データには、例えば、自セルプロセッサ２０のＩＤと、出力バッファ２０３に保存された状態変数値がＢＣＭＣ１０へ読み取られるときの優先度を決める優先度データとが含まれる。
【００２５】
プログラムコントローラ２０５は、セルプロセッサ２０の動作を規定するプログラムをＢＣＭＣ１０から取り込むものである。セルプロセッサ２０の動作を規定するプログラムには、セルプロセッサ２０で実行されるデータ処理のためのプログラムや、当該セルプロセッサ２０で処理に必要なデータを決めるデータ選択プログラム、処理結果がＢＣＭＣ１０へ読み取られるときの優先度を決める優先度決定プログラムなどがある。
命令メモリ２０６は、プログラムコントローラ２０５により取り込んだプログラムを保存するものである。保存したプログラムは、必要に応じてセルＣＰＵ２０１に読み込まれる。
【００２６】
データメモリ２０７は、セルプロセッサ２０において処理されるデータを保存するものである。セルＣＰＵ２０１により必要と判断されたブロードキャストデータが書き込まれる。ブロードキャストデータは、セルアドレスに応じたアドレスに保存される。
また、本実施形態ではデータメモリ２０７の一部は共有メモリを介して隣接するセルプロセッサ２０に繋がっており、１サイクル毎に隣接するセルプロセッサ２０とデータの送受が可能となっている。
【００２７】
＜ＷＴＡ・総和回路＞
複数のＷＴＡ・総和回路３０は、各セルプロセッサ２０から送られるデータ処理の終了を示すデータにより、ＢＣＭＣ１０がセルプロセッサ２０から状態変数値を取り込む順序を決めてＢＣＭＣ１０へ報告する。
図４にＷＴＡ・総和回路３０の構成例を示す。
各ＷＴＡ・総和回路３０は、２つの入力レジスタＡ、Ｂ（以下、第１入力レジスタ３０１、第２入力レジスタ３０２）と、切換器３０３と、比較器３０４と、加算器３０５と、出力レジスタ３０６と、を含んで構成される。
【００２８】
第１入力レジスタ３０１及び第２入力レジスタ３０２は、それぞれ整数レジスタ及び浮動小数点レジスタを備えている。整数レジスタには、例えばセルプロセッサ２０から送られるデータ処理の終了を示す終了データのうち、ＩＤが書き込まれ、浮動小数点レジスタには、例えば優先度データが書き込まれる。
切換器３０３は、比較器３０４及び加算器３０５のいずれか一方を活性化する。具体的には、動作モードに従って一方のみを使用可能とする。動作モードは、例えばＢＣＭＣ１０からの指示により決められる。動作モードについては後述する。
比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値の比較を行い、大きい方（又は小さい方）の値と、それに付随する整数とを、出力レジスタ３０６へ書き込む。
加算器３０５は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値の和を算出し、算出結果を出力レジスタ３０６へ書き込む。
出力レジスタ３０６は、第１入力レジスタ３０１及び第２入力レジスタ３０２とほぼ同じに構成される。つまり、整数レジスタ及び浮動小数点レジスタを備えている。整数レジスタにはＩＤが書き込まれ、浮動小数点レジスタには優先度データが書き込まれるようになっている。
【００２９】
ＷＴＡ・総和回路３０は、以下に説明する３つの動作モードをもつ。
【００３０】
・最大値（ＷＴＡ）モード：
切換器３０３により、比較器３０４が活性化される。比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値Ａ、Ｂの比較を行い、大きい方（又は小さい方）の値と、それに付随する整数値を出力レジスタ３０６に書き込む。出力レジスタ３０６への書き込みが終了すると、第１入力レジスタ３０１及び第２入力レジスタ３０２をクリアする。出力レジスタ３０６の内容は、上位の段のＷＴＡ・総和回路３０の入力レジスタに書き込まれる。このとき、書き込み先の入力レジスタがクリアされていないときは、書き込みがストールして、そのサイクルでは書き込みを行わず、次のサイクルで書き込むようにする。
【００３１】
・加算モード：
切換器３０３により、加算器３０５が活性化される。加算器３０５により、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値Ａ、Ｂの和を算出し、算出結果を出力レジスタ３０６に書き込む。出力レジスタ３０６の内容は、上位の段のＷＴＡ・総和回路３０の入力レジスタに書き込まれる。
【００３２】
・近似ソートモード：
切換器３０３により、比較器３０４が活性化される。比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の各々の浮動小数点レジスタが保持する浮動小数点値Ａ、Ｂの比較を行い、大きい方（又は小さい方）の値と、それに付随する整数値とを出力レジスタ３０６に書き込む。
その後、出力レジスタ３０６に書き込まれた値を保持していた入力レジスタのみをクリアし、出力レジスタ３０６の内容を、上位の段のＷＴＡ・総和回路３０の入力レジスタに書き込む。書き込み先の入力レジスタがクリアされていない場合は、書き込みがストールし、そのサイクルでは書き込みを行わない。ただし、下位の段のＷＴＡ・総和回路３０の出力レジスタ３０６からの書き込み動作は行われる。
近似ソートモードにより、ＢＣＭＣ１０がＷＴＡ・総和回路３０の最上段の出力レジスタ３０６から受け取るデータが、浮動小数点が大きい順或いは小さい順にソートされた（並び替えられた）ものとなる。
【００３３】
なお、各モードに入る前には、すべてのＷＴＡ・総和回路３０の第１入力レジスタ３０１、第２入力レジスタ３０２及び出力レジスタ３０６がクリアされる。
【００３４】
各モードを切替えて使用することにより、複数のＷＴＡ・総和回路３０全体として、上記のソートのための機構（ソート機構）及び／又は総和回路として機能する。つまり、近似ソートモードで動作するときは、ソート機構を実現するものとなり、加算モードで動作するときは、総和回路を実現するものとなる。
【００３５】
最大値モード、近似ソートモードで動作するＷＴＡ・総和回路３０は、次に示すようにして実現してもよい。
すなわち、セルプロセッサ２０と同数の入力レジスタと、切換器と、比較器と、加算器と、出力レジスタとを含んでＷＴＡ・総和回路が構成される。
入力レジスタがセルプロセッサ２０の数と同じだけ用意されており、それぞれが、第１レジスタ３０１、第２レジスタ３０２と同様に、整数レジスタ及び浮動小数点レジスタを備える。比較器は、すべての入力レジスタの浮動小数点レジスタが保持する浮動小数点値の比較を行う。加算器は、すべての入力レジスタの浮動小数点レジスタが保持する浮動小数点値の和を算出する。
出力レジスタは、図４のＷＴＡ・総和回路３０の出力レジスタと同様である。
【００３６】
比較器により、各入力レジスタの浮動小数点レジスタが保持する優先度データを比較して、優先度の高い順に、付随するＩＤを順次出力レジスタに書き込む。これにより、ＩＤを、優先度の高い順序でＢＣＭＣ１０へ送ることができる。
加算器により、各浮動小数点レジスタが保持するデータを加算して、その総和を求めることができる。
このようなＷＴＡ・総和回路は、図１に示すような接続形態をとらなくとも、一つで、本発明におけるソート機構、総和回路として機能する。
【００３７】
＜データ処理方法＞
本実施形態におけるマルチプロセッサシステム１は、以下のように動作することにより、所要のデータ処理を実行する。図５は、このマルチプロセッサシステム１において実行される処理の流れを示すフローチャートである。
【００３８】
ＢＣＭＣ１０のメインメモリ１０２には、すべてのセルプロセッサ２０の状態変数値の初期値が予め記憶される。
ＢＣＭＣ１０は、このセルプロセッサ２０の状態変数値とセルプロセッサ２０を示すセルアドレスとからなるペアデータにより、ブロードキャストデータを作成する（ステップＳ１０１）。そして、作成したブロードキャストデータを、すべてのセルプロセッサ２０へブロードキャストする（ステップＳ１０２）。
各セルプロセッサ２０は、ブロードキャストデータを、入力バッファ２０２に取り込む。セルＣＰＵ２０１は、命令メモリ２０６に記憶されたデータ選択プログラムにより、入力バッファ２０２が保持するブロードキャストデータのセルアドレスを調べて、自セルプロセッサ２０が行うデータ処理に要する状態変数値があるか否かを確認する（ステップＳ１０３）。自らが行うデータ処理に要する状態変数値が無い場合、セルプロセッサ２０は、処理動作を終了する（ステップＳ１０３：無）。自らが行うデータ処理に要する状態変数値が有る場合は（ステップＳ１０３：有）、該当する状態変数値を、この状態変数値とペアデータを組むセルアドレスに対応するデータメモリ２０７上のアドレスへ上書きする（ステップＳ１０４）。
以上により、ＢＣＭＣ１０から各セルプロセッサ２０へのデータのブロードキャストが終了する。
【００３９】
ブロードキャストが終了すると、各セルプロセッサ２０は、命令メモリ２０６に記憶されたデータ処理のプログラムにより、データメモリ２０７に記録された状態変数値をデータ処理して新たな状態変数値を生成する。新たな状態変数値は、データメモリ２０７に書き込まれるとともに、出力バッファ２０３にも書き込まれる（ステップＳ１０５）。新たな状態変数値は、データメモリ２０７上の、自らのセルアドレスに対応するアドレスに、上書きされる。
データ処理が終了すると、セルＣＰＵ２０１は、ＷＴＡバッファ２０４を介して１段目のＷＴＡ・総和回路３０の入力レジスタへＩＤと優先度データとを含む終了データを送信して、データ処理の終了を報告する（ステップＳ１０６）。優先度データは、データ処理の前又は後に、所定の優先度決定プログラムによって生成される。
【００４０】
１段目のＷＴＡ・総和回路３０は、各セルプロセッサ２０から送られる終了データのうち、ＩＤを入力レジスタの整数レジスタへ、優先度データを浮動小数点レジスタでそれぞれ保持する。ここで、ＷＴＡ・総和回路３０は近似ソートモードで動作する。そのために、切換器３０３は、比較器３０４を活性化する。
ＷＴＡ・総和回路３０の第１入力レジスタ３０１及び第２入力レジスタの整数レジスタは、各々異なるセルプロセッサ２０から送られたＩＤを保持する。また、各々の浮動小数点レジスタは、ＩＤに付随した優先度データを保持する。比較器３０４は、第１入力レジスタ３０１及び第２入力レジスタ３０２の浮動小数点レジスタからそれぞれ優先度データを読み出し、優先度を比較する。比較の結果、優先度が高い方の優先度データ及びそれに付随したＩＤを、出力レジスタ３０６の浮動小数点レジスタ及び整数レジスタへ書き込む。出力レジスタ３０６へ内容が書き込まれた入力レジスタは、その内容がクリアされる。出力レジスタ３０６へ書き込まれたＩＤ及び優先度データは、上位の段のＷＴＡ・総和回路３０の入力レジスタへ書き込まれる。
このような処理を各段のＷＴＡ・総和回路３０で行う。最上段のＷＴＡ・総和回路３０は、出力レジスタ３０６の整数レジスタに書き込まれたＩＤをＢＣＭＣ１０へ送る。
以上のような処理により、ＷＴＡ・総和回路３０全体としては、ＩＤを、優先度の高い順序でＢＣＭＣ１０へ送ることとなる（ステップＳ１０７）。
【００４１】
ＢＣＭＣ１０は、ＷＴＡ・総和回路３０から送られるＩＤに該当するセルプロセッサ２０の出力バッファ２０３から、データ処理された状態変数値を取得する。取得した状態変数値は、ＢＣＭＣ１０内のメインメモリ１０２上の、処理を行ったセルプロセッサ２０を示すセルアドレスに対応するアドレスに上書きされる（ステップＳ１０８）。
以上で、状態変数値の処理動作の１サイクルが終了する。
【００４２】
ＢＣＭＣ１０が、各セルプロセッサ２０からデータ処理結果を取得し、これによりブロードキャストデータを生成する。
各セルプロセッサ２０は、ブロードキャストデータから自分に必要となるデータのみを取捨選択してデータ処理を行う。このブロードキャストデータを用いてデータ処理を行うことにより、他のすべてのセルプロセッサ２０により処理されたデータを利用する処理が可能となる。また、ブロードキャストデータを、各セルプロセッサ２０からのデータ処理結果とこのデータ処理結果を生成したセルプロセッサ２０を示すセルアドレスとからなるペアデータにより作成することにより、特定のセルプロセッサ２０のデータ処理結果のみを用いる処理が可能となる。さらに、隣接するセルプロセッサ２０間は共有メモリを介して接続されているので、従来と同様に、隣接するセルプロセッサ２０間の処理も可能である。
各セルプロセッサ２０が、メインメモリ１０２に、直接、自セルプロセッサ２０で必要とするデータを取り込みに行くことがなく、ブロードキャストデータから必要となるデータを選択して、各セルプロセッサ２０内にデータを保持して処理を行うので、データの競合が起こらずに高速処理が可能となる。
【００４３】
［実施例１］
次に、上記のマルチプロセッサシステム１の実施例を具体的に説明する。
この実施例では、あるセルプロセッサ２０とそれに隣接する他のセルプロセッサ２０により処理されたデータのみを使用する場合の例を、図６を参照して説明する。
図６において、「○」はセルプロセッサを表しており、網掛された「○」がデータ処理を行うセルプロセッサ、「●」が必要とされるデータを保持するセルプロセッサである。
ｎ×ｎ（ｎは２以上の自然数）の格子の各格子点についてのデータ（格子点データ）に対して、次のようなフィルタ計算を連続的に実行する場合を考える。
X_{i,j} = (X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1}) / 4
ｉ：格子点の行番号、ｊ：格子点の列番号
【００４４】
ＢＣＭＣ１０は、格子点データを行又は列でグループ化したブロードキャストデータとして、ｎ個のセルプロセッサ２０にブロードキャストする。
図８は、格子点データをグループ化した例示図であり、「○」で示される格子点データを５個ずつグループ化してある。一つのグループ化した格子点データが、一つのセルプロセッサ２０で処理される。
セルプロセッサ２０では、ブロードキャストデータから必要とするグループ化された格子点データをデータメモリ２０７に保存する。データメモリ２０７から、格子点データを順次読み出してデータ処理する。
【００４５】
共有メモリを介して接続されるセルプロセッサ２０との間では、共有メモリを用いてデータ転送を行う。共有メモリへのデータの書込動作を１サイクルとすると、セルプロセッサ２０間のグループ化されたデータの転送は、２ｎサイクルで行うことができる。
各セルプロセッサ２０を同期的に動作させ、共有メモリへの書き込みと演算とをパイプライン処理のように同時に実行することにより、セルプロセッサ２０間の通信と演算を同時に行うことができる。
【００４６】
次のブロードキャストデータは、グループ化された格子点データのデータ処理が終了する度に、ＢＣＭＣ１０によりブロードキャストされる。セルプロセッサ２０は、ブロードキャストされるデータのｉ、ｊにより、必要なデータか否かを判断する。
ブロードキャストデータをグループ化することにより行又は列方向のデータを処理可能であり、共有データを介してデータ転送することにより列又は行方向のデータ処理が可能となる。
【００４７】
［実施例２］
この実施例では、すべてのセルプロセッサ２０のうち、一部のセルプロセッサ２０により処理されたデータのみを使用する場合の例を、図７を参照して説明する。図７において、「○」はセルプロセッサを表しており、網掛された「○」がデータ処理を行うセルプロセッサ、「●」が必要とされるデータを保持するセルプロセッサである。このようなマルチプロセッサシステムは、ホップフィールドの連想記憶器の実現に有用である。
各セルプロセッサ２０は、データ処理結果である状態変数値とその状態変数値の重要度を表す重み係数とを保持するものとする。また、セルプロセッサ２０には、番号が付されており、ＢＣＭＣ１０は、番号順にセルプロセッサ２０から状態変数値を取り込む。
ＢＣＭＣ１０は、すべてのセルプロセッサ２０から取り込んだ状態変数値をブロードキャストデータとしてブロードキャストする。各セルプロセッサ２０は、ブロードキャストデータから必要な状態変数値のみを選択して重み係数との積和演算を行い、状態変数値を更新する。必要な状態変数値が、ブロードキャストデータに含まれるすべての状態変数値の場合、すべてのプロセッサにより処理されたデータを使用する処理に該当することとなる。
【００４８】
［実施例３］
次に、パターンマッチング計算処理の例を説明する。
ここでは、入力データの特徴に最も類似するデータを保持するセルプロセッサ２０を特定する処理を行う。この処理は、以下のようにして行う。
各セルプロセッサ２０は、予め比較対象となるテンプレートデータを保持する。
ＢＣＭＣ１０は、入力データをすべてのセルプロセッサ２０にブロードキャストする。各セルプロセッサ２０は、自らが保持するテンプレートデータの特徴と入力データの特徴との差分値を算出する。差分値は、ＩＤとともにＷＴＡ・総和回路３０へ送られる。
ＷＴＡ・総和回路３０は、最大値モードで動作する。入力レジスタの整数レジスタはＩＤを保持し、浮動小数点レジスタは差分値を保持する。差分値を比較器３０４により比較して、小さい方の差分値とそれに付随するＩＤを出力レジスタ３０６へ送る。これをＷＴＡ・総和回路３０全体で行い、最も小さい差分値とそれに付随するＩＤを求める。このＩＤ及び差分値をＢＣＭＣ１０へ送る。
ＢＣＭＣ１０は、ＩＤによりセルプロセッサ２０を特定する。これにより、入力データの特徴に最も類似するテンプレートデータ及び入力データの特徴と最も類似するテンプレートデータとの差分値も検出できる。
【００４９】
［実施例４］
次に、画像処理等の際に用いられる、動くオブジェクトの衝突判定アルゴリズムの処理例について説明する。「衝突判定アルゴリズム」は、ある空間内に存在するｎ個のオブジェクト（物体）が互いに他のオブジェクトと衝突するかどうか、衝突する場合はどの程度の強度かを判定するアルゴリズムである。
ｎ個のオブジェクトの空間分布には偏りがあり、ｍ個のクラスタに分かれているとする。ここでは、例えば、１個のオブジェクトが、他の（ｎ−１）個のオブジェクトのいずれと最も強く衝突するかについて判定するものとする。
図９は、このような空間内のオブジェクトの例示図であり、「○」で表されるオブジェクトを矩形で囲んで１クラスタとしており、図９ではオブジェクトが５個のクラスタに分けられている。オブジェクトを示すデータは、ＢＣＭＣ１０からブロードキャストされ、クラスタ毎にセルプロセッサ２０に取り込まれる。セルプロセッサ２０は、取り込んだ１つのクラスタに含まれるオブジェクトに関する空間内での位置、運動についての処理を行う。
図９の例では、セルプロセッサＡ〜Ｅにより５個のクラスタに分けられたオブジェクトに関する処理が行われる。
図１０により、衝突判定アルゴリズムの処理の流れを説明する。
【００５０】
ＢＣＭＣ１０は、オブジェクトの位置や速度のデータを含むオブジェクトデータと、当該オブジェクトが属するクラスタを示すクラスタデータとを含むブロードキャストデータを生成し、すべてのセルプロセッサ２０にブロードキャストする（ステップＳ２０１）。各セルプロセッサ２０は、ブロードキャストデータから、オブジェクトデータをクラスタデータに基づいて取捨選択して取り込む。
オブジェクトデータを取り込んだセルプロセッサ２０は、オブジェクトの現在の位置データと速度データとから、単位時間後の新しい位置データを算出する。
新しい位置データから、新しいバウンディングボックスの値を得る（ステップＳ２０２）。バウンディングボックスとは、例えば、図９における、オブジェクトを囲む矩形である。バウンディングボックスの値とは、例えば、バウンディングボックスの頂点の座標である。
ＢＣＭＣ１０は、オブジェクトの新しい位置データを各セルプロセッサ２０から取り込んで位置データを更新する（ステップＳ２０３）。
【００５１】
次に、ＢＣＭＣ１０は、取得した新しい位置データ等を含むオブジェクトデータを一つずつ全セルプロセッサ２０にブロードキャストする（ステップＳ２０４）。つまり、衝突判定の対象となる１個のオブジェクト（以下、「判定対象オブジェクト」という）の位置を表す位置データを全セルプロセッサ２０に送る。
各セルプロセッサ２０では、まず、ステップＳ２０２で計算したバウンディングボックスを用いて、判定対象オブジェクトが衝突する可能性があるか否かを判断する（ステップＳ２０５）。具体的には、判定対象オブジェクトの位置がバウンディングボックス内にあるか否かを判断する。
衝突する可能性がある場合、つまり、判定対象オブジェクトがバウンディングボックス内にある場合は（ステップＳ２０５：Y）、そのセルプロセッサ２０で処理される、バウンディングボックス内の各オブジェクトとの距離計算を順次行い（ステップＳ２０６）、衝突の判定を行う（ステップＳ２０７）。判定対象オブジェクトがバウンディングボックス内のいずれかのオブジェクトと衝突する場合には（ステップＳ２０７：Y）、その衝突による衝撃の強さを定量的に表すデータ（衝突強度データ）、衝突による判定対象オブジェクトへの影響を表すデータ等を含む衝突データを生成する（ステップＳ２０８）。また、セルプロセッサ２０は、生成した衝突データのうち衝突強度データを、そのＩＤとともにＷＴＡ・総和回路３０に送る（ステップＳ２０９）。
【００５２】
判定対象オブジェクトがバウンディングボックス外にある場合（ステップＳ２０５：N）、または距離計算の結果、衝突しないと判定した場合（ステップＳ２０７：N）、各セルプロセッサ２０は、ＷＴＡ・総和回路３０に、例えば「−１．０」を、衝突強度データとして送る（ステップＳ２１０）。
ＷＴＡ・総和回路３０は最大値モードで動作する。ＷＴＡ・総和回路３０は、セルプロセッサ２０から送られる衝突強度データを比較して、最も衝突による衝撃の強さが大きいことを表す衝突強度データを検出して（ステップＳ２１１）、検出した衝突強度データを生成したセルプロセッサ２０を特定する。そして特定したセルプロセッサ２０を表すＩＤをＢＣＭＣ１０へ送る。
ＢＣＭＣ１０は、ＷＴＡ・総和回路３０の最上段から送られたＩＤにより表されるセルプロセッサ２０から衝突データを取得する（ステップＳ２１２）。ステップＳ２０４以降の処理をすべてのオブジェクトについて行うことにより、空間内のすべてのオブジェクト間の衝突判定が行われる。
【００５３】
［実施例５］
次に、ＷＴＡ・総和回路３０の加算器３０５を用いる場合の例を説明する。
各セルプロセッサ２０は、データ処理結果をＷＴＡ・総和回路３０へ入力する。ＷＴＡ・総和回路３０では、加算器３０５によりデータ処理結果を加算し、最終的に、すべてのセルプロセッサ２０のデータ処理結果の総和を得る。このようにして、ＷＴＡ・総和回路３０により高速にデータ処理結果の総和を得ることが可能である。
データ処理結果の総和は、ＢＣＭＣ１０に送られて、各セルプロセッサ２０にブロードキャストにより、高速に送信可能である。データ処理結果の総和は、例えば、ニューロなどの最適化計算において、正規化計算に用いられる。
【００５４】
以上の説明において、ＢＣＭＣ１０とＷＴＡ・総和回路３０とは各々独立したものとしたが、ＢＣＭＣ１０にＷＴＡ・総和回路３０を組み込んだ一つのブロックとして、コントローラを構成してもよい。
【００５５】
なお、以上の説明は、データ処理手段がセルプロセッサ２０であり、制御手段がコントローラ（ＢＣＭＣ１０）である場合の例であるが、本発明の構成要素は、このような例に限定されるものではない。
例えば複数のデータ処理端末を広域ネットワークを介して双方向通信が可能な形態で接続し、そのうちの一つ又は複数のデータ処理端末を制御手段、他の複数のデータ処理端末をデータ処理手段として動作させ、制御手段に、複数のデータ処理手段の一部又は全部より受け取ったデータ処理結果及び少なくとも一つのデータ処理手段によるデータ処理に用いるデータを含むブロードキャストデータをブロードキャストする機能をもたせ、複数のデータ処理手段の各々に、制御手段によりブロードキャストされたブロードキャストデータから自らが行うデータ処理に必要なデータのみを取捨選択してデータ処理を行うとともに、その処理結果を制御手段に送出させる機能をもたせるようにしてもよい。
【００５６】
また、複数のデータ処理手段として、予め定めた識別情報（例えば上述した識別データ）によりそれを特定できる汎用のデータ処理端末を用い、これらの汎用のデータ処理端末と双方向通信可能なサーバ、あるいはＣＰＵ及びメモリを内蔵した半導体デバイスを搭載した装置をのみをもってデータ処理システムを構成するようにしてもよい。
この場合のサーバ又は装置は、その内部のＣＰＵが所定のコンピュータプログラムを読み込んで実行することにより、サーバ本体又は装置内に、少なくとも一つのデータ処理手段としてのデータ処理端末を特定するとともに特定したデータ処理端末の識別情報とそのデータ処理端末宛のデータ処理用データとを含むブロードキャストデータを生成する機能と、複数のデータ処理端末の一部又は全部から当該データ処理端末で行われたデータの処理結果を取得する機能と、受け取った処理結果をブロードキャストデータに含め、当該ブロードキャストデータを複数のデータ処理端末の各々にブロードキャストする機能とを形成するものである。
【００５７】
【発明の効果】
以上のような本発明により、複数のデータ処理手段を用いる場合のデータ処理手段間のデータ処理を効率的に行えるようになる。
【図面の簡単な説明】
【図１】本発明を適用したマルチプロセッサシステムの構成例を示した図。
【図２】ＢＣＭＣの構成図。
【図３】セルプロセッサの構成図。
【図４】ＷＴＡ・総和回路の構成図。
【図５】本実施形態によるマルチプロセッサシステムの処理の流れを示すフローチャート。
【図６】隣接するプロセッサのデータ処理結果を使用する概念図。
【図７】一部のプロセッサのデータ処理結果を使用する概念図。
【図８】格子点データをグループ化した例示図。
【図９】オブジェクトをクラスタに分けた場合の例示図。
【図１０】衝突判定アルゴリズムの処理の流れを示すフローチャート。
【符号の説明】
１０ＢＣＭＣ
１０１ＣＰＵコア
１０２メインメモリ
１０３ＤＭＡＣ
２０セルプロセッサ
２０１セルＣＰＵ
２０２入力バッファ
２０３出力バッファ
２０４ＷＴＡバッファ
２０５プログラムコントローラ
２０６命令メモリ
２０７データメモリ
３０ＷＴＡ・総和回路
３０１第１入力レジスタ
３０２第２入力レジスタ
３０３切換器
３０４比較器
３０５加算器
３０６出力レジスタ

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a data processing system that performs data processing using a plurality of data processing means, for example a multiprocessor system, and to a data processing method.
[0002]
BACKGROUND OF THE INVENTION
As the advanced information society progresses, the amount of data processed by data processing devices such as computers tends to increase. The content of data processing is also becoming more complex and sophisticated. Conventionally, the processing capability of an entire data processing device has been improved by raising the performance of processors such as CPUs (Central Processing Units) or by adopting multiprocessor configurations that use a plurality of processors.
However, in recent years the required data processing capability has been growing faster than processor performance has been improving. Improving processor performance cannot be achieved overnight, partly because of the long development periods involved.
On the other hand, the data processing capability of a multiprocessor, for example, is determined by the number of processors used and by the processing method, and depends less on improving the performance of the individual processors. Multiprocessing is therefore one of the effective means of improving the processing capability of a data processing device.
[0003]
The data processing method by the multiprocessor is classified as follows according to the range of data required by one processor during data processing.
(1) A processor that performs data processing uses only data processed by an adjacent processor.
Such control is suitable for cellular automata, image filters, cloth and wave motion calculations, polygon generation calculations from curved surfaces, and the like.
(2) A processor that performs data processing uses only data processed by some of the plurality of processors.
Such control is suitable for many-to-many collision determination and the like.
[0004]
Data processing in the above case (1) can be efficiently realized by a conventional parallel processor. However, in the data processing (2), the processing speed of the entire system is limited by the communication speed between the parallel processors, and the processing speed of each processor cannot be fully exhibited. For example, it is possible to perform the data processing (2) at high speed by cross-bar connecting all the processors. However, in this case, the necessary hardware becomes enormous and is not realistic.
[0005]
An object of the present invention is to provide a data processing system and a data processing method capable of performing, for example, the data processing of (2) above more efficiently than before.
[0006]
[Means for Solving the Problems]
  In order to solve the above problems, the multiprocessor system of the present invention comprises: a plurality of processors, each of which manages at least one of a plurality of objects distributed in a predetermined virtual space and generates position data representing the position, in the virtual space, of the objects it manages; and a controller that can acquire the position data from all the processors and broadcasts the acquired position data to all the processors. Each of the plurality of processors determines, based on the position data broadcast from the controller, whether the object whose position is represented by that position data is within the range in which the objects managed by the processor itself are distributed, and, only when the object is within that range, determines whether the object is at a position where it collides with an object managed by the processor itself.
  Each of the plurality of processors manages, for example, two or more of the objects.
[0007]
The plurality of processors may also generate velocity data representing the velocity, in the virtual space, of the objects they manage. In this case, if the controller can acquire the position data and the velocity data from all the processors and broadcasts the acquired position data and velocity data to all the processors one set at a time, then a processor, when the object whose position is represented by the broadcast position data is at a collision position with an object under its own management, can use the velocity data of the broadcast object to generate collision intensity data that quantitatively represents the strength of the impact of the collision, together with data representing the effect of the collision on the object.
The system may further include a maximum value detection mechanism. Identification data for identifying each processor is assigned to the plurality of processors. From each processor that generated collision intensity data, the mechanism takes in that data together with the identification data of the processor; from each processor that did not generate collision intensity data, it takes in a value smaller than any collision intensity data value as the collision intensity data of that processor, again together with its identification data. The mechanism then identifies the processor that generated the largest collision intensity data and sends the identification data of that processor to the controller. The controller can thereby acquire the collision intensity data and the data representing the effect of the collision on the object from the processor indicated by the identification data sent from the maximum value detection mechanism.
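As an illustration only, the maximum value detection described above can be sketched as a pairwise reduction over (identification data, collision intensity) reports. The function name `winner_take_all`, the tuple layout, and the assumption that real intensities are non-negative are choices made for this sketch and are not taken from the specification; the sentinel of -1.0 follows the example value that the embodiment gives later in the description.

```python
from typing import List, Tuple

# Sentinel reported by processors that found no collision. It must be smaller
# than any real intensity (assumed non-negative in this sketch).
NO_COLLISION = -1.0

Report = Tuple[int, float]  # (processor ID, collision intensity)


def winner_take_all(reports: List[Report]) -> Report:
    """Reduce the reports pairwise, keeping the larger intensity and its ID."""
    if not reports:
        raise ValueError("at least one report is required")
    level = list(reports)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            next_level.append(a if a[1] >= b[1] else b)
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # an unpaired report passes through
        level = next_level
    return level[0]


reports = [(0, NO_COLLISION), (1, 3.5), (2, NO_COLLISION), (3, 7.25)]
winner_id, strength = winner_take_all(reports)
```

In this sketch a controller would read the full collision data only from processor `winner_id`, and would treat a winning intensity equal to `NO_COLLISION` as "no processor reported a collision".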
[0008]
Further, in such a multiprocessor system, a processor determines whether the object whose position data was broadcast is at a collision position with an object under its own management by, for example, calculating the distance between the two objects.
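A minimal sketch of such a distance-based test follows. The specification does not state the collision criterion, so the threshold `contact_distance`, the three-dimensional point representation, and the function name `first_collision` are assumptions made purely for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) position in the virtual space


def first_collision(broadcast_pos: Point,
                    managed: List[Point],
                    contact_distance: float) -> Optional[int]:
    """Return the index of the first managed object that lies within
    contact_distance of the broadcast object, or None if there is none."""
    for index, pos in enumerate(managed):
        if math.dist(broadcast_pos, pos) <= contact_distance:
            return index
    return None


managed_objects = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
hit = first_collision((4.5, 0.0, 0.0), managed_objects, contact_distance=1.0)
```

Here the broadcast object is 4.5 units from the first managed object and 0.5 units from the second, so only the second is within the assumed contact distance.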
[0009]
  The data processing method of the present invention is executed in a device or system having: a plurality of data processing means, each of which manages a different one of a plurality of clusters into which a plurality of objects distributed in a predetermined virtual space are divided; and a control means that stores the positions in the virtual space of all the objects and can broadcast position data representing the positions of the objects to all the data processing means. The method includes: a first stage in which the control means broadcasts the position data of all the objects, one object at a time, to all the data processing means; a second stage in which each of the plurality of data processing means determines whether the object whose position is represented by the position data broadcast from the control means in the first stage is within the range in which the objects belonging to the cluster it manages are distributed; and a third stage in which each of the plurality of data processing means determines, only when the object represented by the broadcast position data was found in the second stage to be within that range, whether that object is at a position where it collides with an object belonging to the cluster it manages.
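The three stages can be sketched as follows, under stated assumptions: the range of a cluster is modelled as an axis-aligned rectangle in two dimensions, a collision is modelled as two positions lying within a hypothetical `contact_distance`, and the names `ClusterProcessor`, `on_broadcast`, and `run_cycle` are invented for this example. The counter `distance_tests` is there only to show how much third-stage work the second-stage range test avoids.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class ClusterProcessor:
    ident: int
    objects: Dict[str, Point]       # objects of the one cluster managed here
    contact_distance: float = 1.0   # hypothetical collision threshold
    distance_tests: int = 0         # number of stage-3 distance tests performed

    def bounds(self) -> Tuple[float, float, float, float]:
        xs = [p[0] for p in self.objects.values()]
        ys = [p[1] for p in self.objects.values()]
        return min(xs), min(ys), max(xs), max(ys)

    def on_broadcast(self, name: str, pos: Point) -> List[str]:
        # Stage 2: is the broadcast object inside the range of this cluster?
        x0, y0, x1, y1 = self.bounds()
        if not (x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1):
            return []
        # Stage 3: only now run the per-object distance tests.
        hits = []
        for other, other_pos in self.objects.items():
            if other == name:
                continue  # an object is not tested against itself
            self.distance_tests += 1
            if dist(pos, other_pos) <= self.contact_distance:
                hits.append(other)
        return hits


def run_cycle(processors: List[ClusterProcessor]) -> Dict[str, List[str]]:
    """Stage 1: broadcast every object's position, one object at a time."""
    all_objects = {n: p for proc in processors for n, p in proc.objects.items()}
    collisions = {}
    for name, pos in all_objects.items():
        hits = [h for proc in processors for h in proc.on_broadcast(name, pos)]
        if hits:
            collisions[name] = hits
    return collisions


procs = [
    ClusterProcessor(0, {"a": (0.0, 0.0), "b": (0.5, 0.5), "c": (2.0, 2.0)}),
    ClusterProcessor(1, {"d": (10.0, 10.0), "e": (12.0, 11.0)}),
]
result = run_cycle(procs)
```

With these five objects, testing every object against every other would take 20 distance computations, whereas the two processors in the sketch perform 8 in total. A point-in-rectangle test follows the text literally; padding the rectangle by the contact distance would be one way to catch objects just outside a cluster, but that refinement is not stated in the source.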
[0010]
  The data processing system of the present invention comprises: a plurality of data processing means, to each of which identification data for identifying it is assigned, each generating, for at least one of a plurality of objects distributed in a predetermined virtual space, position data representing the position of that object in the virtual space and collision intensity data quantitatively representing the strength of the impact when another object collides with that object; a maximum value detection means that acquires the identification data and the collision intensity data from the plurality of data processing means, detects the largest collision intensity data, and outputs the identification data acquired together with the detected collision intensity data; and a control means that obtains the position data of all the objects by acquiring the position data from each of the plurality of data processing means, broadcasts this position data to the plurality of data processing means, and acquires collision intensity data from one of the data processing means based on the identification data transmitted from the maximum value detection means. Each of the plurality of data processing means determines whether an object whose position is represented by position data broadcast from the control means is within the range in which the objects whose positions are represented by the position data it generated itself are distributed, and, only when the object is within that range, determines whether that object is at a position where it collides with an object for which it generated position data itself.
  This system generates, for example, position data of two or more objects.
[0011]
  The computer program provided by the present invention is for a computer-equipped apparatus comprising: a plurality of data processing means, to each of which identification data for identifying it is assigned, each generating, for at least one of a plurality of objects distributed in a predetermined virtual space, position data representing the position of that object in the virtual space and collision intensity data quantitatively representing the strength of the impact when another object collides with that object; a maximum value detection means that acquires the identification data and the collision intensity data from the plurality of data processing means, detects the largest collision intensity data, and outputs the identification data acquired together with the detected collision intensity data; and a control means that obtains the position data of all the objects by acquiring the position data from each of the plurality of data processing means, broadcasts this position data to the plurality of data processing means, and acquires collision intensity data from one of the data processing means based on the identification data output from the maximum value detection means. The computer program causes the computer to execute a process in which each of the plurality of data processing means determines whether an object whose position is represented by position data broadcast from the control means is within the range in which the objects whose positions are represented by the position data it generated itself are distributed, and, only when the object is within that range, determines whether that object is at a position where it collides with an object for which it generated position data itself.
  The semiconductor device provided by the present invention is incorporated in a computer-equipped apparatus comprising: a plurality of data processing means, to each of which identification data for identifying it is assigned, each generating, for at least one of a plurality of objects distributed in a predetermined virtual space, position data representing the position of that object in the virtual space and collision intensity data quantitatively representing the strength of the impact when another object collides with that object; a maximum value detection means that acquires the identification data and the collision intensity data from the plurality of data processing means, detects the largest collision intensity data, and outputs the identification data acquired together with the detected collision intensity data; and a control means that obtains the position data of all the objects by acquiring the position data from each of the plurality of data processing means, broadcasts this position data to the plurality of data processing means, and acquires collision intensity data from one of the data processing means based on the identification data output from the maximum value detection means. By being so incorporated, the semiconductor device causes the computer to execute a process in which each of the plurality of data processing means determines whether an object whose position is represented by position data broadcast from the control means is within the range in which the objects whose positions are represented by the position data it generated itself are distributed, and, only when the object is within that range, determines whether that object is at a position where it collides with an object for which it generated position data itself.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment in which the present invention is applied to a multiprocessor system as an example of a data processing system will be described below.
[0013]
<Overall configuration>
FIG. 1 is a diagram illustrating a configuration example of a multiprocessor system. The multiprocessor system 1 includes a broadcast memory controller (hereinafter referred to as “BCMC (Broadcast Memory Controller)”) 10, which is a control means that controls data processing and the recording and reading of data; a plurality of cell processors 20, which are data processing means; and a plurality of WTA (Winner Take All) / sum circuits 30 for forming various functions required for data processing.
The BCMC 10 and all the cell processors 20 are connected by a broadcast channel (a communication channel over which data can be broadcast to all of them simultaneously).
[0014]
In the multiprocessor system 1, the BCMC 10 manages the state variable values, which are an example of the data processing results produced by the cell processors 20, and broadcasts the state variable values of all the cell processors 20 as an example of reference values. Each cell processor 20 can thereby refer at high speed to the state variable values generated in the other cell processors 20.
[0015]
The broadcast channel is a transmission path between the BCMC 10 and the plurality of cell processors 20, and is composed of an address bus used for address transfer and a data bus used for transferring data such as state variable values. The addresses include a cell address for specifying each cell processor 20 and a broadcast address directed to all the cell processors 20.
The cell address corresponds to an address (physical address or logical address) in memory, and the state variable value from a cell processor 20 is always stored at the address corresponding to the cell address indicating that cell processor 20. Each cell processor 20 is given an ID (identification) as identification information for identifying it. The cell address also corresponds to this ID. The cell address therefore makes it possible to specify which cell processor 20 output a given state variable value.
[0016]
The WTA / sum circuit 30 is connected as shown in FIG. That is, the WTA / sum circuit 30 is connected in a pyramid shape with the cell processor 20 side as the first stage. Two cell processors 20 are connected to the input terminal of the first stage WTA / sum circuit 30, and the output terminal is connected to the input terminal of the second stage WTA / sum circuit 30.
In the second and subsequent stages, the output terminals of the two lower WTA / sum circuits 30 are connected to the input terminals, and the input terminals of the upper WTA / sum circuit 30 are connected to the output terminals. The uppermost WTA / sum circuit 30 has an input terminal connected to the output terminals of two lower WTA / sum circuits 30, and an output terminal connected to the BCMC 10.
[0017]
In addition to the connection configuration shown in the figure, the present invention can also be implemented by connecting the WTA / sum circuit 30 in cascade. In this case, two cell processors 20 are connected to the input terminal of the first stage WTA / sum circuit 30 and the output terminal is connected to the input terminal of the upper stage. The output terminal of the lower stage WTA / sum circuit 30 and the cell processor 20 are connected to the input terminals of the second and subsequent stages of the WTA / sum circuit 30, and the output terminal is connected to the input terminal of the upper stage. The uppermost WTA / sum circuit 30 has an input terminal connected to an output terminal of the lower stage WTA / sum circuit 30 and the cell processor 20, and an output terminal connected to the BCMC 10.
[0018]
Next, each of the BCMC 10, the cell processor 20, and the WTA / sum circuit 30 will be described in detail.
[0019]
<BCMC>
The BCMC 10 broadcasts data to all the cell processors 20 through the broadcast channel, and captures and holds the state variable values from each cell processor 20. FIG. 2 shows a configuration example of the BCMC 10.
The BCMC 10 is configured by connecting, via a bus B1, a CPU core 101 that controls the operation of the entire multiprocessor system 1, a rewritable main memory 102 such as an SRAM (Static Random Access Memory), and a DMAC (Direct Memory Access Controller) 103. The CPU core 101 is a semiconductor device mounted in a computer that, in cooperation with the main memory 102, reads and executes a predetermined computer program to form the functions that perform the data processing characteristic of the present invention. The main memory 102 is used as a shared memory for the entire system.
The bus B1 is also connected to the output terminal of the uppermost WTA / sum circuit 30 and an external memory such as a hard disk or a portable medium.
[0020]
The CPU core 101 reads a startup program from the external memory at the time of startup, and executes the startup program to operate the operating system. Also, various data necessary for data processing is read from the external memory and developed in the main memory 102. The main memory 102 also stores data such as state variable values of each cell processor 20. The state variable value is stored in the address of the main memory 102 corresponding to the cell address of the cell processor 20 that calculated the state variable value.
The CPU core 101 also generates broadcast data to be broadcast to each cell processor 20 based on the data read from the main memory 102. The broadcast data is, for example, pair data consisting of a set of a state variable value and a cell address indicating the cell processor 20 that has calculated the state variable value. One or more pairs of data are generated.
[0021]
The DMAC 103 is a semiconductor device that performs direct memory access transfer control between the main memory 102 and each cell processor 20. For example, broadcast data is broadcast to each cell processor 20 via a broadcast channel. Further, the data processing result of each cell processor 20 is individually acquired and written into the main memory 102.
[0022]
<Cell processor>
Each cell processor 20 selects necessary data from the broadcast data, performs data processing, and reports the fact to the WTA / sum circuit 30 at the end of the data processing. A state variable value as a data processing result is sent to the BCMC 10 in accordance with an instruction from the BCMC 10. The cell processors 20 are ring-connected via a shared memory (not shown). Each cell processor 20 may perform data processing with a synchronous clock or different clocks. FIG. 3 shows a configuration example of the cell processor 20.
The cell processor 20 includes a cell CPU 201, an input buffer 202, an output buffer 203, a WTA buffer 204, a program controller 205, an instruction memory 206, and a data memory 207.
[0023]
The cell CPU 201 is a processor provided with a programmable floating point arithmetic unit, and controls the operation of the cell processor 20 to perform data processing. The cell CPU 201 acquires the broadcast data broadcast from the BCMC 10 via the input buffer 202, determines from the cell address of the pair data whether the data is necessary for the processing it is to perform, and, if it is necessary, writes the state variable value to the corresponding address in the data memory 207. The cell CPU 201 also reads state variable values from the data memory 207 to perform data processing, writes the data processing result into the output buffer 203, and sends data indicating the end of the data processing to the WTA / sum circuit 30.
[0024]
The input buffer 202 holds broadcast data broadcast from the BCMC 10. The held broadcast data is sent to the cell CPU 201 in response to a request from the cell CPU 201.
The output buffer 203 holds the state variable value of the cell CPU 201. The held state variable value is transmitted to the BCMC 10 in response to a request from the BCMC 10.
In addition to this, the input buffer 202 and the output buffer 203 may transmit and receive control data and the like.
The WTA buffer 204 receives data indicating the end of data processing from the cell CPU 201 when the data processing by the cell CPU 201 is completed, and transmits this data to the WTA / sum circuit 30, thereby reporting the end of data processing to the WTA / sum circuit 30. The end data indicating the end of data processing includes, for example, the ID of the own cell processor 20 and priority data for determining the priority with which the state variable value stored in the output buffer 203 is read into the BCMC 10.
[0025]
The program controller 205 takes in, from the BCMC 10, the programs that define the operation of the cell processor 20. These programs include a data processing program executed by the cell processor 20, a data selection program for determining which data is necessary for processing by the cell processor 20, and a priority determination program that determines the priority with which the processing result is read into the BCMC 10.
The instruction memory 206 stores a program fetched by the program controller 205. The stored program is read into the cell CPU 201 as necessary.
[0026]
The data memory 207 stores data processed in the cell processor 20. Broadcast data determined to be necessary by the cell CPU 201 is written. Broadcast data is stored at an address corresponding to a cell address.
In this embodiment, a part of the data memory 207 is connected to the adjacent cell processor 20 via the shared memory, and data can be transmitted to and received from the adjacent cell processor 20 every cycle.
[0027]
<WTA / sum circuit>
The plurality of WTA / sum circuits 30 determine the order in which the BCMC 10 fetches the state variable values from the cell processor 20 based on the data indicating the end of the data processing sent from each cell processor 20, and reports it to the BCMC 10.
FIG. 4 shows a configuration example of the WTA / sum circuit 30.
Each WTA / sum circuit 30 includes two input registers A and B (hereinafter referred to as a first input register 301 and a second input register 302), a switch 303, a comparator 304, an adder 305, and an output register 306.
[0028]
The first input register 301 and the second input register 302 include an integer register and a floating point register, respectively. Of the end data indicating the end of data processing sent from the cell processor 20, for example, ID is written in the integer register, and for example, priority data is written in the floating point register.
The switch 303 activates one of the comparator 304 and the adder 305. Specifically, only one of them can be used according to the operation mode. The operation mode is determined by an instruction from the BCMC 10, for example. The operation mode will be described later.
The comparator 304 compares the floating-point values held in the floating-point registers of the first input register 301 and the second input register 302, and writes the larger (or smaller) value and its accompanying integer value to the output register 306.
The adder 305 calculates the sum of the floating point values held by the floating point registers of the first input register 301 and the second input register 302 and writes the calculation result to the output register 306.
The output register 306 is configured substantially the same as the first input register 301 and the second input register 302. That is, an integer register and a floating point register are provided. An ID is written in the integer register, and priority data is written in the floating point register.
[0029]
The WTA / sum circuit 30 has three operation modes described below.
[0030]
・ Maximum value (WTA) mode:
The comparator 304 is activated by the switch 303. The comparator 304 compares the floating-point values A and B held in the floating-point registers of the first input register 301 and the second input register 302, and writes the larger (or smaller) value and its accompanying integer value to the output register 306. When the writing to the output register 306 is completed, the first input register 301 and the second input register 302 are cleared. The contents of the output register 306 are written into an input register of the upper stage WTA / sum circuit 30. If the destination input register has not been cleared at this time, the write stalls: it is not performed in that cycle but in the next cycle.
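The maximum value mode over the pyramid connection can be sketched in Python as follows. This is only an illustrative model under stated assumptions: one (ID, value) pair per cell processor, a number of cell processors that is a power of two, and no modeling of register clearing or stalls. The names `wta_node` and `wta_pyramid` are hypothetical and do not appear in the specification.

```python
from typing import List, Tuple

# (ID held in the integer register, value held in the floating point register)
Entry = Tuple[int, float]

def wta_node(a: Entry, b: Entry, pick_larger: bool = True) -> Entry:
    """One WTA / sum circuit in maximum value mode: compare the two inputs."""
    if pick_larger:
        return a if a[1] >= b[1] else b
    return a if a[1] <= b[1] else b

def wta_pyramid(entries: List[Entry], pick_larger: bool = True) -> Entry:
    """Reduce stage by stage until the uppermost circuit holds the winner."""
    # Assumption: a power-of-two number of inputs, so every stage pairs up.
    assert entries and len(entries) & (len(entries) - 1) == 0
    stage = list(entries)
    while len(stage) > 1:
        stage = [wta_node(stage[i], stage[i + 1], pick_larger)
                 for i in range(0, len(stage), 2)]
    return stage[0]

# Four cell processors report (ID, value); the top stage outputs the winner.
winner = wta_pyramid([(0, 0.3), (1, 2.5), (2, -1.0), (3, 1.7)])
```

With n inputs the winner emerges after log2(n) stages, which is the point of arranging the circuits as a pyramid rather than comparing all values in one place.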
[0031]
・ Addition mode:
The adder 305 is activated by the switch 303. The adder 305 calculates the sum of the floating point values A and B held in the floating point registers of the first input register 301 and the second input register 302, and writes the calculation result in the output register 306. The contents of the output register 306 are written into the input register of the upper stage WTA / sum circuit 30.
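The addition mode can be pictured with the short Python sketch below. It assumes a power-of-two number of inputs and ignores register timing; `sum_pyramid` is a hypothetical name used only for illustration.

```python
from typing import List

def sum_pyramid(values: List[float]) -> float:
    """Add pairwise, stage by stage, as a pyramid of adders would."""
    # Assumption: a power-of-two number of inputs, so every stage pairs up.
    assert values and len(values) & (len(values) - 1) == 0
    stage = list(values)
    while len(stage) > 1:
        # Each circuit adds the floating point values of its two input registers.
        stage = [stage[i] + stage[i + 1] for i in range(0, len(stage), 2)]
    return stage[0]

total = sum_pyramid([1.0, 2.0, 3.0, 4.0])
```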
[0032]
・ Approximate sort mode:
The comparator 304 is activated by the switch 303. The comparator 304 compares the floating-point values A and B held in the floating-point registers of the first input register 301 and the second input register 302, and writes the larger (or smaller) value and its accompanying integer value to the output register 306.
Thereafter, only the input register holding the value written in the output register 306 is cleared, and the contents of the output register 306 are written in the input register of the WTA / sum circuit 30 in the upper stage. If the write destination input register is not cleared, the write stalls and no write is performed in that cycle. However, the write operation from the output register 306 of the lower stage WTA / sum circuit 30 is performed.
In the approximate sort mode, the data that the BCMC 10 receives from the output register 306 of the uppermost WTA / sum circuit 30 is sorted (rearranged) in descending or ascending order of the floating-point values.
[0033]
Before entering each mode, the first input register 301, the second input register 302, and the output register 306 of all the WTA / sum circuits 30 are cleared.
[0034]
By switching and using each mode, the plurality of WTA / sum circuits 30 function as the sorting mechanism (sort mechanism) and / or the sum circuit. That is, when operating in the approximate sort mode, a sort mechanism is realized, and when operating in the addition mode, a summation circuit is realized.
[0035]
The WTA / sum circuit 30 that operates in the maximum value mode and the approximate sort mode may be realized as follows.
That is, the WTA / sum circuit is composed of as many input registers as there are cell processors 20, together with a switch, a comparator, an adder, and an output register.
As many input registers as there are cell processors 20 are prepared, and each of them includes an integer register and a floating point register, like the first input register 301 and the second input register 302. The comparator compares the floating point values held in the floating point registers of all the input registers. The adder calculates the sum of the floating point values held in the floating point registers of all the input registers.
The output register is the same as the output register of the WTA / sum circuit 30 of FIG.
[0036]
The comparator compares the priority data held in the floating-point registers of the input registers, and sequentially writes the accompanying IDs to the output registers in descending order of priority. Thereby, ID can be sent to BCMC10 in order of high priority.
The adder can add the data held by each floating-point register to obtain the sum.
Although such a WTA / sum circuit does not take the connection form shown in FIG. 1, it still functions as the sorting mechanism and the sum circuit in the present invention.
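The behavior of this single-circuit variant can be sketched in Python as follows. The sketch assumes that each input register is represented by an (ID, priority) pair; the function names `ids_by_priority` and `sum_of_inputs` are hypothetical.

```python
from typing import List, Tuple

InputRegister = Tuple[int, float]  # (ID, priority data)

def ids_by_priority(inputs: List[InputRegister]) -> List[int]:
    """Comparator role: output IDs in descending order of priority data."""
    ordered = sorted(inputs, key=lambda reg: reg[1], reverse=True)
    return [cell_id for cell_id, _ in ordered]

def sum_of_inputs(inputs: List[InputRegister]) -> float:
    """Adder role: sum the floating point values of all input registers."""
    return sum(value for _, value in inputs)

registers = [(0, 0.25), (1, 1.0), (2, 0.5)]
order = ids_by_priority(registers)
total = sum_of_inputs(registers)
```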
[0037]
<Data processing method>
The multiprocessor system 1 in the present embodiment performs the required data processing by operating as follows. FIG. 5 is a flowchart showing the flow of processing executed in the multiprocessor system 1.
[0038]
In the main memory 102 of the BCMC 10, initial values of the state variable values of all the cell processors 20 are stored in advance.
The BCMC 10 creates broadcast data by using pair data composed of the state variable value of the cell processor 20 and the cell address indicating the cell processor 20 (step S101). Then, the created broadcast data is broadcast to all the cell processors 20 (step S102).
Each cell processor 20 captures the broadcast data into the input buffer 202. Using the data selection program stored in the instruction memory 206, the cell CPU 201 checks the cell addresses of the broadcast data held in the input buffer 202 and confirms whether the data contains a state variable value required for the data processing performed by its own cell processor 20 (step S103). When there is no such state variable value, the cell processor 20 ends the processing operation (step S103: none). When there is one (step S103: present), the corresponding state variable value is overwritten at the address in the data memory 207 corresponding to the cell address that forms pair data with this state variable value (step S104).
Thus, the data broadcast from the BCMC 10 to each cell processor 20 is completed.
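The selection performed by each cell processor in steps S103 and S104 can be sketched as follows. This is a minimal Python model in which the data memory is represented as a dictionary keyed by cell address; the set `needed` stands in for the data selection program, and both it and the function name `select_from_broadcast` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, float]  # pair data: (cell address, state variable value)

def select_from_broadcast(broadcast: List[Pair], needed: Set[int],
                          data_memory: Dict[int, float]) -> bool:
    """Keep only the pairs this cell needs; report whether any were found."""
    found = False
    for cell_address, value in broadcast:
        if cell_address in needed:              # S103: is the value required?
            data_memory[cell_address] = value   # S104: overwrite matching address
            found = True
    return found

broadcast = [(0, 1.5), (1, -0.5), (2, 4.0)]
memory: Dict[int, float] = {}
used = select_from_broadcast(broadcast, needed={0, 2}, data_memory=memory)
```

Because every cell filters the same broadcast locally, no cell has to issue its own read to the main memory 102.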
[0039]
When the broadcast ends, each cell processor 20 generates a new state variable value by processing the state variable value recorded in the data memory 207 by a data processing program stored in the instruction memory 206. The new state variable value is written in the data memory 207 and also in the output buffer 203 (step S105). The new state variable value is overwritten on the address corresponding to its own cell address on the data memory 207.
When the data processing is completed, the cell CPU 201 transmits end data including the ID and priority data to an input register of the first stage WTA / sum circuit 30 via the WTA buffer 204, thereby reporting the end of the data processing (step S106). The priority data is generated by a predetermined priority determination program before or after the data processing.
[0040]
The WTA / sum circuit 30 in the first stage holds, from the end data sent from each cell processor 20, the ID in the integer register of an input register and the priority data in the floating point register. Here, the WTA / sum circuit 30 operates in the approximate sort mode. For this purpose, the switch 303 activates the comparator 304.
The integer registers of the first input register 301 and the second input register 302 of the WTA / sum circuit 30 each hold an ID sent from a different cell processor 20. Each floating-point register holds the priority data associated with that ID. The comparator 304 reads the priority data from the floating point registers of the first input register 301 and the second input register 302, and compares the priorities. As a result of the comparison, the priority data having the higher priority and its accompanying ID are written to the floating point register and the integer register of the output register 306. The input register whose contents were written to the output register 306 is then cleared. The ID and priority data written to the output register 306 are written to an input register of the WTA / sum circuit 30 in the upper stage.
Such processing is performed by the WTA / sum circuit 30 in each stage. The top WTA / sum circuit 30 sends the ID written in the integer register of the output register 306 to the BCMC 10.
Through the processing as described above, the WTA / sum circuit 30 as a whole sends IDs to the BCMC 10 in order of priority (step S107).
[0041]
The BCMC 10 obtains the data-processed state variable value from the output buffer 203 of the cell processor 20 corresponding to the ID sent from the WTA / sum circuit 30. The acquired state variable value is overwritten on the address corresponding to the cell address indicating the cell processor 20 that has performed the processing on the main memory 102 in the BCMC 10 (step S108).
Thus, one cycle of the state variable value processing operation is completed.
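One cycle of steps S101 to S108 can be modeled roughly as below. This Python sketch is an assumption-laden illustration, not the implementation described here: the `Cell` class, the `run_cycle` function, the use of the absolute value of the result as priority data, and the replacement of the WTA / sum circuits by a plain sort are all hypothetical simplifications.

```python
from typing import Callable, Dict, List, Set

class Cell:
    """Hypothetical stand-in for one cell processor 20."""
    def __init__(self, cell_id: int, needed: Set[int],
                 process: Callable[[Dict[int, float]], float]) -> None:
        self.cell_id = cell_id
        self.needed = needed        # cell addresses selected by this cell
        self.process = process      # stands in for the data processing program
        self.data_memory: Dict[int, float] = {}
        self.output_buffer = 0.0

def run_cycle(main_memory: Dict[int, float], cells: List[Cell]) -> List[int]:
    broadcast = list(main_memory.items())               # S101-S102: pair data
    reports = []
    for cell in cells:
        selected = {a: v for a, v in broadcast if a in cell.needed}  # S103
        if not selected:
            continue                                    # S103: none
        cell.data_memory.update(selected)               # S104
        cell.output_buffer = cell.process(cell.data_memory)  # S105
        cell.data_memory[cell.cell_id] = cell.output_buffer
        priority = abs(cell.output_buffer)              # assumed priority rule
        reports.append((cell.cell_id, priority))        # S106
    # S107: a plain sort replaces the approximate sort of the WTA circuits.
    order = [cid for cid, _ in sorted(reports, key=lambda r: r[1], reverse=True)]
    by_id = {c.cell_id: c for c in cells}
    for cid in order:                                   # S108
        main_memory[cid] = by_id[cid].output_buffer
    return order

main_memory = {0: 1.0, 1: 2.0, 2: 3.0}
cells = [
    Cell(0, {1, 2}, lambda m: m[1] + m[2]),
    Cell(1, {0}, lambda m: -m[0]),
    Cell(2, {0, 1}, lambda m: m[0] * m[1]),
]
order = run_cycle(main_memory, cells)
```

All cells read from the same broadcast snapshot, and the main memory is updated only after every cell has finished, which mirrors the separation between the broadcast phase and the collection phase.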
[0042]
The BCMC 10 acquires a data processing result from each cell processor 20 and thereby generates broadcast data.
Each cell processor 20 selects only the data necessary for itself from the broadcast data and performs data processing. By performing data processing using this broadcast data, processing that uses data processed by all the other cell processors 20 becomes possible. Also, because the broadcast data is created from pair data consisting of a data processing result from each cell processor 20 and a cell address indicating the cell processor 20 that generated that result, it is possible to perform processing using only the data processing results of specific cell processors 20. Furthermore, since adjacent cell processors 20 are connected via a shared memory, processing between adjacent cell processors 20 can be performed as in the conventional case.
Each cell processor 20 does not go directly to the main memory 102 to fetch the data required by its own cell processor 20, but selects the necessary data from the broadcast data and stores the data in each cell processor 20. Since the data is held and processed, high-speed processing can be performed without data competition.
[0043]
[Example 1]
Next, a specific example of the multiprocessor system 1 will be described.
In this embodiment, an example in which only data processed by a certain cell processor 20 and another cell processor 20 adjacent thereto is used will be described with reference to FIG.
In FIG. 6, “◯” represents a cell processor, where the shaded “◯” is a cell processor that performs data processing, and “●” is a cell processor that holds data that needs to be processed.
Consider a case where the following filter calculation is continuously executed on data (grid point data) for each lattice point of a lattice of n × n (n is a natural number of 2 or more).
$$X_{i,j} = \frac{X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1}}{4}$$
i: grid point row number, j: grid point column number
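As an illustration, one pass of the filter calculation above could be written in Python as follows. The handling of boundary points and the choice to compute every new value from the previous values are not specified here and are assumptions of this sketch; `filter_step` is a hypothetical name.

```python
from typing import List

def filter_step(x: List[List[float]]) -> List[List[float]]:
    """One pass: each interior point becomes the mean of its four neighbours."""
    n = len(x)
    # Assumption: boundary values are kept unchanged.
    new = [row[:] for row in x]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # All reads use the previous grid x, not the partially updated one.
            new[i][j] = (x[i - 1][j] + x[i + 1][j]
                         + x[i][j - 1] + x[i][j + 1]) / 4
    return new

grid = [[0.0, 4.0, 0.0],
        [4.0, 0.0, 4.0],
        [0.0, 4.0, 0.0]]
result = filter_step(grid)
```

Each point needs only its four neighbours, which is why this example falls under processing that uses data from adjacent cell processors.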
[0044]
The BCMC 10 groups the grid point data by row or by column and broadcasts the grouped data to the n cell processors 20 as broadcast data.
FIG. 8 is an exemplary diagram in which lattice point data is grouped, and five pieces of lattice point data indicated by “◯” are grouped. One grouped grid point data is processed by one cell processor 20.
In the cell processor 20, grouped lattice point data required from the broadcast data is stored in the data memory 207. The grid point data is sequentially read from the data memory 207 and processed.
[0045]
Data transfer is performed using the shared memory with the cell processor 20 connected via the shared memory. Assuming that the data write operation to the shared memory is one cycle, the grouped data transfer between the cell processors 20 can be performed in 2n cycles.
Each cell processor 20 is operated synchronously, and writing and calculation to the shared memory are executed simultaneously as in pipeline processing, whereby communication and calculation between the cell processors 20 can be performed simultaneously.
[0046]
The next broadcast data is broadcast by the BCMC 10 every time the data processing of the grouped lattice point data is completed. The cell processor 20 determines whether it is necessary data based on i and j of the data to be broadcast.
Data in the row or column direction can be processed by grouping the broadcast data, and data in the column or row direction can be processed by transferring data via the shared memory.
[0047]
[Example 2]
In this embodiment, an example of using only data processed by some of the cell processors 20 among all the cell processors 20 will be described with reference to FIG. In FIG. 7, “◯” represents a cell processor, where a shaded “◯” is a cell processor that performs data processing, and a “●” is a cell processor that holds required data. Such a multiprocessor system is useful for realizing a Hopfield associative memory.
Each cell processor 20 is assumed to hold a state variable value that is a data processing result and a weighting coefficient that represents the importance of the state variable value. The cell processors 20 are numbered, and the BCMC 10 takes in state variable values from the cell processor 20 in the order of the numbers.
The BCMC 10 broadcasts the state variable values fetched from all the cell processors 20 as broadcast data. Each cell processor 20 selects only a necessary state variable value from the broadcast data, performs a product-sum operation with the weight coefficient, and updates the state variable value. When the necessary state variable values are all the state variable values included in the broadcast data, this corresponds to the processing using the data processed by all the processors.
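The update performed by one cell processor in this example can be sketched as follows. The product-sum operation follows the description above, while the sign threshold applied to the result is an assumption borrowed from the usual formulation of a Hopfield network; the function name `update_state` is hypothetical.

```python
from typing import Dict

def update_state(weights: Dict[int, float], broadcast: Dict[int, float]) -> float:
    """Product-sum over the selected state values, then a sign threshold."""
    # Only the addresses that have a weight are selected from the broadcast.
    s = sum(w * broadcast[addr] for addr, w in weights.items()
            if addr in broadcast)
    # Assumption: the new state is +1.0 or -1.0 depending on the sign of s.
    return 1.0 if s >= 0 else -1.0

broadcast = {0: 1.0, 1: -1.0, 2: 1.0}   # cell address -> state variable value
new_state = update_state({1: 0.5, 2: -1.0}, broadcast)
```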
[0048]
[Example 3]
Next, an example of pattern matching calculation processing will be described.
Here, a process of specifying the cell processor 20 that holds data most similar to the characteristics of the input data is performed. This process is performed as follows.
Each cell processor 20 holds template data to be compared in advance.
The BCMC 10 broadcasts input data to all cell processors 20. Each cell processor 20 calculates a difference value between the feature of the template data held by itself and the feature of the input data. The difference value is sent to the WTA / sum circuit 30 together with the ID.
The WTA / sum circuit 30 operates in the maximum value mode. The integer register of the input register holds the ID, and the floating point register holds the difference value. The difference value is compared by the comparator 304, and the smaller difference value and the accompanying ID are sent to the output register 306. This is performed by the entire WTA / sum circuit 30 to obtain the smallest difference value and its associated ID. This ID and the difference value are sent to the BCMC 10.
The BCMC 10 identifies the cell processor 20 by the ID. Accordingly, it is possible to detect the template data most similar to the features of the input data, together with the difference value between that template data and the input data.
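The pattern matching flow can be sketched in Python as follows. The form of the difference value is not specified here; the sum of squared differences used below is an assumption, as are the names `difference` and `best_match`. The minimum search stands in for the WTA / sum circuits selecting the smaller value.

```python
from typing import Dict, List, Tuple

def difference(template: List[float], features: List[float]) -> float:
    """Difference value computed by one cell (assumed: sum of squares)."""
    return sum((t - f) ** 2 for t, f in zip(template, features))

def best_match(templates: Dict[int, List[float]],
               features: List[float]) -> Tuple[int, float]:
    """Return the (ID, difference value) with the smallest difference."""
    reports = [(cell_id, difference(t, features))
               for cell_id, t in templates.items()]
    return min(reports, key=lambda r: r[1])

templates = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [3.0, 4.0]}
match = best_match(templates, [1.0, 2.0])
```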
[0049]
[Example 4]
Next, a processing example of a moving object collision determination algorithm used in image processing or the like will be described. The “collision determination algorithm” is an algorithm for determining whether each of n objects existing in a certain space collides with another object and, if so, how strong the collision is.
It is assumed that the spatial distribution of n objects is biased and divided into m clusters. Here, for example, it is assumed that it is determined which of the (n−1) objects most strongly collides with one object.
FIG. 9 is an illustration of an object in such a space. The object represented by “◯” is surrounded by a rectangle to form one cluster, and in FIG. 9, the object is divided into five clusters. Data indicating the object is broadcast from the BCMC 10 and taken into the cell processor 20 for each cluster. The cell processor 20 performs processing on the position and motion in the space regarding the objects included in one captured cluster.
In the example of FIG. 9, the cell processors A to E perform processing related to objects divided into five clusters.
The flow of the collision determination algorithm process will be described with reference to FIG.
[0050]
The BCMC 10 generates broadcast data including object data including object position and velocity data and cluster data indicating a cluster to which the object belongs, and broadcasts it to all cell processors 20 (step S201). Each cell processor 20 selects and fetches object data from broadcast data based on cluster data.
The cell processor 20 that has fetched the object data calculates new position data after a unit time from the current position data and velocity data of the object.
A new bounding box value is obtained from the new position data (step S202). The bounding box is, for example, a rectangle surrounding the object in FIG. The value of the bounding box is, for example, the coordinates of the vertex of the bounding box.
The BCMC 10 takes in new position data of the object from each cell processor 20 and updates the position data (step S203).
[0051]
Next, the BCMC 10 broadcasts object data including the acquired new position data and the like to all the cell processors 20, one object at a time (step S204). That is, position data representing the position of one object that is the target of collision determination (hereinafter referred to as the “determination target object”) is sent to all the cell processors 20.
Each cell processor 20 first determines whether or not there is a possibility of collision of the determination target object using the bounding box calculated in step S202 (step S205). Specifically, it is determined whether or not the position of the determination target object is within the bounding box.
When there is a possibility of collision, that is, when the determination target object is within the bounding box (step S205: Y), the cell processor 20 sequentially calculates the distance between the determination target object and each object in the bounding box that it processes (step S206), and determines whether a collision occurs (step S207). When the determination target object collides with one of the objects in the bounding box (step S207: Y), the cell processor 20 generates collision data, which includes data that quantitatively represents the strength of the impact due to the collision (collision strength data) and data representing the effect of the collision on the determination target object (step S208). The cell processor 20 then sends the collision strength data from the generated collision data to the WTA / sum circuit 30 together with its ID (step S209).
[0052]
When the determination target object is outside the bounding box (step S205: N), or when it is determined from the distance calculation that there is no collision (step S207: N), the cell processor 20 sends, for example, “−1.0” to the WTA / sum circuit 30 as the collision strength data (step S210).
The WTA / sum circuit 30 operates in the maximum value mode. It compares the collision strength data sent from the cell processors 20, detects the collision strength data indicating the strongest impact (step S211), and identifies the cell processor 20 that generated the detected collision strength data. An ID representing the identified cell processor 20 is then sent to the BCMC 10.
The BCMC 10 acquires the collision data from the cell processor 20 represented by the ID sent from the top stage of the WTA / sum circuit 30 (step S212). By performing the processing from step S204 on all the objects, collision determination between all objects in the space is performed.
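The per-object part of this flow (steps S205 to S211) can be sketched in Python as follows. The sketch works in two dimensions and makes several assumptions that are not taken from the description: a collision is declared when two positions are within a fixed `radius`, the collision strength is the relative speed of the two objects, and the WTA / sum circuits are replaced by a simple maximum. All function and variable names are hypothetical.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float]
Obj = Tuple[Vec, Vec]  # (position, velocity)

def cell_collision_strength(target_id: int, target: Obj,
                            cluster: Dict[int, Obj], radius: float) -> float:
    """Collision strength reported by one cell, or -1.0 when no collision."""
    (tx, ty), (tvx, tvy) = target
    xs = [pos[0] for pos, _ in cluster.values()]
    ys = [pos[1] for pos, _ in cluster.values()]
    # S205: cheap pre-check against the bounding box of the cluster.
    if not (min(xs) <= tx <= max(xs) and min(ys) <= ty <= max(ys)):
        return -1.0                                       # S210
    strength = -1.0
    for obj_id, ((x, y), (vx, vy)) in cluster.items():
        if obj_id == target_id:
            continue                                      # skip the target itself
        if math.hypot(x - tx, y - ty) <= radius:          # S206-S207
            # S208: assumed strength measure, the relative speed.
            strength = max(strength, math.hypot(vx - tvx, vy - tvy))
    return strength

def strongest_collision(target_id: int, target: Obj,
                        clusters: Dict[int, Dict[int, Obj]],
                        radius: float) -> Tuple[int, float]:
    """S209-S211: every cell reports; the largest strength wins."""
    reports = [(cell_id, cell_collision_strength(target_id, target, c, radius))
               for cell_id, c in clusters.items()]
    return max(reports, key=lambda r: r[1])

clusters = {
    0: {10: ((0.0, 0.0), (1.0, 0.0)),
        11: ((2.0, 2.0), (0.0, 0.0)),
        12: ((1.5, 2.0), (0.0, 4.0))},
    1: {20: ((10.0, 10.0), (0.0, 0.0)),
        21: ((12.0, 12.0), (0.0, -3.0))},
}
result = strongest_collision(12, clusters[0][12], clusters, radius=1.0)
```

The bounding-box test lets the cell that manages the distant cluster answer “−1.0” without computing any distances, which is where the saving over an all-pairs comparison comes from.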
[0053]
[Example 5]
Next, an example in which the adder 305 of the WTA / sum circuit 30 is used will be described.
Each cell processor 20 inputs the data processing result to the WTA / sum circuit 30. In the WTA / sum total circuit 30, the data processing results are added by the adder 305, and finally the sum of the data processing results of all the cell processors 20 is obtained. In this way, it is possible to obtain the sum of the data processing results at high speed by the WTA / sum circuit 30.
The sum of the data processing results is sent to the BCMC 10 and can be transmitted to each cell processor 20 at high speed by broadcasting. The sum of the data processing results is used, for example, for normalization in optimization calculations such as neural computation.
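The use of the broadcast sum for normalization can be sketched as follows. This is a minimal Python illustration in which each list element stands for the result of one cell processor; the name `normalize_with_broadcast_sum` and the decision to reject a zero sum are assumptions.

```python
from typing import List

def normalize_with_broadcast_sum(results: List[float]) -> List[float]:
    """Divide every cell's result by the sum obtained from the adders."""
    total = sum(results)      # produced by the WTA / sum circuits in addition mode
    if total == 0:
        raise ValueError("sum of results is zero; normalization is undefined")
    # The BCMC broadcasts the total; each cell divides its own result by it.
    return [r / total for r in results]

normalized = normalize_with_broadcast_sum([1.0, 1.0, 2.0])
```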
[0054]
In the above description, the BCMC 10 and the WTA / sum circuit 30 are independent from each other. However, the controller may be configured as one block in which the WTA / sum circuit 30 is incorporated in the BCMC 10.
[0055]
The above description is an example in which the data processing means is the cell processor 20 and the control means is the controller (BCMC 10). However, the constituent elements of the present invention are not limited to such an example.
For example, a plurality of data processing terminals may be connected via a wide area network in a form capable of bidirectional communication, with one or more of the data processing terminals operated as the control means and the other data processing terminals operated as the data processing means. The control means is provided with a function of broadcasting broadcast data that includes the data processing results received from some or all of the plurality of data processing means and the data used for data processing by at least one data processing means. Each data processing means is provided with a function of selecting, from the broadcast data broadcast by the control means, only the data necessary for the data processing it performs, performing that data processing, and sending the processing result to the control means.
[0056]
In addition, general-purpose data processing terminals that can each be specified by predetermined identification information (for example, the identification data described above) may be used as the plurality of data processing means, and the data processing system may be constituted solely by a server capable of bidirectional communication with these general-purpose data processing terminals, or by an apparatus equipped with a semiconductor device incorporating a CPU and a memory.
In this case, the internal CPU of the server or apparatus reads and executes a predetermined computer program, thereby forming in the server or apparatus: a function of specifying a data processing terminal to serve as at least one data processing means and generating broadcast data that includes the identification information of the specified data processing terminal and the data for data processing addressed to that terminal; a function of receiving, from some or all of the plurality of data processing terminals, the results of the data processing they performed; and a function of including the received processing results in the broadcast data and broadcasting the broadcast data to each of the plurality of data processing terminals.
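The terminal-side selection described above can be sketched as follows. This is a minimal sketch under assumptions: `Packet`, `Terminal`, and `Server` are hypothetical names, the network is replaced by direct method calls, and squaring the payload stands in for whatever data processing a terminal actually performs. Including the received results in a later broadcast is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    target_id: str      # identification information of the addressed terminal
    payload: float      # data to be processed by that terminal


@dataclass
class Terminal:
    term_id: str

    def handle(self, broadcast):
        # Select only the packets addressed to this terminal and ignore the rest.
        mine = [p for p in broadcast if p.target_id == self.term_id]
        # Placeholder processing (assumption): square each payload.
        return [(self.term_id, p.payload ** 2) for p in mine]


@dataclass
class Server:
    terminals: list
    results: list = field(default_factory=list)

    def broadcast(self, packets):
        # Every terminal receives the same broadcast data and returns its results.
        self.results = [r for t in self.terminals for r in t.handle(packets)]
        return self.results
```

A packet whose `target_id` matches no terminal is simply ignored by all of them, which mirrors the idea that each terminal picks out only the data it needs.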
[0057]
[Effect of the invention]
According to the present invention as described above, data processing among data processing means can be performed efficiently when a plurality of data processing means are used.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of a multiprocessor system to which the present invention is applied.
FIG. 2 is a configuration diagram of BCMC.
FIG. 3 is a configuration diagram of a cell processor.
FIG. 4 is a configuration diagram of a WTA / sum circuit.
FIG. 5 is a flowchart showing a process flow of the multiprocessor system according to the embodiment.
FIG. 6 is a conceptual diagram of data processing that uses the data processing results of adjacent processors.
FIG. 7 is a conceptual diagram of data processing that uses the data processing results of some of the processors.
FIG. 8 is an exemplary diagram in which lattice point data is grouped.
FIG. 9 is an exemplary diagram when an object is divided into clusters.
FIG. 10 is a flowchart showing a flow of processing of a collision determination algorithm.
[Explanation of symbols]
10 BCMC
101 CPU core
102 Main memory
103 DMAC
20 Cell processor
201 Cell CPU
202 Input buffer
203 Output buffer
204 WTA buffer
205 Program controller
206 Instruction memory
207 Data memory
30 WTA / sum circuit
301 First input register
302 Second input register
303 Switcher
304 Comparator
305 Adder
306 Output register

Claims

A plurality of processors, each managing at least one of a plurality of objects distributed in a predetermined virtual space and generating position data representing the position in the virtual space of each object that it manages; and
A controller capable of acquiring the position data from all the processors and broadcasting the acquired position data to all the processors,
Wherein each of the plurality of processors determines, from the position data broadcast from the controller, whether the object whose position is represented by that position data is within the range in which the objects it manages are distributed and, only when the object is within that range, determines whether the object is in a position where it collides with an object that it manages.
Multiprocessor system.

Each of the plurality of processors manages two or more of the objects;
The multiprocessor system according to claim 1.

The plurality of processors generate velocity data representing the velocity in the virtual space of each object that they manage,
The controller is capable of acquiring the position data and the velocity data from all the processors, and broadcasts the acquired position data and velocity data to all the processors one set at a time,
When the object whose position is represented by the broadcast position data is in a collision position with an object under its management, each processor generates, from the velocity data of the broadcast object, collision strength data that quantitatively represents the strength of the impact caused by the collision and data representing the influence of the collision on the object,
The multiprocessor system according to claim 1.

Identification data for identifying each of the plurality of processors is assigned to that processor,
The system further comprises a maximum value detection mechanism that takes in collision strength data together with the identification data of the processor from each processor that has generated collision strength data, takes in a value smaller than any value of the collision strength data from each processor that has not generated collision strength data, identifies the processor that has generated the largest collision strength data, and sends the identification data of the identified processor to the controller,
The controller is configured to acquire the collision strength data and the data representing the influence of the collision on the object from the processor represented by the identification data sent from the maximum value detection mechanism,
The multiprocessor system according to claim 3.

The processor determines whether the object whose position data was broadcast is in a collision position with an object under its management by calculating the distance between that object and the object under its management,
The multiprocessor system according to claim 1.

A method executed in an apparatus or system having a plurality of data processing means, each managing a different cluster of a plurality of objects that are distributed in a predetermined virtual space and divided into a plurality of clusters, and control means capable of storing position data representing the positions of all the objects in the virtual space and broadcasting the position data to all the data processing means, the method comprising:
A first stage in which the control means broadcasts the position data of all the objects, one object at a time, to all the data processing means;
A second stage in which each of the plurality of data processing means determines whether the object whose position is represented by the position data broadcast from the control means in the first stage is within the range in which the objects belonging to the cluster it manages are distributed; and
A third stage in which each of the plurality of data processing means determines, only when the second stage finds that the object represented by the broadcast position data is within that range, whether the object collides with an object belonging to the cluster it manages.
Data processing method.

A plurality of data processing means, each assigned identification data for identifying it, each generating, for at least one of a plurality of objects distributed in a predetermined virtual space, position data representing the position of that object in the virtual space, and generating collision strength data that quantitatively represents the strength of the impact when another object collides with that object;
Maximum value detection means for acquiring the identification data and the collision strength data from the plurality of data processing means, detecting the largest collision strength data, and outputting the identification data acquired together with the detected collision strength data; and
Control means for acquiring the position data of all the objects by acquiring the position data from each of the plurality of data processing means, broadcasting the position data to the plurality of data processing means, and acquiring collision strength data from one of the data processing means on the basis of the identification data sent from the maximum value detection means,
Wherein each of the plurality of data processing means determines whether the object whose position is represented by the position data broadcast from the control means is within the range in which the objects whose positions are represented by the position data it generated are distributed and, only when the object is within that range, determines whether the object is in a position where it collides with an object for which it generated the position data.
Data processing system.

At least one of the plurality of data processing means generates the position data of two or more objects;
The data processing system according to claim 7.

In a computer-equipped apparatus comprising:
A plurality of data processing means, each assigned identification data for identifying it, each generating, for at least one of a plurality of objects distributed in a predetermined virtual space, position data representing the position of that object in the virtual space, and generating collision strength data that quantitatively represents the strength of the impact when another object collides with that object;
Maximum value detection means for acquiring the identification data and the collision strength data from the plurality of data processing means, detecting the largest collision strength data, and outputting the identification data acquired together with the detected collision strength data; and
Control means for acquiring the position data of all the objects by acquiring the position data from each of the plurality of data processing means, broadcasting the position data to the plurality of data processing means, and acquiring collision strength data from one of the data processing means on the basis of the identification data output from the maximum value detection means,
A computer program for causing the computer to execute a process in which each of the plurality of data processing means determines whether the object whose position is represented by the position data broadcast from the control means is within the range in which the objects whose positions are represented by the position data it generated are distributed and, only when the object is within that range, determines whether the object is in a position where it collides with an object for which it generated the position data.

A semiconductor device that is incorporated in a computer-equipped apparatus comprising:
A plurality of data processing means, each assigned identification data for identifying it, each generating, for at least one of a plurality of objects distributed in a predetermined virtual space, position data representing the position of that object in the virtual space, and generating collision strength data that quantitatively represents the strength of the impact when another object collides with that object;
Maximum value detection means for acquiring the identification data and the collision strength data from the plurality of data processing means, detecting the largest collision strength data, and outputting the identification data acquired together with the detected collision strength data; and
Control means for acquiring the position data of all the objects by acquiring the position data from each of the plurality of data processing means, broadcasting the position data to the plurality of data processing means, and acquiring collision strength data from one of the data processing means on the basis of the identification data output from the maximum value detection means,
And that thereby causes the computer to execute a process in which each of the plurality of data processing means determines whether the object whose position is represented by the position data broadcast from the control means is within the range in which the objects whose positions are represented by the position data it generated are distributed and, only when the object is within that range, determines whether the object is in a position where it collides with an object for which it generated the position data.