JP2006216058A - Data processing method, system and apparatus for moving processor tasks in multiprocessor system

Info

Publication number: JP2006216058A
Application number: JP2006027686A
Authority: JP
Inventors: Keisuke Inoue; 敬介井上; Masahiro Yasue; 正宏安江
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2005-02-04
Filing date: 2006-02-03
Publication date: 2006-08-17
Anticipated expiration: 2026-02-03
Also published as: TW200705208A; WO2006083043A2; WO2006083043A3; TWI338844B; JP4183712B2

Abstract

PROBLEM TO BE SOLVED: To realize efficient distributed processing of processor tasks among multiprocessor systems on a network. SOLUTION: A method and apparatus are provided for moving and distributing processor tasks among a plurality of multiprocessor systems distributed over a network. Each multiprocessor system comprises at least one wide band engine 610. Each wide band engine 610 includes a plurality of processing units 630 and a synergistic processing unit 640 in addition to a shared memory 660. Tasks from one wide band engine 610 are bundled, moved to another wide band engine, and processed there remotely so that processing resources are used effectively. The tasks are ended by being returned to the originating wide band engine 610, or are subjected to continued processing. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to methods and apparatus for managing processor tasks in multiprocessor systems distributed over a network, and more particularly to methods and apparatus for scheduling and executing processor tasks over a network among the sub-processing units of the multiprocessor systems on a substantially self-governing basis.

This application is related to U.S. continuation-in-part patent applications Nos. 10/783,246 and 10/783,238, filed February 20, 2004, and to U.S. provisional application No. 60/650,153, filed February 4, 2005.

Real-time multimedia applications are becoming increasingly important. These applications require extremely fast processing speeds, such as many thousands of megabits of data per second. Although a single processing unit can achieve fast processing speeds, it generally cannot match the processing speed of a multiprocessor architecture. Indeed, in a multiprocessor system, a plurality of sub-processors can operate in parallel (or at least in concert) to achieve the desired processing results.

Although multiprocessor architectures are useful, their scalability is limited. Greater efficiency can be achieved by grouping a plurality of multiprocessor systems over a network, so that distributed processing is carried out at a speed exceeding that of each multiprocessor system operating alone.

A wide range of computers and computing devices can employ multiprocessing techniques. In addition to personal computers (PCs) and servers, these computing devices include mobile phones, mobile computers, personal digital assistants (PDAs), set-top boxes, digital televisions, and many others.

Real-time multimedia software applications are made up of processing code, such as processing instructions and processing data. A collection of at least some of the processing instructions and/or processing data may be referred to as a processor task. The program statements within one processor task may be executed sequentially, while different processor tasks may be executed in parallel on different processors of a multiprocessor system or distributed among different multiprocessor systems on a network. A software application can thus be regarded as containing processor tasks that are executed by multiprocessor systems.

A design concern in multiprocessor systems is how to manage which sub-processing units of a system execute which processor tasks, and how to manage the distribution of processor tasks among multiprocessor systems on a network. In some multiprocessor systems, each processor task specifies which sub-processing unit is to execute it. A drawback of this approach is that the programmer cannot optimize the allocation of processor tasks among the sub-processing units. For example, one or more processor tasks may specify the same sub-processing unit at the same time. Some of those processor tasks are then put on hold until the specified sub-processing unit becomes available, which delays their execution. Unfortunately, this results in unpredictable latency in the execution of processor tasks.

Other systems contemplate a managing element that communicates with the sub-processing units and schedules the processor tasks among them. A communication protocol must therefore be in place to facilitate such communication. Unfortunately, communication protocols often introduce message delays between the managing element and the sub-processing units. Indeed, such protocols may require the use of a memory-mapped I/O space (using memory-mapped registers), which is generally slow. Further, the managing element, which may itself be a processor of the system, may employ many partition domains, and changing them may require significant time (e.g., 700 microseconds). These characteristics also delay the execution of processor tasks and result in unpredictable latency. The overall processor throughput and efficiency of the multiprocessor system are thus sacrificed, which may significantly impact the real-time and/or multimedia experience of a user of the system.

Accordingly, there is a need in the art for new methods and apparatus for achieving efficient distributed processing of processor tasks among multiprocessor systems on a network.

In accordance with one aspect of the present invention, a method and system are provided for moving tasks from a first multiprocessor system to a second multiprocessor system over a network such as the Internet.

According to one aspect of the method, a multiprocessor determines, for any of a variety of reasons, whether a task should be moved from at least one multiprocessor. When the multiprocessor determines that the task should be moved to another multiprocessor, it broadcasts an application from the first multiprocessor system. The application specifies a plurality of tasks and an attribute. The attribute indicates the distance of the application from at least one of (i) the previous multiprocessor that broadcast the application and (ii) the multiprocessor that originally broadcast the application, as well as a deadline for completing each of the plurality of tasks.

According to another aspect of the invention, the attribute specifies the processing power required to execute the tasks. The same attribute or a different attribute may also specify the memory required to execute the tasks. The application may specify the plurality of tasks either by containing the tasks itself or by using pointer information that indicates where the tasks are located.
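By way of illustration only, the broadcast application and its attribute described above might be modeled as follows. This is a minimal sketch, not the claimed implementation: the class and field names are hypothetical, and it assumes that distance is measured as latency in milliseconds and that a rebroadcasting multiprocessor updates the two distance values.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Attributes:
    # All field names and units are assumptions made for this sketch.
    distance_from_previous_ms: float   # distance from the previous broadcaster
    distance_from_origin_ms: float     # distance from the original broadcaster
    deadline_ms: float                 # deadline for completing each task
    required_gflops: float             # processing power needed to run the tasks
    required_memory_bytes: int         # memory needed to run the tasks


@dataclass(frozen=True)
class Task:
    task_id: str
    payload: Optional[bytes] = None    # task carried inside the application, or
    location: Optional[str] = None     # pointer information saying where it resides


@dataclass(frozen=True)
class Application:
    tasks: List[Task]
    attributes: Attributes

    def forwarded(self, link_latency_ms: float) -> "Application":
        """Return the application as rebroadcast over one more network link."""
        attrs = replace(
            self.attributes,
            distance_from_previous_ms=link_latency_ms,
            distance_from_origin_ms=(
                self.attributes.distance_from_origin_ms + link_latency_ms
            ),
        )
        return replace(self, attributes=attrs)
```

A task either travels inside the application (`payload`) or is referenced by pointer information (`location`), mirroring the two options the text gives.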

According to another aspect of the invention, the application broadcast from the first multiprocessor system is received at a second multiprocessor connected to the network. The second multiprocessor unbundles the tasks in the application, including the tasks to be moved. The second multiprocessor examines the attribute specifying the requirements for executing the tasks and determines whether the tasks should be executed by the second multiprocessor. The second multiprocessor may examine the required processing power and the required memory.

According to another aspect of the invention, the second multiprocessor also examines the distance of the application from at least one of the previous multiprocessor that broadcast the application and the multiprocessor that originally broadcast the application, in order to determine whether the tasks should be executed by the second multiprocessor system. The second multiprocessor preferably informs the first multiprocessor whether or not it is executing the tasks. The term "distance" as used here covers both temporal and spatial distance; more specifically, it includes measures such as latency and the bandwidth between applications.

According to another aspect of the invention, the broadcast application is received by a plurality of other multiprocessors. Each of the other multiprocessors unbundles the task or tasks to be moved that are contained in the application. Each of the other multiprocessors examines the attribute specifying the requirements for executing the tasks and determines whether the tasks should be executed. Preferably, each of the other multiprocessors also examines the distance of the application from at least one of the previous multiprocessor that broadcast the application and the multiprocessor that originally broadcast the application, together with the deadline for completing each of the plurality of tasks, in order to determine whether the tasks should be executed.
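The acceptance decision made by a receiving multiprocessor can be sketched as below. The text states only which quantities are examined (required processing power, required memory, distance, and deadline); the specific rule used here, namely that the result must make a round trip within the deadline, and the `estimated_runtime_ms` field are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    required_gflops: float
    required_memory_bytes: int
    distance_ms: float            # assumed: latency to the broadcasting multiprocessor
    deadline_ms: float            # time remaining to complete the task
    estimated_runtime_ms: float   # hypothetical estimate, not part of the source text


@dataclass
class LocalResources:
    free_gflops: float
    free_memory_bytes: int


def should_execute(req: Requirements, res: LocalResources) -> bool:
    """Decide whether this multiprocessor should run the received task."""
    if req.required_gflops > res.free_gflops:
        return False                      # not enough processing power
    if req.required_memory_bytes > res.free_memory_bytes:
        return False                      # not enough memory
    # Assumed rule: the task arrives, runs, and its result travels back in time.
    total_ms = 2 * req.distance_ms + req.estimated_runtime_ms
    return total_ms <= req.deadline_ms
```

Under this sketch, each receiving multiprocessor evaluates the same function against its own free resources and would then report its decision back to the broadcaster.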

According to yet another aspect of the invention, an apparatus for moving tasks is provided. The apparatus includes a multiprocessor that is connectable to a network and is programmed to determine whether a task should be executed by that multiprocessor or moved to at least one multiprocessor connected to the network. When it is determined that the task should be moved to the at least one multiprocessor, the multiprocessor directs that an application be broadcast from it over the network. The application specifies a plurality of tasks and an attribute, and the attribute indicates the distance of the application from at least one of the previous multiprocessor that broadcast the application and the multiprocessor that originally broadcast the application, as well as a deadline for completing each of the plurality of tasks.

Other aspects, features, and advantages of the present invention will be apparent to those skilled in the art from the description herein taken in conjunction with the accompanying drawings.

For purposes of illustration, forms that are presently preferred are shown in the drawings; it is to be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

Referring to the drawings, in which like numerals indicate like elements, FIG. 1 illustrates a multiprocessor system 100 in accordance with one or more aspects of the present invention. The multiprocessor system 100 includes a plurality of processors 102 (any number may be used) coupled to a shared memory 106, such as a DRAM, over a bus 108. Note that the shared memory 106 need not be a DRAM; indeed, it may be formed using any known or later developed technology.

One of the processors 102 is preferably a main processing unit, for example, processing unit 102A. The other processing units 102 are preferably sub-processing units (SPUs), such as processing units 102B, 102C, and 102D. The sub-processing units 102 may be implemented using any of the known or later developed computer architectures. The sub-processing units 102 need not all be implemented using the same architecture; indeed, they may be of heterogeneous or homogeneous configuration. Note that the main processing unit 102A may be located locally with respect to the sub-processing units 102B-102D, for example, on the same chip, in the same package, on the same circuit board, or in the same product. Alternatively, the main processing unit 102A may be located remotely from the sub-processing units 102B-102D, for example, in a different product that can be connected over a bus or a communication network such as the Internet. Similarly, the sub-processing units 102B-102D may be located locally with respect to, or remotely from, one another.

The main processing unit 102A may be used to schedule and orchestrate the processing of data and applications by the sub-processing units 102B-102D, such that the sub-processing units 102B-102D process these data and applications in parallel and independently. In accordance with some aspects of the present invention, however, the main processing unit 102A does not take a central role in scheduling the execution of processor tasks among the sub-processing units. Rather, such scheduling is preferably left to the SPUs themselves.

The assignment of roles and functions to the processors 102 of FIG. 1 is flexible. For example, any of the processors 102 may be a main processing unit or a sub-processing unit.

Referring to FIG. 2, the main processing unit 102A preferably assumes the role of a service processor for the SPUs 102B-102F, particularly with respect to the scheduling and management of processor tasks among the SPUs. In accordance with some aspects of the present invention, the main processing unit 102A can evaluate the processor tasks contained within a software application and can take part in the allocation of the shared memory 106, the allocation of the SPUs, and the initial storage of the processor tasks 110 within the shared memory 106. As to the allocation of the shared memory 106, the main processing unit 102A preferably determines how much memory space should be allocated to a given number of processor tasks 110. In this regard, the main processing unit 102A may allocate a first area 106A of the shared memory 106 for storing some of the processor tasks 110 and a second area 106B of the shared memory 106 for storing other processor tasks 110. The main processing unit 102A may also establish rules for data synchronization within the respective areas 106A and 106B of the shared memory 106.

In accordance with one or more further aspects of the present invention, the areas 106A and 106B of the shared memory 106 can be accessed only by a prescribed number of the sub-processing units 102, for example, only by the sub-processing units 102 that are allocated to execute the particular processor tasks 110 stored within a given area of the shared memory 106. For example, preferably only the sub-processing units 102B-102D are permitted to access the processor tasks 110 within the first area 106A of the shared memory 106. Likewise, preferably only the sub-processing units 102E-102F are permitted to access the processor tasks 110 within the second area 106B of the shared memory 106. Further details concerning techniques for protecting the respective areas 106A and 106B of the shared memory 106 may be found in U.S. Patent No. 6,526,491, entitled "Memory Protection System and Method for Computer Architecture for Broadband Networks", the entire disclosure of which is incorporated herein by reference.
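The area-level access restriction in this example can be sketched as a simple permission table. This is an illustrative model only: the actual protection mechanism is the one described in the referenced U.S. Patent No. 6,526,491, and the table layout and function name below are hypothetical.

```python
from typing import Dict, FrozenSet

# Which sub-processing units may access the tasks stored in each area,
# following the example in the text (102B-102D for 106A, 102E-102F for 106B).
AREA_ACCESS: Dict[str, FrozenSet[str]] = {
    "106A": frozenset({"102B", "102C", "102D"}),
    "106B": frozenset({"102E", "102F"}),
}


def can_access(spu: str, area: str,
               table: Dict[str, FrozenSet[str]] = AREA_ACCESS) -> bool:
    """Return True only if the given SPU is allocated to the given area."""
    return spu in table.get(area, frozenset())
```

An unknown area yields an empty permission set, so access is denied by default.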

In accordance with one or more aspects of the present invention, once the processor tasks 110 have been placed within the shared memory 106 and the sub-processing units 102 have been allocated to execute the tasks, the main processing unit 102A preferably does not participate in scheduling and managing the execution of the processor tasks 110. Instead, those responsibilities are left to the particular sub-processing units 102 involved.

Before the processor task management features of the various embodiments of the present invention are described in further detail, a preferred computer architecture for implementing a multiprocessor system will be described. In this regard, reference is made to the block diagram of a basic processing module or processor element (PE) 200 in FIG. 3. In accordance with this computer architecture, all sub-processors of a multiprocessor system are constructed from a common computing module (or cell). This common computing module has a consistent structure and preferably employs the same instruction set architecture. In other embodiments of the invention, the sub-processing units may be of heterogeneous configuration. A multiprocessor system can be formed from one or more clients, servers, PCs, mobile computers, game machines, PDAs, set-top boxes, appliances, digital televisions, and other devices that use computer processors.

The basic processing module is a processor element (PE). As shown in FIG. 3, the PE 200 includes an I/O interface 202, a processing unit (PU) 204, a direct memory access controller (DMAC) 206, and a plurality of sub-processing units 208, namely sub-processing unit 208A, sub-processing unit 208B, sub-processing unit 208C, and sub-processing unit 208D. A local (or internal) PE bus 212 carries data and applications among the PU 204, the sub-processing units 208, the DMAC 206, and a memory interface 210. The local PE bus 212 may have, for example, a conventional architecture, or it may be implemented as a packet-switched network. Implementation as a packet-switched network requires more hardware but increases the available bandwidth.

The PE 200 can be constructed using various methods for implementing digital logic. Preferably, however, the PE 200 is constructed as a single integrated circuit employing complementary metal oxide semiconductor (CMOS) technology on a silicon substrate. Alternative substrate materials include gallium arsenide, gallium aluminum arsenide, and other so-called III-B compounds employing a wide variety of dopants. The PE 200 can also be implemented using superconducting material, for example, rapid single-flux-quantum (RSFQ) logic.

The PE 200 is closely associated with a dynamic random access memory (DRAM) 214 through a high-bandwidth memory connection 216. The DRAM 214 functions as the main (or shared) memory for the PE 200. Although the DRAM 214 is preferably a dynamic random access memory, it may be implemented using other means, such as a static random access memory (SRAM), a magnetic random access memory (MRAM), an optical memory, or a holographic memory. The DMAC 206 and the memory interface 210 facilitate the transfer of data between the DRAM 214 and the sub-processing units 208 and the PU 204 of the PE 200. Note that the DMAC 206 and/or the memory interface 210 may be disposed integrally with or separately from the sub-processing units 208 and the PU 204. Indeed, instead of the separate configuration shown, the functions of the DMAC 206 and/or of the memory interface 210 may be integral with one or more (preferably all) of the sub-processing units 208 and the PU 204.

The PU 204 may be, for example, a standard processor capable of stand-alone processing of data and applications. The sub-processing units 208 are preferably single instruction, multiple data (SIMD) processors. The sub-processing units 208 preferably process data and applications in parallel or independently. The DMAC 206 controls access by the PU 204 and the sub-processing units 208 to the data and applications (e.g., the processor tasks 110) stored in the shared DRAM 214. Note that the PU 204 may be implemented by one of the sub-processing units 208 taking on the role of a main processing unit.

In accordance with this modular structure, the number of PEs 200 employed by a particular computer system depends on the processing power that the system requires. For example, a server may employ four PEs 200, a workstation may employ two PEs 200, and a PDA may employ one PE 200. The number of sub-processing units of a PE 200 assigned to process a particular application depends on the complexity and size of the programs and data within the cell.

FIG. 4 illustrates the preferred structure and function of a sub-processing unit 208. The sub-processing unit 208 includes a local memory 250, registers 252, one or more floating point units 254, and one or more integer units 256. Depending on the processing power required, however, a greater or lesser number of floating point units 254 and integer units 256 may be employed. The floating point units 254 preferably operate at a speed of 32 billion floating point operations per second (32 GFLOPS), and the integer units 256 preferably operate at a speed of 32 billion operations per second (32 GOPS).

In a preferred embodiment, the local memory 250 contains 256 kilobytes of storage, and the capacity of the registers 252 is 128 × 128 bits. Note that the processor tasks 110 are not executed using the shared memory 214. Rather, the tasks 110 are copied into the local memory 250 of a given sub-processing unit 208 and executed locally.
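The copy-then-execute behavior can be pictured with the following toy model, in which a task image is copied from shared memory into a 256-kilobyte local memory before it is used. The class and method names are hypothetical, and the model ignores the DMAC and bus details.

```python
LOCAL_MEMORY_BYTES = 256 * 1024  # size of local memory 250 given in the text


class SubProcessingUnit:
    """Toy model of an SPU that works only on a local copy of a task."""

    def __init__(self) -> None:
        self.local_memory = bytearray(LOCAL_MEMORY_BYTES)

    def load_task(self, shared_memory: bytes, offset: int, size: int) -> bytes:
        """Copy a task image from shared memory into local memory."""
        if size < 0 or size > LOCAL_MEMORY_BYTES:
            raise ValueError("task image does not fit in local memory")
        if offset < 0 or offset + size > len(shared_memory):
            raise ValueError("task image lies outside shared memory")
        # Same-length slice assignment: the local memory never changes size.
        self.local_memory[:size] = shared_memory[offset:offset + size]
        return bytes(self.local_memory[:size])
```

Because the unit works on its own copy, later changes to local memory leave the image in shared memory untouched.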

The local memory 250 may or may not be a cache memory. The local memory 250 is preferably constructed as a static random access memory (SRAM). The PU 204 may require cache coherency support for direct memory accesses initiated by the PU 204. Cache coherency support is not required, however, for direct memory accesses initiated by the sub-processing units 208 or for accesses to and from external devices.

The sub-processing unit 208 further includes a bus interface (I/F) 258 for transmitting data and applications to and from the sub-processing unit 208. In a preferred embodiment, the bus I/F 258 is coupled to the DMAC 206. The DMAC 206 is drawn with a dotted line in FIG. 3 to indicate that it may be disposed integrally within the sub-processing unit 208 or disposed externally. A pair of buses 268A, 268B interconnects the DMAC 206 between the bus I/F 258 and the local memory 250. The buses 268A, 268B are preferably 256 bits wide.

The sub-processing unit 208 further includes internal buses 260, 262, and 264. In a preferred embodiment, the bus 260 is 256 bits wide and provides communication between the local memory 250 and the registers 252. The buses 262 and 264 provide communication between the registers 252 and the floating point units 254, and between the registers 252 and the integer units 256, respectively. In a preferred embodiment, the buses 264 and 262 are 384 bits wide in the direction from the registers 252 to the floating point or integer units, and 128 bits wide in the direction from the floating point units 254 or integer units 256 to the registers 252. The greater width from the registers 252 to the units accommodates the larger data flow out of the registers 252 during processing: up to three words are needed for each calculation, whereas the result of each calculation is normally only one word.
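The two bus widths are consistent with a 128-bit word: three operand words travel toward the units and one result word travels back. The 128-bit word size is an inference from the stated figures and from the 128 × 128-bit capacity given for the registers 252; the text does not state it outright. A short check of the arithmetic, with hypothetical names:

```python
WORD_BITS = 128          # assumed word size, inferred from the stated widths
MAX_OPERAND_WORDS = 3    # up to three words are needed per calculation
RESULT_WORDS = 1         # each calculation normally yields one word


def bus_width_bits(words: int, word_bits: int = WORD_BITS) -> int:
    """Width needed to move the given number of words in one transfer."""
    return words * word_bits


to_units_bits = bus_width_bits(MAX_OPERAND_WORDS)    # registers -> units
from_units_bits = bus_width_bits(RESULT_WORDS)       # units -> registers
```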

Turning again to the various processor task management features of the present invention, and with reference to FIG. 2, the sub-processing units 102 preferably use a task table to determine which of the processor tasks 110 should be copied from the shared memory 106 into one of the local memories of the SPUs 102 for execution. In this regard, reference is made to FIG. 5, which is a conceptual illustration of a task table 280 that may be used in accordance with various aspects of the present invention. The task table 280 is preferably stored in the shared memory 106. Details of how the task table 280 is initialized are described later. The task table 280 preferably includes a plurality of task table entries T1, T2, T3, and so on. Each task table entry is preferably associated with one of the processor tasks 110 (see FIG. 2), for example, by associative addressing or by some other means of relating the task table entries to the processor tasks 110.

In a preferred embodiment, each task table entry may include at least one of a status indication (STAT), a priority indication (PRI), and a pair of pointers (PREV, NEXT). STAT preferably indicates whether the processor task associated with the given task table entry is ready to be executed by one or more of the sub-processing units (READY) or is being executed (RUNNING). PRI preferably indicates the priority of the associated processor task 110. Any number of priority levels may be associated with the processor tasks 110; the levels may be set by the software programmer or established later through execution of the software application. In either case, the priority of a processor task 110 can be used to establish the order in which the processor tasks are executed. The PREV value is preferably a pointer to the previous task table entry (or previous processor task 110) in an ordered list of linked task table entries (or list of processor tasks). The NEXT value is preferably a pointer to the next task table entry (or processor task) in that ordered list.

In accordance with one or more aspects of the present invention, task table 280 is preferably used by the sub-processing units 102 to determine the order in which the processor tasks 110 are copied from shared memory 106 for execution. For example, in order to execute a software application properly on multiprocessor system 100 or 200, certain processor tasks 110 may need to be executed in a particular order, or at least in a general order, say T1, T8, T6, T9. To reflect this example ordering of processor tasks, task table 280 preferably contains pointers in the PREV and NEXT portions of the task table entries that create a linked list of task table entries and, by extension, of processor tasks. In the example above, task table entry T1 includes a NEXT value that points to task table entry T8. Task table entry T8 includes a PREV value that points to task table entry T1 and a NEXT value that points to task table entry T6. Task table entry T6 includes a PREV value that points to task table entry T8 and a NEXT value that points to task table entry T9. Task table entry T9 includes a PREV value that points to task table entry T6.

With reference to FIG. 6, the linked list of task table entries in the above example can be illustrated conceptually as a state diagram. In this state diagram, a transition from the processor task associated with task table entry T1 causes the processor task associated with task table entry T8 to be selected and executed. A transition from the processor task associated with task table entry T8 causes the processor task associated with task table entry T6 to be selected and executed, and so on. A circular association of the task table entries (and/or the processor tasks themselves) can be obtained by ensuring that the first, or head, task table entry T1 includes a PREV value that points to task table entry T9, and that task table entry T9 includes a NEXT value that points to task table entry T1.
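As a rough illustration of the circular list described above, the following C sketch links a set of task table entries through their PREV and NEXT fields. The struct layout, the use of array indices in place of pointers, and the function name `link_circular` are assumptions made for this sketch; the source does not specify a data layout.

```c
#include <stddef.h>

/* Hypothetical layout of one task table entry. The field names follow the
   STAT, PRI, PREV and NEXT labels in the text; the types are assumptions. */
enum task_stat { TASK_READY, TASK_RUNNING };

struct task_entry {
    enum task_stat stat;  /* STAT: READY or RUNNING */
    int pri;              /* PRI: priority level */
    int prev;             /* PREV: index of the previous entry in the list */
    int next;             /* NEXT: index of the next entry in the list */
};

/* Link the entries listed in order[0..n-1] into one circular list, so that
   the last entry's NEXT is the first entry and the first entry's PREV is
   the last entry. Every linked entry is marked READY at priority pri. */
void link_circular(struct task_entry *table, const int *order, size_t n, int pri)
{
    for (size_t i = 0; i < n; i++) {
        struct task_entry *e = &table[order[i]];
        e->stat = TASK_READY;
        e->pri = pri;
        e->prev = order[(i + n - 1) % n];
        e->next = order[(i + 1) % n];
    }
}
```

Calling `link_circular(table, order, 4, 1)` with `order` holding 1, 8, 6, 9 leaves entry T1 with NEXT = 8 and PREV = 9, and entry T9 with NEXT = 1, which is the circular association of the example.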

In operation, each sub-processing unit 102 that is allocated to execute the pool of processor tasks 110 in shared memory 106 (preferably within a given area 106A or 106B) first accesses task table 280 to determine which processor task 110 should be taken up next for execution. To help identify the first, or head, entry of the linked list, the sub-processing units 102 preferably also have access to a task queue 282, shown conceptually in FIG. 7. Task queue 282 preferably includes an entry for each priority level of the associated processor tasks 110. Each entry preferably includes at least one of a HEAD pointer and a TAIL pointer.

With further reference to FIG. 6, the state diagram of the example linked list represents processor tasks 110 having a priority level of 1. Indeed, the task table entries T1, T8, T6 and T9 (FIG. 5) each include a PRI value of "1".

The HEAD pointer and TAIL pointer of the task queue entry associated with priority level 1 contain pointers to task table entry T1 and task table entry T9, respectively. The other entries of task queue 282 are associated with the HEAD and TAIL pointers of other priority levels, for other linked lists. In this way, various embodiments of the present invention may include multiple linked lists of task table entries (and, by extension, of processor tasks), where each linked list contains entries of the same or at least similar priority levels. Each sub-processing unit 102 preferably uses task table 280 and task queue 282 to determine which of the processor tasks 110 should be copied from shared memory 106 for execution. Provided that each linked list is created and maintained properly, the processor tasks 110 can be executed in the proper order to achieve the desired results when the software application as a whole is executed.
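The lookup that the task queue supports can be sketched in C as follows. The number of priority levels, the `NO_TASK` marker for an empty list, and the convention that array index 0 holds priority level 1 (the highest) are assumptions of this sketch, not details given in the source.

```c
#define NUM_PRIORITIES 4   /* assumed number of priority levels */
#define NO_TASK (-1)       /* assumed marker for an empty list */

/* One task queue entry per priority level. */
struct queue_entry {
    int head;  /* HEAD: index of the first READY task table entry */
    int tail;  /* TAIL: index of the last READY task table entry */
};

/* Return the task table index stored in the HEAD pointer of the highest
   non-empty priority level, or NO_TASK if every list is empty.
   Assumption: queue[0] is priority level 1, the highest. */
int highest_priority_ready(const struct queue_entry queue[NUM_PRIORITIES])
{
    for (int level = 0; level < NUM_PRIORITIES; level++) {
        if (queue[level].head != NO_TASK)
            return queue[level].head;
    }
    return NO_TASK;
}
```

With the example list, the entry for priority level 1 holds HEAD = 1 and TAIL = 9, so the function returns 1 (task table entry T1) regardless of what the lower priority levels contain.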

In accordance with various aspects of the present invention, the sub-processing units 102 maintain and modify task table 280 and task queue 282 during execution of the software application. In this regard, reference is made to FIGS. 8 through 10, which are flow diagrams of a process flow suitable for achieving one or more desirable features of the present invention. At action 300, a particular sub-processing unit 102 is called upon to begin copying a processor task 110 from shared memory 106 into its own local memory. At action 302, the sub-processing unit 102 locks task queue 282 and copies it into its local memory. Task queue 282 is then searched for the highest-priority ready task (action 304). Using the example shown in FIG. 7, task queue 282 includes a HEAD pointer that points to task table entry T1, which is associated with a processor task of the highest priority, for example priority level 1. Because the processor task associated with task table entry T1 is targeted for execution, the sub-processing unit 102 preferably modifies task queue 282 to remove the reference to that processor task (action 306). In the preferred embodiment, this entails changing the HEAD pointer from task table entry T1 to another task table entry, which becomes the new first (or head) task table entry and indicates the next processor task to be taken up for execution. In particular, the NEXT pointer of task table entry T1 may be used as the new HEAD pointer for priority level 1. Indeed, as shown in FIG. 6, once the processor task associated with task table entry T1 is being executed (RUNNING), it is no longer in the READY state and should be removed from the state diagram. This should leave task table entry T8 as the head entry of the state diagram. Because task table entry T1 is no longer part of the READY state diagram, the PREV pointer of task table entry T8 may be modified to point to task table entry T9. Accordingly, at action 308, the task table is labeled and copied into the local memory of SPU 102 so that it can be modified. Similarly, the NEXT pointer of task table entry T9 may be modified to point to task table entry T8.
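The pointer updates of actions 306 through 310 can be sketched in C as shown below. The struct layouts, the index-based pointers, and the name `take_head` are assumptions of this sketch. Locking and the copies between shared memory and local memory are deliberately left out.

```c
#include <assert.h>

#define NO_TASK (-1)       /* assumed marker for an empty list */

enum task_stat { TASK_READY, TASK_RUNNING };

struct task_entry {
    enum task_stat stat;   /* STAT */
    int pri;               /* PRI */
    int prev;              /* PREV: index of the previous entry */
    int next;              /* NEXT: index of the next entry */
};

struct queue_entry {
    int head;              /* HEAD for one priority level */
    int tail;              /* TAIL for one priority level */
};

/* Remove the HEAD task of one priority level from its circular READY list
   and mark it RUNNING. Returns the index of the removed entry. */
int take_head(struct task_entry *table, struct queue_entry *q)
{
    int t = q->head;
    assert(t != NO_TASK);            /* the caller checks for an empty list */

    if (table[t].next == t) {        /* t was the only entry in the list */
        q->head = NO_TASK;
        q->tail = NO_TASK;
    } else {
        int prev = table[t].prev;    /* T9 in the example */
        int next = table[t].next;    /* T8 in the example */
        table[prev].next = next;     /* T9.NEXT now points to T8 */
        table[next].prev = prev;     /* T8.PREV now points to T9 */
        q->head = next;              /* the NEXT of T1 becomes the new HEAD */
    }
    table[t].stat = TASK_RUNNING;    /* READY -> RUNNING */
    return t;
}
```

Applied to the example list T1, T8, T6, T9, the call returns 1, leaves HEAD = 8 and TAIL = 9, and closes the circle between T9 and T8.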

In accordance with preferred aspects of the present invention, SPU 102 preferably changes the STAT value of task table entry T1 from READY to RUNNING (action 310 in FIG. 9). At action 312, a determination is preferably made as to whether SPU 102 was executing a previous task at the time it was called (action 300) to invoke the next task. This can occur when a previous task running on SPU 102 yields to another task. For the purposes of this example, assume that the previous task did not yield to the next processor task 110 and that the result of the determination at action 312 is negative. The process flow then preferably advances to action 318, where SPU 102 writes the modified task queue 282 and the modified task table 280 back into shared memory 106. At this point, task table 280 and task queue 282 have been updated and, in accordance with the preferred synchronization technique, are unlocked so that they may be copied and modified by other sub-processing units 102.

If the result of the determination at action 312 is affirmative, as is the case when a previous processor task 110 has yielded to the next processor task, the process flow preferably advances to action 314. There, the SPU preferably changes the STAT value of the task table entry associated with the yielding processor task from RUNNING to READY. In addition, the SPU may modify the PREV and NEXT pointers of various task table entries, including the entry associated with the yielding processor task, in order to reintroduce the yielding processor task into the appropriate linked list. This is preferably accomplished by referring to the priority level of the yielding processor task 110, as reflected in the PRI value of the associated task table entry. At action 316, the yielding processor task may be written back into shared memory 106 so that it may be taken up at a later time. The process flow then advances to action 318, where task queue 282 and task table 280 are written back into shared memory 106.
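The reintroduction of a yielding task at action 314 might look like the following C sketch. The struct layouts and the name `requeue_yielding` are assumptions. Appending the task at the tail of the list for its priority level is also an assumption: the text says only that the entry is reintroduced into the appropriate linked list according to its PRI value.

```c
#define NO_TASK (-1)       /* assumed marker for an empty list */

enum task_stat { TASK_READY, TASK_RUNNING };

struct task_entry {
    enum task_stat stat;   /* STAT */
    int pri;               /* PRI: 1 is the highest priority level */
    int prev;              /* PREV: index of the previous entry */
    int next;              /* NEXT: index of the next entry */
};

struct queue_entry {
    int head;              /* HEAD for one priority level */
    int tail;              /* TAIL for one priority level */
};

/* Put yielding task t back on the READY list for its priority level.
   Assumption: queue[] is indexed by priority level minus one. */
void requeue_yielding(struct task_entry *table, struct queue_entry *queue, int t)
{
    struct queue_entry *q = &queue[table[t].pri - 1];

    table[t].stat = TASK_READY;          /* RUNNING -> READY */
    if (q->head == NO_TASK) {            /* the list was empty */
        table[t].prev = t;
        table[t].next = t;
        q->head = t;
    } else {
        int head = q->head;
        int tail = q->tail;
        table[t].prev = tail;            /* new entry follows the old tail */
        table[t].next = head;            /* and wraps around to the head */
        table[tail].next = t;
        table[head].prev = t;
    }
    q->tail = t;
}
```

A scheduler that wanted a yielding task to run again before its peers could insert it at the head instead; only the four pointer assignments in the else branch and the final HEAD or TAIL update would change.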

At action 320 (FIG. 10), the next processor task 110 (for example, the processor task associated with task table entry T8) is copied by the sub-processing unit 102 from shared memory 106 into its own local memory. At action 322, the sub-processing unit 102 preferably restores and/or updates its registers (for example, with any data associated with the new processor task) for use in executing the new processor task 110. Finally, at action 324, the new processor task 110 is executed by the sub-processing unit 102.

The above actions are presented by way of example only, and those skilled in the art will appreciate that their order may be modified without departing from the spirit and scope of the present invention. For example, as discussed later, the order in which the processor tasks are copied from and written back into shared memory, and the order in which the task table and task queue 282 are used, may be modified to achieve desirable results.

As discussed above, main processing unit 102A is preferably used during an initialization phase of the system to place the system in a state in which the execution and management of the processor tasks 110 can be handled by the sub-processing units 102. The sub-processing units 102 also preferably carry out an initialization routine to create task table 280 and task queue 282 in the first instance. These initialization processes are shown in the flow diagram of FIG. 11.

At action 350, the service processor (e.g., main processing unit 102) evaluates the software application to be executed on the system and allocates a number of the sub-processing units 102 to execute the processor tasks 110. The process flow preferably advances to action 352, where the service processor evaluates the software application and allocates one or more portions of shared memory 106 to receive the processor tasks 110. At action 354, the processor tasks 110 are loaded into shared memory 106 in accordance with any memory allocation made at action 352. At this stage of the initialization process, the service processor is preferably not involved in the maintenance of the processor tasks and/or their allocation among the sub-processing units 102.

The process flow preferably advances to action 356, where the sub-processing units 102 initialize with one another to determine which SPU is to prepare task table 280 and task queue 282 in the first instance. At action 358, the sub-processing unit 102 charged with creating task table 280 and task queue 282 prepares that information and stores it in shared memory 106. For example, task table 280 and task queue 282 are preferably initialized by having each SPU kernel execute an initial task. The program "init.c" reproduced below is a preferred example of the initial task executed by each SPU.

    #include <spurs.h>
    #include "task_instance.h"

    int
    main()
    {
        spurs_beggin_init();

        /* Only the SPU with ID 0 creates and starts the start-up tasks. */
        if (spurs_get_spu_id() == 0) {
            spurs_create_task(melchior);
            spurs_create_task(balthasar);
            spurs_create_task(caspar);

            spurs_start_task(melchior);
            spurs_start_task(balthasar);
            spurs_start_task(caspar);
        }

        /* Every other SPU waits in this call. */
        spurs_end_init();
        return 0;
    }

In this program, "melchior", "balthasar" and "caspar" are the names of the very first tasks, which are typical start-up tasks. All of the SPU kernels execute this initial task, init.c, but only one SPU, the SPU with ID 0, executes the task creation and start-up code guarded by the line if (spurs_get_spu_id() == 0). All of the other SPUs, that is, those with a different ID, wait at spurs_end_init(). Thus each SPU kernel executes the initial task, and once the initial task has finished, the SPU kernels start looking for the next task as described herein.

As noted above, main processing unit 102, acting as a service processor, may designate one or more of the processor tasks 110 as belonging to a group. This is preferably done during the initialization phase. For example, two or more processor tasks 110 may need to communicate closely with one another and may therefore execute more efficiently if they are grouped together in a task group. An encryption program is one example of an application that may contain processor tasks that communicate closely and would execute more efficiently if they were formed into one or more task groups.

The processor task management features of the present invention may be used to help main processing unit 102A off-load device drivers to a particular sub-processing unit 102 or group of sub-processing units 102. For example, a network interface such as a gigabit Ethernet handler (Ethernet is a registered trademark) may use up to 80% of CPU power. If the network interface is executed solely by main processing unit 102A, main processing unit 102A may be unavailable for other service-oriented processing tasks. It can therefore be beneficial for main processing unit 102A to off-load the network interface program to one or more of the sub-processing units 102. Main processing unit 102A may achieve this by placing the processing tasks of the network interface into shared memory 106 and allocating one or more sub-processing units 102 to execute them. In response, the SPUs may form a task table 280 and a task queue 282 suitable for managing and scheduling the execution of those processor tasks. Advantageously, main processing unit 102A can then devote more CPU power to executing other tasks. Main processing unit 102A may also off-load other device drivers, such as a digital television device driver. Other device drivers that are good candidates for off-loading to SPUs are those with heavy protocol stacks. For example, drivers for real-time high-speed access devices, such as an HDD recorder, can be advantageously off-loaded. Other examples of tasks that may be off-loaded include network packet encryption/decryption tasks used for virtual private networks and for multimedia over IP (e.g., VoIP) applications.

FIG. 12 shows an example of a state diagram of processor task status. The task states may be classified into five categories: the RUNNING state, the READY state, the BLOCKED state, the DORMANT state, and the NON-EXISTENT state. A processor task is in the RUNNING state when it is currently executing. Under some circumstances, a processor task may remain in the RUNNING state even outside a task context, for example during an interrupt. A processor task is in the READY state when it is ready to be executed but cannot be, because one or more processor tasks of higher precedence are already executing and no sub-processing unit is available to take it up. Once the priority of the READY processor task is high enough among the pool of READY tasks in shared memory 106, a sub-processing unit may take up the processor task and execute it. Thus a processor task changes from the READY state to the RUNNING state when it is dispatched. Conversely, a RUNNING task changes to the READY state if it is preempted or otherwise displaced during its execution. An example of the preemption of a processor task was discussed above in connection with one processor task yielding to another.

The BLOCKED state category may include the WAITING state, the SUSPENDED state, and the WAITING-SUSPENDED state. A processor task is in the WAITING state when its execution is blocked because it invoked a service call that specifies a condition to be met before the task may continue. Thus a RUNNING task may change to the WAITING state upon invoking a service call. A WAITING processor task may be released into the READY state when the specified condition is met, after which a sub-processing unit 102 can take up the task. A processor task may enter the SUSPENDED state from the RUNNING state when it is forcibly halted (the task itself may invoke the halt). Similarly, a READY processor task may enter the SUSPENDED state through a forced action. A SUSPENDED processor task may be resumed and enter the READY state when the forced halt is released. A processor task is in the WAITING-SUSPENDED state when it is both waiting for a condition to be met and forcibly suspended. Accordingly, a WAITING-SUSPENDED processor task may enter the WAITING state when its forced suspension is released, where it continues to wait for the condition to be met.

A processor task is in the DORMANT state when it has not yet been executed or has already finished executing. A DORMANT processor task may enter the READY state under appropriate circumstances. The NON-EXISTENT state is a so-called virtual state in which the task does not exist in the system, for example because it has not yet been created or has already been deleted.
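The task states and the transitions that the text names explicitly can be collected into a small C sketch. The enum and the function name `transition_allowed` are hypothetical. Transitions that the text does not spell out, such as task creation, deletion, termination, or entry into the WAITING-SUSPENDED state, are omitted rather than guessed.

```c
#include <stdbool.h>

/* The task states of FIG. 12; the BLOCKED category is represented by its
   three member states. */
enum task_state {
    NON_EXISTENT, DORMANT, READY, RUNNING,
    WAITING, SUSPENDED, WAITING_SUSPENDED
};

/* Return true if the description names a direct transition from one
   state to the other. */
bool transition_allowed(enum task_state from, enum task_state to)
{
    switch (from) {
    case DORMANT:           return to == READY;
    case READY:             return to == RUNNING      /* dispatched */
                                || to == SUSPENDED;   /* forced action */
    case RUNNING:           return to == READY        /* preempted or yields */
                                || to == WAITING      /* service call */
                                || to == SUSPENDED;   /* forcibly halted */
    case WAITING:           return to == READY;       /* condition met */
    case SUSPENDED:         return to == READY;       /* resumed */
    case WAITING_SUSPENDED: return to == WAITING;
    default:                return false;
    }
}
```

A scheduler could use such a predicate as a sanity check before it changes the STAT value of a task table entry.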

If a task that has moved to the READY state has a higher precedence (or priority) than a task in the RUNNING state, the lower-precedence task is preferably moved to the READY state and the higher-precedence task is dispatched and moved to the RUNNING state. In this situation, the lower-priority task has been preempted by the higher-priority task.

Non-preemptive, priority-based task scheduling is carried out according to the priorities assigned to the processor tasks. Where several processor tasks have the same priority, scheduling is first-come, first-served (FCFS). This scheduling rule may be defined in terms of a precedence between tasks that is derived from the task priorities. When runnable tasks exist, at most as many of the highest-precedence tasks as there are allocated sub-processing units 102 are in the RUNNING state; the remaining runnable tasks are in the READY state. Among tasks with different priorities, the task with the higher priority has the higher precedence. Among tasks with the same priority, the processor task that entered a runnable (RUNNING or READY) state earliest has the higher precedence. The precedence between tasks of the same priority may, however, change as a result of some service calls. When a processor task gains precedence over other processor tasks, a dispatch preferably occurs immediately and the task moves to the RUNNING state.
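The precedence rule above (priority first, then first-come, first-served) reduces to a two-key comparison, sketched here in C. The struct, its field names, and the use of a monotonically increasing sequence number to record when a task became runnable are assumptions of this sketch; it also ignores the service calls that may reorder tasks of equal priority.

```c
#include <stdbool.h>

/* Hypothetical record used only for this comparison. */
struct runnable_task {
    int priority;                /* assumed: a smaller number is a higher priority */
    unsigned long runnable_seq;  /* assumed: counter value taken when the task
                                    entered a runnable (RUNNING or READY) state */
};

/* Return true if task a has precedence over task b. */
bool has_precedence(const struct runnable_task *a, const struct runnable_task *b)
{
    if (a->priority != b->priority)
        return a->priority < b->priority;       /* higher priority wins */
    return a->runnable_seq < b->runnable_seq;   /* FCFS among equal priorities */
}
```

With N allocated sub-processing units, the N tasks that come first under this ordering would be the ones in the RUNNING state, and the rest would remain READY.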

FIGS. 13 and 14 illustrate certain preemption features in accordance with certain aspects of the present invention. As discussed above, a processor task in the RUNNING state (e.g., task A) may be preempted by, or may yield to, another processor task in the READY state (e.g., task B). As shown in FIGS. 13 and 14, task A is executed on the sub-processing unit 102 up to the point of the yield. At that point, the kernel of the SPU operates to copy task A back to shared memory 106 (saving task A). Task B is then copied from shared memory 106 into the local memory of the SPU (restoring task B), and the SPU executes task B. While this technique enjoys relatively high performance with respect to local memory utilization and high bandwidth, the task execution latency from the point of the yield to the execution of task B is not optimized.

FIGS. 15 and 16 illustrate an alternative approach in accordance with further aspects of the present invention. In this scenario, task B may be copied from shared memory 106 into the local memory of the sub-processing unit 102 before task A is copied from the local memory to shared memory 106. In this regard, the sub-processing unit 102 may execute task A while at the same time taking steps to identify task B and retrieve it from shared memory 106. This entails copying task table 280 and task queue 282 from shared memory 106 into the local memory of sub-processing unit 102A and using them to identify the next READY task, namely task B. At the point of the yield, the kernel of sub-processing unit 102A copies task A from the local memory to shared memory 106, which may entail modifying task table 280 and task queue 282 as described above. The sub-processing unit 102 may then take up the execution of task B. Compared with the technique shown in FIGS. 13 and 14, this technique significantly reduces the latency between the yield and the execution of task B.

With reference to FIGS. 17 and 18, the latency between the yield and the execution of task B may be reduced further in accordance with one or more further aspects of the present invention. More specifically, up to the point of the yield, sub-processing unit 102 may operate in substantially the same way as described above with respect to FIGS. 15 and 16. After the yield, however, sub-processing unit 102 preferably begins executing task B. At substantially the same time, the kernel of sub-processing unit 102 preferably copies task A from the local memory of sub-processing unit 102 to shared memory 106. Because task B is executed immediately after the yield, the latency is significantly reduced compared with the approaches shown in FIGS. 14 to 16.
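The difference between the three orderings can be made concrete with a small cost model. The following is only an illustrative sketch: the function name, the scheme labels and the copy times are hypothetical, and it assumes an idealized case in which the prefetch of task B completes before the yield and a save that overlaps with execution adds no delay to task B.

```python
def switch_latency(scheme: str, save_a: float, restore_b: float) -> float:
    """Delay between the yield and the start of task B (idealized model).

    save_a    -- time to copy task A from local memory to shared memory
    restore_b -- time to copy task B from shared memory to local memory
    Both values are hypothetical inputs, not figures from the source.
    """
    if scheme == "save_then_restore":        # FIGS. 13 and 14
        return save_a + restore_b            # both copies sit on the critical path
    if scheme == "prefetch_then_save":       # FIGS. 15 and 16
        return save_a                        # B was already copied while A ran
    if scheme == "prefetch_overlap_save":    # FIGS. 17 and 18
        return 0.0                           # A is saved while B already executes
    raise ValueError(f"unknown scheme: {scheme}")
```

With a save cost of 3 time units and a restore cost of 2, the model gives delays of 5, 3 and 0 units respectively, which matches the ordering of the three techniques described above.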

In accordance with one or more aspects of the present invention, sub-processing unit 102 may maintain multiple processor tasks in local memory for execution, as shown in FIG. 19. To manage the execution of the multiple processor tasks, the local memory may include a plurality of pages and a page table. An advantage of this approach is that latency can be reduced further. One drawback, however, is that considerably more space in the local memory is occupied by processor tasks.

With reference to FIGS. 20 to 22, migration of processor tasks in accordance with one or more aspects of the present invention is illustrated. These figures show how a processor task, for example task B, may be moved from one sub-processing unit, SPU1, to another sub-processing unit, SPU2. The migration may be based on some condition, such as the respective priorities associated with the respective processor tasks. According to some aspects of the invention, the migration of a processor task from one sub-processing unit to another need not be preemptive. In other words, the migration may arise naturally from priority conditions and timing rather than from an explicit decision to cause it.

This non-preemptive migration can be illustrated by the following example. Assume that processor task B was selected from shared memory 106 using the task table, which represents the priority order of the processor tasks that are ready for execution, and that task B is running on sub-processing unit SPU1. Similarly, assume that processor task C was selected from shared memory 106 in accordance with the task table and is running on sub-processing unit SPU2. Assume further that, at the time processor tasks B and C were selected, the higher priority processor task A was not ready for execution and was therefore not selected. Assume, however, that processor task A becomes ready for execution while processor tasks B and C are running.

With reference to FIG. 21, processor task B may yield sub-processing unit SPU1. This yield by processor task B may be the result of a programmer having decided that a yield would benefit the overall execution of the software application. In any case, sub-processing unit SPU1 responds to the yield by writing processor task B to shared memory 106 and updating the task table. Sub-processing unit SPU1 also accesses the task table to determine which of the processor tasks in shared memory 106 should be copied and executed. In this example, processor task A has the highest priority according to the task table, so sub-processing unit SPU1 copies processor task A from shared memory 106 into its own local memory for execution. At this point, sub-processing unit SPU1 executes processor task A and sub-processing unit SPU2 continues to execute processor task C.

With further reference to FIG. 22, processor task C may yield sub-processing unit SPU2 to another processor task. Again, this yield may be invoked by program instructions and/or by conditions of processor task C. In any case, sub-processing unit SPU2 writes processor task C back to shared memory 106 and updates the task table. Sub-processing unit SPU2 also accesses the task table to determine which of the processor tasks that are ready for execution should be copied. In this example, processor task B is ready for execution and has the highest priority among the processor tasks that are ready. Sub-processing unit SPU2 therefore copies processor task B from shared memory 106 into its own local memory for execution.

Comparing the processing state shown in FIG. 20 with that shown in FIG. 22 shows that processor task B has migrated from sub-processing unit SPU1 to sub-processing unit SPU2.
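The example of FIGS. 20 to 22 can be replayed with a small simulation. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Task`, `TaskTable` and `SPU` classes are hypothetical stand-ins, a larger number is assumed to mean a higher priority, and copying between local and shared memory is reduced to changing a state field.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int           # assumption: larger number = higher priority
    state: str = "ready"    # "ready", "running" or "blocked"


class TaskTable:
    """Hypothetical stand-in for the task table held in shared memory 106."""

    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}

    def take_highest_ready(self):
        ready = [t for t in self.tasks.values() if t.state == "ready"]
        if not ready:
            return None
        best = max(ready, key=lambda t: t.priority)
        best.state = "running"
        return best


class SPU:
    def __init__(self, name, table):
        self.name, self.table, self.current = name, table, None

    def schedule(self):
        self.current = self.table.take_highest_ready()

    def yield_current(self):
        # Write the running task back (mark it ready), then pick the best ready task.
        if self.current is not None:
            self.current.state = "ready"
        self.schedule()


def run_example():
    table = TaskTable([Task("A", 3, "blocked"), Task("B", 2), Task("C", 1)])
    spu1, spu2 = SPU("SPU1", table), SPU("SPU2", table)
    spu1.schedule()                       # SPU1 picks B (A is not ready yet)
    spu2.schedule()                       # SPU2 picks C
    before = (spu1.current.name, spu2.current.name)
    table.tasks["A"].state = "ready"      # A becomes ready while B and C run
    spu1.yield_current()                  # B yields; SPU1 picks A
    spu2.yield_current()                  # C yields; SPU2 picks B
    after = (spu1.current.name, spu2.current.name)
    return before, after
```

No step in the sketch decides to move task B; it ends up on SPU2 only because of the order of the yields and the priorities in the table.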

With reference to FIGS. 23 and 24, preemptive multitasking aspects of the present invention are illustrated. These aspects provide that a lower priority processor task running on a sub-processing unit, such as sub-processing unit SPU2, may be preemptively replaced by a higher priority processor task, such as processor task A. More specifically, processor task B may be running on sub-processing unit SPU1 while processor task C is running on sub-processing unit SPU2 (FIG. 23). Thereafter, a higher priority task, task A, may become ready for execution. This may occur as a result of some action by another sub-processing unit of the system.

For purposes of discussion, assume that sub-processing unit SPU1 changed the status of task A to the running state, for example as a result of executing processor task B. Sub-processing unit SPU1 then preferably determines whether the priority of processor task A is higher than any of the priorities of the processor tasks running on the other sub-processing units. In this simplified case, sub-processing unit SPU1 determines whether processor task A has a higher priority than processor task C. If so, sub-processing unit SPU1 at least initiates the replacement of processor task C with processor task A. In other words, sub-processing unit SPU1 causes processor task C to yield sub-processing unit 102 to processor task A. In this regard, the kernel of sub-processing unit SPU1 may issue an interrupt to the kernel of sub-processing unit SPU2. In response to the interrupt, sub-processing unit SPU2 may write processor task C back to shared memory 106 and update the task table (FIG. 24). Sub-processing unit SPU2 may also copy processor task A from shared memory into its own local memory for execution.
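The decision and the response to the interrupt can be sketched as follows. This is an illustrative sketch only: the function `preempt`, its parameters and the list `shared_ready` (standing in for the ready tasks in shared memory 106) are hypothetical, a larger number is assumed to mean a higher priority, and the interrupt itself is modeled as a direct function effect rather than a kernel-to-kernel signal.

```python
from typing import Dict, List, Optional


def preempt(new_task: str,
            running: Dict[str, str],
            priority: Dict[str, int],
            shared_ready: List[str],
            requester: str) -> Optional[str]:
    """Replace the lowest-priority task on another SPU with new_task.

    running      -- SPU name -> name of the task it is executing
    priority     -- task name -> priority (assumption: larger = higher)
    shared_ready -- ready tasks held in shared memory; must contain new_task
    requester    -- the SPU making the check (SPU1 in the text); it is skipped
    Returns the name of the interrupted SPU, or None if nothing was preempted.
    """
    candidates = [spu for spu in running if spu != requester]
    if not candidates:
        return None
    victim = min(candidates, key=lambda spu: priority[running[spu]])
    if priority[new_task] <= priority[running[victim]]:
        return None                          # no lower-priority task is running
    # Response of the interrupted SPU: save the old task, then load the new one.
    shared_ready.append(running[victim])     # write task C back to shared memory
    shared_ready.remove(new_task)            # task A leaves the ready set
    running[victim] = new_task               # copy task A into local memory
    return victim
```

For the simplified case in the text, `running = {"SPU1": "B", "SPU2": "C"}` with task A of higher priority than task C leads to SPU2 being interrupted and task C returning to the ready set.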

With reference to FIGS. 25 and 26, certain direct migration aspects of the present invention are illustrated. These aspects provide that a higher priority processor task running on one sub-processing unit may be moved to another sub-processing unit that is running a lower priority processor task. The migration may be made in response to a direct interrupt received by the sub-processing unit running the higher priority processor task. With reference to FIG. 25, sub-processing unit SPU1 may receive an interrupt indicating that it must execute some other task. The interrupt may cause sub-processing unit SPU1 to determine whether any other sub-processing unit of the system is running a lower priority processor task. If so, that sub-processing unit may yield execution of its processor task in favor of the higher priority processor task. More specifically, if sub-processing unit SPU1 determines that sub-processing unit SPU2 is running a processor task (e.g., processor task B) of lower priority than processor task A, the kernel of sub-processing unit SPU1 preferably issues an interrupt to the kernel of sub-processing unit SPU2. In response to the interrupt, sub-processing unit SPU2 writes processor task B back from its local memory to shared memory 106 and updates the task table. Sub-processing unit SPU2 also preferably copies (or moves) processor task A from the local memory of sub-processing unit SPU1 into its own local memory for execution.

FIG. 27 illustrates how a processing unit (PU) may handle an interrupt in accordance with one aspect of the present invention. In a first step, the PU receives an interrupt. The PU then determines which of the sub-processing units (in this case, the group SPU0, SPU1 and SPU2) has the lowest priority, and sends the interrupt to that SPU. In the case of FIG. 27, SPU2 has the lowest priority, so the PU sends the interrupt to SPU2.

In accordance with one or more further aspects of the present invention, interrupts from one sub-processing unit to another may be handled in a number of ways. With reference to FIG. 28, in one embodiment of the present invention, one sub-processing unit may be designated to manage interrupts to any of the other sub-processing units in the system. The designated sub-processing unit receives all such task migration interrupts and determines whether to handle each one itself or to pass it on to another sub-processing unit. For example, if an interrupt is intended for the designated sub-processing unit, the designated sub-processing unit may handle the interrupt itself. Alternatively, if the interrupt is not intended for the designated sub-processing unit, the designated sub-processing unit may send the interrupt to the sub-processing unit that is running the lowest priority processor task.

FIG. 29 illustrates an alternative approach in which a distributed interrupt handling scheme is used. With this technique, respective interrupts are assigned to respective sub-processing units. For example, interrupt A may be assigned to sub-processing unit SPU0, interrupts B and C may be assigned to sub-processing unit SPU1, and interrupts D, E and F may be assigned to sub-processing unit SPU2.
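The two interrupt handling schemes of FIGS. 28 and 29 can be contrasted in a few lines. The sketch below is illustrative only: the function names and the dictionary layout are hypothetical, the assignment table simply mirrors the example given for FIG. 29, and a smaller number is assumed to mean a lower priority. It also assumes that the designated sub-processing unit forwards only to units other than itself, which the source does not state.

```python
from typing import Dict

# Distributed scheme (FIG. 29): each interrupt has a fixed owner.
INTERRUPT_OWNER: Dict[str, str] = {
    "A": "SPU0",
    "B": "SPU1", "C": "SPU1",
    "D": "SPU2", "E": "SPU2", "F": "SPU2",
}


def route_distributed(interrupt: str) -> str:
    """Return the SPU to which the given interrupt is assigned."""
    return INTERRUPT_OWNER[interrupt]


def route_designated(target: str, designated: str,
                     running_priority: Dict[str, int]) -> str:
    """Designated-manager scheme (FIG. 28).

    target           -- the SPU the interrupt was intended for
    designated       -- the SPU that receives every task migration interrupt
    running_priority -- SPU name -> priority of the task it is running
    """
    if target == designated:
        return designated                      # handle it locally
    # Assumption: forward to the other SPU running the lowest-priority task.
    # Raises ValueError if there is no other SPU in the table.
    others = {s: p for s, p in running_priority.items() if s != designated}
    return min(others, key=others.get)
```

The distributed table avoids funneling every interrupt through one unit, while the designated-manager variant keeps the routing policy in a single place.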

In the discussion above with reference to FIGS. 23 to 26, a sub-processing unit needed to be able to determine the priorities of the processor tasks running on the other sub-processing units of the system. In accordance with one embodiment of the present invention, the sub-processing units may use a shared task priority table when determining the priority of a running processor task. The shared task priority table may be located in shared memory and may include a plurality of entries for sub-processing unit identifiers and processor task priority identifiers. For example, a sub-processing unit identifier may be a numeric and/or alphanumeric code that is unique to the sub-processing unit. A processor task priority identifier preferably indicates the priority of the particular processor task being executed. Each entry in the shared task priority table preferably includes a pair consisting of a sub-processing unit identifier and a priority identifier, which together indicate the priority of the processor task running on the associated sub-processing unit. A sub-processing unit seeking to determine the priorities of the running processor tasks may thus access the shared task priority table to find a sub-processing unit that is running a lower priority processor task. Preferably, the sub-processing unit running the lowest priority processor task is identified and made to yield to the higher priority processor task.
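A shared task priority table of this kind can be sketched as a mapping from sub-processing unit identifier to priority identifier. The class and method names below are hypothetical, a smaller number is assumed to mean a lower priority, and the sketch leaves out any locking that a real table in shared memory would need.

```python
from typing import Dict, Optional, Tuple


class SharedTaskPriorityTable:
    """Hypothetical table of (SPU identifier, priority identifier) entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, int] = {}

    def set_running(self, spu_id: str, priority: int) -> None:
        """Record the priority of the task now running on spu_id."""
        self._entries[spu_id] = priority

    def find_lower(self, priority: int,
                   requester: str) -> Optional[Tuple[str, int]]:
        """Return the entry of the other SPU running the lowest-priority task,
        provided that task is of lower priority than the given one."""
        others = [(s, p) for s, p in self._entries.items() if s != requester]
        if not others:
            return None
        spu_id, lowest = min(others, key=lambda entry: entry[1])
        return (spu_id, lowest) if lowest < priority else None
```

A sub-processing unit holding a task of priority 3 would call `find_lower(3, "SPU1")` and interrupt the returned unit, or do nothing when the result is `None`.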

Other embodiments of the present invention provide for the sub-processing units to use a shared variable that indicates which sub-processing unit is running the lowest priority processor task. The shared variable is preferably updated through an atomic update process so that an accurate indication of priority is guaranteed. An alternative approach may use a serialized message that is passed in sequence from one sub-processing unit to the next. The message may be updated with the priority identifier and the sub-processing unit identifier of the lower priority processor task.
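Both alternatives can be sketched briefly. In the sketch below, a `threading.Lock` stands in for whatever atomic update mechanism the hardware provides, which is an assumption and not something the source specifies; the class and function names are hypothetical, and a smaller number is assumed to mean a lower priority. The sketch only records the lowest value offered so far and does not handle a unit later raising the priority of its running task.

```python
import threading
from typing import Iterable, Optional, Tuple

Entry = Tuple[str, int]   # (SPU identifier, priority identifier)


class LowestPriorityVariable:
    """Shared variable naming the SPU that runs the lowest-priority task."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value: Optional[Entry] = None

    def offer(self, spu_id: str, priority: int) -> None:
        # The compare and the store happen under one lock, so concurrent
        # callers cannot interleave between the check and the update.
        with self._lock:
            if self.value is None or priority < self.value[1]:
                self.value = (spu_id, priority)


def circulate(ring: Iterable[Entry]) -> Optional[Entry]:
    """Message-passing alternative: each SPU in turn overwrites the message
    if its own running task has a lower priority than the one recorded."""
    message: Optional[Entry] = None
    for spu_id, priority in ring:
        if message is None or priority < message[1]:
            message = (spu_id, priority)
    return message
```

Either way, the result is a single (identifier, priority) pair that another unit can read without scanning a full table.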

With reference to FIG. 30, another embodiment of the present invention contemplates that the number of sub-processing units 208 assigned to execute processor tasks 110 may be increased by combining multiple PEs 200 to provide enhanced processing power. For example, two or more PEs 200A, 200B may be packaged or joined together, for example within one or more chip packages, to form a set of multiprocessor units. This configuration may be referred to as a broadband engine (BE). BE 290 includes two PEs 200A, 200B, which are interconnected for data communication over a bus 212. An additional data bus 216 is preferably provided to permit communication between PEs 200A, 200B and a shared DRAM 214. One or more input/output (I/O) interfaces 202A, 202B and an external bus (not shown) provide communication between BE 290 and any external elements. Each of PEs 200A, 200B of BE 290 processes data and applications in a parallel and independent fashion, analogous to the parallel and independent processing of applications and data performed by sub-processing units 208 described above with respect to FIG. 3. In accordance with various aspects of the invention, a BE may include a single PE or multiple PEs. Furthermore, a BE may itself be made up of multiple BEs.

Reference is now made to FIG. 31. Stand-alone SPUs 208, PEs 200, or BEs 290 (each a set of multiple PEs) may serve as basic building blocks that are distributed among a plurality of products to form a multiprocessor system 500. The elements or members of system 500, implemented as computers and/or computing devices, preferably communicate over a network 504. Network 504 may be a local area network (LAN) or a global network such as the Internet or any other computer network.

The members connected to network 504 include, for example, client computers 506, server computers 508, personal digital assistants (PDAs) 510, digital televisions (DTVs) 512, and other wired or wireless computers and computing devices. For example, client 506A may be a laptop computer built from one or more PEs 200 or another suitable multiprocessor system. Client 506B may be a desktop computer (or set-top box) built from one or more PEs 200 or another suitable multiprocessor system. Further, server 508A may be an administrative entity (employing a database function), which is also preferably built from one or more PEs 200.

The processing capability of multiprocessor system 500 may therefore rely on a plurality of PEs 200 located locally (e.g., in one product) or remotely (e.g., in multiple products). In this regard, FIG. 30 is a block diagram of an overall computer network in accordance with one or more aspects of the present invention. PEs 200 and/or BEs 290 made up of multiple PEs can be used to implement the overall distributed architecture of computer system 500.

Because servers 508 of system 500 perform more processing of data and applications than clients 506, servers 508 contain more computing modules (e.g., PEs 200) than clients 506. PDAs 510, on the other hand, perform the least amount of processing in this example, and therefore contain the smallest number of PEs 200, such as a single PE 200. DTVs 512 perform a level of processing that lies substantially between that of clients 506 and that of servers 508. DTVs 512 therefore contain a number of PEs between that of clients 506 and that of servers 508.

Further details of the distributed multiprocessor system 500 are now provided. The homogeneous configuration of system 500 facilitates adaptability, processing speed and processing efficiency. Because each member of system 500 performs processing using one or more (or some fraction) of the same computing module, e.g., PE 200, the processing of data and applications may be shared among the members of the network, and the particular computer or computing device that performs the processing of data and/or applications is unimportant. By uniquely identifying the cell applications, which contain the data and applications processed by system 500, the processing results can be sent to the computer or computing device that requested the processing regardless of where the processing occurred. Because the modules performing the processing have a common structure and employ a common instruction set architecture, the computational burden of an added software layer for achieving compatibility among processors is avoided. This architecture and programming model facilitate the processing speed needed to execute, for example, real-time multimedia applications.

To take further advantage of the processing speed and efficiency facilitated by system 500, the data and applications processed by the system may be packaged into uniquely identified, uniformly formatted cell applications 502. Each cell application 502 contains, or may contain, both applications and data. As discussed below, each cell application also contains an ID that identifies the cell throughout network 504 and system 500.

The homogeneity of the structure of the cell applications, and their uniform identification throughout the network, facilitate the processing of applications and data on any computer or computing device of network 504. For example, a client 506 may run a cell application 502 but, because of its limited processing capability, may send cell application 502 to a server 508 for processing. Cell applications 502 can therefore migrate throughout network 504 on the basis of the availability of processing resources on network 504.

The homogeneous structure of the processors and cell applications 502 of system 500 also avoids many of the problems of today's heterogeneous networks. For example, inefficient programming modules that seek to permit applications to be processed on any ISA using any instruction set, such as virtual machines like the Java virtual machine (Java is a registered trademark), are avoided. System 500 can therefore perform broadband processing far more effectively and efficiently than conventional networks.

In accordance with one aspect of the method of the present invention, a first multiprocessor (for convenience, a first BE) determines, for any of a variety of reasons, whether a task should be migrated from the first BE. For example, the first BE may determine that it is too busy to complete the task within a specified deadline or within a reasonable period of time. When the first BE determines that the task should be migrated to another processing system, it broadcasts an application over the network. The application specifies or includes a plurality of tasks and an attribute. The task to be migrated is one of the specified or included tasks. The attribute preferably indicates the distance of the application from at least one of the previous BE that broadcast the application and the BE that originally broadcast it, as well as a deadline for completing each of the plurality of tasks. A task queue may also be specified. The application may specify the plurality of tasks directly, by containing the tasks itself, or indirectly, by using pointer information. The pointer information is, for example, a pointer, an index or another equivalent mechanism that represents or points to the location of a task. Alternatively, the attribute may be linked to any one of the task queue, the individual tasks, and a block of tasks combined with the attribute within the application. Further in accordance with aspects of the present invention, a task has an attribute that specifies the requirements for performing the task. The attribute may be common to two or more tasks or to all of the tasks. Alternatively, each task may have its own separate attribute.

The "distance" of an application may refer to physical distance, which can serve as a measure of transfer latency, but it preferably refers to the network latency incurred in moving the application across the network to another location or processor.

For example, the originator, or "owner BE", decides to migrate application A and broadcasts application A to the network. Application A has a deadline, required computational resources (e.g., a number of SPUs or a plurality of tasks) and other parameters. Another cell, the first BE, receives application A and checks the parameters and the "distance" to the owner BE. For some reason, the first BE decides not to process application A and broadcasts application A back to the network. In this case, the distance of application A is still the network latency from a potential new BE to the owner BE. A second BE now receives application A and checks the parameters and the distance to the owner BE. Based on this information, the second BE may return the result of application A to the owner BE.

Alternatively, the "distance" may be the distance from the immediately preceding BE rather than from the originator, i.e., the owner BE. For example, suppose the owner BE decides to migrate application A onto the network. Application A has a deadline, required computational resources (e.g., a number of SPUs or a plurality of tasks) and other parameters. The first BE receives application A and checks the parameters and the distance to the owner BE. For some reason, the first BE decides to migrate a part of application A as "A'" and broadcasts A' to the network. In this case, the distance of A' is the network latency from a potential new BE to the first BE. The deadline of A' is earlier than that of application A. A second BE receives A' and checks the parameters and the distance to the first BE. The second BE may return the result of A' to the first BE. The first BE may then produce the result of application A using the result of A' and return that result to the owner BE.
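The two conventions differ in which BE the distance refers to and in how the deadline is carried forward. The following sketch is illustrative only: the `Broadcast` record, its field names and the `margin` parameter are hypothetical, and the source does not say how much earlier the deadline of A' should be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broadcast:
    app: str             # application label, e.g. "A"
    deadline: float      # time by which the result is due (hypothetical unit)
    distance_from: str   # BE against which a receiver measures "distance"


def rebroadcast(msg: Broadcast) -> Broadcast:
    """First convention: a BE that declines the application passes it on
    unchanged, so distance is still measured to the owner BE."""
    return msg


def delegate_part(msg: Broadcast, sender: str, margin: float) -> Broadcast:
    """Second convention: a BE hands off part of the application as A'.
    Distance is now measured to the sender, and the deadline is moved
    earlier by `margin` so the sender can still merge and return results."""
    if margin <= 0:
        raise ValueError("margin must be positive so that A' is due before A")
    return Broadcast(msg.app + "'", msg.deadline - margin, sender)
```

For instance, delegating part of an application due at time 10.0 with a margin of 2.0 yields a sub-application due at 8.0 whose results go back to the delegating BE rather than to the owner BE.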

The attribute preferably specifies the processing power required to perform the task. Thus, when the processor architecture described above is used, the attribute may specify the number of SPUs required to perform the task. In addition, the same attribute or a different attribute may specify the amount of memory required to perform the task.

The application broadcast by the first BE is received at a second BE connected to the first BE by the network. The second BE unbundles the tasks in the application and determines which tasks should be moved from the first BE. After unbundling the tasks and determining which tasks should be moved, the second BE examines, preferably by software command, the attributes specifying the requirements for executing the tasks, and determines which tasks must be executed by the second BE. As described above, when the attribute (or attributes) describes the required processing capability and the required memory, the second BE preferably examines the required processing capability and required memory associated with the tasks to be moved.

According to another aspect of the invention, the second BE also examines the distance of the application from at least one of the preceding BE that broadcast the application or the BE that originally broadcast the application, and the deadline for completing each of the plurality of tasks, and then determines whether or not the plurality of tasks should be executed by the second BE. The second BE preferably informs the first BE whether or not the second BE is executing the task or tasks.

According to another aspect of the invention, the broadcast application is received by a plurality of other BEs. Each of the other BEs performs the steps described above for the second BE. Accordingly, each of the other BEs unbundles the task or tasks in the application. The task or tasks to be moved are contained in the application. Each of the other BEs examines the attributes specifying the requirements for executing the plurality of tasks to determine whether the tasks should be executed. Further, each of the other BEs examines the distance of the application from at least one of the preceding BE that broadcast the application or the BE that originally broadcast the application, and the deadline for completing each of the plurality of tasks, to determine whether the tasks should be executed.
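The determination made by each receiving BE may be sketched as follows, purely as an example. The class and field names, the units, and in particular the rule that the estimated run time plus the return latency must fit within the deadline are assumptions made for this sketch; the specification states only that the attributes, the distance, and the deadline are examined.

```python
from dataclasses import dataclass


@dataclass
class TaskAttributes:
    """Hypothetical container for the attributes carried by an application."""
    required_spus: int      # processing capability, expressed as a number of SPUs
    required_memory: int    # bytes of memory needed to execute the task
    distance_ms: float      # latency to the preceding BE or to the owner BE
    deadline_ms: float      # time remaining until the task deadline


@dataclass
class LocalResources:
    """Hypothetical snapshot of what the receiving BE has available."""
    free_spus: int
    free_memory: int


def should_execute(attrs, local, estimated_run_ms):
    """Decide whether this BE should execute the task (assumed rule)."""
    if attrs.required_spus > local.free_spus:
        return False
    if attrs.required_memory > local.free_memory:
        return False
    # Assumption: the result must travel back over the network before the deadline.
    return estimated_run_ms + attrs.distance_ms <= attrs.deadline_ms


attrs = TaskAttributes(required_spus=2, required_memory=1024, distance_ms=10.0, deadline_ms=100.0)
accepts = should_execute(attrs, LocalResources(free_spus=4, free_memory=4096), estimated_run_ms=50.0)
too_slow = should_execute(attrs, LocalResources(free_spus=4, free_memory=4096), estimated_run_ms=95.0)
too_small = should_execute(attrs, LocalResources(free_spus=1, free_memory=4096), estimated_run_ms=50.0)
```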

In the above description, for convenience, the first BE was taken as the first multiprocessor and the second BE as the second multiprocessor. It will be appreciated, however, that each of the first and second multiprocessors may be any of the elements shown in FIG. 32. Thus, by way of example, the first multiprocessor may be one of the clients 506, and the second multiprocessor may be another of the clients 506 or the PDA 510. FIG. 32 shows a plurality of cells 502 moving over the network 504; these cells 502 are the applications described above as being transferred or broadcast between BEs. For example, a plurality of "cells" are drawn inside the client 506, depicting cell applications being executed on a plurality of BEs or PEs in the client 506. "Cells" are also drawn inside the server 508 as well as the client 506, but this does not necessarily depict a BE or PE in the server or client executing a "cell"; it merely indicates that a "cell" can be executed there. Each PE may have an architecture such as that shown in FIG. 3, with a plurality of SPUs. It is understood, however, that the present invention is applicable to any system having a plurality of networked processors, regardless of processor type or architecture.

In this regard, FIG. 33 shows one embodiment of a cell server 600 of the present invention. One or more broadband engines (BEs) 610 are managed by the cell server. As in the environment described above, each BE comprises a basic processor element (PE) 620 and a shared memory 660. The PE 620 includes a processing unit (PU) 630, a plurality of synergistic processing units (SPUs) 640, and, advantageously, logical storage (LS) for the local memory associated with each of the SPUs (shown in FIG. 4), accessed via a DMAC 650. The BE communicates internally via a BE bus 670 and a BE input/output channel 675. In addition, the cell server has one or more network interfaces 680. A network interface is advantageously associated with each PU or SPU, but may instead be associated directly with the cell server. Advantageously, the cell server 600 includes a cell bus 690 for internal communication and a local cell input/output channel 695.

Details of the general-purpose memory handling characteristics of one embodiment of the BE are described in U.S. Patent Application No. 09/815,554, entitled "SYSTEM AND METHOD FOR DATA SYNCHRONIZATION FOR A COMPUTER ARCHITECTURE FOR BROADBAND NETWORKS", filed March 22, 2001, which is assigned to the assignee of the present application. One embodiment of the basic interconnection of BEs in a networked multiprocessor environment is described in U.S. Provisional Patent Application No. 60/564,647, entitled "METHOD AND APPARATUS FOR PROVIDING AN INTERCONNECTION NETWORK FUNCTION", filed April 22, 2004, which is assigned to the assignee of the present application.

Each BE includes one or more BE tasks 700, as shown schematically in one embodiment in FIG. 34. The BE tasks 700 are preferably organized in the shared memory of the cell server using a task queue 710 and associated task tables 720a to 720d. Each of the task tables 720a to 720d comprises a set of task table entries (see FIG. 35), and a task table holds one or more SPU tasks to be executed by one or more SPUs (see FIG. 35). Each BE task preferably has a unique BE task identifier 730.

In addition, each BE task preferably has at least the following four parameters: the minimum number of SPUs required to execute the BE task or task group (minimum required SPUs 730a); the memory size required by the BE task or task group (memory allocation 730b); the cell latency, that is, the distance between the cell server that owns this BE and the receiving BE (cell latency 730c); and the task deadline of the SPU tasks included in the BE task (task deadline 730d). The BE task preferably also includes other values required by the cell server network (730e). An example of a BE task is shown in Table 1.
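As a non-limiting sketch, the BE task identifier and the four parameters listed above might be grouped into a single record as shown below. The field names, types, and units (bytes, milliseconds) are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class BETask:
    """Hypothetical record for a BE task; comments give the reference numerals."""
    task_id: int              # BE task identifier (730)
    min_spus: int             # minimum required SPUs (730a)
    memory_bytes: int         # memory allocation (730b), unit assumed
    cell_latency_ms: float    # cell latency (730c), unit assumed
    deadline_ms: float        # task deadline (730d), unit assumed
    other: dict = field(default_factory=dict)  # other values (730e)


example_task = BETask(
    task_id=1,
    min_spus=2,
    memory_bytes=256 * 1024,
    cell_latency_ms=12.5,
    deadline_ms=100.0,
)
```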

FIG. 35 shows one embodiment of a task table of the present invention. As described above, each task table 740 includes a set of table entries 750 holding one or more of the following parameters: task status 760a, task priority 760b, other task table values 760c, a pointer 780 to the previous task, and a pointer 790 to the next task. Each task table 740 may also have an instruction pointer 770 to the actual instructions to be executed.

The task status 760a represents the state of the task, for example whether the task is running, ready, waiting, waiting-suspended, suspended, dormant, or non-existent. The task priority 760b represents the level or importance of the task and is used to determine the order of task execution during scheduling. The previous task pointer 780 and the next task pointer 790 are used in conjunction with the other table entries 750 to form a linked list within the task table 740. The previous task pointer 780 and the next task pointer 790 also give the PU and SPUs ordered access to the processing tasks in the task table.

FIG. 36 shows one embodiment of a BE task queue of the present invention. Each task queue 710 includes a set of head entry pointers 800 and tail entry pointers 810, one pair for each of a plurality of task priority levels 820. Specifically, the task queue 710 has a separate head entry and tail entry pair for each of n priority levels (0 to n-1), which are provided to prioritize the execution of task instructions on the BEs of the cell server. In general, there are as many linked lists as there are priority levels. Thus, for each priority level, the linked list 830 of tasks at that level is associated with a head pointer 840, which points to the first task entry 845 in the task table 740 at that priority, and a tail pointer 850, which points to the last task entry 855 in the task table at that priority. An example of the task queue 710 is shown in Table 2.
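The relationship between the task queue (one head and tail pointer pair per priority level) and the linked task table entries may be illustrated with the following minimal sketch. It assumes a single task table indexed by position, uses table indices in place of memory pointers, and assumes that level 0 is the highest priority; none of these choices is stated in the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskEntry:
    """Hypothetical task table entry with previous/next links (cf. 780, 790)."""
    name: str
    priority: int
    prev: Optional[int] = None  # index of the previous entry at this priority
    next: Optional[int] = None  # index of the next entry at this priority


class TaskQueue:
    """One head/tail pair per priority level, each heading a linked list."""

    def __init__(self, levels: int) -> None:
        self.table: List[TaskEntry] = []
        self.head: List[Optional[int]] = [None] * levels
        self.tail: List[Optional[int]] = [None] * levels

    def enqueue(self, name: str, priority: int) -> int:
        index = len(self.table)
        self.table.append(TaskEntry(name, priority, prev=self.tail[priority]))
        if self.tail[priority] is None:
            self.head[priority] = index          # first entry at this level
        else:
            self.table[self.tail[priority]].next = index
        self.tail[priority] = index
        return index

    def tasks_at(self, priority: int) -> List[str]:
        names, index = [], self.head[priority]
        while index is not None:                 # follow the next pointers
            names.append(self.table[index].name)
            index = self.table[index].next
        return names

    def schedule_order(self) -> List[str]:
        # Assumption: lower level numbers are scheduled first.
        return [n for level in range(len(self.head)) for n in self.tasks_at(level)]


queue = TaskQueue(levels=3)
queue.enqueue("a", 1)
queue.enqueue("b", 0)
queue.enqueue("c", 1)
```

In this sketch, walking from a head pointer through the next links yields the tasks of one priority level in insertion order, which is the ordered access the linked list is described as providing.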

The PU and/or the SPUs use the BE task queue and the associated BE task tables to schedule task processing; specifically, they determine the order in which processing tasks are pulled from shared memory and placed on one of the SPUs for execution. Each BE in the cell server preferably maintains at least one task queue and associated task tables.

The task queue and task tables for each BE facilitate the movement of processor tasks from one BE to another BE within the same cell server, or from one BE to another BE on a separate cell server connected via the network. Processor tasks to be passed to another BE for execution are bundled into an application for movement. In general, an application includes one or more BE tasks wrapped with other information, as described in detail below. The cell server that manages a particular BE is generally regarded as the application owner for that BE.

FIG. 37 shows one embodiment of a method of bundling an application for movement. In general, a BE in a particular cell server moves an application containing a set of one or more BE tasks to another BE when, for example and among other circumstances, the processor load within the BE (or within a particular SPU of the cell server) is high, or a priority conflict occurs within a particular SPU. Specifically, bundling an application for movement involves a stepwise process handled by the sending BE.

During the normal task processing state 900, in a move determination step 910, the BE determines whether tasks need to be moved, for example because of high processor load or low memory availability. If a move is needed, the currently executing task 700 is stopped in a stop task step 920. In a task update step 930, the current task queue 710 and task table 720 of the sending BE are updated in preparation for bundling. In a bundle step 950, the BE tasks to be moved are bundled into an application 940 for movement. Preferably, the cell 940 contains not only the BE tasks 700 to be moved (including the information in the task queue 710 and the task table 720) but also the task data 952 associated with the BE tasks. Advantageously, the application may be compressed or encrypted as needed to minimize application size, ensure application security, and verify application integrity. The tasks are then moved to another BE in a move step 960 by way of the poll-response-move-return process shown in detail in FIG. 38. When the move process (and the return of the moved process) is complete, the BE resumes processing tasks in the normal task processing state 900.
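The move determination and bundling steps may be sketched, for illustration only, as follows. The threshold values, the function names, and the use of JSON serialization with zlib compression are all assumptions of this sketch; the specification does not prescribe a serialization format or a compression scheme.

```python
import json
import zlib


def needs_migration(processor_load, free_memory, load_limit=0.9, memory_floor=64 * 1024):
    """Move determination (cf. step 910) with assumed thresholds."""
    return processor_load > load_limit or free_memory < memory_floor


def bundle_application(be_tasks, task_data):
    """Bundle BE tasks and their task data into one compressed blob (cf. step 950)."""
    payload = json.dumps({"be_tasks": be_tasks, "task_data": task_data}, sort_keys=True)
    return zlib.compress(payload.encode("utf-8"))


def unbundle_application(blob):
    """Inverse of bundle_application, as a receiving BE might apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


be_tasks = [{"id": 1, "min_spus": 2, "priority": 0}]
task_data = {"1": "input for task 1"}

blob = bundle_application(be_tasks, task_data)
restored = unbundle_application(blob)
```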

FIG. 38 shows one embodiment of a process for moving BE tasks from a sending BE to a receiving BE located anywhere on a known cell server network. After the BE tasks to be moved have been bundled, as described for bundle step 950 of FIG. 37, the BE polls other BEs in a polling step 970. Preferably, the BE polls other BEs 610, both in the same cell server 600 and in other known cell servers 972, over one or more cell server networks 974.

Preferably, the polling step 970 is performed by means of a broadcast query, shown in FIG. 39, to known BEs on the known cell server networks. In one embodiment, the broadcast query message 980 takes the form of a network message containing a plurality of values. These values include, but are not limited to: an indication 985a of the source cell server and BE; an indication 985b of the task priority of the tasks to be moved; an indication 985c of the required SPU and/or memory resources; an indication 985d of the task deadline of the tasks being moved; and any other values 985e that may advantageously be sent with the broadcast query. These values are stored in the task table that is part of the bundled application. They may also be described in other data structures as necessary.
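One possible shape for the broadcast query message is sketched below. The field names, the units, and the JSON wire encoding are hypothetical; the specification lists only the kinds of values carried (985a to 985e), not their format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BroadcastQuery:
    """Hypothetical layout of a broadcast query message (cf. 980)."""
    source_cell_server: str     # part of indication 985a
    source_be: str              # part of indication 985a
    task_priority: int          # indication 985b
    required_spus: int          # indication 985c
    required_memory: int        # indication 985c, bytes assumed
    task_deadline_ms: float     # indication 985d, unit assumed
    other: dict = field(default_factory=dict)  # other values 985e


def encode_query(query):
    """Serialize the query for transmission (JSON is an assumption)."""
    return json.dumps(asdict(query), sort_keys=True).encode("utf-8")


def decode_query(raw):
    """Rebuild the query on a receiving BE."""
    return BroadcastQuery(**json.loads(raw.decode("utf-8")))


query = BroadcastQuery("cell-server-1", "BE0", 0, 2, 262144, 100.0)
round_tripped = decode_query(encode_query(query))
```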

Returning to FIG. 38, in a query response receiving step 990, the sending BE in the sending cell server receives one or more responses from known BEs in the cell server network. As shown in FIG. 40, the query response preferably takes the form of a network message containing certain values. These values include, but are not limited to: an indication 1005a of the responding BE and its cell server location; an indication 1005b of the responding BE's current task deadline; an indication 1005c of the responding BE's currently free SPUs, processor load, and/or memory load; and any other broadcast response message values 1005d that are needed, requested, or included in the query response 1000.

Returning again to FIG. 38, in a receiving BE selection step 1010, the sending BE determines, from the received query responses 1000, to which of the responding BEs the tasks to be moved should be sent. This selection is preferably based on a determination that takes into account some or all of the following: how short the cell latency, or the network-topological distance, between the receiving BE and the sending BE is; how short the time remaining to the deadline of the task currently executing on the receiving BE is; and whether the availability of SPUs and memory on the receiving BE is sufficient. In a move step 1020, the bundled tasks in the application 940 are moved to the selected receiving BE 1030.
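The selection of a receiving BE from the query responses may be sketched as follows. The record layout and the particular ranking used here (filter on resource sufficiency, then prefer the shortest latency, then the shortest time to the responder's current deadline) are one assumed way of combining the listed criteria; the specification leaves the weighting open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryResponse:
    """Hypothetical layout of a query response (cf. 1000)."""
    be_id: str                    # responding BE (1005a)
    current_deadline_ms: float    # time to the responder's current deadline (1005b)
    free_spus: int                # free SPUs (1005c)
    free_memory: int              # free memory in bytes (1005c)
    latency_ms: float             # cell latency to the responder, assumed measured by the sender


def select_receiver(responses: List[QueryResponse],
                    required_spus: int,
                    required_memory: int) -> Optional[QueryResponse]:
    """Pick a receiving BE, or None when no responder has enough resources."""
    eligible = [
        r for r in responses
        if r.free_spus >= required_spus and r.free_memory >= required_memory
    ]
    if not eligible:
        return None
    # Assumed ranking: shortest latency first, then soonest current deadline.
    return min(eligible, key=lambda r: (r.latency_ms, r.current_deadline_ms))


responses = [
    QueryResponse("BE-near-busy", 20.0, free_spus=1, free_memory=10**6, latency_ms=2.0),
    QueryResponse("BE-far", 10.0, free_spus=4, free_memory=10**6, latency_ms=30.0),
    QueryResponse("BE-mid", 50.0, free_spus=4, free_memory=10**6, latency_ms=8.0),
]
chosen = select_receiver(responses, required_spus=2, required_memory=4096)
nobody = select_receiver(responses, required_spus=8, required_memory=4096)
```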

Once the application 940 has been sent to the receiving BE, in a receiving BE task unbundling step 1040 the receiving BE 1030 advantageously unbundles the application in shared memory and processes it. In a receiving BE task processing step 1050, the receiving BE 1030 accesses the bundled tasks, data, task queue, and task tables to execute the moved processing tasks.

When the receiving BE has finished processing the moved tasks in step 1050, the completed tasks, data, task queue, and task tables are re-bundled into an application in a receiving BE re-bundling step 1060 and returned to the original sending BE, typically in the cell server that owns the BE tasks originally moved. When the finished tasks are received by the original sending BE in a sending BE finished task receiving step 1070, the sending BE unbundles the finished BE tasks in a task unbundling step 1080. Then, in a finished task update step 1090, the processes associated with the finished BE tasks are updated, and the general task processing state 900 is resumed. Advantageously, the messages 980 and 1000, the application 600, and other critical data such as the task data 952 may be encrypted by a known algorithm such as the AES standard, Blowfish, or an RSA-based public key encryption algorithm. Similarly, the messages, data structures, and other features of the present invention may be fingerprinted by a digest algorithm such as MD5, so that the integrity of critical data is assured when tasks, task data, and messages are sent over unreliable networks and network protocols.
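Fingerprinting a bundled application with a digest algorithm may be sketched as follows, using MD5 because it is the example named in the text. The helper names are hypothetical. Note that MD5 detects accidental corruption but is no longer considered resistant to deliberate collisions, so a deployment concerned with tampering would likely substitute a stronger digest.

```python
import hashlib
import hmac


def fingerprint(payload: bytes) -> str:
    """Return the MD5 digest of a bundled application or message as hex."""
    return hashlib.md5(payload).hexdigest()


def verify(payload: bytes, expected: str) -> bool:
    """Check a received payload against the fingerprint sent with it."""
    return hmac.compare_digest(fingerprint(payload), expected)


bundle = b"bundled BE tasks and task data"
sent_digest = fingerprint(bundle)

intact = verify(bundle, sent_digest)
corrupted = verify(bundle + b"!", sent_digest)
```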

FIG. 41 shows one embodiment of a secure server ranch of the present invention. Cell servers 1100 interconnected by the network advantageously form a community of cell servers called a server ranch 1110. Preferably, the server ranch is formed on a topology-free network, such as an IP (Internet Protocol) based network, for example the Internet or an intranet. However, any network protocol of sufficient efficiency and reliability may be used.

Because security is important when a server ranch exists on a wireless network, an insecure local area network, a wide area network, an organized or topological intranet, or another open network such as the Internet more generally, the server ranch may advantageously be secured with respect to the network on which it resides. In one embodiment, each cell server in the secure server ranch uses a public key and a private key, and the cell servers distributed over the network can send, receive, and authenticate applications and other messages through an encryption module 1120 (in one embodiment, a public key encryption (PKI) module). However, any type of encryption algorithm may be used, such as an AES-based handshake or a compression-encryption technique. Cell servers outside the known network 1130 are generally able to join a cell ranch, but those lacking a suitable type of encryption module and suitable credentials (that is, keys and/or signatures) preferably do not participate in the secure server ranch.

Although the invention has been described herein with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments, and that other arrangements may be devised, without departing from the spirit and scope of the present invention as defined by the appended claims.

FIG. 1 is a diagram showing the structure of a multiprocessor system according to one or more aspects of the present invention.
FIG. 2 is a block diagram showing the storage of processor tasks in a shared memory.
FIG. 3 is a diagram showing a preferred structure of a processor element (PE) of the present invention.
FIG. 4 is a diagram showing the structure of an exemplary sub-processing unit (SPU) according to the present invention.
FIG. 5 is a diagram showing an example of a processor task table that may be used in accordance with one or more aspects of the present invention.
FIG. 6 is a state diagram of a linked list of processor tasks established by the task table of FIG. 5.
FIG. 7 is a diagram showing an example of a task queue that may be used with the task table of FIG. 5 to manage the execution of processor tasks.
FIG. 8 is a flow diagram showing process steps that may be performed by a multiprocessor system according to one or more aspects of the present invention.
FIG. 9 is a flow diagram showing process steps that may be performed by the multiprocessor system of the present invention.
FIG. 10 is a flow diagram showing process steps that may be performed by the multiprocessor system of the present invention.
FIG. 11 is a flow diagram showing process steps, executable by a multiprocessor system according to various aspects of the present invention, for initializing processor tasks in shared memory.
FIG. 12 is a state diagram showing the different status states of a processor task according to one or more aspects of the present invention.
FIG. 13 is a block diagram showing how processor tasks are copied from shared memory and written back to shared memory according to one or more aspects of the present invention.
FIG. 14 is a timing diagram showing the handling of latency associated with the copy and write-back technique of FIG. 13.
FIG. 15 is a block diagram showing how processor tasks are copied from shared memory and written back to shared memory according to one or more aspects of the present invention.
FIG. 16 is a timing diagram showing the handling of latency associated with the copy and write-back technique of FIG. 15.
FIG. 17 is a block diagram showing how processor tasks are copied from shared memory and written back to shared memory according to one or more aspects of the present invention.
FIG. 18 is a timing diagram showing the handling of latency associated with the copy and write-back technique of FIG. 17.
FIG. 19 is a block diagram showing how processor tasks are copied from shared memory and written back to shared memory according to one or more aspects of the present invention.
FIG. 20 is a block diagram showing non-preemptive processor task movement according to certain aspects of the present invention.
FIG. 21 is a block diagram showing non-preemptive processor task movement according to certain aspects of the present invention.
FIG. 22 is a block diagram showing non-preemptive processor task movement according to certain aspects of the present invention.
FIG. 23 is a block diagram showing preemptive multitasking according to certain aspects of the present invention.
FIG. 24 is a block diagram showing preemptive multitasking according to certain aspects of the present invention.
FIG. 25 is a block diagram showing preemptive processor task movement according to certain aspects of the present invention.
FIG. 26 is a block diagram showing preemptive processor task movement according to certain aspects of the present invention.
FIG. 27 is a partial block diagram and partial flow diagram showing a particular processor interrupt technique according to one or more aspects of the present invention.
FIG. 28 is a partial block diagram and partial flow diagram showing a processor interrupt technique according to one or more aspects of the present invention.
FIG. 29 is a partial block diagram and partial flow diagram showing a processor interrupt technique according to one or more aspects of the present invention.
FIG. 30 is a diagram showing the structure of a processing system including two or more sub-processing units according to one or more aspects of the present invention.
FIG. 31 is a system diagram of a distributed multiprocessor system according to one or more aspects of the present invention.
FIG. 32 is a block diagram of a cell application that may be used with the multiprocessor unit of the present invention.
FIG. 33 is a system diagram showing one embodiment of the present invention.
FIG. 34 is a block diagram showing one embodiment of a broadband engine task of the present invention.
FIG. 35 is a block diagram showing one embodiment of a task table and task entries of the present invention.
FIG. 36 is a block diagram showing one embodiment of a task queue of the present invention.
FIG. 37 is a partial flow diagram showing one embodiment of application bundling of the present invention.
FIG. 38 is a partial flow diagram showing one embodiment of task movement of the present invention.
FIG. 39 is a block diagram showing one embodiment of a broadcast query message of the present invention.
FIG. 40 is a block diagram showing one embodiment of a query response message of the present invention.
FIG. 41 is a system diagram showing one embodiment of a secure cell ranch of the present invention.

Explanation of symbols

600 cell server, 610 BE, 650 DMAC, 660 shared memory, 670 BE bus, 675 BE input/output channel, 680 network interface, 690 cell bus, 695 local cell input/output channel.

Claims

A data processing method for moving a task from one multiprocessor to at least one multiprocessor via a network, the method comprising:
determining whether to move a task from one multiprocessor to at least one multiprocessor; and
broadcasting an application from the one multiprocessor when it is determined that the task should be moved to at least one multiprocessor,
wherein the application specifies a plurality of tasks and an attribute, and the attribute represents a distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that initially broadcast the application, and a deadline for ending each of the plurality of tasks.
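
The decide-then-broadcast flow of this claim can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Task` and `Application` classes, the load-threshold policy in `should_move`, and the use of plain lists as stand-ins for the network inboxes of other multiprocessors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    deadline: float  # hypothetical: seconds by which this task must end


@dataclass
class Application:
    tasks: List[Task]
    distance: int = 0  # hypothetical: hops from the multiprocessor that first broadcast it


def should_move(local_load: float, threshold: float = 0.8) -> bool:
    """Assumed policy: move work away when the local load exceeds a threshold."""
    return local_load > threshold


def broadcast(app: Application, inboxes: List[list]) -> int:
    """Deliver a copy of the application to every inbox and return the number sent.

    The distance is incremented by one to record the extra hop.
    """
    for inbox in inboxes:
        inbox.append(Application(tasks=list(app.tasks), distance=app.distance + 1))
    return len(inboxes)


def move_if_needed(app: Application, local_load: float, inboxes: List[list]) -> int:
    """Broadcast only when the decision step says the tasks should be moved."""
    if should_move(local_load):
        return broadcast(app, inboxes)
    return 0
```

In this sketch the deadline is stored per task and the distance per application; the claim only requires that the attribute represent both, so other layouts are equally possible.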

The data processing method according to claim 1, wherein the attribute specifies a processing capability necessary for execution of a task.

The data processing method according to claim 2, further comprising:
receiving the broadcast application at a second multiprocessor;
distributing the task specified in the application; and
examining, in the second multiprocessor, an attribute designating the processing capability necessary for executing the task to determine whether the task should be executed in the second multiprocessor.

The data processing method according to claim 3, further comprising examining, in the second multiprocessor, the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for ending each of the plurality of tasks, to determine whether the task should be executed by the second multiprocessor.

The data processing method according to claim 3, further comprising examining, in the second multiprocessor, an attribute designating the memory required for executing the task to determine whether the task should be executed by the second multiprocessor.

The data processing method further comprising examining, in the second multiprocessor, the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for ending each of the plurality of tasks, to determine whether the task should be executed by the second multiprocessor.
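
The checks performed by the second multiprocessor (processing capability, memory, distance, and deadline) can be sketched as a single acceptance test. This is a hedged illustration only: the `Attributes` and `Node` classes, the units, and in particular the rule that combines distance and deadline (estimated run time plus a round-trip delay proportional to distance must fit within the deadline) are assumptions, since the claims do not state how these values are weighed.

```python
from dataclasses import dataclass


@dataclass
class Attributes:
    required_capability: float  # hypothetical unit of processing capability
    required_memory: int        # bytes of memory needed to execute the task
    distance: int               # hops from the broadcasting multiprocessor
    deadline: float             # seconds remaining until the task must end


@dataclass
class Node:
    capability: float       # processing capability this multiprocessor can offer
    free_memory: int        # bytes currently available
    latency_per_hop: float  # assumed transfer delay per hop, in seconds


def should_execute(node: Node, attrs: Attributes, estimated_runtime: float) -> bool:
    """Return True when this node can run the task and still meet the deadline."""
    if node.capability < attrs.required_capability:
        return False
    if node.free_memory < attrs.required_memory:
        return False
    # Assumed rule: the result has to travel back, so count the distance twice.
    round_trip = 2 * attrs.distance * node.latency_per_hop
    return estimated_runtime + round_trip <= attrs.deadline
```

A node that passes the capability and memory checks can still decline a task when it is too far from the source for the result to arrive before the deadline.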

The data processing method according to claim 3, further comprising communicating, from the second multiprocessor to the first multiprocessor, whether or not the second multiprocessor is executing a task.

The data processing method according to claim 1, wherein the attribute specifies a memory size necessary for execution of a task.

The data processing method according to claim 8, further comprising:
receiving the broadcast application at a second multiprocessor;
distributing the task specified in the application; and
examining, in the second multiprocessor, an attribute designating the memory required to execute the task to determine whether the task should be executed on the second multiprocessor.

The data processing method according to claim 1, wherein the application specifies the plurality of tasks by holding the plurality of tasks.

The data processing method according to claim 1, wherein the application specifies the plurality of tasks by pointer information indicating a location where the task is located.
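
The two ways an application may specify its tasks, by holding them or by pointer information indicating where they are located, can be sketched as follows. The `InlineTask` and `TaskPointer` classes, the string-keyed `store` dictionary standing in for a remote memory, and the key format are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class InlineTask:
    code: bytes  # the task itself is held inside the application


@dataclass
class TaskPointer:
    location: str  # hypothetical key naming where the task is located


TaskEntry = Union[InlineTask, TaskPointer]


def resolve(entries: List[TaskEntry], store: Dict[str, bytes]) -> List[bytes]:
    """Return the task bodies, fetching pointed-to tasks from the store."""
    bodies = []
    for entry in entries:
        if isinstance(entry, InlineTask):
            bodies.append(entry.code)
        else:
            bodies.append(store[entry.location])
    return bodies
```

Holding the tasks makes the broadcast self-contained but larger, while pointer information keeps the broadcast small at the cost of a later fetch by whichever multiprocessor accepts the task.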

The data processing method according to claim 1, further comprising:
receiving the broadcast application at a second multiprocessor;
distributing the task specified in the application; and
examining, in the second multiprocessor, an attribute that specifies a requirement for executing the task to determine whether the task should be executed on the second multiprocessor.

The data processing method according to claim 12, further comprising examining, in the second multiprocessor, the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that initially broadcast the application, and the deadline for ending the plurality of tasks, to determine whether the task should be executed by the second multiprocessor.

The data processing method according to claim 9, 12 or 13, further comprising the second multiprocessor communicating to the first multiprocessor whether or not the second multiprocessor is performing a task.

The data processing method according to claim 12, further comprising:
receiving the broadcast application at a plurality of other multiprocessors;
distributing the task specified in the application in each of the plurality of other multiprocessors; and
examining, in each of the plurality of other multiprocessors, an attribute specifying a requirement for executing the task to determine whether the task should be executed.

The data processing method according to claim 15, wherein each of the plurality of other multiprocessors examines the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for completing the tasks, to determine whether the task should be executed.

16. The data processing method according to claim 15, wherein a plurality of other multiprocessors communicate with the first multiprocessor whether or not a task is being executed.

A data processing system for moving tasks, comprising:
a network;
a plurality of multiprocessors connected to the network;
means for determining whether to move a task from one multiprocessor to at least one multiprocessor; and
means for broadcasting an application over the network from the one multiprocessor when it is determined that the task should be moved to at least one multiprocessor,
wherein the application specifies a plurality of tasks and an attribute, and the attribute represents a distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that initially broadcast the application, and a deadline for ending each of the plurality of tasks.

The data processing system according to claim 18, wherein the attribute specifies a processing capability necessary for execution of a task.

The data processing system according to claim 19, wherein the attribute specifies a memory size necessary for execution of a task.

The data processing system according to claim 19, further comprising:
means for receiving the broadcast application at a second multiprocessor;
means for distributing the tasks in the application in the second multiprocessor; and
means for examining, in the second multiprocessor, an attribute designating the processing capability required to execute the task and determining whether the task should be executed by the second multiprocessor.

The data processing system according to claim 21, wherein the second multiprocessor comprises means for examining the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for ending each of the plurality of tasks, and for determining whether the task should be executed by the second multiprocessor.

The data processing system according to claim 22, wherein the second multiprocessor includes means for examining an attribute designating the memory required for executing the task and determining whether or not the task should be executed by the second multiprocessor.

The data processing system according to claim 23, wherein the second multiprocessor comprises means for examining the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for ending each of the plurality of tasks, and for determining whether the task should be executed by the second multiprocessor.

The data processing system according to claim 22, further comprising:
means for receiving the broadcast application at a second multiprocessor;
means for distributing the tasks in the application in the second multiprocessor; and
means for examining, in the second multiprocessor, an attribute designating the memory required to execute the task and determining whether the task should be executed on the second multiprocessor.

The second multiprocessor comprises means for communicating to the first multiprocessor whether or not the second multiprocessor is performing a task. The data processing system described in 1.

The data processing system according to claim 18, wherein the attribute specifies a memory size necessary for execution of a task.

The data processing system according to claim 18, wherein the application specifies the plurality of tasks by holding the plurality of tasks.

19. The data processing system according to claim 18, wherein the application designates the plurality of tasks by pointer information indicating a location where the task is located.

The data processing system according to claim 18, further comprising:
a second multiprocessor connected to the network;
means for receiving the broadcast application at the second multiprocessor;
means for distributing the tasks in the application; and
means for examining, in the second multiprocessor, an attribute that specifies a requirement for executing the task and determining whether the task should be executed on the second multiprocessor.

The data processing system according to claim 30, wherein the second multiprocessor comprises means for examining the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for ending each of the plurality of tasks, and for determining whether the task should be executed by the second multiprocessor.

32. The data processing system of claim 31, wherein the second multiprocessor further comprises means for communicating to the first multiprocessor whether the second multiprocessor is performing a task.

The data processing system according to claim 31, further comprising:
means for receiving the broadcast application at a plurality of other multiprocessors;
means for distributing the tasks in the application in each of the plurality of other multiprocessors; and
means for examining, in each of the plurality of other multiprocessors, an attribute that specifies a requirement for performing the task and determining whether the task should be performed.

The data processing system according to claim 33, wherein each of the plurality of other multiprocessors comprises means for determining whether the task should be executed by checking the distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that first broadcast the application, and the deadline for ending each of the tasks.

34. The data processing system of claim 33, wherein the plurality of other multiprocessors comprise means for communicating to the first multiprocessor whether the task is being executed.

31. The data processing system of claim 30, wherein the second multiprocessor further comprises means for communicating to the first multiprocessor whether or not the second multiprocessor is performing a task.

A data processing device for moving a task, comprising:
a multiprocessor connectable to a network, the multiprocessor being programmed to determine whether a task should be performed by the multiprocessor or moved to at least one multiprocessor connected to the network,
wherein, when the multiprocessor determines that the task should be moved to at least one multiprocessor, the multiprocessor instructs that an application be broadcast from the multiprocessor over the network, and
wherein the application specifies a plurality of tasks and an attribute, and the attribute represents a distance of the application from at least one of the preceding multiprocessor that broadcast the application or the multiprocessor that initially broadcast the application, and a deadline for ending each of the plurality of tasks.