
JP2004086921A - Multiprocessor system and method for executing task in multiprocessor system thereof

Info

Publication number: JP2004086921A
Application number: JP2003363920A
Authority: JP
Inventors: Tetsuya Tanaka; 田中　哲也; Akira Fukuda; 福田　晃; Hitoshi Tanuma; 田沼　仁
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-02-24
Filing date: 2003-10-23
Publication date: 2004-03-18

Abstract

PROBLEM TO BE SOLVED: To provide a task execution method suited to fine-grained tasks.

SOLUTION: The task execution method for this multiprocessor system comprises a step of detecting, when one of processors 30-32 that is executing a task T1 generates a new task T2, whether any of processors 30-32 is in the "free state"; a step of, when a processor in the "free state" is detected, assigning task T2 to that processor so that it begins executing T2, changing that processor's state from the "free state" to the "execution state", and storing a flag having a first value indicating that execution of task T1 has not been interrupted; and a step of, when no processor in the "free state" is detected, interrupting execution of task T1, having the processor that was executing T1 begin executing task T2, and storing a flag having a second value indicating that execution of task T1 has been interrupted.

COPYRIGHT: (C)2004, JPO

Description

The present invention relates to a multiprocessor system including a plurality of processors that execute a plurality of tasks in parallel, and to a method of executing tasks in the multiprocessor system.

In recent years, multiprocessor systems have attracted attention as one approach to improving the performance of general-purpose computers through parallel processing. The predominant type is the shared-memory multiprocessor system, in which a plurality of processors are connected to a single bus and share a main storage device.

Such a multiprocessor system is usually built by mounting several processor chips on a printed circuit board, so communication and synchronization between processors over the bus are slow compared with the processing speed of each processor. Such systems are therefore used when the processing time of a task, the unit of processing, is sufficiently long compared with the inter-processor communication and synchronization time. Tasks of this size are called medium-grained to coarse-grained and execute on the order of several thousand instructions or more. Enlarging the unit of processing (coarsening the granularity) in this way makes the communication and synchronization time small relative to the task execution time.

Furthermore, semiconductor integration technology has advanced rapidly in recent years, so that many functional units and memories can now be placed on a single chip. For multiprocessor systems as well, it is expected that several processors will be mounted on one chip in the future. In that case the bus connecting the processors is also on the chip, and the resulting faster inter-processor communication and synchronization widen the range of task granularities that can be executed. In other words, parallel processing of fine-grained tasks, on the order of several tens to several hundreds of instructions, is becoming feasible. Parallel processing of such fine-grained tasks is expected to become mainstream, because object-oriented programming and programming in functional languages, both of which have attracted attention in recent years, are well suited to "parallel processing of fine-grained tasks".

On the other hand, a multiprocessor system must assign many tasks to a physically limited number of processors, so it must determine the order in which tasks execute and choose appropriately which task to assign to which processor. To do this dynamically, tasks waiting for execution are first stored in a task management device such as primary storage; a free processor is then detected, and if one exists, a task is selected from the waiting tasks and assigned to that free processor. The selection is made with a goal such as minimizing the execution time of the job as a whole. This process of determining the execution order of tasks and deciding which processor each task is assigned to is called scheduling, and many algorithms exist that differ in how these decisions are made. In addition, when task generation produces a new task to be executed, that task must be registered in the task management device as a waiting task.
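For contrast with the invention, the conventional flow just described (register waiting tasks in a task management device, detect free processors, select a task, assign it) can be sketched as follows. This is a minimal illustration only: it assumes a FIFO ready queue as the task management device and first-in, first-out as the scheduling policy, and the class and method names are hypothetical.

```python
from collections import deque


class ConventionalScheduler:
    """Hypothetical model of the conventional scheme: a ready queue plus a scheduler."""

    def __init__(self, num_processors):
        self.ready_queue = deque()               # stands in for the task management device
        self.running = [None] * num_processors   # None means the processor is free

    def register(self, task):
        # Task registration overhead: a newly generated task waits in the queue.
        self.ready_queue.append(task)

    def schedule(self):
        # Scheduling overhead: find free processors and hand each one a waiting task.
        assignments = []
        for pid, current in enumerate(self.running):
            if current is None and self.ready_queue:
                task = self.ready_queue.popleft()  # FIFO selection (assumed policy)
                self.running[pid] = task
                assignments.append((pid, task))
        return assignments

    def finish(self, pid):
        # The processor becomes free again when its task ends.
        self.running[pid] = None


sched = ConventionalScheduler(3)
sched.running[0] = "T0"
sched.register("T4")
sched.register("T5")
print(sched.schedule())  # [(1, 'T4'), (2, 'T5')]
```

Every call to `register` and `schedule` is work performed in addition to the tasks themselves; this is the kind of per-task overhead discussed next.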

FIG. 12 illustrates the operation of a conventional processor allocation method in a multiprocessor system. In FIG. 12, processor 2 generates a task and registers it in the task management device as task 4, waiting for execution. When processor 0 detects that processor 1 is in the "free state", it selects one of the waiting tasks in the task management device according to the scheduling algorithm and assigns the selected task to processor 1. Thus processor 0 performs scheduling while processor 2 performs task registration.

The same holds when, as disclosed for example in Japanese Patent Application Laid-Open No. 63-208948, a free processor (processor 1 in FIG. 12) monitors a task ready queue (the task management device in FIG. 12) and automatically fetches and processes a waiting task: the processor in the "free state" is still performing scheduling.

In another method, disclosed for example in Japanese Patent Application Laid-Open No. 62-190548, a requesting processor that has requested a task monitors the state of that task on the requested processor, and when it detects that the requested processor has finished the task, it selects another task and assigns it to the requested processor, which has now become free. In this method the requesting processor performs the work of monitoring the state of the requested processor.

Although they differ in content, the scheduling, task registration, and requested-processor monitoring described above can all be regarded as overhead incurred before a task is assigned to a processor and executed, that is, overhead accompanying task processing. FIG. 13 is a time chart of task processing time and this overhead. As FIG. 13 shows, when tasks are medium- to coarse-grained, the overhead is small relative to the task processing time and can be neglected.

However, when a multiprocessor system that incurs such per-task overhead performs fine-grained parallel processing by speeding up inter-processor communication and synchronization, the overhead becomes large relative to the task processing time.

FIG. 14 is a time chart of task processing time and overhead for fine-grained tasks. As FIG. 14 shows, the overhead then becomes large relative to the task processing time; it can no longer be ignored, and the processing time of the job as a whole increases.
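The effect shown in FIGS. 13 and 14 can be made concrete with a small calculation: the share of processor time lost to overhead is overhead / (overhead + task time). The numbers below are illustrative assumptions, not figures from this document: task sizes chosen from the granularity ranges quoted earlier (several thousand instructions for coarse grain, tens to hundreds for fine grain) and a hypothetical fixed overhead of 200 instructions per task.

```python
def overhead_fraction(task_instructions, overhead_instructions):
    """Fraction of total processor time spent on per-task overhead."""
    total = task_instructions + overhead_instructions
    return overhead_instructions / total


OVERHEAD = 200  # hypothetical per-task cost of registration plus scheduling

for label, task_size in [("coarse", 5000), ("fine", 50)]:
    frac = overhead_fraction(task_size, OVERHEAD)
    print(f"{label:6s} task of {task_size:5d} instructions: {frac:.0%} overhead")
```

With these assumed values the same overhead costs about 4% of the time for the coarse task but 80% for the fine one, which is the situation FIG. 14 depicts.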

In view of the above problems, an object of the present invention is to provide, for a multiprocessor with fast inter-processor communication and synchronization that performs fine-grained parallel processing, a method that eliminates the above overhead by dispensing with task management, scheduling, and task-state monitoring, and that instead assigns tasks to processors dynamically in a unified, simple, and fast manner.

A method of the present invention executes tasks in a multiprocessor system including a plurality of processors, each having a "free state" and an "execution state". The method includes: a step of detecting, when a first processor among the plurality of processors that is executing a first task generates a new second task, whether any of the plurality of processors is a second processor in the "free state"; a step of, when a second processor in the "free state" is detected, assigning the second task to the second processor so that the second processor begins executing it, changing the state of the second processor from the "free state" to the "execution state", and storing a flag having a first value indicating that execution of the first task has not been interrupted; and a step of, when no second processor in the "free state" is detected, interrupting execution of the first task by the first processor, having the first processor begin executing the second task, and storing a flag having a second value indicating that execution of the first task has been interrupted. The above object is thereby achieved.

The method may further include: a step of determining, after execution of the second task has finished, whether the flag has the first value or the second value; a step of changing the state of the second processor from the "execution state" to the "free state" when the flag is determined to have the first value; and a step of having the first processor resume execution of the first task from the point at which it was interrupted when the flag is determined to have the second value.
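The spawn and completion steps of this flag-based method can be sketched as a state-only model. This is not the patent's implementation: a Python set stands in for the record of "free state" processors, no code actually runs in parallel, and the names `FlagMethodModel`, `spawn`, `complete`, `NOT_INTERRUPTED`, and `INTERRUPTED` are hypothetical. The model also assumes that the lowest-numbered free processor has the highest priority.

```python
NOT_INTERRUPTED = 0  # first flag value: the generating task keeps running
INTERRUPTED = 1      # second flag value: the generating task was suspended


class FlagMethodModel:
    """Hypothetical state-only model of the first method (no real parallel execution)."""

    def __init__(self, num_processors, busy=()):
        self.free = set(range(num_processors)) - set(busy)      # processors in the "free state"
        self.current = {p: None for p in range(num_processors)}  # task running on each processor
        self.suspended = {p: [] for p in range(num_processors)}  # interrupted tasks per processor

    def spawn(self, parent_pid, new_task):
        """Processor parent_pid generates new_task; returns (processor, flag)."""
        if self.free:
            pid = min(self.free)       # assumed: lowest number = highest priority
            self.free.remove(pid)      # "free state" -> "execution state"
            flag = NOT_INTERRUPTED
        else:
            pid = parent_pid           # no free processor: interrupt the parent
            self.suspended[pid].append(self.current[pid])
            flag = INTERRUPTED
        self.current[pid] = new_task
        return pid, flag               # the caller stores the flag with the task

    def complete(self, pid, flag):
        """Handle the end of the task on pid, using the flag stored at spawn time."""
        if flag == NOT_INTERRUPTED:
            self.current[pid] = None
            self.free.add(pid)         # "execution state" -> "free state"
        else:
            self.current[pid] = self.suspended[pid].pop()  # resume the interrupted task


m = FlagMethodModel(2, busy=[0])
m.current[0] = "T1"
print(m.spawn(0, "T2"))  # (1, 0): T2 starts on free processor 1
print(m.spawn(0, "T3"))  # (0, 1): no free processor, so T1 is interrupted
m.complete(0, INTERRUPTED)      # T3 ends; processor 0 resumes T1
m.complete(1, NOT_INTERRUPTED)  # T2 ends; processor 1 becomes free again
```

Note that no queue of waiting tasks appears anywhere: a new task either starts at once on a free processor or starts at once on the generating processor.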

Each of the plurality of processors may have an identifier that distinguishes it from the other processors, and the detection of a second processor in the "free state" may be performed using these identifiers.

Each of the plurality of processors may have a priority that determines the order in which tasks are assigned to it, and the assignment of the second task to the second processor may be performed on the basis of these priorities.

Another method of the present invention executes tasks, each having a "stopped state", a "first execution state", and a "second execution state", in a multiprocessor system including a plurality of processors, each having a "free state" and an "execution state". The method includes: a step of detecting, when a first processor among the plurality of processors that is executing a first task generates a new second task, whether any of the plurality of processors is a second processor in the "free state"; a step of, when a second processor in the "free state" is detected, assigning the second task to the second processor so that it begins executing the second task, changing the state of the second processor from the "free state" to the "execution state", and changing the state of the second task from the "stopped state" to the "first execution state"; and a step of, when no second processor in the "free state" is detected, interrupting execution of the first task by the first processor, having the first processor begin executing the second task, and changing the state of the second task from the "stopped state" to the "second execution state". The above object is thereby achieved.

The method may further include: a step of determining the state of the second task after its execution has finished; a step of, when the second task is determined to be in the "first execution state", changing the state of the second processor from the "execution state" to the "free state" and changing the state of the second task from the "first execution state" to the "stopped state"; and a step of, when the second task is determined to be in the "second execution state", changing the state of the second task from the "second execution state" to the "stopped state".
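In this second method the information carried by the flag is held instead as the state of the generated task. A minimal sketch of the state transitions follows, under stated assumptions: the names `TaskState`, `SpawnedTask`, `start`, and `finish` are hypothetical, the set `free` stands in for the record of "free state" processors, the lowest-numbered free processor is taken to have the highest priority, and resumption of the interrupted task is not modeled.

```python
from enum import Enum


class TaskState(Enum):
    STOPPED = "stopped"
    FIRST_EXECUTION = "first execution"    # running on a processor that was free
    SECOND_EXECUTION = "second execution"  # running on the generating processor


class SpawnedTask:
    def __init__(self, name):
        self.name = name
        self.state = TaskState.STOPPED


def start(task, parent_pid, free):
    """Start a newly generated task; `free` is the set of free processor ids."""
    if free:
        pid = min(free)          # assumed priority: lowest id first
        free.remove(pid)         # processor: "free state" -> "execution state"
        task.state = TaskState.FIRST_EXECUTION
    else:
        pid = parent_pid         # the generating task is interrupted on this processor
        task.state = TaskState.SECOND_EXECUTION
    return pid


def finish(task, pid, free):
    """Handle completion of `task` on processor `pid`."""
    if task.state is TaskState.FIRST_EXECUTION:
        free.add(pid)            # processor: "execution state" -> "free state"
    # In SECOND_EXECUTION the interrupted task can continue on pid (not modeled here).
    task.state = TaskState.STOPPED


free = {1}
t2, t3 = SpawnedTask("T2"), SpawnedTask("T3")
p2 = start(t2, parent_pid=0, free=free)  # T2 goes to free processor 1
p3 = start(t3, parent_pid=0, free=free)  # no free processor: T3 runs on processor 0
finish(t3, p3, free)
finish(t2, p2, free)
print(p2, p3, sorted(free))  # 1 0 [1]
```

Because the task records how it was started, the completion step needs no separate flag: the task's own state decides whether a processor is released.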

A multiprocessor system of the present invention includes a plurality of processors that execute a plurality of tasks in parallel, and state management means that manages the states of the plurality of processors and, in response to an inquiry from any of them, returns the identifier of a processor in the "free state". Each of the plurality of processors, when a new task is generated, inquires of the state management means whether any processor is in the "free state". The above object is thereby achieved.

The state management means may include means for changing the current state to a next state in response to an inquiry from a processor, and means for outputting a response to the inquiry on the basis of the next state.

The multiprocessor system may further include an instruction cache memory and a data cache memory for each of the plurality of processors.

The multiprocessor system may further include a network for transferring instruction addresses and packet addresses among the plurality of processors.

Each of the plurality of tasks may be fine-grained.

According to the present invention, when a processor generates a new task, execution of that task can be started immediately, either by another processor or by the generating processor itself. This makes both a mechanism for holding tasks and a mechanism for scheduling their execution order unnecessary. It also eliminates the process of selecting a waiting task and assigning it to a processor in the "free state".

As a result, the time required for processor assignment is small compared with the task processing time, which makes it possible to speed up fine-grained parallel processing in a multiprocessor system.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the configuration of a multiprocessor system 1 of the present invention. The multiprocessor system 1 is implemented on an integrated circuit and is connected to a main storage device 2 via a bus.

The multiprocessor system 1 includes element processor units 10 to 12, all of which have the same configuration. The number of element processor units is not limited to three; the multiprocessor system 1 may include any number of them.

Element processor units 10 to 12 have processors 30 to 32, instruction caches (IC) 33 to 35, and data caches (DC) 36 to 38, respectively. An instruction cache (IC) is a read-only cache memory that stores instructions. A data cache (DC) is a cache memory that stores data and can be both read and written.

A shared cache 20 is shared by element processor units 10 to 12. Instruction sets and data sets are normally stored in main storage device 2, and data sets are loaded into shared cache 20 through a bus interface 23 as needed. Shared cache 20 preferably operates much faster than main storage device 2. The data cache (DC) and shared cache 20 are selected according to the address. For example, an address in the range 0x00000000 to 0x7fffffff accesses the data cache (DC), and an address in the range 0x80000000 to 0xffffffff accesses shared cache 20.
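With the example address split above, choosing between a processor's own data cache (DC) and shared cache 20 amounts to testing the most significant bit of a 32-bit address. A minimal sketch, assuming 32-bit unsigned addresses; the function name `select_cache` is hypothetical.

```python
def select_cache(address):
    """Return "DC" or "shared" for a 32-bit data address (example split from the text)."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned address")
    # 0x00000000-0x7fffffff -> per-processor data cache (DC);
    # 0x80000000-0xffffffff -> shared cache 20.
    # The boundary coincides with the most significant address bit (bit 31).
    return "shared" if (address >> 31) & 1 else "DC"


for addr in (0x00001000, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFC):
    print(hex(addr), select_cache(addr))
```

Splitting on a single address bit means the selection needs no comparator, only one wire of the address bus.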

Element processor units 10 to 12 are interconnected via a network 21, which is used to transfer instruction addresses and packet addresses among them. Network 21 can be implemented, for example, with a 3×3 crossbar switch.

A processor state management device 22 manages the states of processors 30 to 32. Each of processors 30 to 32 is in either the "execution state" or the "free state".

A fixed priority is assigned in advance to each of processors 30 to 32. Here, processors 30 to 32 are assumed to have priorities in descending order, with processor 30 the highest. When several processors access processor state management device 22 at the same time, the priority determines which of them is granted access first.

Each of processors 30 to 32 has an identifier (ID) that distinguishes it from the others. Typically, the identifier is a number.

Each of processors 30 to 32 holds the address of a packet internally, for example in an internal register (not shown), which allows it to refer to the packet. Packets are described in detail later with reference to FIG. 6.

The multiprocessor system 1 can execute a plurality of tasks in parallel. For example, processor 31 can execute task T2 while processor 30 is executing task T1.

In this specification, a "task" is defined as a pair consisting of an instruction set and a data set, both of which are stored in main storage device 2. Each of processors 30 to 32 reads instructions one by one from the instruction set and interprets and executes them, referring to the data set as needed. A packet, described later, is at least part of a data set.

FIG. 2 schematically shows the concept of a task. In this example, task 1 is defined by the pair of instruction set 1 and data set 1, task 2 by the pair of instruction set 1 and data set 2, and task 3 by the pair of instruction set 2 and data set 3. Instruction sets 1 and 2 and data sets 1 to 3 are all stored in main storage device 2.
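The definition of a task as a pair, and the sharing of instruction set 1 by tasks 1 and 2 in FIG. 2, can be illustrated with a small data type. Representing each set by a name string is an assumption made for brevity; in the system the sets are regions of main storage device 2, and the type name `Task` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    instruction_set: str  # names an instruction set held in main storage device 2
    data_set: str         # names a data set held in main storage device 2


# The three tasks of FIG. 2: tasks 1 and 2 share instruction set 1.
task1 = Task("instruction set 1", "data set 1")
task2 = Task("instruction set 1", "data set 2")
task3 = Task("instruction set 2", "data set 3")

print(task1.instruction_set == task2.instruction_set, task1 == task2)  # True False
```

Two tasks that run the same code on different data are therefore distinct tasks, which is what lets the same instruction set execute on several processors at once.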

FIG. 3 shows an example configuration of processor state management device 22, which manages the states of processors 30 to 32. It includes a combinational circuit that produces outputs (ID0 to ID2, NMP0 to NMP2) in response to inputs (REQ0 to REQ2, RESET0 to RESET2). The circuit determines the next state (NextS) from the current state (S) and the inputs, and provides the outputs corresponding to that next state. The transition from the current state S to the next state NextS is determined, for example, according to the state transition table shown in Table 1.

In FIG. 3, S denotes the current state and NextS the next state. Each encodes the states of processors 30 to 32, one bit per processor. For example, S = 001 indicates that processor 30 is in the "execution state" and processors 31 and 32 are in the "free state". NextS is encoded in the same way.

In FIG. 3, REQ0 to REQ2 are requests input from processors 30 to 32, respectively, to processor state management device 22. Each asks the device for the identifier of a processor in the "free state". In Table 1, REQ0 to REQ2 are written together as REQ; for example, REQ = 101 indicates that REQ0 is 1 (asserted), REQ1 is 0 (negated), and REQ2 is 1 (asserted).

In FIG. 3, RESET0 to RESET2 are resets input from processors 30 to 32, respectively, to processor state management device 22. Each asks the device to change the state of the corresponding processor, as recorded in the device, from the "execution state" to the "free state". In Table 1, RESET0 to RESET2 are written together as RESET; for example, RESET = 010 indicates that RESET0 is 0 (negated), RESET1 is 1 (asserted), and RESET2 is 0 (negated).

In FIG. 3, ID0 to ID2 are signals, output by processor state management device 22 in response to requests from processors 30 to 32, respectively, that report the identifier of a processor in the "free state". Their values have the following meanings:

00: Processor 30 is in the "free state".

01: Processor 31 is in the "free state".

10: Processor 32 is in the "free state".

In FIG. 3, NMP0 to NMP2 are signals, output by processor state management device 22 in response to requests from processors 30 to 32, respectively, that report that no processor is in the "free state". Their values have the following meanings:

0: A processor in the "free state" exists; its identifier is given by the value of ID0 to ID2.

1: No processor in the "free state" exists. In this case the value of ID0 to ID2 is don't-care.
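The contents of Table 1 are not given in the text above, so the following is only a hypothetical reconstruction of the behavior such a table would encode, derived from the rules stated here: bit i of S, REQ, and RESET corresponds to processor 30+i (so S = 001 means only processor 30 is executing); requests are served in priority order with processor 30 highest; each requester receives the highest-priority free processor as a 2-bit ID; and NMP is set when none remains. How the circuit orders a RESET and a REQ arriving in the same cycle is not stated, so the sketch assumes resets are applied first. The function name is hypothetical.

```python
N = 3  # processors 30, 31, 32 -> bit positions 0, 1, 2


def state_manager_step(s, req, reset):
    """Return (next_s, ids, nmps) for 3-bit vectors s, req, reset (bit i = processor 30+i).

    Hypothetical reconstruction of the transition table; not Table 1 itself.
    """
    state = s & ~reset & 0b111            # assumption: RESETs free processors first
    ids = [None] * N                      # ID0..ID2 (None = don't-care or no request)
    nmps = [0] * N                        # NMP0..NMP2
    for requester in range(N):            # processor 30 (bit 0) has the highest priority
        if not (req >> requester) & 1:
            continue
        free = [p for p in range(N) if not (state >> p) & 1]
        if free:
            granted = free[0]             # highest-priority free processor
            state |= 1 << granted         # mark it as being in the "execution state"
            ids[requester] = granted      # 0b00, 0b01, or 0b10 as listed above
        else:
            nmps[requester] = 1           # "no free processor exists"
    return state, ids, nmps


# Only processor 30 is executing (S = 001) and it requests (REQ = 001):
print(state_manager_step(0b001, 0b001, 0b000))  # (3, [1, None, None], [0, 0, 0])
```

In this reading, next_s is computed first and the ID and NMP outputs follow from it, which matches the description of the state management means as a transition step followed by an output based on the next state.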

The functions and operation of processor state management device 22 are described below with reference to FIGS. 4 and 5. Processor state management device 22 manages the states of all processors in the multiprocessor system. Specifically, it holds, for each processor, a pair consisting of the processor's identifier and its state. The identifier distinguishes the processors from one another and is typically an integer. The state is either the "execution state" or the "free state".

In response to a request from a processor, processor state management device 22 determines whether any processor is in the "free state". If one is, the device returns that processor's identifier to the requesting processor; if none is, it returns a message indicating that no free processor exists.

When several processors are in the "free state", processor state management device 22 returns to the requesting processor the identifier of the free processor with the highest priority. When requests from several processors reach the device at the same time, they are processed as described above in descending order of the requesting processors' priorities.

FIGS. 4(a) and 4(b) show one example of the operation of processor state management device 22, which here manages the states of four processors 0 to 3. In the example of FIG. 4(a), processors 0 and 1 are in the "execution state" and processors 2 and 3 are in the "free state". Requests from processor 0 and processor 1 are input to processor state management device 22.

In response to the request from processor 0, processor state management device 22 returns the identifier of free processor 2 to processor 0, and in response to the request from processor 1, it returns the identifier of free processor 3 to processor 1 (see FIG. 4(b)). The identifiers of free processors are returned according to processor priority. The device also changes the recorded states of processors 2 and 3 from the "free state" to the "execution state".

FIGS. 5A and 5B show another example of the operation of the processor state management device 22, which again manages the states of four processors 0 to 3. In the example shown in FIG. 5A, processors 0, 1 and 2 are in the “execution state”, and processor 3 is in the “free state”. A request from processor 0 and a request from processor 1 are input to the processor state management device 22.

In response to the request from processor 0, the processor state management device 22 returns the identifier of the free processor 3 to processor 0. In response to the request from processor 1, it returns a message to processor 1 indicating that “no free processor exists” (see FIG. 5B). This message is represented, for example, by the value of a return code output from the processor state management device 22. Identifiers of free processors are returned according to processor priority. The processor state management device 22 also changes the state it holds for processor 3 from the “free state” to the “execution state”.
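The arbitration that the processor state management device 22 performs in FIGS. 4 and 5 can be modeled in software as follows. This is a minimal Python sketch, not the patent's hardware. The function name `arbitrate`, the list-of-strings state table, the rule that a lower index means higher priority, and the use of `None` for the “no free processor” return code are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

IDLE, RUN = "IDLE", "RUN"

def arbitrate(states: List[str], requesters: List[int]) -> Dict[int, Optional[int]]:
    """Grant free processors to simultaneous requests.

    states[i] is the state of processor i and is updated in place.
    Assumption: a lower index means a higher priority, both for requesters
    and for free processors. None stands in for the "no free processor" code.
    """
    grants: Dict[int, Optional[int]] = {}
    for req in sorted(requesters):                 # serve higher-priority requesters first
        free = next((i for i, s in enumerate(states) if s == IDLE), None)
        if free is not None:
            states[free] = RUN                     # free state -> execution state
        grants[req] = free
    return grants

# FIG. 4: processors 0 and 1 are running, 2 and 3 are free
fig4 = [RUN, RUN, IDLE, IDLE]
print(arbitrate(fig4, [0, 1]))   # {0: 2, 1: 3}

# FIG. 5: only processor 3 is free
fig5 = [RUN, RUN, RUN, IDLE]
print(arbitrate(fig5, [1, 0]))   # {0: 3, 1: None}
```

Serving requesters in sorted order models the device's rule that simultaneous requests are processed in descending order of priority. After each call, every granted processor appears as `RUN` in the state table.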

In the examples shown in FIGS. 4 and 5, the processor state management device 22 manages four processors. This number is chosen only for convenience of explanation. The present invention is not limited to a multiprocessor system with four processors and can be applied to a multiprocessor system with any number of processors.

FIG. 6 shows the configuration of a packet 50. The packet 50 contains five areas:
a lock bit area 51, which stores a lock bit;
a return bit area 52, which stores a return bit;
a return address area 53, which stores a return address;
an argument area 54, which stores arguments; and
a return value area 55, which stores a return value.
One packet 50 is allocated in the shared memory 20 for each task and is owned by that task. Hereinafter, a “packet owned by a task” is simply called the “task's packet”. The packet 50 is used to pass data between tasks and to hold information about the task.

The lock bit is stored in the lock bit area 51 of the packet 50. It indicates whether other tasks are prohibited from accessing the task that owns the packet 50 while that task is running. A lock bit of “1” means access is prohibited, and a lock bit of “0” means access is not prohibited.

The return bit is stored in the return bit area 52 of the packet 50. It indicates whether another task was interrupted before the task that owns the packet 50 was executed.

A return bit of “0” means that no other task was interrupted. This corresponds to the case where the task that owns the packet 50 was assigned to a processor in the “free state”.

A return bit of “1” means that another task was interrupted. This corresponds to the case where no processor was in the “free state”, so the processor executing a task interrupted that task and executed the task that owns the packet 50 instead.

The return address is stored in the return address area 53 of the packet 50. It is referenced only when the return bit is “1”, and it gives the address at which execution of the interrupted task resumes.

The argument area 54 of the packet 50 stores the arguments passed to the task that owns the packet 50.

The return value area 55 of the packet 50 stores the return value, which is the execution result of the task that owns the packet 50.
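As a rough software analogy, the five areas of the packet 50 can be pictured as the fields of a record, shown below in Python. The class name `TaskPacket`, the field names and the helper `finished` are hypothetical. The patent does not specify field widths or memory layout, and this sketch does not model them.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskPacket:
    """Hypothetical software model of the packet 50 (layout and widths not modeled)."""
    lock_bit: int = 0                     # area 51: 1 while the owning task runs (access prohibited)
    return_bit: int = 0                   # area 52: 1 if another task was interrupted to run this one
    return_address: Optional[int] = None  # area 53: used only when return_bit == 1
    args: List[int] = field(default_factory=list)  # area 54: arguments to the owning task
    return_value: Optional[int] = None    # area 55: result of the owning task

    def finished(self) -> bool:
        # Once a task has been started, a cleared lock bit means its result may be read.
        return self.lock_bit == 0

pk = TaskPacket(args=[0, 4])
pk.lock_bit = 1                          # task started
pk.return_value, pk.lock_bit = 10, 0     # task stores its result, then clears the lock bit
print(pk.finished(), pk.return_value)    # True 10
```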

FIG. 7 shows the procedure by which the processors 30 to 32 interpret and execute a fork instruction. Each of the processors 30 to 32 reads instructions stored in the main storage device 2. When the instruction read is a fork instruction, the processor executes the processing shown in FIG. 7.

The following describes, step by step with reference to FIG. 7, how the processor 30 interprets and executes the fork instruction. The procedure is the same for the processors 31 and 32. The fork instruction takes two operands:
the start address of the instruction sequence describing the processing of the new task (hereinafter simply the “instruction address”); and
the address of the new task's packet 50 (hereinafter simply the “packet address”).

Step (a): The processor 30 asks the processor state management device 22 whether any processor is in the “free state”. It does this, for example, by sending a request (REQ0 = 1) to the processor state management device 22, which then determines whether a processor in the “free state” exists.

If a processor in the “free state” exists, the processor state management device 22 returns that processor's identifier to the processor 30. The processor 30 obtains the identifier, for example, by reading the value of ID0 output from the processor state management device 22. If several processors are in the “free state”, the identifier of the highest-priority one is returned. If several processors interpret and execute fork instructions at the same time, the instructions are handled in descending order of processor priority. In this way, the processor 30 obtains the identifier of a processor in the “free state”.

If no processor is in the “free state”, the processor state management device 22 returns a message to the processor 30 indicating that “no free processor exists”. The processor 30 obtains this message, for example, by reading the value of NMP0 output from the processor state management device 22.

Step (b): If a processor in the “free state” exists, the processor 30 performs steps (c) to (e). Otherwise, it performs steps (f) and (g).

Step (c): Assume here that the processor in the “free state” is the processor 31. The processor 30 transfers the instruction address and the packet address of the task, both given as operands of the fork instruction, to the processor 31 via the network 21.

Step (d): The processor 30 locates the packet 50 specified by the packet address operand of the fork instruction. It writes “1” into that packet's lock bit area 51 and “0” into its return bit area 52. The processor 30 then completes the fork instruction and proceeds to the next instruction.

Step (e): The processor 31 receives the instruction address and the packet address of the task from the processor 30 via the network 21. It then starts processing at the instruction specified by the received instruction address, referring to the packet 50 specified by the received packet address.

After steps (a) to (e), the processors 30 and 31 each execute different processing independently, so parallel processing by the two processors begins. The processing of the fork instruction ends here.

Step (f): The processor 30 locates the packet 50 specified by the packet address operand of the fork instruction. It writes “1” into that packet's lock bit area 51 and “1” into its return bit area 52. It also writes the address of the instruction following the fork instruction into the return address area 53. The processor 30 then suspends the task it is executing.

Step (g): The processor 30 starts processing at the instruction specified by the instruction address operand of the fork instruction, referring to the packet 50 specified by the packet address operand. The processing of the fork instruction ends here.
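Steps (a) to (g) can be summarized as the behavioral sketch in Python below. It rests on several assumptions:
addresses are plain integers;
the processor state management device is reduced to a list of busy flags scanned in index order (lower index = higher priority);
the network transfer of step (c) is represented simply by returning the target processor; and
the names `Packet`, `request_free` and `fork` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Packet:
    lock_bit: int = 0
    return_bit: int = 0
    return_address: Optional[int] = None

def request_free(busy: List[bool]) -> Optional[int]:
    """Step (a): stand-in for the processor state management device 22.

    Scans in index order (assumed: lower index = higher priority), marks the
    first free processor busy and returns its id, or None if none is free.
    """
    for pid, is_busy in enumerate(busy):
        if not is_busy:
            busy[pid] = True
            return pid
    return None

def fork(busy: List[bool], me: int, instr_addr: int, packet: Packet,
         next_pc: int) -> Tuple[int, int]:
    """Returns (processor that starts the new task, address where `me` continues)."""
    target = request_free(busy)                      # step (a)
    if target is not None:                           # step (b): a free processor exists
        # step (c): instr_addr and the packet address would go to `target` over network 21
        packet.lock_bit, packet.return_bit = 1, 0    # step (d)
        return target, next_pc                       # `target` starts at instr_addr (step (e))
    packet.lock_bit, packet.return_bit = 1, 1        # step (f)
    packet.return_address = next_pc
    return me, instr_addr                            # step (g): `me` suspends its task and jumps

busy = [True, False, False]   # processor 0 is running; 1 and 2 are free
p = Packet()
print(fork(busy, 0, 100, p, 20))   # (1, 20)
```

If all three flags are `True`, the same call instead returns `(0, 100)` and leaves `return_bit = 1` and `return_address = 20` in the packet, matching steps (f) and (g).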

The following describes, step by step with reference to FIG. 8, how the processor 30 interprets and executes an unlock instruction. The procedure is the same for the processors 31 and 32.

Step (h): The processor 30 checks whether the value in the return bit area 52 of the packet 50 owned by the task being executed is “0”.
A value of “0” means that the processor 30 did not interrupt any task, and the processor 30 performs step (i).
A value of “1” means that the processor 30 interrupted a task, and the processor 30 performs step (j).

Step (i): The processor 30 writes “0” into the lock bit area 51 of the packet 50 owned by the task being executed and sets its own state to the “free state”. Once in the “free state”, the processor 30 performs no further processing. The processing of the unlock instruction ends here.

Step (j): The processor 30 writes “0” into the lock bit area 51 of the packet 50 owned by the task being executed. It then resumes the interrupted task by executing instructions from the address stored in the return address area 53. The processing of the unlock instruction ends here.
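Under the same simplifying assumptions as a behavioral model (busy flags instead of the state management device, integer addresses, hypothetical names `Packet` and `unlock`), steps (h) to (j) reduce to the following Python sketch:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Packet:
    lock_bit: int = 1                     # set to 1 by fork/exec before the task started
    return_bit: int = 0
    return_address: Optional[int] = None

def unlock(packet: Packet, me: int, busy: List[bool]) -> Optional[int]:
    """Execute the unlock instruction on processor `me`.

    Returns the address at which `me` resumes an interrupted task,
    or None if `me` has entered the free state.
    """
    if packet.return_bit == 0:            # step (h): no task was interrupted
        packet.lock_bit = 0               # step (i): release the packet...
        busy[me] = False                  # ...and become a free processor
        return None
    packet.lock_bit = 0                   # step (j): release the packet...
    return packet.return_address          # ...and jump back to the interrupted task

busy = [True, True]
print(unlock(Packet(return_bit=0), 1, busy), busy)                     # None [True, False]
print(unlock(Packet(return_bit=1, return_address=20), 0, busy), busy)  # 20 [True, False]
```

Both branches clear the lock bit, because it is the lock bit that tells the task waiting on this packet that the result can be read. Only the return bit decides whether the processor goes idle or resumes an interrupted task.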

Table 2 shows how the state of the multiprocessor system changes in response to the interpretation and execution of fork and unlock instructions. The example in Table 2 assumes a multiprocessor system with two processors, P1 and P2.

As shown in FIG. 9, the state of the multiprocessor system is divided into processor states and task states.

A processor has two states: the “free state (IDLE)” and the “execution state (RUN)”. These are the same states that the processor state management device 22 manages. A processor in the “execution state (RUN)” is executing some task.

A task has three states:
the “stop state (STOP)”, in which the task is either waiting to be executed or has finished executing;
the “first execution state (EX1)”, in which the task is being executed without any other task having been interrupted; and
the “second execution state (EX2)”, in which the task is being executed after the execution of another task was interrupted.
When a processor is in the “execution state (RUN)”, the task it is executing is in either EX1 or EX2.

Referring again to Table 2, this section describes how the state of the multiprocessor system changes. When an event occurs, the system moves to the next state determined by that event and the current state. The notation “Px.fork” denotes the event “processor Px executed a fork instruction”, and “Px.unlock” denotes the event “processor Px executed an unlock instruction”.

The first row of Table 2 covers the following current state:
processor P1 is in the “execution state” (executing task T1);
processor P2 is in the “free state”;
task T1 is in the “first execution state”; and
task T2 is in the “stop state”.
On the event “processor P1 executed a fork instruction”, processor P2 changes from the “free state” to the “execution state” (executing task T2), and task T2 changes from the “stop state” to the “first execution state”. This happens because the new task T2 is assigned to the free processor P2 at the moment it is generated.

The second row of Table 2 takes the next state of the first row as its current state. On the event “processor P2 executed an unlock instruction”, processor P2 changes from the “execution state” (executing task T2) to the “free state”, and task T2 changes from the “first execution state” to the “stop state”.

The third row of Table 2 covers the following current state:
processor P1 is in the “execution state” (executing task T1);
processor P2 is in the “execution state” (executing another task);
task T1 is in the “first execution state”; and
task T2 is in the “stop state”.
On the event “processor P1 executed a fork instruction”, processor P1 stays in the “execution state” but switches from task T1 to task T2, and task T2 changes from the “stop state” to the “second execution state”. This happens because no processor is in the “free state” when task T2 is generated, so processor P1 interrupts task T1 and starts task T2 itself.

The fourth row of Table 2 takes the next state of the third row as its current state. On the event “processor P1 executed an unlock instruction”, processor P1 stays in the “execution state” but switches from task T2 back to task T1, and task T2 changes from the “second execution state” to the “stop state”.
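The text describes only four rows of Table 2, so the Python sketch below encodes just those transitions. The state encoding (strings such as `"RUN:T1"`) and the names `SysState` and `step` are assumptions made for illustration. Any event or state outside those four rows raises an error rather than guessing at the rest of the table.

```python
from typing import NamedTuple

class SysState(NamedTuple):
    p1: str   # "IDLE" or "RUN:<task>"
    p2: str
    t1: str   # "STOP", "EX1" or "EX2"
    t2: str

def step(s: SysState, event: str) -> SysState:
    """Next state for the four rows of Table 2 described in the text."""
    if event == "P1.fork" and s.p1 == "RUN:T1" and s.t2 == "STOP":
        if s.p2 == "IDLE":                               # row 1: T2 goes to the free processor P2
            return s._replace(p2="RUN:T2", t2="EX1")
        return s._replace(p1="RUN:T2", t2="EX2")         # row 3: P1 suspends T1 and runs T2
    if event == "P2.unlock" and s.p2 == "RUN:T2" and s.t2 == "EX1":
        return s._replace(p2="IDLE", t2="STOP")          # row 2: P2 becomes free
    if event == "P1.unlock" and s.p1 == "RUN:T2" and s.t2 == "EX2":
        return s._replace(p1="RUN:T1", t2="STOP")        # row 4: P1 resumes T1
    raise ValueError(f"transition not covered by this sketch: {event} in {s}")

s0 = SysState("RUN:T1", "IDLE", "EX1", "STOP")
s1 = step(s0, "P1.fork")             # P2 now runs T2 in EX1
print(step(s1, "P2.unlock") == s0)   # True
```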

The following describes how the multiprocessor system 1 operates when it processes, in parallel, a program containing fork and unlock instructions.

FIG. 10 shows the procedure of a program that calculates the sum from 1 to 4 (1 + 2 + 3 + 4) using a binary tree. The program has two parts. main is the main program. sum is a subroutine that can be called recursively and executed in parallel; it takes two arguments, n and m, and calculates the sum of the integers from n + 1 to m. main calls sum with n = 0 and m = 4.
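If parallelism is ignored, the control flow of the sum subroutine in FIG. 10 matches the sequential Python sketch below. The function name `sum_range` is hypothetical. The two recursive calls mark the points where the program of FIG. 10 issues its fork and exec instructions.

```python
def sum_range(n: int, m: int) -> int:
    """Sum of the integers n+1 .. m by binary splitting (requires n < m)."""
    if n + 1 == m:                 # step (B) -> step (G): a single term remains
        return m
    k = (n + m) // 2               # step (C): k = (n + m) div 2
    s1 = sum_range(n, k)           # step (D): started with fork in FIG. 10
    s2 = sum_range(k, m)           # step (E): started with exec in FIG. 10
    return s1 + s2                 # step (F)

print(sum_range(0, 4))   # 10
```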

As the initial state, assume that the processor 30 is executing main and is therefore in the “execution state”, and that the processors 31 and 32 are in the “free state”.

The following describes in detail how the multiprocessor system 1 operates in each of the program steps (A) to (H).

Step (A): The processor 30 executes the sum subroutine with n = 0 and m = 4. Specifically, it allocates a packet 50 (Pk1) in the shared cache memory 20 and stores the values 0 and 4 in the argument area 54 of Pk1. Next, it executes an exec instruction whose operands are the start address of the sum instructions and the start address of Pk1. The exec instruction corresponds to steps (f) and (g) only of the fork procedure shown in FIG. 7. Like the fork instruction, it takes a task's instruction address and packet address as operands.

The processor 30 writes “1” into the lock bit area 51 and “1” into the return bit area 52 of Pk1, and stores the address of the instruction following the exec instruction in the return address area 53 (see step (f) in FIG. 7). It then starts executing the sum instructions while referring to Pk1 (see step (g) in FIG. 7).

Step (B): The processor 30 reads the arguments n and m from Pk1 and compares (n + 1) with m. If they are equal, processing proceeds to step (G); otherwise, it proceeds to step (C). When main first calls the sum subroutine, n = 0 and m = 4, so (n + 1) ≠ m and processing proceeds to step (C).

Step (C): The processor 30 calculates k = (n + m) div 2. Since n + m = 4, k = 2.

Step (D): The processor 30 executes the sum subroutine with the arguments n and k. Specifically, it allocates a packet 50 (Pk2) in the shared cache memory 20 and stores the values n (= 0) and k (= 2) in the argument area 54 of Pk2. Next, it executes a fork instruction whose operands are the start address of the sum instructions and the start address of Pk2.

The processors 31 and 32 are both in the “free state”.
Following the priority order, the processor 30 obtains the identifier of the free processor 31 (see step (a) in FIG. 7).
The processor 30 transfers the task's instruction address and packet address to the processor 31 (see step (c) in FIG. 7).
The processor 30 writes “1” into the lock bit area 51 and “0” into the return bit area 52 of Pk2 (see step (d) in FIG. 7).
The processor 31 starts executing the sum instructions while referring to Pk2 (see step (e) in FIG. 7).
The processors 30 and 31 now execute the sum subroutine in parallel.

Step (E): The processor 30 executes the sum subroutine with the arguments k and m. Specifically, it allocates a packet 50 (Pk3) in the shared cache memory 20 and stores the values k (= 2) and m (= 4) in the argument area 54 of Pk3. Before executing the exec instruction, the processor 30 saves Pk1 to the stack area. It then executes an exec instruction whose operands are the start address of the sum instructions and the start address of Pk3.

The processor 30 writes “1” into the lock bit area 51 and “1” into the return bit area 52 of Pk3, and stores the address of the instruction following the exec instruction in the return address area 53 (see step (f) in FIG. 7). It then starts executing the sum instructions while referring to Pk3 (see step (g) in FIG. 7).

Step (F): After completing the sum subroutine called in step (E), the processor 30 restores Pk1 from the stack area and then adds s1 and s2.
s1 is the result of the sum subroutine executed in step (D) and is stored in the return value area 55 of Pk2.
s2 is the result of the sum subroutine executed in step (E) and is stored in the return value area 55 of Pk3.
When the processor 30 finishes the sum subroutine called in step (E), the task that owns Pk2 may still be running. The processor 30 therefore reads the return value area 55 of Pk2 only after that task has finished, and uses the value as s1; here, s1 = 3. The processor 30 determines whether the task that owns Pk2 has finished by checking the lock bit area 51 of Pk2. A value of “0” there means the task has finished.

Similarly, after the task that owns Pk3 has finished, the processor 30 reads the value stored in the return value area 55 of Pk3 and uses it as s2; here, s2 = 7. The processor 30 then calculates s = s1 + s2 = 10.

Step (H): The processor 30 stores the value of s in the return value area 55 of Pk1 and then executes an unlock instruction.

The processor 30 checks whether the value stored in the return bit area 52 of Pk1 is “1” (see step (h) in FIG. 8). It is “1”, so the processor 30 stores “0” in the lock bit area 51 of Pk1 and executes instructions from the address stored in the return address area 53 (see step (j) in FIG. 8). Processing therefore resumes at the instruction following step (A) of main.

Step (G): If step (B) finds that n + 1 = m, processing proceeds to step (G), where the processor 30 assigns the value of the argument m to s. Processing then proceeds to step (H).

Steps (B) to (H) above are also executed within the sum subroutines called in steps (D) and (E), because sum is a recursively callable subroutine.

By calling the sum subroutine recursively in this way, the sum from 1 to 4 (1 + 2 + 3 + 4) is calculated in parallel. In this example, each invocation generates two tasks: one by the fork instruction in step (D) and one by the exec instruction in step (E). The two instructions differ in where the new task runs. The fork instruction assigns it to a processor in the “free state” whenever one exists. The exec instruction always assigns it to the processor that executes the exec instruction.

FIG. 11 schematically shows the processing described above. The task sum(0,4) uses the fork and exec instructions to generate two tasks, sum(0,2) and sum(2,4). The task sum(0,2) is assigned to the processor 31, and the task sum(2,4) is assigned to the processor 30. Each of these two tasks in turn generates two further tasks. As long as a processor in the “free state” exists, new tasks are assigned to other processors.

The task sum(2,4) generates the tasks sum(2,3) and sum(3,4), but both are assigned to the processor 30. This is because no processor remains in the “free state” by the time the task sum(2,3) is assigned.
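The interplay of fork, exec and the lock bit shown in FIG. 11 can be imitated with Python threads. This is a software analogy resting on several assumptions, not the patent's hardware:
each processor other than the one running main is an id in a free list;
a task forked to a free processor runs in a new `threading.Thread`;
a `threading.Event` stands in for the lock bit;
the Python call stack plays the role of the return address; and
the names `Machine`, `Packet` and `sum_task` are hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Packet:
    n: int
    m: int
    ret: int = 0
    # Event not set = lock bit "1" (task running); set = lock bit "0" (task finished).
    unlocked: threading.Event = field(default_factory=threading.Event)

Task = Callable[["Machine", Packet, int], None]

class Machine:
    """Processors are ids; a task forked to another processor runs in its own thread."""

    def __init__(self, num_processors: int) -> None:
        self._mutex = threading.Lock()
        self._free = list(range(1, num_processors))   # processor 0 runs main
        self.log: List[Tuple[str, int]] = []          # (task, processor id) in start order

    def record(self, pkt: Packet, pid: int) -> None:
        with self._mutex:
            self.log.append((f"sum({pkt.n},{pkt.m})", pid))

    def fork(self, task: Task, pkt: Packet, me: int) -> None:
        with self._mutex:                              # steps (a)-(b): ask for a free processor
            pid = min(self._free) if self._free else None   # lowest id = highest priority (assumed)
            if pid is not None:
                self._free.remove(pid)
        if pid is None:
            self.exec(task, pkt, me)                   # steps (f)-(g): run the new task here first
        else:
            threading.Thread(target=self._run_on, args=(task, pkt, pid)).start()

    def exec(self, task: Task, pkt: Packet, me: int) -> None:
        task(self, pkt, me)          # the call stack plays the role of the return address

    def _run_on(self, task: Task, pkt: Packet, pid: int) -> None:
        task(self, pkt, pid)
        with self._mutex:
            self._free.append(pid)   # unlock with return bit "0": processor becomes free

def sum_task(mc: Machine, pkt: Packet, me: int) -> None:
    mc.record(pkt, me)
    if pkt.n + 1 == pkt.m:                       # steps (B), (G)
        s = pkt.m
    else:
        k = (pkt.n + pkt.m) // 2                 # step (C)
        p2, p3 = Packet(pkt.n, k), Packet(k, pkt.m)
        mc.fork(sum_task, p2, me)                # step (D)
        mc.exec(sum_task, p3, me)                # step (E)
        p2.unlocked.wait()                       # step (F): wait for Pk2's lock bit to clear
        s = p2.ret + p3.ret
    pkt.ret = s                                  # step (H)
    pkt.unlocked.set()

mc = Machine(3)
root = Packet(0, 4)
mc.exec(sum_task, root, 0)
print(root.ret)   # 10
print(mc.log)     # processor assignment depends on thread timing
```

Because thread timing varies, the assignment recorded in `mc.log` may differ from FIG. 11 from run to run; only the result is deterministic. In the default CPython build, threads do not execute Python code truly in parallel. The sketch therefore illustrates the assignment policy, not a speedup.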

Each of the processors 30 to 32 in the multiprocessor system 1 of the present invention thus interprets and executes the fork instruction as follows. If an "idle" processor exists, it assigns the new task to that processor. If none exists, it interrupts the task it is currently executing and assigns the new task to itself. A newly generated task is therefore assigned, at the moment it is generated, either to an "idle" processor or to the processor that generated it, and it is executed immediately. This eliminates the mechanism for storing tasks awaiting processing and the mechanism for scheduling the execution order of tasks that conventional multiprocessor systems require. In addition, because a task is always assigned to an "idle" processor whenever one exists, processor utilization is high.
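The fork behaviour, together with the flag recited in the claims, can be sketched as follows. The Processor class, the per-processor stack of suspended tasks, and the encoding 0/1 for the first and second flag values are assumptions made for illustration. The sketch also assumes that a processor finishing a task resumes its most recently interrupted task, if any. Note that all bookkeeping is per processor: there is no global queue of pending tasks and no scheduler.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NOT_INTERRUPTED = 0  # hypothetical encoding of the claims' "first value"
INTERRUPTED = 1      # hypothetical encoding of the claims' "second value"


@dataclass
class Processor:
    pid: int
    current: Optional[str] = None  # running task; None means "idle"
    suspended: List[str] = field(default_factory=list)  # interrupted tasks

    @property
    def idle(self) -> bool:
        return self.current is None


def fork(issuer: Processor, procs: List[Processor], task: str) -> int:
    """Start `task` at once and return the flag value to store."""
    for p in procs:
        if p.idle:
            p.current = task  # "idle" -> "executing"
            return NOT_INTERRUPTED
    # No idle processor: interrupt the issuer's task and run the new one here.
    issuer.suspended.append(issuer.current)
    issuer.current = task
    return INTERRUPTED


def finish(p: Processor) -> None:
    """Assumed completion rule: resume an interrupted task, else go idle."""
    p.current = p.suspended.pop() if p.suspended else None


procs = [Processor(30, current="T1"), Processor(31), Processor(32)]
print(fork(procs[0], procs, "T2"))  # 0  (T2 starts on processor 31)
print(fork(procs[0], procs, "T3"))  # 0  (T3 starts on processor 32)
print(fork(procs[0], procs, "T4"))  # 1  (T1 interrupted; T4 runs on 30)
finish(procs[0])
print(procs[0].current)             # T1 (resumed)
```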

Furthermore, the fork and unlock instructions can be implemented with simple hardware, which also enables high-speed processing.

Therefore, in the multiprocessor system 1 implemented on an integrated circuit, the task execution method of the present invention is highly useful for parallel processing of programs whose task processing time is small compared with the time required for scheduling and for managing tasks awaiting execution, such as the illustrated program that computes the sum from 0 to 4.

When an interrupt arrives from outside the integrated circuit, the processor state management device 22 can be used to detect the "idle" processors, and the interrupt can be handled by the "idle" processor with the lowest priority. This reduces the performance degradation caused by interrupt handling.

The processor state management device 22 can also detect when all processors of the integrated circuit have become "idle". In that case, deadlock can be avoided by having one of the processors perform exception handling.
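The two uses of the processor state management device 22 just described can be approximated in software with an idle bitmask. This is a hypothetical stand-in for the hardware of FIG. 3. Bit i is set when processor i is idle, and priority is assumed to decrease as the index increases, so the lowest-priority idle processor corresponds to the highest set bit. The mapping of processors 30 to 32 onto bits 0 to 2 is likewise an assumption.

```python
def lowest_priority_idle(idle_mask: int) -> int:
    """Index of the lowest-priority idle processor, or -1 if none is idle.

    Hypothetical convention: bit i == 1 means processor i is idle, and a
    larger index means a lower priority.
    """
    if idle_mask == 0:
        return -1
    return idle_mask.bit_length() - 1  # position of the highest set bit


def all_idle(idle_mask: int, n_procs: int) -> bool:
    """True when every processor is idle, the deadlock condition above."""
    full = (1 << n_procs) - 1
    return (idle_mask & full) == full


mask = 0b101                       # processors 30 and 32 idle, 31 busy
print(lowest_priority_idle(mask))  # 2 -> processor 32 handles the interrupt
print(all_idle(0b111, 3))          # True -> perform exception handling
```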

As described above, according to the present invention, when a processor generates a new task, execution of that task can be started immediately by another processor or by the generating processor itself. This makes both a mechanism for holding tasks and a mechanism for scheduling the execution order of tasks unnecessary. It also eliminates the process of selecting a task waiting to be executed and assigning the selected task to an "idle" processor.

FIG. 1 is a diagram showing the configuration of the multiprocessor system 1 of the present invention.
FIG. 2 is a diagram schematically showing the concept of a task.
FIG. 3 is a diagram showing a configuration example of the processor state management device 22 in the multiprocessor system 1.
FIGS. 4(a) and 4(b) are diagrams illustrating an example of the operation of the processor state management device 22.
FIGS. 5(a) and 5(b) are diagrams illustrating another example of the operation of the processor state management device 22.
FIG. 6 is a diagram showing the structure of a packet 50.
FIG. 7 is a diagram showing the procedure by which the processors 30 to 32 interpret and execute the fork instruction.
FIG. 8 is a diagram showing the procedure by which the processors 30 to 32 interpret and execute the unlock instruction.
FIG. 9 is a diagram illustrating processor states and task states.
FIG. 10 is a diagram showing the procedure of a program that computes the sum of 1 to 4 based on a binary tree.
FIG. 11 is a diagram schematically showing the processing performed by the program shown in FIG. 10.
FIG. 12 is a diagram illustrating the operation of a conventional processor allocation method.
FIG. 13 is a time chart showing task processing time and overhead processing time when tasks are of medium to coarse granularity.
FIG. 14 is a time chart showing task processing time and overhead processing time when tasks are of fine granularity.

Explanation of reference numerals

1 Multiprocessor system
2 Main storage device
10-12 Element processor units
20 Shared cache
21 Network
22 Processor state management device
23 Bus interface
30-32 Processors
33-35 Instruction caches (IC)
36-38 Data caches (DC)

Claims

A method for executing a task in a multiprocessor system including a plurality of processors, each having an "idle state" and an "execution state", the method comprising:
a step of detecting, when a first processor among the plurality of processors that is executing a first task generates a new second task, whether there exists a second processor in the "idle state" to which execution of the second task can be requested;
a step of, when the detection finds a second processor in the "idle state", assigning the second task to the second processor so that the second processor starts executing the second task, changing the state of the second processor from the "idle state" to the "execution state", and storing a flag having a first value indicating that execution of the first task has not been interrupted; and
a step of, when the detection finds no second processor in the "idle state", interrupting execution of the first task by the first processor, starting execution of the second task by the first processor, and storing a flag having a second value indicating that execution of the first task has been interrupted.