JP4599438B2

JP4599438B2 - Pipeline processing apparatus, pipeline processing method, and pipeline control program

Info

Publication number: JP4599438B2
Application number: JP2008198078A
Authority: JP
Inventors: 史郎中瀬
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2008-07-31
Filing date: 2008-07-31
Publication date: 2010-12-15
Anticipated expiration: 2028-07-31
Also published as: JP2010039511A

Description

The present invention relates to a pipeline processing apparatus and the like that performs pipeline processing using a multiprocessor. More particularly, it relates to a pipeline processing apparatus and the like that can reduce the total amount of data transferred between processor elements and lighten the burden of application development, while retaining the characteristics of pipeline processing that make it suitable for real-time requirements.

Conventionally, some applications, such as moving-image playback, require a fixed amount of data to be processed in real time within a fixed period.

One way to meet such real-time requirements is an embedded system that incorporates dedicated hardware such as a DSP (digital signal processor). In recent years, advances in processor technology have led to a growing number of embedded systems that execute programs in parallel on a multiprocessor.

For example, in one system configuration each processor element (hereinafter also "PE") has a relatively small local memory, and processing is performed after the execution code and the data to be processed (hereinafter also "target data") have been loaded into that local memory. In such a system, a PE does not need to access external resources such as a shared memory while it processes data already in its local memory. Resource contention with other processors therefore rarely occurs, which makes this configuration well suited to meeting real-time requirements.

When a multiprocessor is used, however, scheduling becomes a problem: in what way should programs be allocated to the processors so that the real-time requirement is met? One solution is to pipeline the software (hereinafter simply "pipeline processing").

In conventional pipeline processing, the application software is divided in advance into stages, such as separate programs or functional modules, and each stage is assigned (loaded) to one PE. In many cases the number of program divisions is made equal to the number of available PEs.

In a multiprocessor performing pipeline processing, when target data is input, one processing unit's worth of data, whose size is determined, for example, by the free space in local memory, is loaded into PE1, the front-most stage. When PE1 finishes processing, the resulting data is transferred to PE2, the next stage. As PE2 starts the second-stage processing, new data is loaded into PE1. The first data unit is then transferred to PE3, the final stage of the pipeline, and processing reaches a steady state in which PE1 to PE3 all process data simultaneously. This is how the desired performance is achieved.
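The fill-then-steady-state behavior of the conventional data pipeline can be illustrated with a small schedule model. This is a minimal sketch assuming every stage takes exactly one time slot and that the k-th PE runs the k-th stage; the function name `data_pipeline_schedule` is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def data_pipeline_schedule(num_stages: int, num_units: int) -> list[list[Optional[int]]]:
    """Return, per time slot, the data unit each PE is processing (None = idle).

    Assumes one time slot per stage; PE index 0 is the front-most stage (PE1).
    In slot t, the PE at index k works on data unit t - k + 1 (1-based).
    """
    schedule = []
    for t in range(num_units + num_stages - 1):
        row = []
        for k in range(num_stages):
            unit = t - k + 1
            row.append(unit if 1 <= unit <= num_units else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for t, row in enumerate(data_pipeline_schedule(3, 4)):
        busy = ", ".join(f"PE{k + 1}:{u}" if u else f"PE{k + 1}:-" for k, u in enumerate(row))
        print(f"slot {t}: {busy}")
```

From slot 2 onward all three PEs are busy until the input runs out, which corresponds to the steady state described above; in this scheme every data unit still crosses two PE-to-PE data transfers.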

The advantages of pipeline processing are as follows. Only a limited set of PEs access external resources such as memory and I/O, so resource contention is unlikely, and processing time is easy to estimate, which simplifies real-time design. In addition, because the order in which data is processed is never reversed, the output stage needs no synchronization; and because data is always processed sequentially, even data that cannot be divided in advance can be handled without difficulty.

Patent Document 1: Japanese Patent No. 3889726

However, the pipeline processing described above has the following problems. The first problem is that data must be transferred between PEs as many times as there are pipeline stages. Data transfer is often performed by DMA (direct memory access), but when the data to be transferred is large or many pipeline stages are desired, the DMA overhead can affect the system as a whole. In particular, on hardware in which the transfer bus is shared, rising bus utilization makes it difficult to estimate how long a transfer must wait for the bus. As a result, real-time performance may not be achievable.

The second problem is that a single heavy stage becomes a bottleneck that prevents the system as a whole from achieving the required performance. For example, as shown in FIG. 17, suppose the required performance is 5 ms per unit of data and the software as a whole takes 14 ms per unit. The application designer divides the program into three stages executed by PE1 to PE3, which take 4 ms, 6 ms, and 4 ms per unit, respectively, so PE2 does not meet the requirement. In short, even though PE1 and PE3 meet the requirement, the system as a whole cannot achieve it because PE2 does not. This remains true even when spare PEs are available.
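The arithmetic of the FIG. 17 example can be checked directly: a pipeline's period is set by its slowest stage, not by the average stage time. The helper names below are illustrative and not taken from the patent.

```python
import math


def pipeline_period_ms(stage_times_ms: list[float]) -> float:
    """A pipeline can emit one finished unit only as often as its slowest stage allows."""
    return max(stage_times_ms)


def meets_requirement(stage_times_ms: list[float], required_ms: float) -> bool:
    return pipeline_period_ms(stage_times_ms) <= required_ms


def min_stage_count(total_ms: float, required_ms: float) -> int:
    """Lower bound on the number of stages, even if the work could be split perfectly evenly."""
    return math.ceil(total_ms / required_ms)


if __name__ == "__main__":
    stages = [4, 6, 4]  # PE1..PE3 in the FIG. 17 example (ms per unit)
    # prints: 14 6 False 3
    print(sum(stages), pipeline_period_ms(stages), meets_requirement(stages, 5), min_stage_count(14, 5))
```

Even an ideal split needs at least three stages (14 / 5, rounded up), so the 4/6/4 split fails only because it is unbalanced, which is why the designer must rebalance or re-divide the program.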

The application designer therefore divides the program further to achieve the required performance and assigns the new stages to the spare PEs. If there are not enough PEs, the designer must instead revisit how the program is divided.

In general, however, dividing a program is very labor-intensive. The application designer must identify exactly where the program can be divided, and must know the processing performance of each stage so that the division does not create a bottleneck.

In addition, functional modules that cannot be divided, in principle or in practice, become bottlenecks. This problem becomes more pronounced the more stages are added to improve performance.

Such problems are recognized as inherent drawbacks of pipeline processing. As one countermeasure, there is a method that gives the application designer a framework in which a scheduler recognizes the execution times and input/output relationships of programs and periodically schedules the processing in a pipelined fashion so that the desired time is achieved (see, for example, Patent Document 1).

The present invention has been made in view of the above circumstances, and its object is to provide a pipeline processing apparatus, a pipeline processing method, and a pipeline control program that can reduce the total amount of data transferred between processor elements while retaining the characteristics of pipeline processing that make it suitable for real-time requirements.

To solve the above problems, the present invention adopts the following means.

According to a first aspect of the present invention, there is provided a pipeline processing apparatus comprising a plurality of processor elements, each having local storage means for storing the target data and the execution code obtained when a program for pipeline processing of the target data is divided into a plurality of stages, and path information storage means for storing path information. One of the plurality of processor elements comprises: means for individually assigning, when an application is initialized, the execution code of each divided stage to the local storage means of each processor element; and path information writing means for determining the order in which the processor elements are used and writing path information according to that order into the path information storage means of each processor element. Each of the plurality of processor elements comprises: data processing means for processing the target data with the execution code; detection means for detecting that the execution code has finished processing a unit data amount of the target data; and execution code transfer means for transferring the execution code to another processor element in accordance with the path information when the detection means detects that the execution code has finished processing the unit data amount.

In the first aspect, one processor element assigns the execution code of each divided stage to the local storage means of the individual processor elements at application initialization, determines the order in which the processor elements are used, and writes path information according to that order into each processor element's path information storage means. When a processor element detects that the execution code has finished processing a unit data amount, it transfers the execution code to another processor element in accordance with the path information. Because the execution code, rather than the target data, is transferred, bus utilization can be kept lower than when the target data is transferred. This provides a pipeline processing apparatus that reduces the total amount of data transferred between processor elements while retaining the characteristics of pipeline processing suitable for real-time requirements.
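The bus-usage argument can be made concrete with a per-data-unit byte count. This is a simplified model under stated assumptions: the sizes are hypothetical, loading input and storing output are ignored because both schemes must do them, and each stage's code is assumed to be transferred once for every data unit it processes.

```python
def bytes_per_unit_data_pipeline(intermediate_bytes: list[int]) -> int:
    """Conventional pipeline: each inter-stage result is DMA'd to the next PE once per data unit."""
    return sum(intermediate_bytes)


def bytes_per_unit_code_pipeline(code_bytes: list[int]) -> int:
    """Code-moving pipeline: each stage's code is DMA'd once after it has processed one data unit."""
    return sum(code_bytes)


if __name__ == "__main__":
    KiB = 1024
    # Hypothetical sizes: three stages, 64 KiB of intermediate data between stages,
    # 16 KiB of execution code per stage.
    data = bytes_per_unit_data_pipeline([64 * KiB, 64 * KiB])
    code = bytes_per_unit_code_pipeline([16 * KiB, 16 * KiB, 16 * KiB])
    print(data // KiB, code // KiB)  # prints: 128 48
```

The model reflects the background section's premise that the data to be transferred is large relative to the code; with the opposite size ratio the comparison would reverse.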

According to a second aspect of the present invention, there is provided a pipeline processing apparatus comprising a plurality of processor elements, each having local storage means for storing the target data and the execution code obtained when a program for pipeline processing of the target data is divided into a plurality of stages, path information storage means for storing path information, and status information storage means for storing status information corresponding to each stage. One of the plurality of processor elements comprises means for assigning, when an application is initialized, the execution code of the first stage of the application program divided into a plurality of stages to the local storage means of one processor element. Each of the plurality of processor elements comprises: data processing means for processing the target data with the execution code; means for setting the status information to the idle state when execution code can be written to the local storage means; means for sending a transfer request for the execution code to other processor elements when processing by the data processing means is completed; means for sending a permission response to another processor element when a transfer request for the execution code is received from that processor element while the status information is set to the idle state; path information writing means for writing, into the path information storage means, path information that designates the processor element that sent a permission response as the next transfer destination, when such a response is received in reply to a transfer request sent by the request-sending means; and means for transferring the execution code to another processor element in accordance with the path information written by the path information writing means.

In the second aspect, when a processor element receives a transfer request for execution code from another processor element while its status information is set to the idle state, it returns a permission response. When a processor element receives a permission response to its own transfer request, it designates the responding processor element as the next transfer destination and transfers the execution code to it in accordance with that path information. Because the execution code, rather than the target data, is transferred, bus utilization can be kept lower than when the target data is transferred. This provides a pipeline processing apparatus that reduces the total amount of data transferred between processor elements while retaining the characteristics of pipeline processing suitable for real-time requirements.
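The request/permission handshake of the second aspect can be sketched as follows. This is a minimal model, not the patent's implementation: the class and method names are hypothetical, a direct method call stands in for the DMA transfer, and asking other PEs in list order with the first permission winning is an assumption the text does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessorElement:
    pe_id: int
    status: str = "IDLE"             # status information of the stage buffer
    next_pe: Optional[int] = None    # path information: next transfer destination
    code: Optional[str] = None       # execution code held in local memory

    def on_transfer_request(self, requester_id: int) -> bool:
        """Return a permission response only if code can be written (status is IDLE)."""
        return self.status == "IDLE"

    def receive_code(self, code: str) -> None:
        self.code = code
        self.status = "STANDBY"

    def finish_stage(self, others: list[ProcessorElement]) -> Optional[int]:
        """After processing, request a transfer and forward the code to the first PE that permits it."""
        for other in others:                        # polling order is an assumption
            if other.on_transfer_request(self.pe_id):
                self.next_pe = other.pe_id          # write path information
                other.receive_code(self.code)       # stands in for the DMA transfer
                self.code, self.status = None, "IDLE"
                return self.next_pe
        return None                                 # no permission yet; caller may retry
```

For example, a PE holding `"stage1"` that asks a busy PE (STANDBY) and then an idle one will record the idle PE as its next transfer destination and hand the code to it, so the route is built at run time instead of being fixed by a scheduler.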

According to the present invention, the total amount of data transferred between processor elements can be reduced and the burden of application development lightened, while the characteristics of pipeline processing suitable for real-time requirements are retained.

Embodiments of the present invention will be described below with reference to the drawings. In the present invention, an "application" includes not only programs but also middleware, device drivers, and the like.

<First Embodiment>

FIG. 1 is a schematic diagram showing the configuration of a pipeline processing apparatus 10 according to the first embodiment of the present invention. The pipeline processing apparatus 10 includes a plurality of processor elements PE1 to PE8, a shared memory 11, a bus 12, and input/output ports (I/O) 13 and 14. The number of processor elements shown in FIG. 1 is merely an example and is not limiting.

The shared memory 11 is a storage device that stores a "scheduler" program. When an application is initialized, the scheduler divides the application program into a plurality of stages and assigns the execution code of each stage individually to the local memory 21 of each of the processor elements PE1 to PE8. The scheduler also determines the order in which the processor elements PE1 to PE8 are used and sets "path information" reflecting that order in the pipeline control program of each processor element.

Although the scheduler is stored in the shared memory 11 here, this is not limiting; the scheduler may instead be stored in the local memory 21 of a processor element.

As shown in FIG. 2, each of the processor elements PE1 to PE8 has a local memory 21, an arithmetic unit 22, and a DMAC (direct memory access controller) 23.

The local memory 21 is a storage device provided individually in each of the processor elements PE1 to PE8, and stores the "execution code" for pipeline processing of target data and the "target data" itself. The local memory 21 also stores a "pipeline control program". The pipeline control program holds, as management information, the path information and "status information" indicating the state of each stage buffer in which a stage's execution code is stored, and controls each of the processor elements PE1 to PE8 on the basis of this information. The pipeline control program is described in detail later.

As shown in FIG. 3, the local memory 21 duplicates the stage buffer that stores each stage's execution code, as stage buffer A and stage buffer B. This allows the next stage to be processed (FIG. 3(B)) while the DMAC 23 is transferring execution code (FIG. 3(A)), eliminating the time lag caused by DMA processing. This double-buffered configuration is also commonly used in conventional data pipeline schemes.
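Why duplicating the stage buffer hides the DMA latency can be shown with a simplified timing model. This is a sketch under assumptions not stated in the patent: transfer and execution overlap perfectly when two buffers are available, and bus contention is ignored.

```python
def time_per_unit_ms(exec_ms: float, dma_ms: float, double_buffered: bool) -> float:
    """Steady-state time per stage execution on one PE (simplified model).

    Single buffer: the incoming code can be written only after the current code
    finishes, so execution and DMA serialize. Two buffers (A/B): the next code
    arrives in the idle buffer while the other executes, so only the longer of
    the two determines the time per unit.
    """
    return max(exec_ms, dma_ms) if double_buffered else exec_ms + dma_ms
```

With 4 ms of execution and 1 ms of DMA, double buffering gives 4 ms per unit instead of 5 ms; the DMA is fully hidden only while it is no longer than the execution time.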

The arithmetic unit 22 processes target data with the execution code. Various functions are realized when the pipeline control program is incorporated into the arithmetic unit 22. Specifically, the arithmetic unit 22 has a detection function that detects when the execution code has finished processing a unit data amount of the target data, and a function that starts a DMA transfer on the DMAC 23 when the detection function detects this.

When the arithmetic unit 22 starts a DMA transfer, the DMAC 23 transfers the execution code to another processor element in accordance with the path information stored in the local memory 21.

FIG. 4 is a schematic diagram showing the functional configuration of the pipeline control program according to the present embodiment. The pipeline control program holds management information and the pipeline control program execution code.

The management information is the information the pipeline control program needs in order to control the processor element PE. It includes, for example, the path information, information indicating the state of the data buffers, and information indicating the state of the stage buffers (including the status information). The management information is stored in a cache memory or the like.

The path information is data indicating how the pipeline is connected; for example, it corresponds to the "transfer destination PE number" shown in FIG. 4. In the present embodiment, the path information is given fixed values by the scheduler when the application is initialized. This is not limiting, however, and the path information may be changed dynamically while the application is running.

The other management information changes dynamically while the application runs and holds the current execution state. For example, the information indicating the data buffer state includes the final stage number, the address of the buffer holding input data, the address of the buffer holding output data, and the valid data size. The information indicating the stage buffer state includes the status information and the stage number.
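The management information listed above can be pictured as a small per-PE record. This is a sketch only: the patent names the items held but not their layout, so the class and field names are hypothetical, and the two entries "A" and "B" model the duplicated stage buffers.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class StageStatus(Enum):
    IDLE = "IDLE"
    STANDBY = "STANDBY"
    ACTIVE = "ACTIVE"
    TRANSFER = "TRANSFER"


@dataclass
class StageBuffer:
    stage_number: Optional[int] = None        # which stage's execution code is held
    status: StageStatus = StageStatus.IDLE    # status information


@dataclass
class ManagementInfo:
    transfer_dest_pe: int                     # path information ("transfer destination PE number")
    final_stage_number: int = 0               # data buffer state
    input_buffer_addr: int = 0
    output_buffer_addr: int = 0
    valid_data_size: int = 0
    # default_factory gives every PE its own pair of stage buffers.
    stage_buffers: dict[str, StageBuffer] = field(
        default_factory=lambda: {"A": StageBuffer(), "B": StageBuffer()}
    )
```

Only `transfer_dest_pe` is required at construction, mirroring the embodiment in which the path information is fixed at initialization while the remaining fields change at run time.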

A stage buffer is a storage area in the local memory 21 that holds the execution code of a stage. It is managed by a stage number identifying the stage and by status information indicating the state of the buffer. As shown in FIG. 5, a stage buffer transitions through the states "IDLE" (empty), "STANDBY" (ready), "ACTIVE" (executing), and "TRANSFER" (transferring), in that order. Sequential pipeline processing is realized by changing the status information so that the stage buffers move through these states.

Because "ACTIVE" means that the execution code is currently being executed, at most one stage buffer per PE can be ACTIVE. "STANDBY" means that the execution code is already present but it is not yet time to execute it. When the stage currently in the ACTIVE state finishes processing, the stage in the STANDBY state becomes ACTIVE.
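The FIG. 5 state cycle and the at-most-one-ACTIVE rule can be written as a small state table. This is a minimal sketch; the names `NEXT_STATUS`, `advance`, and `check_pe_invariant` are hypothetical. The TRANSFER-to-IDLE step corresponds to the DMA completion processing (V2).

```python
from enum import Enum


class StageStatus(Enum):
    IDLE = "IDLE"          # empty: code may be written into this buffer
    STANDBY = "STANDBY"    # code present, not yet time to run it
    ACTIVE = "ACTIVE"      # code currently executing
    TRANSFER = "TRANSFER"  # code being DMA'd to the next PE


# FIG. 5 cycle; TRANSFER -> IDLE happens when the DMA completes.
NEXT_STATUS = {
    StageStatus.IDLE: StageStatus.STANDBY,
    StageStatus.STANDBY: StageStatus.ACTIVE,
    StageStatus.ACTIVE: StageStatus.TRANSFER,
    StageStatus.TRANSFER: StageStatus.IDLE,
}


def advance(status: StageStatus) -> StageStatus:
    """Move a stage buffer to the next state in the cycle."""
    return NEXT_STATUS[status]


def check_pe_invariant(buffers: list[StageStatus]) -> bool:
    """At most one stage buffer in a PE may be ACTIVE at a time."""
    return sum(s is StageStatus.ACTIVE for s in buffers) <= 1
```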

When the pipeline control program execution code is incorporated into the arithmetic unit 22, the arithmetic unit 22 functions as follows. First, it recognizes whether the execution code held by its PE is the first stage of the overall processing. When data to be processed by the first-stage execution code arrives, the arithmetic unit 22 executes that code. When it finishes processing a unit data amount, it starts a DMA transfer and deletes the execution code it has just executed. Once execution code has been transferred by DMA, the next stage's execution code arrives (or has already arrived), so the arithmetic unit 22 executes it. When the next stage completes, the arithmetic unit transfers that execution code in accordance with the path information and deletes the code it has just executed. As this repeats, by the time the final stage has been processed and its execution code transferred, the first stage's execution code has arrived again.
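The rotation described above, in which the data stays in each PE while the stage code circulates, can be illustrated with a lockstep simulation. This is a simplified model under assumptions not taken from the patent: all stages take one round, there are as many PEs as stages, PE p starts with stage p + 1, and each PE forwards its code to PE (p - 1) mod N, the PE that needs that stage next. It is not the specific PE and route assignment of FIG. 6.

```python
def simulate_code_rotation(num_pes: int, rounds: int):
    """Lockstep model of a code-moving pipeline with num_pes PEs and num_pes stages.

    Stages are numbered 1..N. Returns the stage sequence each PE executed and
    the number of data units each PE completed (a unit completes when stage N runs).
    """
    holding = [p + 1 for p in range(num_pes)]   # stage code currently held by each PE
    executed = [[] for _ in range(num_pes)]
    completed = [0] * num_pes
    for _ in range(rounds):
        for p in range(num_pes):
            executed[p].append(holding[p])
            if holding[p] == num_pes:
                completed[p] += 1
        # Transfer step: PE p receives the code that PE p + 1 just finished with.
        holding = [holding[(p + 1) % num_pes] for p in range(num_pes)]
    return executed, completed


if __name__ == "__main__":
    executed, completed = simulate_code_rotation(3, 6)
    for p, seq in enumerate(executed):
        print(f"PE index {p}: stages {seq}, units completed {completed[p]}")
```

Every PE applies the stages to its own data in order 1, 2, 3, 1, ..., and in each round every stage's code sits on exactly one PE, so only code crosses the bus. Completions in the opening rounds act on initially empty buffers, analogous to the valid data size of 0 set at initialization.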

If the local memory has sufficient capacity, the execution code of every stage may fit in the local memory of every PE. In that case, transfer of the execution code itself can be omitted: the pipeline control program need only notify the preceding stage that the next stage's execution code is ready (that is, grant permission to execute the next stage's execution code). Data transfer between PEs can thus be almost or entirely eliminated (for example, reduced to interrupts only), and the execution code no longer needs to be deleted after each processing unit.

FIG. 6 is a diagram for explaining the initialization process in the pipeline processing apparatus 10. The initialization process is executed by the scheduler, for example, when an application starts. The initialization need not be performed by the scheduler stored in the shared memory 11, however; it may instead be executed by a separate initialization program provided in each pipeline control program. In the following description, the application program for pipeline processing is assumed to be divided into stages 1 to 3.

First, when an application starts, for example, an initialization command is sent to the scheduler. In response, the scheduler sets the path information for each PE and sends the corresponding path information to each PE (S1). Here, PE2, PE1, and PE3 are set as the transfer destinations of PE3, PE2, and PE1, respectively.

Next, the scheduler sets the valid data buffer size of every PE to 0 (S2), and then sets the status of the second duplicated stage buffer, stage buffer B, of every PE to "IDLE" (S3).

Next, the scheduler sets, for every PE, the final stage number and the stage number of the first stage buffer, stage buffer A (S4). Here, (final stage number, stage number of stage buffer A) is set to (3, 1), (1, 2), and (2, 3) for PE3, PE2, and PE1, respectively.

The scheduler then loads into stage buffer A of each PE the execution code of the corresponding stage (S5). Here, stages 1, 2, and 3 are loaded into PE3, PE2, and PE1, respectively.

Next, the scheduler sets the status of stage buffer A of every PE to "STANDBY" (S6) and sends a "STANDBY notification" to every PE (S7).
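Steps S1 to S7 can be traced with the concrete values given above. This is a sketch: the dictionary layout and the function name `initialize_pipeline` are hypothetical, a placeholder string stands in for loading a stage's execution code, and the STANDBY notifications are represented by returning the list of notified PE numbers.

```python
def initialize_pipeline(route: dict[int, int], stage_plan: dict[int, tuple[int, int]]):
    """Sketch of initialization steps S1-S7.

    route:      PE number -> transfer-destination PE number (path information).
    stage_plan: PE number -> (final stage number, stage number for stage buffer A).
    """
    pes = {pe: {"transfer_dest": dest} for pe, dest in route.items()}       # S1
    for state in pes.values():
        state["valid_data_size"] = 0                                         # S2
        state["buffer_B"] = {"stage": None, "status": "IDLE"}                # S3
    for pe, (final_stage, stage_a) in stage_plan.items():                    # S4
        pes[pe]["final_stage"] = final_stage
        pes[pe]["buffer_A"] = {"stage": stage_a, "status": None}
    for state in pes.values():
        # S5: a placeholder string stands in for loading the stage's execution code.
        state["buffer_A"]["code"] = f"stage{state['buffer_A']['stage']}"
        state["buffer_A"]["status"] = "STANDBY"                              # S6
    standby_notified = sorted(pes)                                           # S7
    return pes, standby_notified


if __name__ == "__main__":
    # Values from the FIG. 6 description.
    pes, notified = initialize_pipeline(
        route={3: 2, 2: 1, 1: 3},
        stage_plan={3: (3, 1), 2: (1, 2), 1: (2, 3)},
    )
    for pe in sorted(pes):
        print(pe, pes[pe])
```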

Once initialization has been performed by the above procedure, the pipeline processing apparatus 10 is ready to execute pipeline processing.

After the pipeline processing apparatus 10 has been initialized, each PE executes STANDBY notification receipt processing, stage end processing, and DMA completion notification receipt processing as needed.

FIG. 7 shows the procedure for receiving a STANDBY notification. When a PE receives a "STANDBY notification", it changes the status information of the relevant stage buffer x from "IDLE" to "STANDBY" (T1). If, at the time of the STANDBY notification, one of the duplicated stage buffers is "ACTIVE", that processing simply continues (T2-Yes).

If, on the other hand, no stage buffer is in the ACTIVE state, the PE searches for the next stage buffer using the stage numbers and the final stage number (T2-No, T3).

When the stage buffer is found, its status information is changed from "STANDBY" to "ACTIVE" (T4). When changing the status to ACTIVE, the pipeline control program passes the start and end addresses of the buffer and the buffer size to the execution code of that stage (T5). Processing of that stage's execution code then begins.
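Steps T1 to T5 can be sketched as a handler on a per-PE dictionary. Two points are assumptions: the "final stage number" is interpreted as the most recently completed stage (consistent with the S4 values, where PE3 has final stage 3 and holds stage 1), and the dictionary keys are hypothetical.

```python
def next_stage(final_stage: int, num_stages: int) -> int:
    """Stage that should run next, wrapping from the last stage back to stage 1."""
    return final_stage % num_stages + 1


def on_standby_notification(pe: dict, buffer_name: str, num_stages: int):
    """Sketch of steps T1-T5. Returns the arguments handed to the stage code, or None."""
    buffers = pe["buffers"]
    buffers[buffer_name]["status"] = "STANDBY"                                 # T1
    if any(b["status"] == "ACTIVE" for b in buffers.values()):                 # T2-Yes
        return None  # the running stage continues; the new code waits in STANDBY
    wanted = next_stage(pe["final_stage"], num_stages)                         # T3
    for buf in buffers.values():
        if buf["status"] == "STANDBY" and buf["stage"] == wanted:
            buf["status"] = "ACTIVE"                                           # T4
            start = pe["data_buffer_addr"]
            size = pe["data_buffer_size"]
            # T5: start address, end address, and size are passed to the stage code.
            return {"stage": wanted, "start": start, "end": start + size - 1, "size": size}
    return None  # the next stage's code has not arrived yet
```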

FIG. 8 shows the procedure for stage end processing. When an execution code completion notification is issued, the PE changes the status information of the processed stage buffer x from "ACTIVE" to "TRANSFER" (U1).

The PE then swaps the addresses of the input data buffer and the output data buffer (U2), sets the output size contained in the execution-code completion notification as the valid data size (U3), and updates the final stage number (U4).

Next, the PE searches the transfer-destination PE for a stage buffer in the “idle (IDLE)” state (U5). If one is found, the PE starts a DMA transfer of the execution code to the transfer-destination PE (U6-Yes, U7). If, when starting the DMA transfer, the PE has already received a STANDBY notification (described below) from another processor element, it searches for the stage buffer of the next stage on the basis of the stage number and the final stage number (U8-Yes, U9).

If the stage buffer of the next stage is found, the PE changes its status information from “STANDBY” to “ACTIVE” (U10).

The PE then starts processing the execution code of that stage (U11), specifically by passing the start and end addresses of the data buffer and the buffer size to the execution code.
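
A hedged sketch of the FIG. 8 flow (U1 to U11) follows. The DMA is modeled as a log entry, the behavior when no idle buffer is found (U6-No) is not described in the text and is assumed here to be a retry by the caller, the wrap-around rule for the next stage is assumed, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StageBuffer:
    stage_no: int
    state: str = "IDLE"          # "IDLE" | "STANDBY" | "ACTIVE" | "TRANSFER"


@dataclass
class PEState:
    buffers: List[StageBuffer]
    in_addr: int = 0
    out_addr: int = 0
    valid_size: int = 0
    final_stage_no: int = 0
    num_stages: int = 3
    standby_pending: bool = False             # a STANDBY notification already arrived
    dma_log: List[Tuple[int, int]] = field(default_factory=list)
    running: Optional[int] = None             # stage whose code is now executing


def on_stage_end(pe: PEState, done: StageBuffer, out_size: int, dest: PEState) -> bool:
    done.state = "TRANSFER"                                   # U1
    pe.in_addr, pe.out_addr = pe.out_addr, pe.in_addr         # U2: output becomes next input
    pe.valid_size = out_size                                  # U3
    pe.final_stage_no = done.stage_no                         # U4
    free = next((i for i, b in enumerate(dest.buffers) if b.state == "IDLE"), None)  # U5
    if free is None:
        return False                      # U6-No: assumed retry later (not specified)
    pe.dma_log.append((done.stage_no, free))                  # U7: code DMA started
    if pe.standby_pending:                                    # U8-Yes
        want = pe.final_stage_no % pe.num_stages + 1
        nxt = next((b for b in pe.buffers
                    if b.state == "STANDBY" and b.stage_no == want), None)  # U9
        if nxt is not None:
            nxt.state = "ACTIVE"                              # U10
            pe.running = nxt.stage_no                         # U11
    return True
```

The address swap in U2 is what lets the next stage read the previous stage's output in place, without copying the data.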

FIG. 9 shows the procedure for DMA completion processing. When the DMA transfer of the execution code is complete, the PE sends a “STANDBY notification” to the transfer-destination PE (V1). The PE that sent the notification then changes the status information of the stage buffer from “TRANSFER” to “IDLE”, which invalidates its copy of the execution code (V2).
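
The two FIG. 9 steps amount to "notify, then release". In this minimal sketch the messaging callback `send` and the dictionary of buffer states are hypothetical stand-ins for the inter-PE notification mechanism and the status information held in local memory.

```python
from typing import Callable, Dict


def on_dma_complete(states: Dict[int, str], buf_id: int, dest_pe: int,
                    send: Callable[[int, str], None]) -> None:
    """FIG. 9 sketch: notify the destination PE, then free the local buffer."""
    if states[buf_id] != "TRANSFER":       # U1 should already have set TRANSFER
        raise ValueError("buffer is not in the TRANSFER state")
    send(dest_pe, "STANDBY")               # V1
    states[buf_id] = "IDLE"                # V2: local copy of the code is invalidated
```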

In the conventional pipeline processing scheme, shown in FIG. 10(A), the execution code is fixed to each processor element and the target data is transferred successively from left to right on the page. In the scheme of this embodiment, shown in FIG. 10(B), the target data is instead fixed to each processor element, and the execution code is transferred successively in the opposite direction, from right to left on the page.

Viewed from the stage side: if, for example, the execution code of stage 1 has the function of loading input data into the data buffer of the local memory 21, then once it has finished loading the target data it moves to the next processor element, PE3, and loads the next target data there. The stage's execution code thus moves successively through processor elements PE2, PE1, PE3, and so on, repeating the above processing.

Viewed from the data side: a processing unit of target data is first loaded into a data buffer by the first stage, stage 1, and is handed to the output I/O or the like once the final stage, stage 3, has finished. Throughout this time, the target data stays in the same processor element.

Viewed from a processor element (PE): the execution codes arrive one after another for a single processing unit of target data, and the data is processed by each in turn. Once the data has been through every stage, the execution code of the first stage, stage 1, arrives again and new target data is loaded into the data buffer, and the cycle repeats.
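
The three viewpoints above can be checked with a tiny steady-state model of FIG. 10(B). The sketch assumes as many PEs as stages, one time step per stage, code moving one PE to the left per step (PE1 wrapping to the last PE), and an already-full pipeline; the function names are hypothetical.

```python
from typing import List


def code_rotation(n: int, steps: int) -> List[List[int]]:
    """Row t lists, for PE1..PEn, the stage (1-based) whose code is on that PE at step t."""
    return [[(p + t) % n + 1 for p in range(n)] for t in range(steps)]


def stage_path(table: List[List[int]], stage: int) -> List[int]:
    """PEs (1-based) that one stage's code visits, step by step."""
    return [row.index(stage) + 1 for row in table]


def pe_history(table: List[List[int]], pe: int) -> List[int]:
    """Stages that arrive, in order, at one PE whose data stays resident."""
    return [row[pe - 1] for row in table]
```

With `table = code_rotation(3, 4)`, `stage_path(table, 1)` is `[1, 3, 2, 1]` (stage 1's code goes PE1, PE3, PE2, PE1), while `pe_history(table, 1)` is `[1, 2, 3, 1]` (PE1 sees stages 1, 2, 3 on its own data, then stage 1 again for new data).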

With the configuration described above, the pipeline processing apparatus 10 of this embodiment operates as follows. When the application is initialized, the scheduler individually allocates the execution codes of the application program, which has been divided into a plurality of stages, to the local memories 21 of the processor elements PE1 to PE8. The scheduler then determines the order in which the processor elements PE1 to PE8 are used and writes path information according to that order into the local memory 21 of each processor element. While the application runs, each of PE1 to PE8 processes the target data with the execution code; when a processor element detects that the execution code has finished processing a unit amount of target data, it transfers the execution code to another processor element according to the path information.
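
The path information written by the scheduler can be thought of as a per-PE next-hop table. The sketch below assumes that the "use order" is the cyclic sequence of PEs that an execution code visits and that one stage is placed per PE at initialization; the function names and the dictionary representation are hypothetical.

```python
from typing import Dict, List


def build_routes(use_order: List[int]) -> Dict[int, int]:
    """Next-hop table: each PE forwards code to the PE after it in use_order (cyclic)."""
    n = len(use_order)
    return {pe: use_order[(i + 1) % n] for i, pe in enumerate(use_order)}


def initial_allocation(stages: List[str], pes: List[int]) -> Dict[int, str]:
    """Place one stage's execution code on each PE at application initialization."""
    if len(stages) > len(pes):
        raise ValueError("more stages than processor elements")
    return dict(zip(pes, stages))


def follow(routes: Dict[int, int], start: int, hops: int) -> List[int]:
    """PEs visited by a code image that starts on `start` and moves `hops` times."""
    path = [start]
    for _ in range(hops):
        path.append(routes[path[-1]])
    return path
```

For the right-to-left movement of FIG. 10(B) with three PEs, a use order of `[1, 3, 2]` yields the table `{1: 3, 3: 2, 2: 1}`, and following it from PE1 reproduces the PE1, PE3, PE2, PE1 path described above.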

Because the pipeline processing apparatus 10 of this embodiment transfers execution code rather than target data between processor elements, it can reduce the total amount of data transferred between processor elements while retaining the characteristics of pipeline processing that make it suitable for real-time processing.

Specifically, when the total size of the execution code is smaller than the total size of the data, inter-PE traffic shrinks by the difference, which reduces transfer overhead and bus contention. As a result, data processing time also becomes easier to estimate, sparing the application designer the work of complicated, fine-grained program partitioning.
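
The saving is simple arithmetic. The sketch compares inter-PE traffic per data unit under the two schemes of FIG. 10, assuming that each intermediate result crosses the bus once in the conventional scheme (the last stage's output goes to I/O, not to a PE) and that each stage's code crosses it once in the code-migration scheme. The sizes in the usage note are made-up figures for illustration.

```python
from typing import Sequence

KiB = 1024


def data_migration_bytes(stage_output_sizes: Sequence[int]) -> int:
    # FIG. 10(A): every intermediate result moves to the next PE once;
    # the final stage's output goes to the output I/O instead.
    return sum(stage_output_sizes[:-1])


def code_migration_bytes(stage_code_sizes: Sequence[int]) -> int:
    # FIG. 10(B): per data unit, each stage's code moves to the next PE once;
    # the data itself stays in the PE's local memory.
    return sum(stage_code_sizes)
```

With three stages each producing 64 KiB and each code image being 16 KiB, data migration moves 128 KiB per unit and code migration 48 KiB, a difference of 80 KiB. If the code were larger than the data, the comparison would reverse, which is why the benefit is conditioned on the code being smaller.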

If a small data fragment remains, for example as a result of variable-length code processing, the fragment may be transferred to the preceding PE together with the execution code.

In the pipeline processing apparatus 10 of this embodiment, input of data to the first stage and output of data from the last stage may be implemented either by the application or by the pipeline control program; this choice is not essential to the present invention.

<Second Embodiment>
FIG. 11 is a schematic diagram showing the configuration of a pipeline processing apparatus 10S according to the second embodiment of the present invention. Parts identical to those already described are given the same reference numerals, and repeated descriptions are omitted unless otherwise noted. In the pipeline processing apparatus 10S of this embodiment, the path information is determined not by the scheduler but by each processor element itself.

As shown in FIG. 12, the arithmetic unit 22S of each of the processor elements PE1 to PE8, by incorporating the pipeline control program, comprises a data processing unit 31, a status information setting unit 32, a transfer request unit 33, a permission response unit 34, a path information writing unit 35, and an execution code transfer control unit 36.

The data processing unit 31 processes the target data with the execution code in each processor element. The status information setting unit 32 updates the status information of the stage buffers stored in the local memory 21S as needed; while the data processing unit 31 is not processing target data, it sets the status information to “idle (IDLE)”.

When processing by the data processing unit 31 has finished in one of the processor elements PE1 to PE8, the transfer request unit 33 sends an execution-code transfer request to another processor element. When a transfer request for execution code arrives from another processor element while the status information is set to “idle (IDLE)”, the permission response unit 34 returns a permission response to that processor element. When a permission response arrives from another processor element in reply to a transfer request, the path information writing unit 35 writes path information designating the responding processor element as the next transfer destination into the local memory 21S.

The execution code transfer control unit 36 starts DMA on the DMAC 23. Specifically, when the data processing unit 31 has finished processing with the execution code, the execution code transfer control unit 36 starts the DMA by sending an execution-code completion notification to the DMAC 23, causing the execution code to be DMA-transferred to another processor element according to the path information written by the path information writing unit 35.

Next, the operation of the pipeline processing apparatus 10S of this embodiment is described. First, when an application that processes target data is executed, it is initialized, and the scheduler or the like divides the application program into a plurality of stages. The execution code of the first stage is then assigned to the local memory of one processor element, PE1. Whenever one of the processor elements PE1 to PE8 is not processing target data, its status information setting unit 32 sets its status information to “IDLE”.

When processing by the execution code finishes, the processor element PE1 sends a transfer request for that execution code to another processor element, PE2. If PE2's status information is set to idle, PE2 returns a permission response to PE1, the source of the request.

On receiving the permission response from PE2, PE1 sets path information designating PE2 as the next transfer destination, and then transfers the execution code to PE2 according to that path information.
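
The second embodiment's request/permission exchange can be sketched as below, with each functional unit reduced to a few lines. The polling order over candidate PEs, the `Node` fields, the STANDBY state given to the receiver, and modeling the DMA as a plain assignment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    pe_id: int
    status: str = "IDLE"             # kept by the status information setting unit 32
    code: Optional[str] = None       # execution code currently held (hypothetical label)
    next_hop: Optional[int] = None   # path information in local memory 21S


def grants(dst: Node) -> bool:
    # Permission response unit 34: grant only while idle.
    return dst.status == "IDLE"


def request_transfer(src: Node, others: List[Node]) -> Optional[int]:
    """Transfer request unit 33, path writing unit 35, and transfer control unit 36."""
    for dst in others:                               # polling order is an assumption
        if dst.pe_id != src.pe_id and grants(dst):
            src.next_hop = dst.pe_id                 # unit 35 writes the path
            dst.code, dst.status = src.code, "STANDBY"   # unit 36: DMA modeled as a copy
            src.code, src.status = None, "IDLE"      # the code leaves the source PE
            return dst.pe_id
    return None                                      # no grant; caller retries (assumed)
```

Because the route is fixed only when a PE actually grants the request, the path adapts to whichever PE happens to be idle rather than following a schedule computed in advance.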

As described above, in the pipeline processing apparatus 10S of this embodiment, when a processor element receives a permission response from another processor element in reply to its transfer request, it writes path information designating the responding processor element as the next transfer destination into the local memory 21S and transfers the execution code to that processor element accordingly. Bus usage is therefore lower than when target data is transferred, and the total amount of data transferred between processor elements can be reduced while retaining the characteristics of pipeline processing suitable for real-time requirements.

<Third Embodiment>
The pipeline processing apparatus 10T according to the third embodiment of the present invention is a modification of the pipeline processing apparatus 10S of the second embodiment that determines the number of pipeline stages dynamically by duplicating execution code.

The local memory 21T of this embodiment divides the state of each stage into substages and stores, together with the substage number, “sub-status information” indicating whether each substage is in the head state or the continuation state. That is, once execution code has been duplicated by the execution code duplication unit 40 (described below), the ACTIVE state is split into two states: a head state, indicating the head of the duplicated execution code, and a continuation state, indicating that it is not the head. The sub-status information is set to one of these two states.

As shown in FIG. 13, the arithmetic unit 22T of this embodiment further includes an execution code duplication unit 40, and replaces the data processing unit 31, transfer request unit 33, and execution code transfer control unit 36 of the second embodiment with a data processing unit 41, a transfer request unit 43, and an execution code transfer control unit 46.

The execution code duplication unit 40 creates a copy of the execution code when processing by the execution code exceeds a preset required time. Detecting that the required time is being approached is implemented with a mechanism such as a timer interrupt. When the execution code is duplicated, the execution code duplication unit 40 also sets the sub-status information of the stages corresponding to the original and the duplicated execution code, in association with their substage numbers.

In addition to the functions of the data processing unit 31 of the second embodiment, the data processing unit 41 resumes processing of the target data from the stage state at the time the sub-status information was set. In addition to the functions of the transfer request unit 33 of the second embodiment, the transfer request unit 43 sends a transfer request for the duplicated execution code to another processor element when the execution code duplication unit 40 has duplicated the execution code.

Specifically, the transfer request unit 43 uses a timer interrupt to detect that processing by the execution code has exceeded the preset required time. When the timer interrupt occurs, it determines whether the sub-status information is in the head state. If so, it determines whether a transfer-destination PE has been set as path information; if not, it searches for a processor element in the idle state (an idle PE) and sends a transfer request to the PE it finds.

If, on the other hand, the sub-status information is in the continuation state when the timer interrupt is signaled, the transfer request unit 43 does not search for an idle PE, because a transfer-destination PE should already have been set as path information. Nor does it transfer the execution code, because the duplicated execution code should already be present on the transfer-destination PE. Since no processor-element context needs to be moved in this case, an increase in bus usage is avoided.

In addition to the functions of the execution code transfer control unit 36 of the second embodiment, the execution code transfer control unit 46 DMA-transfers duplicated execution code to other processor elements on the basis of the sub-status information. For execution code in the continuation state, no context needs to be moved between processor elements, so the execution code transfer control unit 46 notifies only the substage number. This avoids the extra processing time that repeatedly transferring the same execution code would incur. The execution code transfer control unit 46 also deletes the execution code when processing of the target data finishes in the last substage. If the local memory 21T is large enough to hold the execution code of all stages, no execution code needs to be transferred at all, and only the substage number is notified.

Next, the operation of the pipeline processing apparatus 10T of this embodiment is described with reference to FIG. 14. It is assumed that the application program for pipeline-processing the target data is divided into three stages, stages 1 to 3, assigned to processor elements PE1 to PE3, respectively (FIG. 14(A)).

Under this assumption, if processor element PE2 has not finished its processing when the required time elapses and a timer interrupt occurs, the execution code duplication unit 40 in PE2 duplicates the execution code, and the execution code transfer control unit 46 of PE2 starts a DMA of the duplicated code. As a result, the same execution code is also stored in processor element PE1 (FIG. 14(B)); in short, duplicating the execution code increases the number of PEs holding that code by one. When the execution code is duplicated, the execution code duplication unit 40 sets the sub-status information together with the substage number: here, the execution code on PE1 is set to the head state and that on PE2 to the continuation state.

When this operation is repeated until all processing of the first data unit is complete, that is, until the pipeline has gone round once (FIG. 14(C), (D)), the number of pipeline stages and the path information become fixed. In the example of FIG. 14, the pipeline settles at five stages.

From then on, target data is loaded into each processor element and the execution codes run in sequence. In this state every stage and substage finishes within the required time on every processor element, so the desired performance is achieved.

FIG. 15 shows the procedure when a timer interrupt occurs. When the required time elapses and a timer interrupt occurs, a PE whose sub-status information is in the head state transfers the execution code, because the transfer-destination PE does not yet hold the current stage (W1-Yes). Before the transfer, the PE checks whether path information has been set; if not, it searches for an idle PE and sets it as the transfer destination (W2 to W4).

To transfer execution code to the transfer-destination PE, the PE first searches that PE for an idle stage buffer (W5). If one is found, it starts a DMA transfer of the execution code to the destination (W6-Yes, W7). When the DMA transfer finishes, the PE changes its sub-status information from the head state to the continuation state (W8).

If, on the other hand, the sub-status information is in the continuation state at step W1, the PE does not transfer the execution code, because the duplicated execution code is already on the transfer-destination PE (W1-No).
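
A hedged sketch of the FIG. 15 timer-interrupt handler (W1 to W8), together with the completion rule for the continuation state given in the following paragraph. The callbacks `find_idle_pe` and `has_idle_buffer` are hypothetical hooks for the idle-PE search and the stage-buffer search, the handling when no idle PE exists is assumed, and treating DMA completion as immediate at W8 is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

HEAD, CONT = "head", "continuation"


@dataclass
class PE:
    pe_id: int
    status: str = "ACTIVE"
    sub_status: str = HEAD
    next_hop: Optional[int] = None            # path information
    dma_log: List[int] = field(default_factory=list)


def on_timer_interrupt(pe: PE, find_idle_pe: Callable[[], Optional[int]],
                       has_idle_buffer: Callable[[int], bool]) -> bool:
    """FIG. 15 sketch; returns True if a copy of the code was sent."""
    if pe.sub_status != HEAD:                 # W1-No: copy already downstream
        return False
    if pe.next_hop is None:                   # W2, W3: no path information yet
        pe.next_hop = find_idle_pe()          # W4
        if pe.next_hop is None:
            return False                      # no idle PE (handling assumed)
    if not has_idle_buffer(pe.next_hop):      # W5, W6-No
        return False
    pe.dma_log.append(pe.next_hop)            # W7: DMA of the duplicated code
    pe.sub_status = CONT                      # W8 (DMA completion modeled as immediate)
    return True


def on_code_complete(pe: PE) -> bool:
    """Returns True if the normal FIG. 8 stage-end DMA is still needed."""
    if pe.sub_status == CONT:                 # copy already downstream: skip the DMA
        pe.status = "IDLE"
        return False
    pe.status = "TRANSFER"                    # head state: proceed as in FIG. 8
    return True
```

A second interrupt on the same PE is a no-op, which is what keeps repeated duplication from flooding the bus with copies of the same code.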

Likewise, if the sub-status information is in the continuation state when an execution-code completion notification is received, the DMA is skipped for the same reason as with the timer interrupt, and the status information is simply changed to “IDLE”.

FIG. 16 shows the procedure for DMA completion processing. When the DMA transfer of the execution code is complete, the PE sends a “STANDBY notification” to the transfer-destination PE (X1). If the PE that sent the notification holds no duplicated execution code, its processing has already finished and its status information is “TRANSFER”, so the status information is changed from “TRANSFER” to “IDLE” (X2-Yes, X3).

If instead the PE that sent the STANDBY notification had started the DMA from a timer interrupt, its status information is not changed (X2-No): a timer-triggered DMA means that processing by the execution code has not yet finished, so the status information remains “ACTIVE”.
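
FIG. 16 differs from FIG. 9 only in the X2 branch. In this minimal sketch the caller passes `started_by_timer` to say whether the DMA copied the code (timer interrupt) or moved it (normal stage end); that flag, the `PE` fields, and the `send` callback are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PE:
    status: str
    next_hop: int


def on_dma_complete(pe: PE, started_by_timer: bool,
                    send: Callable[[int, str], None]) -> None:
    """FIG. 16 sketch (third embodiment)."""
    send(pe.next_hop, "STANDBY")               # X1
    if not started_by_timer:                   # X2-Yes: code was moved, not copied
        if pe.status != "TRANSFER":
            raise ValueError("expected TRANSFER after a completed stage")
        pe.status = "IDLE"                     # X3
    # X2-No: timer-triggered copy; this PE keeps running and stays ACTIVE
```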

As described above, in the pipeline processing apparatus 10T of this embodiment, each processor element stores sub-status information that subdivides the stage state. When processing by the execution code exceeds the preset required time, the PE suspends and duplicates the execution code, sets sub-status information indicating the head or continuation state, transfers the duplicated execution code, and then resumes processing of the target data on the basis of the sub-status information. Processing on each processor element can therefore finish within the required time, meeting the desired performance.

For example, in the pipeline processing apparatus 10S of the second embodiment, if stages 1, 2, and 3 take 4 ms, 12 ms, and 4 ms respectively and the required performance is 5 ms per unit amount of data, stage 2 becomes a bottleneck and the requirement cannot be met. In the pipeline processing apparatus 10T of this embodiment, by contrast, stage 2 can be handled by three processor elements, achieving processing within 5 ms.
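
The numbers in this example can be checked with a back-of-the-envelope calculation. This is not the apparatus's mechanism, which discovers the split at run time through timer interrupts; it simply assumes that a stage taking t ms ends up spread over ceil(t / T) PEs when every substage must fit within the required time T.

```python
from typing import List, Sequence


def pes_per_stage(stage_times_ms: Sequence[int], required_ms: int) -> List[int]:
    """PEs a stage ends up spread over if each substage must fit in required_ms."""
    # Integer ceiling division: ceil(t / T) without floating point.
    return [(t + required_ms - 1) // required_ms for t in stage_times_ms]


def pipeline_depth(stage_times_ms: Sequence[int], required_ms: int) -> int:
    """Total number of pipeline stages (substages) after duplication."""
    return sum(pes_per_stage(stage_times_ms, required_ms))
```

For stage times of 4, 12, and 4 ms and T = 5 ms this gives 1, 3, and 1 PEs, a five-stage pipeline. That matches the five stages reported for FIG. 14, although the description does not state that the two examples use the same timings.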

Furthermore, with the pipeline processing apparatus 10T of this embodiment, the application designer no longer needs to know the processing time of each stage precisely, which reduces the designer's workload. Specifically, the designer need only divide the application into functional units such as data loading, decryption, compression/decompression, and data output, and need not subdivide the execution code further.

Conceptually, the pipeline processing apparatus 10T of this embodiment is, as described above, a pipeline that passes substages along in sequence. However, execution code in the continuation state does not need to be moved, so processing with duplicated execution code remains very simple.

Looking at a single PE: even when the conceptual substage changes because the required time has arrived, the contents of the execution code themselves do not change, and the target data stays in the local memory 21T without being transferred. Because both the execution code and the data remain unchanged in the local memory 21T, the execution code can be suspended and resumed at any point, even partway through a piece of data or while important information is held in temporary variables.

In short, from the execution code's point of view an interrupt simply occurs and control returns, so the code need not be aware of the substage change, and pipeline operation can be timed very precisely.

Also, because the suspension and resumption for a substage change take place on the same PE, the pipeline processing apparatus 10T of this embodiment can make the transfer at any point in the application's processing. The application perceives nothing more than, for example, a timer interrupt that occurred and returned, so the substage can be changed wherever processing of the target data or the execution code happens to be. The time actually spent processing one stage or substage can therefore be controlled with very high precision (for example, the precision of the timer interrupt), independent of the application's contents.

<Other Variations>
The present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements may be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiments. For example, some constituent elements may be omitted from those shown in an embodiment, and constituent elements from different embodiments may be combined as appropriate.

The methods described in the above embodiments can also be stored, as programs executable by a computer, on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROM, DVD, etc.), magneto-optical disks (MO), and semiconductor memories, and distributed in that form.

The storage medium may use any storage format, as long as it can store a program and can be read by a computer.

Part of each process for realizing the above embodiments may also be executed by other software running on the computer, acting on instructions of the program installed from the storage medium. This includes the OS (operating system) and MW (middleware) such as database management software or network software.

Furthermore, the storage medium in the present invention is not limited to a medium independent of the computer. It also includes a storage medium into which a program transmitted via a LAN, the Internet or the like is downloaded and stored, or temporarily stored.

The number of storage media is also not limited to one. The case where the processing of the above embodiments is executed from a plurality of media is also covered by the storage medium of the present invention, and the media may have any configuration.

The computer in the present invention executes each process of the above embodiments based on a program stored in a storage medium. It may have any configuration, such as a single apparatus like a personal computer, or a system in which a plurality of apparatuses are connected over a network.

In addition, the computer in the present invention is not limited to a personal computer. It also includes an arithmetic processing unit, a microcomputer and the like contained in information processing equipment. The term is a generic name for equipment and apparatuses that can realize the functions of the present invention by means of a program.

A schematic diagram showing the configuration of the pipeline processing apparatus 10 according to the first embodiment of the present invention.
A schematic diagram showing the configuration of each processor element according to the same embodiment.
A schematic diagram showing the concept of the local memory according to the same embodiment.
A schematic diagram showing the functional configuration of the pipeline control program according to the same embodiment.
A schematic diagram showing the state transitions of the stage buffer according to the same embodiment.
A diagram for explaining the initialization processing in the pipeline processing apparatus 10 according to the same embodiment.
A diagram showing the procedure for receiving a STANDBY notification according to the same embodiment.
A diagram showing the procedure for stage end processing according to the same embodiment.
A diagram showing the procedure for DMA completion processing according to the same embodiment.
A diagram for explaining the differences from conventional pipeline processing.
A schematic diagram showing the configuration of the pipeline processing apparatus 10S according to the second embodiment of the present invention.
A schematic diagram showing the configuration of each processor element according to the same embodiment.
A schematic diagram showing the configuration of each processor element according to the third embodiment of the present invention.
A schematic diagram for explaining the operation of the pipeline processing apparatus 10T according to the same embodiment.
A diagram showing the processing procedure when a timer interrupt occurs according to the same embodiment.
A diagram showing the procedure for DMA completion processing according to the same embodiment.
A schematic diagram for explaining general pipeline processing.

Explanation of symbols

10 ... pipeline processing apparatus, 11 ... shared memory, 12 ... bus, 13, 14 ... input/output ports, 21 ... local memory, 22 ... arithmetic unit, 23 ... DMAC, 31 ... data processing unit, 32 ... status information setting unit, 33 ... transfer request unit, 34 ... permission response unit, 35 ... path information writing unit, 36 ... execution code transfer control unit, 40 ... execution code duplication unit, 41 ... data processing unit, 43 ... transfer request unit, 46 ... execution code transfer control unit, PE1 to PE8 ... processor elements.

Claims

A pipeline processing apparatus comprising a plurality of processor elements, each having local storage means for storing execution code and target data when a program for pipeline processing of the target data is divided into a plurality of stages, and path information storage means for storing path information, wherein
one processor element of the plurality of processor elements comprises:
means for individually allocating the execution code of each of the divided stages to the local storage means of each processor element when the application is initialized; and
path information writing means for determining a use order of the processor elements and writing path information according to the use order into the path information storage means of each processor element;
and each of the plurality of processor elements comprises:
data processing means for processing the target data with the execution code;
detection means for detecting that the execution code has finished processing a unit data amount of the target data; and
execution code transfer means for transferring the execution code to another processor element according to the path information when the detection means detects that the execution code has finished processing the unit data amount.
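In this claim, the execution code moves between processor elements along fixed path information, while the target data stays where it is. That arrangement can be sketched as a small Python simulation. Everything in it is an illustrative assumption, not a detail of the patent:

- There is one PE per stage.
- The use order is a ring in which each PE forwards its code to the previous PE.
- One integer serves as the unit data amount.
- The function name `simulate_code_migration` is hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

Stage = Callable[[int], int]

def simulate_code_migration(stages: List[Stage], inputs: List[int]) -> Tuple[List[int], int]:
    """Hypothetical model: stage code migrates between PEs, data stays put."""
    n = len(stages)  # assumption: exactly one PE per stage
    # Path information written at initialization (assumed ring use order):
    # PE i forwards its execution code to PE i-1, so PE i receives the
    # stages in ascending order i, i+1, ..., wrapping around modulo n.
    path = {pe: (pe - 1) % n for pe in range(n)}
    # Initialization: the code of stage k is allocated to PE k.
    code_on = {pe: pe for pe in range(n)}
    # Target data resident in each PE's local memory (None means empty).
    local: List[Optional[int]] = [None] * n
    queue = deque(inputs)
    outputs: List[int] = []
    steps = 0
    while queue or any(d is not None for d in local):
        for pe in range(n):
            stage = code_on[pe]
            if stage == 0 and local[pe] is None and queue:
                local[pe] = queue.popleft()  # a new data unit starts with stage 0
            if local[pe] is not None:
                local[pe] = stages[stage](local[pe])  # process one unit of data
                if stage == n - 1:
                    outputs.append(local[pe])  # final stage done: emit the result
                    local[pe] = None
        # Each unit is finished: every PE sends its code (not its data)
        # to the destination given by its path information.
        code_on = {path[pe]: code_on[pe] for pe in range(n)}
        steps += 1
    return outputs, steps
```

Take the three stages `x + 1`, `x * 2` and `x - 3`. Then the call `simulate_code_migration([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], [1, 2, 3, 4])` returns `([1, 3, 5, 7], 6)`. Each data unit stays in the local memory of the PE that loaded it, while the three stage codes rotate around the ring.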

A pipeline processing apparatus comprising a plurality of processor elements each having, corresponding to each stage, local storage means for storing execution code and target data when a program for pipeline processing of the target data is divided into a plurality of stages, path information storage means for storing path information, and status information storage means for storing status information, wherein
one processor element of the plurality of processor elements comprises:
means for allocating the execution code of the first stage of the application program divided into the plurality of stages to the local storage means of one processor element when the application is initialized;
and each of the plurality of processor elements comprises:
data processing means for processing the target data with the execution code;
means for setting the status information to an empty state when the execution code can be written into the local storage means;
means for sending a transfer request for the execution code to another processor element when the processing by the data processing means is completed;
means for sending a permission response to the other processor element when a transfer request for the execution code is received from the other processor element and the status information is set to the empty state;
path information writing means for writing, into the path information storage means, path information designating the processor element that sent the permission response as the next transfer destination, when a permission response is received from another processor element in response to a transfer request from the means for sending the transfer request; and
means for transferring the execution code to another processor element according to the path information written by the path information writing means.
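The request/permission handshake of this claim can likewise be sketched in Python. This is a hypothetical model, not the claimed implementation. The following are assumptions rather than details taken from the claim:

- the `PE` class and the `hand_off` function;
- the policy of accepting the first permission received;
- reserving the granting PE;
- marking the sender empty after the transfer.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PE:
    pe_id: int
    empty: bool = True            # status information: code can be written locally
    code: Optional[str] = None    # execution code held in local storage
    next_pe: Optional[int] = None # path information: next transfer destination

    def on_transfer_request(self) -> bool:
        """Send a permission response only while the status is empty."""
        if self.empty:
            self.empty = False  # assumption: reserve so later requesters are refused
            return True
        return False

def hand_off(src: PE, others: List[PE]) -> Optional[int]:
    """Hypothetical hand-off after `src` finishes its processing.

    Sends a transfer request to each other PE in turn, records the first
    PE that grants permission as the next destination in the path
    information, and moves the execution code there.
    """
    for pe in others:
        if pe.on_transfer_request():
            src.next_pe = pe.pe_id             # path information writing
            pe.code, src.code = src.code, None # transfer the execution code
            src.empty = True                   # assumption: sender can accept code again
            return pe.pe_id
    return None  # assumption: no permission, so keep the code and retry later
```

Set up `pes = [PE(0, empty=False, code="stage1"), PE(1, empty=False), PE(2)]` and call `hand_off(pes[0], pes[1:])`. The call returns `2`:

1. PE1 refuses, because its status is not empty.
2. PE2 sends the permission response.
3. PE0 records PE2 as its next transfer destination.
4. The stage-1 code moves to PE2.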

A pipeline processing method used in a pipeline processing apparatus comprising a plurality of processor elements, each having local storage means for storing execution code and target data when a program for pipeline processing of the target data is divided into a plurality of stages, and path information storage means for storing path information, the method comprising:
a step of individually allocating the execution code of each of the divided stages to the local storage means of each processor element when the application is initialized;
a path information writing step of determining a use order of the processor elements and writing path information according to the use order into the path information storage means of each processor element;
a data processing step of processing, in each processor element, the target data with the execution code;
a detection step of detecting that the execution code has finished processing a unit data amount of the target data; and
an execution code transfer step of transferring the execution code to another processor element according to the path information when the detection step detects that the execution code has finished processing the unit data amount.

A pipeline processing method used in a pipeline processing apparatus comprising a plurality of processor elements each having, corresponding to each stage, local storage means for storing execution code and target data when a program for pipeline processing of the target data is divided into a plurality of stages, path information storage means for storing path information, and status information storage means for storing status information, the method comprising:
a step of dividing the program into execution code for a plurality of stages when the application is initialized, and allocating the execution code of the first stage to the local storage means of one processor element;
a data processing step of processing, in each processor element, the target data with the execution code;
a step of setting the status information to an empty state when the execution code can be written into the local storage means;
a step of sending, in each processor element, a transfer request for the execution code to another processor element when the processing in the data processing step is completed;
a step of sending, in another processor element that has received the transfer request, a permission response to the processor element that sent the transfer request if the status information is set to the empty state;
a path information writing step of writing, into the path information storage means, path information designating the processor element that sent the permission response as the next transfer destination when a permission response to the transfer request is received; and
a step of transferring the execution code to another processor element according to the path information written in the path information writing step.

A pipeline processing program used in a pipeline processing apparatus comprising a plurality of processor elements, each individually provided with storage means, for pipeline processing of target data, the program causing one processor element of the plurality of processor elements to function as:
means for writing the execution code and the target data into the storage means when the program for pipeline processing is divided into a plurality of stages; and
means for writing path information into the storage means of each processor element according to a preset use order of the processor elements when the application is initialized;
and causing each of the plurality of processor elements to function as:
data processing means for processing the target data with the execution code;
detection means for detecting that the execution code has finished processing a unit data amount of the target data; and
execution code transfer means for transferring the execution code to another processor element according to the path information when the detection means detects that the execution code has finished processing the unit data amount.

A pipeline processing program used in a pipeline processing apparatus comprising a plurality of processor elements, each individually provided with storage means, for pipeline processing of target data, the program causing
one processor element of the plurality of processor elements to function as:
means for writing the execution code and the target data into the storage means when the program for pipeline processing is divided into a plurality of stages;
and causing each of the plurality of processor elements to function as:
data processing means for processing the target data with the execution code;
means for setting the status information to an empty state when execution code can be written into the storage means;
means for sending a transfer request for the execution code to another processor element when the processing by the data processing means is completed;
means for sending a permission response to the other processor element when a transfer request for the execution code is received from the other processor element and the status information is set to the empty state;
means for writing, into the storage means, path information designating the processor element that sent the permission response as the next transfer destination, when a permission response is received from another processor element in response to the transfer request; and
means for transferring the execution code to another processor element according to the path information written in the storage means.