JP2002007359A

JP2002007359A - SIMD-controlled parallel processing method and device

Info

Publication number: JP2002007359A
Application number: JP2000186226A
Authority: JP
Inventors: Toru Kurata; 徹倉田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-06-21
Filing date: 2000-06-21
Publication date: 2002-01-11

Abstract

PROBLEM TO BE SOLVED: To increase effective memory capacity and processing speed by making effective use of processor elements (PEs) that do not actually contribute to processing. SOLUTION: In SIMD control, in which a plurality of one-dimensionally arranged PEs are controlled by the same instruction, each PE is given an ID code that is assigned repeatedly, in the same sequence, to every group 2 of PEs (for example PE0, PE1 and PE2) and that can specify any PE within its group. Data are input to PE0 to PE2 and the same processing is executed. An intermediate result of the processing in the specified PE1 of each group 2 is stored in the other PEs, PE0 and PE2, designated relative to a specific identification code (ID=1); the stored intermediate result is read back, again relative to ID=1, and used in subsequent processing. The processing result is selected and output relative to ID=1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a parallel processing device using SIMD (Single Instruction Multiple Data) control, which executes massively parallel processing on a large number of PEs (Processor Elements) with a single instruction, and to a control method therefor.

[0002]

[Description of the Related Art] In a SIMD-controlled massively parallel processor, a plurality of processor elements (PEs) are arranged one-dimensionally and controlled by a single instruction. Conventionally, such processors never assigned the same data to several PEs: the relationship between data and PEs was always one-to-one or one-to-many. These processors also provide inter-PE communication, but that function was provided only for cases where one process uses the result of another PE.

[0003]

[Problems to be Solved by the Invention] Conventional SIMD-controlled massively parallel processors often suffer from a shortage of local memory (LM) and a shortage of processing steps. In addition, whereas the number of PEs is fixed by the structure of a SIMD-controlled processor, the length of a data stream does not always match that number, so many PEs may never actually be used and are wasted. There are also many kinds of processing in which predetermined different operations are applied periodically within a data stream. Under SIMD control, when one PE out of every predetermined number is performing valid processing, the other PEs execute the same procedure, but in such periodic processing their results are never output. Their processing therefore contributes nothing to the useful work, which is another source of waste.

[0004] An object of the present invention is to realize, with the hardware configuration left unchanged, processing that makes effective use of the PEs otherwise wasted on such data streams and thereby substantially multiplies the local memory available. Another object of the present invention is, for processing such as filtering in which exactly the same operation is repeated with only the parameters differing, to make effective use of PEs that conventionally contributed nothing to calculating the output data, and thereby to realize highly efficient processing in which the number of parallel operations is substantially multiplied, again with the hardware configuration left unchanged.

[0005]

[Means for Solving the Problems] A SIMD-controlled parallel processing method according to a first aspect of the present invention controls a plurality of one-dimensionally arranged PEs with a single instruction. Each PE is given an identification code that is assigned repeatedly, in the same sequence, to every group consisting of a predetermined number of PEs, and that can specify any PE within its group. Data are input to the PEs and the same processing is executed on all of them. In a specific PE within each group, an intermediate result of the processing is stored in another PE designated relative to a specific identification code, and the stored intermediate result is later read out, again by designation relative to that identification code, and used in subsequent processing. Preferably, the result of the processing is selected for output by designation relative to the specific identification code.

[0006] A SIMD-controlled parallel processing method according to a second aspect of the present invention controls a plurality of one-dimensionally arranged PEs with a single instruction. Each PE is given an identification code that is assigned repeatedly, in the same sequence, to every group consisting of a predetermined number of PEs, and that can specify any PE within its group. Data and parameters are input to the PEs and the same processing is executed on all of them; some of the PEs' processing results, designated relative to the identification code, are combined in an integration process; and the result of the integration process is selected for output by designation relative to the identification code.

[0007] A SIMD-controlled parallel processing device according to a third aspect of the present invention has a plurality of PEs arranged one-dimensionally, each including a memory unit that stores information and a processing unit that executes processing based on the information stored in the memory unit, with adjacent PEs able to exchange data with each other; and a SIMD control circuit that performs SIMD control of the PEs with a single instruction. The SIMD control includes control for: giving each PE an identification code that is assigned repeatedly, in the same sequence, to every group of a predetermined number of PEs and that can specify any PE within its group; inputting data to the PEs and having them execute the same processing; in a specific PE within each group, storing an intermediate result of the processing in another PE designated relative to a specific identification code; and reading the stored intermediate result out, by designation relative to that identification code, for use in subsequent processing. Preferably, the SIMD control further includes control for selecting the result of the processing for output by designation relative to the specific identification code.

[0008] A SIMD-controlled parallel processing device according to a fourth aspect of the present invention has a plurality of PEs arranged one-dimensionally, each including a memory unit that stores information and a processing unit that executes processing based on the information stored in the memory unit, with adjacent PEs able to exchange data with each other; and a SIMD control circuit that performs SIMD control of the PEs with a single instruction. The SIMD control includes control for: giving each PE an identification code that is assigned repeatedly, in the same sequence, to every group of a predetermined number of PEs and that can specify any PE within its group; inputting data and parameters to the PEs and having them execute the same processing; combining some of the PEs' processing results, designated relative to the identification code, in an integration process; and selecting the result of the integration process for output by designation relative to the identification code.

[0009] In the SIMD-controlled parallel processing method and device according to the present invention, assigning the identification codes in a repeating sequence divides the PE array into groups of a predetermined number of PEs, and the PEs within each group are operated as one virtual PE.

[0010] Specifically, in the method and device according to the first and third aspects, the memory units of several PEs are virtually allocated to, and used for, a single data item. Writes to and reads from these memory units designate the memory unit on the basis of the identification code. With such ID-based designation, only the reference PE writes to and reads from the memory units correctly; in the other PEs the write destination is not designated, or data are read from a different location, so their processing results are wrong. Correct output data are therefore obtained only from the reference PE. With this control, if each PE group contains n PEs, the number of data items the whole parallel processing device can handle at once falls to 1/n, but the memory capacity available for each data item increases to almost n times. This virtual memory expansion technique suits processing that needs output data only intermittently but requires a large amount of memory while it runs. Moreover, because it is implemented by programming the SIMD control circuit, the hardware configuration of a conventional SIMD-controlled parallel processing device need not be changed at all.
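The trade-off above is plain arithmetic. The following Python sketch makes it concrete under stated assumptions: every PE has the same local-memory size, PEs left over after forming whole groups are unused, and the PE count (960) and memory size (256 bytes) are purely illustrative values, not figures from this patent. The per-item figure is an upper bound, since the text says only "almost n times" (for example, part of each LM also holds the ID number).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupingTradeoff:
    items_per_pass: int     # data items processed in one SIMD pass
    lm_per_item_bytes: int  # local memory usable per data item (upper bound)


def grouping_tradeoff(num_pes: int, lm_bytes: int, n: int) -> GroupingTradeoff:
    """Throughput vs. per-item memory when n PEs act as one virtual PE.

    Assumes every PE has lm_bytes of local memory and that the
    num_pes % n leftover PEs are marked access-prohibited and unused.
    """
    if n < 1:
        raise ValueError("group size n must be >= 1")
    groups = num_pes // n
    return GroupingTradeoff(items_per_pass=groups, lm_per_item_bytes=n * lm_bytes)


# Illustrative numbers only.
baseline = grouping_tradeoff(num_pes=960, lm_bytes=256, n=1)
grouped = grouping_tradeoff(num_pes=960, lm_bytes=256, n=3)
print(baseline)  # GroupingTradeoff(items_per_pass=960, lm_per_item_bytes=256)
print(grouped)   # GroupingTradeoff(items_per_pass=320, lm_per_item_bytes=768)
```

With n = 3, one pass handles a third as many items, but each item can spread its temporaries over three local memories.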

[0011] In the method and device according to the second and fourth aspects, all PEs in the same group receive the same data but different parameters, so processing with different parameters is applied to the data fully in parallel. When the processing results are integrated on the basis of each PE's identification code, only the reference PE performs the integration correctly. Finally, the outputs of the PEs are selected by identification code, so that each group yields one correct processing result as output data. If each group is regarded as a single virtual PE, that virtual PE has greater processing capability than an individual PE. In effect, this technique reduces the number of PEs that produce valid output. In fields such as image processing, however, the pixels to be processed in parallel at one time are often intermittent, so the technique puts to good use the processing capability of PEs that conventionally contributed to processing only once every several passes. Moreover, this highly efficient SIMD-controlled processor, with its substantially increased number of parallel operations, requires no change whatsoever to the hardware configuration.
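As an illustration of the second and fourth aspects, the Python sketch below simulates one pass in which every group of three PEs computes one output of a 3-tap FIR filter. The filter, its coefficients and the name `simd_fir3` are hypothetical choices for illustration; the text refers only to "filtering" in general. Each PE in a group receives the same 3-sample window (the same data) but a different coefficient (the parameter, with the PE's ID selecting which tap it handles). All PEs run identical instructions; the integration step sums each PE's own term with those of its left and right neighbours; and output selection keeps only the ID = 1 results. Reads past either end of the array return 0.0, which is an assumption of the sketch.

```python
from typing import List, Sequence

GROUP = 3         # PEs per group, IDs 0, 1, 2
REFERENCE_ID = 1  # the PE whose integrated result is valid


def simd_fir3(windows: Sequence[Sequence[float]], coeffs: Sequence[float]) -> List[float]:
    """Simulate one SIMD pass where each group of 3 PEs yields one 3-tap FIR output.

    windows[g] is the 3-sample window given identically to every PE of group g;
    coeffs[k] is the parameter given to the PE whose ID is k.
    """
    if len(coeffs) != GROUP or any(len(w) != GROUP for w in windows):
        raise ValueError("expected 3 coefficients and 3-sample windows")
    num_pes = GROUP * len(windows)
    ids = [pe % GROUP for pe in range(num_pes)]
    data = [windows[pe // GROUP] for pe in range(num_pes)]  # same data within a group
    param = [coeffs[pe_id] for pe_id in ids]                # different parameter per ID

    # Identical instruction on every PE: weight the window sample selected by its ID.
    partial = [param[pe] * data[pe][ids[pe]] for pe in range(num_pes)]

    # Identical integration instruction: left neighbour + self + right neighbour.
    def read(pe: int) -> float:
        return partial[pe] if 0 <= pe < num_pes else 0.0

    integrated = [read(pe - 1) + partial[pe] + read(pe + 1) for pe in range(num_pes)]

    # Output selection by ID: only the reference PE summed terms from its own group.
    return [integrated[pe] for pe in range(num_pes) if ids[pe] == REFERENCE_ID]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
windows = [signal[i - 1:i + 2] for i in range(1, len(signal) - 1)]
print(simd_fir3(windows, coeffs=[0.25, 0.5, 0.25]))  # [2.0, 3.0, 4.0]
```

The ID 0 and ID 2 PEs also execute the integration, but they mix in a term from the neighbouring group, which is why their results are discarded.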

[0012]

[Description of the Preferred Embodiments] The present invention provides a SIMD control technique that is effective for, for example, a SIMD-controlled massively parallel image DSP (Digital Signal Processor) that treats one horizontal scan line of an image as one data-stream unit, maps the horizontal pixel data one-to-one onto the PEs, and processes them within the one horizontal scanning period before the next scan line. Embodiments of the invention are described below for concrete cases, with reference to the drawings and flowcharts.

[0013] First Embodiment. The first embodiment proposes a technique for a massively parallel SIMD-controlled processor in which one data item is allocated to a plurality of PEs and intermediate results of its processing are distributed among those PEs for storage, thereby substantially multiplying the local memory LM (Local Memory) available per data item.

[0014] First, the SIMD (Single Instruction Multiple Data)-controlled massively parallel digital processor assumed in this application is described. FIG. 1 shows the main configuration of the SIMD-controlled massively parallel processor according to the embodiment.

[0015] Inside the processor 1, a large number of PEs (Processor Elements) are arranged one-dimensionally. FIG. 1 shows only three consecutive PEs, PE0, PE1 and PE2, as part of that array. PE0, PE1 and PE2 each contain a local memory (LM0, LM1 and LM2 respectively) and logic circuitry. The processor 1 also has a SIMD control circuit 10 that performs SIMD control of the PE array.

[0016] In FIG. 1, the main parts of the logic circuitry and wiring are shown only for the central PE1, as a representative. Besides the local memories LM0, LM1 and LM2 and the ALU (Arithmetic Logic Unit) that any processor minimally needs, the SIMD-controlled massively parallel processor 1 of this embodiment requires logic circuits or wiring that provide several further functions.

[0017] The data paths 3, which let a PE such as PE1 exchange data with the local memories LM0 and LM2 of its neighbours PE0 and PE2, need only number at least one on each side (left and right) of each of PE0, PE1 and PE2, as shown in FIG. 1. Through them, the ALU of PE1 can read and process the data held in the local memories LM0 and LM2 of the adjacent PEs. Conversely, data that the ALU of PE1 has processed and stored in LM1 can be received by the adjacent PEs PE0 and PE2 and processed as their own data.
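A toy model can make the role of data path 3 concrete. The Python class below is a sketch built on assumptions (the class name `PEArray`, the dictionary-based local memory and the `edge_value` returned for reads past either end are all hypothetical): a broadcast instruction runs on every PE in lock step, and each PE's ALU may read its own LM or the LM of its immediate left or right neighbour at a given address.

```python
from typing import Callable, Dict, List


class PEArray:
    """Toy 1-D SIMD PE array: one local memory (address -> value) per PE,
    plus one data path to each adjacent PE."""

    def __init__(self, num_pes: int, edge_value: int = 0) -> None:
        self.lm: List[Dict[str, int]] = [{} for _ in range(num_pes)]
        self.edge_value = edge_value  # hypothetical value seen past either end

    def read(self, pe: int, offset: int, addr: str) -> int:
        """Read addr from the LM of PE (pe + offset), with offset in {-1, 0, +1}."""
        if offset not in (-1, 0, 1):
            raise ValueError("only the adjacent data paths are modelled")
        target = pe + offset
        if 0 <= target < len(self.lm):
            return self.lm[target][addr]
        return self.edge_value

    def broadcast(self, dst: str, fn: Callable[[int], int]) -> None:
        """One SIMD instruction: every PE computes fn(pe) and writes it to dst.
        All reads complete before any write, as in lock-step execution."""
        results = [fn(pe) for pe in range(len(self.lm))]
        for pe, value in enumerate(results):
            self.lm[pe][dst] = value


arr = PEArray(4)
arr.broadcast("x", lambda pe: 10 * pe)  # LM contents: 0, 10, 20, 30
arr.broadcast("y", lambda pe: arr.read(pe, -1, "x") + arr.read(pe, +1, "x"))
print([m["y"] for m in arr.lm])  # [10, 20, 40, 20]
```

Each PE sums the "x" values held in its neighbours' local memories; the two end PEs see `edge_value` in place of the missing neighbour.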

[0018] To simplify the following description, it is assumed here that exactly one data path 3 leads to each of the left and right neighbours, that is, to the local memories LM0 and LM2 of the adjacent PEs PE0 and PE2. Of course, a PE may instead have data paths to two or more adjacent PEs on one side; if that is too complex, an inter-ALU communication function can, for example, be provided and used to exchange data with distant PEs.

[0019] PE1 also has memory selection means 4, which selects one of the local memories LM0, LM1 and LM2 connected by the data paths 3 according to a value stored in its local memory LM1 (specifically, the ID number described later), and validity determination means 5, which determines whether the output data sent from the ALU are valid or invalid. In this embodiment, the memory selection means 4 and the validity determination means 5 are not newly added physical components; they are realized by a program of the SIMD control circuit 10 that uses the existing configuration. The memory selection means 4 is realized, for example, by a 1-bit memory-selection flag that drives the logic level of a control line (not shown) to 1 or 0, thereby granting the local memories LM0, LM1 and LM2 permission for reads and similar accesses. The validity determination means 5 is realized by using an existing function of an external block that processes the output data of PE0, PE1 and PE2, for example an output circuit (not shown), to control whether each item of output data sent to it is accepted.
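The following Python sketch models the two program-level mechanisms above as simple masks, assuming (hypothetically) that the memory-selection flag acts as a per-PE write enable derived from the stored ID, and that the output circuit's accept/reject control keys on the same ID. The function names are illustrative only; the patent does not specify how the flag or the output control is encoded.

```python
from typing import List, Optional, Sequence


def select_flags(ids: Sequence[int], target_id: int) -> List[int]:
    """1-bit memory-selection flag per PE: 1 where the stored ID equals target_id."""
    return [1 if pe_id == target_id else 0 for pe_id in ids]


def masked_store(lm: List[Optional[int]], values: Sequence[int], flags: Sequence[int]) -> None:
    """Broadcast store: every PE offers a value, but only flagged PEs write it to their LM."""
    for pe, (value, flag) in enumerate(zip(values, flags)):
        if flag:
            lm[pe] = value


def accept_outputs(outputs: Sequence[int], ids: Sequence[int], valid_id: int) -> List[int]:
    """Output-side validity check: keep only outputs from PEs whose ID is valid_id."""
    return [out for out, pe_id in zip(outputs, ids) if pe_id == valid_id]


ids = [0, 1, 2, 0, 1, 2]
lm: List[Optional[int]] = [None] * len(ids)
masked_store(lm, [10, 11, 12, 13, 14, 15], select_flags(ids, target_id=0))
print(lm)                                                # [10, None, None, 13, None, None]
print(accept_outputs([10, 11, 12, 13, 14, 15], ids, 1))  # [11, 14]
```

Both functions leave the per-PE computation untouched; they only gate what is written or accepted, which matches the text's point that no new physical hardware is needed.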

[0020] Thus, the SIMD-controlled massively parallel processor 1 to which the present invention applies is assumed to have, besides ordinary ALUs, a data communication function between PEs (in particular, a function that uses this communication to read data for a PE's processing from the local memory of another PE) and a function that controls which PE outputs are accepted or discarded.

[0021] FIG. 2 is a flowchart showing an example of specific processing performed by the SIMD-controlled massively parallel processor 1 configured as above.

[0022] First, in step ST0, in order to treat a plurality of consecutive PEs PE0, PE1 and PE2 as one virtual group 2, a unique ID number is set in each of the local memories LM0, LM1 and LM2 within each PE group 2, ..., using the inter-PE data communication function. In this ID setting, the three illustrated PEs PE0, PE1 and PE2 are given the ID numbers ID₀, ID₁ and ID₂ of 0, 1 and 2 via the data paths 3 that connect the PEs. The same sequence is repeated for the other groups, giving 0, 1, 2, 0, 1, 2, 0, 1, .... The ID number, unique within each group, is stored at the same local-memory address in every PE.

[0023] A more specific ID-setting method is as follows. First, the hardware is put into a state in which the leftmost of all the PEs, including PE0, PE1 and PE2 of FIG. 1, can always complete a read when it attempts to read from a PE further to its left. For example, when it is judged optimal for the processing at hand to form groups of n PEs but the total number of PEs is not divisible by n, some of the PEs at the left and/or right end are given an ID number meaning that access is prohibited. This prevents a situation in which a PE, during processing, has no PE to read from. Then, as a first step, the local-memory location at the address that holds the ID number is cleared in every PE.

[0024] Next, ID = 0 is written into the ID-number address of the leftmost PE among the processable PEs whose access is not prohibited. Then, exploiting the nature of SIMD control, a single instruction is broadcast to all the other PEs: “take the ID of the left-neighbouring PE plus 1 as your own ID, and wrap back to 0 when the value reaches 3.” As a result, following the first ID write (ID = 0), the ID numbers are fixed one after another from the left as 1, 2, 0, 1, 2, 0, 1, .... With this method, however many PEs there are, the regular ID sequence propagates all the way to the rightmost PE, and the last ID number is always 2. The fixed ID numbers are stored at a predetermined address in each local memory.
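The ID-setting procedure of [0022] to [0024] can be sketched in Python as follows. Assumptions: `PROHIBITED = -1` stands in for the access-prohibited ID; the leftover PEs are placed at the right end (the text allows the left and/or right end); an unset ID is represented as `None` rather than as the cleared value; and the single broadcast instruction is simply repeated until no ID changes, since the text says the IDs are fixed one after another from the left but not how many instruction cycles that takes.

```python
from typing import List, Optional

PROHIBITED = -1  # hypothetical ID meaning "access prohibited"


def assign_ids(num_pes: int, group_size: int = 3) -> List[Optional[int]]:
    """Sketch of step ST0: give every PE a repeating in-group ID 0..group_size-1."""
    usable = (num_pes // group_size) * group_size
    ids: List[Optional[int]] = [None] * num_pes  # ID slot cleared in every PE
    for pe in range(usable, num_pes):
        ids[pe] = PROHIBITED                     # leftover PEs (right end, by assumption)
    if usable == 0:
        return ids
    ids[0] = 0                                   # seed the leftmost usable PE

    # Broadcast the same instruction each cycle:
    # "my ID = (left neighbour's ID + 1) mod group_size".
    # A PE whose left neighbour is still unset leaves its ID unchanged.
    while True:
        new_ids = ids[:]
        for pe in range(1, usable):
            left = ids[pe - 1]
            if left is not None:
                new_ids[pe] = (left + 1) % group_size
        if new_ids == ids:
            return ids
        ids = new_ids


print(assign_ids(8))  # [0, 1, 2, 0, 1, 2, -1, -1]
```

Because the usable count is a multiple of the group size, the last usable PE always ends with ID 2, matching the text.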

[0025] Next, in step ST1, the input data are delivered to only one specific PE in each group. For example, the three consecutive PEs PE0, PE1 and PE2 of FIG. 1, whose ID numbers ID₀, ID₁ and ID₂ are 0, 1 and 2 from left to right, form one PE group 2, and data are input only to a specific PE within each PE group 2. Here, valid data are input only to PE1, the central one of the three.
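Step ST1 amounts to a strided scatter. In the Python sketch below, the k-th input sample goes to the k-th PE whose ID is 1. Every other PE is marked `None` to show that it holds no valid data; on real hardware those PEs would simply hold don't-care values. The function name `distribute` and the choice to reject a stream longer than the number of groups are assumptions.

```python
from typing import List, Optional, Sequence


def distribute(stream: Sequence[int], ids: Sequence[int], target_id: int = 1) -> List[Optional[int]]:
    """Step ST1 sketch: the k-th sample goes to the k-th PE whose ID equals target_id.

    Every other PE is left as None ("no valid data"); on hardware it would hold a
    don't-care value and still execute the broadcast instructions.
    """
    targets = [pe for pe, pe_id in enumerate(ids) if pe_id == target_id]
    if len(stream) > len(targets):
        # Assumption: longer streams are split into several passes elsewhere.
        raise ValueError("more samples than groups in this pass")
    inputs: List[Optional[int]] = [None] * len(ids)
    for sample, pe in zip(stream, targets):
        inputs[pe] = sample
    return inputs


print(distribute([7, 8], ids=[0, 1, 2, 0, 1, 2]))  # [None, 7, None, None, 8, None]
```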

[0026] Processing then moves to arithmetic processing 1 in step ST2. First, a single common processing instruction is issued by SIMD control to all PEs, including PE0, PE1 and PE2. As noted above, however, only the central PE1 has actually received valid data. Each of PE0, PE1 and PE2 then executes arithmetic processing 1 according to the given instruction.

[0027] Suppose that partway through this processing the central PE1 wants to save an intermediate result as temporary data in its own local memory LM1, but doing so would bring that memory close to its capacity limit. In such a case, the memory areas of the local memories LM0 and LM2 of the left and right PEs PE0 and PE2, which are not processing valid data, are used for temporary data storage. Even when LM1 has ample free space, it may be desirable to reserve that space, for example for temporarily saving data during later synthesis processing; in that case, too, LM0 and LM2 are used temporarily. Either way, the operation result reached partway through the processing is output as intermediate result 1. Intermediate result 1 here (and likewise intermediate result 2, described later) is assumed to be a result that the processing immediately following does not need, such as the result obtained once several routine passes of an iterative calculation have completed. In many cases, though, the result to be saved will inevitably be one that subsequent processing uses. In that case, although this is not mentioned specifically below, the intermediate result is saved through the following steps ST3, ST4 and so on, and is then read back from LM0 or LM2 whenever it is needed and used in processing, just as an ordinary SIMD-controlled processor with a communication function would do.

[0028] Before intermediate result 1 is saved, in step ST3 each of PE0, PE1 and PE2 first determines whether its own ID number matches a predetermined ID number other than that of PE1, the PE that received valid data. In this example, each PE first checks whether its ID matches the predetermined ID number ID₀ = 0. The central PE1 under consideration and its right neighbour PE2 both have non-matching ID numbers, so they hold intermediate result 1 and wait until another PE requests that it be handed over. If no such request arrives within a predetermined period, they continue processing.

[0029] The left PE0, on the other hand, has a matching ID number, so in step ST4 it saves intermediate result 1 in accordance with the common processing instruction. This instruction contains a directive such as “if your own ID is m − 1 (m: the ID number of the PE that received valid data), save intermediate result 1 of PEm in LM(m − 1).” PE0 therefore requests intermediate result 1 from its neighbour PE1, stores the received result in its own local memory LM0, and proceeds to the next processing step. PE1 proceeds to the next processing step once it has handed over intermediate result 1.
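Steps ST3 and ST4 can be sketched in Python as one broadcast, masked instruction followed by a later read-back. Every PE computes some intermediate result 1, but only the ID = 1 results are meaningful. Each PE whose ID is m − 1 = 0 copies its right neighbour's result into its own LM; later, every PE reads `inter1` from its left neighbour's LM, and only the reference PEs get their own result back. The dictionary-based LM, the key name `inter1` and the function names are hypothetical, and the wait-for-request handshake is omitted.

```python
from typing import Dict, List, Optional, Sequence

M = 1  # ID of the PE in each group that received valid data (the reference PE)


def spill_left(ids: Sequence[int], inter1: Sequence[int]) -> List[Dict[str, int]]:
    """ST3/ST4 sketch. Every PE executes: 'if my ID == M - 1, fetch my right
    neighbour's intermediate result 1 over the data path and store it in my LM'."""
    lm: List[Dict[str, int]] = [{} for _ in ids]
    for pe, pe_id in enumerate(ids):
        if pe_id == M - 1 and pe + 1 < len(ids):
            lm[pe]["inter1"] = inter1[pe + 1]
    return lm


def reload_from_left(lm: List[Dict[str, int]], pe: int) -> Optional[int]:
    """Later broadcast step: every PE reads 'inter1' from its left neighbour's LM.
    None marks a read that finds nothing (real hardware would return unrelated data)."""
    return lm[pe - 1].get("inter1") if pe >= 1 else None


ids = [0, 1, 2, 0, 1, 2]
inter1 = [100, 101, 102, 103, 104, 105]  # only the ID == 1 entries are meaningful
lm = spill_left(ids, inter1)
reloaded = [reload_from_left(lm, pe) for pe in range(len(ids))]
print(reloaded)  # [None, 101, None, None, 104, None]
```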

[0030] In the same manner, each element processor PE0, PE1 and PE2 then executes arithmetic processing 2 in step ST5. If temporary data must be saved again during this processing, the ID-match check and the data storage are carried out as described above. In step ST6, each processor determines whether its own ID number matches another ID number (ID = 2) that was not used in step ST3. The central element processor PE1 of interest and its left neighbor PE0 both have non-matching ID numbers. They therefore hold on to intermediate result 2 and wait in a standby state until another element processor requests that intermediate result 2 be swept out. If no sweep request arrives within a predetermined period, they continue processing.

[0031] The right element processor PE2, on the other hand, has a matching ID number, so in step ST7 it stores intermediate result 2 according to the common processing instruction. This instruction contains a directive such as: "If your own ID is ID = m + 1 (m: the ID number of the PE that received the valid data), store intermediate result 2 of PEm in LM(m + 1)." Element processor PE2 therefore requests intermediate result 2 from its neighbor PE1, saves the received result in its own local memory LM2, and then proceeds to the next processing step. Element processor PE1 proceeds to the next processing step as soon as it has sent out intermediate result 2.
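
Steps ST3 to ST7 can be simulated in Python to make the "match your ID, then pull from the neighbor" rule concrete. This is a minimal sketch under stated assumptions, not the patent's implementation. The loop over PEs stands in for one broadcast instruction, each local memory is a plain list, and IDs are assigned as `pe % 3`. The names `assign_ids` and `store_intermediate` are hypothetical.

```python
from typing import List


def assign_ids(num_pes: int, group_size: int = 3) -> List[int]:
    """ST0: repeat the ID pattern 0, 1, ..., group_size - 1 along the array."""
    return [pe % group_size for pe in range(num_pes)]


def store_intermediate(ids: List[int], intermediate: List[str],
                       local_mem: List[List[str]], m: int, offset: int) -> None:
    """One common instruction, applied identically to every PE.

    The loop stands in for the parallel SIMD broadcast. A PE whose own ID
    equals m + offset copies the intermediate result of its neighbor (the
    PE with ID m) into its own local memory. offset = -1 models ST3/ST4,
    offset = +1 models ST6/ST7. PEs whose IDs do not match keep their
    result; the standby/timeout behavior is not modelled.
    """
    for pe, own_id in enumerate(ids):
        src = pe - offset  # neighbor one step toward the PE with ID m
        if own_id == m + offset and 0 <= src < len(ids):
            local_mem[pe].append(intermediate[src])


ids = assign_ids(6)                        # two groups: [0, 1, 2, 0, 1, 2]
lm: List[List[str]] = [[] for _ in ids]    # hypothetical local memories
r1 = [f"r1@PE{pe}" for pe in range(6)]     # intermediate result 1 of each PE
r2 = [f"r2@PE{pe}" for pe in range(6)]     # intermediate result 2 of each PE
store_intermediate(ids, r1, lm, m=1, offset=-1)  # ST3/ST4
store_intermediate(ids, r2, lm, m=1, offset=+1)  # ST6/ST7
print(lm)
# [['r1@PE1'], [], ['r2@PE1'], ['r1@PE4'], [], ['r2@PE4']]
```

With m = 1, the central PEs (PE1 and PE4) end up with their two intermediate results held in the local memories of their left and right neighbors.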

[0032] At this point, consider the central element processor PE1 among the three element processors PE0, PE1 and PE2 of a PE group 2. The local memories LM0 and LM2 of its left and right neighbors PE0 and PE2 hold intermediate results 1 and 2, respectively, of the arithmetic processing that PE1 itself performed. For element processor PE1, it is therefore as if LM0 and LM2 functioned as part of its own local memory LM1. The other two element processors PE0 and PE2 likewise compute intermediate results 1 and 2. However, they were never given valid data, and they cannot correctly execute arithmetic processing that needs such a large memory space, because they have nowhere to keep intermediate results 1 and 2.

[0033] It follows that if one PE group 2 is regarded as a single virtual element processor PE, that virtual processor has three times the local memory capacity of an original element processor.

[0034] In the next step ST8, the element processors PE0, PE1 and PE2 perform the final arithmetic processing 3. If the result of arithmetic processing 3 is itself the final output data, steps ST9 and ST10 are skipped and the processing flow proceeds to step ST11.

[0035] If the result of arithmetic processing 3 is to be combined with one of the previously computed intermediate results 1 and 2, combining process 1 is executed in the next step ST9. When intermediate result 1 is used in combining process 1, it is read from the element processor whose ID number is (own ID number − 1). When intermediate result 2 is used in combining process 1, it is read from the element processor whose ID number is (own ID number + 1). These processing instructions are common to all element processors PE0, PE1 and PE2. As a result, PE0 and PE2, the element processors other than PE1 (the one that used the valid data), cannot read a proper intermediate result. Take element processor PE2 as an example. As explained above, it could not store its intermediate results in its neighboring element processors; if its own memory had spare capacity, intermediate results 1 and 2 would be stored there. Nevertheless, the common instruction causes PE2 to erroneously read a completely unrelated intermediate result from its neighbor PE1 (or PE4, if present). The result of combining process 1 is therefore entirely unintended. The same applies to the other element processor, PE0.
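
The effect of the common read instruction can be sketched the same way. The slot contents below are hypothetical placeholders, and `synthesis_reads` is an illustrative name. The sketch assumes that "own ID − 1" and "own ID + 1" resolve to the physically adjacent PEs, which is how the erroneous reads described above arise.

```python
from typing import List, Optional, Tuple


def synthesis_reads(saved: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Model of the common read instruction in ST9/ST10.

    Every PE fetches intermediate result 1 from the PE whose ID is its own
    ID - 1 (here: the left neighbor) and intermediate result 2 from the PE
    whose ID is its own ID + 1 (the right neighbor). A missing neighbor at
    either end of the array reads as None.
    """
    def fetch(src: int) -> Optional[str]:
        return saved[src] if 0 <= src < len(saved) else None

    return [(fetch(pe - 1), fetch(pe + 1)) for pe in range(len(saved))]


# Hypothetical contents of the saved-result slot of each local memory after
# ST7, for two groups of three in which PE1 and PE4 received valid data.
saved = ["r1 of PE1", "scratch of PE1", "r2 of PE1",
         "r1 of PE4", "scratch of PE4", "r2 of PE4"]

for pe, (res1, res2) in enumerate(synthesis_reads(saved)):
    print(f"PE{pe}: result1={res1!r}, result2={res2!r}")
```

Only the PEs with ID 1 (PE1 and PE4) receive their own two intermediate results. PE2, for instance, picks up PE1's unrelated scratch data and the next group's result 1.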

[0036] Because SIMD control issues a common instruction in this way, only the central element processor PE1, which was given valid data, can read the correct data. This guarantees that combining process 1 executes correctly on PE1. If the result of combining process 1 is to be the final output data, the processing flow skips the next step ST10.

[0037] If the result of combining process 1 is further to be combined with another intermediate result that has not yet been combined, combining process 2 is executed in the next step ST10. As with combining process 1, only the central element processor PE1 reads the data correctly here, and the other element processors PE0 and PE2 read incorrect data. A correct result of combining process 2 is therefore output only by the central element processor PE1.

[0038] Next, in steps ST11 to ST13, the validity of the output data is determined. As described above, only the central element processor PE1 obtains a correct operation result (or combined result), so at first glance a validity determination might seem unnecessary. However, when the SIMD-controlled multi-parallel processor outputs data to the outside, only correct data may be output, and this requires some form of output selection. One form is the operation of a so-called output pointer circuit, in which the output unit prepares a validity flag and sets it to 1 (valid) or 0 (invalid). Validity determination is not limited to this, however. In a broad sense it also covers, for example, the output unit simply ignoring results from element processors other than PE1. Furthermore, where processing cannot be performed without using two or more local memories, the common instruction can include "stop processing when your own memory reaches its limit". In that case, no output data at all arrives from any processor other than the central element processor PE1. Such control is also a kind of validity determination in this sense.

[0039] In the example of FIG. 2, all ID numbers in the PE group 2 are checked in step ST11. Only where ID = 1 is the output data of that element processor PE1 treated as valid in step ST12 and output in step ST13. The output data of the other element processors PE0 and PE2, whose ID is not 1, is judged invalid in step ST14. It is then either not output in step ST13, or output after invalidation processing in step ST14. Output data that has undergone invalidation processing is not output to the outside.
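
The ID-based selection of steps ST11 to ST14 reduces to a filter over the PE array. This is a minimal sketch that assumes the flag convention of [0038] (1 valid, 0 invalid). The function names and output strings are hypothetical.

```python
from typing import List, Sequence


def validity_flags(ids: Sequence[int], valid_id: int = 1) -> List[int]:
    """ST11/ST12/ST14: flag 1 (valid) for PEs whose ID equals valid_id, else 0."""
    return [1 if own_id == valid_id else 0 for own_id in ids]


def select_valid(ids: Sequence[int], outputs: Sequence[str], valid_id: int = 1) -> List[str]:
    """ST13: pass only flagged outputs to the outside; invalid ones are dropped."""
    return [out for flag, out in zip(validity_flags(ids, valid_id), outputs) if flag]


ids = [0, 1, 2, 0, 1, 2]
outputs = ["junk", "group A result", "junk", "junk", "group B result", "junk"]
print(validity_flags(ids))         # [0, 1, 0, 0, 1, 0]
print(select_valid(ids, outputs))  # ['group A result', 'group B result']
```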

[0040] FIG. 3 illustrates the data flow in this embodiment. The description above covers the case in which valid data is input to the central element processor PE1, but valid data may of course be input to one of the end element processors PE0 or PE2. In that case there may be no element processor available to store an intermediate result. One remedy is to change the PE grouping at the start of processing, in step ST0, so that the element processor receiving valid data is always in the center. Alternatively, the situation can be avoided by letting the ID numbers 0, 1 and 2 follow a cyclic rule. For example, suppose the element processor that received valid data is the leftmost one, PE0. Intermediate result 1 is stored under the instruction given above, "If your own ID is ID = m − 1 (m: the ID number of the PE that received the valid data), store intermediate result 1 of PEm in LM(m − 1)." The target then becomes ID = −1. No such ID number exists, so intermediate result 1 would not be stored. If the ID numbers are made cyclic to avoid this, then ID = −1 = 2, and element processor PE2 stores intermediate result 1. Likewise, the term (own ID number − 1) in the read instruction evaluates to −1 = 2, so intermediate result 1 can be read correctly. An advantage of this method is that, when the processing flow of FIG. 2 is repeated within a PE group, step ST0 can be omitted from the second iteration onward.
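
The cyclic ID rule is modular arithmetic on the group size. In this sketch (`cyclic_id` is a hypothetical name), the same expression serves both the write rule and the read rule. That is why the reader finds intermediate result 1 exactly where the writer stored it.

```python
def cyclic_id(m: int, offset: int, group_size: int = 3) -> int:
    """ID reached from ID m by stepping `offset` positions, wrapping within
    the group. Python's % returns a non-negative result for a positive
    modulus, so -1 maps to group_size - 1."""
    return (m + offset) % group_size


# Valid data enters the leftmost PE (m = 0).
store_at = cyclic_id(0, -1)   # writer rule "own ID == m - 1"  -> ID 2
read_from = cyclic_id(0, -1)  # reader rule "own ID - 1" on PE0 -> ID 2
print(store_at, read_from)    # 2 2
print(cyclic_id(2, +1))       # 0: the right-end case wraps to ID 0
```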

[0041] In the description above, the three element processors PE0, PE1 and PE2 are treated as one PE group 2. If the data communication function between the local memories of adjacent element processors is used, however, there is in theory no upper limit on the number of element processors in a PE group 2. To reference data in the local memory of a more distant element processor, the element processors in between simply relay the data in bucket-brigade fashion. No additional hardware data path 3 is needed.
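
The bucket-brigade relay can be modelled as a value hopping one PE per step over neighbor links only. This sketch assumes a single transfer register per PE, which is a hypothetical simplification, and it records a snapshot after each hop. The number of hops equals the distance between source and destination.

```python
from typing import List, Optional


def bucket_brigade(num_pes: int, src: int, dst: int, value: str) -> List[List[Optional[str]]]:
    """Move `value` from PE src to PE dst using only neighbor-to-neighbor
    transfers; the intermediate PEs merely forward it. Returns a snapshot
    of the transfer registers after each hop (the first entry is the
    starting state)."""
    regs: List[Optional[str]] = [None] * num_pes
    regs[src] = value
    step = 1 if dst > src else -1
    trace = [list(regs)]
    pos = src
    while pos != dst:
        regs[pos + step] = regs[pos]  # one hop over the existing neighbor link
        regs[pos] = None              # the relaying PE keeps no copy
        pos += step
        trace.append(list(regs))
    return trace


for snapshot in bucket_brigade(5, 0, 3, "x"):
    print(snapshot)
# ['x', None, None, None, None]
# [None, 'x', None, None, None]
# [None, None, 'x', None, None]
# [None, None, None, 'x', None]
```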

[0042] In the first embodiment, the local memories LM0, LM1 and LM2 of the element processors PE0, PE1 and PE2 are virtually allocated to a single datum. If the number of element processors in a PE group 2 is n, the number of data items the SIMD control processor 1 can process at one time therefore falls to 1/n. In exchange, the local memory available to each datum increases by a factor of roughly n. This virtual memory expansion is achieved by the program of the control unit. It is therefore extremely useful, because it can be realized without any change to the hardware configuration of a conventional SIMD control processor.
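
The trade-off stated above is simple arithmetic. The sketch below uses hypothetical figures (960 PEs, 1 KiB of local memory each), not values from the patent. It also ignores any local memory spent on storing the ID numbers. The function name is hypothetical.

```python
from typing import Tuple


def virtual_pe_resources(num_pes: int, group_size: int, lm_bytes: int) -> Tuple[int, int]:
    """Return (data items per pass, local memory per datum in bytes) when
    every `group_size` adjacent PEs are pooled into one virtual PE."""
    data_per_pass = num_pes // group_size      # roughly 1/n of the PE count
    memory_per_datum = group_size * lm_bytes   # roughly n times one PE's LM
    return data_per_pass, memory_per_datum


# Hypothetical figures, not from the patent: 960 PEs with 1 KiB of LM each.
print(virtual_pe_resources(960, 1, 1024))  # (960, 1024)  conventional use
print(virtual_pe_resources(960, 3, 1024))  # (320, 3072)  groups of three
```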

[0043] Second Embodiment
In the second embodiment, one valid datum is assigned to each of several element processors. At the same time, each element processor that shares that datum receives a different parameter, and all of them process according to a common instruction code. This proposes a method that effectively multiplies the number of processing steps available per datum.

[0044] This processing uses a SIMD control processor with the same structure as in FIG. 1. As a concrete example, shown in FIG. 4, the processing consists of three parallel FIR filtering operations P1 and a mixing operation P2 that mixes their results. In FIR filtering P1, the data at several different sampling points of the original image are weighted by predetermined coefficients. In the subsequent mixing P2, the weighted data are combined. In the combined image data, the image information missing at the new sampling points has been interpolated.
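
For reference, the computation that one PE group performs as a whole can be written sequentially. This sketch assumes a direct-form FIR filter with zero initial state and models mixing P2 as a plain sum. The coefficient values are hypothetical, and the patent does not specify the mixing rule.

```python
from typing import List, Sequence


def fir(samples: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Direct-form FIR filter y[k] = sum_j coeffs[j] * samples[k - j],
    treating samples before the start of the sequence as zero."""
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for j, c in enumerate(coeffs):
            if k - j >= 0:
                acc += c * samples[k - j]
        out.append(acc)
    return out


def filter_and_mix(samples: Sequence[float],
                   coeff_sets: Sequence[Sequence[float]]) -> List[float]:
    """P1: run one FIR filter per coefficient set on the same samples.
    P2: mix the branch outputs, modelled here as a plain sum."""
    branches = [fir(samples, coeffs) for coeffs in coeff_sets]
    return [sum(values) for values in zip(*branches)]


# Hypothetical coefficient sets for the three parallel branches.
coeff_sets = [[0.5, 0.5], [1.0], [0.25, 0.25]]
print(filter_and_mix([1, 2, 3, 4], coeff_sets))  # [1.75, 4.25, 6.75, 9.25]
```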

[0045] FIG. 5 is a flowchart of the processing procedure according to the second embodiment. First, in step ST0, the ID numbers 0, 1, 2, 0, 1, 2, 0, 1, ... are stored in the local memory of each element processor by the same method as in the first embodiment. Each set of three element processors PE0, PE1 and PE2 whose ID numbers ID₀, ID₁, ID₂ read 0, 1, 2 from the left is grouped as one element processor (PE) group 2.

[0046] In step ST21, unlike in the first embodiment, the same valid data is input to each of the element processors PE0, PE1 and PE2 in the PE group 2. Because the processing here is FIR filtering, the data is image data obtained by sampling the original image, for example luminance Y or color-difference components Cr and Cb. All element processors in a PE group use the same data. It is not strictly necessary to input valid data to every element processor PE0, PE1 and PE2. The data may instead be input only to, for example, the central element processor PE1 of the PE group 2 and then distributed internally to the other element processors PE0 and PE2 in the group.

[0047] In the following step ST22, the filter coefficients of several FIR filters are input to the element processors PE0, PE1 and PE2 separately from the data, as parameter sets with mutually different combination patterns. The three parameter sets contain entirely different combinations of parameters (filter coefficients). Between them, however, they must contain at least the parameters needed to generate the output data of the central element processor PE1.
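
Steps ST21 and ST22 can be sketched as building each PE's input pair. The assumptions here are hypothetical: each group gets one scalar data value, and each PE picks its parameter set from a table using its own ID. The patent itself only requires that the sets differ and that together they cover what PE1 needs. The function name and coefficients are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def load_group_inputs(ids: Sequence[int], group_data: Sequence[float],
                      param_sets: Dict[int, List[float]]) -> List[Tuple[float, List[float]]]:
    """ST21/ST22: give every PE of a group the same data value, plus the
    parameter set selected by its own ID. Assumes the groups start at PE 0
    and that IDs run 0 .. group_size - 1 within each group."""
    group_size = len(param_sets)
    return [(group_data[pe // group_size], param_sets[own_id])
            for pe, own_id in enumerate(ids)]


ids = [0, 1, 2, 0, 1, 2]
param_sets = {0: [0.5, 0.5], 1: [1.0], 2: [0.25, 0.25]}  # hypothetical coefficients
inputs = load_group_inputs(ids, [10.0, 20.0], param_sets)
for pe, (data, params) in enumerate(inputs):
    print(f"PE{pe}: data={data}, params={params}")
```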

[0048] Then, in step ST23, FIR filtering P1 is executed using the input data and parameter sets. During this processing, the common instruction code of the SIMD control program is issued, and all element processors PE0, PE1 and PE2 follow it to compute FIR filtering P1 in parallel by the same procedure.

[0049] After FIR filtering P1, mixing P2 is executed in the next steps ST24 and ST25. Specifically, as shown in FIG. 5, combining process 1 is performed in the first step ST24, and its result then undergoes combining process 2 in the next step ST25. In combining processes 1 and 2, each element processor PE0, PE1 and PE2 takes in the processing results of the element processors on its left and right. This can be done, for example, over an inter-ALU communication path not shown in FIG. 1.

[0050] Suppose, for example, that the common command for the first combining process 1 (step ST24) is "combine your own processing result with the processing result of the PE whose ID number is (own ID number) − 1". Suppose also that the common command for the next combining process 2 (step ST25) is "combine the result of your combining process 1 with the processing result of the PE whose ID number is (own ID number) + 1". Following these commands, element processor PE1 performs combining process 1 using the processing result of PE0. It then combines that result with the processing result of PE2 and outputs the final value as output data. Element processor PE0, however, combines its own processing result in combining process 1 with the processing result from another PE group to the left of PE group 2, which is erroneous processing. Similarly, element processor PE2, in combining process 2, combines the result of its own combining process 1 with the processing result from another PE group to the right of PE group 2, which is also erroneous. Correct output data is therefore obtained only by the central element processor PE1.
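
The outcome of the two common mixing commands can be checked numerically. The sketch assumes that mixing is addition and that "own ID ∓ 1" resolves to the physically adjacent PE. The FIR outputs are hypothetical powers of two, so any value leaking in from a neighboring group is easy to spot. The function name is illustrative.

```python
from typing import List, Optional, Sequence


def simd_mix(filtered: Sequence[float]) -> List[Optional[float]]:
    """ST24/ST25 under one common instruction, with mixing modelled as addition.

    Step 1 adds the result of the PE with ID (own ID - 1), taken here as the
    left neighbor; step 2 adds the result of the PE with ID (own ID + 1),
    the right neighbor. A PE missing a neighbor at the array end yields None.
    """
    out: List[Optional[float]] = []
    for pe in range(len(filtered)):
        if pe == 0 or pe == len(filtered) - 1:
            out.append(None)
            continue
        step1 = filtered[pe] + filtered[pe - 1]   # combining process 1
        out.append(step1 + filtered[pe + 1])      # combining process 2
    return out


# Hypothetical FIR outputs: group A = PE0..PE2, group B = PE3..PE5.
filtered = [1, 2, 4, 8, 16, 32]
print(simd_mix(filtered))  # [None, 7, 14, 28, 56, None]
```

PE1 and PE4 (ID = 1) produce their own group's sums, 1 + 2 + 4 = 7 and 8 + 16 + 32 = 56. PE2 and PE3 each pull in a value from the neighboring group.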

[0051] Thereafter, the validity of the output data is determined in steps ST11 to ST13 by the same procedure as in the first embodiment. In the final step ST13, the result is output, for example, only from the central element processor PE1, which completes the processing flow shown in FIG. 5. Suppose the three element processors PE0, PE1 and PE2 of one PE group 2 are regarded as a single virtual element processor. In this method, that virtual processor performs one FIR filtering operation with three times the processing capacity of an ordinary element processor.

[0052] In the second embodiment, the ID numbers are not made cyclic, and there is no theoretical upper limit on the number of element processors in one PE group 2.

[0053] In the second embodiment, valid data is thus input to every element processor in the same PE group, and a different parameter set is input to each element processor in the group. This allows processing with different parameters to be carried out on the valid data fully in parallel. The processed data are distinguished by the ID number of each element processor and integrated. Finally, the output of each element processor is selected by, for example, its ID number. As a result, each PE group yields one correct processing result as output data. If each group is regarded as one virtual element processor, that virtual processor has a higher processing capability than the individual element processors. In effect, this method reduces the number of element processors that produce valid output. In fields such as image processing, however, the pixels to be processed in parallel at one time are often intermittent. This method therefore makes effective use of the processing capability of element processors that previously contributed to processing only once every several passes. Moreover, in this embodiment, this highly efficient SIMD control processor with an increased number of parallel processes is realized without any change to the hardware configuration.

[0054]

EFFECT OF THE INVENTION

The SIMD control parallel processing method and apparatus of the present invention apply to data streams that would otherwise leave element processors idle. They put those element processors to effective use and effectively multiply the available local memory, all without changing the hardware configuration.

[0055] Another SIMD control parallel processing method and apparatus of the present invention applies to processing such as FIR filtering, which repeats exactly the same operation with only the parameters (filter coefficients) differing. For such processing, the number of parallel processes can be increased, and processing efficiency improved, while the hardware configuration stays unchanged.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the main configuration of a SIMD-controlled multi-parallel digital processor according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating the processing procedure according to the first embodiment.

FIG. 3 is a schematic diagram illustrating the flow of data between element processors in the first embodiment.

FIG. 4 is a diagram illustrating the processing blocks implemented in the second embodiment.

FIG. 5 is a flowchart illustrating the processing procedure according to the second embodiment.

[Explanation of symbols]

1: SIMD-controlled multi-parallel digital processor (SIMD control parallel processing device); 2: virtual element processor group; 3: data path; 4: memory selection means; 5: validity determination means; PE0, PE1, PE2: element processors; LM0, LM1, LM2: local memories (memory units); ALU: processing unit; ID₀, etc.: ID numbers (identification codes).

Claims


1. A SIMD control parallel processing method for controlling a plurality of one-dimensionally arranged element processors by a single instruction, wherein: an identification code is assigned to each element processor, the identification code being assigned repeatedly in the same arrangement for each group of a predetermined number of element processors and being capable of specifying any element processor within each group; data is input to the plurality of element processors and the same processing is executed; in a specific element processor within the group, an intermediate result of the processing is stored in another element processor designated with reference to a specific identification code; and the stored intermediate result is read out by designation with reference to the specific identification code and used for subsequent processing.

2. The SIMD control parallel processing method according to claim 1, wherein a result of said processing is selected and output by designation based on a specific identification code.

3. A SIMD control parallel processing method for controlling a plurality of one-dimensionally arranged element processors by a single instruction, wherein: an identification code is assigned to each element processor, the identification code being assigned repeatedly in the same arrangement for each group of a predetermined number of element processors and being capable of specifying any element processor within each group; data and parameters are input to the plurality of element processors and the same processing is executed; some of the processing results of the element processors are integrated with reference to the identification code; and the result of the integration processing is selected and output by designation with reference to the identification code.

4. A SIMD control parallel processing apparatus comprising: a plurality of one-dimensionally arranged element processors, each including a memory unit for storing information and a processing unit for executing processing based on the information stored in the memory unit, adjacent element processors being capable of data communication with each other; and a SIMD control circuit for controlling the element processors by a single instruction, wherein the SIMD control includes control for: assigning an identification code to each element processor, the identification code being assigned repeatedly in the same arrangement for each group of a predetermined number of element processors and being capable of specifying any element processor within each group; inputting data to the plurality of element processors and causing them to execute the same processing; in a specific element processor within the group, storing an intermediate result of the processing in another element processor designated with reference to a specific identification code; and reading out the stored intermediate result by designation with reference to the specific identification code and using it for subsequent processing.

5. The SIMD control parallel processing apparatus according to claim 4, wherein said SIMD control further includes a control for selecting and outputting a result of said processing by designating said specific identification code as a reference.

6. A SIMD control parallel processing apparatus comprising: a plurality of one-dimensionally arranged element processors, each including a memory unit for storing information and a processing unit for executing processing based on the information stored in the memory unit, adjacent element processors being capable of data communication with each other; and a SIMD control circuit for controlling the element processors by a single instruction, wherein the SIMD control includes control for: assigning an identification code to each element processor, the identification code being assigned repeatedly in the same arrangement for each group of a predetermined number of element processors and being capable of specifying any element processor within each group; inputting data and parameters to the plurality of element processors and causing them to execute the same processing; integrating some of the processing results of the element processors with reference to the identification code; and selecting and outputting the result of the integration processing by designation with reference to the identification code.