JP3971557B2

JP3971557B2 - Data setting device in SIMD processor

Info

Publication number: JP3971557B2
Application number: JP2000297115A
Authority: JP
Inventors: 貴雄片山; 慎一山浦; 正展福島; 和彦原; 圭治中村; 和彦岩永; 浩資高藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2000-09-28
Filing date: 2000-09-28
Publication date: 2007-09-05
Anticipated expiration: 2020-09-28
Also published as: JP2002108832A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像データ等を高速処理するために同一の命令で複数データに対して同じ処理を行うＳＩＭＤ（ＳｉｎｇｌｅＩｎｓｔｒｕｃｔｉｏｎ−ｓｔｒｅａｍＭｕｌｔｉｐｌｅＤａｔａ−ｓｔｒｅａｍ）型マイクロプロセッサに関する。
【０００２】
【従来の技術】
近年、デジタル複写機やファクシミリ装置等の画像処理においては、画素数の増加、画像処理の多様化などにより画質の向上が図られている。このような画像処理では、複数（多数）のデータに対して同時に同じ処理を施すことが多い。その際、高速性を高めるため、１命令で１つのデータを処理するＳＩＳＤ（ＳｉｎｇｌｅＩｎｓｔｒｕｃｔｉｏｎ−ｓｔｒｅａｍＳｉｎｇｌｅＤａｔａ−ｓｔｒｅａｍ）型マイクロプロセッサよりも、１命令で複数のデータを同時処理する、ＳＩＭＤ（ＳｉｎｇｌｅＩｎｓｔｒｕｃｔｉｏｎ−ｓｔｒｅａｍＭｕｌｔｉｐｌｅＤａｔａ−ｓｔｒｅａｍ）型マイクロプロセッサが用いられることが多い。
【０００３】
図１は、一般的なＳＩＭＤ型マイクロプロセッサ２の概略の構成を示すブロック図である。該ＳＩＭＤ型マイクロプロセッサ２は、概略、グローバルプロセッサ（以下では、ＧＰと言う。）４、及びプロセッサエレメント３により構成されるのであるが、複数のデータを一度に処理するためにプロセッサエレメント３を複数個装備している。各プロセッサエレメント３は、レジスタファイル６と演算アレイ８を備える。ＧＰ４は、プロセッサ２全体の制御を行ない、プロセッサエレメント３は、外部入出力装置からデータを入力しデータ処理を行ない、外部入出力装置に出力する。
【０００４】
上記のＳＩＭＤ型マイクロプロセッサ２は、通常、１クロックサイクルで１命令を処理するが、１命令でプロセッサエレメント３の個数分のデータを一度に処理することができる。ＳＩＭＤ型マイクロプロセッサ２の性能を表す際には、ＳＩＭＤ型マイクロプロセッサ２の動作周波数や、プロセッサエレメント３の個数、即ち１命令で処理できるデータの数などが重要視されるが、更に、命令サイクル数も重要な要素とされる。つまり、同じ画像処理を行う限り１命令サイクルでも少ないほうが性能がよいとされるのである。しかし、１命令で複雑な処理を行うために、複雑な回路を設計・利用するならば、どうしてもコストが増大する。
【０００５】
【発明が解決しようとする課題】
本発明は、有効な命令と命令を実現する簡素な手段を設けることにより、上記のような画像データ処理に伴う命令の命令実行サイクル数を減らすことを目的とする。
【０００６】
【課題を解決するための手段】
本発明は、上記の目的を達成するためになされたものである。本発明に係る請求項１に記載のＳＩＭＤ型マイクロプロセッサは、
各プロセッサエレメントは、プロセッサエレメントに含まれる算術論理演算装置の演算結果を所定の汎用レジスタに格納する際の可否を決定するための値を有する演算制御レジスタを持ち、
各プロセッサエレメントには、識別のための整数番号が順に付されており、
上記の識別のための整数番号をＶＣＣとＧＮＤで形成する接続線部を有し、
更に、
上記の識別のための整数番号と、グローバルプロセッサにおける命令コードにより指定された数値を、全プロセッサエレメントに渡って同時に比較し、比較結果を各プロセッサエレメントの上記演算制御レジスタに含まれるフラグに設定することを特徴とするＳＩＭＤ型マイクロプロセッサである。
【０００７】
本発明に係る請求項２に記載のＳＩＭＤ型マイクロプロセッサは、
各プロセッサエレメントは、プロセッサエレメントに含まれる算術論理演算装置の演算結果を所定の汎用レジスタに格納する際の可否を決定するための値を有する演算制御レジスタを持ち、
各プロセッサエレメントには、識別のための整数番号が順に付されており、
各々のプロセッサエレメントでは、
識別のための整数番号をＶＣＣとＧＮＤで形成する接続線部と、識別のための整理番号に対するマスク信号とを取り込む論理回路部が、識別のための整理番号に対するマスク回路を構成し、上記接続線部を構成するビット信号の夫々と上記マスク信号の各ビット夫々が、論理回路部を構成する個別の論理回路の入力となっており、
更に、
グローバルプロセッサから指定されるマスク信号により、各プロセッサエレメントの論理回路部は、識別のための整数番号の、上記マスク信号により指定されたビットを、上記マスク信号の内容でマスクして出力し、
これにより、各プロセッサエレメントに対する識別のための整理番号に対応して、論理回路部の出力におけるマスクが為されて出力される部分とマスクが為されずに出力される部分との組み合わせが、周期的規則を有しており、
更に、
上記の各プロセッサエレメントの上記マスク回路にて生成されたデータと、グローバルプロセッサにおける命令コードにより指定された数値を全プロセッサエレメントに渡って同時に比較し、比較結果を各プロセッサエレメントの上記演算制御レジスタに含まれるフラグに設定することを特徴とするＳＩＭＤ型マイクロプロセッサである。
【０００８】
本発明に係る請求項３に記載のＳＩＭＤ型マイクロプロセッサは、
各プロセッサエレメントは、プロセッサエレメントに含まれる算術論理演算装置の演算結果を所定の汎用レジスタに格納する際の可否を決定する演算制御レジスタを持ち、
各プロセッサエレメントには、識別のための整数番号が順に付されており、
各々のプロセッサエレメントは、ＶＣＣとＧＮＤで形成する接続線部と、該接続線部に繋がりグローバルプロセッサから指定される入力値選択信号を受けるマルチプレクサ部を有し、
該接続線部は、複数の部分接続線部に分割され、この分割はプロセッサエレメント全体を通して共通かつ固定であり、各部分接続部は一つ又は複数のビット信号部を有し、
各部分接続線部は、プロセッサエレメントに順に付される識別のための整数番号に対応して、ＶＣＣとＧＮＤに繋がることにより周期的に変動するビット信号を出力するように形成されており、
更に、
グローバルプロセッサから指定される入力値選択信号により、マルチプレクサ部は、各プロセッサエレメントの有する接続線部を構成する部分接続線部のうち共通のものを選択して、選択された部分接続部の出力する信号を出力し、
これにより、各プロセッサエレメントに対する識別のための整理番号に対応して、上記マルチプレクサ部の出力が、周期的規則を有しており、
更に、
上記の各プロセッサエレメントの上記マルチプレクサ部にて生成される周期的規則を有する出力値と、グローバルプロセッサにおける命令コードにより指定された数値を全プロセッサエレメントに渡って同時に比較し、比較結果を各プロセッサエレメントの上記演算制御レジスタに含まれるフラグに設定することを特徴とするＳＩＭＤ型マイクロプロセッサである。
【０００９】
本発明に係る請求項４に記載のＳＩＭＤ型マイクロプロセッサは、
各プロセッサエレメントは、プロセッサエレメントに含まれる算術論理演算装置の演算結果を所定の汎用レジスタに格納する際の可否を決定する演算制御レジスタを持ち、
各プロセッサエレメントには、識別のための整数番号が順に付されており、
各々のプロセッサエレメントでは、
接続線部と、マスク信号を取り込む論理回路部が、ビットパターンデータ出力回路を構成し、上記接続線部を構成するビット信号の夫々と上記マスク信号の各ビット夫々が、論理回路部を構成する個別の論理回路の入力となっており、
接続線部は、個別の接続線に分割され、この分割はプロセッサエレメント全体を通して共通かつ固定であり、
各個別の接続線は、プロセッサエレメントに順に付される識別のための整数番号に対応して、周期的に変動するビット信号を、ＶＣＣ若しくはＧＮＤの接続により出力するように形成されており、
更に、
グローバルプロセッサから指定されるマスク信号により、各プロセッサエレメントの論理回路部は、指定されたビットを、上記マスク信号の内容でマスクして出力し、
これにより、各プロセッサエレメントに対する識別のための整理番号に対応して、論理回路部の出力におけるマスクが為されて出力される部分とマスクが為されずに出力される部分との組み合わせが、周期的規則を有しており、
更に、
上記の各プロセッサエレメントのビットパターンデータ出力回路にて生成されるデータと、グローバルプロセッサにおける命令コードにより指定された数値を全プロセッサエレメントに渡って同時に比較し、比較結果を各プロセッサエレメントの上記演算制御レジスタに含まれるフラグに設定することを特徴とするＳＩＭＤ型マイクロプロセッサである。
【００１３】
【発明の実施の形態】
以下、図面を参照して本発明に係る好適な実施の形態を説明する。
【００１４】
図１は、本発明を含む一般的なＳＩＭＤ型マイクロプロセッサ２の概略の構成を示すブロック図である。主としてプロセッサ２全体を制御するグローバルプロセッサ（以下、ＧＰと言う。）４と、主として外部入出力装置からデータを入力しデータ処理を行い、外部入出力装置にデータを出力するプロセッサエレメント３とから、構成される。プロセッサエレメント３は、複数データを同時に処理するために複数用意されている。図１では、１個のＧＰ４と、２５６個のプロセッサエレメント３とにより、ＳＩＭＤ型マイクロプロセッサ２が構成されている。
【００１５】
図２は、本発明に係るＳＩＭＤ型マイクロプロセッサ２のより詳細な構成を示すブロック図である。図に示されるようにＧＰ４は、
・命令コードで構成されるプログラムを格納するためのプログラムＲＡＭ１０と、
・ＧＰ４での演算データを格納するデータＲＡＭ１２と、
・プログラムを解読し各種ブロックに各種制御信号を送るシーケンシャルユニット（ＳＣＵ）９と、
・データを格納する複数の汎用レジスタ（Ｇ０〜Ｇ３）と、
・ＳＣＵ９にプログラムの命令コードを送るためにプログラムのアドレスを保持するプログラムカウンタ（ＰＣ）１４と、
・データメモリにスタックを形成するためデータメモリのアドレスを格納するスタックポインタ（ＳＰ）２４と、
・プログラムの途中でサブルーチン処理を行う際には分岐が発生するが分岐前のアドレスを格納する複数のリンクレジスタ（ＬＳ、ＬＩ、ＬＮ）と、
・データメモリのデータ、命令コード中に記述された数値（即値）データ、若しくは汎用レジスタに格納されているデータのいずれかの組み合わせで算術論理演算を行う算術論理演算装置（ＡＬＵ）１１と、
・プロセッサの状態を保持するプロセッサステータスレジスタ（図示せず。）と、
・ハードウェア割り込みとソフトウェア割り込みを制御する割り込み制御回路（図示せず。）と、
・外部入出力に直接接続され外部からのデータの入出力を制御する外部Ｉ／Ｏ制御回路（図示せず。）と
を含む。
【００１６】
図２では示していないが、上記ＳＣＵ９は、ＧＰ命令を解読し主にＧＰ内の各ブロックに制御信号を発生するＧＰインストラクションデコーダと、プロセッサエレメント命令を解読し主にプロセッサエレメント内の各ブロックに制御信号を発生するプロセッサエレメントインストラクションデコーダとで、構成される。即ち、本プロセッサに係る命令コードは、主にＧＰ４内の各ブロックを制御し、プログラムのシーケンスを決定したり、プロセッサエレメント３に転送する共通データをＧＰ４内のＡＬＵ１１で加工したりするＧＰ命令と、外部入出力から一度に入力されたデータをプロセッサエレメント３毎に処理をさせるプロセッサエレメント命令とに、分類される。
【００１７】
図１に示すように、各プロセッサエレメント３は、外部からの入出力データを一時的に保持するレジスタファイル６と、プロセッサエレメント３内で算術論理演算やビット演算のデータ処理を行うための演算アレイ８を含む。さらに図２に示すようにレジスタファイル６には、例えば、Ｒ０〜Ｒ３１までの８ビットのレジスタ３４が３２本用意されている。これらのレジスタ３４からデータが演算アレイ８に転送され、又逆に演算アレイ８からデータが転送されてレジスタ３４に格納される。レジスタ３４と演算アレイ６とのバスは、８ビットの双方向バスである。
【００１８】
更に図２に示すように、単体の演算アレイ８は演算ユニットであり、
・レジスタファイル６からのデータをシフトして符号付き拡張もしくは符号無し拡張をし１６ビットデータに加工するシフト・拡張器４４と、
・例えば、Ａレジスタ３６とＦレジスタ４０のような複数の汎用レジスタと、
・レジスタファイル６からのデータをシフト・拡張器４４を経由して加工し１入力とし、他方の入力をＡレジスタ３６からの入力とする算術論理演算装置（ＡＬＵ）３６と、
・（後で説明する）本発明に係るＰＥ番号マスク回路、固定値選択回路、及びｎおきビットパターンデータ出力回路の夫々からの出力を入力とし、自らの出力をＡレジスタ３８やＴレジスタ５４に繋げる選択回路３５と
を含む。算術論理演算装置（ＡＬＵ）３６の出力は、Ａレジスタ３６もしくはＦレジスタ４０に一時格納されように設定されているが、Ａレジスタ３６からレジスタファイル６の所定の１レジスタ３４にデータ転送されることも可能である。
【００１９】
また、演算アレイ８は、後でも説明するように、「Ｔレジスタ」と呼ばれる演算制御レジスタ５４を備える。ＡＬＵ３６からの出力は、該Ｔレジスタ５４によって、Ａレジスタ３６もしくはＦレジスタ４０への書き込み内容が制御される。例えば、演算制御レジスタ（Ｔレジスタ）５４の中の所定の１ビットの状態に応じて、“１”あればＡレジスタ３６もしくはＦレジスタ５４への書き込みを行い、“０”であれば行わないというような制御が行なわれる。
【００２０】
図３は、レジスタファイル６のレジスタ３４と演算アレイ８とを結び付けるマルチプレクサの機能を示すブロック図である。ＰＥｉ（ｉ＝０，１，２，・・・２５５）のプロセッサエレメントに備わるマルチプレクサは７ｔｏ１（７対１）のマルチプレクサであり、ＰＥｉ−３（ＰＥｉの３つ左隣り）、ＰＥｉ−２（ＰＥｉの２つ左隣り）、ＰＥｉ−１（ＰＥｉの１つ左隣り）、ＰＥｉ、ＰＥｉ＋１（ＰＥｉの１つ右隣り）、ＰＥｉ＋２（ＰＥｉの２つ右隣り）、ＰＥｉ＋３（ＰＥｉの３つ右隣り）のプロセッサエレメント３のレジスタファイル６からのデータを入出力することができるように設定されている。この機能を、ＰＥシフト機能と称する。マルチプレクサによって選択されたデータは、演算アレイ８のシフト・拡張部４４に転送される。
【００２１】
ここで、プロセッサエレメント３の番号を含む呼称について定義する。図２に示すように、本発明に係るＳＩＭＤ型マイクロプロセッサ２には２５６個のプロセッサエレメント３が設置されており、それらプロセッサエレメント３の個々に対し、（図では左側から）ＰＥ０、ＰＥ１、ＰＥ２、ＰＥ３、・・・、ＰＥ２５４、ＰＥ２５５というように、プロセッサエレメント番号（ＰＥ番号）を付すと定義する。
【００２２】
≪第１の実施の形態≫
図４は、本発明の第１の実施の形態に係るＰＥ（プロセッサエレメント）番号マスク回路の回路図である。上記ＰＥ番号マスク回路は、（従来技術である）ＰＥ番号を所定の汎用レジスタに入力する接続線に対し、ＧＰ４からのマスク制御信号（即ち、ＰＥ番号マスク信号）を取り入れる論理回路を挿入することにより形成される。後でも説明するように、この図４の回路構成では、“０”によりマスクすることになる。
【００２３】
各ＰＥｉ（ｉ＝０，１，２，・・・２５５）においては、各ＰＥ番号を形成する接続線部５０とＰＥ番号マスク信号を取り込む論理回路（ＡＮＤ回路）部５１とが結合して設置されている。そこからの出力は、後で説明するように、各プロセッサエレメント３毎に設置されている選択回路３５（図２参照）に繋がる。その選択回路３５を介して例えばＡレジスタ３８にて上記出力が格納される。
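[0024]
Setting the PE number in the A register 38 by means of the connection line section 50 that forms each PE number (which is prior art) has been possible in the past. Suppose, for example, that this function was realized by executing an instruction called "LDPN" (Load PE Number). The circuit according to the first embodiment of the present invention can be configured to realize exactly the same function with the LDPN instruction, that is, to send each PE number to its selection circuit 35. As is clear from FIG. 4, when the LDPN instruction is used, all 8 bits of the PE number mask signal are "1". With this setting, the data representing the PE number as a pattern of VCC and GND connections is input to the selection circuit 35 of each processor element 3.
[0025]
Specifically, the data input to the selection circuits 35 is,
- in the order PE0, PE1, PE2, ... PE255,
- 0, 1, 2, ... 255.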
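[0026]
Furthermore, in the circuit according to the first embodiment of the present invention, the input data can be masked by setting to "0" the bits designated in an instruction to the SIMD type microprocessor 2. The "LDNM" (Load PE Number Masking) instruction is provided for this purpose. The LDNM instruction masks the output data according to a mask pattern given as an immediate value, and that immediate value is taken into the circuit of FIG. 4 as the PE number mask signal. The LDNM instruction is written as follows.
[0027]
LDNM/0 #00000011b
[0028]
The trailing "b" of "00000011b" indicates binary notation. With this instruction, the output values of bits 2 through 7 of the PE number (the upper six bits) are masked. "/0" is an option specifying masking with "0".
[0029]
Accordingly, the data input to the selection circuits 35 is,
- in the order PE0, PE1, PE2, PE3, PE4, PE5, PE6, ... PE254, PE255,
- 0, 1, 2, 3, 0, 1, 2, ... 2, 3.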
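[0030]
The circuit of the first embodiment in FIG. 4 uses AND circuits, but these could also be replaced by OR circuits. In that case, in each PEi (i = 0, 1, 2, ... 255), the connection line section 50 that forms the PE number is combined with a logic circuit (OR circuit) section (not shown) that takes in the PE number mask signal. Whereas the circuit configuration of FIG. 4 masks with "0", the circuit with OR circuits masks with "1". The instruction is written as follows.
[0031]
LDNM/1 #11111100b
[0032]
With this instruction, the input values of bits 2 through 7 of the PE number are masked with "1". "/1" is an option specifying masking with "1".
[0033]
In this case, the data input to the selection circuits 35 is,
- in the order PE0, PE1, PE2, PE3, PE4, PE5, PE6, ... PE254, PE255,
- 252 (11111100b), 253 (11111101b), 254 (11111110b), 255 (11111111b), 252, 253, 254, ... 254, 255.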
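The AND ("/0") and OR ("/1") masking described for LDNM can be sketched in Python as follows. This is only an illustrative software model, not the circuit itself: the function name `ldnm` and its `mode` argument are assumptions introduced here, and each list element stands for the value that would reach the selection circuit 35 of one PE.

```python
NUM_PE = 256  # number of processor elements in the described embodiment

def ldnm(mask, mode="0"):
    """Model of LDNM (hypothetical helper).

    mode "0": AND the PE number with the mask (masked bits become 0).
    mode "1": OR the PE number with the mask (masked bits become 1).
    """
    if mode == "0":
        return [pe & mask for pe in range(NUM_PE)]
    if mode == "1":
        return [(pe | mask) & 0xFF for pe in range(NUM_PE)]
    raise ValueError("mode must be '0' or '1'")

and_result = ldnm(0b00000011, "0")  # LDNM/0 #00000011b
or_result = ldnm(0b11111100, "1")   # LDNM/1 #11111100b
```

Under these assumptions, `and_result` repeats 0, 1, 2, 3 and `or_result` repeats 252, 253, 254, 255 across the PEs.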
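[0034]
With the circuit according to the first embodiment shown in FIG. 4, the control lines corresponding to the bits that are "0" in the mask control signal (PE number mask signal) are masked, so that data expressed only by the designated bits and having a repeating regularity can be set (formed) with one instruction. For example, masking all of the upper 6 bits leaves only the lower 2 bits valid, and the regularly repeating values
- 0, 1, 2, 3, 0, 1, 2, 3, ...
can be output with one instruction to the selection circuits 35 of the processor elements 3 PE0, PE1, PE2, PE3, PE4, PE5, PE6, .... Likewise, if all the AND circuits in the circuit of FIG. 4 are replaced by OR circuits, regularly repeating values can be output with one instruction to the selection circuits 35 of those processor elements 3. In that case, however, the masking is performed with "1".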
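[0035]
With the prior art, the same result as in FIG. 4 is obtained by first setting the PE number in the A register 38 with the LDPN instruction and then setting in the A register 38 the remainder of dividing that value by 4. However, this requires
- one instruction cycle for the LDPN instruction, and
- one instruction cycle for the division instruction (assuming a multiplier/divider exists),
that is, two instruction cycles in total. Moreover, providing a multiplier/divider in every processor element 3 of the SIMD type microprocessor 2 would require an enormous circuit scale. Because an ordinary SIMD type microprocessor 2 has no multiplier/divider, division is carried out by using the ALU 36 to perform as many subtractions as there are bits in the dividend. If the A register 38, a general-purpose register, is 16 bits wide, for example, at least 16 subtractions are needed (in practice, several more instruction cycles are needed for setup before the division). The division therefore takes 16 instruction cycles, and the whole operation takes at least 18 instruction cycles.
[0036]
As is clear from the above, the circuit according to the first embodiment of the present invention allows a considerable number of instruction cycles to be eliminated.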
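To make the cycle comparison concrete, the following Python sketch models the prior-art route: a remainder computed by one conditional subtraction per dividend bit (restoring division). It is an assumption-laden illustration: `WIDTH`, `remainder_by_subtraction`, and the one-step-per-cycle accounting are introduced here and are not taken from the source, which only states that roughly as many subtractions as dividend bits are needed.

```python
WIDTH = 16  # assumed register width, matching the 16-bit A register example

def remainder_by_subtraction(dividend, divisor):
    """Restoring division: one conditional subtraction per dividend bit.

    Returns (remainder, steps); each step stands for one ALU cycle.
    """
    remainder, steps = 0, 0
    for bit in range(WIDTH - 1, -1, -1):
        # Shift in the next dividend bit, then subtract if possible.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
        steps += 1
    return remainder, steps

# Prior-art style: PE number modulo 4, computed by repeated subtraction.
results = [remainder_by_subtraction(pe, 4) for pe in range(256)]
```

Every PE needs 16 steps here, while the masked load yields the same values (`pe & 0b11`) in a single instruction.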
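[0037]
<< Second Embodiment >>
FIG. 5 is a circuit diagram of the fixed value selection circuit according to the second embodiment of the present invention. Under selection control by an input value selection signal, the fixed value selection circuit outputs to the selection circuit 35 (described later) a value that varies with a period of n (n being a natural number) and increases monotonically from 0 in steps of 1 within one period. In the circuit shown in FIG. 5, the values 3, 5, and 7 are assumed for n.
[0038]
As shown in FIG. 5, each processor element 3 has its own connection line section 50' and multiplexer section 51'. The connection line section 50' generates a 9-bit signal, and its connection lines are set so that the left 2 bits vary with a period of 3, the middle 3 bits with a period of 5, and the right 4 bits with a period of 7. As is clear from FIG. 5, the period-3 variation is,
- in the order PE0, PE1, PE2, PE3, PE4, PE5, PE6, ...,
- 0, 1, 2, 0, 1, 2, 0, ...
The period-5 variation is,
- in the order PE0, PE1, PE2, PE3, PE4, PE5, PE6, ...,
- 0, 1, 2, 3, 4, 0, 1, ...
The period-7 variation is,
- in the order PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, PE8, ...,
- 0, 1, 2, 3, 4, 5, 6, 0, 1, ...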
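[0039]
Which periodic output is selected is determined by the input value selection signal, which instructs the multiplexer section 51' in each processor element 3. The instruction "LDRN" (Load Repeating Number) is provided for extracting these periodic values.
[0040]
LDRN #3
[0041]
With this instruction, a value that varies with a period of 3 and increases monotonically from 0 to 2 in steps of 1 within one period is output to the selection circuit 35 of each processor element 3, following the order of the PE numbers of the processor elements 3.
[0042]
With the fixed value selection circuit according to the second embodiment shown in FIG. 5, such a numerical input to the selection circuit 35 (A register 38) can be performed in one instruction cycle. The numbers 3, 5, and 7 used in this example are known to be values that are frequently used in image processing in general.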
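The behaviour of LDRN can be sketched as a small Python model. The names `hardwired_fields` and `ldrn` are hypothetical; the dictionary stands in for the VCC/GND wiring of connection line section 50', and the lookup stands in for multiplexer section 51'. How the 9-bit signal is actually laid out per field is not modelled.

```python
NUM_PE = 256
PERIODS = (3, 5, 7)  # the periods assumed for the circuit of FIG. 5

def hardwired_fields(pe):
    """Per-PE constants, standing in for the wiring of section 50'."""
    return {n: pe % n for n in PERIODS}

def ldrn(n):
    """Model of LDRN #n: every PE's multiplexer selects the same field."""
    if n not in PERIODS:
        raise ValueError("no wiring for period %d" % n)
    return [hardwired_fields(pe)[n] for pe in range(NUM_PE)]

period3 = ldrn(3)  # LDRN #3
```

The key design point is that no arithmetic happens at run time: each PE only selects among constants fixed at design time, which is why one instruction cycle suffices.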
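[0043]
<< Third Embodiment >>
FIG. 6 is a circuit diagram of the every-n bit pattern data output circuit according to the third embodiment of the present invention. The every-n bit pattern data output circuit is formed by inserting a logic circuit that takes in a bit mask signal from the GP 4 into connection lines that set "1" at every n-th PE (following the order of the PE numbers) in each individual bit.
[0044]
In each PEi (i = 0, 1, 2, ... 255), a connection line section 50" that sets "1" at every n-th PE in each bit is combined with a logic circuit (AND circuit) section 51" that takes in the bit mask signal. As with the circuits of the first and second embodiments, the output is connected to the selection circuit 35 (see FIG. 2) provided in each processor element 3, and is stored via the selection circuit 35 in, for example, the A register 38.
[0045]
The connection line section 50" of each PEi has 8 individual bits. For example, bit 0 has a period of 2, bit 1 a period of 3, bit 2 a period of 5, bit 3 a period of 7, bit 4 a period of 11, bit 5 a period of 13, bit 6 a period of 17, and bit 7 a period of 23. The connection line section 50" of each PEi is set so that a bit is "1" (that is, connected to VCC) when the PE number corresponds to the period of that bit (FIG. 6). That is,
- bit 0, with a period of 2, is connected to VCC in PE0, PE2, PE4, PE6, PE8, ...,
- bit 1, with a period of 3, is connected to VCC in PE0, PE3, PE6, PE9, PE12, ..., and
- bit 2, with a period of 5, is connected to VCC in PE0, PE5, PE10, PE15, PE20, ....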
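[0046]
The instruction "LDRB" (Load Repeating Bit) is provided for extracting the settings of the connection line section 50" of each PEi configured as above.
[0047]
LDRB
[0048]
With this instruction, all of the settings of the connection line section 50" of each PEi are input to, for example, the selection circuit 35 (and then the A register 38). The values are,
- in the order PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, PE8, PE9, PE10, PE11, ...,
- 11111111b (binary notation; the same applies below), 00000000b, 00000001b, 00000010b, 00000001b, 00000100b, 00000011b, 00001000b, 00000001b, 00000010b, 00000101b, 00010000b, ...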
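[0049]
However, as described above, a logic circuit that takes in the bit mask signal from the GP 4 is inserted into these connection lines, so the data input to the selection circuit 35 (and then the A register 38) can be masked by setting to "0" the bits designated in the instruction code. To perform this masking, the instruction is written as follows. The mask pattern is given as an immediate value, is taken into the circuit of FIG. 6 as the bit mask signal, and masks the data input to the selection circuit 35 (and then the A register 38).
[0050]
LDRB/0 #00000100b
[0051]
With this instruction, only the output data of the PEs in which bit 2 is set to "1" takes the value "00000100b". The PEs in which "00000100b" is set in, for example, the A register 38 are
- PE0, PE5, PE10, PE15, PE20, ....
In all other PEs the value is "00000000b".
[0052]
With the every-n bit pattern data output circuit according to the third embodiment, such a numerical input to the selection circuit 35 (A register 38) can be performed in one instruction cycle.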
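The every-n bit pattern and its masking can be modelled in a few lines of Python. This is a sketch under assumptions: `wired_pattern` and `ldrb` are names invented here, the periods are the example values given in the text (2, 3, 5, 7, 11, 13, 17, 23 for bits 0 to 7), and the AND with `mask` stands in for logic circuit section 51".

```python
NUM_PE = 256
PERIODS = (2, 3, 5, 7, 11, 13, 17, 23)  # period of bit 0 .. bit 7

def wired_pattern(pe):
    """Bits tied to VCC for this PE: bit k is 1 when pe is a multiple of PERIODS[k]."""
    pattern = 0
    for bit, n in enumerate(PERIODS):
        if pe % n == 0:
            pattern |= 1 << bit
    return pattern

def ldrb(mask=0xFF):
    """Model of LDRB (mask all ones) and LDRB/0 #mask (AND masking)."""
    return [wired_pattern(pe) & mask for pe in range(NUM_PE)]

unmasked = ldrb()          # LDRB
bit2_only = ldrb(0b00000100)  # LDRB/0 #00000100b
```

In this model, `bit2_only` is nonzero exactly for PE0, PE5, PE10, and so on.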
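[0053]
<< Fourth Embodiment >>
The SIMD type microprocessor 2 according to the fourth embodiment of the present invention is described below.
[0054]
If, in the "LDNM" instruction of the first embodiment, the immediate value (PE number mask signal) is one that masks nothing (#11111111b), the PE number is input unchanged to the selection circuit 35 (and then the A register 38). As explained earlier, the PE number is also input unchanged to the selection circuit 35 (and then the A register 38) when the connection lines that input the PE number to a predetermined general-purpose register are used as they are, without inserting the logic circuit that takes in the PE number mask signal from the GP 4.
[0055]
Accordingly, the values input to the selection circuits 35 are,
- in the order PE0, PE1, PE2, ... PE255,
- 0, 1, 2, ... 255.
[0056]
In each processor element 3, the value input to the selection circuit 35 is compared, in a comparator 80 (FIG. 7) provided in the selection circuit 35, with an immediate value specified by an instruction (to the SIMD type microprocessor 2), and the result is stored in a predetermined bit of the T register (operation control register) 54. Storing the result in the T register 54 makes it possible, in the operations of subsequent instructions, to have only one specific processor element 3 out of the 256 perform the operation while the other processor elements 3 do not.
[0057]
FIG. 7 is a schematic circuit block diagram of the selection circuit 35 according to the present invention. The output data of the PE number mask circuit, the fixed value selection circuit, and the every-n bit pattern data output circuit of FIGS. 4, 5, and 6 are connected to the selection circuit 35 in the operation array 8 shown in FIG. 2. A multiplexer 78 selects among them according to the input data selection signal 2, a control signal from the GP 4, and one of the three data is input to the A register 38. The selected data is also compared in the comparator 80 with the immediate data IMM2 in the instruction code; the comparator outputs "1" if they match and "0" if they do not.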
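The data path of the selection circuit 35 (multiplexer 78 followed by comparator 80) can be sketched as below. The function name and the 0/1/2 encoding of the input data selection signal are assumptions made for illustration; the source does not give the encoding.

```python
def selection_circuit(pe_mask_out, fixed_value_out, bit_pattern_out, select, imm2):
    """Model of selection circuit 35 for one PE (hypothetical encoding).

    select: 0 = PE number mask circuit, 1 = fixed value selection circuit,
            2 = every-n bit pattern data output circuit.
    Returns (value routed toward the A register, comparator output).
    """
    sources = (pe_mask_out, fixed_value_out, bit_pattern_out)
    selected = sources[select]             # multiplexer 78
    match = 1 if selected == imm2 else 0   # comparator 80
    return selected, match

# A PE whose three circuits currently output 5, 2 and 4:
value, flag = selection_circuit(5, 2, 4, select=0, imm2=5)
```

With an unmasked PE number as the selected source, the comparator output is 1 in exactly one PE, which is how the fourth embodiment singles out one processor element.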
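[0058]
The T register 54 according to the present invention is now described. As shown in FIG. 2, each processor element 3 is equipped with an operation control register (T register) 54 for specifying execution conditions. FIG. 8 is an example of a circuit block diagram of the T register 54. In FIG. 8, the T register 54 has eight 1-bit registers (T0, T1, ... T7), each of which is controlled separately. One processor element 3 can therefore hold eight control patterns.
[0059]
Each bit of the T register 54 enables or disables the execution of operations in its processor element 3, so that only specific processor elements 3 can be selected as targets of an operation. For example, consider the following instruction.
[0060]
ADD/T1 #12
[0061]
This is an addition instruction: the value of the A register 38 and the immediate value "12" are added, and the result is stored in the A register 38. By writing the execution control option "/T1" in this instruction, the data from the ALU 36 is stored in the A register 38 only in those processor elements 3 whose T1 (bit) flag in the T register 54 is "1". In processor elements 3 whose T1 flag is "0", the data is not stored in the A register 38.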
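Predicated execution with a T flag, as in ADD/T1 #12, can be sketched as follows. The helper name `add_predicated`, the list-based register model, and the 16-bit wrap-around of the A register are assumptions for illustration.

```python
def add_predicated(a_regs, t_regs, t_bit, imm):
    """Model of ADD/T<t_bit> #imm across all PEs.

    The ALU result is written back to the A register only in PEs whose
    selected T flag is 1; other PEs keep their old A value.
    """
    out = []
    for a, t in zip(a_regs, t_regs):
        result = (a + imm) & 0xFFFF   # assumed 16-bit A register
        enabled = (t >> t_bit) & 1    # T flag gates the write
        out.append(result if enabled else a)
    return out

a = [10, 20, 30, 40]
t = [0b00000010, 0b00000000, 0b00000010, 0b00000001]  # T1 set in PE0 and PE2
new_a = add_predicated(a, t, t_bit=1, imm=12)
```

Here only the first and third PEs have T1 set, so only their A values change.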
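[0062]
In the example of FIG. 8, the inputs to the T register 54 include:
- the overflow flag (V) and carry flag (C) generated in the ALU 36 of the operation unit (operation array 8) in the processor element 3,
- the result of an OR operation on the result of masking all T flags with the immediate value IMM2,
- an input from the storage means 2 shown in FIG. 2, and
- the output of the comparator 80 in the selection circuit 35 of FIG. 7.
[0063]
According to the SIMD type microprocessor 2 of the fourth embodiment described above, having only the processor element 3 with a specific PE number perform an operation can be done with a small number of instructions. With the conventional technology, the same function is realized by first setting "0" in the T registers 54 of all processor elements 3 and then setting "1" in the T register 54 of the processor element 3 that is to perform the operation, which inevitably requires more instructions than the fourth embodiment.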
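[0064]
<< Fifth Embodiment >>
The SIMD type microprocessor 2 according to the fifth embodiment of the present invention is described below.
[0065]
In each processor element 3 of the SIMD type microprocessor 2 according to the fifth embodiment, the (data) value passed from the PE number mask circuit of the first embodiment to the selection circuit 35 is compared, in the comparator 80 (FIG. 7) provided in the selection circuit 35, with immediate data specified in the instruction code, and the result is stored in a predetermined bit of the T register 54. Storing the result in the T register 54 makes it possible, in the operations of subsequent instructions, to have only the processor elements 3 whose PE numbers are spaced at a power of 2 execute the operation.
[0066]
The output of the PE number mask circuit shown in FIG. 4 is input to the selection circuit 35 of FIG. 7, and by executing the (previously provided) "LDTM" (Load to T register Masked PE Number) instruction, the output data of the PE number mask circuit is selected in the selection circuit 35 by the input data selection signal 2. In addition, when the LDTM instruction is executed, (one of) the immediate data is input as IMM2, and the output data of the PE number mask circuit is compared with IMM2 in the comparator 80. The output of the comparator 80 is "1" in processor elements 3 where they match and "0" in processor elements 3 where they do not. The following is an example of such an instruction.
[0067]
LDTM/T2 #00000011b, #1
[0068]
In this instruction, the first immediate operand is the mask pattern and the second is the comparison value. The mask pattern is written in binary and the comparison value in decimal. The masked values for
- PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ...
are the repeating values
- 0, 1, 2, 3, 0, 1, 2, 3, ...
and "1" is set in the T2 flag of the T register 54 of the PEs whose value matches the comparison value, namely
- PE1, PE5, PE9, ....
"0" is set in the T2 flag of the T register 54 of the other PEs. If subsequent instructions then use the T2 flag of the T register 54 for execution control, only specific PEs, namely the processor elements 3 whose PE numbers start at PE1 and are spaced 2² apart, execute the operation.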
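The effect of LDTM/T2 #00000011b, #1 can be sketched in Python as a mask, a compare, and a flag write per PE. The function name `ldtm` and the representation of each T register as an 8-bit integer are assumptions made for this illustration.

```python
NUM_PE = 256

def ldtm(t_regs, t_bit, mask, compare):
    """Model of LDTM/T<t_bit> #mask, #compare.

    For each PE, the masked PE number is compared with `compare`; the
    selected T flag is set to 1 on a match and cleared to 0 otherwise.
    """
    updated = []
    for pe, t in enumerate(t_regs):
        hit = (pe & mask) == compare          # mask circuit + comparator 80
        if hit:
            t |= 1 << t_bit
        else:
            t &= ~(1 << t_bit)
        updated.append(t & 0xFF)
    return updated

t_regs = ldtm([0] * NUM_PE, t_bit=2, mask=0b00000011, compare=1)
selected = [pe for pe, t in enumerate(t_regs) if (t >> 2) & 1]
```

Under these assumptions `selected` starts PE1, PE5, PE9 and contains one PE out of every four.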
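[0069]
<< Sixth Embodiment >>
The SIMD type microprocessor 2 according to the sixth embodiment of the present invention is described below.
[0070]
In each processor element 3 of the SIMD type microprocessor 2 according to the sixth embodiment, the (data) value passed from the fixed value selection circuit of the second embodiment to the selection circuit 35 is compared, in the comparator 80 (FIG. 7) provided in the selection circuit 35, with immediate data specified in the instruction code, and the result is stored in a predetermined bit of the T register 54. Storing the result in the T register 54 makes it possible, in the operations of subsequent instructions, to have only the processor elements 3 whose PE numbers are spaced at a specific value such as 3, 5, or 7 execute the operation.
[0071]
The output of the fixed value selection circuit shown in FIG. 5 is input to the selection circuit 35 of FIG. 7, and by executing the (previously provided) "LDTN" (Load to T register Predefined Number) instruction, the output data of the fixed value selection circuit is selected in the selection circuit 35 by the input data selection signal 2. In addition, when the LDTN instruction is executed, (one of) the immediate data is input as IMM2, and the output data of the fixed value selection circuit is compared with IMM2 in the comparator 80. The output of the comparator 80 is "1" in processor elements 3 where they match and "0" in processor elements 3 where they do not. The following is an example of such an instruction.
[0072]
LDTN/T2 #3, #2
[0073]
In this instruction, the first immediate operand is the selected fixed value and the second is the comparison value. The resulting values for
- PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ...
are the repeating values
- 0, 1, 2, 0, 1, 2, 0, 1, ...
and "1" is set in the T2 flag of the T register 54 of the PEs whose value matches the comparison value, namely
- PE2, PE5, PE8, ....
"0" is set in the T2 flag of the T register 54 of the other PEs. If subsequent instructions then use the T2 flag of the T register 54 for execution control, only specific PEs, namely the processor elements 3 whose PE numbers start at PE2 and are spaced 3 apart, execute the operation.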
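A compact Python model of LDTN/T2 #3, #2 is shown below. The helper name `ldtn_flags` is hypothetical, and `pe % period` stands in for the hardwired period-3, period-5, or period-7 field selected by the multiplexer section.

```python
NUM_PE = 256
PERIODS = (3, 5, 7)  # periods assumed to be wired into the fixed value selection circuit

def ldtn_flags(period, compare):
    """Comparator output per PE for LDTN #period, #compare (value of the target T flag)."""
    if period not in PERIODS:
        raise ValueError("no wiring for period %d" % period)
    return [1 if pe % period == compare else 0 for pe in range(NUM_PE)]

t2 = ldtn_flags(3, 2)  # LDTN/T2 #3, #2
selected = [pe for pe, flag in enumerate(t2) if flag]
```

With these inputs the T2 flag is set in PE2, PE5, PE8, and every third PE after that.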
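[0074]
<< Seventh Embodiment >>
The SIMD type microprocessor 2 according to the seventh embodiment of the present invention is described below.
[0075]
In each processor element 3 of the SIMD type microprocessor 2 according to the seventh embodiment, the (data) value passed from the every-n bit pattern data output circuit of the third embodiment to the selection circuit 35 is compared, in the comparator 80 (FIG. 7) provided in the selection circuit 35, with immediate data specified in the instruction code, and the result is stored in a predetermined bit of the T register 54. Storing the result in the T register 54 makes it possible, in the operations of subsequent instructions, to have only the processor elements 3 whose PE numbers are spaced at a specific value such as 2, 3, 5, 7, or 11 execute the operation.
[0076]
The output of the every-n bit pattern data output circuit shown in FIG. 6 is input to the selection circuit 35 of FIG. 7, and by executing the (previously provided) "LDTB" (Load to T register Bit Number) instruction, the output data of the every-n bit pattern data output circuit is selected in the selection circuit 35 by the input data selection signal 2. In addition, when the LDTB instruction is executed, (one of) the immediate data is input as IMM2, and the output data of the every-n bit pattern data output circuit is compared with IMM2 in the comparator 80. The output of the comparator is "1" in processor elements 3 where they match and "0" in processor elements 3 where they do not. The following is an example of such an instruction.
[0077]
LDTB/T2 #00000010b, #1
[0078]
In this instruction, the first immediate operand designates the every-n bit and the second is the comparison value. The resulting values for
- PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ...
are the repeating values
- 1, 0, 0, 1, 0, 0, 1, 0, ...
and "1" is set in the T2 flag of the T register 54 of the PEs whose value matches the comparison value, namely
- PE0, PE3, PE6, ....
"0" is set in the T2 flag of the T register 54 of the other PEs. If subsequent instructions then use the T2 flag of the T register 54 for execution control, only specific PEs, namely the processor elements 3 whose PE numbers start at PE0 and are spaced 3 apart, execute the operation.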
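The LDTB example can be modelled as below. One point is an explicit assumption: the listed result values are 1 and 0, so this sketch treats the designated bit as being extracted as a single 0/1 value before the comparison; the source does not spell out how the masked bit is aligned. The name `ldtb_flags` and the period table (the example periods of the third embodiment) are likewise illustrative.

```python
NUM_PE = 256
PERIODS = (2, 3, 5, 7, 11, 13, 17, 23)  # period of bit 0 .. bit 7

def ldtb_flags(bit_select, compare):
    """Comparator output per PE for LDTB #bit_select, #compare.

    bit_select must have exactly one of its low 8 bits set; that bit's
    0/1 value in each PE is compared with `compare` (assumed behaviour).
    """
    if not 0 < bit_select <= 0x80 or bit_select & (bit_select - 1):
        raise ValueError("bit_select must designate exactly one bit")
    bit = bit_select.bit_length() - 1
    values = [1 if pe % PERIODS[bit] == 0 else 0 for pe in range(NUM_PE)]
    return [1 if v == compare else 0 for v in values]

t2 = ldtb_flags(0b00000010, 1)  # LDTB/T2 #00000010b, #1
selected = [pe for pe, flag in enumerate(t2) if flag]
```

Under this reading, the T2 flag is set in PE0, PE3, PE6, and so on, matching the sequence listed above.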
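[0079]
[Effects of the Invention]
By using the present invention, the number of instruction execution cycles of instructions associated with image data processing in particular can be reduced. The circuits that must be added for this purpose are simple ones.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the schematic configuration of a general SIMD type microprocessor.
FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor according to the present invention.
FIG. 3 is a block diagram showing the function of the multiplexer that connects the registers of the register file and the operation array.
FIG. 4 is a circuit diagram of the PE number mask circuit according to the first embodiment of the present invention.
FIG. 5 is a circuit diagram of the fixed value selection circuit according to the second embodiment of the present invention.
FIG. 6 is a circuit diagram of the every-n bit pattern data output circuit according to the third embodiment of the present invention.
FIG. 7 is a schematic circuit block diagram of the selection circuit according to the present invention.
FIG. 8 is an example of a circuit block diagram of the operation control register (T register).
[Explanation of symbols]
2 ... SIMD type microprocessor, 3 ... processor element, 4 ... global processor, 6 ... register file, 8 ... operation array, 34 ... register, 35 ... selection circuit, 36 ... ALU (arithmetic logic unit), 38 ... A register, 50, 50', 50" ... connection line section, 51, 51" ... logic circuit section, 51' ... multiplexer section, 54 ... T register (operation control register), 78 ... multiplexer, 80 ... comparator.
[0001]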
BACKGROUND OF THE INVENTION
The present invention relates to a single instruction-stream multiple data-stream (SIMD) type microprocessor that performs the same processing on a plurality of data with the same instruction in order to process image data and the like at high speed.
[0002]
[Prior art]
In recent years, in image processing for digital copying machines, facsimile machines, and the like, image quality has been improved through a growing number of pixels and more diverse image processing. In such image processing, the same processing is often applied simultaneously to a plurality of (many) data items. To increase speed in such cases, SIMD (Single Instruction-stream Multiple Data-stream) type microprocessors, which process a plurality of data items simultaneously with one instruction, are often used in preference to SISD (Single Instruction-stream Single Data-stream) type microprocessors, which process one data item with one instruction.
[0003]
FIG. 1 is a block diagram showing the schematic configuration of a general SIMD type microprocessor 2. The SIMD type microprocessor 2 is broadly composed of a global processor (hereinafter referred to as GP) 4 and processor elements 3, and is equipped with a plurality of processor elements 3 in order to process a plurality of data items at one time. Each processor element 3 includes a register file 6 and an operation array 8. The GP 4 controls the entire processor 2; each processor element 3 receives data from an external input/output device, processes it, and outputs the result to the external input/output device.
[0004]
The SIMD type microprocessor 2 normally processes one instruction per clock cycle, but with one instruction it can process as many data items as there are processor elements 3. When describing the performance of the SIMD type microprocessor 2, the operating frequency and the number of processor elements 3, that is, the number of data items that can be processed by one instruction, are emphasized, but the number of instruction cycles is also an important factor. In other words, for the same image processing, performance is considered better if even one instruction cycle is saved. However, designing and using a complicated circuit so that one instruction can perform complicated processing inevitably increases cost.
[0005]
[Problems to be solved by the invention]
It is an object of the present invention to reduce the number of instruction execution cycles of instructions associated with image data processing as described above by providing effective instructions and simple means for realizing the instructions.
[0006]
[Means for Solving the Problems]
The present invention has been made to achieve the above object. The SIMD type microprocessor according to claim 1 of the present invention is characterized in that:
each processor element has an operation control register holding a value that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register;
the processor elements are assigned integer numbers for identification in order;
each processor element has a connection line section that forms the integer number for identification with VCC and GND; and
further,
the integer number for identification and a numerical value specified by an instruction code in the global processor are compared simultaneously across all processor elements, and the comparison result is set in a flag included in the operation control register of each processor element.
[0007]
The SIMD type microprocessor according to claim 2 of the present invention is characterized in that:
each processor element has an operation control register holding a value that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register;
the processor elements are assigned integer numbers for identification in order;
in each processor element,
a connection line section that forms the integer number for identification with VCC and GND and a logic circuit section that takes in a mask signal for the identification number together constitute a mask circuit for the identification number, each bit signal of the connection line section and each bit of the mask signal being inputs to an individual logic circuit of the logic circuit section;
further,
in accordance with the mask signal specified by the global processor, the logic circuit section of each processor element masks the bits of the integer number for identification that are designated by the mask signal with the content of the mask signal, and outputs the result;
as a result, across the identification numbers of the processor elements, the combination of the masked portion and the unmasked portion of the output of the logic circuit section follows a periodic rule; and
further,
the data generated by the mask circuit of each processor element and a numerical value specified by an instruction code in the global processor are compared simultaneously across all processor elements, and the comparison result is set in a flag included in the operation control register of each processor element.
[0008]
The SIMD type microprocessor according to claim 3 of the present invention is characterized in that:
each processor element has an operation control register that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register;
the processor elements are assigned integer numbers for identification in order;
each processor element has a connection line section formed with VCC and GND, and a multiplexer section that is connected to the connection line section and receives an input value selection signal specified by the global processor;
the connection line section is divided into a plurality of partial connection line sections, the division being common and fixed across all processor elements, and each partial connection line section has one or more bit signal parts;
each partial connection line section is formed, by being connected to VCC and GND, so as to output a bit signal that varies periodically in correspondence with the integer numbers for identification assigned in order to the processor elements;
further,
in accordance with the input value selection signal specified by the global processor, the multiplexer section selects the same partial connection line section in every processor element and outputs the signal of the selected partial connection line section;
as a result, across the identification numbers of the processor elements, the output of the multiplexer section follows a periodic rule; and
further,
the output value with a periodic rule generated by the multiplexer section of each processor element and a numerical value specified by an instruction code in the global processor are compared simultaneously across all processor elements, and the comparison result is set in a flag included in the operation control register of each processor element.
[0009]
The SIMD type microprocessor according to claim 4 of the present invention is characterized in that:
each processor element has an operation control register that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register;
the processor elements are assigned integer numbers for identification in order;
in each processor element,
a connection line section and a logic circuit section that takes in a mask signal together constitute a bit pattern data output circuit, each bit signal of the connection line section and each bit of the mask signal being inputs to an individual logic circuit of the logic circuit section;
the connection line section is divided into individual connection lines, the division being common and fixed across all processor elements;
each individual connection line is formed, by connection to VCC or GND, so as to output a bit signal that varies periodically in correspondence with the integer numbers for identification assigned in order to the processor elements;
further,
in accordance with the mask signal specified by the global processor, the logic circuit section of each processor element masks the designated bits with the content of the mask signal and outputs the result;
as a result, across the identification numbers of the processor elements, the combination of the masked portion and the unmasked portion of the output of the logic circuit section follows a periodic rule; and
further,
the data generated by the bit pattern data output circuit of each processor element and a numerical value specified by an instruction code in the global processor are compared simultaneously across all processor elements, and the comparison result is set in a flag included in the operation control register of each processor element.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Preferred embodiments of the present invention are described below with reference to the drawings.
[0014]
FIG. 1 is a block diagram showing the schematic configuration of a general SIMD type microprocessor 2, including that of the present invention. It is composed of a global processor (hereinafter referred to as GP) 4, which mainly controls the entire processor 2, and processor elements 3, which mainly receive data from an external input/output device, process it, and output data to the external input/output device. A plurality of processor elements 3 are provided so that a plurality of data items can be processed simultaneously. In FIG. 1, the SIMD type microprocessor 2 is composed of one GP 4 and 256 processor elements 3.
[0015]
FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor 2 according to the present invention. As shown in the figure, the GP 4 includes:
- a program RAM 10 for storing a program composed of instruction codes;
- a data RAM 12 for storing operation data of the GP 4;
- a sequential unit (SCU) 9 that decodes the program and sends various control signals to the various blocks;
- a plurality of general-purpose registers (G0 to G3) for storing data;
- a program counter (PC) 14 that holds the program address used to send the instruction codes of the program to the SCU 9;
- a stack pointer (SP) 24 that stores a data memory address in order to form a stack in the data memory;
- a plurality of link registers (LS, LI, LN) that store the address before a branch, which occurs when subroutine processing is performed in the middle of a program;
- an arithmetic logic unit (ALU) 11 that performs arithmetic and logical operations on any combination of data in the data memory, numerical (immediate) data written in an instruction code, and data stored in the general-purpose registers;
- a processor status register (not shown) that holds the state of the processor;
- an interrupt control circuit (not shown) that controls hardware interrupts and software interrupts; and
- an external I/O control circuit (not shown) that is directly connected to the external input/output and controls the input and output of data from and to the outside.
[0016]
Although not shown in FIG. 2, the SCU 9 consists of a GP instruction decoder, which decodes GP instructions and generates control signals mainly for the blocks in the GP, and a processor element instruction decoder, which decodes processor element instructions and generates control signals mainly for the blocks in the processor elements. That is, the instruction codes of this processor are classified into GP instructions, which mainly control the blocks in the GP 4, determine the program sequence, and process in the ALU 11 of the GP 4 the common data to be transferred to the processor elements 3, and processor element instructions, which cause each processor element 3 to process the data input at one time from the external input/output.
[0017]
As shown in FIG. 1, each processor element 3 includes a register file 6 that temporarily holds data input from and output to the outside, and an operation array 8 that performs data processing such as arithmetic and logical operations and bit operations within the processor element 3. Further, as shown in FIG. 2, the register file 6 has, for example, thirty-two 8-bit registers 34, R0 to R31. Data is transferred from these registers 34 to the operation array 8 and, conversely, data is transferred from the operation array 8 and stored in the registers 34. The bus between the registers 34 and the operation array 8 is an 8-bit bidirectional bus.
[0018]
Further, as shown in FIG. 2, each operation array 8 is an operation unit that includes:
- a shift/extension unit 44 that shifts data from the register file 6 and applies signed or unsigned extension to produce 16-bit data;
- a plurality of general-purpose registers, for example an A register 38 and an F register 40;
- an arithmetic logic unit (ALU) 36 that takes as one input the data from the register file 6 processed by the shift/extension unit 44 and as the other input the data from the A register 38; and
- a selection circuit 35 that receives the outputs of the PE number mask circuit, the fixed value selection circuit, and the every-n bit pattern data output circuit according to the present invention (described later), and whose own output is connected to the A register 38 and the T register 54.
The output of the arithmetic logic unit (ALU) 36 is temporarily stored in the A register 38 or the F register 40, and data can also be transferred from the A register 38 to a designated register 34 of the register file 6.
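The role of the shift/extension unit 44 (8-bit register data turned into 16-bit ALU input) can be sketched as follows. The function name, the left-shift direction, and the choice to extend before shifting are all assumptions made for illustration; the source only states that the unit shifts the data and applies signed or unsigned extension.

```python
def shift_extend(value8, shift=0, signed=True):
    """Extend an 8-bit value to 16 bits, then shift left (assumed order)."""
    value8 &= 0xFF
    if signed and value8 & 0x80:
        extended = value8 | 0xFF00  # sign extension: replicate bit 7
    else:
        extended = value8           # unsigned (zero) extension
    return (extended << shift) & 0xFFFF

signed_result = shift_extend(0x80, signed=True)
unsigned_result = shift_extend(0x80, signed=False)
```

The same input byte 0x80 yields 0xFF80 with signed extension and 0x0080 with unsigned extension.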
[0019]
The operation array 8 also includes an operation control register 54 called the "T register", as described later. Writing of the output of the ALU 36 to the A register 38 or the F register 40 is controlled by the T register 54. For example, depending on the state of a predetermined bit in the operation control register (T register) 54, writing to the A register 38 or the F register 40 is performed if the bit is "1" and is not performed if it is "0".
[0020]
FIG. 3 is a block diagram showing the function of the multiplexer that connects the registers 34 of the register file 6 to the arithmetic array 8. The multiplexer provided in each processor element PEi (i = 0, 1, 2, ..., 255) is a 7-to-1 multiplexer. It is arranged so that data can be input from and output to the register file 6 of any of the processor elements PEi-3 (three to the left of PEi), PEi-2 (two to the left of PEi), PEi-1 (one to the left of PEi), PEi, PEi+1 (one to the right of PEi), PEi+2 (two to the right of PEi), and PEi+3 (three to the right of PEi). This function is referred to as the PE shift function. The data selected by the multiplexer is transferred to the shift/extension unit 44 of the arithmetic array 8.
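The PE shift function can be modelled in a few lines of Python. This is only a sketch: the name `pe_shift_read`, the list-of-lists register model, and the choice to return 0 when the selected neighbour lies outside PE0 to PE255 are assumptions made for illustration; the text does not state what happens at the ends of the array.

```python
NUM_PE = 256

def pe_shift_read(reg_files, reg_index, offset):
    """Each PEi reads register `reg_index` of PE(i + offset), offset in -3..+3."""
    if not -3 <= offset <= 3:
        raise ValueError("a 7-to-1 multiplexer reaches at most three neighbours each way")
    out = []
    for i in range(NUM_PE):
        src = i + offset
        # Assumption: a neighbour outside the array reads as 0 (not specified in the text).
        out.append(reg_files[src][reg_index] if 0 <= src < NUM_PE else 0)
    return out

# Hypothetical contents: register R0 of PEi holds the value i, R1..R31 hold 0.
reg_files = [[i] + [0] * 31 for i in range(NUM_PE)]
shifted = pe_shift_read(reg_files, 0, +2)  # every PE reads R0 of the PE two to its right
```

All 256 processor elements use the same offset because the selection comes from a single instruction, which is what makes this a SIMD operation.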
[0021]
Here, names that include the number of each processor element 3 are defined. As shown in FIG. 2, the SIMD type microprocessor 2 according to the present invention has 256 processor elements 3, which are named PE0, PE1, PE2, PE3, ..., PE254, PE255 from the left side of the figure. The numbers 0 through 255 are the processor element numbers (PE numbers).
[0022]
<< First Embodiment >>
FIG. 4 is a circuit diagram of a PE (processor element) number mask circuit according to the first embodiment of the present invention. The PE number mask circuit is formed by inserting a logic circuit that takes in a mask control signal (the PE number mask signal) from the GP 4 into the connection lines that input the PE number to a predetermined general-purpose register (these connection lines are prior art). As will be described later, the circuit configuration of FIG. 4 masks with “0”.
[0023]
In each PEi (i = 0, 1, 2, ..., 255), a connection line portion 50 that forms the PE number and a logic circuit (AND circuit) portion 51 that takes in the PE number mask signal are installed in combination. Their output is connected to the selection circuit 35 (see FIG. 2) provided in each processor element 3, as will be described later, and is stored, for example, in the A register 38 through the selection circuit 35.
[0024]
Setting the PE number in the A register 38 by means of the connection line portion 50 that forms each PE number has been possible in the prior art. Suppose, for example, that this function is realized by executing an instruction “LDPN” (Load PE Number). The circuit according to the first embodiment of the present invention can be set up to provide the same function, that is, to transmit each PE number to the corresponding selection circuit 35 with the “LDPN” instruction. As is apparent from FIG. 4, the 8-bit PE number mask signal is all “1” when the LDPN instruction is used. With this setting, data representing the PE number as a pattern formed by VCC and GND is input to the selection circuit 35 of each processor element 3.
[0025]
Specifically, the data input to the selection circuits 35 is
・for PE0, PE1, PE2, ..., PE255, in this order,
・the values 0, 1, 2, ..., 255.
[0026]
Furthermore, in the circuit according to the first embodiment of the present invention, the input data can be masked by setting bits designated by an instruction to the SIMD type microprocessor 2 to “0”. An “LDNM” (Load PE Number Masking) instruction is provided for this purpose. The LDNM instruction masks the output data with a mask pattern given as an immediate value, which is taken into the circuit of FIG. 4 as the PE number mask signal. The LDNM instruction is written as follows.
[0027]
LDNM / 0 # 00000011b
[0028]
In the above description, the trailing “b” of “00000011b” indicates binary notation. With this instruction, bits 2 through 7 of the PE number (the upper six bits) are masked in the output. “/ 0” is an option indicating masking with “0”.
[0029]
Therefore, the data input to the selection circuits 35 is
・for PE0, PE1, PE2, PE3, PE4, PE5, PE6, ..., PE254, PE255, in this order,
・the values 0, 1, 2, 3, 0, 1, 2, ..., 2, 3.
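The effect of the AND-type mask can be checked with a short Python sketch. The function names `ldpn` and `ldnm_and` are hypothetical stand-ins for the two instructions, and the Python list stands in for the values arriving at the 256 selection circuits; neither is part of the described hardware.

```python
NUM_PE = 256

def ldpn():
    """PE numbers as hard-wired by the connection lines (mask signal all ones)."""
    return list(range(NUM_PE))

def ldnm_and(mask):
    """AND every PE number with the 8-bit mask, as the AND circuits of FIG. 4 do."""
    return [pe & mask for pe in range(NUM_PE)]

values = ldnm_and(0b00000011)  # corresponds to: LDNM / 0 # 00000011b
```

Because the mask keeps only the lower two bits, the value seen by each PE equals its PE number modulo 4, and a mask of all ones reproduces the LDPN result.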
[0030]
The circuit according to the first embodiment in FIG. 4 uses AND circuits, but these may be replaced with OR circuits. That is, in each PEi (i = 0, 1, 2, ..., 255), a connection line portion 50 that forms the PE number and a logic circuit (OR circuit) portion (not shown) that takes in the PE number mask signal are installed in combination. Whereas the circuit configuration of FIG. 4 described above masks with “0”, the circuit using OR circuits masks with “1”. The instruction is written as follows:
[0031]
LDNM / 1 # 11111100b
[0032]
With the above instruction, bits 2 through 7 of the PE number are masked with “1” in the input values. “/ 1” is an option indicating masking with “1”.
[0033]
At this time, the data input to the selection circuits 35 is
・for PE0, PE1, PE2, PE3, PE4, PE5, PE6, ..., PE254, PE255, in this order,
・the values 252 (11111100b), 253 (11111101b), 254 (11111110b), 255 (11111111b), 252, 253, 254, ..., 254, 255.
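The OR-circuit variant can be sketched in the same way. The name `ldnm_or` is hypothetical, and the sketch assumes the OR gates combine the PE number and the mask bit by bit.

```python
NUM_PE = 256

def ldnm_or(mask):
    """OR every PE number with the 8-bit mask: masked bits are forced to 1."""
    return [pe | mask for pe in range(NUM_PE)]

values = ldnm_or(0b11111100)  # corresponds to: LDNM / 1 # 11111100b
```

Only the two unmasked low bits still depend on the PE number, so the sequence repeats with a period of four, offset by the constant 252.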
[0034]
With the circuit according to the first embodiment of the present invention shown in FIG. 4, the connection lines corresponding to the “0” bits of the mask control signal (PE number mask signal) are masked, so that data represented by only the remaining bits, and therefore having a repetitive regularity, can be set (formed) with one instruction. For example, by masking all of the upper 6 bits so that only the lower 2 bits are valid, the regularly repeating values
・0, 1, 2, 3, 0, 1, 2, 3, ...
can be output to the selection circuits 35 of the processor elements 3 PE0, PE1, PE2, PE3, PE4, PE5, PE6, ... with one instruction. Even if all the AND circuits in the circuit of FIG. 4 are replaced with OR circuits, regularly repeating values can likewise be output to the selection circuits 35 of the processor elements 3 PE0, PE1, PE2, PE3, PE4, PE5, PE6, ... with a single instruction. In that case, however, masking is performed with “1”.
[0035]
With the prior art, the same result as with the circuit of FIG. 4 is obtained if the PE number is first set in the A register 38 by the LDPN instruction and the remainder of dividing the value of the A register 38 by 4 is then set back in the A register 38. However, this requires
・one instruction cycle for the LDPN instruction, and
・one instruction cycle for the division instruction (assuming a multiplier/divider is present),
a total of two instruction cycles. Moreover, providing a multiplier/divider in every processor element 3 of the SIMD type microprocessor 2 would require an enormous circuit scale. Since an ordinary SIMD type microprocessor 2 has no multiplier/divider, division is performed by repeated subtraction in the ALU 36, once for each bit of the dividend. If the A register 38, a general-purpose register, is 16 bits wide, for example, at least 16 subtractions are needed (in practice, the setup before the division takes several more instruction cycles). The division alone thus takes 16 instruction cycles, and the total is 18 instruction cycles at the shortest.
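To make the cycle count concrete, the sketch below uses a standard restoring (shift-and-subtract) division with one trial subtraction per dividend bit. The text does not specify which division algorithm the ALU 36 would run, so this particular scheme, the function name `restoring_divmod`, and the step counter are assumptions for illustration.

```python
def restoring_divmod(dividend, divisor, bits=16):
    """Shift-and-subtract division: one trial subtraction per dividend bit."""
    quotient, remainder, steps = 0, 0, 0
    for bit in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:  # the trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
        steps += 1
    return quotient, remainder, steps

# The remainder of PE number / 4 equals the PE number ANDed with 00000011b.
same_result = all(restoring_divmod(pe, 4)[1] == (pe & 0b11) for pe in range(256))
steps_needed = restoring_divmod(255, 4)[2]
```

For a 16-bit register the loop always runs 16 times, whereas the masked load produces the same remainder in a single instruction.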
[0036]
As is apparent from the above, the circuit according to the first embodiment of the present invention eliminates a considerable number of instruction cycles.
[0037]
<< Second Embodiment >>
FIG. 5 is a circuit diagram of a fixed value selection circuit according to the second embodiment of the present invention. The fixed value selection circuit outputs to the selection circuit 35 (described later), under selection control, numerical values that vary with a period of n (n is a natural number) and increase monotonically from 0 to n − 1 within one period. In the circuit shown in FIG. 5, the values 3, 5, and 7 are assumed for n.
[0038]
As shown in FIG. 5, each processor element 3 includes its own connection line portion 50′ and multiplexer portion 51′. The connection line portion 50′ generates a 9-bit signal, wired so that the left 2 bits vary with a period of 3, the middle 3 bits vary with a period of 5, and the right 4 bits vary with a period of 7. As is apparent from FIG. 5, the period-3 variation is, for
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, ...
in this order,
・0, 1, 2, 0, 1, 2, 0, ...
The period-5 variation is, for
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, ...
in this order,
・0, 1, 2, 3, 4, 0, 1, ...
The period-7 variation is, for
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, PE8, ...
in this order,
・0, 1, 2, 3, 4, 5, 6, 0, 1, ...
[0039]
Which periodic variation is output is determined by an input value selection signal, which controls the multiplexer portion 51′ in each processor element 3. The following instruction “LDRN” (Load Repeating Number) is provided for extracting the periodic variation described above.
[0040]
LDRN # 3
[0041]
With the above instruction, values that vary with a period of 3, increasing monotonically from 0 to 2 within one period, are output to the selection circuit 35 of each processor element 3 in the order of the PE numbers of the processor elements 3.
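The fixed value selection circuit can be modelled as a hard-wired 9-bit word per PE plus a field selector. In the sketch below, the exact bit positions of the three fields, the `FIELDS` table, and the names `connection_lines` and `ldrn` are assumptions for illustration; only the field widths (2, 3, and 4 bits) and the periods come from the description.

```python
NUM_PE = 256
# period -> (shift, width) of its field in the 9-bit word; bit positions are assumed.
FIELDS = {3: (7, 2), 5: (4, 3), 7: (0, 4)}

def connection_lines(pe):
    """9-bit pattern wired into one PE: three fields holding pe mod 3, 5, and 7."""
    word = 0
    for period, (shift, _width) in FIELDS.items():
        word |= (pe % period) << shift
    return word

def ldrn(n):
    """Select the field for period n in every PE, as the multiplexer portion would."""
    shift, width = FIELDS[n]
    field_mask = (1 << width) - 1
    return [(connection_lines(pe) >> shift) & field_mask for pe in range(NUM_PE)]

values = ldrn(3)  # corresponds to: LDRN # 3
```

The modulo operation appears only in the model of the wiring. In the circuit those values are fixed VCC and GND connections, so no arithmetic is performed at run time.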
[0042]
With the fixed value selection circuit according to the second embodiment of the present invention shown in FIG. 5, numerical values can be input to the selection circuit 35 (A register 38) as described above in one instruction cycle. The numbers 3, 5, and 7 used in the above example are known to be values frequently used in image processing.
[0043]
<< Third Embodiment >>
FIG. 6 is a circuit diagram of an n-th bit pattern data output circuit according to the third embodiment of the present invention. The n-th bit pattern data output circuit is formed by inserting a logic circuit that takes in a bit mask signal from the GP 4 into connection lines in which each individual bit is set to “1” at every n-th PE (in the order of PE numbers).
[0044]
In each PEi (i = 0, 1, 2, ..., 255), a connection line portion 50″ that sets each individual bit to “1” at every n-th PE and a logic circuit (AND circuit) portion 51″ that takes in the bit mask signal are installed in combination. As with the circuits according to the first and second embodiments, their output is connected to the selection circuit 35 (see FIG. 2) provided in each processor element 3, and is stored, for example, in the A register 38 through the selection circuit 35.
[0045]
The connection line portion 50″ of each PEi comprises eight individual bits. For example, bit 0 is “every 2”, bit 1 is “every 3”, bit 2 is “every 5”, bit 3 is “every 7”, bit 4 is “every 11”, bit 5 is “every 13”, bit 6 is “every 17”, and bit 7 is “every 23”. In the connection line portion 50″ of each PEi, a bit is wired to “1” (that is, connected to VCC) if the PE number matches the periodic characteristic of that bit (FIG. 6). That is,
・bit 0, with a period of 2, is connected to VCC in PE0, PE2, PE4, PE6, PE8, ...,
・bit 1, with a period of 3, is connected to VCC in PE0, PE3, PE6, PE9, PE12, ..., and
・bit 2, with a period of 5, is connected to VCC in PE0, PE5, PE10, PE15, PE20, ...
[0046]
The next instruction “LDRB” (Load Repeating Bit) is prepared as an instruction relating to the extraction of the setting contents of the connection line portion 50 ″ of each PEi set as described above.
[0047]
LDRB
[0048]
By this instruction, the full setting contents of the connection line portion 50″ of each PEi are input to, for example, the selection circuit 35 (or the A register 38). For
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, PE8, PE9, PE10, PE11, ...
in this order, the first values are
・11111111b (binary notation; the same applies hereinafter), 00000000b, 00000001b, 00000010b, 00000001b, 00000100b, 00000011b, 00001000b, 00000001b, 00000010b, 00000101b, ...
[0049]
However, as described above, a logic circuit that takes in the bit mask signal from the GP 4 is inserted into the connection lines, so the input data to the selection circuit 35 (or the A register 38) can be masked by setting bits designated in the instruction code to “0”. When masking is performed, the instruction is written as follows. The mask pattern is given as an immediate value and is taken into the circuit of FIG. 6 as the bit mask signal, which masks the input data to the selection circuit 35 (or the A register 38).
[0050]
LDRB / 0 # 00000100b
[0051]
With the above instruction, the output data takes the value “00000100b” only in the PEs in which bit 2 is set to “1”. For example, “00000100b” is set in the A register 38 of
・PE0, PE5, PE10, PE15, PE20, ...
For the other PEs, the value is “00000000b”.
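The wiring rule and the effect of the bit mask can be sketched as follows. The names `wired_pattern` and `ldrb` are hypothetical. The rule that a bit is 1 when the PE number is a multiple of that bit's period is taken from the description of the connection lines, and the AND with the mask models the logic circuit portion.

```python
NUM_PE = 256
PERIODS = (2, 3, 5, 7, 11, 13, 17, 23)  # period assigned to bit 0 .. bit 7

def wired_pattern(pe):
    """8-bit pattern wired into one PE: bit k is 1 if pe is a multiple of PERIODS[k]."""
    return sum(1 << bit for bit, period in enumerate(PERIODS) if pe % period == 0)

def ldrb(mask=0b11111111):
    """Output of the AND circuits for every PE; the default mask passes all bits."""
    return [wired_pattern(pe) & mask for pe in range(NUM_PE)]

unmasked = ldrb()            # corresponds to: LDRB
masked = ldrb(0b00000100)    # corresponds to: LDRB / 0 # 00000100b
hits = [pe for pe, value in enumerate(masked) if value == 0b00000100]
```

PE0 yields 11111111b because 0 is a multiple of every period, and `hits` contains exactly the PE numbers that are multiples of 5.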
[0052]
With the n-th bit pattern data output circuit according to the third embodiment, numerical values can be input to the selection circuit 35 (A register 38) as described above in one instruction cycle.
[0053]
<< Fourth Embodiment >>
A SIMD type microprocessor 2 according to a fourth embodiment of the present invention will be described.
[0054]
With the “LDNM” instruction according to the first embodiment of the present invention, when the immediate value (PE number mask signal) is the value that masks nothing (# 11111111b), the PE number is input as it is to the selection circuit 35 (or the A register 38). Alternatively, as described above, the PE number is input unchanged to the selection circuit 35 (or the A register 38) by using the connection lines that input the PE number to a predetermined general-purpose register or the like, without inserting a logic circuit that takes in the PE number mask signal from the GP 4.
[0055]
Therefore,
・for PE0, PE1, PE2, ..., PE255, in this order,
・the values 0, 1, 2, ..., 255
are input to the selection circuits 35.
[0056]
In each processor element 3, the value input to the selection circuit 35 is compared with the immediate value specified by the instruction (to the SIMD type microprocessor 2) in the comparator 80 (FIG. 7) provided in the selection circuit 35, and the result is stored in a bit at a predetermined position of the T register (arithmetic control register) 54. Storing the result in the T register 54 makes it possible to have the operations of subsequent instructions performed in only one specific processor element 3 out of the 256 processor elements 3, with the other processor elements 3 left inactive.
[0057]
FIG. 7 is a schematic circuit block diagram of the selection circuit 35 according to the present invention. The output data from the PE number mask circuit, the fixed value selection circuit, and the n-th bit pattern data output circuit shown in FIGS. 4, 5, and 6 are connected to the selection circuit 35 in the arithmetic array 8 shown in FIG. 2. The multiplexer 78 selects one of the three data according to the input data selection signal 2, a control signal from the GP 4, and the selected data is input to the A register 38. The selected data is also compared by the comparator 80 with the immediate data IMM2 in the instruction code; “1” is output if they match and “0” if they do not.
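A behavioural sketch of one selection circuit follows. The function name `selection_circuit`, the tuple of three source values, and the use of an index 0 to 2 for the input data selection signal are illustrative assumptions; the encoding of the actual control signal is not given in the text.

```python
def selection_circuit(sources, select, imm2):
    """Model of one PE's selection circuit.

    sources: outputs of the PE number mask circuit, the fixed value selection
             circuit, and the n-th bit pattern data output circuit, in that order.
    select:  index 0..2 standing in for the input data selection signal.
    imm2:    immediate data taken from the instruction code.
    """
    selected = sources[select]            # role of multiplexer 78
    flag = 1 if selected == imm2 else 0   # role of comparator 80
    return selected, flag                 # to the A register and to the T register

# Hypothetical inputs for PE5: PE number 5, 5 mod 3 = 2, bit pattern 00000100b.
to_a_register, to_t_register = selection_circuit((5, 2, 0b00000100), 1, 2)
```

The same select and immediate values are broadcast to every PE, so one instruction produces a per-PE flag for the whole array.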
[0058]
Here, the T register 54 according to the present invention will be described. As shown in FIG. 2, each processor element 3 is equipped with an arithmetic control register (T register) 54 for specifying execution conditions. FIG. 8 is an example of a circuit block diagram of the T register 54. In FIG. 8, the T register 54 consists of eight 1-bit registers (T0, T1, ..., T7), each controlled separately. One processor element 3 can therefore hold eight control patterns.
[0059]
Each bit of the T register 54 can be used to enable or disable execution of an operation in each processor element 3, so that only specific processor elements 3 are selected as operation targets. For example, consider the following instruction.
[0060]
ADD / T1 # 12
[0061]
This is an addition instruction: the value of the A register 38 and the immediate value “12” are added, and the result is stored in the A register 38. Because the execution control option “/ T1” is specified, data is stored from the ALU 36 into the A register 38 only in the processor elements 3 whose T1 (bit) flag in the T register 54 is “1”. No data is stored in the A register 38 of a processor element 3 whose T1 flag is “0”.
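The gating of the register write by a T flag can be sketched as below. The function name `add_predicated`, the 16-bit register width, and the wrap-around on overflow are assumptions for illustration; the description only states that the result is written where the selected T flag is “1”.

```python
def add_predicated(a_regs, t_regs, t_bit, imm, width=16):
    """Add `imm` to the A register of every PE whose T flag `t_bit` is 1."""
    limit = (1 << width) - 1  # assumed register width; overflow wraps around
    result = []
    for a, t in zip(a_regs, t_regs):
        enabled = (t >> t_bit) & 1
        # Every ALU computes the sum; the flag only decides whether it is written back.
        result.append((a + imm) & limit if enabled else a)
    return result

# Hypothetical state of four PEs: A register values and T register contents.
a_regs = [1, 2, 3, 4]
t_regs = [0b00000010, 0b00000000, 0b00000010, 0b00000001]
new_a = add_predicated(a_regs, t_regs, 1, 12)  # corresponds to: ADD / T1 # 12
```

Only the first and third PEs have T1 set, so only their A registers change.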
[0062]
In the example of FIG. 8, the inputs to the T register 54 are
・input from the overflow flag (V) and the carry flag (C) generated in the ALU 36 of the arithmetic array 8 in the PE 3,
・input from the result of an OR operation on the result of masking all the T flags with the immediate value IMM2,
Input from the storage means 2 shown in FIG.
・ Input from comparator 80 output in selection circuit 35 of FIG.
Etc.
[0063]
According to the SIMD type microprocessor 2 of the fourth embodiment of the present invention described above, an operation can be performed only on the processor element 3 having a specific PE number with a small number of instructions. The same thing can be done with the prior art by first setting “0” in the T registers 54 of all the processor elements 3 and then setting “1” in the T register 54 of the processor element 3 that is to perform the operation, but the number of instructions inevitably increases compared with the fourth embodiment of the present invention.
[0064]
<< Fifth Embodiment >>
A SIMD type microprocessor 2 according to a fifth embodiment of the present invention will be described.
[0065]
In each processor element 3 of the SIMD type microprocessor 2 according to the fifth embodiment, the (data) value supplied to the selection circuit 35 from the PE number mask circuit according to the first embodiment is compared, by the comparator 80 (FIG. 7) provided in the selection circuit 35, with the immediate data specified in the instruction code, and the result is stored in a bit at a predetermined position of the T register 54. Storing the result in the T register 54 makes it possible to control subsequent instructions so that only the processor elements 3 at PE numbers spaced by a power of 2 execute the operation.
[0066]
The output of the PE number mask circuit shown in FIG. 4 is input to the selection circuit 35 of FIG. 7. When an “LDTM” (Load to T register Masked PE Number) instruction (provided in advance) is executed, the selection circuit 35 selects the output data of the PE number mask circuit according to the input data selection signal 2. In addition, when the LDTM instruction is executed, one of its immediate data is input as IMM2, and the comparator 80 compares the output data of the PE number mask circuit with IMM2. The output of the comparator 80 is “1” in a processor element 3 where they match and “0” in a processor element 3 where they do not. The following is an example of such an instruction.
[0067]
LDTM / T2 # 00000011b, # 1
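As a rough illustration of this example instruction, the sketch below treats the first immediate as the mask pattern and the second as the comparison value, then writes the comparator result into one T flag. The function name `ldtm` and the representation of each T register as an integer are assumptions for illustration.

```python
NUM_PE = 256

def ldtm(t_regs, t_bit, mask, compare):
    """Set T flag `t_bit` to 1 in every PE where (PE number AND mask) == compare."""
    updated = []
    for pe, t in enumerate(t_regs):
        flag = 1 if (pe & mask) == compare else 0
        # Clear the target flag, then write the comparator output into it.
        updated.append((t & ~(1 << t_bit)) | (flag << t_bit))
    return updated

t_regs = ldtm([0] * NUM_PE, 2, 0b00000011, 1)  # LDTM / T2 # 00000011b, # 1
active = [pe for pe, t in enumerate(t_regs) if (t >> 2) & 1]
```

With these operands `active` begins with PE1, PE5, PE9, so a following instruction that uses “/ T2” runs on one PE out of every four.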
[0068]
In the above instruction, the first of the immediate operands is the mask pattern and the second is the comparison value. The mask pattern is written in binary notation and the comparison value in decimal. For
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ...
the values resulting from the mask are the repeating values
・0, 1, 2, 3, 0, 1, 2, 3, ...
so “1” is set in the T2 flag of the T register 54 of
・PE1, PE5, PE9, ...
and “0” is set in the T2 flag of the T register 54 of the other PEs. As a result, if the T2 flag of the T register 54 is used for execution control in subsequent instructions, the operation is executed only in the processor elements 3 whose PE numbers occur at every 2² (= 4), starting from a specific PE, namely PE1.
[0069]
<< Sixth Embodiment >>
A SIMD type microprocessor 2 according to a sixth embodiment of the present invention will be described.
[0070]
In each processor element 3 of the SIMD type microprocessor 2 according to the sixth embodiment, a (data) value connected to the selection circuit 35 from the fixed value selection circuit according to the second embodiment is transferred to the selection circuit 35. The provided comparator 80 (FIG. 7) compares the immediate data specified in the instruction code and stores the result in a bit at a predetermined position of the T register 54. By storing the result in the T register 54, it is possible to perform control such that only the processor element 3 having a PE number every specific numerical value such as 3, 5, 7 is executed in the calculation after the next instruction. it can.
[0071]
The output of the fixed value selection circuit shown in FIG. 5 is input to the selection circuit 35 of FIG. 7. When an “LDTN” (Load to T register Predefined Number) instruction (provided in advance) is executed, the selection circuit 35 selects the output data of the fixed value selection circuit according to the input data selection signal 2. In addition, when the LDTN instruction is executed, one of its immediate data is input as IMM2, and the comparator 80 compares the output data of the fixed value selection circuit with IMM2. The output of the comparator 80 is “1” in a processor element 3 where they match and “0” in a processor element 3 where they do not. The following is an example of such an instruction.
[0072]
LDTN / T2 # 3, # 2
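A sketch of this example instruction, under the assumption that the first immediate selects the period and the second is the comparison value. The name `ldtn_flags` is hypothetical, and the modulo operation models values that the circuit obtains from fixed wiring.

```python
NUM_PE = 256
SUPPORTED_PERIODS = (3, 5, 7)  # the periods wired into the circuit of FIG. 5

def ldtn_flags(period, compare):
    """Comparator output per PE: 1 where the period-`period` value equals `compare`."""
    if period not in SUPPORTED_PERIODS:
        raise ValueError("only the hard-wired periods can be selected")
    return [1 if pe % period == compare else 0 for pe in range(NUM_PE)]

flags = ldtn_flags(3, 2)  # corresponds to: LDTN / T2 # 3, # 2
active = [pe for pe, flag in enumerate(flags) if flag]
```

Here `active` begins with PE2, PE5, PE8. Changing the comparison value shifts the starting PE without changing the spacing.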
[0073]
In the above instruction, the first of the immediate operands is the selected fixed value and the second is the comparison value. For
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ...
the resulting values are the repeating values
・0, 1, 2, 0, 1, 2, 0, 1, ...
so “1” is set in the T2 flag of the T register 54 of
・PE2, PE5, PE8, ...
and “0” is set in the T2 flag of the T register 54 of the other PEs. As a result, if the T2 flag of the T register 54 is used for execution control in subsequent instructions, the operation is executed only in the processor elements 3 at every third PE number starting from PE2.
[0074]
<< Seventh Embodiment >>
A SIMD type microprocessor 2 according to a seventh embodiment of the present invention will be described.
[0075]
In each processor element 3 of the SIMD type microprocessor 2 according to the seventh embodiment, the (data) value connected to the selection circuit 35 from the n-th bit pattern data output circuit according to the third embodiment is selected. The comparator 80 (FIG. 7) provided in the circuit 35 compares it with the immediate data specified in the instruction code, and stores the result in a bit at a predetermined position of the T register 54. By storing the result in the T register 54, the control is executed so that only the processor element 3 having a PE number every specific numerical value such as 2, 3, 5, 7, 11 is executed in the operation after the next instruction. It can be performed.
[0076]
The output of the n-th bit pattern data output circuit shown in FIG. 6 is input to the selection circuit 35 of FIG. 7. When an “LDTB” (Load to T register Bit Number) instruction (provided in advance) is executed, the selection circuit 35 selects the output data of the n-th bit pattern data output circuit according to the input data selection signal 2. In addition, when the LDTB instruction is executed, one of its immediate data is input as IMM2, and the comparator 80 compares the output data of the n-th bit pattern data output circuit with IMM2. The output of the comparator 80 is “1” in a processor element 3 where they match and “0” in a processor element 3 where they do not. The following is an example of such an instruction.
[0077]
LDTB / T2 # 00000010b, # 1
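The sketch below illustrates this example instruction under one loudly stated assumption: the selected bit is treated as a 0 or 1 value before it is compared with the second immediate. The text does not spell out how the masked 8-bit pattern is aligned with the comparison value, so this reading, like the name `ldtb_flags`, is hypothetical.

```python
NUM_PE = 256
PERIODS = (2, 3, 5, 7, 11, 13, 17, 23)  # period assigned to bit 0 .. bit 7

def ldtb_flags(bit_select, compare):
    """Comparator output per PE for a one-hot bit selection (assumed 0/1 comparison)."""
    if bit_select == 0 or bit_select & (bit_select - 1):
        raise ValueError("this sketch expects exactly one selected bit")
    bit = bit_select.bit_length() - 1
    flags = []
    for pe in range(NUM_PE):
        selected = 1 if pe % PERIODS[bit] == 0 else 0  # wired value of that bit
        flags.append(1 if selected == compare else 0)
    return flags

flags = ldtb_flags(0b00000010, 1)  # corresponds to: LDTB / T2 # 00000010b, # 1
active = [pe for pe, flag in enumerate(flags) if flag]
```

Under that assumption bit 1 (period 3) is selected and `active` begins with PE0, PE3, PE6.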
[0078]
In the above instruction, the first of the immediate operands designates which every-n bit is used, and the second is the comparison value. For
・PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ...
the resulting values are the repeating values
・1, 0, 0, 1, 0, 0, 1, 0, ...
so “1” is set in the T2 flag of the T register 54 of
・PE0, PE3, PE6, ...
and “0” is set in the T2 flag of the T register 54 of the other PEs. As a result, if the T2 flag of the T register 54 is used for execution control in subsequent instructions, the operation is executed only in the processor elements 3 at every third PE number starting from PE0.
[0079]
[Effect of the invention]
By using the present invention, the number of instruction execution cycles of the instructions involved in image data processing can be reduced. The circuits that need to be added for this purpose are only simple ones.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a general SIMD type microprocessor;
FIG. 2 is a block diagram showing a more detailed configuration of a SIMD type microprocessor according to the present invention.
FIG. 3 is a block diagram illustrating a function of a multiplexer that connects a register of a register file and an arithmetic array;
FIG. 4 is a circuit diagram of a PE number mask circuit according to the first embodiment of the present invention.
FIG. 5 is a circuit diagram of a fixed value selection circuit according to a second embodiment of the present invention.
FIG. 6 is a circuit diagram of an n-th bit pattern data output circuit according to a third embodiment of the present invention.
FIG. 7 shows a schematic circuit block diagram of a selection circuit according to the present invention.
FIG. 8 is an example of a circuit block diagram of an arithmetic control register (T register).
[Explanation of symbols]
2 ... SIMD type microprocessor, 3 ... processor element, 4 ... global processor, 6 ... register file, 8 ... arithmetic array, 34 ... register, 35 ... selection circuit, 36 ... ALU (arithmetic logic unit), 38 ... A register, 50, 50′, 50″ ... connection line portion, 51, 51″ ... logic circuit portion, 51′ ... multiplexer portion, 54 ... T register (arithmetic control register), 78 ... multiplexer, 80 ... comparator.

Claims

In a SIMD type microprocessor including a global processor that performs overall control and a plurality of processor elements,
each processor element has an arithmetic control register holding a value that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register,
the processor elements are assigned integer numbers for identification in order,
a connection line portion that forms the integer number for identification with VCC and GND is provided,
and further,
the integer number for identification and a numerical value specified by an instruction code in the global processor are compared simultaneously across all the processor elements, and the comparison result is set in a flag included in the arithmetic control register of each processor element; a SIMD type microprocessor characterized by the above.

In a SIMD type microprocessor including a global processor that performs overall control and a plurality of processor elements,
each processor element has an arithmetic control register holding a value that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register,
the processor elements are assigned integer numbers for identification in order,
in each processor element,
a connection line portion that forms the integer number for identification with VCC and GND and a logic circuit portion that takes in a mask signal for the integer number for identification constitute a mask circuit for the integer number for identification, and each bit signal constituting the connection line portion and each bit of the mask signal are inputs of an individual logic circuit constituting the logic circuit portion,
further,
by the mask signal specified by the global processor, the logic circuit portion of each processor element masks the bits of the integer number for identification that are designated by the mask signal, in accordance with the contents of the mask signal, and outputs the result,
whereby, in correspondence with the integer numbers for identification of the processor elements, the combination of the masked portion and the unmasked portion of the output of the logic circuit portion follows a periodic rule,
and in addition,
the data generated by the mask circuit of each processor element and a numerical value specified by an instruction code in the global processor are compared simultaneously across all the processor elements, and the comparison result is set in a flag included in the arithmetic control register of each processor element; a SIMD type microprocessor characterized by the above.

In a SIMD type microprocessor including a global processor that performs overall control and a plurality of processor elements,
each processor element has an arithmetic control register that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register,
the processor elements are assigned integer numbers for identification in order,
each processor element has a connection line portion formed with VCC and GND, and a multiplexer portion that is connected to the connection line portion and receives an input value selection signal designated by the global processor,
the connection line portion is divided into a plurality of partial connection line portions, the division being common and fixed across all the processor elements, and each partial connection line portion has one or more bit signal portions,
each partial connection line portion is formed, by connection to VCC and GND, so as to output bit signals that vary periodically in correspondence with the integer numbers for identification assigned in order to the processor elements,
further,
in response to the input value selection signal designated by the global processor, the multiplexer portion selects, in common across the processor elements, one of the partial connection line portions constituting the connection line portion of each processor element and outputs the signal of the selected partial connection line portion,
whereby, in correspondence with the integer numbers for identification of the processor elements, the output of the multiplexer portion follows a periodic rule,
and in addition,
the output value following the periodic rule generated in the multiplexer portion of each processor element and a numerical value specified by an instruction code in the global processor are compared simultaneously across all the processor elements, and the comparison result is set in a flag included in the arithmetic control register of each processor element; a SIMD type microprocessor characterized by the above.

In a SIMD type microprocessor including a global processor that performs overall control and a plurality of processor elements,
each processor element has an arithmetic control register that determines whether the operation result of the arithmetic logic unit included in the processor element is stored in a predetermined general-purpose register,
the processor elements are assigned integer numbers for identification in order,
in each processor element,
a connection line portion and a logic circuit portion that takes in a mask signal constitute a bit pattern data output circuit, and each bit signal constituting the connection line portion and each bit of the mask signal are inputs of an individual logic circuit constituting the logic circuit portion,
the connection line portion is divided into individual connection lines, the division being common and fixed across all the processor elements,
each individual connection line is formed, by connection to VCC or GND, so as to output a bit signal that varies periodically in correspondence with the integer numbers for identification assigned in order to the processor elements,
further,
by the mask signal specified by the global processor, the logic circuit portion of each processor element masks the designated bits in accordance with the contents of the mask signal and outputs the result,
whereby, in correspondence with the integer numbers for identification of the processor elements, the combination of the masked portion and the unmasked portion of the output of the logic circuit portion follows a periodic rule,
and further,
the data generated by the bit pattern data output circuit of each processor element and a numerical value specified by an instruction code in the global processor are compared simultaneously across all the processor elements, and the comparison result is set in a flag included in the arithmetic control register of each processor element; a SIMD type microprocessor characterized by the above.