JP3821198B2 - Signal processing device

Info

Publication number: JP3821198B2
Application number: JP31480699A
Authority: JP
Inventors: 和彦岩永; 慎一山浦; 和彦原; 貴雄片山; 浩資高藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-11-05
Filing date: 1999-11-05
Publication date: 2006-09-13
Anticipated expiration: 2019-11-05
Also published as: JP2001134538A

Description

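[0001]
[Industrial Field of Application]
The present invention relates to a signal processing device that uses a SIMD (Single Instruction stream, Multiple Data stream) processor, which processes multiple items of image data or the like in parallel with a single operation instruction. In particular, it relates to a signal processing device suitable for image processing such as digital copying.
[0002]
[Prior Art]
In recent years, digital copiers, facsimile machines, and similar devices have improved image quality by increasing the pixel count or by adding color support. As image quality improves, the amount of data to be processed grows. Data processing in a copier often applies the same arithmetic operation to every pixel. SIMD processors, which apply the same operation to multiple data items simultaneously with one instruction, have therefore come into use.
[0003]
When image processing is performed with a SIMD processor, the processor elements (PEs) are normally laid out along the main scanning direction. Image processing such as filtering then needs reference pixels above and below the target pixel. One approach is to delay the pixel data of preceding lines and store it in a line memory.
[0004]
FIG. 16 is a schematic block diagram of an image processing device using the SIMD method. As shown, this device includes:
- an external image memory (RAM) 101 that stores image data;
- a processor element block 103 consisting of, for example, 1024 processor elements;
- a global processor 104.
[0005]
Each processor element (PE) has n general-purpose registers (R0 to Rn−1) and an operation array. The general-purpose registers are usually held as a register file outside the operation array. Each register acts as a shift register and can be accessed from outside through a serial port. In the figure, the hatched area is one unit processor element.
[0006]
The global processor 104 has an address generation means for a program memory (PRAM) 105. It reads from PRAM 105 the instruction codes supplied to the processor elements of processor element block 103, and it controls the operation array and the register file.
[0007]
The devices connected to the serial ports of each PE's general-purpose registers depend on the system configuration. Consider a system that needs only a two-line delay. There, for example, R0 holds the pixel data from two lines earlier, R1 the pixel data from one line earlier, and R2 the current line's pixel data. Serial-parallel converters 102 are placed on the serial ports of R0 and R1 and connected to the external RAM 101.
[0008]
A known example of an image processing device with the above configuration is the SVP (Serial Video Processor).
[0009]
Most conventional SIMD processors have at least as many processor elements as there are pixels in the main scanning direction, so controlling the external memory (RAM) 101 requires no complicated processing. Scaling functions such as enlargement and reduction for digital copying can be realized in two ways. One is to add an external ASIC. The other is to provide a scaling control flag inside the SIMD processor, as described in Japanese Patent Laid-Open No. 08-123683 (IPC: G06F 9/38).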
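[0010]
[Problems to Be Solved by the Invention]
However, image processing resolution keeps increasing, and the number of pixels in the main scanning direction tends to grow. Moreover, conventional SIMD processors with many processor elements normally keep circuit size down by using small circuits, for example 1-bit operation arrays.
[0011]
Consider applying a SIMD processor to digital copying and handling an A4-size image at 600 DPI (dots per inch). This requires 7000 or more processor elements, so simply increasing the number of processor elements is not realistic. One solution is to divide the main scanning direction and process each part separately. However, this requires a dual-port memory (RAM) as the line memory, which increases circuit size.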
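The "7000 or more" figure matches the 297 mm edge of an A4 sheet at 600 DPI. The sketch below checks that arithmetic. It assumes the main scan runs along the long edge, and it borrows the 1024-PE array size from the embodiment described later. The function names are illustrative only.

```python
import math

MM_PER_INCH = 25.4

def pixels_per_line(edge_mm: float, dpi: int) -> int:
    """Pixels along one main-scan line of length edge_mm at the given resolution."""
    return math.ceil(edge_mm / MM_PER_INCH * dpi)

def passes_needed(pixels: int, num_pes: int) -> int:
    """Lower bound on SIMD passes per line when the line is split into
    PE-array-sized pieces (ignores the overlap needed for reference pixels)."""
    return math.ceil(pixels / num_pes)

long_edge = pixels_per_line(297.0, 600)   # A4 long edge, 297 mm
short_edge = pixels_per_line(210.0, 600)  # A4 short edge, 210 mm
print(long_edge, short_edge, passes_needed(long_edge, 1024))  # 7016 4961 7
```

Even the short edge (about 4961 pixels) exceeds a 1024-PE array several times over. This is why the line has to be divided into passes.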
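[0012]
Meanwhile, Japanese Patent Laid-Open No. 10-326258 (IPC: G06F 15/16) proposes a data operation system that shortens processing time through pipeline processing. It splits the pixels in the main scanning direction into two halves and divides the processor elements into two halves. However, this method cannot process a pixel count that exceeds the size of the data memory the processor elements can access.
[0013]
Furthermore, if each processor element holds a scaling control flag, processing becomes complicated as the number of divisions grows. It is therefore preferable to perform scaling outside the SIMD processor. Implementing scaling with a separate ASIC, however, reduces the processor's versatility.
[0014]
Accordingly, an object of the present invention is to provide a signal processing device using a SIMD processor of simple configuration. The device limits the increase in circuit size by using a single-port memory (RAM) and can also incorporate a scaling function.
[0015]
[Means for Solving the Problems]
The present invention comprises the following elements:
- processor elements of a SIMD processor, each having an operation means that processes data and a data holding means that holds both the data to be processed by the operation means and the data it has processed;
- a data transfer bus connected to each of the plurality of processor elements;
- a designation means that designates a given processor element based on the addresses assigned to the plurality of processor elements;
- an address bus that supplies addresses to the designation means;
- a data transfer interface for accessing, from outside the processor, the data holding means built into the plurality of processor elements;
- a memory controller connected to the data transfer interface. It generates the addresses supplied to the address bus to designate the given processor element. It reads data stored in an external memory and writes it to the processor elements, and it reads data from the processor elements and writes it to the memory.

When transferring data from the data holding means of each processor element to the memory through the data transfer interface, the memory controller:
- specifies the address of the processor element at which the transfer starts and the address of the processor element at which it ends;
- reads the processed data from each processor element, except for a predetermined number of data items;
- sets a pointer value corresponding to the address of the processor element at which the transfer starts;
- writes the data to the memory based on this pointer value.

When transferring data from the memory to the data holding means of each processor element through the data transfer interface, the memory controller:
- sets a pointer value moved back by a number of addresses equal to the predetermined number of data items excluded in the preceding step;
- reads data from the memory based on this pointer value;
- generates, in correspondence with the pointer value, the address supplied to the address bus to designate the given processor element;
- writes the data to that processor element.

The data is thereby divided and processed by the SIMD processor.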
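[0016]
With this configuration, the SIMD processor can readily perform its operations even when there are fewer processor elements than pixels in the main scanning direction. In addition, only valid pixels need be transferred when performing weighted operations such as filtering.
[0017]
The memory controller preferably performs write transfers to the memory and read transfers from it in a time-division manner.
[0018]
This configuration allows a single-port memory to be used as a FIFO or LIFO memory, so a line memory can be realized with a single-port memory.
[0019]
The memory controller may also include registers for transfers from the data holding means of each processor element to the memory through the data transfer interface. These registers specify the address of the processor element at which the transfer starts and the address of the processor element at which it ends. For example, an initial-value load function can be added to the processor element address counter, so that an offset relative to the processor element address can be set.
[0020]
Further, the memory controller may include registers for transfers from the memory to the data holding means of each processor element through the data transfer interface. These registers specify the address of the processor element at which the transfer ends, and the address of the processor element used to rewind the memory read pointer after the transfer. For example, a comparator can compare the processor element address with a set value. The memory read pointer is then reloaded with the memory location that held the data transferred to the register of the processor element whose address equals the set value.
[0021]
This configuration applies when using a SIMD processor with fewer processor elements than pixels in the main scanning direction. In that case, only the valid pixels need be transferred when performing weighted operations such as filtering.
[0022]
The memory controller may also set the lower and upper limits of an arbitrary address region of the memory in registers and use that region as a ring. For example, lower-limit and upper-limit registers for the memory pointer can be provided, each with a comparator. The value of the corresponding register is output to the address bus when its condition is met.
[0023]
This configuration allows data to be kept separately for each processing step in image processing that involves multiple steps.
[0024]
The data transfer interface is preferably configured so that the data holding means of even-addressed and odd-addressed processor elements can be accessed simultaneously. For example, the data transfer port of the SIMD processor can have two independent ports, one for the data holding means of even-addressed processor elements and one for odd-addressed ones. Data for two processor elements is then handled in one cycle.
[0025]
Further, the input/output bus to the memory is preferably wider than the data transfer path to the processor elements. For example, providing input/output buffers for four processor elements makes the memory bit width four processor elements wide.
[0026]
This configuration halves the time required to transfer data to the data holding means of all processor elements. It also gives the memory access time more margin.
[0027]
The invention may also handle transfers from the data holding means of each processor element to the memory through the data transfer interface as follows. The data is written if an external write control signal synchronized with the transfer is "0", and writing is inhibited if the signal is "1". For example, a synchronization signal can be input externally to the sequence unit to control when the write transfer starts, which synchronizes the write control signal with the write transfer. The write buffer unit shapes the data read from the processor elements' data transfer port and writes it to RAM. Changing its control according to the value of the write control signal makes the transfer follow that signal.
[0028]
Further, the invention may handle transfers from the memory to the data holding means of each processor element through the data transfer interface as follows, based on an external read control signal synchronized with the transfer:
- if the signal is "0", the value read from the memory is written to the data transfer interface;
- if it is "1", the interface receives the same value as the data transferred the last time the external signal was "0" before that transfer.

For example, a synchronization signal can be input externally to the sequence unit to control when the read transfer starts, which synchronizes the read control signal with the read transfer. The read buffer unit shapes the data read from RAM and writes it to the processor elements' data transfer port. Changing its control according to the value of the read control signal makes the transfer follow that signal.
[0029]
With this configuration, scaling in digital image processing is implemented in the memory controller. This preserves the versatility and circuit size of the SIMD processor itself.
[0030]
The invention may further be configured as follows:
- image data is stored in the memory;
- the SIMD processor has fewer processor elements than pixels in the main scanning direction;
- the memory controller controls data writing and reading so that all pixels in the main scanning direction are divided into two or more parts for processing.
[0031]
With this configuration, an image processing device with a very large number of pixels in the main scanning direction, such as a digital copier, can be built without changing the architecture of the SIMD processor itself, for example by increasing or decreasing the number of processor elements.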
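[0032]
[Embodiments of the Invention]
Embodiments of the SIMD processor 1 according to the present invention are described below with reference to the drawings.
[0033]
First, the overall configuration of the SIMD processor is described with reference to FIG. 1. As shown there, the SIMD processor 1 consists of:
- a global processor 2;
- a processor element block 3, made up in this embodiment of 1024 processor elements 3a (described later);
- an external interface 4 connected to a memory controller 5.

Based on instructions from the global processor 2, the memory controller 5 supplies the image data to be processed to the input/output register files 31 inside the processor. This data comes from an external image memory 6 built from single-port memory (RAM). The memory controller 5 also transfers processed data from the register files 31 back to the image memory (RAM) 6.
[0034]
As shown in FIG. 2, the global processor 2 includes a program RAM 21 and a sequence unit 22. The program RAM 21 stores the programs that control the processor element block 3, the external interface 4, and the memory controller 5. The sequence unit 22 controls the global processor 2, processor element block 3, external interface 4, and memory controller 5 according to those programs. Specifically, it controls the arithmetic logic unit 23 (hereinafter "ALU 23") of the global processor 2, described later, and other components.
[0035]
The sequence unit 22 also controls the register files 31 and the operation array 36 of the processor element block 3. The operation array 36 includes a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34"), and a register 35. The global processor 2 is of the so-called SISD type and performs one operation per operation instruction.
[0036]
The sequence unit 22 further sends operation-setting data and commands for data transfer to the memory controller 5. Based on these, the memory controller 5 supplies the following signals to the external interface 4:
- an address control signal for addressing the processor elements 3a;
- read/write control signals that instruct the registers 31b of the processor elements 3a to read or write data;
- a clock control signal for supplying a clock signal.
[0037]
Of the read/write control signals, the write control signal causes a register 31b of a processor element 3a to take in data to be processed from data bus 46a (46b) and hold it. The read control signal instructs a register 31b of a processor element 3a to place the processed data it holds on data bus 46a (46b).
[0038]
In the processor element block 3 of this embodiment, two adjacent processor elements 3a form a pair: one is assigned an even number and the other an odd number. Both processor elements 3a of a pair share the same address.
[0039]
On receiving a command from the global processor 2, the memory controller 5 generates a signal that specifies the address of a processor element 3a in the processor element block 3 (hereinafter "address designation signal"). It sends this signal from the external interface 4 over address bus 41a to the register controllers 31a of the processor elements 3a.

As described later, the memory controller 5 also sends a signal instructing the registers 31b to read or write data (hereinafter "read/write instruction signal"). This signal travels over read/write signal lines 45a (45b) to the register controllers 31a, described later. The even read/write signal line 45a serves the even processor elements 3a, and the odd read/write signal line 45b serves the odd processor elements 3a.
[0040]
The memory controller 5 also supplies a clock signal from the external interface 4 over clock signal line 41c to the register controllers 31a of the processor elements 3a, described later.
[0041]
As noted above, the memory controller 5 also passes data from the image memory 6, which is outside the SIMD processor 1, to the external interface 4. In this embodiment the data is 16-bit parallel data. It consists of 8 bits for the even-numbered processor element 3a, carried on the even data bus 46a, and 8 bits for the odd-numbered processor element 3a, carried on the odd data bus 46b. The 8-bit lane width may be changed as appropriate for the data. The data buses 46a and 46b are also used when processed data held in registers 31b is sent to the image memory 6.
[0042]
The image memory 6 stores both the data to be processed and the processed data, and it may also be provided inside the SIMD processor 1. In this embodiment, data between the memory controller 5 and the image memory 6 is transferred as 32-bit parallel data, as described later. This width may also be changed as appropriate for the data. Other operations of the memory controller 5 are described later.
[0043]
The global processor 2 also includes an ALU 23 that performs arithmetic and logic operations and a data RAM 24 that stores operation data, both operating on instructions from the sequence unit 22. It further includes a register group 25 for holding data to be processed and the like.
[0044]
The register group 25 contains:
- a program counter PC holding the program address;
- general-purpose registers G0 to G3 for operation data;
- a stack pointer (SP) holding the data RAM address used when saving and restoring registers;
- a link register (LS) holding the caller's address on a subroutine call;
- LI and LN registers holding the branch-source address on IRQ and NMI, respectively;
- a processor status register (P) holding the processor state.
[0045]
The register group 25 is also connected to register 35 of the processor element block 3, described later. It exchanges data with register 35 under the control of the sequence unit 22.
[0046]
As shown in FIGS. 1 and 2, the processor element block 3 has multiple processor elements 3a. Each unit consists of a register file 31, a multiplexer 32, a shift/extension circuit 33, an arithmetic logic unit 34 (hereinafter "ALU 34"), and a register 35.

The register file 31 contains thirty-two 8-bit registers per processor element 3a, named R0, R1, R2, ..., R31. In this embodiment, 1024 such sets form an array. Each register file 31 has one read port and one write port toward the operation array 36, which accesses it over an 8-bit shared read/write bus. Of the 32 registers, 24 are accessible from outside the processor. Any of them can be read or written by supplying a clock, an address, and read/write control from outside.
[0047]
One external port gives access to one register in each processor element 3a, and an externally supplied address specifies the processor element number (0 to 1023). There are therefore 24 external register-access ports in total. As described above, externally accessed data is 16 bits wide per pair of even and odd processor elements 3a, so each access reaches two registers at once.
[0048]
This embodiment uses 1024 processor elements 3a, but the number is not limited to this and may be changed as appropriate. The sequence unit 22 of the global processor 2 assigns the processor elements 3a addresses 0 to 1023, in order of proximity to the external interface 4.
[0049]
The register file 31 of a processor element 3a includes register controllers 31a and two kinds of registers, 31b and 31c. In this embodiment, as shown in FIG. 3, each processor element 3a has 24 pairs of register controller 31a and register 31b, plus eight registers 31c. FIG. 3 shows part of the register files 31 of two processor elements 3a; "1 processor element" in FIG. 3 denotes one processor element 3a. Registers 31b and 31c are 8-bit registers in this embodiment, but this is not limiting and may be changed as appropriate.
[0050]
As shown in FIG. 3, each register controller 31a is connected to the external interface 4 through the address bus 41a, the even read/write signal line 45a, the odd read/write signal line 45b, and the clock signal line 41c described above.
[0051]
When the external interface 4 receives an address control signal from the memory controller 5, it sends an address designation signal over address bus 41a to the processor element block 3. This addresses one pair of processor elements 3a, that is, two processor elements 3a, at once.

Each register controller 31a decodes the received address designation signal. If the decoded address matches its own assigned address, it takes in the read/write instruction signal from the memory controller 5, in synchronization with the clock signal arriving over clock signal line 41c. An even-numbered register controller 31a takes this signal from the even read/write signal line 45a. An odd-numbered register controller 31a takes it from the odd read/write signal line 45b.

The read/write instruction signals sent to the two register controllers 31a of a pair may differ. For example, the even register controller 31a may receive a read instruction while the odd register controller 31a receives a write instruction. The read/write instruction signal is then supplied to register 31b.
[0052]
When both processor elements 3a of a pair receive a write instruction signal from their register controllers 31a:
- register 31b of the even-numbered processor element 3a takes in and holds 8 bits of data to be processed from the even data bus 46a;
- register 31b of the odd-numbered processor element 3a takes in and holds 8 bits from the odd data bus 46b.

When both receive a read instruction signal:
- register 31b of the even-numbered processor element 3a places 8 bits of processed data on the even data bus 46a;
- register 31b of the odd-numbered processor element 3a places 8 bits on the odd data bus 46b.
[0053]
Thus a single address designation transfers data to both the even-numbered and the odd-numbered processor element 3a. This reduces the number of transfers and speeds up data transfer.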
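The sketch below models the paired addressing described above: one address cycle moves a 16-bit port word, and each PE of the pair latches its own 8-bit lane. The byte order (even lane in the low byte) is an assumption, since the text does not fix which half of the 16-bit word goes to which bus. The function names are hypothetical.

```python
from typing import List, Tuple

def pack_pair(even_byte: int, odd_byte: int) -> int:
    """Combine the two 8-bit lanes into one 16-bit port word.
    Assumption: even lane in bits 0-7, odd lane in bits 8-15."""
    return ((odd_byte & 0xFF) << 8) | (even_byte & 0xFF)

def unpack_pair(word: int) -> Tuple[int, int]:
    """Split a 16-bit port word into (even lane, odd lane)."""
    return word & 0xFF, (word >> 8) & 0xFF

def load_registers(pixels: List[int]) -> Tuple[List[int], int]:
    """Write one pixel into a register of each PE using paired addressing.
    Returns the register contents and the number of address cycles used."""
    if len(pixels) % 2:
        raise ValueError("PEs are addressed in even/odd pairs")
    regs = [0] * len(pixels)
    cycles = 0
    for pair_addr in range(len(pixels) // 2):
        word = pack_pair(pixels[2 * pair_addr], pixels[2 * pair_addr + 1])
        # Both PEs of the pair decode the same address; each latches its own lane
        # (even lane from bus 46a, odd lane from bus 46b).
        regs[2 * pair_addr], regs[2 * pair_addr + 1] = unpack_pair(word)
        cycles += 1
    return regs, cycles
```

For a 1024-PE array, `load_registers` needs 512 address cycles instead of 1024. This is the halving the paragraph refers to.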
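[0054]
Register 31b serves as both an input register and an output register. It holds externally supplied data that is about to be processed by ALU 34 (described later), or it holds data processed by ALU 34 for output. Like register 31c (described later), it can also temporarily hold data being processed or already processed. Register 31b holds 8-bit data in this embodiment, but this may be changed as appropriate.

When register 31b receives a write instruction signal from register controller 31a, it takes in and holds data to be processed from data bus 46a or 46b. When it receives a read instruction signal, it places the processed data it holds on data bus 46a or 46b. This data passes from the external interface 4 to the write buffer unit 54 of the memory controller 5, and from there into the image memory 6.
[0055]
In this embodiment, register 31b is connected to multiplexer 32 through data bus 37, which carries 8-bit parallel data. Data to be processed by ALU 34, and data it has processed, move between register 31b and ALU 34 over data bus 37. These transfers are directed by the sequence unit 22 of the global processor 2 over read signal line 26a and write signal line 26b:
- on a read instruction signal over read signal line 26a, register 31b places the data to be processed on the data bus, and the data goes to ALU 34 for processing;
- on a write instruction signal over write signal line 26b, register 31b holds the processed data from ALU 34 arriving over data bus 37.
[0056]
Register 31c temporarily holds data to be processed that was supplied from register 31b, or processed data before it is supplied to register 31b. Unlike register 31b, register 31c does not exchange data with the image memory 6 through the memory controller 5.
[0057]
The operation array 36 includes multiplexer 32, shift/extension circuit 33, a 16-bit ALU 34, and a 16-bit register 35. Register 35 contains a 16-bit A register and an F register.
[0058]
In an instruction-driven operation, a processor element 3a basically feeds data read from register file 31 into one input of ALU 34 and the contents of the A register of register 35 into the other. The result is stored in the A register. Operations are thus performed between the A register and registers R0 to R31 of register file 31.

A 7-to-1 multiplexer 32 sits between register file 31 and operation array 36. It selects the operand from the element itself or from the processor elements one, two, or three positions to the left or right. Before entering ALU 34, the 8-bit data from register file 31 can be shifted left by any number of bits in shift/extension circuit 33. In addition, an 8-bit condition register (T), not shown, enables or disables execution in each processor element 3a. This allows only specific processor elements 3a to take part in an operation.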
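The sketch below is a hypothetical illustration of the neighbour access and masking described above. It runs a [1, 2, 1]/4 smoothing filter in lock-step across all PEs, using only the operations named in the text: the 7-to-1 multiplexer, a left shift, accumulation in the A register, and a per-PE enable that stands in for condition register T. The filter is not taken from the patent, and returning `None` for reads past the array ends is an assumption chosen to show where invalid results appear.

```python
from typing import List, Optional

def mux7(regs: List[int], offset: int) -> List[Optional[int]]:
    """Model of the 7-to-1 multiplexer 32: every PE reads the register of the
    PE `offset` positions away (-3..+3). Out-of-range reads return None,
    marking the invalid data left at the ends of the array (assumption)."""
    if not -3 <= offset <= 3:
        raise ValueError("multiplexer reaches at most 3 PEs either side")
    n = len(regs)
    return [regs[i + offset] if 0 <= i + offset < n else None for i in range(n)]

def smooth_121(regs: List[int], enable: List[bool]) -> List[Optional[int]]:
    """Hypothetical [1, 2, 1]/4 filter executed on every PE at once.
    `enable` plays the role of condition register T: disabled PEs keep their value."""
    acc: List[Optional[int]] = [0] * len(regs)   # A register of each PE
    for offset, shift in ((-1, 0), (0, 1), (1, 0)):
        operand = mux7(regs, offset)
        # shift/extension circuit 33 shifts left; ALU 34 adds into A
        acc = [None if a is None or v is None else a + (v << shift)
               for a, v in zip(acc, operand)]
    return [(a >> 2 if a is not None else None) if en else r
            for a, en, r in zip(acc, enable, regs)]
```

For example, `smooth_121([0, 8, 0, 8], [True] * 4)` gives `[None, 4, 4, None]`. The edge results are invalid, which is why the transfer scheme described later drops a fixed number of pixels at each end of the array.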
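[0059]
As described above, multiplexer 32 is connected to the data bus 37 of its own processor element 3a and to the data buses 37 of the three processor elements 3a on each side. It selects one of these seven processor elements 3a. It either sends the data held in that element's registers 31b and 31c to ALU 34, or sends data processed by ALU 34 to those registers. Operations can thus use data held in neighboring processor elements, which increases the processing capability of the SIMD processor 1.
[0060]
Shift/extension circuit 33 shifts the data from multiplexer 32 by a given number of bits and sends it to ALU 34. In the other direction, it shifts the processed data from ALU 34 by a given number of bits and sends it to multiplexer 32.
[0061]
ALU 34 performs arithmetic and logic operations on the data from shift/extension circuit 33 and the data held in register 35. ALU 34 handles 16-bit data in this embodiment, but this may be changed as appropriate. The processed data is held in register 35 and then transferred to shift/extension circuit 33 or to the register group 25 of the global processor 2.
[0062]
The global processor 2 supplies I/O addresses, data, and control signals to the memory controller 5 over a bus. It first writes commands such as operating modes into several operation-setting registers (not shown) of the memory controller 5. Finally, it writes a start code into the start register (not shown), after which the memory controller 5 operates automatically according to the settings. Data in the register files 31 can therefore be input and output while the processor's instruction-controlled operations run.
[0063]
FIG. 4 shows the configuration of the memory controller 5 used in the invention. It consists of:
- a write buffer unit 54 that writes data to the image memory 6;
- a read buffer unit 55 that reads data from the image memory 6;
- a PE control unit 52 that controls the processor elements' register files 31;
- a RAM control unit 53 that controls the image memory 6;
- a sequence unit (SCU) 51.
[0064]
The memory controller 5 is connected to the register files 31 of the SIMD processor 1 through the data transfer port in the external interface 4. It transfers data from the register files 31 to the image memory 6 and from the image memory 6 to the register files 31. The data transfer port has an output port and an input port. As noted above, the registers controlled by the memory controller 5 are mapped into I/O space in this embodiment. As directed by the global processor 2, they can be read and written by outputting address, clock, and read/write control.
[0065]
The output port of the external interface 4 is connected to the write buffer unit 54, and the input port is connected to the read buffer unit 55. The data transfer port has independent input and output ports for even and odd processor elements, so data for one even/odd pair can be accessed in one cycle.

The data buses between the image memory 6 and the write buffer unit 54 and read buffer unit 55 are each four processor elements wide. Data for four processor elements can therefore be accessed at once in one cycle. In this embodiment, the data for one processor element is 8 bits. The data buses between the external interface 4 and the write buffer unit 54 and read buffer unit 55 are 16 bits wide, so the bus between the memory controller 5 and the image memory 6 is 32 bits wide.
[0066]
As a result, one transfer between the image memory 6 and the memory controller 5 is needed for every two transfers between the data transfer port of the external interface 4 and the memory controller 5.
[0067]
The write buffer unit 54 takes in pixel data from the external interface 4 of the SIMD processor 1 twice, assembles it into data for four processor elements, and then transfers it to the image memory 6. The read buffer unit 55 splits the four-processor-element data read from the image memory 6 into two parts and transfers them to the external interface 4.
[0068]
FIG. 5 is a schematic block diagram of an embodiment of the processor element (PE) control unit 52.
[0069]
As shown in FIG. 5, the PE control unit 52 outputs processor element 3a addresses, a clock, and read/write control signals to the external interface 4 of the SIMD processor 1. This allows data in the register file 31 of each processor element 3a to be read and written.
[0070]
The PE control unit 52 consists of a processor element (PE) address counter 521, transfer count counters 522 and 523, and a valid data count counter 524.

PE address counter 521 is an up counter that generates the address of the processor element taking part in the transfer. Its initial value depends on the direction:
- for a write transfer (from the register files 31 of the processor elements 3a to the image memory 6), it is loaded with the value in transfer start PE address register 525;
- for a read transfer (from the image memory 6 to the register files 31), it is loaded with "0".

Transfer count counters 522 and 523 are down counters that can be preloaded with the number of data items to transfer, and they track the transfer count. Counter 522 is loaded with the number of items to write, and counter 523 with the number of items to read.
[0071]
The valid data count counter 524 tracks the number of valid data items already stored in the image memory 6. A transfer from the image memory 6 to the register files 31 is carried out only when the valid data count exceeds the transfer count. This prevents data from being lost.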
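The sketch below is a hypothetical software model of PE control unit 52. Its address start points follow the transfers described in paragraphs [0085] and [0086]: writes back to memory start at register 525, and loads into the PEs start at PE 0. Two simplifications should be noted. First, counting is per PE datum, whereas the text decrements the counters by 4 per RAM access. Second, the valid-count bookkeeping ignores the re-reading of overlap pixels. The class and method names are illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PEControl:
    """Hypothetical model of PE control unit 52 (names are illustrative)."""
    start_pe_addr: int = 0   # transfer start PE address register 525
    valid_count: int = 0     # valid data count counter 524

    def write_transfer(self, count: int) -> List[int]:
        """PE register files -> image memory. Address counter 521 is loaded
        from register 525 and counts up until the write down-counter reaches 0."""
        addresses = list(range(self.start_pe_addr, self.start_pe_addr + count))
        self.valid_count += count          # these items are now stored and valid
        return addresses

    def read_transfer(self, count: int) -> List[int]:
        """Image memory -> PE register files, starting at PE address 0.
        Held off (empty list) unless the valid count exceeds the transfer count."""
        if self.valid_count <= count:
            return []
        self.valid_count -= count
        return list(range(count))
```

Returning an empty list stands in for the hardware simply not starting the transfer until enough valid data has been written.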
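[0072]
FIG. 6 is a schematic block diagram of a first embodiment of the memory (RAM) control unit 53.
[0073]
As shown in FIG. 6, the RAM control unit 53 is controlled over control lines from the write buffer unit 54 and the read buffer unit 55. It outputs clock, address, read/write control, and byte select to the image memory (RAM) 6, which is built from single-port memory.
[0074]
The RAM control unit 53 consists of a RAM address adder/subtractor 531, a write pointer register 532, a read pointer register 533, and a multiplexer 534. These blocks are connected by an address setting bus (hereinafter "AB").
[0075]
Write pointer register 532 holds the pointer at which the image memory 6 is written next. Likewise, read pointer register 533 holds the pointer from which the image memory 6 is read next.
[0076]
After the image memory 6 is accessed at a pointer, RAM address adder/subtractor 531 updates that pointer. The updated value is input from AB 535 into write pointer register 532 or read pointer register 533 and stored. The adder/subtractor takes the write pointer on a write access, or the read pointer on a read access. It adds (in FIFO mode) or subtracts (in LIFO mode) a number corresponding to the RAM access size, and outputs the result to AB 535.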
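The sketch below models the first RAM control embodiment: one single-port RAM, separate write and read pointers, and one adder/subtractor shared by both. Writes and reads are serialised, as in the time-division scheme, and the RAM is keyed by byte address in 4-byte (32-bit) accesses. The class name and the dictionary-backed RAM are assumptions made for illustration.

```python
from typing import Dict, List

ACCESS_SIZE = 4  # one RAM access = 4 PEs x 8 bits = 32 bits

class RamControl:
    """Sketch of RAM control unit 53 (first embodiment); names are illustrative."""

    def __init__(self, fifo: bool = True) -> None:
        self.fifo = fifo
        self.write_ptr = 0          # write pointer register 532
        self.read_ptr = 0           # read pointer register 533
        self.ram: Dict[int, List[int]] = {}

    def _next(self, ptr: int) -> int:
        # adder/subtractor 531: +size in FIFO mode, -size in LIFO mode
        return ptr + ACCESS_SIZE if self.fifo else ptr - ACCESS_SIZE

    def write(self, word: List[int]) -> None:
        self.ram[self.write_ptr] = list(word)
        self.write_ptr = self._next(self.write_ptr)   # stored back via AB 535

    def read(self) -> List[int]:
        word = self.ram[self.read_ptr]
        self.read_ptr = self._next(self.read_ptr)
        return word
```

In FIFO mode, writing a line and later reading it back returns the words in order, so the single-port RAM acts as a line delay. Only one adder is needed because the two pointers are never updated in the same memory cycle.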
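[0077]
The sequence unit 51 consists of an address decoder and control registers. The control registers are mapped into I/O space. By accessing them, the global processor 2 can control the entire memory controller 5 and monitor its internal state. The sequence unit 51 also time-divides write transfers and read transfers. It generates two separate clocks, one for the write buffer unit 54 and one for the read buffer unit 55, which lets write transfers and read transfers proceed concurrently.
[0078]
Next, pixel data transfer is described for a SIMD processor with fewer processor elements 3a than pixels in the main scanning direction. FIG. 7 illustrates this transfer method.
[0079]
In this case, the pixel data in the main scanning direction is divided into parts. The memory controller 5 repeatedly supplies each part from the image memory 6 to the register files 31 of the processor elements 3a of the SIMD processor 1, and the SIMD processor 1 repeats its processing on each part.
[0080]
SIMD image processing often refers to pixels before and after the target pixel, as filtering does. Such processing normally leaves invalid data at both ends of the SIMD array. When data is stored from the register files of the processor elements 3a into the image memory 6, the data at both ends must therefore be excluded. Conversely, when data is transferred from the image memory 6 to the register files 31, the reference pixels before and after the target pixels must be transferred as well. Pixel data for weighted processing such as filtering is thus transferred in the following order, where a is the number of reference pixels on each side.
[0081]
1. Transfer the unprocessed pixel data from the image memory 6 to the corresponding register files 31 of the processor elements 3a of the SIMD processor 1.
[0082]
2. Perform image processing on the SIMD processor 1. The a data items at each end of the processor element array are invalid because their reference pixels are missing. The hatched part in FIG. 7 is the valid pixel range.
[0083]
3. Transfer the processed pixel data from the SIMD processor 1 to the image memory 6. Only the valid pixels are transferred, excluding a pixels at each end.
[0084]
The transfer of pixel data between the image memory 6 and the SIMD processor 1 is described with reference to FIG. 7. The SIMD processor 1 has n processor elements (PEs) 3a, PE0 to PEn−1. Pixel data is sent from the image memory 6 to these processor elements 3a, processed, and then transferred back to the image memory 6.
[0085]
In the first transfer (the "1st SIMD" pass in the figure), the a reference pixels at each end are transferred together with the pixels for the processor elements 3a. In the example of FIG. 7, the transfer proceeds as follows:
- The RAM control unit 53 stores 0 in read pointer 533. It reads from the image memory 6 the amount of data corresponding to the RAM access size (32 bits in this embodiment) and stores it in the read buffer unit 55.
- Adder/subtractor 531 outputs to AB 535 the read pointer plus (FIFO mode) or minus (LIFO mode) the RAM access size, and the result is stored in read pointer 533.
- PE address counter 521 of PE control unit 52 generates an address, and the data is written from the external interface 4 into the corresponding processor elements 3a.
- Transfer count counter 522 is decremented by the RAM access size (4 in this embodiment) each time image data is written.

These steps repeat until counter 522 reaches 0. At that point, pixel data from the image memory 6 has been transferred to PE0 through PEn−1.
[0086]
When processed pixel data is transferred to the image memory 6, the a pixels at each end are invalid. Of the n pixels in PE0 to PEn−1, only the (n−2a) pixels from a to (n−a−1) are transferred, as follows:
- At transfer start, PE address counter 521 of PE control unit 52 is loaded with the value of transfer start PE address register 525.
- Data is read from the processor elements 3a at that address and written from the external interface 4 into the write buffer unit 54.
- Transfer count counter 523 is decremented by the RAM access size (4 in this embodiment) each time data is read.
- When the write buffer unit 54 holds 32 bits of data, the image data is transferred to the image memory 6. For the first of these transfers, the RAM control unit 53 stores a in write pointer 532 and then writes the 32 bits of image data to the image memory 6.
- Adder/subtractor 531 outputs to AB 535 the write pointer plus (FIFO mode) or minus (LIFO mode) the RAM access size, and the result is stored in write pointer 532.

In this way, the image data of PEa through PE(n−a−1) is transferred to the image memory 6.
[0087]
The second transfer is carried out in the same way. In the 2nd SIMD pass in the figure, the pixels from (n−a) to (2n−3a−1) are processed as target pixels. The pixels (n−2a) to (n−a−1) and (2n−3a) to (2n−2a−1) are therefore sent along with them as reference pixels. After processing, the (n−2a) pixels from (n−a) to (2n−3a−1) are stored in the image memory 6. When the main scanning direction is divided into more SIMD passes, the same processing is simply repeated.
[0088]
To achieve the above for data writes to the image memory 6, this embodiment of the PE control unit 52 sets "a" in transfer start PE address register 525 and "(n−2a)" in transfer count counter 522.
[0089]
Because all settings are relative to PE addresses, they need not be changed for each SIMD pass. For data reads from the image memory 6, however, read pointer 533 must be moved back after each transfer; in the above example, it is moved back to (n−2a). In the first embodiment of the RAM control unit 53, the global processor 2 must set the value of read pointer 533 after each transfer finishes.
[0090]
For data writes to the image memory 6, write pointer 532 needs no such adjustment.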
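The index bookkeeping of FIG. 7 can be checked with a short sketch. It assumes that each pass loads n consecutive pixels starting where the read pointer was rewound, so pass k starts at k·(n−2a). It also assumes that PEs a to n−a−1 hold the valid results. Under those assumptions, the write-back ranges of successive passes tile the line without gaps or overlaps. The function name is hypothetical.

```python
from typing import List, Tuple

def pass_plan(n: int, a: int, passes: int) -> List[Tuple[range, range]]:
    """For each SIMD pass k, return (pixels loaded into PE0..PEn-1,
    pixels written back to memory). Assumes the read pointer is rewound
    by 2a after every load, so pass k starts at k*(n-2a)."""
    if n <= 2 * a:
        raise ValueError("need more PEs than reference pixels")
    step = n - 2 * a                      # valid pixels produced per pass
    plan = []
    for k in range(passes):
        first = k * step                  # read pointer at the start of pass k
        loaded = range(first, first + n)  # PE i receives pixel first + i
        stored = range(first + a, first + n - a)   # results of PE a .. PE n-a-1
        plan.append((loaded, stored))
    return plan
```

With n = 16 and a = 2, the second pass loads pixels 12 to 27 and stores pixels 14 to 25. These match (n−2a) to (2n−2a−1) and (n−a) to (2n−3a−1). The first a pixels of the line are never stored, because no reference pixels exist before them.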
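[0091]
FIG. 8 is a schematic block diagram of a second embodiment of the RAM control unit 53 of the invention.
[0092]
The RAM control unit 53 of FIG. 8 adds a read offset generator 536 to the RAM control unit 53 of FIG. 6. The read offset generator 536 moves read pointer 533 back automatically at the end of each transfer from memory 6 to the register files of the processor elements 3a of the SIMD processor 1.
[0093]
The read offset generator 536 monitors the PE address input from PE control unit 52. When the PE address equals a set value, it holds the current value of read pointer 533, and the transfer continues until the set number of data items has been sent. When the transfer finishes, the held value is reloaded into read pointer 533. In the example of FIG. 7, the set value is (n−2a−1). Because all settings are relative to PE addresses, they need not be changed for each SIMD pass, unlike in the first embodiment of the RAM control unit 53.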
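The sketch below shows one way to reconcile the set value (n−2a−1) with rewinding the pointer to (n−2a). It assumes the read pointer is post-incremented, so when PE n−2a−1 receives its pixel the pointer already points one position further. This is an interpretation, not something the text states. The model also works at single-pixel granularity, whereas the real pointer advances in 4-PE words. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

def read_pass(line: List[int], read_ptr: int, n: int,
              hold_at: int) -> Tuple[List[int], Optional[int]]:
    """Load PE0..PEn-1 from `line` starting at read_ptr, modelling read offset
    generator 536. The pointer is post-incremented, so the value captured when
    the PE address equals hold_at already points one pixel past that PE's data."""
    regs: List[int] = []
    held: Optional[int] = None
    for pe_addr in range(n):
        regs.append(line[read_ptr])
        read_ptr += 1                      # adder 531 in FIFO mode
        if pe_addr == hold_at:
            held = read_ptr                # value captured by generator 536
    return regs, held                      # held value is reloaded into pointer 533
```

With n = 8, a = 2, and `hold_at = n - 2*a - 1 = 3`, the first pass returns 4 as the reload value. The second pass therefore starts at pixel 4 = n−2a, with no intervention from the global processor.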
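[0094]
FIG. 9 is a schematic block diagram of a third embodiment of the RAM control unit 53 of the invention.
[0095]
In addition to the above processing, the RAM control unit 53 of FIG. 9 can use a specific region of the image memory 6 as a ring.
[0096]
This RAM control unit 53 adds two registers, 537 and 538, and a comparator 539 to the RAM control unit of FIG. 8. Register (LADDR) 537 sets the lower limit of the image memory 6 address, and register (UADDR) 538 sets the upper limit. Comparator 539 compares the current pointer 532 (533) with the values set in LADDR 537 and UADDR 538.
[0097]
In FIFO mode, a wrap occurs when UADDR 538 equals pointer 532 (533) and the value from adder/subtractor 531 exceeds UADDR 538. The output from adder/subtractor 531 to AB 535 is then negated, and the output from LADDR 537 to AB 535 is asserted.
[0098]
In LIFO mode, the reverse applies: a wrap occurs when LADDR 537 equals pointer 532 (533) and the value from adder/subtractor 531 falls below LADDR 537. The output from adder/subtractor 531 to AB 535 is then negated, and the output from UADDR 538 to AB 535 is asserted. With this configuration, only a specific part of the memory space of the image memory 6 is used.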
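The wrap rule above can be written as a single pointer-update function. This sketch mirrors the comparator conditions literally: a wrap happens only when the current pointer equals the boundary register. UADDR and LADDR must therefore be reachable in steps of the access size. The function name and the default step of 4 (one 32-bit access) are assumptions.

```python
def next_pointer(ptr: int, fifo: bool, laddr: int, uaddr: int, step: int = 4) -> int:
    """Pointer update for the third RAM-control embodiment (sketch).
    [laddr, uaddr] is the region used as a ring; step is the RAM access size.
    The boundaries must be aligned to step, or the equality test never fires."""
    candidate = ptr + step if fifo else ptr - step   # adder/subtractor 531
    if fifo and ptr == uaddr and candidate > uaddr:
        return laddr          # AB 535 driven by LADDR instead of the adder
    if not fifo and ptr == laddr and candidate < laddr:
        return uaddr          # AB 535 driven by UADDR instead of the adder
    return candidate
```

Starting from 0x100 with LADDR = 0x100 and UADDR = 0x10C, a FIFO pointer visits 0x104, 0x108, 0x10C and then returns to 0x100. Giving each processing step its own LADDR/UADDR pair would keep that step's data in a separate ring.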
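[0099]
FIG. 10 is a schematic block diagram of the write buffer unit 54 and read buffer unit 55 according to this embodiment.
[0100]
In this embodiment, the write buffer unit 54 and read buffer unit 55 have dedicated ports for the even and odd processor elements 3a. Each can access data for two processor elements per cycle, so a full transfer takes half as many cycles as there are processor elements.
[0101]
The write buffer unit 54 performs a write access to the image memory 6 once data for four processor elements has been stored in its buffer.
[0102]
The write buffer unit 54 handles incoming data as follows:
- Data for two processor elements arrives from the transfer port of the external interface 4 and is stored in flip-flops 541 and 542.
- That data moves to the next-stage flip-flops 543 and 544.
- The next two processor elements' worth of data is stored in flip-flops 541 and 542.
- The data for four processor elements in flip-flops 541 to 544 is stored in latches 545 to 548, respectively.
- Once latches 545 to 548 are full, the write buffer unit 54 performs a write access to the image memory 6.
[0103]
The read buffer unit 55 stores the data for four processor elements read from the image memory 6 in flip-flops 551 to 554, which serve as a buffer. Multiplexer 555 selects two of the four processor elements' data and stores it in latches 556 and 557. The image data in latches 556 and 557 is transferred through the external interface 4 to the register files 31 of the SIMD processor 1.
[0104]
When the data for four processor elements has been transferred, the read buffer unit performs another read access to the image memory 6. Because the image memory 6 delivers data for four processor elements per access, only one memory access is needed every two cycles to keep up with the SIMD processor 1. This relaxes the access-time requirement on the image memory 6.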
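The two buffers can be modelled as a 2:1 packer and a 1:2 unpacker between the 16-bit port and the 32-bit RAM. The sketch below assumes two things: that the earlier pair occupies the lower PE positions of the RAM word, and that a partial word left at the end of a transfer is written with byte select masking the rest. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[int, int]   # (even PE byte, odd PE byte) per port cycle

def write_buffer(pairs: Iterable[Pair]) -> List[List[int]]:
    """Write buffer unit 54: two port cycles fill one 4-PE (32-bit) RAM word.
    The first pair waits in FF 541/542 (then 543/544) until the second arrives;
    both are latched (545-548) and written in one RAM access."""
    words: List[List[int]] = []
    pending: Optional[Pair] = None
    for pair in pairs:
        if pending is None:
            pending = pair
        else:
            words.append([*pending, *pair])      # one RAM write access
            pending = None
    if pending is not None:                      # end-of-transfer partial word
        words.append([*pending])
    return words

def read_buffer(words: Iterable[List[int]]) -> List[Pair]:
    """Read buffer unit 55: each 4-PE word (FF 551-554) is handed to the
    port as two pairs, selected in turn by multiplexer 555."""
    pairs: List[Pair] = []
    for w in words:
        for i in range(0, len(w), 2):
            pairs.append((w[i], w[i + 1]))
    return pairs
```

For 1024 PEs there are 512 port cycles but only 256 RAM accesses. This 2:1 ratio is what gives the image memory its timing margin.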
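[0105]
FIG. 11 illustrates the thinning write operation. It implements reduction, one of the scaling operations commonly performed in digital copiers and facsimile machines.
[0106]
In FIG. 11, the pixel data of processor element (0), processor element (2), and so on is stored in the image memory 6 as is. The pixel data of processor element (1), processor element (3), and so on is thinned out and not stored.
[0107]
The memory controller 5 has two external write control signals: one decides whether to thin out the data of the even-side processor elements 3a, and the other does the same for the odd side. It also has an external pin that signals when to start a transfer, so that the memory controller 5 stays synchronized with the write control signals. This external synchronization signal is input to the sequence unit 51, which starts the transfer when the signal is asserted.
[0108]
The sequence unit 51 checks the value of each write control signal. When an external write control signal is "0", the corresponding data is written to the buffer in the write buffer unit 54. When it is "1", storing that data in the buffer is inhibited.
[0109]
Except at the start and end of a transfer, the write buffer unit 54 requests a transfer to the image memory 6 from the RAM control unit 53 only after data for four processor elements has been stored. Until then, the address pointer is not updated, and no clock, read/write control, or byte select is output to the image memory 6. Data continues to be read from the data transfer port of the SIMD processor 1 regardless of the write control signals.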
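The thinning write can be modelled as a filter in front of the 4-PE write buffer. In this sketch, every PE is read from the port, but only pixels whose control bit is 0 enter the buffer, and a RAM write (with its pointer update) happens only when the buffer holds four pixels. The control bits are passed in as a list standing in for the external signals. The function name is hypothetical.

```python
from typing import List

def thinning_write(pe_data: List[int], write_ctrl: List[int]) -> List[List[int]]:
    """Reduction by thinning (sketch of FIG. 11). write_ctrl[i] is the external
    write control bit sampled for PE i: 0 = store, 1 = drop. Every PE is read
    from the port; a RAM write happens only when 4 kept pixels are buffered."""
    if len(write_ctrl) != len(pe_data):
        raise ValueError("one control bit per PE")
    buffer: List[int] = []
    ram_words: List[List[int]] = []
    for value, ctrl in zip(pe_data, write_ctrl):
        if ctrl == 0:
            buffer.append(value)
        if len(buffer) == 4:
            ram_words.append(buffer)     # one 32-bit RAM access, pointer update
            buffer = []
    if buffer:
        ram_words.append(buffer)         # end-of-transfer exception
    return ram_words
```

The FIG. 11 pattern (keep even PEs, drop odd PEs) halves eight pixels into a single RAM word: `thinning_write(list(range(8)), [0, 1] * 4)` gives `[[0, 2, 4, 6]]`.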
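[0110]
FIG. 12 illustrates the duplicate read operation, which implements enlargement, another scaling operation. In FIG. 12, the registers of processor element (0), processor element (2), and so on receive data from the image memory 6 in order, starting at the read pointer position. The registers of processor element (1), processor element (3), and so on receive a copy of the value written to the preceding processor element's register.
[0111]
The memory controller 5 has two external read control signals, one for the even-side and one for the odd-side processor elements 3a. Each decides whether the same data as before is written again.
[0112]
The memory controller 5 also has an external pin that signals when to start a transfer, so that it stays synchronized with the duplication control signals. This external synchronization signal is input to the sequence unit 51, which starts the transfer when the signal is asserted.
[0113]
The sequence unit 51 checks the value of each read control signal. When a signal is "1", it controls the multiplexer of the read buffer unit 55 so that the data meant for the register of the processor element with the preceding PE address is output again.

Except at the start and end of a transfer, the read buffer unit 55 requests a new read from the image memory 6 only when all four processor elements' worth of data already read is no longer needed. For example, if the read control is always "0", this happens after two accesses to the data transfer port; while a read control signal is "1", the data does not become unneeded. Until then, the address pointer is not updated, and no clock, read/write control, or byte select is output to the image memory 6. Data continues to be written to the external interface 4 of the SIMD processor 1 regardless of the read control signals.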
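The duplicate read is the mirror image of the thinning write. In the sketch below, a control bit of 1 repeats the value last taken while the bit was 0, and a new 4-pixel RAM read is counted only when the previous word has been fully consumed. Raising an error when the first bit is 1 is an assumption, because the text does not say what happens with no earlier value. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

def duplicate_read(memory: List[int], read_ctrl: List[int]) -> Tuple[List[int], int]:
    """Enlargement by duplication (sketch of FIG. 12). read_ctrl[i] is the
    external read control bit for PE i: 0 = take the next pixel from memory,
    1 = repeat the value last taken while the bit was 0. Returns the PE
    register contents and the number of 4-pixel RAM reads issued."""
    regs: List[int] = []
    ptr, ram_reads = 0, 0
    last: Optional[int] = None
    for ctrl in read_ctrl:
        if ctrl == 1:
            if last is None:
                raise ValueError("the first PE has no earlier value to repeat")
            regs.append(last)
            continue
        if ptr % 4 == 0:
            ram_reads += 1           # previous 4-pixel word fully consumed
        last = memory[ptr]
        ptr += 1
        regs.append(last)
    return regs, ram_reads
```

The FIG. 12 pattern `[0, 1] * 4` doubles four stored pixels into eight PE registers using a single RAM read. With all bits at 0, eight PEs need two reads.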
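[0114]
In the embodiment above, a single address designation transfers data to both the even-numbered and the odd-numbered processor element 3a of the SIMD processor 1. However, the transfer of image data to the SIMD processor 1 is not limited to this method. For example, as shown in FIG. 13, the invention also applies when data is transferred to the processor elements 3a one at a time by address designation, without distinguishing odd and even.

In that configuration, as shown in FIG. 13, each register controller 31a is connected to the external interface 4 through address bus 41a, read/write signal line 45c, and clock signal line 41c. The memory controller 5 supplies an address designation signal to the external interface 4, and the register controller 31a decodes it when it arrives over address bus 41a. If the decoded address matches the address assigned to its own processor element 3a, the register controller 31a takes in the read/write instruction signal sent over read/write signal line 41b. It does so in synchronization with the clock signal that the memory controller 5 supplies over clock signal line 41c. This read/write instruction signal is supplied to register 31b.
[0115]
In this embodiment, data stored in the image memory 6 outside the SIMD processor 1 is placed on data bus 46c as 8-bit parallel data. Data bus 46c is also used when processed data held in register 31b is sent to the image memory 6.
[0116]
The address, read/write, clock, and data signals from the external interface 4 are supplied to every register of the register files 31. Each processor element 3a decodes the address, and only the processor element 3a whose own address matches performs the read or write.
[0117]
In a SIMD processor 1 configured this way, the memory controller 5 sends data from the image memory 6 by designating the address assigned to the target processor element 3a. The data then reaches that processor element on a single clock input. Even and odd processor elements 3a do not receive data simultaneously in this example, so data transfer takes longer than in the first embodiment, but the circuit is simpler.
[0118]
In the embodiments above, the processor elements 3a are designated by address. The invention can also be applied to a pointer-based designation scheme, that is, a serial access memory scheme. This example is described with reference to FIG. 14. Only the differences from the first embodiment are described. Common points are omitted, and components identical to those of the first embodiment carry the same reference numerals.
[0119]
First, the global processor 2 supplies I/O addresses, data, and control signals to the memory controller 5 over a bus. It writes commands such as operating modes into several operation-setting registers (not shown) of the memory controller 5. Finally, it writes a start code into the start register (not shown), after which the memory controller 5 operates automatically according to the settings.

The transfer then proceeds as follows:
- Based on the command from the global processor 2, the memory controller 5 generates a reset signal and sends it from the external interface 4 over reset signal line 47 to the processor element block 3. This resets the register controllers 31a.
- The memory controller 5 sends a clock signal through the external interface 4, over clock signal line 41c, to the register controller 31a nearest the external interface 4.
- In synchronization with this clock, register controller 31a' takes in the read/write instruction signal sent from the memory controller 5 over read/write signal line 45a or 45b.
- This signal is supplied to register 31b of the even-numbered processor element 3a and to register 31b of the odd-numbered processor element 3a.

As in the first embodiment, the read/write instruction signals sent to the register controllers 31a' of a pair may differ.
[0120]
As in the first embodiment, a single pointer designation thus transfers data to both the even-numbered and the odd-numbered processor element 3a.
[0121]
FIG. 15 is a block diagram of another embodiment of an image processing device that includes the memory controller 5 of the invention. Here, the memory controller 5 is placed between the data transfer ports of two independent register files 3. The invention can also be applied to such a configuration.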
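[0122]
[Effects of the Invention]
As described above, the invention of claim 1 lets the SIMD processor readily perform its operations even when there are fewer processor elements than pixels in the main scanning direction. Only valid pixels need be transferred for weighted operations such as filtering.
[0123]
According to the invention of claim 2, a line memory can be realized with a single-port memory in an image processing system using a SIMD processor.
[0124]
According to the invention of claim 3, only valid pixels need be transferred for weighted operations such as filtering when the SIMD processor has fewer processor elements than pixels in the main scanning direction.
[0125]
According to the invention of claim 4, reference pixel data can be transferred along with the target pixels for weighted operations such as filtering when the SIMD processor has fewer processor elements than pixels in the main scanning direction.
[0126]
According to the invention of claim 5, data can be kept separately for each processing step in image processing with multiple processing steps.
[0127]
According to the inventions of claims 6 and 7, the time needed to transfer the general-purpose register data of all processor elements is halved, and the RAM access time is given more margin.
[0128]
According to the inventions of claims 8 and 9, scaling in digital image processing is implemented in the memory controller. This preserves the versatility and circuit size of the SIMD processor itself.
[0129]
According to the invention of claim 10, an image processing device with a very large number of pixels in the main scanning direction, such as a digital copier, can be built without changing the architecture of the SIMD processor itself, for example by increasing or decreasing the number of processor elements.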
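[Brief Description of the Drawings]
[FIG. 1] Block diagram of a SIMD processor according to an embodiment of the invention.
[FIG. 2] Block diagram of the internal configuration of the SIMD processor according to the first embodiment.
[FIG. 3] Block diagram of the internal configuration of a processor element according to the first embodiment.
[FIG. 4] Block diagram of the memory controller 5 used in the invention.
[FIG. 5] Schematic block diagram of an embodiment of the processor element (PE) control unit 52 of the memory controller 5.
[FIG. 6] Schematic block diagram of a first embodiment of the memory (RAM) control unit 53 of the memory controller 5.
[FIG. 7] Diagram of the pixel data transfer method when the SIMD processor has fewer processor elements 3a than pixels in the main scanning direction.
[FIG. 8] Schematic block diagram of a second embodiment of the RAM control unit 53 of the memory controller 5.
[FIG. 9] Schematic block diagram of a third embodiment of the RAM control unit 53 of the memory controller 5.
[FIG. 10] Schematic block diagram of the write buffer unit 54 and read buffer unit 55 of the memory controller 5.
[FIG. 11] Diagram of the thinning write operation used for reduction in scaling.
[FIG. 12] Diagram of the duplicate read operation used in scaling.
[FIG. 13] Block diagram of the internal configuration of a SIMD processor according to another embodiment.
[FIG. 14] Block diagram of the internal configuration of a SIMD processor according to yet another embodiment.
[FIG. 15] Schematic block diagram of another embodiment of the invention.
[FIG. 16] Schematic block diagram of a conventional image processing device using the SIMD method.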
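[Explanation of Reference Numerals]
1 SIMD processor
2 Global processor
4 External interface
5 Memory controller
6 Image memory
51 Sequence unit
52 PE control unit
53 RAM control unit
54 Write buffer unit
55 Read buffer unit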
[Industrial application fields]
The present invention relates to a signal processing apparatus using a single instruction stream multiple data stream (SIMD) type processor that processes a plurality of image data and the like in parallel by a single operation instruction, and is suitable for use in image processing such as digital copying. It relates to the device.
[0002]
[Prior art]
In recent years, in digital copying machines, facsimile machines, and the like, improvement of images has been attempted by increasing the number of pixels or making it compatible with color. As the image is improved, the number of data to be processed has increased. By the way, data processing in a copying machine or the like often performs the same arithmetic processing on all pixels. Therefore, a SIMD processor that performs the same arithmetic processing on a plurality of data simultaneously with one instruction is used.
[0003]
Normally, when image processing is performed using a SIMD processor, processor elements (PE) are developed in the main scanning direction. For this reason, when performing image processing such as filter processing, reference pixels above and below the target pixel are required, and it is conceivable that pixel data of the previous line is delayed in line and stored in the line memory.
[0004]
FIG. 16 shows a schematic block diagram of an image processing apparatus using the SIMD method. As shown in the figure, the image processing apparatus includes an external image memory (RAM) 101 in which image data is stored, for example, a processor element block 103 including 1024 processor elements, and a global processor 104.
[0005]
Each processor element (PE) has n general-purpose registers (R0 to Rn-1) and an operation array, and the general-purpose registers are in the form of having a register file outside the normal operation array. Each register is accessible as a shift register to the outside through a serial port. In the figure, the shaded area is a unit processor element.
[0006]
The global processor 104 has an address generation means for a program memory (PRAM) 105, reads an instruction code given to the processor element of the processor element block 103 from the PRAM 105, and controls the operation array and the register file.
[0007]
The device connected to the serial port of the general-purpose register of each PE differs depending on the system configuration. However, in a system that requires only a line delay of two lines, for example, pixel data of two lines before in R0 and one line before in R1 Assuming that the current line pixel data is arranged in the pixel data R2, the serial-parallel converter 102 is arranged in the serial ports R0 and R1, and is connected to the external RAM 101.
[0008]
For example, SVP (SERIAL VIDEO PROCESSOR) is known as an example of the image processing apparatus having the above-described configuration.
[0009]
Since most conventional SIMD processors have processor elements larger than the number of pixels in the main scanning direction, difficult processing is not required for controlling the external memory (RAM) 101. In order to realize a scaling function such as enlargement or reduction in image processing such as digital copying, a separate ASIC is attached or described in Japanese Patent Laid-Open No. 08-123683 (IPC: G06F 9/38). This can be realized by adopting a configuration in which a flag for zooming control is provided inside the SIMD processor.
[0010]
[Problems to be solved by the invention]
However, the accuracy of image processing has been increasing in recent years, and the number of pixels in the main scanning direction tends to increase. Further, in a conventional SIMD type processor, an increase in the circuit scale is usually prevented by using a small-scale circuit, for example, a processor having a large number of processor elements uses a 1-bit arithmetic array.
[0011]
When trying to apply a SIMD type processor to digital copying, for example, when handling an A4 size image with an accuracy of 600 DPI (Dot Per Inch), the number of processor elements of 7000 pixels or more is required. It is not realistic to increase In order to solve this problem, it is conceivable to perform processing by dividing the main scanning direction. However, a dual port memory (RAM) is required as a line memory, which increases the circuit scale.
[0012]
On the other hand, Japanese Patent Laid-Open No. 10-326258 (IPC: G06F 15/16) proposes a data operation system that shortens processing time by dividing the pixels in the main scanning direction into two halves, splitting the processor elements into two groups of half the total, and processing them in a pipeline. With this method, however, the number of pixels that can be processed cannot exceed the size of the data memory accessible to the processor elements.
[0013]
Furthermore, in a configuration in which a processor element has a scaling control flag, the processing becomes complicated when the number of divisions increases, so it is desirable to implement scaling processing outside the SIMD type processor. However, if the scaling function is realized separately using an ASIC, the versatility of the processor is reduced.
[0014]
Accordingly, an object of the present invention is to provide a signal processing device using a SIMD processor of simple configuration that suppresses the increase in circuit scale by using a single-port memory (RAM) while also incorporating a scaling function.
[0015]
[Means for Solving the Problems]
The present invention provides a signal processing device comprising: a SIMD type processor whose processor elements each include arithmetic means for operating on data and data holding means for holding data to be operated on by the arithmetic means and data operated on by it; a data transfer bus connected to each of the plurality of processor elements; designation means for designating a given processor element based on the addresses assigned to the processor elements; an address bus for supplying an address to the designation means; a data transfer interface for accessing, from outside the processor, the data holding means built into the processor elements; and a memory controller, connected to the data transfer interface, that generates the address supplied to the address bus to designate a given processor element, reads data stored in an external memory and writes it to the processor element, and reads data from the processor element and writes it to the memory. When the memory controller transfers data from the data holding means of each processor element to the memory via the data transfer interface, it designates the address of the processor element at which the transfer starts and the address of the processor element at which it ends, excludes a predetermined number of data items from the operation results read from the processor elements, sets a pointer value corresponding to the address of the processor element at which the transfer starts, and writes the data to the memory based on this pointer value. When it transfers data from the memory to the data holding means of each processor element via the data transfer interface, it sets a pointer value whose address has been moved back by the amount corresponding to the predetermined number of data items excluded in the previous pass, reads data from the memory based on this pointer value, generates, in correspondence with this pointer value, the address to be supplied to the address bus to designate the given processor element, and writes the data to that processor element. The data is thus divided and processed by the SIMD type processor.
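The divided processing described in the preceding paragraph can be modelled in software. The following is a minimal Python sketch, not the patented controller: it assumes a box filter of radius `k` as the weighting calculation, replicated edge pixels at the two ends of the line, and pixel indices standing in for the memory pointers; `segmented_filter` and `reference_filter` are hypothetical names. Each pass loads at most `num_pes` pixels, does not write back the `k` results at each internal boundary of the pass (the excluded data), and starts the next read early enough that those excluded pixels are recomputed, so the stitched result should equal filtering the whole line at once.

```python
from typing import List, Optional


def reference_filter(line: List[int], k: int) -> List[int]:
    """Box filter of radius k over the whole line, replicating the edge pixels."""
    w = len(line)
    return [sum(line[min(max(i + d, 0), w - 1)] for d in range(-k, k + 1))
            for i in range(w)]


def segmented_filter(line: List[int], num_pes: int, k: int) -> List[Optional[int]]:
    """Process the line in passes of at most num_pes pixels, rewinding the read pointer."""
    if num_pes <= 2 * k:
        raise ValueError("each pass must yield at least one valid pixel")
    w = len(line)
    out: List[Optional[int]] = [None] * w
    start = 0                                  # read pointer: pixel loaded into PE 0
    while True:
        end = min(start + num_pes, w)          # pixels [start, end) occupy the PEs
        seg = line[start:end]                  # read transfer: memory -> PE registers
        # Results within k of an internal pass boundary lack neighbours: exclude them.
        first = start if start == 0 else start + k
        last = end if end == w else end - k    # exclusive
        for i in range(first, last):           # write transfer: valid results only
            out[i] = sum(seg[min(max(i + d, 0), w - 1) - start]
                         for d in range(-k, k + 1))
        if end == w:
            return out
        start = last - k                       # next pass's first valid pixel is `last`
```

With 1024 PEs and a radius-1 filter, each pass after the first would advance by 1022 pixels under these assumptions.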
[0016]
With the configuration described above, even when there are fewer processor elements than pixels in the main scanning direction, the SIMD processor can perform the calculation easily, and only valid pixels are transferred when performing weighted calculations such as filter processing.
[0017]
The memory controller may perform write transfers and read transfers to the memory on a time-sharing basis.
[0018]
With this configuration, the single port memory can be used as a FIFO or LIFO memory, and a line memory can be realized using the single port memory.
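As a rough illustration of how a single-port memory can act as a line-delay FIFO when write and read transfers are time-shared, the Python sketch below gives each incoming pixel one write slot (even cycle) and one read slot (odd cycle). The `SinglePortRAM` class, the slot assignment, and the ring size of `line_len + 1` are assumptions made for the example, and only FIFO mode is shown; the check on the number of valid entries loosely mirrors the valid data number counter 524 of the embodiment.

```python
from typing import List, Optional


class SinglePortRAM:
    """One address port: at most one read or write per clock cycle."""

    def __init__(self, size: int) -> None:
        self.cells: List[int] = [0] * size
        self._busy_cycle: Optional[int] = None

    def access(self, cycle: int, addr: int, value: Optional[int] = None) -> int:
        if cycle == self._busy_cycle:
            raise RuntimeError("port conflict: two accesses in one cycle")
        self._busy_cycle = cycle
        if value is not None:          # write access
            self.cells[addr] = value
            return value
        return self.cells[addr]        # read access


def fifo_line_delay(pixels: List[int], line_len: int) -> List[int]:
    """Delay a pixel stream by one line of line_len pixels through one RAM port."""
    size = line_len + 1                # smallest ring that never overwrites unread data
    ram = SinglePortRAM(size)
    wptr = rptr = valid = 0
    out: List[int] = []
    for t, px in enumerate(pixels):
        ram.access(2 * t, wptr, px)    # even cycle: write slot
        wptr = (wptr + 1) % size
        valid += 1
        if valid > line_len:           # odd cycle: read slot, only with enough valid data
            out.append(ram.access(2 * t + 1, rptr))
            rptr = (rptr + 1) % size
            valid -= 1
    return out
```

Because writes and reads never share a cycle, the port-conflict check is never triggered, which is the point of time-sharing a single-port memory.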
[0019]
The memory controller may be provided with registers that designate, for transfers from the data holding means of each processor element to the memory via the data transfer interface, the address of the processor element at which the transfer starts and the address of the processor element at which it ends. For example, an initial-value load function may be added to the processor element address counter so that an offset can be set relative to the processor element addresses.
[0020]
Furthermore, the memory controller may be provided with registers that designate, for transfers from the memory to the data holding means of each processor element via the data transfer interface, the address of the processor element at which the transfer ends and the address of the processor element at which the memory read pointer is to be returned after the transfer ends. For example, a comparator between the processor element address and the set value may be provided so that the memory read pointer can be reloaded with the location holding the data that was transferred to the register of the processor element whose address equals the set value.
[0021]
With the configuration described above, when a SIMD processor having a smaller number of processor elements than the number of pixels in the main scanning direction is used, only effective pixels can be transferred when performing weighting operations such as filter processing.
[0022]
Further, the memory controller can be configured so that a lower limit and an upper limit of an arbitrary address area of the memory are set in registers and that area is used as a ring. For example, lower-limit and upper-limit registers for the memory pointers may be provided, each with its own comparator, so that the value of each register is output to the address bus when its condition is met.
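A minimal Python sketch of the ring behaviour follows. It assumes word addressing with a step of one and models the comparator as an equality test against the upper-limit register; `RingPointer` and `next_address` are hypothetical names. Two independent areas, one per processing step, show how data for each step can be kept apart in the same memory.

```python
class RingPointer:
    """Memory pointer confined to [lower, upper], reloading lower after upper."""

    def __init__(self, lower: int, upper: int) -> None:
        if lower > upper:
            raise ValueError("lower limit must not exceed upper limit")
        self.lower = lower      # lower-limit register
        self.upper = upper      # upper-limit register
        self.value = lower      # current pointer

    def next_address(self) -> int:
        """Return the current address, then advance, wrapping at the upper limit."""
        addr = self.value
        # comparator: at the upper limit, the lower-limit register drives the bus
        self.value = self.lower if addr == self.upper else addr + 1
        return addr


step_a = RingPointer(0, 3)   # area for one processing step
step_b = RingPointer(4, 7)   # separate area for the next step
print([step_a.next_address() for _ in range(6)])  # prints "[0, 1, 2, 3, 0, 1]"
```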
[0023]
With the above configuration, in the case of image processing having a plurality of steps, data can be stored for each processing step.
[0024]
The data transfer interface may be configured to access the data holding means of an even-addressed processor element and an odd-addressed processor element simultaneously. For example, the data transfer port of the SIMD type processor may have two independent ports, one for the data holding means of even-addressed processor elements and one for odd-addressed ones, so that data for two processor elements can be processed in one cycle.
[0025]
Furthermore, the input/output bus to the memory may be wider than the data transfer width to the processor elements. For example, input/output buffers to the memory may be provided for four processor elements so that the memory bus is four processor elements wide.
[0026]
With the above configuration, the time for transferring data to the data holding means of all the processor elements can be halved. Further, it is possible to provide a margin for memory access time.
[0027]
Further, according to the present invention, when data is transferred from the data holding means of each processor element to the memory via the data transfer interface, the data can be written when an external write control signal synchronized with the data transfer is “0”, and writing can be inhibited when it is “1”. For example, a synchronization signal may be input to the sequence unit from outside to control the timing at which the write transfer starts, so that the write control signal and the write transfer are synchronized. The transfer then follows the write control signal if the control of the write buffer unit, which shapes the data read from the data transfer port of the processor elements and writes it to the RAM, is changed according to the value of the write control signal.
[0028]
Further, when data is transferred from the memory to the data holding means of each processor element via the data transfer interface, the value read from the memory can be written to the data transfer interface when an external read control signal synchronized with the data transfer is “0”; when it is “1”, the same value as the data transferred the last time the signal was “0” before this transfer is written to the data transfer interface. For example, a synchronization signal may be input to the sequence unit from outside to control the timing at which the read transfer starts, so that the read control signal and the read transfer are synchronized. The transfer then follows the read control signal if the control of the read buffer unit, which shapes the data read from the RAM and writes it to the data transfer port of the processor elements, is changed according to the value of the read control signal.
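The effect of the two control signals can be sketched in Python: a “1” on the write control signal drops a pixel (reduction), and a “1” on the read control signal repeats the previous pixel (enlargement). The document does not say how the external control signals are generated, so `reduce_pattern` and `enlarge_pattern` are hypothetical integer-step generators giving nearest-neighbour style patterns; `reduce_pattern` assumes `n_out <= n_in` and `enlarge_pattern` assumes `n_out >= n_in`.

```python
from typing import List, Optional, Sequence


def reduce_pattern(n_in: int, n_out: int) -> List[int]:
    """Write-control bits keeping n_out of n_in pixels (0 = write, 1 = inhibit)."""
    return [0 if (i + 1) * n_out // n_in > i * n_out // n_in else 1
            for i in range(n_in)]


def enlarge_pattern(n_in: int, n_out: int) -> List[int]:
    """Read-control bits spreading n_in stored pixels over n_out slots (0 = new, 1 = repeat)."""
    return [0 if j == 0 or j * n_in // n_out > (j - 1) * n_in // n_out else 1
            for j in range(n_out)]


def write_transfer(data: Sequence[int], wctl: Sequence[int]) -> List[int]:
    """PE -> memory: a datum is written only while its control bit is 0."""
    return [d for d, c in zip(data, wctl) if c == 0]


def read_transfer(memory: Sequence[int], rctl: Sequence[int]) -> List[int]:
    """Memory -> PE: 0 fetches the next stored value, 1 repeats the last one sent."""
    out: List[int] = []
    source = iter(memory)
    last: Optional[int] = None
    for c in rctl:
        if c == 0:
            last = next(source)
        elif last is None:
            raise ValueError("a 1 before any 0 has no previous value to repeat")
        out.append(last)
    return out
```

For example, `read_transfer([1, 2, 3], enlarge_pattern(3, 5))` stretches three stored pixels to five as `[1, 1, 2, 2, 3]`, without any scaling logic inside the PEs themselves.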
[0029]
With the configuration described above, the scaling process in the digital image processing can be realized by the memory controller, so that the versatility and circuit scale of the SIMD type processor itself can be maintained.
[0030]
Further, according to the present invention, image data may be stored in the memory, the SIMD type processor may have fewer processor elements than there are pixels in the main scanning direction, and the memory controller may control the writing and reading of data so that the pixels in the main scanning direction are processed in two or more divisions.
[0031]
With this configuration, an image processing apparatus with a very large number of pixels in the main scanning direction, such as one for digital copying, can be built without changing the architecture of the SIMD processor itself, for example by increasing or decreasing the number of processor elements.
[0032]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of a SIMD type processor 1 according to the present invention will be described below with reference to the drawings.
[0033]
First, the overall configuration of the SIMD type processor according to the present invention will be described with reference to FIG. 1. As shown in FIG. 1, the SIMD type processor 1 of the present invention comprises a global processor 2, a processor element block 3 consisting, in this embodiment, of 1024 processor elements 3a described later, and an external interface 4 connected to a memory controller 5. Based on instructions from the global processor 2, the memory controller 5 supplies the image data to be processed from an external image memory 6, composed of a single-port memory (RAM), to the input/output register file 31 inside the processor, and transfers the processed data from the register file 31 back to the image memory (RAM) 6.
[0034]
As shown in FIG. 2, the global processor 2 includes a program RAM 21, which stores a program for controlling the processor element block 3, the external interface 4, and the memory controller 5, and a sequence unit 22, which controls the global processor 2, the processor element block 3, the external interface 4, and the memory controller 5 according to the program in the program RAM 21. Specifically, the sequence unit 22 controls an arithmetic logic unit 23 (hereinafter “ALU 23”), described later, provided in the global processor 2.
[0035]
The sequence unit 22 controls the register file 31 and the arithmetic array 36 that constitute the processor element block 3. The arithmetic array 36 includes a multiplexer 32, a shift extension circuit 33, an arithmetic logic unit 34 (hereinafter referred to as “ALU 34”), and a register 35. The global processor 2 is a so-called SISD type, and performs one arithmetic process for one arithmetic instruction.
[0036]
The sequence unit 22 also sends operation setting data and data transfer commands to the memory controller 5. Based on these, the memory controller 5 supplies to the external interface 4 an address control signal for addressing the processor elements 3a, a read/write control signal instructing the registers 31b of the processor elements 3a to read or write data, and a clock control signal for supplying a clock signal.
[0037]
Here, of the read/write control signals, the write control signal is a signal that causes data to be processed to be taken from the data bus 46a (46b) and held in the register 31b of the processor element 3a. The read control signal, on the other hand, is a signal that instructs the register 31b of the processor element 3a to put the processed data it holds onto the data bus 46a (46b).
[0038]
In the processor element block 3 in this embodiment, an even number and an odd number are assigned to two adjacent processor elements 3a to form one set, and the same address is assigned to this set of processor elements 3a.
[0039]
Upon receiving a command from the global processor 2, the memory controller 5 generates a signal designating the address of a processor element 3a of the processor element block 3 (hereinafter “address designation signal”) and sends it through the external interface 4 over the address bus 41a to the register controller 31a of the processor element 3a. The memory controller 5 also generates a signal instructing the register 31b of the processor element 3a to read or write data (hereinafter “read/write instruction signal”), which is supplied to the register controller 31a, described later, of the processor element 3a through the read/write signal line 45a (45b). The even read/write signal line 45a supplies the read/write signal to the even-numbered processor elements 3a, and the odd read/write signal line 45b supplies it to the odd-numbered processor elements 3a.
[0040]
Further, the memory controller 5 gives a clock signal from the external interface 4 to a register controller 31a (to be described later) of the processor element 3a through the clock signal line 41c.
[0041]
Further, as described above, the memory controller 5 supplies the data stored in the image memory 6 outside the SIMD type processor 1 to the external interface 4 as 16-bit parallel data in this embodiment. The 16 bits consist of 8 bits for the even-numbered processor element 3a and 8 bits for the odd-numbered processor element 3a, applied to the even data bus 46a and the odd data bus 46b respectively. The 8-bit width can be changed as appropriate for the data. The data buses 46a and 46b are also used when the processed data held in the registers 31b is sent to the image memory 6 outside the SIMD type processor 1.
[0042]
The image memory 6 stores the data to be processed and the processed data. It may equally be provided inside the SIMD type processor 1. In this embodiment, data transfer between the memory controller 5 and the image memory 6 is treated as 32-bit parallel transfer, as described later, but the width may be changed as appropriate for the data. Other operations of the memory controller 5 are described later.
[0043]
In addition, the global processor 2 includes an ALU 23 that performs arithmetic logic operations and a data RAM 24 that stores operation data in accordance with instructions from the sequence unit 22. Further, the global processor 2 includes a register group 25 for holding data to be processed.
[0044]
The register group 25 includes a program counter PC that holds a program address, G0 to G3 registers that are general-purpose registers for storing data for arithmetic processing, and a stack that holds the address of the save destination data RAM at the time of register save and return. Pointer (SP), link register (LS) that holds the address of the caller at the time of a subroutine call, LI and LN registers that hold branch source addresses at the time of IRQ and NMI, and a processor status register that holds the state of the processor (P) is incorporated.
[0045]
The register group 25 is connected to a later-described register 35 of the processor element block 3, and data is exchanged with the register 35 under the control of the sequence unit 22.
[0046]
As shown in FIGS. 1 and 2, the processor element block 3 comprises a plurality of processor elements 3a, each consisting of a register file 31, a multiplexer 32, a shift/expansion circuit 33, an arithmetic logic unit 34 (hereinafter “ALU 34”), and a register 35. In each processor element (PE) 3a, the register file 31 contains 32 8-bit registers named R0, R1, R2, ..., R31, and in this embodiment 1024 processor elements are arranged as an array. Each register file 31 has one read port and one write port toward the arithmetic array 36 and is accessed from it over an 8-bit read/write bus. Of the 32 registers, 24 are accessible from outside the processor, and any of them can be read and written by supplying a clock, an address, and read/write control from outside.
[0047]
For external access, one external port gives access to one register in every processor element 3a, with the processor element number (0 to 1023) designated by an externally input address; a total of 24 external ports for register access are therefore provided. As described above, each external access carries 16-bit data for one set consisting of an even-numbered and an odd-numbered processor element 3a, so two registers are accessed at once.
[0048]
In the present embodiment, the number of processor elements 3a is described as 1024. However, the number of processor elements 3a is not limited to this, and may be changed as appropriate. Addresses 0 to 1023 are assigned to the processor element 3a in the order of closeness to the external interface 4 by the sequence unit 22 of the global processor 2.
[0049]
The register file 31 of the processor element 3a includes a register controller 31a and two types of registers, 31b and 31c. In this embodiment, as shown in FIG. 3, each processor element 3a includes 24 pairs of a register controller 31a and a register 31b, plus 8 registers 31c. FIG. 3 shows part of the register file 31 for two processor elements 3a; each processor element drawn in FIG. 3 corresponds to one processor element 3a. The registers 31b and 31c are treated as 8-bit registers in this embodiment, but this is not a limitation and may be modified as appropriate.
[0050]
As shown in FIG. 3, the register controller 31a is connected to the external interface 4 via the address bus 41a, the even read / write signal line 45a, the odd read / write signal line 45b, and the clock signal line 41c. ing.
[0051]
When the external interface 4 receives the address control signal from the memory controller 5, it sends an address designation signal to the processor element block 3 over the address bus 41a. One set of processor elements 3a, that is, two processor elements 3a, is thereby addressed at once. Each register controller 31a decodes the address designation signal and, if the decoded address matches its own assigned address, takes in the read/write instruction signal from the memory controller 5 over the read/write signal line 45a or 45b, in synchronization with the clock signal supplied from the memory controller 5 over the clock signal line 41c. Specifically, a register controller 31a with an even number takes the read/write instruction signal from the even read/write signal line 45a, and one with an odd number takes it from the odd read/write signal line 45b. The read/write instruction signals sent to the two register controllers 31a of a set may differ: for example, the even-numbered one may receive a read instruction while the odd-numbered one receives a write instruction. The read/write instruction signal is then passed to the register 31b.
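The pairwise decoding can be sketched in Python. The sketch assumes, as a labelled simplification, that the shared bus address of a set is the PE number divided by two and that the strings `"read"` and `"write"` stand in for the signals on the even and odd read/write signal lines 45a and 45b; `latched_command` and `responders` are hypothetical names.

```python
from typing import Dict, Optional


def latched_command(pe: int, bus_address: int,
                    even_cmd: Optional[str], odd_cmd: Optional[str]) -> Optional[str]:
    """Command taken in by PE `pe`'s register controller, or None if not addressed."""
    if pe // 2 != bus_address:                  # address comparator: not this pair
        return None
    return even_cmd if pe % 2 == 0 else odd_cmd  # even line 45a / odd line 45b


def responders(num_pes: int, bus_address: int,
               even_cmd: Optional[str], odd_cmd: Optional[str]) -> Dict[int, str]:
    """All PEs that act on one bus cycle, mapped to the command they latch."""
    result: Dict[int, str] = {}
    for pe in range(num_pes):
        cmd = latched_command(pe, bus_address, even_cmd, odd_cmd)
        if cmd is not None:
            result[pe] = cmd
    return result
```

For instance, `responders(1024, 2, "write", "read")` selects PEs 4 and 5, with the even one writing and the odd one reading in the same cycle.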
[0052]
When a write instruction signal is sent from the register controllers 31a to both processor elements 3a, the register 31b of the even-numbered processor element 3a takes the data to be processed (8 bits) from the even data bus 46a and holds it, and the register 31b of the odd-numbered processor element 3a takes the data to be processed (8 bits) from the odd data bus 46b and holds it. Conversely, when a read instruction signal is sent to both processor elements 3a, the register 31b of the even-numbered processor element 3a sends its processed data (8 bits) to the even data bus 46a, and the register 31b of the odd-numbered processor element 3a sends its processed data (8 bits) to the odd data bus 46b.
[0053]
In this way, data can be transferred to the even-numbered processor element 3a and to the odd-numbered processor element 3a at the same time, which reduces the number of data transfers and speeds up data transfer.
[0054]
The register 31b holds data input from outside that is to be operated on by the ALU 34, described later, or holds data processed by the ALU 34 for output to the outside; that is, it functions as an input register or an output register. Like the register 31c described later, it can also hold data to be processed or calculated data temporarily. In this embodiment the register 31b is treated as holding 8-bit data, but this may be changed as appropriate for the data. When the register controller 31a gives a write instruction signal, the register 31b takes the data to be processed from the data bus 46a or 46b and holds it. When the register controller 31a gives a read instruction signal, the register 31b puts the processed data it holds onto the data bus 46a or 46b. This data passes from the external interface 4 to the write buffer unit 54 of the memory controller 5 and is stored from the write buffer unit 54 into the image memory 6.
[0055]
In the present embodiment, the register 31b is connected to the multiplexer 32 via a data bus 37 that transfers 8-bit data in parallel. Data to be processed by the ALU 34 and data processed by the ALU 34 are transferred to and from the register 31b over the data bus 37. These transfers are made through a read signal line 26a and a write signal line 26b connected to the global processor 2, according to instructions from its sequence unit 22. Specifically, when a read instruction signal arrives from the sequence unit 22 over the read signal line 26a, the register 31b puts the data it holds for processing onto the data bus 37, and this data is sent to the ALU 34 and processed. When a write instruction signal arrives over the write signal line 26b, the register 31b holds the data processed by the ALU 34 that arrives over the data bus 37.
[0056]
The register 31c temporarily holds data to be processed, or calculated data, before it is supplied to the register 31b. Unlike the register 31b, the register 31c does not exchange data with the image memory 6 via the memory controller 5.
[0057]
The arithmetic array 36 includes the multiplexer 32, the shift/extension circuit 33, the 16-bit ALU 34, and the 16-bit register 35. The register 35 includes a 16-bit A register and an F register.
[0058]
In operations executed by processor element 3a instructions, data read from the register file 31 is basically input to one side of the ALU 34 and the content of the A register of the register 35 to the other, and the result is stored in the A register. Operations are thus performed between the A register and registers R0 to R31 of the register file 31. A 7-to-1 multiplexer 32 sits between the register file 31 and the arithmetic array 36 and selects as the operand either the PE's own (center) data or the data of the processor elements 1, 2, or 3 positions to the left or right. The 8-bit data from the register file 31 is shifted left by an arbitrary number of bits in the shift/extension circuit 33 before entering the ALU 34. Execution or invalidation of each processor element 3a is controlled by an 8-bit condition register (T) (not shown), so that only specific processor elements 3a are selected as operation targets.
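One broadcast instruction of this kind can be modelled in Python as `A[p] += R[p + offset] << shift` on every enabled PE. This is a sketch under stated assumptions: a negative `offset` means the neighbour to the left, neighbours beyond the ends of the array read as 0, one bit of the condition register (T) is reduced to a boolean per PE, and `simd_add_shifted` is a hypothetical name rather than an instruction of the described processor.

```python
from typing import List


def simd_add_shifted(a_regs: List[int], r_regs: List[int], offset: int,
                     shift: int, enable: List[bool]) -> List[int]:
    """One broadcast instruction: A[p] += R[p + offset] << shift on enabled PEs."""
    if not -3 <= offset <= 3:
        raise ValueError("the 7-to-1 multiplexer reaches at most 3 PEs to each side")
    n = len(r_regs)
    result = []
    for p in range(n):
        if not enable[p]:                  # condition bit clear: PE keeps its A register
            result.append(a_regs[p])
            continue
        src = p + offset                   # negative offset = neighbour to the left
        operand = r_regs[src] & 0xFF if 0 <= src < n else 0  # assumed 0 past the ends
        result.append((a_regs[p] + (operand << shift)) & 0xFFFF)  # 16-bit A register
    return result
```

Under these assumptions a horizontal [1, 2, 1] filter takes three instructions: offset -1 with shift 0, offset 0 with shift 1, and offset +1 with shift 0, each accumulating into the A register.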
[0059]
As described above, the multiplexer 32 is connected to the data bus 37 of its own processor element 3a and to the data buses 37 of the three adjacent processor elements 3a on each side. It selects one of these seven processor elements 3a and sends the data held in the registers 31b and 31c of the selected processor element 3a to the ALU 34, or sends the data processed by the ALU 34 to the registers 31b and 31c of the selected processor element 3a. Arithmetic processing can thus use data held in the registers 31b and 31c of neighboring processor elements 3a, which increases the arithmetic capability of the SIMD type processor 1.
[0060]
The shift / extension circuit 33 shifts the data sent from the multiplexer 32 by a predetermined bit and sends it to the ALU 34. Alternatively, the arithmetically processed data sent from the ALU 34 is shifted by a predetermined bit and sent to the multiplexer 32.
[0061]
The ALU 34 performs arithmetic logic operations based on the data sent from the shift / expansion circuit 33 and the data held in the register 35. In this embodiment, the ALU 34 is handled as being capable of handling 16-bit data, but there is no problem even if it is appropriately changed according to the data. The processed data is held in the register 35 and transferred to the shift / expansion circuit 33 or transferred to the general-purpose register 25 of the global processor 2.
[0062]
An I / O address, data, and control signal are given from the global processor 2 to the memory controller 5 via a bus. The global processor 2 sets commands such as an operation method in some operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code in a start register (not shown) of the memory controller 5 so that the memory controller 5 automatically performs an operation according to the setting. With this configuration, the data in the register file 31 can be input / output simultaneously with the calculation based on the instruction control of the processor.
[0063]
FIG. 4 shows the configuration of the memory controller 5 used in the present invention. The memory controller 5 includes a write buffer unit 54 that writes data to the image memory 6, a read buffer unit 55 that reads data from the image memory 6, and a PE control unit 52 that controls the register file 31 of the processor element. A RAM control unit 53 that controls the image memory 6 and a sequence unit (SCU) 51 are included.
[0064]
The memory controller 5 is connected to the register file 31 of the SIMD type processor 1 via a data transfer port in the external interface 4, and transfers data from the register file 31 to the image memory 6 and from the image memory 6 to the register file 31. The data transfer port consists of an output port and an input port. As described above, the registers controlled by the memory controller 5 in this embodiment are mapped into the I/O space, and they can be read and written by outputting addresses, clocks, and read/write control according to instructions from the global processor 2.
[0065]
An output port of the external interface 4 of the SIMD processor 1 is connected to the write buffer unit 54, and an input port of the external interface 4 is connected to the read buffer unit 55. Each data transfer port has independent input and output ports for even processor elements and odd processor elements, and can be configured to access data for one set of even and odd processor elements at a time in one cycle. Has been. The data buses between the write buffer unit 54, the read buffer unit 55, and the image memory 6 are each configured with a data width for four processor elements, and data for four processor elements can be accessed at one time in one cycle. In this embodiment, the data for one processor element is 8 bits. The bit width of the data bus between the external interface 4 and the write buffer 54 and read buffer 55 is 16 bits. Accordingly, the bit width between the memory controller 5 and the image memory 6 is 32 bits.
[0066]
As a result, only one transfer between the image memory 6 and the memory controller 5 is needed for every two transfers between the data transfer port of the external interface 4 and the memory controller 5.
[0067]
The write buffer unit 54 of the memory controller 5 takes in the pixel data output from the external interface 4 of the SIMD type processor 1 twice, shapes it into data for four processor elements, and transfers it to the image memory 6. ing. The read buffer unit 55 performs an operation of dividing the data for the four processor elements read from the image memory 6 into two times and transferring them to the external interface 4 of the SIMD type processor 1.
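The shaping done by the write buffer unit 54 and the read buffer unit 55 can be sketched as packing two 16-bit port transfers, each an (even PE byte, odd PE byte) pair, into one 32-bit RAM word and splitting it again. The byte order, with the lowest-numbered PE in the least significant byte, is an assumption for illustration; `pack_write` and `unpack_read` are hypothetical names.

```python
from typing import List, Tuple

Transfer = Tuple[int, int]  # (even PE byte, odd PE byte) on one 16-bit port cycle


def pack_write(transfers: List[Transfer]) -> List[int]:
    """Write buffer: combine each two port transfers into one 32-bit RAM word."""
    if len(transfers) % 2:
        raise ValueError("need an even number of 16-bit transfers")
    words = []
    for k in range(0, len(transfers), 2):
        (e0, o0), (e1, o1) = transfers[k], transfers[k + 1]
        word = 0
        for i, b in enumerate((e0, o0, e1, o1)):   # PEs 4n, 4n+1, 4n+2, 4n+3
            word |= (b & 0xFF) << (8 * i)
        words.append(word)
    return words


def unpack_read(words: List[int]) -> List[Transfer]:
    """Read buffer: split each 32-bit RAM word into two port transfers."""
    out: List[Transfer] = []
    for w in words:
        b = [(w >> (8 * i)) & 0xFF for i in range(4)]
        out.append((b[0], b[1]))
        out.append((b[2], b[3]))
    return out
```

Each RAM word therefore serves two port cycles, which is the 2:1 ratio stated above.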
[0068]
FIG. 5 shows a schematic block diagram of an embodiment of the processor element (PE) control unit 52.
[0069]
As shown in FIG. 5, the PE control unit 52 outputs the address, clock, and read / write control signal of the processor element 3 a to the external interface 4 of the SIMD processor 1, and registers the processor elements 3 a in the SIMD processor 1. Data in the file 31 can be read and written.
[0070]
The PE control unit 52 includes a processor element (PE) address counter 521, transfer number counters 522 and 523, and a valid data number counter 524. The PE address counter 521 generates the address of the processor element to or from which data is transferred. For a write transfer (from the register file 31 of the processor element 3a to the image memory 6), it is initially loaded with “0” at the start of the transfer; for a read transfer (from the image memory 6 to the register file 31 of the processor element 3a), it is initially loaded with the value stored in the transfer start PE address register 525. The transfer number counters 522 and 523 are down counters that can be initially loaded with the number of data items to transfer and are used to manage the transfer count: the counter 522 is set with the number of data items to write, and the counter 523 is initially loaded with the number of data items to read.
[0071]
The valid data number counter 524 manages the number of valid data stored in the image memory 6. When data is transferred from the image memory 6 to the register files 31 of the processor elements 3a, the transfer is performed only when the number of valid data held in the valid data number counter 524 is larger than the transfer number. This prevents data loss.
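The guard performed with the valid data number counter 524 can be sketched as below. The class and method names are hypothetical, and the strict "larger than" comparison follows the wording of the text; the sketch only models the bookkeeping, not the hardware timing.

```python
class ValidDataCounter:
    """Sketch of the valid data number counter 524 (names are illustrative)."""

    def __init__(self):
        self.count = 0  # valid data currently held in the image memory

    def record_write(self, number):
        # Data stored into the image memory adds to the valid count.
        self.count += number

    def try_read(self, transfer_number):
        """Allow a transfer to the register files only if enough valid data exist."""
        if self.count > transfer_number:  # "larger than", as stated in the text
            self.count -= transfer_number
            return True
        return False  # transfer withheld so that no data is lost


counter = ValidDataCounter()
counter.record_write(8)
print(counter.try_read(16))  # False: only 8 valid data are stored
print(counter.try_read(4))   # True
print(counter.count)         # 4
```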
[0072]
FIG. 6 shows a schematic block diagram of the first embodiment of the memory (RAM) control unit 53.
[0073]
As shown in FIG. 6, the RAM control unit 53 is controlled via control lines from the write buffer unit 54 and the read buffer unit 55, and outputs the clock, address, read/write control, and byte select signals to the image memory (RAM) 6, which consists of a single-port memory.
[0074]
The RAM control unit 53 includes a RAM address adder/subtractor 531, a write pointer register 532, a read pointer register 533, and a multiplexer 534. These blocks are connected via an address setting bus 535 (hereinafter abbreviated as AB).
[0075]
The write pointer register 532 stores the pointer to the location of the image memory 6 to be written next, and the read pointer register 533 similarly stores the pointer to the location of the image memory 6 to be read next.
[0076]
After each access to the image memory 6, the write pointer register 532 or the read pointer register 533 stores the updated pointer that the RAM address adder/subtractor 531 places on the AB 535. The RAM address adder/subtractor 531 takes the write pointer at the time of a write access, or the read pointer at the time of a read access, adds a value corresponding to the RAM access size in the FIFO operation mode or subtracts it in the LIFO operation mode, and outputs the result to the AB 535.
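The pointer update performed by the RAM address adder/subtractor 531 can be sketched as follows. The sketch assumes byte addressing with a 32-bit access, so that "a value corresponding to the RAM access size" is 4; the function name `next_pointer` is hypothetical.

```python
def next_pointer(pointer, access_size, mode):
    """Value the adder/subtractor 531 places on AB 535 after one RAM access (sketch)."""
    if mode == "FIFO":
        return pointer + access_size  # FIFO: advance
    if mode == "LIFO":
        return pointer - access_size  # LIFO: step back
    raise ValueError("mode must be 'FIFO' or 'LIFO'")


# Assumed byte addressing with a 32-bit access, so the step is 4.
wp = 0
for _ in range(3):  # three write accesses in FIFO mode
    wp = next_pointer(wp, 4, "FIFO")
print(wp)                           # 12
print(next_pointer(wp, 4, "LIFO"))  # 8
```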
[0077]
The sequence unit 51 includes an address decoder and control registers. The global processor 2 accesses the control registers, which are mapped into its I/O space, to control the entire memory controller 5 and to monitor its state. The sequence unit 51 also time-divides the write transfer and the read transfer. Furthermore, by generating two clocks, one for the write buffer unit 54 and one for the read buffer unit 55, it allows write transfer and read transfer to be performed simultaneously.
[0078]
Next, a pixel data transfer method will be described for the case where the SIMD type processor has fewer processor elements 3a than there are pixels in the main scanning direction. FIG. 7 is an explanatory diagram of this transfer method.
[0079]
When the number of processor elements 3a is smaller than the number of pixels in the main scanning direction, the pixel data of a main scanning line is divided into portions, and the memory controller 5 repeatedly transfers each portion between the image memory 6 and the register files 31 of the processor elements 3a of the SIMD type processor 1. In other words, the pixel data in the main scanning direction is divided, and the SIMD type processor 1 executes the arithmetic processing repeatedly, once per portion.
[0080]
In image processing with a SIMD type processor, processing that refers to pixel data before and after the pixel of interest, such as filter processing, leaves invalid data at both ends of the processor element array. For this reason, when data is stored from the register files of the processor elements 3a into the image memory 6, the data at both ends must be excluded. Conversely, when data is transferred from the image memory 6 to the register files 31 of the processor elements 3a, the reference pixel data before and after the target pixels must be transferred as well. Pixel data for weighting processing such as filter processing is therefore transferred in the following order, where a denotes the number of reference pixels on each side.
[0081]
1. The unprocessed pixel data is transferred from the image memory 6 to the corresponding register files 31 of the processor elements 3a of the SIMD type processor 1.
[0082]
2. The SIMD type processor 1 performs the image processing. At this time, the a data at each end of the processor element array are invalid because their reference pixels are absent; the hatched portion in FIG. 7 represents the effective pixels.
[0083]
3. The processed pixel data is transferred from the SIMD type processor 1 to the image memory 6. At this time, only the effective pixels, excluding the a pixels at each end, are transferred to the image memory 6.
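The effect of steps 2 and 3 on a single pass can be sketched as follows. The (2a+1)-tap window sum stands in for an arbitrary weighting filter and is an assumption for illustration; the hardware evaluates all processor elements at once, whereas the loop here only models the result.

```python
def simd_pass(pixels, a):
    """One pass over n processor elements with a (2a+1)-tap filter (sketch).

    The a results at each end are None because their reference pixels are
    missing, mirroring step 2.
    """
    n = len(pixels)
    out = [None] * n
    for pe in range(a, n - a):
        out[pe] = sum(pixels[pe - a:pe + a + 1])  # assumed filter: window sum
    return out


def effective_pixels(out, a):
    """Step 3: only PE a .. PE n-a-1 (n - 2a values) go back to the image memory."""
    return out[a:len(out) - a]


result = simd_pass([1, 2, 3, 4, 5, 6], a=1)
print(result)                       # [None, 6, 9, 12, 15, None]
print(effective_pixels(result, 1))  # [6, 9, 12, 15]
```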
[0084]
The transfer of pixel data between the image memory 6 and the SIMD type processor 1 will be described with reference to FIG. 7. The SIMD type processor 1 has n processor elements (PE) 3a, PE0 to PEn−1. Pixel data is sent from the image memory 6 to the processor elements 3a, and processed pixel data is transferred from the processor elements 3a back to the image memory 6.
[0085]
First, in the first transfer (1st SIMD in the figure), a reference pixels are transferred together with the target pixels at each end of the processor element array of the SIMD type processor 1. In the example shown in FIG. 7, the RAM control unit 53 first stores 0 in the read pointer 533, reads data corresponding to the RAM access size (32 bits in this embodiment) from the image memory 6, and stores it in the read buffer unit 55. The address adder/subtractor 531 adds a value corresponding to the RAM access size to the read pointer value in the FIFO operation mode, or subtracts it in the LIFO operation mode, and outputs the result to the AB 535, where it is stored in the read pointer 533. The PE address counter 521 of the PE control unit 52 then generates an address, and the data is written from the external interface 4 into the processor elements 3a designated by that address. Each time image data is written, the transfer number counter 522 is decremented by a number corresponding to the RAM access size, 4 in this embodiment. This processing is repeated until the value of the transfer number counter 522 reaches 0, so that the pixel data from the image memory 6 is transferred to PE0 through PEn−1 of the SIMD type processor 1.
[0086]
When the processed pixel data is transferred to the image memory 6, the a pixel data at each end are invalid, so the pixels to be transferred are the (n−2×a) pixels from PEa to PE(n−a−1) among the n pixels in PE0 to PEn−1. For this reason, at the start of the transfer, the value of the transfer start PE address register 525 is loaded into the PE address counter 521 of the PE control unit 52, data is read from the corresponding processor elements 3a, and the image data is written from the external interface 4 into the write buffer unit 54. Each time data is read, the transfer number counter 523 is decremented by a number corresponding to the RAM access size, 4 in this embodiment. When 32 bits of data have been stored in the write buffer unit 54, the image data is transferred to the image memory 6: the RAM control unit 53 first stores a in the write pointer 532, and the image data stored in the write buffer unit 54, 32 bits in this embodiment, is transferred to the image memory 6. The address adder/subtractor 531 adds a value corresponding to the RAM access size to the write pointer value in the FIFO operation mode, or subtracts it in the LIFO operation mode, and outputs the result to the AB 535, where it is stored in the write pointer 532. In this manner, the image data from PEa to PE(n−a−1) of the SIMD type processor 1 is transferred to the image memory 6.
[0087]
The second transfer is then performed in the same manner. In the second transfer (2nd SIMD in the figure), the pixels from (n−a) to (2n−3×a−1) are processed as target pixels, so the pixel data (n−2×a) to (n−a−1) and (2n−3×a) to (2n−2×a−1) are sent together with them as reference pixels. After the data processing, the (n−2×a) pixels from (n−a) to (2n−3×a−1) are stored in the image memory 6. When the main scanning line is divided into more SIMD passes, the same processing is simply repeated.
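The pixel ranges of successive passes follow directly from n and a, and can be sketched as below (0-based pass index k, inclusive ranges). The function name `pass_schedule` is hypothetical; the sketch ignores how the first and last pixels of the whole line are handled.

```python
def pass_schedule(k, n, a):
    """Inclusive pixel ranges for SIMD pass k (0-based).

    n: number of processor elements; a: reference pixels needed on each side.
    Returns ((load_start, load_end), (out_start, out_end)).
    """
    if n <= 2 * a:
        raise ValueError("need n > 2a so that each pass yields effective pixels")
    step = n - 2 * a               # effective pixels produced per pass
    load_start = k * step          # pixel placed in PE0 (read pointer value)
    load_end = load_start + n - 1  # pixel placed in PE n-1
    out_start = load_start + a     # first effective pixel (PE a)
    out_end = load_end - a         # last effective pixel (PE n-a-1)
    return (load_start, load_end), (out_start, out_end)


n, a = 16, 2
for k in range(3):
    print(k, pass_schedule(k, n, a))
# 0 ((0, 15), (2, 13))
# 1 ((12, 27), (14, 25))
# 2 ((24, 39), (26, 37))
```

For pass 1 the sketch gives a load range of (n−2×a) to (2n−2×a−1) and an effective range of (n−a) to (2n−3×a−1); consecutive effective ranges abut, and each pass starts reading n−2×a pixels after the previous one.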
[0088]
To realize the above processing in the embodiment of the processor element (PE) control unit 52 of the present invention, for a data write to the image memory 6 it suffices to set “a” in the transfer start PE address register 525 and “(n−2×a)” in the transfer number counter 522.
[0089]
Since all the set values are based on PE addresses, there is no need to change the settings for each SIMD pass. For data reads from the image memory 6, however, the read pointer 533 must be moved back after the transfer is completed; in the above example, it is returned to (n−2×a). In the first embodiment of the RAM control unit 53, the global processor 2 must set the value of the read pointer 533 after the transfer is completed.
[0090]
For data writes to the image memory 6, the write pointer 532 does not need to be adjusted.
[0091]
FIG. 8 is a schematic block diagram showing a second embodiment of the RAM control unit 53 according to the present invention.
[0092]
The RAM control unit 53 shown in FIG. 8 is obtained by adding a read offset generator 536 to the RAM control unit 53 shown in FIG. 6. With the read offset generator 536, when data is transferred from the image memory 6 to the register files of the processor elements 3a of the SIMD type processor 1, the RAM control unit 53 can itself return the value of the read pointer 533 after the transfer to the processor elements 3a is completed.
[0093]
The read offset generator 536 monitors the PE address input from the PE control unit 52. When the PE address becomes equal to the set value, it holds the value of the read pointer 533 at that moment, and the transfer continues until the set number of data has been transferred. When the transfer is completed, the held value is reloaded into the read pointer 533. In the example shown in FIG. 7, (n−2×a−1) is set. Since all set values are based on PE addresses, the settings need not be changed for each SIMD pass, as in the first embodiment of the RAM control unit 53.
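The hold-and-reload behavior of the read offset generator 536 can be sketched as below. The sketch assumes a per-pixel read pointer (the hardware reads 32-bit words), so that when PE (n−2×a−1) receives its data the pointer already addresses pixel n−2×a; the function name and the test line contents are hypothetical.

```python
def transfer_with_read_offset(memory, read_ptr, count, hold_pe):
    """Read offset generator 536 (sketch, per-pixel pointer assumed).

    Transfers `count` values from `memory` into PE0..PE(count-1) in FIFO order.
    When the PE address equals `hold_pe`, the read pointer value at that moment
    is held; once the transfer completes, it is reloaded into the read pointer.
    Returns (register_values, read_pointer_after_transfer).
    """
    registers = []
    held = None
    for pe in range(count):
        registers.append(memory[read_ptr])
        read_ptr += 1        # pointer now addresses the next pixel
        if pe == hold_pe:
            held = read_ptr  # value of read pointer 533 at this moment
    if held is not None:
        read_ptr = held      # reload after the set number of transfers
    return registers, read_ptr


n, a = 16, 2
line = list(range(100))  # hypothetical pixel values
regs, rp = transfer_with_read_offset(line, 0, n, n - 2 * a - 1)
print(regs[0], regs[-1], rp)  # 0 15 12
regs, rp = transfer_with_read_offset(line, rp, n, n - 2 * a - 1)
print(regs[0], regs[-1], rp)  # 12 27 24
```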
[0094]
FIG. 9 is a schematic block diagram showing a third embodiment of the RAM control unit 53 according to the present invention.
[0095]
In addition to the above processing, the RAM control unit 53 shown in FIG. 9 can use only a specific area of the image memory 6, cyclically, as a ring.
[0096]
This RAM control unit 53 is obtained by adding two registers 537 and 538 and a comparator 539 to the RAM control unit shown in FIG. 8. The register (LADDR) 537 holds the lower limit of the address range of the image memory 6, and the register (UADDR) 538 holds the upper limit. The comparator 539 compares the current pointer 532 (533) with the values set in LADDR 537 and UADDR 538.
[0097]
In the FIFO mode, when the pointer 532 (533) coincides with UADDR 538 and the value output from the address adder/subtractor 531 exceeds UADDR 538, the output from the address adder/subtractor 531 to the AB 535 is negated, and the output of LADDR 537 to the AB 535 is asserted instead.
[0098]
Conversely, in the LIFO mode, when the pointer 532 (533) coincides with LADDR 537 and the value output from the address adder/subtractor 531 falls below LADDR 537, the output from the address adder/subtractor 531 to the AB 535 is negated, and the output of UADDR 538 to the AB 535 is asserted instead. With this configuration, only a specific region of the memory space of the image memory 6 is used.
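The wrap-around rule above can be sketched as a pointer update confined to [LADDR, UADDR]. The area bounds, the byte addressing, and the 4-byte step are hypothetical values chosen for illustration; the conditions mirror the text (pointer equal to the limit and the next value beyond it).

```python
def ring_next_pointer(pointer, access_size, mode, laddr, uaddr):
    """Pointer update confined to [laddr, uaddr], following FIG. 9 (sketch)."""
    if mode == "FIFO":
        nxt = pointer + access_size
        if pointer == uaddr and nxt > uaddr:
            return laddr  # adder output negated, LADDR 537 drives AB 535
        return nxt
    if mode == "LIFO":
        nxt = pointer - access_size
        if pointer == laddr and nxt < laddr:
            return uaddr  # adder output negated, UADDR 538 drives AB 535
        return nxt
    raise ValueError("mode must be 'FIFO' or 'LIFO'")


LADDR, UADDR = 0x100, 0x1FC  # hypothetical area; 32-bit words, byte addresses
p = 0x1F8
for _ in range(3):
    p = ring_next_pointer(p, 4, "FIFO", LADDR, UADDR)
    print(hex(p))  # 0x1fc, then 0x100, then 0x104
```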
[0099]
FIG. 10 is a schematic block diagram of the write buffer unit 54 and the read buffer unit 55 according to the embodiment of the present invention.
[0100]
The write buffer unit 54 and the read buffer unit 55 in this embodiment have dedicated ports for the even-numbered processor elements 3a and for the odd-numbered processor elements 3a and can access data for two processor elements in one cycle, so data for all processor elements can be transferred in half as many cycles as there are processor elements.
[0101]
The write buffer unit 54 is configured to perform write access to the image memory 6 when data for four processor elements is stored in the buffer.
[0102]
Data for two processor elements supplied from the transfer port of the external interface 4 to the write buffer unit 54 is first stored in the flip-flops 541 and 542 and then shifted to the flip-flops 543 and 544 in the next stage. The data for the next two processor elements is then stored in the flip-flops 541 and 542. The data for four processor elements held in the flip-flops 541 to 544 is then stored in the latches 545 to 548, respectively, and once the latches 545 to 548 hold data for four processor elements, the write buffer unit 54 performs a write access to the image memory 6.
[0103]
The read buffer unit 55 stores data for four processor elements read from the image memory 6 in the flip-flops 551 to 554, which serve as buffers. The multiplexer 555 selects the data for two processor elements out of these four and stores it in the latches 556 and 557. The image data stored in the latches 556 and 557 is transferred to the register files 31 of the SIMD type processor 1 via the external interface 4.
[0104]
When the transfer of the data for four processor elements is completed, the image memory 6 is read again. Since the image memory 6 supplies data for four processor elements in one access, one access every two cycles is enough to keep up with the interface to the SIMD type processor 1, which relaxes the access-time requirement on the image memory 6.
[0105]
FIG. 11 illustrates a thinning write operation for realizing reduction among the scaling processes often performed in digital copying, facsimile, and the like.
[0106]
In FIG. 11, the pixel data of processor element (0), processor element (2), and so on is stored in the image memory 6 as it is, whereas the pixel data of processor element (1), processor element (3), and so on is thinned out and not stored in the image memory 6.
[0107]
The memory controller 5 has two external write control signals that determine whether the data of the even-numbered processor elements 3a and of the odd-numbered processor elements 3a is thinned out. To synchronize with these write control signals, the memory controller 5 also has an external terminal through which it is notified of the timing to start the transfer. This external synchronization signal is input to the sequence unit 51, which starts the transfer when the signal is asserted.
[0108]
The sequence unit 51 detects the value of the write control signal: when the external write control signal is “0”, the data is stored in the buffer of the write buffer unit 54; when it is “1”, the sequence unit 51 suppresses storing the data in that buffer.
[0109]
Except at the start and end of a transfer, the write buffer unit 54 does not issue a transfer request for the image memory 6 to the RAM control unit 53 until it holds data for four processor elements. Consequently, the address pointer is not updated, and no read/write control or byte select is output to the image memory 6, until data for four processor elements is stored in the write buffer unit 54. Reading of data from the data transfer port of the SIMD type processor 1 continues regardless of the write control signal.
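The thinning write can be sketched as follows. The patent uses two control signals (even and odd); the sketch generalizes this to one control bit per processor element, which is an assumption, and flushes a trailing partial word to stand in for the end-of-transfer exception. The function name is hypothetical.

```python
def thinning_write(pe_data, write_control, pes_per_word=4):
    """Thinning write of FIG. 11 (sketch, per-PE control assumed).

    Every PE value is read from the data transfer port, but it enters the write
    buffer only when its control bit is 0. A RAM write happens only when the
    buffer holds pes_per_word values; a leftover partial word is flushed at the
    end. Returns the list of RAM words, each given as a list of pixels.
    """
    words, buffer = [], []
    for value, control in zip(pe_data, write_control):
        if control == 0:          # "0": store in the write buffer
            buffer.append(value)  # ("1": suppressed, nothing stored)
        if len(buffer) == pes_per_word:
            words.append(buffer)  # one RAM write access, pointer updated once
            buffer = []
    if buffer:
        words.append(buffer)
    return words


print(thinning_write(list(range(8)), [0, 1] * 4))  # [[0, 2, 4, 6]]
```

With every other processor element thinned out, eight PE reads produce a single RAM write instead of two.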
[0110]
FIG. 12 illustrates an overlapping read operation for realizing enlargement in the scaling process. In FIG. 12, data from the image memory 6 is written into the registers of processor element (0), processor element (2), and so on, sequentially from the read pointer position, while the registers of processor element (1), processor element (3), and so on receive a duplicate of the value written into the register of the preceding processor element.
[0111]
The memory controller 5 has two external read control signals that determine, for the even-numbered processor elements 3a and for the odd-numbered processor elements 3a, whether the same data as before is written.
[0112]
To synchronize the memory controller 5 with these read control signals, an external terminal is provided for notifying the memory controller 5 of the timing to start the transfer. This external synchronization signal is input to the sequence unit 51, which starts the transfer when the signal is asserted.
[0113]
The sequence unit 51 detects the value of the read control signal. When it is “1”, the sequence unit 51 controls the multiplexer 555 of the read buffer unit 55 so that the data transferred to the register of the processor element with the previous PE address is output again. Except at the start and end of a transfer, the read buffer unit 55 does not issue a read request for the image memory 6 to the RAM control unit 53 until all the data for the four processor elements read from the image memory 6 has been consumed (for example, if the read control signal is always “0”, the data transfer port is accessed twice per RAM read, and while the read control signal is “1”, no new data is consumed). Consequently, the address pointer is not updated, and the clock, read/write control, and byte select are not output to the image memory 6, until all the data for the four processor elements has been consumed. Writing of data to the external interface 4 of the SIMD type processor 1 continues regardless of the read control signal.
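The overlapping read can be sketched in the same way. As with the thinning sketch, one control bit per processor element is an assumed generalization of the even/odd signals, and the function name is hypothetical; a "1" repeats the most recent value delivered while the control was "0" and does not consume memory data.

```python
def overlapping_read(memory_pixels, read_control):
    """Overlapping read of FIG. 12 (sketch, per-PE control assumed).

    For PE i, read_control[i] == 0 takes the next pixel from the image memory;
    read_control[i] == 1 repeats the value given to the previous PE, and the
    read pointer does not advance. Returns the values written to PE0, PE1, ...
    and the number of pixels consumed from memory.
    """
    out, ptr, last = [], 0, None
    for control in read_control:
        if control == 0:
            last = memory_pixels[ptr]
            ptr += 1
        elif last is None:
            raise ValueError("the first PE cannot repeat a value")
        out.append(last)
    return out, ptr


print(overlapping_read([10, 20, 30, 40], [0, 1] * 4))
# ([10, 10, 20, 20, 30, 30, 40, 40], 4)
```

Here one 32-bit RAM word (four 8-bit pixels) feeds eight processor elements, which gives a 2x enlargement.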
[0114]
In the embodiment described above, data can be transferred simultaneously to an even-numbered processor element 3a and an odd-numbered processor element 3a of the SIMD type processor 1. However, the transfer of image data to the SIMD type processor 1 is not limited to this method. For example, as shown in FIG. 13, the present invention can also be applied to a configuration in which data is transferred to the processor elements 3a of the SIMD type processor 1 one at a time by address designation, without distinguishing between odd and even. As shown in FIG. 13, the register controller 31a is connected to the external interface 4 via the address bus 41a, the read/write signal 45c, and the clock signal 41c. When an address designation signal is supplied from the memory controller 5 to the external interface 4 and reaches the register controller 31a via the address bus 41a, the register controller 31a decodes it. If the decoded address matches the address assigned to its own processor element 3a, the register controller 31a obtains, in synchronization with the clock signal from the clock signal 41c, the read/write instruction signal sent from the memory controller 5 via the read/write signal 41b. This read/write instruction signal is applied to the register 31b.
[0115]
In this embodiment, data stored in the image memory 6 provided outside the SIMD type processor 1 is placed on the data bus 46c as 8-bit parallel data. The data bus 46c is also used when the processed data held in the register 31b is sent to the image memory 6 provided outside the SIMD type processor 1.
[0116]
The address, read/write, clock, and data signals supplied from the external interface 4 are supplied to each register of the register file 31. Each processor element 3a decodes the address, and only the processor element 3a whose assigned address matches accepts the access.
[0117]
In the SIMD type processor 1 configured as described above, when the memory controller 5 sends data stored in the image memory 6 to the processor elements 3a, it designates the address assigned to one processor element 3a at a time, and the data is input to the designated processor element 3a. Because data is not sent to an even-numbered and an odd-numbered processor element 3a simultaneously in this example, data transfer takes longer than in the first embodiment, but the circuit configuration can be simplified.
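The trade-off between address-decoded transfer and the even/odd port arrangement can be sketched as a cycle count. This is a behavioral model only, assuming one address per cycle in the FIG. 13 configuration and two processor elements per cycle with dedicated even/odd ports; both function names are hypothetical.

```python
def addressed_transfer(data, num_pes):
    """FIG. 13 style transfer (sketch): one PE address per cycle."""
    registers = [None] * num_pes
    cycles = 0
    for address, value in enumerate(data):
        # The address is broadcast; each register controller compares it with
        # its own PE address and only the matching one latches the data.
        for pe in range(num_pes):
            if pe == address:
                registers[pe] = value
        cycles += 1
    return registers, cycles


def paired_transfer_cycles(num_pes):
    """Cycles with dedicated even/odd ports: two PEs per cycle."""
    return (num_pes + 1) // 2


print(addressed_transfer([5, 6, 7, 8], 4))  # ([5, 6, 7, 8], 4)
print(paired_transfer_cycles(4))            # 2
```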
[0118]
In the embodiment described above, the processor elements 3a are designated by address. However, the present invention can be applied not only to a method that designates the processor element 3a by address but also to a method that designates it with a pointer, that is, a serial access memory method. This example will be described with reference to FIG. 14. Only the points that differ from the first embodiment described above are described, and the description of common points is omitted. The same reference numerals are assigned to the same components as in the first embodiment.
[0119]
First, an I/O address, data, and control signals are supplied from the global processor 2 to the memory controller 5 via a bus. The global processor 2 sets commands, such as the operation method, in operation setting registers (not shown) of the memory controller 5. Finally, the global processor 2 writes a start code into a start register (not shown) of the memory controller 5, whereupon the memory controller 5 operates automatically according to the settings. Based on the command from the global processor 2, the memory controller 5 generates a reset signal and sends it from the external interface 4 to the processor element block 3 via the reset signal 47, which resets the register controllers 31a. A clock signal is then sent from the memory controller 5 via the external interface 4 and the clock signal 41c to the register controller 31a closest to the external interface 4. In synchronization with this clock signal, the register controller 31a′ obtains the read/write instruction signal sent from the memory controller 5 via the read/write signal 45a or 45b. This read/write instruction signal is applied to the register 31b of the even-numbered processor element 3a and to the register 31b of the odd-numbered processor element 3a. At this time, the read/write instruction signals sent to the register controller 31a′ for the pair of processor elements 3a may differ from those in the first embodiment.
[0120]
As a result, as in the first embodiment described above, a single pointer designation allows data to be transferred to both the even-numbered and the odd-numbered processor element 3a.
[0121]
FIG. 15 is a block diagram showing the configuration of another embodiment of the image processing apparatus including the memory controller 5 of the present invention described above. In FIG. 15, a memory controller 5 is arranged between data transfer ports of two independent register files 3. The present invention can also be applied to such a configuration.
[0122]
【The invention's effect】
As described above, according to the first aspect of the present invention, even when the number of processor elements is smaller than the number of pixels in the main scanning direction, the SIMD type processor can easily perform the computation, and only effective pixels need be transferred when a weighting operation such as filter processing is performed.
[0123]
According to the second aspect of the present invention, a line memory can be realized using a single port memory in an image processing system using a SIMD type processor.
[0124]
According to the third aspect of the present invention, when a SIMD type processor having fewer processor elements than pixels in the main scanning direction is used, only effective pixels are transferred when a weighting operation such as filter processing is performed.
[0125]
According to the fourth aspect of the present invention, when a SIMD type processor having fewer processor elements than pixels in the main scanning direction is used, the reference pixel data can be transferred together with the target pixels when a weighting operation such as filter processing is performed.
[0126]
According to the invention described in claim 5, in the case of image processing having a plurality of processing steps, data can be stored for each processing step.
[0127]
According to the sixth and seventh aspects of the present invention, the time for transferring the data in the general-purpose registers of all the processor elements can be halved. Further, it is possible to provide a margin for the RAM access time.
[0128]
According to the inventions described in claims 8 and 9, since the scaling process in the digital image processing is realized by the memory controller, the versatility and circuit scale of the SIMD type processor itself can be maintained.
[0129]
According to the invention described in claim 10, an image processing apparatus such as a digital copier with a very large number of pixels in the main scanning direction can be built without modifying the architecture of the SIMD type processor (for example, by increasing or decreasing the number of processor elements).
[Brief description of the drawings]
FIG. 1 is a block diagram showing a SIMD type processor according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an internal configuration of a SIMD type processor according to the first embodiment of the present invention.
FIG. 3 is a block diagram showing an internal configuration of a processor element according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing a configuration of a memory controller 5 used in the present invention.
FIG. 5 is a schematic block diagram of an embodiment of a processor element (PE) control unit 52 of the memory controller 5 used in the present invention.
FIG. 6 is a schematic block diagram of a first embodiment of a memory (RAM) control unit 53 of the memory controller 5 used in the present invention.
FIG. 7 is an explanatory diagram of a pixel data transfer method in the case of using a SIMD type processor having a smaller number of processor elements 3a than the number of pixels in the main scanning direction.
FIG. 8 is a schematic block diagram showing a second embodiment of a RAM controller 53 of the memory controller 5 used in the present invention.
FIG. 9 is a schematic block diagram showing a third embodiment of a RAM control unit 53 of the memory controller 5 used in the present invention.
FIG. 10 is a schematic block diagram of a write buffer unit 54 and a read buffer unit 55 of the memory controller 5 used in the present invention.
FIG. 11 is an explanatory diagram of a thinning write operation for realizing reduction in the scaling process.
FIG. 12 is an explanatory diagram of an overlapping read operation in the scaling process.
FIG. 13 is a block diagram showing an internal configuration of a SIMD type processor according to another embodiment of the present invention.
FIG. 14 is a block diagram showing an internal configuration of a SIMD type processor in still another embodiment of the present invention.
FIG. 15 is a schematic block diagram showing another embodiment of the present invention.
FIG. 16 is a schematic block diagram of an image processing apparatus using a conventional SIMD system.
[Explanation of symbols]
1 SIMD type processor
2 Global processor
4 External interface
5 Memory controller
6 Image memory
51 Sequence unit
52 PE controller
53 RAM controller
54 Write buffer
55 Read buffer section

Claims

1. A signal processing apparatus comprising: a SIMD type processor whose plurality of processor elements each have computing means for computing data and data holding means for holding data to be computed by the computing means and data computed by the computing means; a data transfer bus connected to each of the plurality of processor elements; designation means for designating a predetermined processor element based on addresses assigned to the plurality of processor elements; an address bus for supplying an address to the designation means; a data transfer interface for accessing, from outside the processor, the data holding means incorporated in the plurality of processor elements; and a memory controller, connected to the data transfer interface, that supplies to the address bus an address for designating the predetermined processor element, reads data stored in an external memory and writes it to the processor elements, and reads data from the processor elements and writes it to the memory; wherein, when transferring data from the data holding means of each processor element to the memory via the data transfer interface, the memory controller designates the address of the processor element at which the transfer starts and the address of the processor element at which the transfer ends, reads the computed data from each processor element excluding a predetermined number of data, sets a pointer value corresponding to the address of the processor element at which the transfer starts, and writes the data to the memory based on this pointer value; and, when transferring data from the memory to the data holding means of each processor element via the data transfer interface, the memory controller sets the pointer value moved back by the number of addresses corresponding to the predetermined number of data excluded in the preceding processing, reads data from the memory based on this pointer value, generates, in accordance with the pointer value, the address to be supplied to the address bus for designating the predetermined processor element, and writes the data to the processor element; whereby the data is divided and arithmetic processing is performed by the SIMD type processor.

2. The signal processing apparatus according to claim 1, wherein the memory controller performs write transfer and read transfer to the memory in a time-sharing manner and controls the memory, which is a single-port memory, as a FIFO or LIFO memory.

3. The signal processing apparatus according to claim 1, wherein the memory controller comprises a register for designating the address of the processor element at which the transfer starts and the address of the processor element at which the transfer ends when data is transferred from the data holding means of each processor element to the memory via the data transfer interface.

4. The signal processing apparatus according to claim 1, wherein the memory controller comprises a register for designating, when data is transferred from the memory to the data holding means of each processor element via the data transfer interface, the address of the processor element at which the transfer ends and the address of the processor element that determines the value to which the read pointer of the memory is returned after the transfer ends.

5. The signal processing apparatus according to claim 1, wherein the memory controller sets a lower limit value and an upper limit value of an arbitrary address area of the memory in registers and uses that area as a ring.

6. The signal processing apparatus according to claim 1, wherein the data transfer interface can simultaneously access the data holding means of a processor element having an even address and of a processor element having an odd address.

7. The signal processing apparatus according to any one of claims 1 to 6, wherein the bit width of the input/output bus to the memory is wider than the data transfer bit width to the processor elements, and the access time for data transfer with the memory is longer than that for data transfer with the processor elements.

8. The signal processing apparatus according to claim 1, wherein, when data is transferred from the data holding means of each processor element to the memory via the data transfer interface, the data is written when an external write control signal synchronized with the data transfer is “0”, and data writing is prohibited when it is “1”.

9. The signal processing apparatus according to claim 1, wherein, when data is transferred from the memory to the data holding means of each processor element via the data transfer interface, the data read from the memory is written to the data transfer interface when an external read control signal synchronized with the data transfer is “0”, and, when it is “1”, the same value as the data most recently transferred while the external signal was “0” is written to the data transfer interface.

10. The signal processing apparatus according to claim 1, wherein image data is stored in the memory, the SIMD type processor has fewer processor elements than there are pixels in the main scanning direction, and the memory controller controls the data write and read processing so that the pixels in the main scanning direction are divided into two or more portions for processing.