JP3590973B2

JP3590973B2 - Memory access device

Info

Publication number: JP3590973B2
Application number: JP31516092A
Authority: JP
Inventors: 雄一鈴木; 健一中西; 和彦西脇
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-11-25
Filing date: 1992-11-25
Publication date: 2004-11-17
Anticipated expiration: 2019-11-17
Also published as: JPH06161884A

Description

【０００１】
【産業上の利用分野】
本発明は複数のメモリを並列に配置して同時に読出す様にした特に画像処理分野で用いるメモリアクセス装置の改良に関する。
【０００２】
【従来の技術】
従来からメモリアクセス装置として、比較的読み出し速度の遅いＤＲＡＭ等のメモリを高速に処理する手法としては図２４に示す様なメモリを並列に配置して同時に読み出すメモリアクセス装置が知られている。これらはＫ次元に配列されたメモリに適用可能であるが説明を簡略化のために以下１次元の構成で説明する。
【０００３】
図２４で入力端子Ｔ_１にはアドレスＸが入力され、このアドレスＸは演算回路１に供給される。この演算回路１はアドレスＸを上位ビット及び下位ビットに分割する。即ち、例えば上位アドレス演算部１ａでは上位ビットに分割して出力し、この上位ビットは並列配列したメモリ素子（図ではＮ＝４の例である）２ａ〜２ｄのアドレスＸとして供給される。この上位ビットは（メモリ素子数Ｎ＝４で外部アドレスＸを割った商）Ｘｑであり、データ幅Ｗ＝４の場合はアドレスＸから下位２ビットを欠いたものと成る。
【０００４】
又、下位アドレス演算部１ｂからは下位ビットのアドレスデータが分割出力され選択回路３に供給される。この選択回路３には入力端子Ｔ_２によりデータ幅Ｗ（ここではＷ＝１〜４）を与えることで下位２ビット（Ｎ＝４の剰余）Ｘｒとデータ幅Ｗによって選択データＥｉ（ｉ＝Ｅ_０〜Ｅ_３）が複数のメモリ素子２ａ〜２ｄに供給され、第Ｘｒ番目以後のＷ個のメモリ素子を「オン」することによりＷ個のデータＤ_０〜Ｄ_３を並列的にアクセスすることになる。
【０００５】
上述の並列的メモリアクセス装置の選択回路３の動作を図２５に依って説明する。図２５でＷ（１〜４）はデータ幅、Ｘ_rはメモリ数Ｎで外部アドレスＸを割った商Ｘ_Qの剰余、Ｅ ₀〜Ｅ₃は選択回路３からの選択信号である。
【０００６】
データ幅Ｗ＝１の時は選択信号Ｅ_０〜Ｅ_３のうちの１つが剰余Ｘｒ０〜３によって選択される。データ幅Ｗ＝２の時は選択信号Ｅ_０〜Ｅ_３のうち剰余Ｘｒ０〜３によって２つが並列的にアクセスされる。ここでＷ＝２，Ｘｒ＝３は禁止されることになる。以下同様にデータ幅Ｗ＝３の時は選択信号Ｅ_０〜Ｅ_３のうち剰余Ｘｒ０〜３によって３つが並列的にアクセスされ、Ｗ＝３，Ｘｒ＝２及びＸｒ＝３が禁止される。データ幅Ｗ＝４の時は選択信号Ｅ_０〜Ｅ_３のうち剰余Ｘｒ０〜３によって４つが並列的にアクセスされ、Ｗ＝４，Ｘｒ＝１，Ｘｒ＝２，Ｘｒ＝３が禁止される。
【０００７】
即ち、図２５では４個のメモリ素子２ａ〜２ｄに同一アドレスＸｑが同時に与えられるため、４の倍数の境界を跨ぐ並列アクセスが出来ない、この制限はデータ幅Ｗが増大すると共に厳しくなり、図２５のＷ＝４ではＸｒ＝０以外へはアクセス出来ないことが解る。
【０００８】
【発明が解決しようとする課題】
一般にメモリアクセス装置を画像処理等に用いる場合には以下に述べる各項目が要求される。
（イ）大量のデータを処理するので、出来るだけ多くのデータを並列的にアクセスしたい。
（ロ）画像の最小単位である画素は、通常１バイトで表されるので、１バイト単位でアドレスを指定する必要がある。
（ハ）様々な大きさの画像を取扱うため、画像の境界（上下左右端）を考慮して処理する必要がある。
所が上述の従来のメモリアクセス装置ではメモリ素子２ａ〜２ｄを並列に配設するメモリ数を増加させる程、即ち、データ幅Ｗが大きくなる程、アドレスの制限が大きくなり、（イ）及び（ロ）項を同時に満足することが出来ない問題があった。
【０００９】
更に（ハ）項の要求に従った境界判定をする場合にはアドレスの制限があるため、並列アクセスが行なわれず後述するも図２６に示す様に１バイトのアクセス毎に単純な比較を行なうことで画像処理が行なわれていた。更に、本発明に用いる後述するデータ交換回路は回路規模が大きくなるため、より回路規模の小さいデータ交換回路が要望されていた。
【００１０】
本発明は叙上の問題点を解消したメモリアクセス装置を提供しようとするもので、その目的とするところは並列化した単位でデータを取り出す際に１バイト単位でアドレス指定が出来、且つ大量データを並列的に同時に取り出し得る様にしたものであり、更に上述の（ハ）項を満足させる領域判定手段をも得ようとすると共にデータ交換回路の回路規模を小さくしたメモリアクセス装置を得ようとするものである。
【００１１】
【課題を解決するための手段】
本発明のメモリアクセス装置は、データの格納された複数のメモリ素子よりなるメモリユニットから並列データを出力するメモリアクセス装置であって、アドレスを発生するアドレス発生手段と、アドレス発生手段から供給されるアドレスをメモリ素子数で除した上位ビットからなる第１のアドレスデータと、アドレス発生手段から供給されるアドレスをメモリ素子数で除した剰余の下位ビットからなる第２のアドレスデータに分割する分割アドレス制御手段と、分割アドレス制御手段からの第２のアドレスデータで制御され、第１のアドレスデータに１または零または−１を加算して、複数のメモリユニットに第１のアドレスデータを更新した第３のアドレスデータを供給するアドレス更新手段と、第３のアドレスデータにより更新されて、アドレスの順序に並べられていない複数のメモリユニットに格納された上記データが供給される複数のレジスタと、第２のアドレスデータが入力され、上記アドレスの順序に並べられていない上記データに対応する切換制御データを出力するデコーダと、デコーダの切換制御データに基づいて複数の重み係数の１つを選択して出力する重み係数の格納された複数の重み係数レジスタと、複数のレジスタからのアドレスの順序に並べられていない複数のメモリユニットに格納されたデータと、複数の重み係数レジスタから供給される重み係数を乗算する複数の乗算手段と、複数の乗算手段からの乗算出力を加算する総和手段と、を具備する様に成したことを特徴とするメモリアクセス装置としたものである。
【００１２】
本発明のメモリアクセス装置は、アドレス更新手段から第３のアドレスデータが供給される複数のメモリユニットに対して、予め定められた有効領域を基に複数の領域判定信号を供給してメモリユニットを動作させる領域判定手段を付加したものである。
【００１３】
【作用】
上述せる本発明のメモリアクセス装置によれば、Ｎ個のメモリに対し、外部アドレスＸをメモリ数Ｎで除算した商Ｘｑと剰余Ｘｒを発生させ、この剰余Ｘｒにより商Ｘｑに対し１，０，又は−１を加算する様にしたので従来の様に外部アドレスＸが必ずしも並列に配設したメモリ数Ｎの倍数でなくても並列的にアクセス可能なものが得られる。又、メモリに記憶された画像データを処理する場合に任意の画素を中心とするＮ個の近傍画素データを同時にアクセス出来て、処理速度の速いものが得られる。又メモリユニット内のデータ交換回路を処理回路内の乗算回路で兼用することができ、この乗算回路の重み係数レジスタの個数はＮに比例するのでメモリユニットの並列数Ｎが大きい時に効果を発揮させる事ができる。
【００１４】
【実施例】
以下、本発明のメモリアクセス装置の一実施例を図面について詳記する。本例のメモリアクセス装置はＮ次元のアドレス空間をなすＫ次元に配列したＮ個のメモリに適用可能であるが簡単のためアドレス空間を１次元のものについて説明を進める。
【００１５】
本例のメモリアクセス装置を図１で詳記するに先だち、図２に依って本例のメモリアクセス回路が適用可能な画像処理装置の全体的システムを説明する。
【００１６】
図２で６は本例のメモリ回路であり、予めメモリ素子内に画像データが記憶されているものとする。このメモリ回路６へアドレス発生回路７から外部アドレスＸが供給され、メモリの処理部位を指定する。メモリ回路６は処理部位のアドレスＸ近傍の画素データＤｉ（ｉ＝０〜Ｎ−１）を処理回路８に供給する。処理回路８は画素データＤｉを処理し、その処理結果Ｓを出力装置９に出力する。
【００１７】
上述のメモリ回路６の一実施例を図１に示す。図１では領域判定を行なうための領域判定回路１８は後述するもＥｉ（ｉ＝０〜Ｎ−１）を出力し、複数メモリ素子２ａ〜２ｎに供給される。アドレス発生回路７で発生した外部アドレスＸは入力端子Ｔ_１に供給され分割アドレス制御手段を構成する演算回路１に供給される。この演算回路１は除算器でありメモリ素子２ａ〜２ｎの数Ｎ即ち並列数ＮでアドレスＸを割った商Ｘｑとその剰余Ｘｒを出力する上位アドレス演算部１ａと下位アドレス演算部１ｂより成り、商ＸｑはアドレスＸの上位ビットを剰余ＸｒはアドレスＸの下位ビットを表す。
【００１８】
剰余ＸｒはＮ個のアドレス変換回路４ａ〜４ｎに供給されると共にデータ変換回路５にも供給される。商ＸｑもＮ個のアドレス変換回路４ａ〜４ｎに供給される。
【００１９】
アドレス変換回路４ａ〜４ｎは剰余Ｘｒによって制御される一種の加算器であり、剰余Ｘｒの値により１又は０又は−１を商Ｘｑに加算することで異なった変換を受け、アドレスが更新された更新アドレスＸｑｉ（ｉ＝０〜Ｎ−１）を出力し、各メモリ２ａ〜２ｎにアドレスとして供給される。
【００２０】
複数の各メモリ２ａ〜２ｎに供給される領域判定回路１８からの判定信号Ｅｉ（ｉ＝０〜Ｎ−１）が「オン」であればメモリ素子２ａ〜２ｎが動作し、データ変換回路５にデータＤｉ′（ｉ＝０〜Ｎ−１）を出力する。
【００２１】
データ変換回路５は外部アドレスＸの剰余Ｘｒによって制御され、メモリ素子２ａ〜２ｎからの入力データＤｉ′をアドレス順に並べ換えてデータＤｉ（ｉ＝０〜Ｎ−１）として図２に示す処理回路８に出力する様に成されている。
【００２２】
上述の図１に示したメモリ回路の動作を図３乃至図９によって説明する。
【００２３】
図３は外部アドレスＸのデータが４つのメモリ素子２ａ〜２ｄ（Ｍ_０，Ｍ_１，Ｍ_２，Ｍ_３）のどの位置に記憶されているかを示すアドレスマップを示すものでＸｑはＸ／４の商を示している。依って、図３で例えば、破線の丸印で示す様にアドレスＸ＝４の位置に置くデータはメモリ素子Ｍ_０の第１のアドレスにありＸ＝１１の位置に置くデータはメモリ素子Ｍ_３の第２のアドレスにあることに成る。
【００２４】
外部アドレスＸは図１で説明した様に商Ｘｑと剰余Ｘｒに分割されるが、このうちの剰余Ｘｒはメモリ素子の番号に一致し、商Ｘｑは各メモリ素子内のアドレスに一致している。即ち、Ｘ＝１１のとき、Ｘｑ＝２，Ｘｒ＝３であるが、これはメモリ素子Ｍ_３の第２のアドレスを意味することが図３から明らかである。
【００２５】
表現を変えて、各メモリ素子に外部アドレスＸが与えられたとき、選択されるメモリ素子Ｍ_０〜Ｍ_３とそのアドレスＸｑとの関係を図４に示す。図４で丸印で示す様にＸ＝４のとき、メモリ素子Ｍ_０の第１のアドレス“１”が選択され、Ｘ＝１１のとき、メモリ素子Ｍ_３の第２のアドレス“２”が選択されていることが解る。
【００２６】
本発明の第１の目的は、この様なＮ個のメモリ素子（ここではＮ＝４）のデータを並列的にアクセスすることである。今、図４で外部アドレスＸ＝４を与えたとき、同時に並列的にアクセス可能なＸ＝４，５，６，７のデータを選ぶにはメモリ素子Ｍ_０，Ｍ_１，Ｍ_２，Ｍ_３のすべてに商Ｘｑ＝１としてのアドレスを与えてやればよいことが図から読みとれる。
【００２７】
所が図４の丸印で示すＸ＝１１の例ではメモリ素子Ｍ_３にアドレスＸｑ＝２を与え、メモリ素子Ｍ_０,Ｍ_１,Ｍ_２に対しては（Ｘｑ＋１）＝３をアドレスとして与えている。
【００２８】
従来の図２４に示す並列メモリアクセス装置ではメモリ素子Ｍ_０〜Ｍ_３に同一のアドレスＸｑが供給されているため、Ｘ＝１１の様な境界１０に跨がる様な並列アクセスは図２５に示す様に禁止されていた。
【００２９】
そこで本例のアドレス変換回路４ａ〜４ｎは図４の例ではメモリ素子Ｍ_０，Ｍ_１，Ｍ_２に対しメモリ素子のアドレス２を（Ｘｑ＋１）＝３と加算することでメモリを更新することで従来禁止されていた並列アクセスを可能にすることが出来る。
【００３０】
図５は外部アドレスＸが与えられたとき、商Ｘｑに対して各アドレス変換回路４ａ〜４ｄ（Ｔ₀,Ｔ₁,Ｔ₂,Ｔ₃)が加算する値を剰余Ｘｒの値にしたがって整理したものである。このマップから解る様に，外部アドレスＸ＝１１のとき、メモリ数Ｎ＝４で外部アドレスＸ＝１１を除した商は２で、剰余Ｘｒ＝３はであり、これはメモリ素子Ｍ₃であるからアドレス変換回路４ａ〜４ｄ（Ｔ₀,Ｔ₁,Ｔ₂,Ｔ₃)のうちＴ₀,Ｔ₁,Ｔ₂がアドレスに１を加算し（Ｘｑ＋１）に変換しなければならないことが図から読み取れる。
【００３１】
図６は図５に示すＮ＝４のケースをＮ個並列にした場合まで拡張したものである。このマップからアドレス変換回路４ａ〜４ｎ（Ｔｉ＝ｉ＝０〜Ｎ−１）は剰余Ｘｒ＞ｉのときメモリ素子のアドレスを（Ｘｑ＋１）に変換すればよいことになる。
【００３２】
上述の実施例ではアドレス変換回路４ａ〜４ｂのアドレス更新手段は外部アドレスＸが与えられたとき、外部アドレスＸ以降のＮ個のデータを並列的にアクセスする場合を説明したが、例えば画像処理装置等においてはアドレスＸを中心とする前後Ｎ個のデータに処理を施す必要がある。この様な処理を行なうためには本来のアドレスＸから（Ｎ−１）／２を引いた値をＸとして与えてやればよいが、叙上のアドレス変換回路４ａ〜４ｂを工夫することでも、これを実現することが出来る。以下、これを説明する。
【００３３】
図７は、メモリ数Ｎ＝５の場合で外部アドレスＸを中心とする５つのデータを並列的にアクセスするときのもので図４と同様に外部アドレスＸに対するメモリ素子とそのマップであり、同図において丸印で示す様にＸ＝２が与えられ、メモリＭ ₀,Ｍ ₁,Ｍ ₂,Ｍ ₃,Ｍ ₄ を同時にアクセスする場合を考えると、Ｘｑをすべてのメモリ素子Ｍ₀〜Ｍ₄に供給し、同様に丸印で示すＸ＝９でメモリＭ ₂,Ｍ ₃,Ｍ ₄,Ｍ ₀,Ｍ ₁ を同時にアクセスする場合はメモリ素子Ｍ₂,Ｍ₃,Ｍ₄にＸｑを与え、メモリ素子Ｍ₀,Ｍ₁に（Ｘｑ＋１）を供給する。一方丸印で示すＸ＝１５でメモリＭ ₃,Ｍ ₄,Ｍ ₀,Ｍ ₁,Ｍ ₂ を同時にアクセスする場合はメモリ素子Ｍ₀,Ｍ₁,Ｍ₂,に対しＸｑを与え、メモリ素子Ｍ₃,Ｍ₄では（Ｘｑ−１）を供給する必要がある。
【００３４】
これらを図５と同様の様式にまとめると図８の如く成る。
【００３５】
更に図６に示したと同様の図９の様に（Ｘ−ｎ）から（Ｘ−ｎ＋Ｎ−１）までのＮ個のデータを並列にアクセスするには、Ｘｒ＜（ｉ−ｎ）のとき（Ｘｑ−１），Ｘｒ＞（ｉ＋ｎ）のとき（Ｘｑ＋１）に変換するよう、各アドレス変換回路Ｔｉを設計すればよい。
【００３６】
次に図１に示したデータ変換回路５の動作を説明する。一般の画像処理では処理方法はアドレスＸとの相対位置によって決められることが多く、例えばアドレスＸの処理はＰ_０、アドレス（Ｘ＋１）の処理はＰ_１‥‥‥等と成っている。従って複数のデータを並列に処理回路へ送るには、アドレス順に配列する必要がある。ここで、上述した図４の例において、Ｘ＝１１の場合を見ると、Ｘ＝１１，１２，１３，１４に相当するデータは、それぞれメモリ素子Ｍ_３，Ｍ_０，Ｍ_１，Ｍ_２に対してアクセスされる。これを、メモリ素子の配置順に整理すると、図１０のメモリ素子出力Ｄｉ′の様に１２，１３，１４，１１のようになり、これを図１０のデータ変換回路出力Ｄｉの様に１１，１２，１３，１４と並び変える必要がある。
【００３７】
図１１はＮ＝４の場合にアドレスの剰余Ｘｒによってメモリ素子２ａ〜２ｄの出力データＤｉ′の配置がどの様に変化するかを表したものである。図１０の例は図１１のＸｒ＝３の場合に相当することが解る。
【００３８】
図１１の場合、各メモリ素子のデータを、アドレスの小さいほうへＸｒ個だけローテーションを施す様にデータ変換回路を設計することにより、データ変換回路出力Ｄｉのごとくデータを整列させることができる。
【００３９】
更に、一般のＮ並列の場合において、（Ｘ−ｎ）から（Ｘ−ｎ＋Ｎ−１）までのＮ個のデータをアクセスするように拡張したのが図１２である。このとき、各メモリ素子のデータを、アドレスの小さいほうへ、（Ｘｒ−ｎ）個だけローテーションを施す（但し、（Ｘｒ−ｎ）＜０のときは反対方向へ）ようにデータ交換回路５を設計すればよい。
【００４０】
上述の図１に示す構成ではメモリ回路６内にデータ交換回路５を設けた場合を説明したが図２に示す処理回路８にこの機能を持たせることによってメモリ回路６内のデータ変換回路を省略させることも出来る。
【００４１】
この様な処理回路８の例として画像処理に多用される積和演算回路を図１３で説明する。
【００４２】
図１３に於いて、図１のメモリ２ａ〜２ｎからのアドレスの順序に並べられていないデータＤｉ′（ｉ＝０〜３）はデータ変換回路５を介さずに直接レジスタ１１に供給される。このレジスタ１１に保持されたアドレスの順序に並べられていないメモリからの出力データＤｉ′は複数の乗算回路１２ａ，１２ｂ，１２ｃ，１２ｄに供給される。
【００４３】
一方、これら乗算回路１２ａ〜１２ｄにはメモリ２ａ〜２ｎから来たアドレスＸ，Ｘ＋１，Ｘ＋２，Ｘ＋３におけるデータＤｉ′に重み係数Ｗ₀,Ｗ₁,Ｗ₂,Ｗ₃を掛け、それらの総和を算出するために、重み係数用レジスタ１３から重み係数Ｗｊ（ｊ＝０〜３）が供給される。
【００４４】
重み係数レジスタ１３内には４つのレジスタＵ１０，Ｕ１１，Ｕ１２, Ｕ１３を有する。重み係数Ｗ _０，Ｗ _１，Ｗ _２，Ｗ _３・Ｗ _３，Ｗ _０，Ｗ _１，Ｗ _２・Ｗ _２，Ｗ _３，Ｗ _０，Ｗ _１，・Ｗ _１，Ｗ _２，Ｗ _３，Ｗ _０が格納された４個のレジスタ１３にはデコーダ１４からの切換え用の制御信号が供給され、４つのレジスタのうちの一つを選択する。
【００４５】
デコーダ１４には図１に示した演算回路１からの剰余Ｘｒが供給されているので、この剰余Ｘｒによって重み係数が制御され、デコーダ１４ではＸｒ＝０の時レジスタＵ１０を、Ｘｒ＝１の時レジスタＵ１１を、Ｘｒ＝２の時レジスタＵ１２をＸｒ＝３の時レジスタＵ１３が選択される様にデコードされて乗算回路１２ａ〜１２ｄに送出される。重み係数Ｗｊの配列は図１４に示す様に成される。
【００４６】
乗算回路１２ａ〜１２ｄで夫々のデータＤｉに重み係数Ｗ_０〜Ｗ_３が乗算された乗算出力Ｐｉ（ｉ＝０〜３）は総和回路１５に供給されて加算され、最終結果データＳが図２に示した出力装置９に供給されることになる。
【００４７】
ところで、図１４に示したデコード表は図１１がＮ＝４の場合のデータ配列であることを思い出してみると、ＸとＷ_０、Ｘ＋１とＷ_１、Ｘ＋２とＷ_２、Ｘ＋３とＷ_３が、正に対応していることがわかる。即ちデータ交換回路５と同じ効果が得られるわけであり、このデータ変換回路５を省略できることが解る。
ここで、データ交換回路５と、重み係数レジスタ１３のＵ１０〜Ｕ１３の回路規模について考えてみると、データ交換回路５の回路規模は、並列数Ｎの２乗に比例するのに対し、重み係数レジスタ１３の個数はＮに比例する。従って、図１３の構成は、並列数Ｎが大きい時に効果を発揮することになる。
【００４８】
上述の図１で示した構成はメモリを１次元配列したメモリアクセス装置であるが複数次元に拡張可能である。図１５ではこれを２次元まで拡張した構成を示す。
【００４９】
図１５で２ａ〜２ｐはメモリユニットであり、１及び１′は列及び行の分割アドレス制御回路である。この分割アドレス制御回路１及び１′内の上位アドレス演算及び下位アドレス演算部１ａ及び１ａ′並に１ｂ及び１ｂ′は別の定義を施せば、元のアクセスすべき外部アドレスをＸとし、並列に配置されるメモリユニット２ａ〜２ｐの数をＮとすればＸ／Ｎ＝Ｘｑ即ち除算を施した商Ｘｑが上位アドレス演算部１ａに対応し、余りＸｒが下位アドレス演算部１ｂとなる。依ってＸ＝Ｘｑ＋Ｘｒと表すことが出来、分割アドレス制御回路１，１′は除算器の演算回路で構成可能となる。
【００５０】
この演算部１，１′からの商Ｘｑと剰余ＸｒはＰＬＤ（プログラマブル論理デバイス）等で構成されたアドレス更新回路４ａ，４ｂ，４ｃ，４ｄ，４ａ′，４ｂ′，４ｃ′，４ｄ′に供給される。
【００５１】
列及び行のアドレス更新回路４ａ〜４ｄ及び４ａ′〜４ｄ′は元の外部アドレスＸをメモリユニットの数で除算した商Ｘｑの値（アドレスの上位ビット）に対し、その余りＸｒ（アドレスの下位ビット）で制御が成され、オフセットに対応する様な制御を行なう。
【００５２】
即ち、格子状配列させたメモリユニット２ａ〜２ｐの第１及び第２列（行）に対しては商Ｘｑをそのまま通過させるか商Ｘｑの値に１を加算するかの０／＋１の選択が剰余値Ｘｒで行なわれる。又第３列（行）に対しては商Ｘｑはそのまま通過させる０／０の選択が剰余値Ｘｒで行なわれる。同様に第４列（行）に対しては商Ｘｑから１を減算するか、そのまま通過させるかの−１／０の選択が剰余値Ｘｒで行なわれて、図２に示す様にメモリユニット２から所定のデータが読み出されて処理回路８に供給されることになる。
【００５３】
上述の図１５の構成では２次元構成のメモリユニット２ａ〜２ｐについて説明したが３次元構成の立方体状に構成させたものについて考える場合はＸ及びＹ軸方向と同様にＺ軸方向の分割アドレス制御回路１と同じくＺ軸方向のアドレス更新回路４ａ″，４ｂ″４ｃ″，４ｄ″を設ければよい。
【００５４】
次に図１で説明した領域判定回路を説明する。本例のメモリアクセス回路が利用される画像処理に於いては画像データは例えば図１６に示す様にアドレス空間Ａ（Ａ／Ｂ）の中の有効区間Ｂ（Ｘｍｉｎ，Ｘｍａｘ）上に置かれ、有効区間Ｂ以外のデータ１６のアドレスＸｍｉｎ〜Ｘｍａｘで区切られる境界によって分割されたデータ１６の斜線部分のデータ１７は保証されないことが多い。この為にあるアドレスＸにおけるデータの処理について、Ｘ近傍の複数データを利用する画像処理においては、以下のいずれかの方法をとる必要がある。
（ａ）処理する近傍データ数は変えず、近傍データがすべて有効区間Ｂに含まれる様に、処理する画像の範囲を、有効区間Ｂの境界より少し内側に制限する。
（ｂ）処理する画像の範囲は変えず、有効区間Ｂの全区間に対して処理するが、処理データが有効区間Ｂからはみ出さない様に、境界付近の近傍データ数を小さくしていく。
（ｃ）近傍データのうち、有効区間Ｂに属さないものについては、予め定めた背景値に置き換えたうえで、通常の処理を施す。
【００５５】
しかし、上記（ａ）の方法では、処理後の画像が小さくなるという問題がある。また、（ｂ）の方法では、例外処理が複雑と成り、高速化や回路規模の点で不利である。
【００５６】
一方、（ｃ）の方法を採れば、例えば白紙に枠どりなしで画像を印刷する場合は背景色を白に指定するなど、用途に応じた背景値を用いれば、画像全体に同一処理を施すことができる。
【００５７】
さらに、上記（ａ）および（ｂ）においては、有効区間Ｂ以外のアドレス指定が禁止されるのに対し、（ｃ）では許されているため、より自由なアドレス指定が可能であり、特に画像の幾何学変換等に有効である。
【００５８】
この様な領域判定回路１８の従来の構成を図２６に示す。同図で１９及び２０は、夫々有効領域Ｂの最大値Ｘｍａｘ、最小値Ｘｍｉｎを保持しているレジスタであり、この最大値、Ｘｍａｘ及び最小値Ｘｍｉｎが比較回路２１及び２２に供給されると共に指定アドレスＸもこれら比較回路２１及び２２に供給されて、その値の大小の比較が成される。この比較回路は例えばＴＴＬの７４ＬＳ６８２等の代表されるものである。
【００５９】
比較回路２１は比較出力Ｅｈ（Ｘ＞Ｘｍａｘ）が出力され、アドレスＸが有効区間Ｂの境界の上限値Ｘｍａｘを越えたことを示し、比較回路２２は比較出力Ｅｌ（Ｘ＜Ｘｍｉｎ）を出力し、アドレスＸが有効区間Ｂの境界の下限値Ｘｍｉｎを越えたことを示す。これら両出力Ｅｈ及びＥｌはオアゲート回路２３で論理和され領域判定信号Ｅｉが出力されて出力端子Ｔ_２を介して各メモリへ供給される。
【００６０】
同図において明らかな様に、従来技術では、与えられた１つのアドレスＸにつき、１つの判定信号Ｅｉが得られていた。しかし、本例では１つのアドレスＸから、Ｎ個のデータをアクセスするためＮ個の判定信号を生成する必要がある。この様な領域判定回路１８の構成を図１７で説明する。
【００６１】
本発明に用いられる領域判定回路１８は、予め定められた有効領域Ｂの上限値Ｘｍａｘ、下限値Ｘｍｉｎをもとに、任意に与えられたアドレスＸに対し、Ｎ個の領域判定信号Ｅｉ（ｉ＝０〜Ｎ−１）を生成して、図１における各メモリ素子２ａ〜２ｎに供給するものである。
【００６２】
図１７で、レジスタ２４及び２５は有効領域Ｂの最大値Ｘｍａｘ、最小値Ｘｍｉｎによって決まる、上限基準値ＸＨ、下限基準値ＸＬを保持してこれら基準値ＸＨ及びＸＬを減算回路２６及び２７に供給する。
【００６３】
減算回路２６及び２７には指定アドレスＸが与えられ、減算回路２６では基準値ＸＨからＸを引いた差出力Ｘｈを出力し、この差出力Ｘｈは除算回路２９に供給される。除算回路２９ではメモリの並列数Ｎで割った商Ｘｈｑと剰余Ｘｈｒに分割されてエンコーダ３１に供給される。エンコーダ３１は商Ｘｈｑと剰余ＸｈｒからＮ個の上限判定信号Ｅｈｊ（ｊ＝０〜Ｎ−１）を生成する。
【００６４】
同様に減算回路２７は基準ＸＬからＸを引いた差出力Ｘｌを出力し、この差出力Ｘｌは除算回路３０に供給される。除算回路３０ではメモリの数Ｎで割った商Ｘｌｑと剰余Ｘｌｒに分割され、エンコーダ３２に供給される。エンコーダ３２は商Ｘｌｑ及び剰余ＸｌｒからＮ個の下限判定信号Ｅｌｊ（ｊ＝０〜Ｎ−１）を生成する。
【００６５】
上述のエンコーダ３１及び３２から得られた上限判定信号Ｅｈｊ及び下限判定信号Ｅｌｊはオアゲート回路３３によって各ｊにつき論理和され、判定信号Ｅｊ′（ｊ＝０〜Ｎ−１）として交換回路３４に送出される。
【００６６】
一方、除算回路２８はアドレスＸをＮで割って剰余Ｘｒを出力して交換回路３４に供給する。交換回路３４は剰余Ｘｒの制御によって判定信号Ｅｊ′の順序を交換して判定信号Ｅｉ（ｉ＝０〜Ｎ−１）を出力して各メモリ２ａ〜２ｎに供給する。
【００６７】
以下、図１８及び図１９によってエンコーダ３１の動作原理を説明する。簡単のため並列数Ｎ＝４において、アドレスＸが与えられた時に（図３〜図５で示した場合に相当）アドレスＸからＸ＋３までを考えることにする。
【００６８】
図１８は、アドレスＸが与えられたとき、Ｘ〜Ｘ＋３のデータと、画像領域の上限Ｘｍａｘとの位置関係を表す図である。図で、＊印は、有意の画像データを表し、≫は、そのデータが、Ｘｍａｘの位置に存在することを示す。
ＸとＸｍａｘの関係は、下記の３つの場合に分類できる。
【００６９】
（ｄ）Ｘｍａｘが、Ｘ〜Ｘ＋３のどれよりも大きい場合
Ｘ〜Ｘ＋３のすべてが領域内である。（図１８（１））
（ｅ）Ｘｍａｘが、Ｘ〜Ｘ＋３のどれかと一致する場合
Ｘ〜Ｘ＋３の一部が領域外である。（図１８（２）〜（５））
（ｆ）ＸｍａｘがＸ〜Ｘ＋３のどれよりも小さい場合
Ｘ〜Ｘ＋３のすべてが領域外である。（図１８（６））
【００７０】
ここで、（Ｘｍａｘ−Ｘ）をＸｈで表すと、次のように言い換えられる。
（ｄ′）Ｘｈ＞３のとき、Ｘ〜Ｘ＋３のすべてが領域内である。
（ｅ′）０≦Ｘｈ≦３のとき、Ｘ〜Ｘ＋３の一部が領域外である。
（ｆ′）Ｘｈ＜０のとき、Ｘ〜Ｘ＋３のすべてが領域外である。
【００７１】
さらに、Ｘｈを並列数Ｎ＝４で割った商をＸｈｑ、剰余をＸｈｒとすると、次のようになる。
（ｄ″）Ｘｈｑ＞０のとき、すべてが領域内である。
（ｅ″）Ｘｈｑ＝０のとき、一部が領域外である。
（ｆ″）Ｘｈｑ＜０のとき、すべてが領域外である。
【００７２】
上記（ｄ″）、（ｅ″）、（ｆ″）の結果と、図１８から明らかな様に、図１７のエンコーダ３１は下記のように動作する。即ち、
【００７３】
・Ｘｈｑ＞０のとき、Ｅｈｊ（ｊ＝０〜Ｎ−１）は、すべてオフ、
・Ｘｈｑ＜０のとき、Ｅｈｊは、すべてオン、
・Ｘｈｑ＝０のとき、Ｅｈｊのうち、ｊ≦Ｘｈｒのものをオフ、他はオン。
【００７４】
さらに、一般のＮ並列の場合も同様であり、エンコーダ３１は、図１９のように動作することになる。
次に、図２０〜図２１により、エンコーダ３２の動作原理について説明する。図２０は、アドレスＸが与えられたとき、Ｘ〜Ｘ＋３のデータと、画像領域の下限Ｘｍｉｎとの位置関係を表す図である。図で＊印は有意の画像データを、≪印は、そのデータが、Ｘｍｉｎの位置に存在することを示す。図１８の場合と同様、ＸとＸｍｉｎの関係は、下記の３つの場合に分類できる。
【００７５】
（ｇ）Ｘｍｉｎが、Ｘ〜Ｘ＋３のどれよりも小さい場合
Ｘ〜Ｘ＋３のすべてが領域内である。（図２０（１））
（ｈ）Ｘｍｉｎが、Ｘ〜Ｘ＋３のどれかと一致する場合
Ｘ〜Ｘ＋３の一部が領域外である。（図２０（２）〜（５））
（ｉ）Ｘｍｉｎが、Ｘ〜Ｘ＋３のどれよりも大きい場合
Ｘ〜Ｘ＋３のすべてが領域外である。（図２０（６））
【００７６】
ここで、（Ｘｍｉｎ−Ｘ）をＸｌで表すと、次のように言い換えられる。
（ｇ′）Ｘｌ＜０のとき、Ｘ〜Ｘ＋３のすべてが領域内である。
（ｈ′）０≦Ｘｌ≦３のとき、Ｘ〜Ｘ＋３の一部が領域外である。
（ｉ′）Ｘｌ＞３のとき、Ｘ〜Ｘ＋３のすべてが領域外である。
【００７７】
さらに、Ｘｌを並列数Ｎ＝４で割った商をＸｌｑ、剰余をＸｌｒとすると、次のようになる。
（ｇ″）Ｘｌｑ＜０のとき、すべてが領域内である。
（ｈ″）Ｘｌｑ＝０のとき、一部が領域外である。
（ｉ″）Ｘｌｑ＞０のとき、すべてが領域外である。
【００７８】
上記ｇ″、ｈ″、ｉ″の結果と、図２０から明らかな様に、図１７のエンコーダ３２は下記のように動作する。即ち、
・Ｘｌｑ＜０のとき、Ｅｌｊ（ｊ＝０〜Ｎ−１）は、すべてオフ、
・Ｘｌｑ＞０のとき、Ｅｌｊ（ｊ＝０〜Ｎ−１）は、すべてオン、
・Ｘｌｑ＝０のとき、Ｅｌｊのうち、ｊ≧Ｘｌｒのものをオフ、他はオン。
【００７９】
さらに、一般のＮ並列の場合も同様であり、エンコーダ３２は、図２１のように動作することになる。
【００８０】
以上で、上限Ｘｍａｘに対する判定信号Ｅｈｊ、および下限Ｘｍｉｎに対する判定信号Ｅｌｊが得られた。そこで、図１７のオアゲート回路３３によって、各ＥｈｊとＥｌｊの論理和Ｅｊ′をとれば、有効領域Ｂ〔Ｘｍｉｎ、Ｘｍａｘ〕に対する判定信号となる。
【００８１】
ところで、得られた判定信号Ｅｊ′は、Ｘ〜Ｘ＋Ｎ−１のデータに対応したものであるが、Ｘは任意のアドレスを指定するため、実際のメモリ素子２ａ〜２ｎの配列とは異なっている。しかし、Ｘの剰余Ｘｒは、メモリ素子の配列に対応しているため、交換回路３４によって図２２の規則にしたがって並べ変えることにより、Ｍｉの配列に対応した領域判定信号Ｅｉを得ることができる。
【００８２】
以上の説明で、アドレスＸの近傍として、（Ｘ）〜（Ｘ＋Ｎ−１）のデータがアクセスされるとしたが、アドレスＸの近傍として、一般に（Ｘ−ｎ）〜（Ｘ−ｎ＋Ｎ−１）がアクセスされる場合は、図１７のレジスタ２４及び２５に、ＸＨ＝（Ｘｍａｘ−ｎ）、ＸＬ＝（Ｘｍｉｎ−ｎ）をセットすることで実現できることは明白である。
【００８３】
上述の図１５に示す構成ではメモリ２ａ〜２ｐにデータ線を独立に設けたが行方向に共通のバス構成とし、各メモリの出力イネーブルを時分割制御して、データをバス上で多重化してもよい。
【００８４】
更に図２３に示す様にＤＲＡＭで上記した多重化を施すため列アドレスは３５ａ，３５ｂ‥‥等のマルチプレクサを設けて共通に多重化し、行アドレスについては３６ａ，３６ｂ，３６ｃ，３６ｄ‥‥の様に各メモリにマルチプレクサを設置する様に構成させることも出来る。
【００８５】
【発明の効果】
本発明のメモリアクセス装置を画像処理等に用いることで任意の画素を中心にＮ個の近傍データを同時にアクセス可能で処理速度を大幅に向上させることが可能と成る。即ち、バイト単位で並列データを高速にアクセス可能なメモリアクセス装置が得られる。又メモリユニット内のデータ交換回路を処理回路内の乗算回路で兼用することができ、この乗算回路の重み係数レジスタの個数はＮに比例するのでメモリユニットの並列数Ｎが大きい時に効果を発揮させる事ができる。
【図面の簡単な説明】
【図１】本発明のメモリアクセス装置の一実施例を示す系統図である。
【図２】本発明のメモリアクセス装置が利用される画像処理装置の全体的系統図である。
【図３】本発明のメモリアクセス装置に用いるメモリマップ図である。
【図４】本発明のメモリアクセス装置に用いる外部アドレスに対するメモリ素子とそのアドレスを示すマップ図である。
【図５】本発明のメモリアクセス装置に用いる剰余の値に応じて商に加算するアドレス変換回路の出力マップ図である。
【図６】本発明のメモリアクセス装置に用いる剰余の値に応じて商に加算するアドレス変換回路をＮ並列に拡張したマップ図である。
【図７】本発明のメモリアクセス装置に用いる外部アドレスに対するメモリ素子とそのアドレスを示すマップ図である。
【図８】本発明のメモリアクセス装置に用いる剰余の値に応じて商を加減算するアドレス変換回路のマップ図である。
【図９】本発明のメモリアクセス装置に用いる剰余の値に応じて商に加算するアドレス変換回路の出力をＮ並列に拡張したマップ図である。
【図１０】本発明のメモリアクセス装置に用いるデータ変換回路動作説明図である。
【図１１】本発明のメモリアクセス装置に用いる剰余とメモリ素子のデータの関係を示す図である。
【図１２】本発明のメモリアクセス装置に用いる剰余とメモリ素子のデータの関係をＮ並列まで拡張した関係を示す図である。
【図１３】本発明のメモリアクセス装置に用いる処理回路内のデータ変換方法を示す系統図である。
【図１４】本発明のメモリアクセス装置に用いるデコーダのデコード表を示す図である。
【図１５】本発明のメモリアクセス装置の２次元構成の系統図である。
【図１６】本発明のメモリアクセス装置のアクセス時の境界部分の読み出し説明図である。
【図１７】本発明のメモリアクセス装置に用いる領域判定回路の系統図である。
【図１８】本発明のメモリアクセス装置に用いる領域判定回路でアドレスを与えた時のデータの画像領域の上限を示す図である。
【図１９】本発明のメモリアクセス装置に用いる領域判定回路の図１８をＮ並列に拡張した場合の説明図である。
【図２０】本発明のメモリアクセス装置に用いる領域判定回路でアドレスを与えたときのデータの画像領域の下限を示す図である。
【図２１】本発明のメモリアクセス装置に用いる領域判定回路の図２０をＮ並列に拡張した場合の説明図である。
【図２２】本発明のメモリアクセス装置に用いるレジスタのレジスタ値を一般化した場合の説明図である。
【図２３】本発明のメモリアクセス装置の他の構成を示す系統図である。
【図２４】従来のメモリアクセス装置の系統図である。
【図２５】従来の並列アクセス時の選択データとデータ幅及び剰余との関係を示す図である。
【図２６】従来のメモリアクセス装置に用いる領域判定回路の系統図である。
【符号の説明】
１演算回路（分割アドレス制御手段）
２（２ａ〜２ｎ）メモリ
４（４ａ〜４ｎ）アドレス変換回路
５データ交換回路
６メモリ回路
７アドレス発生回路
８処理回路

[0001]
[Industrial applications]
The present invention relates to an improvement in a memory access device particularly used in the field of image processing in which a plurality of memories are arranged in parallel and read simultaneously.
[0002]
[Prior art]
Conventionally, as a technique for processing at high speed a memory with a relatively slow read speed, such as a DRAM, a memory access device such as that shown in FIG. 24 is known, in which memories are arranged in parallel and read simultaneously. Such devices can be applied to memories arranged in K dimensions, but to simplify the explanation a one-dimensional configuration is described below.
[0003]
In FIG. 24, an address X is applied to the input terminal T₁ and supplied to the arithmetic circuit 1. The arithmetic circuit 1 divides the address X into upper bits and lower bits. That is, the upper address operation unit 1a extracts and outputs the upper bits, which are supplied as the address to the memory elements 2a to 2d arranged in parallel (N = 4 in the figure). These upper bits are the quotient Xq obtained by dividing the external address X by the number of memory elements N = 4; when the data width W = 4, they are the address X with its lower two bits removed.
[0004]
The lower address operation unit 1b extracts and outputs the lower-bit address data and supplies it to the selection circuit 3. A data width W (here, W = 1 to 4) is given to the selection circuit 3 through the input terminal T₂. Based on the lower two bits Xr (the remainder for N = 4) and the data width W, selection data Ei (E₀ to E₃) are supplied to the memory elements 2a to 2d, and the W memory elements starting from the Xr-th one are turned on, so that W pieces of the data D₀ to D₃ are accessed in parallel.
[0005]
The operation of the selection circuit 3 of the above parallel memory access device will be described with reference to FIG. 25. In FIG. 25, W (1 to 4) is the data width, Xr is the remainder left when the external address X is divided by the number of memories N (Xq being the quotient), and E₀ to E₃ are the selection signals from the selection circuit 3.
[0006]
When the data width W = 1, one of the selection signals E₀ to E₃ is selected according to the remainder Xr (0 to 3). When W = 2, two of the selection signals E₀ to E₃ are selected according to Xr and accessed in parallel; here the combination W = 2, Xr = 3 is prohibited. Similarly, when W = 3, three of the selection signals are selected according to Xr and accessed in parallel, and W = 3 with Xr = 2 or Xr = 3 is prohibited. When W = 4, all four selection signals E₀ to E₃ are selected and accessed in parallel, and W = 4 with Xr = 1, Xr = 2, or Xr = 3 is prohibited.
[0007]
That is, in FIG. 25, the same address Xq is given to the four memory elements 2a to 2d simultaneously, so parallel access across a boundary at a multiple of 4 is not possible. This restriction becomes more severe as the data width W increases; for W = 4 in FIG. 25, only Xr = 0 can be accessed.
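The restriction described for FIG. 25 can be illustrated with a small sketch. This is only a model of the behavior stated in the text, not the circuit itself; the function name `conventional_select` and the use of `None` to mark a prohibited combination are assumptions made for illustration.

```python
def conventional_select(xr, w, n=4):
    """Model of the conventional selection circuit 3 (FIG. 24 / FIG. 25).

    Returns the enable flags E_0..E_{n-1}, or None when the combination
    (W, Xr) is prohibited because the access would cross a multiple-of-N
    boundary while every memory element receives the same address Xq.
    """
    if not (1 <= w <= n and 0 <= xr < n):
        raise ValueError("expected 1 <= W <= N and 0 <= Xr < N")
    if xr + w > n:
        return None  # prohibited: would straddle the boundary
    # Turn on the W memory elements starting from the Xr-th one.
    return [xr <= i < xr + w for i in range(n)]


if __name__ == "__main__":
    for w in range(1, 5):
        allowed = [xr for xr in range(4) if conventional_select(xr, w) is not None]
        print(f"W={w}: allowed Xr = {allowed}")
```

Running the loop lists, for each data width, the remainders that remain usable; the set shrinks as W grows, which is the limitation the invention addresses.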
[0008]
[Problems to be solved by the invention]
Generally, when a memory access device is used for image processing or the like, the following items are required.
(A) Since a large amount of data is processed, it is desirable to access as much data as possible in parallel.
(B) Since the pixel, which is the minimum unit of an image, is usually represented by one byte, it is necessary to specify addresses in units of one byte.
(C) In order to handle images of various sizes, it is necessary to perform processing that takes the image boundaries (top, bottom, left, and right edges) into account.
However, in the above conventional memory access device, the more memory elements 2a to 2d are arranged in parallel, that is, the larger the data width W becomes, the stricter the address restriction becomes, so that items (A) and (B) cannot be satisfied at the same time.
[0009]
Further, when boundary determination is performed as required by item (C), the address restriction prevents parallel access; instead, as shown in FIG. 26 and described later, image processing has been performed by making a simple comparison on each one-byte access. Furthermore, the data exchange circuit used in the present invention, described later, has a large circuit scale, so a data exchange circuit with a smaller circuit scale has been desired.
[0010]
The present invention seeks to provide a memory access device that solves the above problems. Its object is to allow an address to be specified in units of one byte when data is extracted in parallelized units, and to allow a large amount of data to be extracted simultaneously in parallel. It further seeks to obtain area determination means that satisfies item (C) above, and to obtain a memory access device in which the circuit scale of the data exchange circuit is reduced.
[0011]
[Means for Solving the Problems]
A memory access device according to the present invention outputs parallel data from a memory unit composed of a plurality of memory elements in which data is stored, and comprises: address generation means for generating an address; divided address control means for dividing the address supplied from the address generation means into first address data, consisting of the upper bits (the quotient) obtained by dividing the address by the number of memory elements, and second address data, consisting of the lower bits (the remainder) of that division; address updating means, controlled by the second address data from the divided address control means, for adding 1, 0, or −1 to the first address data and supplying the resulting third address data to the plurality of memory units; a plurality of registers supplied with the data stored in the plurality of memory units as updated by the third address data, which data is not arranged in address order; a decoder that receives the second address data and outputs switching control data corresponding to the data not arranged in address order; a plurality of weight coefficient registers storing weight coefficients, one of which is selected and output on the basis of the switching control data from the decoder; a plurality of multiplication means for multiplying the data from the plurality of registers, not arranged in address order, by the weight coefficients supplied from the plurality of weight coefficient registers; and summing means for adding the multiplication outputs from the plurality of multiplication means.
[0012]
The memory access device of the present invention further includes area determination means that supplies a plurality of area determination signals, based on a predetermined effective area, to the plurality of memory units to which the third address data is supplied from the address updating means, and thereby operates the memory units.
[0013]
[Action]
According to the memory access device of the present invention described above, a quotient Xq and a remainder Xr are generated for the N memories by dividing the external address X by the number of memories N, and 1, 0, or −1 is added to the quotient Xq according to the remainder Xr. Parallel access is therefore possible even when the external address X is not a multiple of the number N of memories arranged in parallel, unlike the prior art. Further, when image data stored in the memory is processed, N neighboring pixel data centered on an arbitrary pixel can be accessed simultaneously, so a high processing speed is obtained. In addition, the data exchange circuit in the memory unit can be replaced by the multiplication circuit in the processing circuit; since the number of weight coefficient registers of this multiplication circuit is proportional to N, this is effective when the number N of parallel memory units is large.
[0014]
[Embodiment]
Hereinafter, an embodiment of the memory access device of the present invention will be described in detail with reference to the drawings. The memory access device of this example is applicable to N memories arranged in K dimensions forming an N-dimensional address space; for simplicity, however, the description below assumes a one-dimensional address space.
[0015]
Before describing the memory access device of this embodiment in detail in FIG. 1, an overall system of an image processing device to which the memory access circuit of this embodiment can be applied will be described with reference to FIG.
[0016]
In FIG. 2, reference numeral 6 denotes a memory circuit of this example, and it is assumed that image data is stored in a memory element in advance. An external address X is supplied from the address generation circuit 7 to the memory circuit 6, and a processing portion of the memory is designated. The memory circuit 6 supplies the pixel data Di (i = 0 to N−1) near the address X of the processing portion to the processing circuit 8. The processing circuit 8 processes the pixel data Di and outputs the processing result S to the output device 9.
[0017]
One embodiment of the above-described memory circuit 6 is shown in FIG. 1. In FIG. 1, an area determination circuit 18 for performing area determination outputs determination signals Ei (i = 0 to N−1), described later, which are supplied to the plurality of memory elements 2a to 2n. The external address X generated by the address generation circuit 7 is applied to the input terminal T₁ and supplied to the arithmetic circuit 1, which constitutes the divided address control means. The arithmetic circuit 1 is a divider comprising an upper address operation unit 1a and a lower address operation unit 1b, which output the quotient Xq and the remainder Xr obtained by dividing the address X by the number N of memory elements 2a to 2n, that is, the parallel number N. The quotient Xq represents the upper bits of the address X, and the remainder Xr represents its lower bits.
[0018]
The remainder Xr is supplied to the N address conversion circuits 4a to 4n and also to the data conversion circuit 5. The quotient Xq is also supplied to the N address conversion circuits 4a to 4n.
[0019]
The address conversion circuits 4a to 4n are a kind of adder controlled by the remainder Xr. Each adds 1, 0, or −1 to the quotient Xq depending on the value of Xr, so that each circuit applies a different conversion, and outputs an updated address Xqi (i = 0 to N−1), which is supplied as the address to the corresponding memory 2a to 2n.
[0020]
If the determination signal Ei (i = 0 to N−1) supplied from the area determination circuit 18 to each of the memories 2a to 2n is "ON", the memory elements 2a to 2n operate and output data Di′ (i = 0 to N−1) to the data conversion circuit 5.
[0021]
The data conversion circuit 5 is controlled by the remainder Xr of the external address X. It rearranges the input data Di′ from the memory elements 2a to 2n into address order and outputs them as data Di (i = 0 to N−1) to the processing circuit 8 shown in FIG. 2.
[0022]
The operation of the memory circuit shown in FIG. 1 will be described with reference to FIGS. 3 to 9.
[0023]
FIG. 3 is an address map showing where the data of external address X is stored in the four memory elements 2a to 2d (M₀, M₁, M₂, M₃); Xq denotes the quotient of X / 4. For example, as indicated by the broken-line circles in FIG. 3, the data at address X = 4 is at the first address of memory element M₀, and the data at X = 11 is at the second address of memory element M₃.
[0024]
The external address X is divided into a quotient Xq and a remainder Xr as described with reference to FIG. 1. The remainder Xr matches the number of the memory element, and the quotient Xq matches the address within each memory element. That is, when X = 11, Xq = 2 and Xr = 3, and it is clear from FIG. 3 that this means the second address of memory element M₃.
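The address map of FIG. 3 amounts to an integer division with remainder. As a minimal sketch (the function name `locate` is hypothetical, and N = 4 is taken from the example in the text):

```python
def locate(x, n=4):
    """Split external address X into (memory element number Xr, in-element address Xq)."""
    xq, xr = divmod(x, n)  # quotient = upper bits, remainder = lower bits
    return xr, xq


if __name__ == "__main__":
    for x in (4, 11):
        xr, xq = locate(x)
        print(f"X={x}: memory element M{xr}, address {xq}")
```

For X = 4 this yields memory element 0 at address 1, and for X = 11 memory element 3 at address 2, matching the circled entries described for FIG. 3.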
[0025]
Put another way, FIG. 4 shows the relationship between the memory element M₀ to M₃ that is selected when an external address X is given and its address Xq. As shown by the circles in FIG. 4, when X = 4 the first address "1" of memory element M₀ is selected, and when X = 11 the second address "2" of memory element M₃ is selected.
[0026]
A first object of the present invention is to access the data of such N memory elements (here, N = 4) in parallel. When the external address X = 4 is given in FIG. 4, it can be read from the figure that, to select the data of X = 4, 5, 6, 7 that can be accessed simultaneously in parallel, it suffices to give the address Xq = 1 to all of the memory elements M₀, M₁, M₂, M₃.
[0027]
In the example of X = 11 indicated by a circle in FIG. 4, however, the address Xq = 2 is given to memory element M₃, while (Xq + 1) = 3 is given as the address to memory elements M₀, M₁, M₂.
[0028]
In the conventional parallel memory access device shown in FIG. 24, the same address Xq is supplied to memory elements M₀ to M₃, so parallel access that straddles a boundary 10, as with X = 11, was prohibited as shown in FIG. 25.
[0029]
Therefore, in the example of FIG. 4, the address conversion circuits 4a to 4n of this embodiment update the address for memory elements M₀, M₁, M₂ by adding 1 to the address 2, giving (Xq + 1) = 3, which enables the parallel access that was conventionally prohibited.
[0030]
FIG. 5 tabulates, according to the value of the remainder Xr, the value that each address conversion circuit 4a to 4d (T₀, T₁, T₂, T₃) adds to the quotient Xq when an external address X is given. As this map shows, when the external address X = 11, dividing by the number of memories N = 4 gives the quotient 2 and the remainder Xr = 3, which corresponds to memory element M₃. It can therefore be read from the figure that, of the address conversion circuits T₀ to T₃, the circuits T₀, T₁, T₂ must add 1 to the address, converting it to (Xq + 1).
[0031]
FIG. 6 extends the N = 4 case shown in FIG. 5 to N memories in parallel. From this map, each address conversion circuit 4a to 4n (Ti, i = 0 to N−1) need only convert the address of its memory element to (Xq + 1) when the remainder Xr > i.
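The rule read from FIG. 6 can be sketched in a few lines. This is an illustrative software model, not the hardware of the address conversion circuits; the names `updated_addresses` and `direct_addresses` are assumptions. The second function computes the same addresses directly from the interleaved layout, as a cross-check of the rule.

```python
def updated_addresses(x, n=4):
    """Per-memory addresses Xq_i for reading X .. X+N-1 in parallel.

    Models the address conversion circuits T_i: add 1 to Xq when Xr > i.
    """
    xq, xr = divmod(x, n)
    return [xq + 1 if xr > i else xq for i in range(n)]


def direct_addresses(x, n=4):
    """Same result computed directly: address a lives in memory a % n at a // n."""
    addrs = [0] * n
    for a in range(x, x + n):
        addrs[a % n] = a // n
    return addrs


if __name__ == "__main__":
    print(updated_addresses(4))   # aligned access
    print(updated_addresses(11))  # access straddling a multiple-of-4 boundary
```

For X = 11 the model gives address 3 to M₀, M₁, M₂ and address 2 to M₃, as in the FIG. 4 example.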
[0032]
In the embodiment above, the address updating means of the address conversion circuits 4a to 4n was described for the case where, given an external address X, the N data from X onward are accessed in parallel. In an image processing device, however, it is necessary to process the N data before and after the address X, with X at the center. This can be done by supplying as X the value obtained by subtracting (N−1)/2 from the original address X, but it can also be realized by modifying the address conversion circuits 4a to 4n described above, as explained below.
[0033]
FIG. 7 shows the case where the number of memories N = 5 and five data centered on the external address X are accessed in parallel; like FIG. 4, it is a map of the memory elements and their addresses for the external address X. When X = 2 is given, as indicated by a circle in the figure, and memories M₀, M₁, M₂, M₃, M₄ are accessed simultaneously, Xq is supplied to all memory elements M₀ to M₄. Similarly, when X = 9 (circled) and memories M₂, M₃, M₄, M₀, M₁ are accessed simultaneously, Xq is given to memory elements M₂, M₃, M₄ and (Xq + 1) is supplied to memory elements M₀, M₁. On the other hand, when X = 15 (circled) and memories M₃, M₄, M₀, M₁, M₂ are accessed simultaneously, Xq is given to memory elements M₀, M₁, M₂, and (Xq − 1) must be supplied to memory elements M₃, M₄.
[0034]
Summarizing these in the same format as FIG. 5 gives FIG. 8.
[0035]
Further, as shown in FIG. 9, which is similar to FIG. 6, in order to access the N data from (X−n) to (X−n+N−1) in parallel, each address conversion circuit Ti should be designed to convert the address to (Xq − 1) when Xr < (i − n) and to (Xq + 1) when Xr > (i + n).
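The centered access can be modeled as follows. This is a sketch under stated assumptions: `rule_addresses` encodes the thresholds exactly as stated in the text, while `centered_addresses` derives the per-memory addresses directly from the interleaved layout. The two are compared only for the symmetric case N = 2n + 1 (here N = 5, n = 2, as in FIG. 7), where they agree; whether the stated thresholds carry over unchanged to other combinations of N and n is not checked here. Both function names are hypothetical.

```python
def centered_addresses(x, n_mem, n_off):
    """Per-memory addresses for reading X-n .. X-n+N-1, computed directly."""
    start = x - n_off
    addrs = [0] * n_mem
    for a in range(start, start + n_mem):
        addrs[a % n_mem] = a // n_mem
    return addrs


def rule_addresses(x, n_mem, n_off):
    """Same addresses using the thresholds stated for the circuits T_i."""
    xq, xr = divmod(x, n_mem)
    out = []
    for i in range(n_mem):
        if xr < i - n_off:
            out.append(xq - 1)
        elif xr > i + n_off:
            out.append(xq + 1)
        else:
            out.append(xq)
    return out


if __name__ == "__main__":
    for x in (2, 9, 15):
        print(x, centered_addresses(x, 5, 2), rule_addresses(x, 5, 2))
```

For X = 9 both functions give (Xq + 1) to M₀ and M₁, and for X = 15 both give (Xq − 1) to M₃ and M₄, matching the description of FIG. 7.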
[0036]
Next, the operation of the data conversion circuit 5 shown in FIG. 1 will be described. In general image processing, the processing method is often determined by the position relative to the address X; for example, the processing for address X is P₀, the processing for address (X + 1) is P₁, and so on. Therefore, in order to send a plurality of data to the processing circuit in parallel, they must be arranged in address order. Looking at the case of X = 11 in the example of FIG. 4 described above, the data corresponding to X = 11, 12, 13, and 14 are accessed in memory elements M₃, M₀, M₁, M₂, respectively. Arranged in the order of the memory elements, they appear as 12, 13, 14, 11, as in the memory element output Di′ of FIG. 10, and must be rearranged to 11, 12, 13, 14, as in the data conversion circuit output Di of FIG. 10.
[0037]
FIG. 11 shows how the arrangement of the output data Di 'of the memory elements 2a to 2d changes depending on the address remainder Xr when N = 4. It can be seen that the example of FIG. 10 corresponds to the case of Xr = 3 in FIG.
[0038]
In the case of FIG. 11, if the data conversion circuit is designed so that the data of the memory elements is rotated by Xr positions toward the smaller address, the data can be aligned as in the data conversion circuit output Di.
[0039]
Further, FIG. 12 shows an example in which N data from (X−n) to (X−n+N−1) are accessed in the general N-parallel case. In this case, the data exchange circuit 5 need only be designed to rotate the data of the memory elements toward the smaller address by (Xr−n) positions (in the opposite direction when (Xr−n) < 0).
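The rotation performed by the data exchange circuit 5 can be illustrated as follows. This is a minimal sketch, not the circuit itself; the function name `align_to_address_order` is hypothetical, and the memory outputs are assumed to be listed in the order M₀, M₁, ..., M(N−1).

```python
def align_to_address_order(di_prime: list, xr: int, n_back: int = 0) -> list:
    """Rotate the memory outputs Di' toward the smaller index by (xr - n_back).

    A negative (xr - n_back) rotates the other way; the modulo handles both.
    """
    n_mem = len(di_prime)
    shift = (xr - n_back) % n_mem
    return di_prime[shift:] + di_prime[:shift]


# N = 4, X = 11 (Xr = 3): the memories M0..M3 deliver 12, 13, 14, 11
print(align_to_address_order([12, 13, 14, 11], xr=3))               # [11, 12, 13, 14]
# N = 5, n = 2, X = 15 (Xr = 0): the memories M0..M4 deliver 15, 16, 17, 13, 14
print(align_to_address_order([15, 16, 17, 13, 14], xr=0, n_back=2))  # [13, 14, 15, 16, 17]
```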
[0040]
In the configuration shown in FIG. 1 described above, the case where the data exchange circuit 5 is provided in the memory circuit 6 has been described. However, by giving this function to the processing circuit 8 shown in FIG. 2, the data exchange circuit in the memory circuit 6 can also be omitted.
[0041]
As an example of such a processing circuit 8, a product-sum operation circuit frequently used in image processing will be described with reference to FIG.
[0042]
In FIG. 13, data Di '(i = 0 to 3) not arranged in the order of addresses from the memories 2a to 2n in FIG. 1 is directly supplied to the register 11 without passing through the data conversion circuit 5. Output data Di 'from the memory not arranged in the order of the addresses held in the register 11 is supplied to a plurality of multiplier circuits 12a, 12b, 12c, and 12d.
[0043]
On the other hand, in order to multiply the data Di′ at the addresses X, X + 1, X + 2 and X + 3 coming from the memories 2a to 2n by the weighting factors W₀, W₁, W₂, W₃ and to calculate their sum, the weighting coefficients Wj (j = 0 to 3) are supplied to these multiplication circuits 12a to 12d from the weighting coefficient register 13.
[0044]
The weight coefficient register 13 has four registers U10, U11, U12, and U13, in which the weighting factors are stored in the orders (W₀, W₁, W₂, W₃), (W₃, W₀, W₁, W₂), (W₂, W₃, W₀, W₁), and (W₁, W₂, W₃, W₀). A switching control signal is provided to the register 13 to select one of the four registers.
[0045]
Since the remainder Xr from the arithmetic circuit 1 shown in FIG. 1 is supplied to the decoder 14, the weight coefficients are controlled by the remainder Xr. The decoder 14 decodes the remainder so that the register U10 is selected when Xr = 0, the register U11 when Xr = 1, the register U12 when Xr = 2, and the register U13 when Xr = 3, and the selected coefficients are sent to the multiplier circuits 12a to 12d. The weight coefficients Wj are arranged according to the decoding table of the decoder 14.
[0046]
The data Di′, each multiplied by its weighting coefficient W₀ to W₃ in the multiplication circuits 12a to 12d, are supplied to the summation circuit 15 and added, and the final result data S is supplied to the output device 9.
[0047]
By the way, comparing the decoding table shown in FIG. 14 with the data arrangement for N = 4, it can be seen that X and W₀, X + 1 and W₁, X + 2 and W₂, and X + 3 and W₃ correspond exactly. That is, the same effect as that of the data exchange circuit 5 is obtained, and it can be understood that the data exchange circuit 5 can be omitted.
Here, considering the circuit scales of the data exchange circuit 5 and U10 to U13 of the weight coefficient register 13, the circuit scale of the data exchange circuit 5 is proportional to the square of the number N of parallels. The number of the registers 13 is proportional to N. Therefore, the configuration of FIG. 13 is effective when the number of parallels N is large.
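The equivalence between rotating the data and selecting a pre-rotated set of weights can be checked with a short sketch. The names `weight_table` and `product_sum` are hypothetical; row Xr of the table plays the role of the register selected by the decoder 14, and the numeric weights are arbitrary illustration values.

```python
def weight_table(weights: list) -> list[list]:
    """Row xr holds the weights in memory-element order for remainder xr,
    i.e. row[i] = weights[(i - xr) % N]."""
    n = len(weights)
    return [[weights[(i - xr) % n] for i in range(n)] for xr in range(n)]


def product_sum(di_prime: list, xr: int, table: list[list]) -> float:
    """Multiply the unaligned memory outputs Di' by the row selected by xr and add."""
    return sum(d * w for d, w in zip(di_prime, table[xr]))


table = weight_table(["W0", "W1", "W2", "W3"])
print(table[1])  # ['W3', 'W0', 'W1', 'W2']
print(table[3])  # ['W1', 'W2', 'W3', 'W0']

# X = 11 (Xr = 3): the memories deliver 12, 13, 14, 11; the aligned data would be 11..14
numeric = weight_table([1, 10, 100, 1000])
unaligned = product_sum([12, 13, 14, 11], 3, numeric)
aligned = sum(d * w for d, w in zip([11, 12, 13, 14], [1, 10, 100, 1000]))
print(unaligned == aligned)  # True
```

The table holds N rows of N coefficients, and only one row is selected per access, so no N-input data selector is needed for each output.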
[0048]
The configuration shown in FIG. 1 described above is a memory access device in which memories are arranged one-dimensionally, but can be extended to a plurality of dimensions. FIG. 15 shows a configuration in which this is extended to two dimensions.
[0049]
In the figure, 2a to 2p are memory units, and 1 and 1′ are division address control circuits for columns and rows. The upper address calculation units 1a and 1a′ and the lower address calculation units 1b and 1b′ in the divided address control circuits 1 and 1′ can also be defined in another way. Assuming that the original external address to be accessed is X and that the number of memory units 2a to 2p to be arranged is N, the quotient Xq of the division X / N corresponds to the upper address calculation unit 1a, and the remainder Xr corresponds to the lower address calculation unit 1b. Therefore, X can be expressed as X = N·Xq + Xr, and the divided address control circuits 1 and 1′ can be constituted by the operation circuit of a divider.
[0050]
The quotient Xq and remainder Xr from the arithmetic units 1, 1′ are supplied to address update circuits 4a, 4b, 4c, 4d, 4a′, 4b′, 4c′, 4d′ constituted by a PLD (programmable logic device) or the like.
[0051]
The column and row address update circuits 4a to 4d and 4a′ to 4d′ control the value of the quotient Xq (the upper bits of the address), obtained by dividing the original external address X by the number of memory units, according to the remainder Xr (the lower bits of the address), so that it corresponds to the offset.
[0052]
That is, for the first and second columns (rows) of the memory units 2a to 2p, a 0/+1 selection of whether to pass the quotient Xq as it is or to add 1 to the quotient Xq is made according to the remainder value Xr. For the third column (row), the quotient Xq is passed as it is, that is, a 0/0 selection is made according to the remainder value Xr. Similarly, for the fourth column (row), a −1/0 selection of whether to subtract 1 from the quotient Xq or to pass it as it is is made according to the remainder value Xr, and the predetermined data is read out and supplied to the processing circuit 8.
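The per-column and per-row selection can be illustrated by applying the one-dimensional address update independently to each axis. This is a sketch under assumptions that are not stated in the text: the 0/+1, 0/+1, 0/0, −1/0 pattern is taken to correspond to N = 4 with the window starting one position before the given address (n = 1), and the names `axis_addresses` and `unit_addresses_2d` are hypothetical.

```python
def axis_addresses(x: int, n_mem: int = 4, n_back: int = 1) -> list[int]:
    """Updated quotient for each of the n_mem columns (or rows) on one axis.

    Assumed window: x - n_back .. x - n_back + n_mem - 1.
    """
    xq, xr = divmod(x, n_mem)
    start = xr - n_back
    return [xq + (start + (i - start) % n_mem) // n_mem for i in range(n_mem)]


def unit_addresses_2d(x: int, y: int, n_mem: int = 4, n_back: int = 1):
    """(row address, column address) supplied to each memory unit of the grid."""
    cols = axis_addresses(x, n_mem, n_back)
    rows = axis_addresses(y, n_mem, n_back)
    return [[(rows[r], cols[c]) for c in range(n_mem)] for r in range(n_mem)]


# Offsets from Xq for each remainder Xr = 0..3
for xr in range(4):
    print(xr, [a - 5 for a in axis_addresses(4 * 5 + xr)])
# 0 [0, 0, 0, -1]
# 1 [0, 0, 0, 0]
# 2 [1, 0, 0, 0]
# 3 [1, 1, 0, 0]
```

Under these assumptions the first two columns only ever take 0 or +1, the third always 0, and the fourth only −1 or 0, which matches the selection described above.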
[0053]
In the configuration of FIG. 15 described above, the two-dimensional memory units 2a to 2p have been described. When a three-dimensional, cubic arrangement of memory units is considered, a division address control circuit like the circuit 1 and address updating circuits 4a″, 4b″, 4c″, and 4d″ may be provided for the Z-axis direction in the same manner as for the X-axis and Y-axis directions.
[0054]
Next, the area determination circuit mentioned with reference to FIG. 1 will be described. In image processing using the memory access circuit of this example, the image data 16 is in many cases placed in an effective section B [Xmin, Xmax] within an address space A that contains B, and the data 17 in the hatched portion outside the valid section B, separated from the data 16 by the boundaries at the addresses Xmin and Xmax, is not guaranteed. For this reason, when the data at a certain address X is processed using a plurality of data in the vicinity of X, it is necessary to take one of the following methods.
(A) The range of the image to be processed is limited slightly inside the boundary of the effective section B so that the number of neighboring data to be processed is not changed and all the neighboring data is included in the effective section B.
(B) The entire range of the effective section B is processed without changing the range of the image to be processed, but the number of neighboring data near the boundary is reduced so that the processed data does not protrude from the effective section B.
(C) Of the neighboring data, those that do not belong to the valid section B are replaced with a predetermined background value and then subjected to normal processing.
[0055]
However, the method (A) has the problem that the image after processing becomes smaller. The method (B) complicates exception processing and is disadvantageous in terms of speed and circuit scale.
[0056]
On the other hand, if the method (C) is adopted, the same processing can be performed on the entire image by using a background value suited to the application, such as designating white as the background color when printing an image on a blank sheet without borders.
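Method (C) can be illustrated with a short sketch: every neighbouring address outside the effective section is replaced by a background value, so the subsequent processing needs no special case at the boundary. The function name, the list-based image, and the background value 255 are assumptions made for the illustration only.

```python
def neighbourhood_with_background(image, x, n_back, n_mem, x_min, x_max, background):
    """Return the n_mem values at x - n_back .. x - n_back + n_mem - 1,
    substituting `background` for addresses outside [x_min, x_max]."""
    out = []
    for a in range(x - n_back, x - n_back + n_mem):
        out.append(image[a] if x_min <= a <= x_max else background)
    return out


image = list(range(100, 110))   # addresses 0..9 hold 100..109
# effective section B = [2, 7]; a window of 4 data starting one before X = 7
print(neighbourhood_with_background(image, 7, 1, 4, 2, 7, 255))  # [106, 107, 255, 255]
```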
[0057]
Further, in (A) and (B) above, address designation outside the valid section B is prohibited, whereas in (C) it is allowed, so that more flexible address designation is possible, which is effective for geometric transformation.
[0058]
FIG. 26 shows a conventional configuration of such an area determination circuit 18. In the figure, reference numerals 19 and 20 denote registers holding the maximum value Xmax and the minimum value Xmin of the effective area B, respectively. The maximum value Xmax and the minimum value Xmin are supplied to the comparison circuits 21 and 22, and the specified address X is also supplied to these comparison circuits 21 and 22 so that the values are compared. Such a comparison circuit is typified by, for example, the TTL 74LS682.
[0059]
The comparison circuit 21 outputs the comparison output Eh (X > Xmax), indicating that the address X has exceeded the upper limit value Xmax of the boundary of the valid section B, and the comparison circuit 22 outputs the comparison output El (X < Xmin), indicating that the address X has fallen below the lower limit value Xmin of the boundary of the valid section B. These two outputs Eh and El are ORed by the OR gate circuit 23 to output the area determination signal Ei, which is supplied to each memory via the output terminal T₂.
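In software terms, the conventional determination amounts to two comparisons and an OR for a single address. The following sketch uses a hypothetical function name and treats a true result as "outside the effective section".

```python
def conventional_area_signal(x: int, x_min: int, x_max: int) -> bool:
    """One determination signal for one address X."""
    eh = x > x_max   # output of the comparison circuit 21
    el = x < x_min   # output of the comparison circuit 22
    return eh or el  # OR gate circuit 23


print(conventional_area_signal(5, 2, 7))  # False: inside the section
print(conventional_area_signal(8, 2, 7))  # True: above Xmax
```

Applying this to N addresses would take N comparator pairs or N sequential comparisons.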
[0060]
As is clear from the figure, in the related art, one determination signal Ei is obtained for one given address X. However, in this example, it is necessary to generate N determination signals to access N data from one address X. The configuration of such an area determination circuit 18 will be described with reference to FIG.
[0061]
The area determination circuit 18 used in the present invention generates N area determination signals Ei (i = 0 to N−1) for an arbitrary given address X, based on a predetermined upper limit value Xmax and lower limit value Xmin of the effective area B, and supplies them to the memory elements 2a to 2n.
[0062]
In FIG. 17, registers 24 and 25 hold an upper limit reference value XH and a lower limit reference value XL determined by the maximum value Xmax and the minimum value Xmin of the effective area B, and supply these reference values XH and XL to the subtraction circuits 26 and 27.
[0063]
The designated address X is given to the subtraction circuits 26 and 27. The subtraction circuit 26 outputs a difference output Xh obtained by subtracting X from the reference value XH, and this difference output Xh is supplied to the division circuit 29. The division circuit 29 divides Xh by the parallel number N of the memories to obtain the quotient Xhq and the remainder Xhr, and supplies them to the encoder 31. The encoder 31 generates N upper limit determination signals Ehj (j = 0 to N−1) from the quotient Xhq and the remainder Xhr.
[0064]
Similarly, the subtraction circuit 27 outputs a difference output Xl obtained by subtracting X from the reference value XL, and the difference output Xl is supplied to the division circuit 30. The division circuit 30 divides Xl by the parallel number N of the memories to obtain the quotient Xlq and the remainder Xlr, and supplies them to the encoder 32. The encoder 32 generates N lower limit determination signals Elj (j = 0 to N−1) from the quotient Xlq and the remainder Xlr.
[0065]
The upper limit determination signals Ehj and the lower limit determination signals Elj obtained from the encoders 31 and 32 are ORed for each j by the OR gate circuit 33 and sent to the switching circuit 34 as the determination signals Ej′ (j = 0 to N−1).
[0066]
On the other hand, the division circuit 28 divides the address X by N, outputs the remainder Xr, and supplies it to the switching circuit 34. Under control of the remainder Xr, the switching circuit 34 rearranges the order of the decision signals Ej′, outputs the decision signals Ei (i = 0 to N−1), and supplies them to the memories 2a to 2n.
[0067]
Hereinafter, the operation principle of the encoder 31 will be described with reference to the drawings. For the sake of simplicity, consider the addresses X to X + 3 when the address X is given and the parallel number is 4 (corresponding to the case shown in FIGS. 3 to 5).
[0068]
FIG. 18 is a diagram illustrating a positional relationship between data of X to X + 3 and an upper limit Xmax of an image area when an address X is given. In the figure, * indicates significant image data, and ≫ indicates that the data exists at the position of Xmax.
The relationship between X and Xmax can be classified into the following three cases.
[0069]
(D) When Xmax is greater than any of X to X + 3
All of X to X + 3 are within the region. (FIG. 18 (1))
(E) When Xmax matches any of X to X + 3
Part of X to X + 3 is outside the region. (FIG. 18 (2) to (5))
(F) When Xmax is smaller than any of X to X + 3
All of X to X + 3 are out of the area. (FIG. 18 (6))
[0070]
Here, when (Xmax-X) is represented by Xh, it can be paraphrased as follows.
(D ') When Xh> 3, all of X to X + 3 are within the region.
(E ′) When 0 ≦ Xh ≦ 3, a part of X to X + 3 is out of the area.
(F ') When Xh <0, all of X to X + 3 are out of the area.
[0071]
Further, assuming that a quotient obtained by dividing Xh by the parallel number N = 4 is Xhq and a remainder is Xhr, the following is obtained.
(D ″) When Xhq> 0, everything is within the region.
(E ″) When Xhq = 0, a part is out of the area.
(F ″) When Xhq <0, all are out of the area.
[0072]
As is clear from the results of the above (d ″), (e ″), and (f ″) and FIG. 18, the encoder 31 in FIG. 17 operates as follows.
[0073]
When Xhq> 0, Ehj (j = 0 to N-1) is all off,
When Xhq <0, Ehj is all on,
When Xhq = 0, of Ehj, those of j ≦ Xhr are off, and the others are on.
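The three cases for the encoder 31 can be sketched as below. The function name is hypothetical, a true value stands for "on" (outside the area), and the division is assumed to round toward negative infinity so that Xhq < 0 exactly when Xh < 0; Python's `divmod` behaves this way.

```python
def upper_limit_signals(x: int, xh_ref: int, n_mem: int) -> list[bool]:
    """Upper limit determination signals Ehj for the addresses x + j, j = 0..n_mem-1."""
    xhq, xhr = divmod(xh_ref - x, n_mem)   # floor division (assumption about the divider)
    if xhq > 0:
        return [False] * n_mem             # every address is at or below the limit
    if xhq < 0:
        return [True] * n_mem              # every address is beyond the limit
    return [j > xhr for j in range(n_mem)]  # off for j <= Xhr, on for the rest


# N = 4, XH = Xmax = 10
print(upper_limit_signals(5, 10, 4))    # [False, False, False, False]
print(upper_limit_signals(8, 10, 4))    # [False, False, False, True]
print(upper_limit_signals(11, 10, 4))   # [True, True, True, True]
```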
[0074]
Further, the same applies to the case of general N parallel operation, and the encoder 31 operates as shown in FIG.
Next, the operation principle of the encoder 32 will be described with reference to the drawings. FIG. 20 is a diagram illustrating the positional relationship between the data X to X + 3 and the lower limit Xmin of an image area when an address X is given. In the figure, * indicates significant image data, and Δ indicates that the data exists at the position of Xmin. As in the case of FIG. 18, the relationship between X and Xmin can be classified into the following three cases.
[0075]
(G) When Xmin is smaller than any of X to X + 3
All of X to X + 3 are within the region. (FIG. 20 (1))
(H) When Xmin matches any of X to X + 3
Part of X to X + 3 is outside the region. (FIGS. 20 (2) to (5))
(I) When Xmin is greater than any of X to X + 3
All of X to X + 3 are out of the area. (FIG. 20 (6))
[0076]
Here, when (Xmin-X) is represented by Xl, the following can be rephrased as follows.
(G ') When Xl <0, all of X to X + 3 are within the region.
(H ′) When 0 ≦ Xl ≦ 3, a part of X to X + 3 is out of the area.
(I ') When Xl> 3, all of X to X + 3 are out of the area.
[0077]
Further, assuming that the quotient obtained by dividing Xl by the parallel number N = 4 is Xlq and the remainder is Xlr, the following is obtained.
(G ″) When Xlq <0, everything is within the region.
(H ″) When Xlq = 0, a part is out of the area.
(I ″) When Xlq> 0, all are outside the region.
[0078]
As is clear from the results of the above g ″, h ″, and i ″ and FIG. 20, the encoder 32 of FIG. 17 operates as follows.
When Xlq <0, Elj (j = 0 to N-1) is all off;
When Xlq > 0, Elj (j = 0 to N−1) is all on;
When Xlq = 0, among Elj, those with j ≧ Xlr are off, and the others are on.
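The encoder 32 mirrors the encoder 31: all signals are off when Xlq < 0, all on when Xlq > 0, and split at Xlr when Xlq = 0. The sketch below makes the same assumptions as before (hypothetical function name, true means "on", floor division).

```python
def lower_limit_signals(x: int, xl_ref: int, n_mem: int) -> list[bool]:
    """Lower limit determination signals Elj for the addresses x + j, j = 0..n_mem-1."""
    xlq, xlr = divmod(xl_ref - x, n_mem)   # floor division (assumption about the divider)
    if xlq < 0:
        return [False] * n_mem             # every address is at or above the limit
    if xlq > 0:
        return [True] * n_mem              # every address is below the limit
    return [j < xlr for j in range(n_mem)]  # off for j >= Xlr, on for the rest


# N = 4, XL = Xmin = 5
print(lower_limit_signals(3, 5, 4))   # [True, True, False, False]
print(lower_limit_signals(6, 5, 4))   # [False, False, False, False]
print(lower_limit_signals(1, 5, 4))   # [True, True, True, True]
```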
[0079]
Further, the same applies to the case of general N parallel operation, and the encoder 32 operates as shown in FIG.
[0080]
As described above, the determination signal Ehj for the upper limit Xmax and the determination signal Elj for the lower limit Xmin are obtained. Therefore, if the logical OR Ej′ of Ehj and Elj is calculated for each j by the OR gate circuit 33 in FIG. 17, it becomes a determination signal for the effective area B [Xmin, Xmax].
[0081]
By the way, the obtained decision signals Ej′ correspond to the data of X to X + N−1, but since X designates an arbitrary address, their order differs from the actual arrangement of the memory elements 2a to 2n. However, since the remainder Xr of X corresponds to the arrangement of the memory elements, the area determination signals Ei corresponding to the arrangement of the memory elements Mi can be obtained by rearranging the signals with the switching circuit 34.
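Putting the pieces together, the whole area determination path (subtraction, division, encoding, OR, and rearrangement) can be sketched in one function. The function name is hypothetical, the window is assumed to be X to X + N−1 so that XH = Xmax and XL = Xmin, and the rearrangement is assumed to map Ej′ to the memory element (Xr + j) mod N that holds the address X + j.

```python
def area_signals(x: int, x_min: int, x_max: int, n_mem: int) -> list[bool]:
    """Area determination signals Ei in memory-element order (True = outside B)."""
    xhq, xhr = divmod(x_max - x, n_mem)    # subtraction circuit 26, division circuit 29
    xlq, xlr = divmod(x_min - x, n_mem)    # subtraction circuit 27, division circuit 30
    xr = x % n_mem                         # division circuit 28
    ej = []
    for j in range(n_mem):
        eh = xhq < 0 or (xhq == 0 and j > xhr)   # encoder 31
        el = xlq > 0 or (xlq == 0 and j < xlr)   # encoder 32
        ej.append(eh or el)                      # OR gate circuit 33
    # switching circuit 34: the address X + j is held by the element (xr + j) % n_mem
    return [ej[(i - xr) % n_mem] for i in range(n_mem)]


# N = 4, B = [5, 12], X = 11: the addresses 11, 12, 13, 14 sit in M3, M0, M1, M2
print(area_signals(11, 5, 12, 4))   # [False, True, True, False]
```

Only two divisions by N and two small encoders are needed regardless of how many of the N addresses cross a boundary.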
[0082]
In the above description, it is assumed that the data of (X) to (X + N−1) near the address X is accessed. In general, however, the case where the data of (X−n) to (X−n + N−1) near the address X is accessed can be handled by setting XH = (Xmax − n) and XL = (Xmin − n) in the registers 24 and 25.
[0083]
In the configuration shown in FIG. 15, data lines are provided independently for the memories 2a to 2p. However, a common bus may instead be provided in the row direction, with the output enable of each memory controlled in a time-division manner so that the data is multiplexed on the bus.
[0084]
Further, as shown in FIG. 23, in order to perform the above-described multiplexing with DRAMs, the column address may be multiplexed in common by providing multiplexers 35a, 35b, and so on, and the row address may be multiplexed by multiplexers 36a, 36b, 36c, 36d, and so on provided for each memory.
[0085]
[Effects of the Invention]
By using the memory access device of the present invention for image processing or the like, it is possible to simultaneously access N pieces of neighboring data centered on an arbitrary pixel and to greatly improve the processing speed. That is, a memory access device that can access parallel data at high speed in byte units is obtained. The function of the data exchange circuit in the memory unit can also be served by the multiplication circuit in the processing circuit. Since the number of weight coefficient registers in this multiplication circuit is proportional to N, this is particularly effective when the number N of parallel memory units is large.
[Brief description of the drawings]
FIG. 1 is a system diagram showing one embodiment of a memory access device of the present invention.
FIG. 2 is an overall system diagram of an image processing apparatus using the memory access device of the present invention.
FIG. 3 is a memory map diagram used in the memory access device of the present invention.
FIG. 4 is a map diagram showing memory elements and their addresses for external addresses used in the memory access device of the present invention.
FIG. 5 is an output map diagram of an address conversion circuit that adds a quotient according to a remainder value used in the memory access device of the present invention.
FIG. 6 is a map diagram in which an address conversion circuit for adding to a quotient according to a remainder value used in the memory access device of the present invention is extended in N parallel.
FIG. 7 is a map diagram showing memory elements and their addresses for external addresses used in the memory access device of the present invention.
FIG. 8 is a map diagram of an address conversion circuit that adds and subtracts a quotient according to a remainder value used in the memory access device of the present invention.
FIG. 9 is a map diagram in which the output of an address conversion circuit that adds to a quotient according to the value of the remainder used in the memory access device of the present invention is extended in N parallel.
FIG. 10 is an explanatory diagram of the operation of a data conversion circuit used in the memory access device of the present invention.
FIG. 11 is a diagram showing a relationship between a remainder used in the memory access device of the present invention and data of a memory element.
FIG. 12 is a diagram showing a relationship between a remainder and data of a memory element used in the memory access device of the present invention, which is extended to N parallel.
FIG. 13 is a system diagram showing a data conversion method in a processing circuit used in the memory access device of the present invention.
FIG. 14 is a diagram showing a decoding table of a decoder used in the memory access device of the present invention.
FIG. 15 is a system diagram of a two-dimensional configuration of the memory access device of the present invention.
FIG. 16 is an explanatory diagram of reading a boundary portion at the time of access of the memory access device of the present invention.
FIG. 17 is a system diagram of an area determination circuit used in the memory access device of the present invention.
FIG. 18 is a diagram showing an upper limit of an image area of data when an address is given by an area determination circuit used in the memory access device of the present invention.
FIG. 19 is an explanatory diagram in a case where FIG. 18 of the area determination circuit used in the memory access device of the present invention is extended in N parallel.
FIG. 20 is a diagram showing a lower limit of an image area of data when an address is given by an area determination circuit used in the memory access device of the present invention.
FIG. 21 is an explanatory diagram of the area determination circuit used in the memory access device of the present invention when FIG. 20 is extended in N parallel.
FIG. 22 is an explanatory diagram in a case where register values of registers used in the memory access device of the present invention are generalized.
FIG. 23 is a system diagram showing another configuration of the memory access device of the present invention.
FIG. 24 is a system diagram of a conventional memory access device.
FIG. 25 is a diagram showing the relationship between selected data, data width, and remainder at the time of conventional parallel access.
FIG. 26 is a system diagram of an area determination circuit used in a conventional memory access device.
[Explanation of symbols]
1 arithmetic circuit (divided address control means)
2 (2a-2n) memory
4 (4a to 4n) address conversion circuits
5 Data exchange circuit
6 Memory circuit
7 Address generation circuit
8 Processing circuit

Claims

A memory access device that outputs parallel data from a memory unit including a plurality of memory elements storing data,
Address generating means for generating an address;
Divided address control means for dividing the address supplied from the address generating means into first address data consisting of upper bits, obtained by dividing the address by the number of memory elements, and second address data consisting of lower bits, obtained by the same division;
Address updating means, controlled by the second address data from the divided address control means, for adding 1, 0, or −1 to the first address data and supplying the resulting updated third address data to the plurality of memory units;
A plurality of registers to which the data stored in the plurality of memory units which are updated with the third address data and which are not arranged in the order of the addresses are supplied;
A decoder to which the second address data is inputted and which outputs switching control data corresponding to the data not arranged in the order of the addresses ;
A plurality of weight coefficient registers storing a weight coefficient for selecting and outputting one of a plurality of weight coefficients based on the switching control data of the decoder;
Data stored in the plurality of memory units that are not arranged in the order of the addresses from the plurality of registers, and a plurality of multiplication means for multiplying a weight coefficient supplied from the plurality of weight coefficient registers;
Summing means for adding the multiplied outputs from the plurality of multiplying means,
the memory access device being characterized by comprising the foregoing.

The memory access device according to claim 1, further comprising area determination means for supplying, based on a predetermined effective area, a plurality of area determination signals to the plurality of memory units to which the third address data is supplied from the address updating means, to operate the plurality of memory units.

The memory access device according to claim 2, wherein:
For the plurality of memory units consisting of N memory elements Mi (i = 0 to N−1) storing data, the divided address control means divides the address X supplied from the address generating means by the number of memory elements N to generate the first address data as a quotient Xq and the second address data as a remainder Xr,
The address updating means is controlled by the second address data, which is the remainder Xr, and adds 1, zero, or −1 to the first address data, which is the quotient Xq, as appropriate, to generate the third address data as N addresses Xqi obtained by updating the first address data,
The area determination means generates N area determination signals Ei (i = 0 to N−1) for an arbitrarily given address X based on a predetermined upper limit value Xmax and lower limit value Xmin of the effective area B,
The third address data serving as the N addresses Xqi and the N area determination signals Ei are supplied to the plurality of memory units, and the plurality of memory units generate data Di′ not arranged in the order of the addresses X,
The plurality of registers temporarily hold the data Di′ not arranged in the order of the addresses X from the plurality of memory units,
The decoder is supplied with the second address data serving as the remainder Xr, and generates switching control data according to the value of the second address data serving as the remainder Xr,
The plurality of weighting factor registers have N weighting factors Wj,
The plurality of multiplying means multiply the data Di′ supplied from the plurality of registers by the weight coefficients Wj from the plurality of weight coefficient registers, and
The summation means calculates the sum of the outputs from the plurality of multiplying means.

The memory access device according to claim 3, wherein the processing means rotates the data Di′ from the plurality of memory units by the remainder Xr toward the smaller value of the memory address X to generate the data Di by data exchange.

The memory access device according to claim 2, wherein a plurality of data near an arbitrary address in a K-dimensional address space are accessed in parallel using K sets of the divided address control unit, the address updating unit, the area determining unit, the processing unit, and the memory unit.