JP2007200090A

JP2007200090A - Semiconductor processor

Info

Publication number: JP2007200090A
Application number: JP2006018762A
Authority: JP
Inventors: Hideyuki Noda; 英行野田
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 2006-01-27
Filing date: 2006-01-27
Publication date: 2007-08-09
Anticipated expiration: 2026-01-27
Also published as: JP4989899B2

Abstract

PROBLEM TO BE SOLVED: To provide a semiconductor processor with a memory cell mat divided into a plurality of entries having processors respectively, in which data are transferred between the entries efficiently without increasing a wiring layout area.

SOLUTION: Transfer wirings (300) having data output parts (305) corresponding to respective entries (ERY) are provided. The transfer wirings (300) are provided with data transmission parts (XP1, XP2, XN1, and XN2) for transferring data to the entry spaced apart by predetermined entries between a farthest transfer part and the transfer data output part (305). The transfer wirings (300) are aligned in a first direction, and disposed so as to be shifted by one entry in a second direction, so that the wiring layout area is reduced. The data can be transferred between arbitrary entries during computation.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a semiconductor arithmetic processing apparatus, and more particularly to a semiconductor arithmetic processing apparatus having a semiconductor memory divided into a plurality of entries and an arithmetic unit provided for each entry. More specifically, the present invention relates to a configuration for transferring data between arithmetic units at high speed, with a small occupied area, within the semiconductor arithmetic processing apparatus.

With the recent spread of portable terminal devices, digital signal processing, which processes large amounts of data such as audio and images at high speed, has become increasingly important. A DSP (digital signal processor) is commonly used as a dedicated semiconductor device for such processing. Data processing such as filtering often consists of repeated product-sum operations. A DSP therefore generally includes a multiplication circuit, an addition circuit, and an accumulation register. With such a dedicated DSP, a product-sum operation can be executed in one machine cycle, enabling high-speed arithmetic processing.

SIMD (Single Instruction Stream, Multiple Data Stream) processors are also often used in applications such as image processing, in which the same operation is performed on a large amount of data. A SIMD processor has a plurality of processor elements, which execute the same arithmetic operation on different data. Because the processor elements operate in parallel, a large amount of data can be processed at high speed.

For such SIMD processors, Patent Document 1 (Japanese Patent Laid-Open No. 2002-207707) shows a configuration intended to store different data in each processor element at high speed. In Patent Document 1, a plurality of data buses are routed from a global processor to the processor elements. Each processor element generates, according to its processor element number, a signal that selects one of these data buses, selects the data on the corresponding data bus, and stores it in an A register within the processor element. Patent Document 1 aims to speed up the threshold comparison in dither processing for image processing by storing the threshold value corresponding to each pixel column in the processor element for that column.

Patent Document 2 (Japanese Patent Laid-Open No. 2001-84229) discloses a SIMD processor intended to execute data processing flexibly for different data bit widths. In the configuration of Patent Document 2, the processor elements are addressable, and data is transferred between an internal register of the addressed processor element and a data transfer bus. Each processor element has a register of a register file and an arithmetic unit of an arithmetic unit array, and changes in the bit width of the processed data are accommodated by adjusting the number of registers in the register file according to that bit width.

Patent Document 3 (Japanese Patent Laid-Open No. 2002-207706) discloses a SIMD processor intended to perform total-sum calculation and peak detection with a small circuit scale. In the configuration of Patent Document 3, a multiplexer is provided for each arithmetic unit of the processor elements, and the multiplexer selects either the corresponding register of the register file or one of the seven columns adjacent to that register. Operation data and operation result data are transferred between processor elements through this multiplexer.

Patent Document 3 also provides arithmetic units and a common data bus. The common data bus is divided into units of a predetermined number, and the processor elements within each resulting group transfer data over their divided common data bus, with the aim of executing operations such as total-sum and peak-value calculation at high speed.

Patent Document 4 (Japanese Patent Laid-Open No. 2001-202351) discloses a configuration intended to execute image data processing at high speed. In the configuration of Patent Document 4, the processor elements are addressable; the global processor addresses a processor element with an address signal and stores data in an operation register within that processor element. Data transfer time is shortened by transferring data only to the processor elements that need it.

Patent Document 5 (Japanese Patent Laid-Open No. 2003-186854) discloses a SIMD processor intended to allow internal data of the processor elements to be monitored from the outside. In the configuration of Patent Document 5, each processor element is provided with a mirror register that stores a copy of the contents of the operation register, and the mirror register can be accessed from the outside by addressing the processor element.

Patent Document 1: JP 2002-207707 A
Patent Document 2: JP 2001-84229 A
Patent Document 3: JP 2002-207706 A
Patent Document 4: JP 2001-202351 A
Patent Document 5: JP 2003-186854 A

When the amount of data to be processed is very large, it is difficult to improve performance dramatically even with a dedicated DSP. For example, if there are 10,000 sets of data to be operated on, at least 10,000 machine cycles are required even if the operation on each set can be executed in one machine cycle. In addition, because the processing performance of a DSP depends heavily on its operating frequency, power consumption increases when high-speed processing is prioritized.

In the SIMD processor of Patent Document 1, a plurality of data buses extending from the global processor are provided, and each processor element selects a data bus according to its processor element number and stores the data on the selected data bus in a register. Threshold data corresponding to the pixel positions of a pixel array are thereby stored in the corresponding processor elements, and filter processing is performed on a pixel matrix of a predetermined size. Data from the global processor are transferred to and stored in the processor elements in parallel, with the aim of reducing the number of data transfer cycles.

Patent Document 1 further shows a configuration in which, when operation data of a processor element are transferred between different processor elements during arithmetic processing, multiplexers are used to transfer data among processor elements within a predetermined range (±3 adjacent processor elements). With this multiplexer, each processor element selects register data of a processor element in a different column and performs arithmetic processing. So that the selected path is the same in every multiplexer during data transfer, the register output lines of the register file in each processor element are coupled to different common data bus lines (registers in adjacent columns are connected to different bus lines so that the register of one column can be used alternatively by seven adjacent arithmetic units). The connection between each multiplexer and the bus lines of the common data bus changes according to the connection between the register file output signal lines and the common data bus. The wiring to the multiplexer within each processor element therefore becomes complicated and tangled. Moreover, the range over which the data of a register in a given column can be transferred is fixed, and no configuration for transferring data between arbitrary processor elements is shown.

In Patent Document 2, to transfer data at high speed, a selection signal is supplied to each processor element, and data is written to or read from the selected processor element. In Patent Document 2 as well, in addition to the configuration for writing and reading data to and from the registers in the register file, multiplexers are used for data transfer between processor elements in different columns, as in Patent Document 1. The internal wiring therefore becomes tangled, and it is difficult to transfer data between arbitrary processor elements.

In the configuration of Patent Document 3, a data bus provided in common to the processor elements is divided into a plurality of segments, and data is transferred between arithmetic units over each divided data bus. In this case, however, data is transferred from one processor element to another on each divided data bus and processing such as sequential addition is performed, so not every processor element performs useful arithmetic processing in every cycle. The configuration can therefore be applied to processing such as total-sum or peak-value calculation, but it is difficult to apply to arithmetic processing that relies on data transfers such as ordinary copy operations. In Patent Document 3 as well, the data lines of the divided data bus are selected with multiplexers, so the internal wiring becomes tangled, as in Patent Documents 1 and 2.

In the configuration of Patent Document 4, a processor element is addressed and the data of a register in the global processor is transferred to a register in the selected processor element. In Patent Document 4 as well, however, data transfer between processor elements is performed with multiplexers, so the internal wiring becomes tangled and the data transfer range is limited, as in Patent Documents 1 to 3.

The configuration of Patent Document 5 merely provides a mirror register in the register file so that the data in the mirror register can be monitored from the outside even during arithmetic processing. Data transfer between processor elements is simply performed with multiplexers, as in Patent Documents 1 to 4, so the internal wiring becomes tangled and the data transfer range is limited.

In general, in a parallel arithmetic device having a plurality of arithmetic units, transferring data between arbitrary arithmetic units requires a plurality of data transfer wirings, one per transfer distance, to be arranged for the arithmetic unit of each processor element, which increases the wiring area. As a result, as in Patent Documents 1 to 5 described above, the area occupied by wiring between the register file and the arithmetic unit array becomes large, and the wiring layout becomes tangled.

Therefore, an object of the present invention is to provide a semiconductor arithmetic processing apparatus including a transfer circuit with a small occupied area that can transfer data between arbitrary arithmetic units at high speed.

A specific object of the present invention is to realize, in a semiconductor arithmetic processing apparatus in which a memory array is divided into a plurality of entries and an arithmetic unit is provided for each entry, a transfer circuit that can transfer data between the arithmetic units or between the entries at high speed without increasing the wiring area.

A semiconductor arithmetic processing apparatus according to the present invention includes a memory array divided into a plurality of entries each having a plurality of memory cells, a plurality of arithmetic units arranged corresponding to these entries and each performing arithmetic processing on given data, and a transfer circuit that transfers data between the plurality of entries. The transfer circuit includes a plurality of transfer wiring paths arranged corresponding to the entries, each of which transfers the data of the corresponding entry to one of a plurality of different other entries. Each transfer wiring path has a data output unit that receives and outputs the data of the corresponding entry, and a plurality of data transmission units coupled to the different other entries. In each transfer wiring path, the transfer data is transferred from the data output unit toward the data transmission units.

Transfer paths are used for data transfer between entries. Each transfer path has a data output unit for the data of the corresponding entry and data transmission units that transfer the data to a plurality of other entries. Therefore, by selecting a data transmission unit in each transfer path, data can be transferred in parallel among a plurality of entries, and high-speed data transfer can be realized.

In addition, because the data output unit and the data transmission units are provided individually in each transfer path, the wiring layout is simpler than in a configuration in which the data transfer path is set with multiplexers.

Furthermore, by transferring data sequentially from entry to entry over these transfer paths, data can be transferred between entries separated by an arbitrary distance.
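The idea of reaching an arbitrary entry distance through a sequence of fixed-distance transfers can be sketched as follows. This is only an illustration: the hop distances `(1, 4, 16)`, the greedy decomposition, the wrap-around at the array ends, and the function names are all assumptions, and the text above does not state which distances the data transmission units actually cover.

```python
def plan_hops(distance, offsets=(1, 4, 16)):
    """Decompose a signed entry distance into a list of signed hops.

    `offsets` are hypothetical per-hop distances. The greedy choice below
    is only guaranteed to reach the target because the set contains 1.
    """
    if 1 not in offsets:
        raise ValueError("offsets must contain 1 for this greedy sketch")
    sign = 1 if distance >= 0 else -1
    remaining = abs(distance)
    hops = []
    for step in sorted(offsets, reverse=True):
        while remaining >= step:
            hops.append(sign * step)
            remaining -= step
    return hops


def shift_entries(values, hop):
    """Move every entry's value `hop` entries away, all entries at once.

    Wrap-around at the array ends is an assumption made for brevity.
    """
    m = len(values)
    shifted = [None] * m
    for src, value in enumerate(values):
        shifted[(src + hop) % m] = value
    return shifted


def transfer(values, distance, offsets=(1, 4, 16)):
    """Apply the planned hops one after another to all entries."""
    for hop in plan_hops(distance, offsets):
        values = shift_entries(values, hop)
    return values
```

In this model every entry uses the same hop in a given step, which mirrors selecting the same data transmission unit in every transfer path; a distance of 23 entries would take five steps (16, 4, 1, 1, 1) with the assumed hop set.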

[Overall configuration]
FIG. 1 schematically shows the overall configuration of a processing system to which a semiconductor arithmetic processing apparatus according to the present invention is applied. In FIG. 1, a signal processing system 1 includes a system LSI 2 that implements arithmetic functions for executing various kinds of processing, and external memory connected to the system LSI 2 via an external system bus 3. The external memory includes a large-capacity memory 4, a high-speed memory 5, and a read-only memory (ROM) 6 that stores fixed information such as instructions used at system startup. The large-capacity memory 4 is, for example, a clock-synchronous dynamic random access memory (SDRAM), and the high-speed memory 5 is, for example, a static random access memory (SRAM).

The system LSI 2 includes basic operation blocks FB1-FBh coupled in parallel to an internal system bus 7, a host CPU 8 that controls the processing operations of the basic operation blocks FB1-FBh, an input port 9 that converts an input signal IN from outside the signal processing system 1 into data for internal processing, and an output port 10 that receives internal output data from the internal system bus 7 and generates output data OUT for the outside of the system. The input port 9 and the output port 10 are formed, for example, from library IP (intellectual property) blocks and implement the functions needed for data and signal input and output.

The system LSI 2 further includes an interrupt controller 11 that receives interrupt requests from the basic operation blocks FB1-FBh and notifies the host CPU 8 of the interrupt; a CPU peripheral 12 that performs control operations necessary for the processing of the host CPU 8; a DMA (direct memory access) controller 13 that performs data transfer with the external memory in accordance with transfer requests from the basic operation blocks FB1-FBh; an external bus controller 14 that controls access to the memories 4-6 connected to the external system bus 3 in accordance with instructions from the host CPU 8 or the DMA controller 13; and dedicated logic 15 that assists the data processing of the host CPU 8.

The CPU peripheral 12 provides functions needed for programming and debugging on the host CPU 8, such as a timer and serial IO (input/output). The dedicated logic 15 is formed, for example, from IP blocks and implements the required processing functions using existing functional blocks. These functional blocks 9-15 are connected to the internal system bus 7. DMA request signals from the basic operation blocks FB1-FBh are supplied to the DMA controller 13.

Because the basic operation blocks FB1-FBh have the same configuration, FIG. 1 shows the configuration of the basic operation block FB1 as a representative.

The basic operation block FB1 includes a main arithmetic circuit 20 that performs the actual arithmetic processing of data; a microinstruction memory 21 that stores microinstructions specifying the arithmetic processing in the main arithmetic circuit 20; a controller 22 that controls the arithmetic processing of the main arithmetic circuit 20 in accordance with microinstructions from the microinstruction memory 21; a work data memory 23 that stores intermediate processing data or work data of the controller 22; and a system bus interface (I/F) 24 that transfers data and signals between the inside of the basic operation block FB1 and the internal system bus 7.

The main arithmetic circuit 20 includes a memory cell mat 30 in which a plurality of memory cells are arranged in rows and columns and which is divided into a plurality of entries, arithmetic units (ALUs) 31 that are arranged corresponding to the entries of the memory cell mat 30 and perform specified arithmetic processing, and an inter-ALU interconnection switch circuit 32 that sets the data transfer paths between the arithmetic units 31.

Basically, all the bits of a multi-bit data word are stored in one entry. The arithmetic unit (ALU) 31 receives the data bits from the corresponding entry serially, performs arithmetic processing, and stores the result serially in a designated entry of the memory cell mat 30 (for example, the corresponding entry).

The inter-ALU interconnection switch circuit 32 switches the connection paths of the arithmetic units 31, making it possible to operate on data from different entries. By storing different data in each entry and having the arithmetic units 31 process them in parallel, data can be processed at high speed.

The controller 22 operates in a microprogrammed manner in accordance with the microinstructions stored in the microinstruction memory 21. The work data needed for this microprogram operation is stored in the work data memory 23.

The system bus I/F 24 allows the host CPU 8 or the DMA controller 13 to access the memory cell mat 30, the control registers in the controller 22, the microinstruction memory 21, and the work data memory 23.

Different address areas (CPU address areas) are allocated to the basic operation blocks FB1-FBh. Likewise, different addresses (CPU addresses) are allocated to the memory cell mat 30, the control registers in the controller 22, the microinstruction memory 21, and the work data memory 23 within each of the basic operation blocks FB1-FBh. Different arithmetic processes can therefore be executed in parallel by storing microinstructions with different contents in the respective basic operation blocks FB1-FBh. Alternatively, microinstructions with the same operation content may be stored in the microinstruction memories 21 so that the basic operation blocks FB1-FBh perform the same arithmetic processing on data in different address areas. Although the microinstruction memory 21 stores microinstructions here, it may store macroinstructions instead.

In accordance with the allocated addresses, the host CPU 8 and the DMA controller 13 identify the basic operation block FBi to be accessed (one of FB1-FBh) and access that basic operation block.
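As a rough illustration of address-based block selection, the sketch below decodes a CPU address into a basic operation block number and a resource within that block. The base address, block span, region sizes, and region order are entirely hypothetical, since the text specifies only that the address areas differ, not how they are laid out.

```python
from collections import namedtuple

Target = namedtuple("Target", "block resource offset")

# Hypothetical layout: every size and ordering here is illustrative only.
BLOCK_SPAN = 0x10000
REGIONS = [
    ("memory_cell_mat", 0x0000, 0x8000),
    ("control_register", 0x8000, 0x0100),
    ("microinstruction_memory", 0x9000, 0x4000),
    ("work_data_memory", 0xD000, 0x1000),
]


def decode(cpu_address, base=0x40000000, num_blocks=4):
    """Map a CPU address to (block number, resource, offset), or None."""
    rel = cpu_address - base
    if rel < 0 or rel >= num_blocks * BLOCK_SPAN:
        return None  # outside the area assumed for FB1-FBh
    block, inner = divmod(rel, BLOCK_SPAN)
    for name, start, size in REGIONS:
        if start <= inner < start + size:
            return Target(block + 1, name, inner - start)
    return None  # unmapped hole inside the block's area
```

With this assumed map, an address in the third block's span at inner offset `0x9004` resolves to the microinstruction memory of FB3 at offset 4.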

FIG. 2 schematically shows the configuration of the main part of the main arithmetic circuit 20 included in each of the basic operation blocks FB1-FBh shown in FIG. 1. In FIG. 2, memory cells MC are arranged in rows and columns in the memory cell mat 30. The memory cell mat is divided into m entries ERY, and each entry ERY has a bit width of n bits. In the memory cell mat 30, word lines are provided in common to the m entries ERY, and a bit line is provided for each entry individually. Basically, one entry ERY consists of the memory cells MC aligned in one column (the direction in which the bit line extends). The number of entries ERY is therefore determined by the number of bit lines in the memory cell mat.
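The entry layout described above can be modeled as a small two-dimensional array, as a minimal sketch. Here each row stands for a word line shared by all m entries and each column stands for one entry on its own bit line; placing bit 0 on word line 0 and the function names are assumptions made for illustration.

```python
def store_words(words, n_bits):
    """Build a mat as n_bits rows (word lines) x m columns (entries).

    mat[j][i] holds bit j of the word stored in entry i. Placing bit 0
    on word line 0 is an assumption of this sketch.
    """
    return [[(w >> j) & 1 for w in words] for j in range(n_bits)]


def read_word_line(mat, j):
    """Selecting word line j yields one bit from every entry at once."""
    return list(mat[j])


def read_entry(mat, i):
    """Reassemble the word held in entry i from its bit-line column."""
    return sum(mat[j][i] << j for j in range(len(mat)))
```

For example, storing the words `0b10`, `0b00`, and `0b11` with `n_bits=2` gives a mat with three columns, so the entry count equals the number of bit lines, and reading word line 0 returns the low-order bit of all three entries in one access.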

An arithmetic unit (ALU) 31 is arranged in an arithmetic processing unit 35 for each entry ERY. The arithmetic unit 31 can execute operations such as addition, logical product (AND), coincidence detection (EXOR), and inversion (NOT).

Arithmetic processing is executed by loading data (transferring data from the memory cell mat 30 to the arithmetic processing unit 35) and storing data (transferring data from the arithmetic processing unit 35 to the memory cell mat 30 and holding it there) between each entry ERY and the corresponding arithmetic unit 31. Each entry ERY stores the bits of multi-bit data.

The inter-ALU interconnection switch circuit 32 is arranged for the arithmetic processing unit 35 and implements data transfer between the arithmetic units 31.

The arithmetic unit 31 executes arithmetic processing in a bit-serial manner, that is, it processes a multi-bit data word one bit at a time. In the arithmetic processing unit 35, data is thus processed bit-serially within each data word and entry-parallel across entries, with the data of a plurality of entries ERY processed at the same time.

By changing the bit width of the entries ERY, data words with a different word configuration can be processed simply by changing the number of operation cycles (the range of the address pointer). Increasing the number of entries m allows a large amount of data to be processed in one batch.

The memory cell MC is, for example, an SRAM cell with a CMOS (complementary metal-insulator-semiconductor) structure, which allows data to be written and read at high speed. Using SRAM cells as the memory cells MC also removes the need to refresh stored data in the memory cell mat 30, which simplifies operation control and allows arithmetic processing to be executed at high speed.

To perform an operation in the main arithmetic circuit 20, the data to be operated on is basically first stored in each entry ERY. Next, the bit at a given digit position of the stored data is read from all entries ERY in parallel and transferred (loaded) to the corresponding arithmetic units 31. For a binary operation, the corresponding bit of the other data word in each entry ERY is transferred in the same way, after which each arithmetic unit 31 performs a two-input operation. The result is written back (stored) from the arithmetic unit 31 to a predetermined area in the corresponding entry ERY.

図３は、図２に示す主演算回路２０における演算操作の一例を模式的に示す図である。図３において、２ビット幅のデータワードａおよびｂの加算を行なって、データワードｃを生成する。エントリＥＲＹには、演算対象の組をなすデータワードａおよびｂがともに格納される。 FIG. 3 is a diagram schematically showing an example of the arithmetic operation in the main arithmetic circuit 20 shown in FIG. In FIG. 3, data words a and b having a 2-bit width are added to generate data word c. The entry ERY stores both data words a and b that form a set to be calculated.

図３において、第１行目のエントリＥＲＹに対する演算器３１においては、１０Ｂ＋０１Ｂの加算が行なわれ、２行目のエントリＥＲＹに対する演算器３１においては、００Ｂ＋１１Ｂの演算が行なわれる。ここで、末尾の“Ｂ”は、２進数を示す。３行目のエントリＥＲＹに対する演算器３１においては、１１Ｂ＋１０Ｂの演算が行なわれる。同様に、エントリＥＲＹ各々に格納されたデータワードａおよびｂの加算が実行される。 In FIG. 3, the calculator 31 for the entry ERY in the first row adds 10B + 01B, and the calculator 31 for the entry ERY in the second row calculates 00B + 11B. Here, “B” at the end indicates a binary number. In the calculator 31 for the entry ERY in the third row, the calculation of 11B + 10B is performed. Similarly, addition of data words a and b stored in each entry ERY is executed.

演算は、下位側ビットから順にビットシリアル態様で行なわれる。また、エントリＥＲＹにおいてデータワードａの下位ビットａ［０］を対応の演算器３１へ転送する。次いで、データワードｂの下位ビットｂ［０］を対応の演算器３１へ転送する。演算器３１においては、これらの与えられた２ビットデータを用いて加算演算を行なう。この加算演算結果ａ［０］＋ｂ［０］は、データワードｃの下位ビットｃ［０］の位置に書込まれる（ストアされる）。すなわち、１行目のエントリＥＲＹにおいて、“１”が、ビットｃ［０］の位置に書込まれる。 The calculation is performed in a bit serial manner in order from the lower bit. In addition, the lower bit a [0] of the data word a is transferred to the corresponding arithmetic unit 31 in the entry ERY. Next, the lower bit b [0] of the data word b is transferred to the corresponding computing unit 31. The arithmetic unit 31 performs an addition operation using these given 2-bit data. This addition operation result a [0] + b [0] is written (stored) at the position of the lower bit c [0] of the data word c. That is, “1” is written at the position of bit c [0] in entry ERY in the first row.

この加算処理は、次いで、上位ビットａ［１］およびｂ［１］に対しても行ない、その演算結果ａ［１］＋ｂ［１］がビットｃ［１］の位置に書込まれる。 This addition processing is then performed for the upper bits a [1] and b [1], and the operation result a [1] + b [1] is written at the position of bit c [1].

加算演算においては、桁上がりが生じる可能性がある。この桁上がり（キャリー）の値が、ビットｃ［２］の位置に書込まれる。これにより、データワードａおよびｂの加算が、すべてのエントリＥＲＹにおいて完了し、その結果が、データｃとして各エントリＥＲＹにおいて格納される。エントリとして、たとえば１０２４エントリを準備した場合、１０２４組のデータの加算を並列に実行することができる。 A carry may occur in the addition operation. This carry value is written at the position of bit c [2]. Thereby, the addition of the data words a and b is completed in all the entries ERY, and the result is stored as data c in each entry ERY. For example, when 1024 entries are prepared as entries, 1024 sets of data can be added in parallel.
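The bit-serial, entry-parallel addition described above can be sketched as follows. This is a minimal illustration only: the function name `bit_serial_add` and the list-based model of the entries are assumptions made here, and the inner loop over entries stands in for what the hardware does in all entries simultaneously.

```python
def bit_serial_add(a_words, b_words, width):
    """Add pairs of `width`-bit words one bit position at a time.

    Each list index models one entry ERY with its own 1-bit arithmetic unit.
    """
    entries = len(a_words)
    carry = [0] * entries            # one carry bit per entry
    c_words = [0] * entries          # result word c, width + 1 bits wide
    for i in range(width):           # lower bit first (bit-serial)
        for e in range(entries):     # done in parallel in the hardware
            a_bit = (a_words[e] >> i) & 1      # load a[i]
            b_bit = (b_words[e] >> i) & 1      # load b[i]
            total = a_bit + b_bit + carry[e]
            c_words[e] |= (total & 1) << i     # store c[i]
            carry[e] = total >> 1
    for e in range(entries):
        c_words[e] |= carry[e] << width        # final carry goes to c[width]
    return c_words


# The three rows of FIG. 3: 10B + 01B, 00B + 11B, 11B + 10B
result = bit_serial_add([0b10, 0b00, 0b11], [0b01, 0b11, 0b10], width=2)
print([format(c, "03b") for c in result])  # ['011', '011', '101']
```

The third bit of each printed result corresponds to the carry written at the position of bit c[2].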

図４は、この加算演算処理時における内部タイミングを模式的に示す図である。以下、図４を参照して、加算演算の内部タイミングについて説明する。この加算演算処理においては、演算器３１に含まれる２ビット加算器（ＡＤＤ）が利用される。 FIG. 4 is a diagram schematically showing the internal timing during the addition calculation process. Hereinafter, the internal timing of the addition operation will be described with reference to FIG. In this addition calculation process, a 2-bit adder (ADD) included in the calculator 31 is used.

図４において、“Read”は、メモリセルマット３０から演算対象のデータビットを読出して対応の演算器に転送する動作（ロード）または動作命令を示し、“Write”は、演算器３１の演算結果データを対応のエントリＥＲＹの対応のビット位置に書込む動作（ストア）または動作命令を示す。 In FIG. 4, “Read” indicates an operation (load) or operation instruction for reading the data bit to be calculated from the memory cell mat 30 and transferring it to a corresponding arithmetic unit, and “Write” indicates an operation result of the arithmetic unit 31. Indicates an operation (store) or operation instruction for writing data to a corresponding bit position of a corresponding entry ERY.

マシンサイクルｋにおいて、データビットａ［ｉ］がメモリセルマット３０から読出され、次のマシンサイクル（ｋ＋１）で、次の演算対象のデータビットｂ［ｉ］が読出され（Read）、演算器３１の加算器（ＡＤＤ）にそれぞれ与えられる。 In machine cycle k, the data bit a[i] is read from the memory cell mat 30, and in the next machine cycle (k+1), the next operand data bit b[i] is read (Read); both bits are supplied to the adder (ADD) of the arithmetic unit 31.

マシンサイクル（ｋ＋２）において、演算器３１の加算器（ＡＤＤ）において、読出されたデータビットａ［ｉ］およびｂ［ｉ］の加算処理が行なわれる。マシンサイクル（ｋ＋３）において、この加算結果ｃ［ｉ］が、対応のエントリの対応のビット位置に書込まれる。 In the machine cycle (k + 2), the adder (ADD) of the arithmetic unit 31 performs addition processing of the read data bits a [i] and b [i]. In the machine cycle (k + 3), this addition result c [i] is written in the corresponding bit position of the corresponding entry.

次のマシンサイクル（ｋ＋４）および（ｋ＋５）において、次の演算対象のデータビットａ［ｉ＋１］およびｂ［ｉ＋１］が読出され、演算器３１の加算器（ＡＤＤ）へ転送される。マシンサイクル（ｋ＋６）において、演算器３１により加算処理が行なわれ、マシンサイクル（ｋ＋７）において、加算結果がビット位置ｃ［ｉ＋１］へ格納される。 In the next machine cycles (k + 4) and (k + 5), the next operation target data bits a [i + 1] and b [i + 1] are read and transferred to the adder (ADD) of the arithmetic unit 31. In the machine cycle (k + 6), addition processing is performed by the arithmetic unit 31, and the addition result is stored in the bit position c [i + 1] in the machine cycle (k + 7).
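The cycle-by-cycle sequence of FIG. 4 can be listed programmatically. The following is a sketch only; the function name `add_schedule` and the text labels are assumptions chosen for illustration, with each bit position occupying four consecutive machine cycles as described above.

```python
def add_schedule(width, k=0, i=0):
    """Return (machine_cycle, action) pairs for a bit-serial addition.

    `k` is the first machine cycle and `i` the first bit position.
    """
    steps = []
    for n in range(width):
        bit = i + n
        base = k + 4 * n                      # four cycles per bit position
        steps.append((base, f"Read a[{bit}]"))
        steps.append((base + 1, f"Read b[{bit}]"))
        steps.append((base + 2, f"ADD a[{bit}], b[{bit}]"))
        steps.append((base + 3, f"Write c[{bit}]"))
    return steps


for cycle, action in add_schedule(width=2):
    print(f"k+{cycle}: {action}")
```

With `width=2`, the last step is the write of c[i+1] in machine cycle (k+7), matching the description above.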

メモリセルマット３０と演算器３１の間のデータビットの転送に、それぞれ１マシンサイクルが必要とされ、演算器３１において１マシンサイクルの演算サイクルが必要とされる。したがって、１ビットデータの加算および加算結果の格納を行なうために、４マシンサイクルが必要となる。メモリセルマット３０を、複数のエントリＥＲＹに分割し、各エントリに演算対象データの組をそれぞれ格納し、対応の演算器３１においてビットシリアル態様で演算処理を行なう方式の特徴は、１つ１つのデータ演算には、比較的多くのマシンサイクルが必要とされるものの、処理すべきデータ量が非常に多い場合には、演算の並列度を高くすることにより高速データ処理を実現することができるということである。 Each transfer of a data bit between the memory cell mat 30 and the arithmetic unit 31 requires one machine cycle, and the operation in the arithmetic unit 31 requires one machine cycle. Therefore, four machine cycles are required to add one bit of data and store the addition result. In the scheme in which the memory cell mat 30 is divided into a plurality of entries ERY, a set of operand data is stored in each entry, and the corresponding arithmetic unit 31 performs the operation in a bit-serial manner, each individual data operation requires a relatively large number of machine cycles. When the amount of data to be processed is very large, however, high-speed data processing can be realized by increasing the degree of parallelism of the operations.

たとえば、演算対象のデータワードのビット幅がＮの場合、各エントリの演算には、４・Ｎマシンサイクルが必要となる。演算対象のデータワードのビット幅は、８ビットから６４ビット程度である。エントリ数ｍを、たとえば１０２４と大きくすることにより、並列演算処理時に、たとえば８ビットデータの場合、３２マシンサイクルで１０２４個の演算結果を得ることができ、１０２４組のデータをシーケンシャルに処理する場合に比べて大幅に処理時間を短縮することができる。 For example, when the bit width of the operand data word is N, the operation on each entry requires 4·N machine cycles. The bit width of the operand data word is about 8 to 64 bits. By making the number of entries m large, for example 1024, parallel processing of 8-bit data yields 1024 operation results in 32 machine cycles, which greatly shortens the processing time compared with processing the 1024 sets of data sequentially.
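The cycle count in this example can be written out as follows. The symbols T (number of machine cycles) and R (results obtained per machine cycle) are introduced here only for illustration and do not appear in the original description; note that T does not depend on the number of entries m.

```latex
\begin{align*}
T(N) &= 4N, \\
T(8) &= 4 \cdot 8 = 32 \ \text{machine cycles}, \\
R &= \frac{m}{T(N)} = \frac{1024}{32} = 32 \ \text{results per machine cycle}.
\end{align*}
```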

また、ビットシリアル態様で演算処理を行なっており、処理されるデータのビット幅は固定されないため、種々のデータ構成を有する種々のアプリケーションに容易に適用することができる。 In addition, since the arithmetic processing is performed in the bit serial form and the bit width of the processed data is not fixed, it can be easily applied to various applications having various data configurations.

図５は、図１に示すコントローラ２２の制御態様を概略的に示す図である。このコントローラ２２に対応して、レジスタ群４０が設けられる。このレジスタ群４０においては、ポインタレジスタｒ０−ｒ３が設けられる。演算対象のデータのメモリセルマット３０内のアドレスが、これらのポインタレジスタｒ０−ｒ３に格納される。コントローラ２２は、このポインタレジスタｒ０−ｒ３に格納されるポインタに従って、主演算回路におけるエントリまたはエントリ内位置を指定するアドレスを生成して、メモリセルマット（３０）と演算処理ユニット（３５）の間のデータ転送（ロード／ストア）を制御する。また、このコントローラ２２は、ポインタレジスタｒ０−ｒ３のポインタに従って、マイクロ命令メモリ２１から転送命令が与えられたとき、この演算処理ユニット（３５）における演算器（ＡＬＵ）３１間の接続指定情報を設定する。 FIG. 5 is a diagram schematically showing the control mode of the controller 22 shown in FIG. 1. A register group 40 is provided corresponding to the controller 22. The register group 40 includes pointer registers r0-r3. The addresses in the memory cell mat 30 of the data to be operated on are stored in the pointer registers r0-r3. In accordance with the pointers stored in the pointer registers r0-r3, the controller 22 generates an address designating an entry in the main arithmetic circuit or a position within an entry, and controls data transfer (load/store) between the memory cell mat (30) and the arithmetic processing unit (35). In addition, when a transfer instruction is given from the microinstruction memory 21, the controller 22 sets the connection designation information between the arithmetic units (ALU) 31 in the arithmetic processing unit (35) in accordance with the pointers of the pointer registers r0-r3.

［演算器の構成１］ [Configuration 1 of the arithmetic unit]
図６は、図１に示す演算器（３１）の構成および１つの演算器に関連する部分の構成を概略的に示す図である。図６において、演算器（ＡＬＵ）３１は、指定された演算処理を行なう算術演算論理回路５０と、対応のエントリから読出されたデータまたは算術演算論理回路５０の演算処理結果データまたは対応のエントリへ転送するデータを一時的に格納するＸレジスタ５４と、加演算処理時のキャリーまたはボローを格納するＣレジスタ５６と、この算術演算論理回路５０の演算処理の禁止を指定するマスクデータを格納するＭレジスタ５８を含む。 FIG. 6 is a diagram schematically showing the configuration of the arithmetic unit (31) shown in FIG. 1 and of the portion related to one arithmetic unit. In FIG. 6, the arithmetic unit (ALU) 31 includes an arithmetic logic circuit 50 that performs a designated operation; an X register 54 that temporarily stores data read from the corresponding entry, operation result data of the arithmetic logic circuit 50, or data to be transferred to the corresponding entry; a C register 56 that stores a carry or a borrow produced during arithmetic processing; and an M register 58 that stores mask data designating inhibition of the operation of the arithmetic logic circuit 50.

エントリと演算器の間には、センスアンプ６２およびライトドライバ６０が設けられる。これらのセンスアンプ６２およびライトドライバ６０は、対応のエントリのビット線対ＢＬＰに結合される。センスアンプ６２は、対応のエントリのメモリセルから読出されるデータを増幅し、その増幅データを内部データ転送線２００を介してＸレジスタ５４へ転送する。ライトドライバ６０は、Ｘレジスタ５４に格納されたデータをバッファ処理して、対応のエントリのメモリセルへ対応のビット線対ＢＬＰを介して書込む。 A sense amplifier 62 and a write driver 60 are provided between the entry and the arithmetic unit. Sense amplifier 62 and write driver 60 are coupled to bit line pair BLP of the corresponding entry. Sense amplifier 62 amplifies data read from the memory cell of the corresponding entry, and transfers the amplified data to X register 54 via internal data transfer line 200. The write driver 60 buffers the data stored in the X register 54 and writes the data to the memory cell of the corresponding entry via the corresponding bit line pair BLP.

この算術演算論理回路５０は、加算（ＡＤＤ）、論理積（ＡＮＤ）、論理和（ＯＲ）、排他的論理和（ＥＸＯＲ）、反転（ＮＯＴ）等の演算を実行することができ、その演算内容が、先の図５に示すコントローラ２２からの制御信号（ＡＬＵ制御）により設定される。Ｍレジスタ５８に格納されるマスクデータにより、この演算器３１における演算処理動作を選択的にイネーブル／ディスエーブルする。この演算マスク機能を利用することにより、仮に全エントリが利用されない場合においても、有効エントリに対してのみ演算を実行して、正確な処理を行なうことができる。また、不必要な演算を停止させることにより、消費電流を低減することができる。 The arithmetic logic circuit 50 can execute operations such as addition (ADD), logical product (AND), logical sum (OR), exclusive OR (EXOR), and inversion (NOT), and the operation to be performed is set by a control signal (ALU control) from the controller 22 shown in FIG. 5. The mask data stored in the M register 58 selectively enables or disables the arithmetic operation of the arithmetic unit 31. With this operation mask function, even when not all entries are used, the operation is executed only on the valid entries, so that processing remains correct. In addition, stopping unnecessary operations reduces current consumption.
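The effect of the operation mask can be illustrated with a short sketch. The function name `masked_op` is hypothetical, and the choice that a disabled entry simply keeps its previous X register value is an assumption made for this illustration; the description above states only that the operation is disabled in such entries.

```python
def masked_op(x_regs, operands, mask, op):
    """Apply `op` only in entries whose mask bit (M register) is 1.

    Entries with mask bit 0 are assumed to keep their X register value.
    """
    return [op(x, y) if m else x
            for x, y, m in zip(x_regs, operands, mask)]


x = [1, 0, 1, 1]
y = [1, 1, 0, 1]
m = [1, 1, 0, 0]          # only the first two entries are valid
print(masked_op(x, y, m, lambda a, b: a ^ b))  # [0, 1, 1, 1]
```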

Ｘレジスタ５４は、また、ＡＬＵ間相互接続スイッチ回路３２に含まれるＡＬＵ間接続回路６５を介して他の演算器（ＡＬＵ）に接続される。このＡＬＵ間接続回路６５の構成については、後に詳細に説明する。このＡＬＵ間接続回路６５の転送機能により、メモリマット内のさまざまな物理位置に格納されているデータに対する演算を実現することができ、演算の自由度を高くすることができる。 The X register 54 is also connected to other arithmetic units (ALUs) via an inter-ALU connection circuit 65 included in the inter-ALU interconnection switch circuit 32. The configuration of the inter-ALU connection circuit 65 will be described in detail later. The transfer function of the inter-ALU connection circuit 65 makes it possible to operate on data stored at various physical positions in the memory cell mat, which increases the flexibility of the operations.

図７は、ＡＬＵ命令のうち、エントリ間のデータ移動（Move）を行なう命令を一覧にして示す図である。 FIG. 7 is a diagram showing a list of instructions for moving data between entries among ALU instructions.

命令“ecm.mv.n♯n”は、データ移動命令（Move）における移動量を数値♯ｎで規定する。したがってこの命令で、Ｘレジスタ間のデータ転送において、エントリｊ＋ｎのＸレジスタの格納値が、エントリｊのＸレジスタに格納される。一例として、エントリ移動量ｎは、０から１２８の範囲の整数値を取り、最大１２８エントリ離れた位置のエントリ間でデータ移動を行なうことができる。 The instruction “ecm.mv.n #n” specifies the movement amount of the data move instruction (Move) by the numerical value #n. With this instruction, in the data transfer between X registers, the value stored in the X register of entry j+n is stored in the X register of entry j. As an example, the entry movement amount n takes an integer value in the range of 0 to 128, so that data can be moved between entries up to 128 entries apart.

命令“ecm.mv.r rx”は、ポインタレジスタｒｘに格納された値だけエントリ間をデータ移動させる命令である。この命令が実行されると、エントリｊ＋ｒｘのＸレジスタの格納値が、エントリｊのＸレジスタに転送される。 The instruction “ecm.mv.r rx” is an instruction for moving data between entries by the value stored in the pointer register rx. When this instruction is executed, the stored value of the X register of entry j + rx is transferred to the X register of entry j.

このＡＬＵ命令に従って、ＡＬＵ間接続回路６５における接続経路が設定され、各エントリ対応に設けられる演算器において、Ｘレジスタを用いて並列に、データ転送が実行される。 In accordance with the ALU instruction, a connection path in the inter-ALU connection circuit 65 is set, and the arithmetic unit provided for each entry executes data transfer in parallel using the X register.

［演算器の構成２］ [Configuration 2 of the arithmetic unit]
図８は、この発明において利用される演算器３１の別の構成を概略的に示す図である。この図８に示す演算器３１の構成に対しては、メモリセルマットにおいて１つのエントリＥＲＹが、偶数アドレスのデータビットＡ［２ｉ］を格納する偶数エントリＥＲＹｅと、奇数アドレスのデータビットＡ［２ｉ＋１］を格納する奇数エントリＥＲＹｏとで構成される。偶数エントリＥＲＹｅおよび奇数エントリＥＲＹｏの同じアドレスのデータビットに対し、並列に演算処理を実行することにより、処理の高速化を図る。 FIG. 8 is a diagram schematically showing another configuration of the arithmetic unit 31 used in the present invention. In the configuration of the arithmetic unit 31 shown in FIG. 8, one entry ERY in the memory cell mat consists of an even entry ERYe that stores the data bits A[2i] at even addresses and an odd entry ERYo that stores the data bits A[2i+1] at odd addresses. Processing is sped up by operating in parallel on the data bits at the same address in the even entry ERYe and the odd entry ERYo.

演算器（ＡＬＵ）３１においては、演算処理を行なうための縦続される全加算器２１０および２１１が、演算処理部として設けられる。全加算器２１０および２１１は、それぞれ入力ＡおよびＢに与えられたデータビットを加算し、サム出力Ｓおよびキャリー出力Ｃｏに演算結果を出力する。また、全加算器２１０は、Ｃレジスタ５６に格納されるデータをキャリー入力Ｃｉｎに受ける。１ビット動作時には、このＣレジスタ５６のキャリーが、または全加算器２１１のキャリー入力Ｃｉｎに与えられる。２ビット並列に処理する２ビット動作時においては、全加算器２１０のキャリー出力Ｃｏが、全加算器２１１のキャリー入力Ｃｉｎに伝達される。この全加算器２１０および２１１の接続を切り替えることにより、２ビット並列演算および１ビット逐次処理を実行することが出来る。 In the arithmetic unit (ALU) 31, cascaded full adders 210 and 211 are provided as the arithmetic processing section. Each of the full adders 210 and 211 adds the data bits applied to its inputs A and B and outputs the result at its sum output S and carry output Co. The full adder 210 receives the data stored in the C register 56 at its carry input Cin. In 1-bit operation, the carry held in the C register 56 is instead applied to the carry input Cin of the full adder 211. In 2-bit operation, in which two bits are processed in parallel, the carry output Co of the full adder 210 is transmitted to the carry input Cin of the full adder 211. By switching the connection of the full adders 210 and 211 in this way, both 2-bit parallel operation and 1-bit sequential processing can be executed.
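The switching between the two carry paths can be sketched as follows. This is a behavioral illustration under stated assumptions, not the circuit itself: the function names are hypothetical, and writing the final carry output back to the C register is assumed here because the C register is described as holding the carry.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def alu_step(a_bits, b_bits, c_reg, two_bit_mode):
    """One step of the two cascaded full adders.

    2-bit mode: a_bits and b_bits are (low, high); the first adder (210)
    takes the C register as carry-in and feeds its carry-out to the
    second adder (211).
    1-bit mode: only the second adder (211) is used, with the C register
    as carry-in.
    Returns (tuple_of_sum_bits, new_c_reg).
    """
    if two_bit_mode:
        s0, co0 = full_adder(a_bits[0], b_bits[0], c_reg)   # adder 210
        s1, co1 = full_adder(a_bits[1], b_bits[1], co0)     # adder 211
        return (s0, s1), co1
    s, co = full_adder(a_bits[0], b_bits[0], c_reg)         # adder 211 only
    return (s,), co


# 3 + 1 in one 2-bit step: sum bits (0, 0) with carry 1, i.e. 100B
print(alu_step((1, 1), (1, 0), 0, two_bit_mode=True))  # ((0, 0), 1)
```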

この演算器３１は、対応のエントリのメモリセルからのロードデータの一時保存を行ないかつ演算途中の結果の一時保存を行なうＸレジスタ５４が設けられる。二項演算処理時においては、Ｘレジスタ５４に第１の演算データビットが格納されたとき、次の（別の）演算データビットが、この演算器３１に直接対応のメモリセルマットのエントリから与えられて演算処理が実行される。このＸレジスタ５４は、ＡＬＵ間接続用スイッチ回路を介して他の演算器（ＡＬＵ）と接続され、異なる演算器間でデータ転送を行なうことができる。 The arithmetic unit 31 is provided with an X register 54 that temporarily stores load data from the memory cell of the corresponding entry and temporarily stores intermediate results of an operation. In a two-operand operation, once the first operand data bit has been stored in the X register 54, the next (other) operand data bit is supplied to the arithmetic unit 31 directly from the corresponding entry of the memory cell mat, and the operation is executed. The X register 54 is connected to other arithmetic units (ALUs) via the inter-ALU connection switch circuit, so that data can be transferred between different arithmetic units.

演算器（ＡＬＵ）３１は、さらに、２ビットデータを並列に格納するためのＸＨレジスタ２２０およびＸＬレジスタ２２１と、Ｄレジスタ２２２の格納値に従って、レジスタ５４、２２０および２２１のデータの組の一方の２ビットを選択するセレクタ（ＳＥＬ）２２７と、Ｆレジスタ２０５の格納ビットに従ってセレクタ２２７の選択した２ビットに対する反転／非反転操作を行なう選択反転回路２１７と、Ｎレジスタ２０７とＶレジスタ２０８の格納データに従って、全加算器２１０および２１１の出力Ｓからのデータビットを選択的に出力するゲート２２３および２２４を含む。 The arithmetic unit (ALU) 31 further includes an XH register 220 and an XL register 221 for storing 2-bit data in parallel; a selector (SEL) 227 that selects, in accordance with the value stored in the D register 222, the two bits of one of the data sets held in the registers 54, 220, and 221; a selective inversion circuit 217 that inverts or passes the two bits selected by the selector 227 in accordance with the bit stored in the F register 205; and gates 223 and 224 that selectively output the data bits from the sum outputs S of the full adders 210 and 211 in accordance with the data stored in the N register 207 and the V register 208.

選択反転回路２１７の２ビット出力は、全加算器２１０および２１１の入力Ａへそれぞれ与えられる。ＸＨレジスタ２２０およびＸＬレジスタ２２１は、それぞれ内部データ線２２６および２２８を介して奇数エントリＥＲＹｏの奇数アドレスビットおよび偶数エントリＥＲＹｅの偶数アドレスビットの転送を行なう。Ｘレジスタ５４は、２ビット／１ビット動作に従って、スイッチ回路ＳＷａおよびＳＷｂにより、内部データ線２２６および２２８の一方に選択的に接続される。 The 2-bit output of selective inversion circuit 217 is applied to inputs A of full adders 210 and 211, respectively. XH register 220 and XL register 221 transfer odd address bits of odd entry ERYo and even address bits of even entry ERYe via internal data lines 226 and 228, respectively. X register 54 is selectively connected to one of internal data lines 226 and 228 by switch circuits SWa and SWb according to a 2-bit / 1-bit operation.

Ｎレジスタ２０７は、定数値を格納する。Ｖレジスタ２０８は、ゲート２２３および２２４の転送経路の遮断／接続を制御するマスクビットを格納する。 The N register 207 stores a constant value. The V register 208 stores a mask bit for controlling the interruption / connection of the transfer path of the gates 223 and 224.

全加算器２１０の入力Ｂは、内部データ線２２６に結合され、全加算器２１０のサム出力Ｓを受けるゲート２２３の出力が、また内部データ線２２６に接続される。全加算器２１１の入力Ｂは、スイッチ回路ＳＷｃおよびＳＷｄにより、内部データ線２２６および２２８の一方に選択的に接続される。全加算器２１１のサム出力Ｓからのデータビットは、ゲート２２４に与えられる。ゲート２２４の出力は、スイッチ回路ＳＷｅおよびＳＷｆに従って、内部データ線２２６および２２８の一方に選択的に接続される。これらのスイッチ回路ＳＷａ−ＳＷｆにより、２ビット並列除算処理を行なう場合の１ビット単位のビットシリアル処理を実行する。 The input B of the full adder 210 is coupled to the internal data line 226, and the output of the gate 223, which receives the sum output S of the full adder 210, is also connected to the internal data line 226. The input B of the full adder 211 is selectively connected to one of the internal data lines 226 and 228 by the switch circuits SWc and SWd. The data bit from the sum output S of the full adder 211 is applied to the gate 224. The output of the gate 224 is selectively connected to one of the internal data lines 226 and 228 by the switch circuits SWe and SWf. These switch circuits SWa-SWf enable bit-serial processing in units of one bit, which is required, for example, for division, in addition to 2-bit parallel processing.

ゲート２２３および２２４は、Ｖレジスタ２０８およびＮレジスタ２０７の格納値がともに“１”のときに、指定された演算処理を実行し（データ転送動作を行ない）、それ以外においては、ハイインピーダンスを出力する（出力ハイインピーダンス状態となる）。 Gates 223 and 224 execute designated arithmetic processing (perform data transfer operation) when the stored values of V register 208 and N register 207 are both “1”, and otherwise output high impedance. (Output high impedance state).
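The gating condition can be expressed in a few lines. This is only a sketch: the name `gate_output` and the use of `None` to represent the high-impedance state are assumptions made for illustration.

```python
HIGH_Z = None  # stands in for the output high-impedance state


def gate_output(sum_bit, v_reg, n_reg):
    """Pass the adder's sum bit only when both V and N registers hold 1."""
    if v_reg == 1 and n_reg == 1:
        return sum_bit
    return HIGH_Z


print(gate_output(1, v_reg=1, n_reg=1))  # 1
print(gate_output(1, v_reg=1, n_reg=0))  # None
```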

全加算器２１１のキャリー入力Ｃｉｎおよび全加算器２１０のキャリー出力Ｃｏに対して設けられるスイッチ２２５は、１ビット単位での演算処理を行なう場合に、全加算器２１０のキャリー出力Ｃｏを切離し、全加算器２１１のキャリー入力ＣｉｎをＣレジスタ５６に接続する。 A switch 225 provided for the carry input Cin of the full adder 211 and the carry output Co of the full adder 210 disconnects the carry output Co of the full adder 210 when performing arithmetic processing in units of one bit. The carry input Cin of the adder 211 is connected to the C register 56.

この演算器３１においては、Ｘレジスタ５４、ＸＨレジスタ２２０およびＸＬレジスタ２２１が、他のエントリの対応のレジスタとデータ転送を行なう機能を有する。 In this computing unit 31, the X register 54, the XH register 220, and the XL register 221 have a function of performing data transfer with the corresponding registers of other entries.

この変更例２の構成を利用する場合、１つの演算器が２つのエントリに対応して配置される。したがって、１ビット逐次処理に加えて、２ビット並列演算を行なって高速処理を実行することができ、また演算処理ユニットにおいて、余裕を持って演算器３１を配置することができる。 When the configuration of the second modification is used, one arithmetic unit is arranged corresponding to two entries. Therefore, in addition to 1-bit sequential processing, 2-bit parallel computation can be performed to execute high-speed processing, and the computing units 31 can be arranged with a margin in the computation processing unit.

図９は、この図８に示す演算器（３１）の２ビット演算時におけるＡＬＵ３１の内部の接続を概略的に示す図である。この２ビット演算時、特に２次のブースアルゴリズムに従って乗算を行なう場合、Ｘレジスタ５４は、スイッチ回路ＳＷａを介して内部データ線２２６に結合される。スイッチ回路ＳＷｂは、Ｘレジスタ５４と内部データ線２２８を切離す状態に設定される。 FIG. 9 is a diagram schematically showing the internal connections of the ALU 31 during a 2-bit operation of the arithmetic unit (31) shown in FIG. 8. In this 2-bit operation, particularly when multiplication is performed according to the second-order Booth algorithm, the X register 54 is coupled to the internal data line 226 via the switch circuit SWa. The switch circuit SWb is set to a state in which the X register 54 is disconnected from the internal data line 228.

スイッチ回路ＳＷｃが、全加算器２１１の入力Ｂと内部データ線２２６とを切離す。スイッチ２２５は、全加算器２１０のキャリー出力Ｃｏと全加算器２１０のキャリー入力Ｃｉｎとを分離する。Ｃレジスタ５６が、スイッチ２２５を介して全加算器２１０のキャリー入力Ｃｉｎに結合される。ゲート回路２２４の出力はスイッチ回路ＳＷｆにより内部データ線２２８に結合される。 The switch circuit SWc disconnects the input B of the full adder 211 from the internal data line 226. The switch 225 separates the carry output Co of the full adder 210 and the carry input Cin of the full adder 210. The C register 56 is coupled to the carry input Cin of the full adder 210 via the switch 225. The output of the gate circuit 224 is coupled to the internal data line 228 by the switch circuit SWf.

２ビット演算時においては、全加算器２１０および２１１が並列に動作し、選択反転回路２１７から与えられるデータビットをそれぞれ内部転送線２２６および２２８から転送されるデータビットと加算し、それぞれの加算結果をゲート２２３および２２４を介して内部データ線２２６および２２８へ出力する。したがって、加算結果は、ビットＡｊ［ｐｘ］およびＡｊ［ｐｘ＋１］について並列に算出する。ここで、ｐｘおよびｐｘ＋１は、ポインタレジスタのポインタ値であり、２ビット並列動作時には、メモリセルマット内において同一のワード線アドレスビットである。 At the time of 2-bit operation, full adders 210 and 211 operate in parallel to add the data bits supplied from selective inversion circuit 217 to the data bits transferred from internal transfer lines 226 and 228, respectively, and the respective addition results Are output to internal data lines 226 and 228 via gates 223 and 224, respectively. Therefore, the addition result is calculated in parallel for the bits Aj [px] and Aj [px + 1]. Here, px and px + 1 are pointer values of the pointer register, and are the same word line address bits in the memory cell mat during 2-bit parallel operation.

メモリセルマットにおいては、偶数エントリＥＲＹｅおよび奇数エントリＥＲＹｏにそれぞれ、偶数アドレスＡ［２ｉ］および奇数アドレスＡ［２ｉ＋１］のデータビットが格納される。ポインタレジスタｒｘのポインタにより、これらの偶数エントリＥＲＹｅおよび奇数エントリＥＲＹｏの同一ビット位置のメモリセルが指定される。したがって、プログラム実行時においてポインタレジスタｒｘのカウント値が２増分されることにより、奇数エントリＥＲＹｏおよび偶数エントリＥＲＹｅにおいて１ビットそのビット位置が上位方向にシフトされる。このポインタレジスタｒｘのポインタに基づいてメモリセルマットのワード線を選択するアドレスが生成される場合、ワード線の切換により、ポインタレジスタｒｘのポインタを２増分する構成が実現される。 In the memory cell mat, data bits of even address A [2i] and odd address A [2i + 1] are stored in even entry ERYe and odd entry ERYo, respectively. A memory cell at the same bit position of the even-numbered entry ERYe and the odd-numbered entry ERYo is designated by the pointer of the pointer register rx. Accordingly, when the count value of the pointer register rx is incremented by 2 at the time of program execution, the bit position of the odd entry ERYo and the even entry ERYe is shifted upward by one bit. When an address for selecting the word line of the memory cell mat is generated based on the pointer of the pointer register rx, a configuration in which the pointer of the pointer register rx is incremented by 2 is realized by switching the word line.
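The even/odd storage arrangement and the pointer increment of 2 can be sketched as follows. The helper names `split_even_odd` and `read_pair`, and the mapping of the pointer value px to a row index px // 2, are assumptions introduced for illustration.

```python
def split_even_odd(word, width):
    """Split a `width`-bit word (width even) into two bit lists.

    The first list models the even entry ERYe holding A[0], A[2], ...;
    the second models the odd entry ERYo holding A[1], A[3], ...
    """
    bits = [(word >> i) & 1 for i in range(width)]
    return bits[0::2], bits[1::2]


def read_pair(even_entry, odd_entry, px):
    """Return (A[px], A[px + 1]) for an even pointer value px.

    Both bits sit at the same position (row) of their respective entries.
    """
    row = px // 2
    return even_entry[row], odd_entry[row]


even, odd = split_even_odd(0b1101, 4)   # A[0..3] = 1, 0, 1, 1
for px in range(0, 4, 2):               # the pointer advances by 2 per step
    print(px, read_pair(even, odd, px))
```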

演算器３１において２つの全加算器２１０および２１１を設けて２ビット加算を行なうことにより、たとえばブースアルゴリズムに従う乗算操作時、２ビット単位での部分積生成および前の部分積との加算を行なうことができる。また、加算および減算も、２ビット単位で実行することができ、また１ビット単位で演算を実行することもできる。除算は、被除数のビット位置を１ずつ右シフトして減算を行なう必要があり、１ビット単位で演算を実行する。この１ビット演算を実現するために、スイッチ２２５が設けられる。 By providing two full adders 210 and 211 in the computing unit 31 and performing 2-bit addition, for example, in a multiplication operation according to the Booth algorithm, partial product generation in units of 2 bits and addition with the previous partial product are performed. Can do. Also, addition and subtraction can be executed in units of 2 bits, and operations can be executed in units of 1 bit. In division, the bit position of the dividend needs to be shifted right by one and subtraction is performed, and the calculation is performed in units of one bit. In order to realize this 1-bit operation, a switch 225 is provided.

図１０は、１ビット演算操作時における演算器３１の内部接続の一例を概略的に示す図である。１ビット演算の接続時においては、Ｘレジスタ５４が、内部データ線２２６および２２８にスイッチ回路ＳＷａおよびＳＷｂを介してそれぞれ接続される。Ｘレジスタ５４の出力が、セレクタ２２７によりＤレジスタ２２２の格納データに従って選択される。スイッチ回路ＳＷａおよびＳＷｂの接続は、ポインタｐｘ（ポインタレジスタｒｘのポインタ）により決定される。 FIG. 10 is a diagram schematically showing an example of the internal connections of the arithmetic unit 31 during a 1-bit operation. In the 1-bit operation configuration, the X register 54 is connected to the internal data lines 226 and 228 via the switch circuits SWa and SWb, respectively. The output of the X register 54 is selected by the selector 227 in accordance with the data stored in the D register 222. The connection of the switch circuits SWa and SWb is determined by the pointer px (the pointer of the pointer register rx).

Ｆレジスタ２０５の格納ビット値に従って、選択反転回路２１７により、加算／減算が実現される。この選択反転器２１７の出力は、全加算器２１１の入力Ａに与えられる。全加算器２１０の入力Ｂは、内部データ線２２６に接続される。全加算器２１０のキャリー出力Ｃｏが、スイッチ回路２２５により全加算器２１１のキャリー入力Ｃｉｎと分離される。全加算器２１０のサム出力Ｓが、ゲート２２３を介して内部データ線２２６に結合される。全加算器２１０は、加算演算には用いられない。全加算器２１１のキャリー入力Ｃｉｎが、Ｃレジスタ５６にスイッチ回路２２５を介して結合される。全加算器２１１の入力Ｂは、ポインタｐｘによりスイッチ回路ＳＷｃおよびＳＷｄを介して内部データ線２２６または２２８に選択的に結合される。また、全加算器２１１のサム出力Ｓが、ゲート２２４およびスイッチ回路ＳＷｅおよびＳＷｆを介して選択的に内部データ線２２６および２２８に接続される。 Addition or subtraction is realized by the selective inversion circuit 217 in accordance with the bit value stored in the F register 205. The output of the selective inversion circuit 217 is applied to the input A of the full adder 211. The input B of the full adder 210 is connected to the internal data line 226. The carry output Co of the full adder 210 is separated from the carry input Cin of the full adder 211 by the switch circuit 225. The sum output S of the full adder 210 is coupled to the internal data line 226 via the gate 223. The full adder 210 is not used for the addition operation. The carry input Cin of the full adder 211 is coupled to the C register 56 via the switch circuit 225. The input B of the full adder 211 is selectively coupled to the internal data line 226 or 228 via the switch circuits SWc and SWd in accordance with the pointer px. The sum output S of the full adder 211 is selectively connected to the internal data lines 226 and 228 via the gate 224 and the switch circuits SWe and SWf.

減算演算を、２の補数の加算演算により行なう場合には、Ｃレジスタ５６に、初期値として“１”が格納され、Ｘレジスタ５４からのビット値が選択反転回路２１７により反転される。加算演算を行なう場合には、Ｃレジスタ５６は、初期状態として“０”にクリアされる。 When the subtraction operation is performed by the two's complement addition operation, “1” is stored in the C register 56 as an initial value, and the bit value from the X register 54 is inverted by the selective inversion circuit 217. When the addition operation is performed, the C register 56 is cleared to “0” as an initial state.
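Subtraction by two's-complement addition, as described above, can be sketched bit-serially. The function name `bit_serial_sub` is hypothetical; the subtrahend `b` plays the role of the value supplied from the X register, whose bits are inverted, and `carry` plays the role of the C register initialized to "1".

```python
def bit_serial_sub(a, b, width):
    """Compute (a - b) mod 2**width one bit per step, lower bit first."""
    carry = 1                           # C register initial value "1"
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_inv = ((b >> i) & 1) ^ 1      # selective inversion of the bit
        total = a_bit + b_inv + carry
        result |= (total & 1) << i
        carry = total >> 1
    return result


print(bit_serial_sub(5, 3, 4))  # 2
print(bit_serial_sub(3, 5, 4))  # 14, i.e. -2 in 4-bit two's complement
```

Setting the initial carry to 0 and skipping the inversion turns the same loop into the addition case.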

メモリセルマット内のエントリにおいて、内部データ線２２６および２２８に接続される領域には、連続アドレスのデータビットＡ［２ｉ］およびＡ［２ｉ＋１］が格納され、内部データ線２２６および２２８を介してＸレジスタ５４に２ビットデータを並列に転送する。ポインタｐｘ［０］の値を逐次切換えることにより、アドレスＡ［２ｉ］のエントリおよびアドレスＡ［２ｉ＋１］のエントリのデータビットについて、ビットシリアル態様で、加算を行なうことができる。 In an entry of the memory cell mat, the data bits A[2i] and A[2i+1] at consecutive addresses are stored in the regions connected to the internal data lines 226 and 228, and the 2-bit data is transferred in parallel to the X register 54 via the internal data lines 226 and 228. By sequentially switching the value of the pointer px[0], addition can be performed in a bit-serial manner on the data bits of the entry at address A[2i] and the entry at address A[2i+1].

図１１は、１ビット動作時のエントリ間データ移動（move）を行なう命令を一覧にして示す図である。このエントリ間（ＡＬＵ間）データ移動時には、ポインタレジスタｒｎが用いられる。エントリ間データ移動用ポインタレジスタｒｎの候補レジスタとして、前述のポインタレジスタｒ０−ｒ３が設けられる。 FIG. 11 is a diagram showing a list of instructions for performing data movement (move) between entries during 1-bit operation. The pointer register rn is used when data is moved between entries (between ALUs). The aforementioned pointer registers r0 to r3 are provided as candidate registers for the inter-entry data movement pointer register rn.

命令“ecm.mv.n♯n”は、定数ｎ離れたエントリｊ＋ｎのＸレジスタの格納値を、エントリｊのＸレジスタに転送することを示す命令である。 The instruction “ecm.mv.n #n” is an instruction indicating that the value stored in the X register of entry j+n, which is a constant n entries away, is transferred to the X register of entry j.

命令“ecm.mv.r rn”は、レジスタｒｎの格納値分離れたエントリｊ＋ｒｎのＸレジスタの値が、エントリｊのＸレジスタに転送される操作を示す命令である。 The instruction “ecm.mv.r rn” is an instruction indicating an operation in which the value of the X register of entry j+rn, which is separated from entry j by the value stored in the register rn, is transferred to the X register of entry j.

命令“ecm.swp”は、隣接エントリｊ＋１およびｊのＸレジスタの格納値を交換する操作を指令する命令である。 The instruction “ecm.swp” is an instruction for instructing an operation for exchanging the stored values of the X registers of the adjacent entries j + 1 and j.
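The effect of these three instructions on the X registers can be modeled as below. This is a sketch under assumptions: the Python function names are hypothetical, the wrap-around at the ends of the entry array is assumed, and pairing entries (0, 1), (2, 3), ... for the swap is one possible reading of "adjacent entries j+1 and j".

```python
def ecm_mv_n(x_regs, n):
    """X[j] <- X[j + n] for every entry j (wrap-around assumed)."""
    m = len(x_regs)
    return [x_regs[(j + n) % m] for j in range(m)]


def ecm_mv_r(x_regs, pointer_regs, rn):
    """Same move, with the distance taken from pointer register `rn`."""
    return ecm_mv_n(x_regs, pointer_regs[rn])


def ecm_swp(x_regs):
    """Exchange the X registers of entries j and j + 1 for even j (assumed)."""
    out = list(x_regs)
    for j in range(0, len(out) - 1, 2):
        out[j], out[j + 1] = out[j + 1], out[j]
    return out


x = ["a", "b", "c", "d", "e", "f"]
print(ecm_mv_n(x, 2))                  # ['c', 'd', 'e', 'f', 'a', 'b']
print(ecm_mv_r(x, {"r0": 1}, "r0"))    # ['b', 'c', 'd', 'e', 'f', 'a']
print(ecm_swp(x))                      # ['b', 'a', 'd', 'c', 'f', 'e']
```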

図１２は、２ビット動作時の、演算器におけるエントリ間データ移動（move）の操作を指令する命令を一覧にして示す図である。この２ビット操作時においては、命令記述子“ecm２”が、命令記述子“ecm”に代えて用いられる。この命令記述子“ecm２”が指定されると、２ビット単位での演算処理が指定され、ＸＨレジスタおよびＸＬレジスタ間での並列のデータ転送が行なわれる。各レジスタ間の転送内容の指定には、先の１ビット動作時と同じ命令記述子“mv.n♯n”、“mv.r rn”、および“swp”が用いられる。 FIG. 12 is a diagram showing a list of commands for instructing an operation of data movement between entries (move) in an arithmetic unit during a 2-bit operation. In the 2-bit operation, the instruction descriptor “ecm2” is used in place of the instruction descriptor “ecm”. When this instruction descriptor “ecm2” is designated, arithmetic processing in units of 2 bits is designated, and parallel data transfer is performed between the XH register and the XL register. To specify the transfer contents between the registers, the same instruction descriptors “mv.n # n”, “mv.r rn”, and “swp” as in the previous one-bit operation are used.

［実施の形態１］ [Embodiment 1]
図１３は、この発明の実施の形態１に従うＡＬＵ間相互接続スイッチ回路の配線レイアウトを概略的に示す図である。図１３において、データの移動量として、±１、±２、および±４の移動量の組を備えるエントリ間通信配線の配線レイアウトを示す。 FIG. 13 schematically shows a wiring layout of the inter-ALU interconnection switch circuit according to the first embodiment of the present invention. FIG. 13 shows a wiring layout of inter-entry communication wirings having a set of movement amounts of ±1, ±2, and ±4 as data movement amounts.

図１３において、エントリｎ＋１３からエントリｎ−１３を示す。このデータ転送配線３００は、それぞれが対応のエントリからのデータを受けて出力する出力部３０５と、転送先のエントリへのデータの送出部ＸＰ１、ＸＰ２およびＸＰ４、ＸＮ１、ＸＮ２およびＸＮ４を含む。送出部ＸＰ１、ＸＰ２およびＸＰ４は、それぞれ、＋１、＋２および＋４離れたエントリに対するデータを転送する送出部であり、送出部ＸＮ１、ＸＮ２およびＸＮ４は、それぞれ−１、−２および−４離れたエントリへデータを転送する送出部である。従って、１つのデータ転送配線３００は、９エントリにわたって延在して配置される。 FIG. 13 shows entries n+13 to n-13. Each data transfer wiring 300 includes an output unit 305 that receives and outputs data from its corresponding entry, and sending units XP1, XP2, XP4, XN1, XN2, and XN4 that deliver the data to the transfer destination entries. The sending units XP1, XP2, and XP4 transfer data to the entries +1, +2, and +4 away, respectively, and the sending units XN1, XN2, and XN4 transfer data to the entries -1, -2, and -4 away, respectively. Accordingly, one data transfer wiring 300 extends over nine entries.

このデータ転送配線３００は、エントリと交差する方向（第２の方向）において１列に整列して配置され、かつエントリの延在方向（第１の方向）においては、データ出力部３０５が１エントリずれるようにアレイ状に配置される。このデータ転送配線３００は、最遠のデータ送出部が、±４エントリ離れたエントリに対応する位置であり、自身を含んで９エントリごとに、データ転送配線３００が、第２の方向に整列して配置される。 The data transfer wiring 300 is arranged in a line in the direction intersecting the entry (second direction), and the data output unit 305 has one entry in the extending direction of the entry (first direction). They are arranged in an array so as to be displaced. In this data transfer wiring 300, the farthest data transmission unit corresponds to an entry that is separated by ± 4 entries, and the data transfer wiring 300 is aligned in the second direction every nine entries including itself. Arranged.
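Why wirings can share a row every nine entries can be checked with a small model. This is an illustrative sketch only: the notion of a "track" index, the function names, and the use of `entry % 9` as the track assignment are assumptions introduced here and are not taken from the figure.

```python
REACH = 4                  # the farthest sending unit is 4 entries away
SPAN = 2 * REACH + 1       # one wiring therefore covers 9 entries


def wire_extent(entry):
    """First and last entry covered by the wiring whose output is at `entry`."""
    return entry - REACH, entry + REACH


def track_of(entry):
    """Hypothetical track index: outputs SPAN entries apart share a track."""
    return entry % SPAN


def tracks_are_conflict_free(num_entries):
    """Check that no two wirings placed on the same track overlap."""
    by_track = {}
    for e in range(num_entries):
        by_track.setdefault(track_of(e), []).append(wire_extent(e))
    for extents in by_track.values():
        for (_, hi1), (lo2, _) in zip(extents, extents[1:]):
            if lo2 <= hi1:
                return False
    return True


print(wire_extent(10))                   # (6, 14)
print(tracks_are_conflict_free(64))      # True
```

Under these assumptions nine tracks suffice regardless of the number of entries.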

この各エントリからのデータを受けて出力するデータ出力部および転送先のエントリにデータを与えるデータ送出部を別々に設けることにより、１つのデータ転送配線３００において複数のデータ送出部を設けることにより、１つの配線で複数種類のエントリ間で他通信を実現することが出来る。また、各エントリからのデータの出力部を各転送配線３００により、１エントリ、第１の方向においてずらせて配置することにより、効率的に、データ転送路を配線することができる。すなわち、図１３に示すように、９エントリにわたって傾斜を有する対辺を有する菱形状の平行四辺形３２０を第２の方向に順次繰返し配置することにより、配線レイアウトを低減して効率的に、エントリ間通信配線を配置配線することができる。 Because the data output unit, which receives and outputs the data from each entry, is provided separately from the data sending units, which supply the data to the transfer destination entries, and a plurality of data sending units are provided on one data transfer wiring 300, a single wiring can realize several kinds of inter-entry communication. Furthermore, because the data output units of the transfer wirings 300 are offset from one another by one entry in the first direction, the data transfer paths can be routed efficiently. That is, as shown in FIG. 13, by repeatedly arranging, in the second direction, rhombus-like parallelograms 320 whose slanted opposite sides span nine entries, the wiring layout area is reduced and the inter-entry communication wirings can be placed and routed efficiently.

As described above, wirings are not laid out individually for each data transfer amount. Instead, one data transfer wiring 300 provides a set of several data movement amounts, and the arithmetic unit of each entry selectively takes in the signal from one of the sending sections. The entries can therefore transfer data in parallel, and the wiring layout area is reduced. Moreover, because this set contains several inter-entry movement amounts, movement between arbitrary entries can be achieved in a relatively small number of cycles.

For example, to move data by ±4 entries in one cycle, sending sections XP4 and XN4 are used selectively. In this case, the cyclic data movement shown in FIG. 14 is achieved. FIG. 14 assumes that 2048 entries are provided in the memory cell mat.

For example, when the data of entry 0 is moved by +4 entries, it is transferred to the X register provided for entry 2044. Here, a + movement is defined as the direction that transfers data to an entry with a smaller entry number. Likewise, when the data of entry 2047 is moved by -4 entries, it is transferred to the X register of entry 3.
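The cyclic movement described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the disclosed circuit: the function name `transfer_cycle` and the list representation of the X registers are assumptions made for the example. It follows the stated convention that a + movement sends data toward smaller entry numbers and that the entries form a ring.

```python
def transfer_cycle(x_regs, move):
    """Return the X-register contents after one parallel transfer cycle.

    Hypothetical model: a positive move sends data toward smaller entry
    numbers, and the entries form a ring, so entry i receives the value
    previously held in entry (i + move) modulo the number of entries.
    """
    n = len(x_regs)
    return [x_regs[(i + move) % n] for i in range(n)]


# Each entry initially holds its own entry number (2048 entries, as in FIG. 14).
regs = list(range(2048))
plus4 = transfer_cycle(regs, +4)
minus4 = transfer_cycle(regs, -4)
print(plus4[2044], minus4[3])  # prints "0 2047"
```

Every entry is updated in the same call, which mirrors the point that all entries transfer in parallel within one cycle.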

Thus, just like a data transfer between adjacent entries, a data transfer to an entry four entries away can be completed in one machine cycle using data transfer wiring 300. When data is transferred to a position three entries away, for example, a 2-entry move and a 1-entry move are performed in succession, in the same way as data is transferred sequentially through a shift register, so the transfer completes in two cycles. For a transfer to an entry eight entries away, the 4-entry move is repeated twice. By repeating transfers in this shift-register manner, data communication between entries separated by an arbitrary distance (number of entries) is achieved.
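One way to picture how an arbitrary distance is split into the available movement amounts is the greedy sketch below. It is an assumption-laden illustration: the helper `plan_moves` is hypothetical, it uses moves in one direction only, and it does not claim to find the minimum number of cycles (combining + and - moves, or using the ring wrap-around, could be shorter in some cases).

```python
def plan_moves(distance, moves=(4, 2, 1)):
    """Greedily split a signed entry distance into per-cycle moves.

    Hypothetical helper: 'moves' lists the movement amounts available on
    one data transfer wiring; one list element corresponds to one cycle.
    """
    sign = 1 if distance >= 0 else -1
    remaining = abs(distance)
    plan = []
    for step in sorted(moves, reverse=True):
        while remaining >= step:
            plan.append(sign * step)
            remaining -= step
    if remaining:
        raise ValueError("distance not reachable with the given moves")
    return plan


print(plan_moves(3))  # prints "[2, 1]": two cycles
print(plan_moves(8))  # prints "[4, 4]": two cycles
```

With a larger set of movement amounts passed as `moves`, the same function yields shorter plans, which is the trade-off against wiring area noted in the text.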

If the wiring layout area permits, providing a larger set of data movement amounts on one data transfer wiring 300 allows data to be moved between arbitrary entries in a relatively small number of cycles.

FIG. 15 schematically shows the wiring layout of data transfer wirings 300 for entries 0-5 and 2044-2047 at the ends of the memory cell mat. For entries 0-3, performing +1, +2, and +4 data shift transfers from data output section 305 requires shifting data toward entries 2045-2047. In this case, therefore, plus-shift wirings 330p are used to route the transfer path of each data transfer wiring 300 from one end of the entries to the other end. The plus-direction sending sections XP1, XP2, and XP4 can thus be provided for the data output sections 305 of entries 0 to 3.

On the other hand, for the data transfer wirings 300 provided for entries 2045-2047 at the other end of the memory cell mat, a transfer in the minus direction has to wrap around to the first entry 0. For this purpose, minus-shift wirings 330n are laid from the other end of the memory cell mat to the one end to realize the minus-direction data shift of each data transfer wiring 300. The minus-direction sending sections XN1, XN2, and XN4 can thus be provided for the data output sections 305 of entries 2045-2047, and every data transfer wiring 300 supports data transfer shifts of ±1, ±2, and ±4.

The data shift wirings 330p and 330n may be formed in a wiring layer different from that of the wirings forming transfer path 300. The sending sections XP1, XP2, and XP4 and XN1, XN2, and XN4 connected through shift wirings 330n and 330p may be placed in the vacant (triangular) regions at the bottom of the parallelogram region of the data transfer wirings 300.

In either case, data can be moved around entries 0 to 2047 in a ring, and data can be moved between any entries in either direction.

FIG. 16 schematically shows the configuration of inter-ALU connection circuit 65 shown in FIG. 6. In inter-ALU connection circuit 65, X register 54, XH register 220, and XL register 221 of arithmetic unit 31 are coupled to n-bit-wide transfer wiring bundles 340a, 340h, and 340l, respectively. Each of transfer wiring bundles 340a, 340h, and 340l is a bundle of data transfer wirings 300 coming from other entries. The bundles are coupled to X register 54, XH register 220, and XL register 221 through connection sections 350a, 350h, and 350l, respectively.

X register 54, XH register 220, and XL register 221 are also coupled to data transfer wirings 300a, 300h, and 300l through output nodes 305a, 305h, and 305l, respectively. Each of transfer wirings 300a, 300h, and 300l extends to the entry at the maximum data movement amount and delivers data to other entries along the way. In each of those other entries, connection sections 350a, 350h, and 350l select the wiring for the required data movement amount.

The configuration of arithmetic unit 31 is the same as that of the arithmetic unit shown in FIG. 8; corresponding portions are denoted by the same reference numerals, and their detailed description is omitted.

In connection sections 350a, 350h, and 350l, the source entry must be selected according to the data movement amount. To select that entry, a switch circuit is provided in each of connection sections 350a, 350h, and 350l.

FIG. 17 shows an example of the configuration of the switch circuit and related portions of inter-ALU connection circuit 65. X register 54 is used for 1-bit data transfer, and the XH and XL registers are used for 2-bit data transfer. The configuration shown in FIG. 17 is provided for each of the X, XH, and XL registers of the arithmetic units of the other entries as well. To simplify the drawing, however, FIG. 17 shows only the portion corresponding to X register 54.

In FIG. 17, inter-ALU connection circuit 65 includes: a switch circuit 360 that generates a signal corresponding to the signal on data transfer wiring ECM_IN_k in accordance with selection signal ECM_EN_k for another entry (k = 1, ..., 256); a selector 362 that selects, in accordance with transfer enable signal ECM_EN, either the output signal of switch circuit 360 or the data transferred from the corresponding entry; and a tri-state buffer 364 that, in accordance with transfer enable signal ECM_EN, generates output data ECM_OUT from the data held in X register 54. Tri-state buffer 364 drives output stage 305 (305a, 305h, 305l) of the data transfer wiring.
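The roles of selector 362 and tri-state buffer 364 can be summarized with a small behavioral sketch. It is a simplification under stated assumptions: the function `entry_port` and its argument names are hypothetical, signals are reduced to plain values, and the high-impedance state is represented by `None`.

```python
def entry_port(ecm_en, x_reg, switch_out, local_data):
    """Behavioral model of selector 362 and tri-state buffer 364 for one entry.

    Returns (x_input, ecm_out):
      x_input - value presented to the X register by selector 362
      ecm_out - value driven onto the transfer wiring by buffer 364,
                or None when the buffer output is high impedance
    All names are illustrative assumptions, not taken from the source.
    """
    x_input = switch_out if ecm_en else local_data
    ecm_out = x_reg if ecm_en else None
    return x_input, ecm_out


# Transfer disabled: the X register sees its own entry's data, wiring not driven.
print(entry_port(False, x_reg=1, switch_out=0, local_data=1))  # prints "(1, None)"
# Transfer enabled: the X register sees the switch output, wiring carries X.
print(entry_port(True, x_reg=1, switch_out=0, local_data=1))   # prints "(0, 1)"
```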

Switch circuit 360 includes selection switch gates PSQ256, ..., PSQ1, NSQ1, ..., NSQ256, one for each of the other entries. Each of selection switch gates PSQ256-PSQ1 and NSQ1-NSQ256 selects the signal ECM_IN_k on the corresponding data transfer wiring in accordance with entry selection signal ECM_EN_k and drives internal node ND in accordance with that signal.

Each of selection switch gates PSQ256-PSQ1 and NSQ1-NSQ256 includes N-channel MOS transistors (insulated-gate field-effect transistors) NQ1 and NQ2 connected in series between internal node ND and the ground node. The gate of MOS transistor NQ1 receives transfer input signal ECM_IN_k from the corresponding other entry, and the gate of MOS transistor NQ2 receives entry selection signal ECM_EN_k.

Therefore, in the configuration of switch circuit 360 shown in FIG. 17, the data transfer wiring transfers data between entries up to ±256 entries apart. The intermediate transfer amounts (data movement amounts) provided along the data transfer wiring may be set to successive powers of 2.

Switch circuit 360 further includes: a P-channel MOS transistor PQ1 that precharges internal node ND to the power supply voltage level in accordance with precharge instruction signal /ECM_PRC; an inverter IV1 that inverts the signal on internal node ND; a P-channel MOS transistor PQ2 that is selectively rendered conductive in accordance with the output signal of inverter IV1 to drive internal node ND to the power supply voltage level; and an inverter IV2 that inverts the output signal of inverter IV1 to generate the output signal of switch circuit 360.

MOS transistor PQ2 and inverter IV1 form a so-called "half latch". Internal node ND is precharged to the power supply voltage level during precharge. A selection switch gate PSQk or NSQk (k = 1 to 256), when selected, drives internal node ND to the ground voltage level. Selection switch gates PSQ256-PSQ1 and NSQ1-NSQ256 thus drive internal node ND in a so-called open-drain manner. Because internal node ND is only ever driven from the precharge voltage level toward the ground voltage level, data can be transferred at high speed. In addition, each selection switch gate consists of only two transistors, which reduces the circuit layout area.
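The precharge-and-discharge behavior of switch circuit 360 can be sketched as a purely logical model. This is an illustration under assumptions, not a circuit simulation: the function `switch_output` is hypothetical, levels are 1 for H and 0 for L, and the model assumes no selection switch gate is enabled while the node is being precharged.

```python
def switch_output(precharging, inputs, enables):
    """Logical model of switch circuit 360; returns the level at inverter IV2.

    inputs[k]  - level of ECM_IN_k (gate of NQ1 in the k-th selection switch gate)
    enables[k] - level of ECM_EN_k (gate of NQ2 in the k-th selection switch gate)
    Assumption: enables are inactive while precharging, so ND stays at H then.
    """
    nd = 1  # node ND precharged to the supply level through PQ1
    if not precharging:
        # A gate discharges ND only when both series transistors conduct.
        if any(i and e for i, e in zip(inputs, enables)):
            nd = 0
        # Otherwise the half latch (IV1 and PQ2) keeps ND at H.
    iv1 = 1 - nd
    iv2 = 1 - iv1
    return iv2


print(switch_output(True, [1, 0], [0, 0]))   # prints "1": precharge state
print(switch_output(False, [1, 0], [1, 0]))  # prints "0": selected input is H
print(switch_output(False, [0, 1], [1, 0]))  # prints "1": selected input is L
```

In this model the output follows node ND, so it is at the L level exactly when the selected wiring is at the H level; how that polarity is handled downstream is not modeled here.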

X register 54 enters the latch state in response to the rise of clock signal CLKX (the clock that determines the internal arithmetic processing cycle). FIG. 18 shows the waveforms for a +1 shift operation. That is, the data of the entry with entry number (i+1) is transferred to the entry with entry number i, so data is transferred between adjacent entries.

FIG. 18 is a timing chart showing the operation of the inter-ALU connection circuit shown in FIG. 17. The operation of inter-ALU connection circuit 65 shown in FIG. 17 is described below with reference to FIG. 18.

In clock cycle #1, no transfer operation is performed, and transfer enable signal ECM_EN is at the L level. In this state, selector 362 is set to select the data from the corresponding entry, and tri-state buffer 364 is in the output high-impedance state. X register 54 holds data D[i].

In clock cycle #2, transfer enable signal ECM_EN is set to the active H level. Accordingly, selector 362 is set to select the output signal of switch circuit 360, and tri-state buffer 364 is activated. Transfer output signal ECM_OUT is thereby set to the state corresponding to data D[i] held in X register 54.

When transfer output signal ECM_OUT settles, the data is sent from the output section over the corresponding data transfer wiring, and accordingly the data ECM_IN_i at each transfer data sending section settles in each entry.

While this transfer output signal is being sent, precharge instruction signal /ECM_PRC is at the L level and switch circuit 360 is in the precharge state (entry selection signal ECM_EN_+1 is inactive).

In clock cycle #2, when clock signal CLKX falls, precharge instruction signal /ECM_PRC goes to the inactive H level and, in parallel, entry selection signal ECM_EN_+1 is activated. By the time entry selection signal ECM_EN_+1 is activated, output data ECM_IN_+1, sent from adjacent entry (i+1) through its tri-state buffer 364 onto the data transfer wiring, has already settled. The output data of switch circuit 360 therefore takes the state corresponding to transfer data ECM_IN_+1.

That is, when transfer data ECM_IN_+1 from adjacent entry (i+1) is at the H level, MOS transistors NQ1 and NQ2 in selection switch gate PSQ1 both conduct, and node ND is driven to the ground voltage level. In this state, the output signal of inverter IV1 is at the H level, MOS transistor PQ2 is off, and node ND is held at the L level. When transfer data ECM_IN_+1 from adjacent entry (i+1) is at the L level, on the other hand, selection switch gate PSQ1 is non-conductive and node ND keeps its precharged voltage level. While node ND is at the H level, the output signal of inverter IV1 is at the L level, so MOS transistor PQ2 is on and node ND is held at the power supply voltage level.

Therefore, in the second half of clock cycle #2, after clock signal CLKX falls, switch circuit 360 generates data corresponding to data D[i+1] transferred from adjacent entry (i+1) over data transfer wiring ECM_IN_+1 and passes it to X register 54 through selector 362.

In the next clock cycle #3, X register 54 enters the latch state when clock signal CLKX rises, and its held data is set to the state corresponding to transfer data D[i+1]. In clock cycle #3, precharge instruction signal /ECM_PRC is again set to the active L level, and the output signal from inverter IV2 of switch circuit 360 returns to the H level. Transfer enable signal ECM_EN also goes to the inactive L level, so tri-state buffer 364 enters the output high-impedance state and selector 362 is set to transfer the data from the corresponding entry to X register 54.

X register 54 may be an ordinary latch circuit or flip-flop, as long as it latches its internal data when clock signal CLKX rises. Even if switch circuit 360 enters the precharge state and its output signal is initialized at the rise of clock signal CLKX, X register 54 reliably latches the transferred data without being affected by the initialized signal from switch circuit 360.

As shown in FIG. 18, in transfer cycle #2, data D[i] from entry i is sent from the corresponding data output section over the data transfer wiring as transmission data ECM_OUT, and in the next clock cycle #3, data D[i+1] transferred from the adjacent entry is held in X register 54.

The amount of data movement between entries is not limited to movement between adjacent entries; a single transfer operation can move data over a distance of up to 256 entries.

When the required data movement amount differs from the movement amounts provided on the data transfer wiring (for example, ±1, ±2, ±4, ..., ±256), the transfer cycle is repeated (data is transferred in a shift-register manner). This makes it possible to transfer data between entries at an arbitrary distance (for example, to communicate with an entry an odd number of entries away, such as entry +257). Data transfer between arbitrary entries can thus be achieved in a small number of cycles.

Further, in this inter-entry data transfer, all entries send and take in data in parallel, so efficient data transfer is achieved.

In switch circuit 360, the precharged internal node is driven in an open-drain manner, which reduces the number of transistors in switch circuit 360 and allows high-speed data transfer. In addition, one data transfer wiring serves data movements to a plurality of entries, so the wiring area is reduced.

[Embodiment 2]

FIG. 19 shows the configuration of the inter-ALU interconnection switch circuit according to Embodiment 2 of the present invention. Like FIG. 17, FIG. 19 shows the inter-ALU connection circuit provided for one entry. Inter-ALU connection circuit 65 shown in FIG. 19 differs from inter-ALU connection circuit 65 shown in FIG. 17 in the following respect: a selector 370, which receives the data read from the corresponding entry and the data held in X register 54, is provided between X register 54 and tri-state buffer 364. When transfer load enable signal ECM_LD_EN is activated, selector 370 selects the data read from the corresponding entry and supplies it to tri-state buffer 364. When transfer load enable signal ECM_LD_EN is inactive, selector 370 selects the data output from X register 54 and supplies it to tri-state buffer 364.
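The output path added in Embodiment 2 can be summarized with a short behavioral sketch. The function `ecm_out_embodiment2` and its argument names are hypothetical and chosen only for illustration; the high-impedance state of tri-state buffer 364 is represented by `None`.

```python
def ecm_out_embodiment2(ecm_en, ecm_ld_en, entry_data, x_reg):
    """Behavioral model of selector 370 feeding tri-state buffer 364.

    Returns the value driven onto the data transfer wiring, or None when
    the buffer is disabled (high impedance). Names are assumptions.
    """
    if not ecm_en:
        return None  # buffer 364 disabled: wiring not driven
    # Selector 370: bypass the X register when the load enable is active.
    return entry_data if ecm_ld_en else x_reg


print(ecm_out_embodiment2(True, True, entry_data=1, x_reg=0))   # prints "1"
print(ecm_out_embodiment2(True, False, entry_data=1, x_reg=0))  # prints "0"
print(ecm_out_embodiment2(False, True, entry_data=1, x_reg=0))  # prints "None"
```

With `ecm_ld_en` inactive the function reduces to the Embodiment 1 behavior, in which the wiring is driven from the X register contents.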

The rest of the configuration of inter-ALU connection circuit 65 shown in FIG. 19 is the same as that of inter-ALU connection circuit 65 shown in FIG. 17; corresponding portions are denoted by the same reference numerals, and their detailed description is omitted.

FIG. 20 is a timing chart showing the data transfer operation of the inter-ALU interconnection circuit shown in FIG. 19. Like FIG. 18, FIG. 20 shows the timing of a data transfer from adjacent entry (i+1) to entry i. The data transfer operation of the inter-ALU interconnection circuit shown in FIG. 19 is described below with reference to FIG. 20.

In clock cycle #1, precharge instruction signal /ECM_PRC is at the active L level, while transfer enable signal ECM_EN and transfer load enable signal ECM_LD_EN are at the inactive L level. Selector 362 is set to select the data from the corresponding entry, and selector 370 is set to select the data output from X register 54. Switch circuit 360 is in the precharge state, and node ND is at the power supply voltage level.

In clock cycle #2, data is read from the corresponding entry. The read is performed in the same way as in the bit-serial operation described earlier with reference to FIG. 4. That is, a word line in the memory cell mat is selected according to the address pointer, the sense amplifiers are then activated, and the data of the selected memory cell is transferred toward the arithmetic unit (a load instruction is executed).

Also in clock cycle #2, in parallel with the data read, transfer enable signal ECM_EN and transfer load enable signal ECM_LD_EN are set to the active state in synchronization with the rise of clock signal CLKX. Precharge instruction signal /ECM_PRC is active. Accordingly, the data read from the corresponding entry is selected by selector 370 and amplified by tri-state buffer 364, and data corresponding to read data D[i] is output as transfer output data ECM_OUT.

Next, still in clock cycle #2, precharge instruction signal /ECM_PRC is set to the inactive state and entry selection signal ECM_EN_+1 is set to the active state in synchronization with the fall of clock signal CLKX. Switch circuit 360 is thereby enabled and selects data D[i+1] delivered from adjacent entry (i+1) over the data transfer wiring to input ECM_IN_+1. At this time, selector 362 is set by transfer enable signal ECM_EN to select the output signal of switch circuit 360, so data corresponding to data D[i+1] from switch circuit 360 is transferred to X register 54 through selector 362.

When clock signal CLKX falls to the L level in clock cycle #2, X register 54 enters the through state and takes in the applied data. When clock signal CLKX rises to the H level in clock cycle #3, X register 54 latches the data it has taken in, and its held data settles at data D[i+1].

In this case, therefore, the data read from the memory cell mat and the inter-entry communication are both performed in clock cycle #2. The operation of first holding the transfer data in X register 54 is no longer needed, so the number of clock cycles required for inter-entry communication is reduced.

As described above, X register 54 is bypassed: the transfer data is generated from the data of the corresponding entry and sent to the data output section of the data transfer wiring path. The cycle for storing the transfer data in the X register becomes unnecessary, and high-speed inter-entry communication is achieved.

Entry designation signal ECM_EN_k, entry transfer instruction signal ECM_EN, and precharge instruction signal /ECM_PRC are generated by controller 22 shown in FIG. 1 in accordance with the instructions stored in program memory (micro-instruction memory) 21.

As for the arithmetic unit, either the configuration shown in FIG. 8, which includes the X, XH, and XL registers, or the configuration shown in FIG. 6, in which only the X register is provided for data transfer, may be used. The configuration shown in FIG. 17 or FIG. 19 is provided for the data transfer registers in the arithmetic unit.

In the inter-ALU connection circuit according to Embodiment 2 as well, when data is to be transferred to an entry that cannot be reached directly over one data transfer wiring, the data transfer operation is repeated. That is, by setting j of transfer entry selection signal ECM_EN_±j in sequence and performing the data transfer cycle (the operation of cycle #2) each time, the transferred data is shifted step by step to the target entry, so data communication between arbitrary entries is achieved.

As described above, this inter-entry communication function allows the data stored in the memory cell mat to be shifted between entries, for example as with a barrel shifter, and provides a data transfer operation equivalent to data transfer through a shift register. With this data transfer function, data copy operations and operations that use neighboring pixel data, such as filter processing over a pixel matrix in image data processing, can be executed at high speed.

When applied to a semiconductor arithmetic processing device in which a memory cell mat is divided into a plurality of entries and an arithmetic unit is provided for each entry, the present invention realizes a multi-function, high-speed arithmetic processing device that transfers data at high speed with a small wiring layout area.

When this semiconductor arithmetic processing device is applied to the processing of large amounts of data such as image data or audio data, it achieves high-speed operation in processing that uses data from several entries, such as filter processing.

この発明に従うエントリ間転送回路が適用される半導体演算処理装置の全体の構成を概略的に示す図である。1 is a diagram schematically showing an overall configuration of a semiconductor processing unit to which an interentry transfer circuit according to the present invention is applied. FIG. 図１に示す主演算回路の構成を概略的に示す図である。FIG. 2 is a diagram schematically showing a configuration of a main arithmetic circuit shown in FIG. 1. 図２に示す主演算回路における演算シーケンスを模式的に示す図である。FIG. 3 is a diagram schematically showing an arithmetic sequence in the main arithmetic circuit shown in FIG. 2. 図２に示す主演算回路における演算処理シーケンスの内部タイミングを示す図である。FIG. 3 is a diagram showing an internal timing of an arithmetic processing sequence in the main arithmetic circuit shown in FIG. 2. 図１に示すコントローラの制御態様を概略的に示す図である。It is a figure which shows roughly the control aspect of the controller shown in FIG. 図１に示す演算器の構成の一例を示す図である。It is a figure which shows an example of a structure of the calculator shown in FIG. 図６に示す演算器を用いる際のエントリ間データ移動を行なう命令を一覧にして示す図である。FIG. 7 is a diagram showing a list of commands for performing data movement between entries when the arithmetic unit shown in FIG. 6 is used. 図１に示す演算器の他の構成を概略的に示す図である。It is a figure which shows schematically the other structure of the calculator shown in FIG. 図８に示す演算器の２ビット演算時の接続を示す図である。It is a figure which shows the connection at the time of 2-bit calculation of the calculator shown in FIG. 図８に示す演算器の１ビット演算時の内部接続を示す図である。It is a figure which shows the internal connection at the time of 1 bit calculation of the calculator shown in FIG. 図８に示す主演算器を用いるエントリ間データ移動を指定する命令を一覧して示す図である。FIG. 9 is a diagram showing a list of instructions for designating data movement between entries using the main arithmetic unit shown in FIG. 8. 図８に示す演算器の２ビット動作時のエントリ間データ移動の命令を一覧して示す図である。FIG. 9 is a diagram showing a list of data movement instructions between entries when the arithmetic unit shown in FIG. 
FIG. 13 is a diagram schematically showing the wiring of the inter-ALU interconnection switch circuit according to Embodiment 1 of the present invention.
FIG. 14 is a diagram listing the register values of the X register during data transfer in the wiring layout shown in FIG. 13.
FIG. 15 is a diagram schematically showing the data transfer paths between entries at both ends of the memory cell mat in the wiring layout shown in FIG. 13.
FIG. 16 is a diagram schematically showing the wiring connections between the inter-ALU interconnection switch circuit and the arithmetic units according to Embodiment 1 of the present invention.
FIG. 17 is a diagram schematically showing the configuration of the inter-ALU connection circuit 65 shown in FIG. 16.
FIG. 18 is a timing diagram showing the operation of the inter-ALU connection circuit shown in FIG. 17.
FIG. 19 is a diagram schematically showing the configuration of the inter-ALU connection circuit (inter-entry transfer circuit) according to Embodiment 2 of the present invention.
FIG. 20 is a timing diagram showing the operation of the inter-ALU connection circuit shown in FIG. 19.

Explanation of symbols

1: arithmetic processing system; 2: system LSI; FB1-FBh: basic arithmetic blocks; 20: main arithmetic circuit; 30: memory cell mat; 31: arithmetic unit (ALU); 32: inter-ALU interconnection switch circuit; 22: controller; 54: X register; 65: inter-ALU connection circuit; 300: data transfer wiring; XP1, XP2, XP4, XN1, XN2, XN4: transfer data sending units; 320: unit wiring area; 330p, 330n: wiring for shift operation between the ends of the memory cell entries; 305a, 305h, 305l: transfer data output units; 340a, 340h, 340l: transfer wiring bundles; 350a, 350h, 350l: transfer data input units; 360: switch circuit; 362: selector; 364: tristate buffer; PSQ256, PSQ1, NSQ1, NSQ256: selection switch gates; 370: selector.

Claims

1. A semiconductor arithmetic processing device comprising:
a memory array divided into a plurality of entries, each entry having a plurality of memory cells;
a plurality of arithmetic units arranged corresponding to the entries, each performing arithmetic processing on given data; and
a transfer circuit for transferring data between the plurality of entries,
wherein the transfer circuit includes a plurality of transfer wiring paths arranged corresponding to the entries, each transferring the data of the corresponding entry to one of a plurality of different entries, and
each of the transfer wiring paths includes a data output unit that receives the data of the corresponding entry and a plurality of data sending units coupled respectively to the plurality of different entries, the transfer data being transferred from the data output unit toward the data sending units.
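The transfer wiring paths recited above can be pictured with a small behavioral model. The following is only a sketch under stated assumptions: the offset set (±1, ±2, ±4, suggested by the labels XP1 to XN4 in the explanation of symbols), the wrap-around at the ends of the array, and the names `OFFSETS`, `build_paths`, and `transfer` are all hypothetical and are not taken from the claim.

```python
# Behavioral sketch of per-entry transfer wiring paths.
# ASSUMPTIONS (not stated in the claim): offsets of +/-1, +/-2, +/-4 entries
# and wrap-around between the two ends of the entry array.
OFFSETS = (+1, +2, +4, -1, -2, -4)


def build_paths(num_entries, offsets=OFFSETS):
    """Map each source entry to the entries reached by its sending units."""
    return {
        src: [(src + d) % num_entries for d in offsets]
        for src in range(num_entries)
    }


def transfer(data, offset):
    """Move every entry's data to the entry `offset` positions away.

    All entries transfer at once, as one parallel shift of the whole array.
    """
    n = len(data)
    out = [None] * n
    for src, value in enumerate(data):
        out[(src + offset) % n] = value
    return out


paths = build_paths(16)
shifted_by_two = transfer(list(range(8)), +2)
shifted_back_one = transfer(list(range(8)), -1)
```

In this model, `paths[0]` lists the six destination entries wired to entry 0, and `transfer` shows that a single selected offset moves the data of all entries simultaneously.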

2. A semiconductor arithmetic processing device in which the plurality of transfer wiring paths are arranged in a plurality of columns in the second direction such that the data output units are aligned in the second direction and each data output unit is shifted by one entry in the first direction.

3. The semiconductor arithmetic processing device according to claim 1, wherein
in the transfer circuit, the sending units of a plurality of transfer wiring paths are arranged corresponding to one entry,
each of the arithmetic units includes a register circuit, and
the transfer circuit further includes:
a switch circuit that is arranged corresponding to each of the entries, is coupled to the sending units of the corresponding plurality of transfer wiring paths, selects one sending unit according to a selection signal, and generates a signal corresponding to the signal transmitted on the transfer wiring path of the selected sending unit;
a selection circuit that receives the data from the corresponding entry and the output data of the switch circuit, transfers the output data of the switch circuit to the register circuit when a transfer instruction is given, and transfers the data from the corresponding entry to the register circuit when no transfer instruction is given; and
a transmission circuit that transmits the data held in the register circuit to the output unit of the corresponding transfer path in accordance with the transfer instruction.

4. The semiconductor arithmetic processing device according to claim 1, wherein
in the transfer circuit, the sending units of a plurality of transfer wiring paths are arranged corresponding to one entry,
each of the arithmetic units includes a register circuit, and
the transfer circuit further includes:
a switch circuit that is arranged corresponding to an entry, is coupled to the sending units of the corresponding plurality of transfer wiring paths, selects one sending unit according to a selection signal, and generates a signal corresponding to the transmission signal of the selected sending unit;
a first selection circuit that receives the data from the corresponding entry and the output data of the switch circuit, transfers the output data of the switch circuit to the register circuit when a transfer instruction is given, and transfers the data from the corresponding entry to the register circuit when no transfer instruction is given;
a second selection circuit that receives the data held in the register circuit and the data from the corresponding entry, and transfers the data from the corresponding entry in accordance with activation of a selection transfer instruction signal; and
a transmission circuit that transmits the data supplied from the second selection circuit to the output unit of the corresponding transfer path in accordance with the transfer instruction.
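The interplay of the switch circuit, the selection circuits, and the register circuit can be illustrated with a one-cycle behavioral model. This is a minimal sketch, not the claimed circuit: the function name `step`, its parameters, the list-based representation of registers, and the wrap-around indexing are assumptions made for illustration.

```python
def step(registers, entry_data, offset, transfer, send_from_entry=False):
    """Return the next register contents for all entries after one cycle.

    registers       -- data currently held in each entry's register circuit
    entry_data      -- data read from each entry's memory cells
    offset          -- signed entry distance chosen by the selection signal
    transfer        -- True when the transfer instruction is given
    send_from_entry -- hypothetical flag modelling the second selection
                       circuit: drive entry data instead of register data
    Wrap-around at the array ends is an ASSUMPTION of this sketch.
    """
    n = len(registers)
    if not transfer:
        # No transfer instruction: the selection circuit loads the
        # corresponding entry's own data into the register.
        return list(entry_data)
    # Transmission circuit: each entry drives its transfer wiring path with
    # either the held register data or, if selected, the entry data.
    driven = entry_data if send_from_entry else registers
    # Switch circuit: each destination picks the sending unit that belongs
    # to the source located `offset` entries away.
    return [driven[(dst - offset) % n] for dst in range(n)]


regs = [10, 20, 30, 40]
mem = [1, 2, 3, 4]
loaded = step(regs, mem, +1, transfer=False)
moved = step(regs, mem, +1, transfer=True)
moved_direct = step(regs, mem, +1, transfer=True, send_from_entry=True)
```

With `send_from_entry` left at `False` the model follows the single-selector arrangement, in which only register contents are sent; setting it to `True` corresponds to the arrangement with a second selection circuit, in which entry data can be sent without first being loaded into the register.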

5. The semiconductor arithmetic processing device according to claim 3 or 4, wherein the switch circuit includes:
a precharge element that precharges an internal node in accordance with activation of a precharge instruction signal; and
a plurality of selection switch gates that are provided corresponding to the respective sending units and selectively drive the internal node to a voltage level different from the precharge voltage in accordance with an entry selection signal and a signal from the corresponding sending unit.

6. The semiconductor arithmetic processing device according to claim 5, wherein each of the selection switch gates comprises first and second transistor elements that are connected in series between the internal node and a node supplying a reference potential, and that receive the entry selection signal and the signal from the corresponding sending unit at their respective gates.
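The precharge-and-discharge selection described for the switch circuit can be sketched as a logic-level model. The sketch assumes a high precharge level, a ground reference potential, a one-hot entry selection signal, and an inverting readout of the internal node; none of these details is fixed by the claims, and the function names are hypothetical.

```python
def internal_node(select, signals):
    """Logic level of the internal node after precharge and evaluation.

    select  -- entry selection signals, one per sending unit (assumed one-hot)
    signals -- signal levels arriving from the corresponding sending units
    ASSUMPTIONS: the node is precharged to 1 and the reference potential is 0.
    """
    node = 1  # precharge element has charged the internal node
    for sel, sig in zip(select, signals):
        # A selection switch gate is two transistors in series: the node is
        # pulled to the reference potential only when both are conducting.
        if sel and sig:
            node = 0
    return node


def selected_bit(select, signals):
    """Recover the selected sending unit's bit (assumed inverting readout)."""
    return 1 - internal_node(select, signals)


incoming = [1, 0, 1]
bit_from_second = selected_bit([0, 1, 0], incoming)
bit_from_third = selected_bit([0, 0, 1], incoming)
```

Because only the gate whose selection signal is active can discharge the node, the structure behaves as a multiplexer over the sending units without a dedicated driver per input.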