JPH07182157A - Digital signal processor

Info

Publication number: JPH07182157A
Application number: JP6258083A
Authority: JP
Inventors: Atsumichi Murakami; 篤道村上; Isao Uesawa; 功上澤; Masatoshi Kameyama; 正俊亀山
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1994-10-24
Filing date: 1994-10-24
Publication date: 1995-07-21

Abstract

PURPOSE: To transfer data to a module at high speed by having a direct memory transfer control part control block-by-block data input/output between an external data memory connecting part and an internal data memory via a direct memory transfer bus. CONSTITUTION: Input data signals are received through plural data input buses 104 and stored in the internal data memories 107 and 108. The stored data signals are then processed by an arithmetic part 106 and an address generating part 103. Meanwhile, an external data memory 105 is connected to a read/write port of an external data memory connecting part 111. Furthermore, the memory 105 is connected to a direct memory transfer bus 110 via a direct memory transfer control part 109, which controls block-by-block data input/output between the part 111 and both memories 107 and 108.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a digital signal processor that mainly executes arithmetic processing on signal sequences.

[0002]

[Prior Art] FIG. 7 is a block diagram showing the configuration of DSSP1 (Digital Speech Signal Processor 1), a conventional digital signal processor presented, for example, in the symposium proceedings (No. S10-1) of the 1986 (Showa 61) National Convention of the Communications Division of the Institute of Electronics and Communication Engineers. In the figure, 1 is a program counter PC with a built-in stack that controls instruction addresses; 2 is an instruction mask ROM storing microinstructions; 3 is an instruction register IR0 that receives, one word per machine cycle, a microinstruction from the instruction mask ROM 2 or from outside; 4 is an instruction register IR1 that receives only those bit fields of the microinstruction in the instruction register IR0 3 that require decoding; 5 is an instruction decoder that decodes the microinstruction input to the instruction register IR1 4; 6 is a program bus P-Bus that distributes microinstructions to each functional unit; 7 is a register BI that receives the immediate value (18 bits wide) in the microinstruction output from the program bus P-Bus 6 and outputs it to the data bus D-Bus 8; 8 is an 18-bit-wide data bus D-Bus used for internal data transfers accompanying operations; 9 is a register AM that receives the data memory address mode designation from the program bus P-Bus 6; 10 is a 4-word × 16-bit register AD that holds address pointer information used for indirect address generation; 11 is a 3-bit page register PR that designates the page of the external data memory; 12 is a 9-bit address arithmetic unit AAU capable of generating up to three addresses simultaneously; 13 is an address register AR0; 14 is an address register AR1; 15 is an address register AR2; 16 is an address selector RAS; 17 is a loop counter LC; 18 is a status register SR that indicates the operating mode and state of the processor; 19 is a DMA control unit that transfers data directly between the serial I/O ports SI0/1, SO0/1 32 and the external data memory; 20 is an address register AR that holds the 12-bit address output to the external data memory; 21 is a dual-port internal data memory 2P-RAM with a capacity of 512 words × 18 bits that can read or write two data items simultaneously; 22 is a register DP0 that holds the input data to be operated on; 23 is a register DP1 that holds the operation input data; 24 is a multiplier FMPL that performs floating-point multiplication in the 12E6-bit format; 25 is a register P that holds the result of the multiplier FMPL 24; 26 is a selector; 27 is a selector; 28 is a floating-point arithmetic logic unit FALU that mainly executes floating-point operations in the 12E6-bit format; 29 is a set of 4-word × 18-bit accumulators ACC0 to ACC3 that hold the output of the floating-point arithmetic logic unit FALU 28 and are used for accumulation and the like; 30 is a data register DR connected to the data bus D-Bus 8 to temporarily hold read/write data for the external data memory; 31 is a read/write control circuit R/W Cont for the external data memory; 32 is the serial I/O ports SI0/1, SO0/1 that perform full-duplex, two-channel serial data transfer with external devices; 33 is an interrupt control circuit Int. Cont.; 34 is an external data memory bus control circuit Bus Cont.; 35 is a clock control circuit CLK Cont. that controls internal timing; and 36 is a selector.

[0003] FIG. 8 is a time chart explaining the microinstruction execution sequence of the digital signal processor DSSP1 shown in FIG. 7. In the figure, 40 is the cycle timing, which consists of a four-phase clock; 41 is the fetch stage timing, the stage in which the program counter PC 1 outputs an address and a microinstruction is input to the instruction register IR0 3; 42 is the decode stage timing, in which the microinstruction input to the instruction register IR1 4 is decoded by the instruction decoder 5; 43 is the timing at which the address arithmetic unit AAU 12 is updated in the decode stage; 44 is the timing at which the floating-point multiplier FMPL 24 operates; 45 is the timing at which the floating-point arithmetic logic unit FALU 28 performs an operation; 46 is the timing of data transfers between registers via the data bus D-Bus 8; and 47 is the timing of reads from and writes to the external data memory via the data register DR 30.

[0004] FIG. 9 shows the structure of the microinstructions of the digital signal processor DSSP1 shown in FIG. 7, which are 32 bits wide per word and are classified into four groups. 50 is a sequence instruction that controls the instruction operation procedure; 51 is a mode instruction that sets modes and initial values for the status register SR 18, the address arithmetic unit AAU 12, and the DMA control unit 19; 52 is an operation instruction that mainly controls execution on the floating-point arithmetic logic unit FALU 28 and the accompanying parallel data transfers; and 53 is a load instruction that loads an immediate value into any register or the data memory.

[0005] Next, the operation will be described. For simplicity, each part is referred to below by the abbreviation used in the description above. First, the overall operation is outlined with reference to FIG. 7. This signal processor has separate P-Bus 6 and D-Bus 8, and uses pipelining to process in parallel the input of a microinstruction to IR0 3, the transfer of the microinstruction over P-Bus 6, the decoding of the microinstruction by the instruction decoder 5, and the execution of the instruction by D-Bus 8, FMPL 24, FALU 28, and so on. Every execution unit, including D-Bus 8 and 2P-RAM 21, is register-based; that is, all inputs and outputs are connected to registers. As for access timing, a register drives its output at the leading edge of the machine cycle and is loaded at the trailing edge of the machine cycle. In other words, the data actually processed is not the content set in the register by the same microinstruction, but the content set by a microinstruction one or more instructions earlier. This is called delayed operation; separating the parts of the arithmetic section with registers allows them to operate in parallel. For example, in this processor FMPL 24 always performs one floating-point multiplication per machine cycle. To supply operands to it, data is first set in DP0 22 and DP1 23 by the preceding microinstruction, and the product is then obtained by reading the content set in P 25 with a microinstruction one or more instructions later. Because DP0 22, DP1 23, and P 25 hold the data until it is read, a single multiplication, which inherently requires three microinstructions (data input, multiplication, and data output), can effectively be performed once per microinstruction when multiplications are processed back to back.
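The delayed operation described above can be illustrated with a minimal sketch. It models only the register behaviour stated in the text (operands latched in DP0/DP1, product latched in P one cycle later, result read one cycle after that); the function name and the use of Python integers instead of 12E6 floating-point values are assumptions made for illustration.

```python
def run_multiplier_pipeline(pairs):
    """Simulate a free-running multiplier with latched inputs (DP0, DP1) and output (P)."""
    dp0 = dp1 = p = None          # input latches and product register, initially empty
    results = []
    cycles = 0
    # Feed one operand pair per cycle, then two idle cycles to drain the pipeline.
    for pair in list(pairs) + [None, None]:
        cycles += 1
        # Leading edge: each unit sees what was latched in earlier cycles.
        if p is not None:
            results.append(p)                            # read P, set one cycle ago
        new_p = dp0 * dp1 if dp0 is not None else None   # multiply the latched operands
        # Trailing edge: all registers are loaded together.
        p = new_p
        dp0, dp1 = pair if pair is not None else (None, None)
    return results, cycles


results, cycles = run_multiplier_pipeline([(2, 3), (4, 5), (6, 7)])
print(results, cycles)
```

With three operand pairs the sketch needs five cycles in total, that is, one product per cycle plus two cycles of latency, which matches the "three microinstructions for one multiplication, one per microinstruction when continuous" behaviour in the text.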

[0006] In DSSP1, FMPL 24 and FALU 28 are connected via P 25, and FALU 28 is configured so that the contents of P 25 can be accumulated in ACC0 to ACC3 29. Like the multiplier-accumulator pair shown in Louis Schirm's article "Packing a signal processor onto a single digital board", published in the December 20, 1979 issue of Electronics, this arrangement is intended to execute, in one machine cycle, one term of the sum-of-products operation that is heavily used in filtering, FFT (Fast Fourier Transform) butterfly operations, and the like. The sum of products follows, for example, the equation below.

[0007]

[Equation 1]

[0008] In this processor, one term of the sum of products requires three microinstructions: data input to DP0 22 and DP1 23, multiplication in FMPL 24, and accumulation in FALU 28 of the product set in P 25 with ACC0 to ACC3 29. Of course, when terms are processed continuously, one term of the sum of products can effectively be computed once per microinstruction. To do so, the two input data corresponding to a_i and b_i in the above equation must be input to DP0 22 and DP1 23 on every microinstruction. For this reason, 2P-RAM 21 is able to supply these two input data, and, to avoid contention on D-Bus 8, a path is provided that transfers data read from 2P-RAM 21 directly to DP0 22 and DP1 23 without passing through D-Bus 8. Mainly in order to address these two inputs in 2P-RAM 21, AAU 12 has means for selecting and outputting two of the 9-bit address data output via AR0 13, AR1 14, and AR2 15. AAU 12 is configured so that up to three addresses can be specified simultaneously only in the case of two input data addresses for 2P-RAM 21 plus one output data address for the external data memory via DR 30 and AR 20. All addressing uses only the so-called indirect addressing method, based on address pointers set inside AAU 12. For AR0 13, increment, modulo, bit reverse, repeat, increment base address, and update of the increment value are available; the other registers, AR1 14 and AR2 15, support only a simple increment. AAU 12 can perform address arithmetic only in 9-bit natural binary form; when the 12-bit external data memory address is specified, these 9 bits are combined with the 3-bit memory page designated by PR 11 to form 12 bits. On the other hand, because FMPL 24 and FALU 28 operate in the 12E6 normalized floating-point format, 2P-RAM 21, DP0 22, DP1 23, P 25, ACC0 to ACC3 29, DR 30, D-Bus 8, and BI 7 are all 18 bits wide, and a special operation mode is required to compute a special address initial value in FALU 28. Consequently, there is no data compatibility between AR0 13, AR1 14, AR2 15, AR 20 and the operation result data set in ACC0 to ACC3 29.
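As a rough illustration of the addressing described above, the sketch below combines a 3-bit page with a 9-bit address into a 12-bit external address and shows a 9-bit bit-reversed index of the kind used for FFT addressing. The text does not say which bits the page occupies; placing it in the upper three bits is an assumption, as are the function names.

```python
def external_address(page, offset):
    """Combine a 3-bit page (PR) and a 9-bit AAU address into a 12-bit address.

    Assumption: the page supplies the three most significant bits.
    """
    if not 0 <= page < 8:
        raise ValueError("page must fit in 3 bits")
    if not 0 <= offset < 512:
        raise ValueError("offset must fit in 9 bits")
    return (page << 9) | offset


def bit_reverse(value, width=9):
    """Reverse the lowest `width` bits of value (bit-reversed addressing)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)   # move the lowest bit to the other end
        value >>= 1
    return result


print(external_address(7, 511))   # highest 12-bit address under this assumption
print(bit_reverse(1))             # lowest bit becomes the highest of 9 bits
```

Because the address unit is limited to 9-bit natural binary values, any address derived from an 18-bit floating-point result would first have to be converted to this form, which is the incompatibility the paragraph points out.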

[0009] The DMA control unit 19 transfers data between the input/output data of the full-duplex serial I/O ports SI0/1, SO0/1 32 (two channels in total) and the external data memory, independently of microinstructions. Because data transfer by the DMA control unit 19 uses D-Bus 8, AR 20, and DR 30, there is a risk of contention for these internal resources with microinstruction operations controlled by the instruction decoder 5. To avoid this, during data transfer by the DMA control unit 19 the instruction decoder 5 is halted for six machine cycles per word, which stops operation by microinstructions. To summarize, DSSP1 can execute the following operations in parallel within one microinstruction:
(1) Up to three 9-bit address computations by AAU 12.
(2) 12E6 floating-point multiplication by FMPL 24.
(3) 12E6 floating-point operation by FALU 28.
(4) Data transfer between 2P-RAM 21 and the external data memory via D-Bus 8 and DR 30.
(5) DMA data transfer between the two-channel full-duplex serial I/O ports SI0/1, SO0/1 32 and the external data memory via D-Bus 8 and DR 30.
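The cost of halting the instruction decoder can be estimated with simple arithmetic. The sketch below uses the six machine cycles per word stated above and the nominal 50 ns machine cycle quoted elsewhere in this description; the function name and default arguments are illustrative assumptions.

```python
def dma_stall_ns(words, stall_cycles_per_word=6, cycle_ns=50):
    """Return the time in ns for which microinstruction execution is halted by DMA."""
    if words < 0:
        raise ValueError("words must be non-negative")
    return words * stall_cycles_per_word * cycle_ns


print(dma_stall_ns(1))     # one word
print(dma_stall_ns(256))   # a 256-word block
```

Under these figures every transferred word removes six instruction slots from the program, which is why DMA in the conventional design competes directly with computation.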

[0010] Next, the microinstruction execution timing of DSSP1 is described with reference to FIG. 8. The machine cycle 40 of DSSP1 operates on four phase timings, P0 to P3, obtained by dividing one machine cycle into four, and the cycle time of one machine cycle is nominally 50 ns, which is fast. It is therefore difficult in practice to perform, within one machine cycle, all three of the following: reading a microinstruction from the instruction mask ROM 2, decoding the microinstruction in the instruction decoder 5, and executing the instruction with internal resources such as FMPL 24 and FALU 28. DSSP1 therefore divides these three operations into stages of one machine cycle each, forming a three-stage pipeline to achieve high-speed operation. The following is performed in each stage of this three-stage pipeline:
(1) Fetch stage 41: output of the microinstruction address by PC 1 and reading of the microinstruction from the instruction mask ROM 2; setting of the microinstruction into IR0 3.
(2) Decode stage 42, 43: transfer of the microinstruction from IR0 3 to IR1 4 and decoding of the microinstruction by the instruction decoder 5; setting of the program control mode; transfer of the microinstruction from IR0 3 to P-Bus 6 and address computation in AAU 12 via AM 9 and AD 10.
(3) Execution stage 44, 45, 46, 47: data operations by FMPL 24 and FALU 28; data transfer over D-Bus 8; external data memory access via AR 20 and DR 30, and so on.
As a result, DSSP1 requires three machine cycles to execute one microinstruction. Pipelining, however, effectively allows one microinstruction to be executed every machine cycle. There is consequently a delay of two machine cycles between the time a microinstruction is read from the instruction mask ROM 2 and the time it is actually executed. This is why the internal bus is split into P-Bus 6 and D-Bus 8, and why the instruction mask ROM 2 and 2P-RAM 21 are correspondingly separate: the aim is to completely prevent timing conflicts over internal resources. With branch instructions and the like, however, the branch actually takes place in the decode stage (2), so the microinstruction being set into IR0 3 at that moment would be executed. In other words, the instruction written immediately after a branch instruction would be executed unconditionally. To avoid this, DSSP1 automatically changes the next instruction to a NOP (no operation) while a branch instruction is being executed. This function is intended to simplify the writing of microinstructions, but a branch costs one machine cycle, and an indirect branch using D-Bus 8 costs two machine cycles. In general, by taking the order of instructions into account, about 80% of unconditional branches cause no problem even if the next instruction is executed, so this loss could be avoided; in DSSP1, however, this is not possible.
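The timing consequences of the three-stage pipeline can be summarized in a small cycle-count model. It is only a sketch under stated assumptions: each instruction occupies one issue slot, the pipeline adds two cycles of latency overall, a direct branch loses one cycle, and an indirect branch over the D-Bus loses two. The instruction-kind labels are hypothetical names introduced for this example.

```python
# Extra machine cycles lost per instruction kind, as described in the text.
BRANCH_PENALTY = {"op": 0, "branch": 1, "indirect_branch": 2}


def total_cycles(instructions):
    """Estimate machine cycles needed to run a straight list of instruction kinds."""
    if not instructions:
        return 0
    pipeline_latency = 2   # fetch and decode precede the first execution
    penalties = sum(BRANCH_PENALTY[kind] for kind in instructions)
    return len(instructions) + pipeline_latency + penalties


program = ["op", "op", "branch", "op"]
print(total_cycles(program))
```

A design with a usable delay slot would recover most of the one-cycle branch penalty whenever the following instruction is safe to execute, which is the roughly 80% case the text mentions.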

[0011] Next, the microinstruction set of DSSP1 is described with reference to FIG. 9. There are only four kinds of microinstruction: sequence, mode, operation, and load. Sequence instructions control branches, loops, and subroutine calls, and mainly act on PC 1. Mode instructions set initial values and modes for AAU 12, selector 16, LC 17, SR 18, and the DMA control unit 19. Load instructions load an immediate value (18 bits wide) via BI 7 into a register connected to D-Bus 8. For these microinstructions, the resources operated on are fixed by the instruction. Operation instructions, on the other hand, must directly specify all of the internal resources that can operate in parallel as described above. Operation instructions therefore have the greatest bit length, and DSSP1 uses 32-bit-wide horizontal microinstructions. FMPL 24 is free-running and, as noted above, is not directly controlled by instructions. The operation of FALU 28 is specified directly by the instruction; examples include the following:
(1) Absolute value |X|
(2) Sign correlation Sign(Y)·X
(3) Addition X + Y
(4) Subtraction X − Y
(5) Maximum MAX(X, Y)
(6) Minimum MIN(X, Y)
(7) Fixed-to-floating conversion FLT(X)
(8) Floating-to-fixed conversion FIX(X)
(9) Shift R1, L1 to L8
(10) Logical AND, OR, EOR, NOT
(11) Mantissa addition X_M + Y_M
(12) Mantissa subtraction X_E + Y_E
The problem here is that DSSP1 is based on floating-point arithmetic, whereas logical and address operations are fixed-point operations. As noted above, the two are not compatible; for example, to address memory using an operation result, instruction (8) must be executed in FALU 28. Moreover, since data is rarely input or output in floating point in general signal processing, instruction (7) or (8) must be executed to convert the data at every data input or output.

[0012] The next problem is that bits are always truncated when floating-point data is normalized. Because a signal processor has finite arithmetic precision, arithmetic errors are unavoidable. If this is handled by truncation alone, however, then in terms of absolute value the result is always smaller than the true value, and the error is not randomized. The error can easily be made negligible by increasing the arithmetic word length, but there is a limit to this because ordinary signal processors are required to operate at high speed. The problem cannot be ignored, particularly in IIR (recursive) digital filters and in image signal processing that involves inter-frame processing, and in DSSP1 the result must be rounded (to nearest) using logical operation instructions or the like. Furthermore, in general signal processing algorithms the arithmetic precision is often specified differently for each unit of processing, and that precision does not necessarily match the arithmetic word length of the signal processor. In that case, format conversion of the operation data must be repeated using FALU 28 for each unit of processing.
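The one-sided error of truncation can be seen in a small numerical sketch. It quantizes non-negative magnitudes to integers, once by truncation and once by rounding half up, and compares the mean error. The test values (multiples of 1/8) and the helper names are assumptions chosen so the arithmetic is exact in binary floating point; they do not model the 12E6 format itself.

```python
import math


def mean_error(quantize, values):
    """Average of (quantized value - true value) over the given samples."""
    return sum(quantize(v) - v for v in values) / len(values)


def truncate(v):
    """Drop the fractional bits of a non-negative magnitude."""
    return math.floor(v)


def round_half_up(v):
    """Round a non-negative magnitude to the nearest integer, ties upward."""
    return math.floor(v + 0.5)


values = [i / 8 for i in range(80)]           # magnitudes 0, 0.125, ..., 9.875
trunc_bias = mean_error(truncate, values)      # always <= 0: results too small
round_bias = mean_error(round_half_up, values) # errors of both signs mostly cancel
print(trunc_bias, round_bias)
```

In this sketch truncation leaves a consistently negative mean error, whereas rounding leaves a much smaller residual, which illustrates why the accumulated error matters in recursive filters and inter-frame processing.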

[0013] The next problem is that in DSSP1 the operations that can be processed at high speed are limited to the sum-of-products operation described above. This was sufficient for the FFT and FIR filters, which are the typical traditional signal processing algorithms. Recent signal processing algorithms, however, also require high-speed computation of the degree of similarity, that is, the distance, between vectors A and B, for example as expressed by the following equation.

[0014]

[Equation 2]

[0015] DSSP1 does not support such an operation, and it must be decomposed entirely into individual basic arithmetic operations, so three separate operations must be executed to compute one term. If the result of the above equation is computed term by term, the delays mean that 3 × 3 = 9 instructions are needed per term, and the degree of processing parallelism falls drastically. Of course, by using 2P-RAM 21 to save intermediate results, the parallelism can be raised by splitting the work into a difference pass and a square-and-accumulate pass, but it then becomes difficult to use the limited data memory space effectively, and multiple sets of data cannot be handled.

[0016] Consider, for example, a binary tree search as shown in FIG. 10. Here, the input vector A is set in 2P-RAM 21, and the tree-structured reference vectors B, one for each numbered node in the figure, are arranged in the external data memory as shown in FIG. 11. The evaluation function representing the degree of similarity between the input vector A and a reference vector B is given by the following expression.

[0017]

[Equation 3]

[0018] With this as the evaluation function, the reference vector that minimizes it is selected at each stage of the binary tree, and the reference vector with the highest degree of similarity is finally obtained. At each stage, when the current node number is n, the degree of similarity is computed against the two reference vectors B at nodes 2n+1 and 2n+2, and the node number of the reference vector to be compared at the next stage is calculated from the result. Implementing this processing on DSSP1 requires the following numbers of instruction steps:
- Conversion of input data: N + 2 steps
- Evaluation value calculation for one vector: 9N + 2 steps
- Rounding of the evaluation value: about 3 steps
- Comparison of evaluation values: 4 steps
- Address calculation for the reference vector of the next node: about 9 steps
- Total: 18N + 14 steps per stage, plus N + 2 steps
This is about nine times the number of steps that would be needed if the evaluation value calculation took the ideal 2N steps and no conversion of addresses or input data were necessary. Moreover, in this kind of processing the same operation does not repeat consecutively, so the programmer must always keep the ordering of instructions in mind. It is clear that this not only degrades processing efficiency considerably but also makes programming very cumbersome, which is a problem in terms of software development effort.
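The search procedure and the step count above can be sketched as follows. Equation 3 is not reproduced in this text, so the evaluation function here (sum of squared differences) is an assumption, as are the function names, the list-based node layout, and the reading of the total as (18N + 14) steps per stage plus N + 2 steps.

```python
def distortion(a, b):
    """Assumed evaluation function: sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tree_search(a, nodes, stages):
    """Walk a binary tree stored by node number; children of n are 2n+1 and 2n+2."""
    n = 0
    for _ in range(stages):
        left, right = 2 * n + 1, 2 * n + 2
        # Keep the child whose reference vector is closer to the input vector.
        n = left if distortion(a, nodes[left]) <= distortion(a, nodes[right]) else right
    return n


def dssp1_steps(n_dim, stages):
    """Instruction steps quoted for DSSP1: (18N + 14) per stage plus N + 2."""
    return stages * (18 * n_dim + 14) + n_dim + 2


# Node 0 is only the starting point, so it carries no reference vector here.
nodes = [None, [0, 0], [10, 10], [0, 1], [2, 2], [9, 9], [11, 11]]
print(tree_search([1, 1], nodes, stages=2))
print(dssp1_steps(n_dim=16, stages=2))
```

The address of the next pair of reference vectors depends on the comparison result, so each stage mixes floating-point evaluation with fixed-point address arithmetic, which is where the conversion overhead described in the text arises.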

[0019]

[Problems to Be Solved by the Invention] Because the conventional digital signal processor is configured as described above, it has, for example, the following problems.
- The two inputs and one output of an operation cannot all be read from or written to the data memory at the same time, so efficiency falls drastically in, for example, the processing of vector data.
- The indirect addressing mode cannot be specified on the fly within an instruction, so processing must be interrupted every time the address mode is changed.
- Data cannot be transferred at high speed to external modules such as an external memory, so data transfer to and from other modules takes too long.

[0020] The present invention has been made to solve the above problems, and its object is to obtain a digital signal processor that is highly flexible and can transfer data to other modules at high speed with a simple device configuration.

[0021]

[Means for Solving the Problems] A digital signal processor according to the first invention comprises: an externally provided external data memory that reads in data from a data input bus; an external data memory connection unit that writes data to the external data memory; a direct memory transfer bus that connects the read/write port of the external data memory connection unit to the external data memory; and a direct memory transfer control unit that controls data input/output in block units between the external data memory connection unit and an internal data memory via the direct memory transfer bus.

[0022] In the digital signal processor according to the second invention, the address designation given to the external data memory connection unit specifies a rectangular portion of k rows × L columns (k and L are positive integers) within a two-dimensional data address space of m rows × n columns (m and n are positive integers), together with an arbitrary start address in the external data memory, and two-dimensional data transfer is performed between the external data memory and the internal data memory.
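The rectangular addressing described above can be illustrated with a short sketch. It assumes, purely for illustration, that the m × n address space is stored row-major in a flat memory and that the start address is given as a (row, column) pair. The names `rect_addresses` and `transfer_block` and this layout are hypothetical and are not taken from the patent.

```python
def rect_addresses(n_cols, start_row, start_col, k_rows, l_cols):
    """Return the linear addresses of a k-row by L-column rectangle.

    Assumes a row-major layout with n_cols words per row (an
    illustrative assumption; the patent does not fix the layout).
    """
    addresses = []
    for r in range(k_rows):
        row_base = (start_row + r) * n_cols + start_col
        addresses.extend(range(row_base, row_base + l_cols))
    return addresses


def transfer_block(external, n_cols, start_row, start_col, k_rows, l_cols):
    """Copy the rectangle out of a flat external memory into a new list."""
    addrs = rect_addresses(n_cols, start_row, start_col, k_rows, l_cols)
    return [external[a] for a in addrs]
```

For an 8-column space, a 2 × 3 rectangle starting at row 1, column 2 touches two runs of three consecutive addresses that lie one row (8 words) apart.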

[0023] The digital signal processor according to the third invention performs data input/output with the external data memory and internal arithmetic processing in parallel, in units of the rectangular block of k rows × L columns.

[0024] In the digital signal processor according to the fourth invention, the external data memory is divided into two parts. When one part is addressed, it is treated as high-speed memory that completes a read or write in one machine cycle. When the other part is addressed, it is treated as low-speed memory, and the processor waits until an external read/write completion signal is detected.
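The two-speed access rule above can be sketched as a simple cycle model. The split point `boundary`, the choice of the lower half as the fast region, and the modeling of the completion signal as a fixed number of wait cycles are all assumptions made for this illustration.

```python
def access_cycles(address, boundary, wait_cycles):
    """Machine cycles needed for one external memory access.

    Addresses below `boundary` are treated as the high-speed part and
    complete in one cycle. Other addresses stall until a completion
    signal arrives, modeled here as `wait_cycles` extra cycles.
    """
    if address < boundary:
        return 1
    return 1 + wait_cycles
```

In real hardware the wait would end when the external completion signal is asserted rather than after a fixed count.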

【００２５】[0025]

[Operation] The digital signal processor of the first invention, configured as described above, enables high-speed data transfer by controlling data input/output in block units between the external data memory connection unit and the internal data memory via the direct memory transfer bus.

[0026] In the digital signal processor of the second invention, configured as described above, the address designation given to the external data memory connection unit specifies a rectangular portion of k rows × L columns (k and L are positive integers) within a two-dimensional data address space of m rows × n columns (m and n are positive integers). An arbitrary start address in the external data memory is then designated, which makes it possible to perform two-dimensional data transfer between the external data memory and the internal data memory.

[0027] The digital signal processor of the third invention, configured as described above, can perform data input/output and internal arithmetic processing in parallel.

[0028] The digital signal processor of the fourth invention, configured as described above, improves data transfer efficiency by dividing the external data memory into two parts, one used as high-speed memory and the other as low-speed memory.

【００２９】[0029]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram outlining a digital signal processor according to the invention. In the figure, 100 is an external program bus for connecting to an external extended microinstruction memory; 101 is an internally mounted writable instruction memory WCS; 102 is a sequence control unit that receives microinstructions read from the external program bus 100 or the writable instruction memory WCS 101 and performs the prescribed operation control in the instruction execution pipeline; 103 is an address generation unit that generates, in parallel, the two input addresses and one output address for the data memory; 104 denotes three internal data buses, each 24 bits wide, provided to transfer the two input data and one output data in parallel; 105 is an external data memory I/F that selects one of the three internal data buses 104 and connects it to the external data bus 111; 106 is an arithmetic unit that is connected to the three internal data buses 104 and performs the prescribed operations; 107 is an internal data memory M0 that has one read port and one read/write port and is connected to the internal data buses 104; 108 is an internal data memory M1 of the same kind; 109 is a DMA control unit with its own external data memory address generator and internal data memory address generator; 110 is a DMA bus that carries DMA transfers between the external data bus 111 and the internal data memory M0 107 or the internal data memory M1 108; 111 is an external data bus connected to an external extended data memory; 112 is a reset terminal through which an external reset signal is input to the sequence control unit 102; and 113 is an interrupt terminal through which an external interrupt control signal is likewise input.

[0030] FIG. 2 is a block diagram showing a configuration example of the arithmetic unit 106 in FIG. 1. In the figure, 120 is the X-bus, the one of the three internal data buses 104 that carries the first (operated-on) operand; 121 is the Y-bus, which likewise carries the second operand; 122 is the Z-bus, which likewise carries the output data; 123 is a barrel shifter B-SFT with a 24-bit word length that shifts or rotates input data by a specified number of bits in one machine cycle; 124 is an arithmetic logic unit ALU with a 24-bit word length that performs a specified arithmetic/logic operation or computes an absolute difference in one machine cycle; 125 is a multiplier MPY that performs a 24-bit multiplication in one machine cycle and outputs a 47-bit result; 126 is a data pipeline register DPR0 that temporarily holds the difference output of the ALU 124 and feeds it to the squaring input port of the multiplier MPY 125 so that the squared difference can be computed; 127 is a multiplexer that selects either the 24-bit output of the barrel shifter B-SFT 123 or the 24-bit output of the ALU 124 and outputs it to the data pipeline register DPR1 129; 128 is a data pipeline register DPR2 that temporarily holds the 47-bit output of the multiplier MPY 125; 129 is the data pipeline register DPR1, which temporarily holds the 24-bit output of the multiplexer 127; 130 is a normalizing barrel shifter N-SFT that selects and takes in either the 24-bit data from DPR1 129 or the 47-bit data from DPR2 128, adjusts the digit position as specified within half a machine cycle, and outputs the result as 24-bit data; 131 is the 24-bit output of the N-SFT 130; 132 is the 24-bit accumulation output from the working register Wr 135; 133 is an accumulating/rounding adder AU; 134 is the 24-bit result output of the AU 133; 135 is the working register Wr, organized as 24 bits × 8 words; 136 is the flag output of the ALU; 137 is a flag check circuit that tests the flag output 136 against a condition; 138 is a 24 × 1-bit condition test shift register tcsr that sequentially stores the 1-bit true/false result output by the flag check circuit; and 139 is a 1-bit carry that outputs, unchanged, the most significant of the bits shifted out when the N-SFT 130 is instructed to shift toward the LSB, that is, to the right.

[0031] FIG. 3 illustrates the relationship between the internal data memories and the internal data buses of the digital signal processor shown in FIG. 1. 140 is a demultiplexer that outputs the 24-bit data from the read port of the internal data memory M0 107 to either the X-bus 120 or the Y-bus 121; 141 is a demultiplexer that outputs the 24-bit data from the read port of the internal data memory M1 108 to either the X-bus 120 or the Y-bus 121; 142 is a multiplexer that selects the write data of either the Z-bus 122 or the DMA bus 110 and outputs it to the read/write port of the internal data memory M0 107; 143 is a multiplexer that likewise selects the write data of either the Z-bus 122 or the DMA bus 110 and outputs it to the read/write port of the internal data memory M1 108; 144 is a 2-2 address selector that routes the write address (D address 147) and the internal data memory address from the DMA control unit 109 (I address 148) to the read/write port of either M0 107 or M1 108; 145 is the S0 address, the read port address of M0 107; 146 is the S1 address, the read port address of M1 108; 147 is the write address (D address) for M0 107 or M1 108; and 148 is the I address, the internal data memory address corresponding to data transferred over the DMA bus 110.

[0032] FIG. 4 illustrates the configuration of the address generation unit 103 in FIG. 1. 150 is displacement data given as an immediate value in the microinstruction input to the sequence control unit 102; 151 is an address register AR of 24 bits × 4 words; 152 is an index modification register IXR of 12 bits × 4 words; 153 is the data input/output path between the address register AR 151 and the X-bus 120; 154 is the data input/output path between the index modification register IXR 152 and the X-bus 120; 155 is an address adder with a 24-bit word length; 156 denotes the address generators AGU, provided as three independent systems; 157 is a write address pipeline register DAPR3 that delays the 24-bit write address by one machine cycle; and 158 is a write address pipeline register DAPR4 of the same kind.

[0033] FIG. 5 illustrates the five-stage instruction execution pipeline of the digital signal processor shown in FIG. 1. 160 is a machine cycle made up of four phases; 161 is the fetch stage; 162 is the decode stage; 163 is the address update timing in the second half of the decode stage; 164 is the read stage; 165 is the execute stage; 166 is the normalization timing in the first half of the write/accumulate stage; and 167 is the write/accumulate stage.

[0034] FIG. 6 shows part of an example microinstruction set for the digital signal processor shown in FIG. 1. In the figure, 170 is a load instruction; 171 is a branch instruction; 172 is a one-source operation instruction; 173 is a two-source operation instruction; 174 is a source designation code; 175 is a destination designation code; 176 is a source 0 designation code; and 177 is a source 1 designation code.

[0035] The operation is described next. As before, the parts are referred to by the abbreviations introduced above. First, the overall operation is outlined with reference to FIG. 1. As in the conventional example, the digital signal processor according to the invention has a program bus 100 that is separate from the data buses 104. Under microinstruction control it performs the following in parallel: microinstruction input to the sequence control unit 102; data input/output of the arithmetic unit 106 over the data buses 104; parallel generation of the two input and one output data addresses by the address generation unit 103; and access to the internal data memories M0 107 and M1 108, or to the external data memory through the external data memory I/F 105. In addition, independently of this internal operation, the DMA control unit 109 performs DMA transfers of data between the internal data memories M0 107 and M1 108 and the external data memory I/F 105 over the DMA bus 110. Each execution unit is register-based, as in the conventional example. Because most instructions in this processor do not use a delayed-operation format, the instruction execution pipeline includes the data input and output stages. Consequently, when the arithmetic unit 106 performs an addition, for example, a single microinstruction step executes the add, including its input and output. A program that combines various operations can therefore execute, in effect, one microinstruction per machine cycle. However, the result of an instruction can be used only from three instruction steps later, which corresponds to the difference in stage count relative to the read stage of a following instruction. Partly to avoid this loss, the processor implements most operations whose results are needed immediately as composite operations handled by a single instruction, so the loss does not arise in most programs. The data word length and format of the arithmetic unit 106 and the address generation unit 103 are identical and fully compatible. In processing such as table lookup and dictionary reference, an operation result can therefore be used directly as a data memory address.

[0036] Next, the functions of the arithmetic unit 106 are described with reference to FIG. 2. B-SFT 123, ALU 124 and MPY 125 can all operate in one machine cycle, and they operate in the execute stage of the instruction execution pipeline. In the following write/accumulate stage, the N-SFT 130 adjusts the digit position. The result 131 is then either output to the Z-bus 122 and written to data memory, or accumulated with (or rounded against) the contents 132 of Wr 135 by the AU 133, with the result 134 set back into Wr 135. DPR1 129 and DPR2 128 are registers that each pass a result on to the next stage. With this configuration, composite operations are executed, for example, as follows.
Sum of products: MPY 125 → DPR2 128 → N-SFT 130 → AU 133 → Wr 135
Sum of absolute differences: ALU 124 → MUX 127 → DPR1 129 → N-SFT 130 → AU 133 → Wr 135
Sum of squared differences: ALU 124 → DPR0 126 → MPY 125 → DPR2 128 → N-SFT 130 → AU 133 → Wr 135
The sum of squared differences is a delayed operation because it uses DPR0 126. In most cases, however, this instruction is simply used repeatedly in succession, so the resulting problem is negligible.

[0037] Rounding is performed in this processor by the following procedure.
    MSB                       LSB
(1) 0000 0000 1111 1111 1010 0111 : DPR1 129 output, 24 bits
(2) 0000 0000 0000 0000 1111 1111 : N-SFT 130 output (shifted right by 8 bits)
                                1 : carry 139
(3) 0000 0000 0000 0001 0000 0000 : AU 133 output 134, after the carry is added
That is, the most significant bit of the data shifted out by the N-SFT 130 is taken as the carry, and the AU 133 adds this carry to perform rounding. For this reason, the rounded result can be output only to Wr 135. Next, the flag check circuit 137 takes the flags 136 produced by a comparison in the ALU 124, evaluates them against the condition code specified by the microinstruction, and outputs a 1-bit flag indicating whether the condition holds. These bits are set into tcsr 138 one after another. For example, when the maximum or minimum of two input data is computed, a history of which input was selected can be stored. Read horizontally from MSB to LSB, the contents of tcsr 138 correspond to the index code of a binary tree search.
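The rounding procedure above can be checked with a small sketch. It assumes unsigned 24-bit values and a shift count of at least zero; the function name `round_right_shift` is hypothetical.

```python
WORD_MASK = 0xFFFFFF  # 24-bit word, as in the embodiment


def round_right_shift(value, shift):
    """Shift right, then add the most significant shifted-out bit.

    Returns (rounded_result, carry). Models the N-SFT output, the
    carry 139, and the carry addition in the AU for unsigned inputs.
    """
    shifted = (value >> shift) & WORD_MASK                    # step (2)
    carry = ((value >> (shift - 1)) & 1) if shift > 0 else 0  # carry bit
    return (shifted + carry) & WORD_MASK, carry               # step (3)
```

With the worked example from the text, `round_right_shift(0x00FFA7, 8)` gives `(0x000100, 1)`, matching lines (1) to (3).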

[0038] The configuration of the internal data memory is described with reference to FIG. 3. M0 107 and M1 108 are each a two-port RAM of 24 bits × 512 words. When two input data are output to the arithmetic unit 106 in parallel, the read port outputs of M0 107 and M1 108 are routed to the X-bus 120 and the Y-bus 121 by the selectors 140 and 141. In this case the S0 address 145 is supplied to M0 107 and the S1 address 146 to M1 108. Furthermore, when both source and destination are in data memory, as in the vector addition A + B → C (A, B and C being vectors), the data is written from the Z-bus 122 through MUX 142 or MUX 143 to the read/write port of M0 107 or M1 108. In other words, no bus contention arises in internal operation.

[0039] The configuration of the address generation unit 103 is described with reference to FIG. 4. The address generation unit 103 consists of three AGUs 156, which serve as the S0 address generator, the S1 address generator and the D address generator, respectively. Each AGU has an AR 151 of 24 bits × 4 words and an IXR 152 of 12 bits × 4 words. The address adder 155 adds combinations of three terms, AR 151, IXR 152 and the displacement 150, which makes two-dimensional address generation possible.
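The three-term address sum described above can be sketched as follows. How the hardware updates AR and IXR between accesses is not spelled out in this passage, so the row/column walk below, the 24-bit wraparound, and the names `agu_address` and `walk_2d` are assumptions for illustration.

```python
ADDR_MASK = 0xFFFFFF  # 24-bit address adder


def agu_address(ar, ixr=0, displacement=0):
    """Effective address = AR + IXR + displacement, wrapped to 24 bits."""
    return (ar + ixr + displacement) & ADDR_MASK


def walk_2d(base, row_stride, rows, cols):
    """Walk a 2-D region: IXR carries the row offset, the displacement
    the column offset (one possible use of the three terms)."""
    addresses = []
    for r in range(rows):
        ixr = r * row_stride
        for c in range(cols):
            addresses.append(agu_address(base, ixr, c))
    return addresses
```

Using the index term for the row offset and the immediate displacement for the column is only one way to combine the three terms.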

[0040] The AGU 156 operates in the decode stage, but there is a difference of two stages relative to the write/accumulate stage, so the D address 147 is delayed by two machine cycles by DAPR3 157 and DAPR4 158 before being output from the AGU 156. AR 151 and IXR 152 are each connected to the X-bus 120, and their data format is compatible with that of the arithmetic unit 106. For a table lookup, for example, it is therefore enough to transfer data directly from Wr 135 to AR 151 over the X-bus 120 and output it unchanged as the S0 address 145 or the S1 address 146.

[0041] The instruction execution pipeline of this processor is described with reference to FIG. 5. The pipeline consists of the following five stages per instruction.
(1) Fetch stage 161: program counter output and read of a one-word (48-bit wide) microinstruction.
(2) Decode stage: microinstruction decode 162 and address update 163.
(3) Read stage 164: source data from data memory, registers and so on is read via the X-bus 120 and the Y-bus 121.
(4) Execute stage 165: operation by B-SFT 123, ALU 124 or MPY 125.
(5) Write/accumulate stage: normalization 166 by the N-SFT 130, followed by rounding/accumulation by the AU 133 or a write to data memory via the Z-bus 122 (167).
In the write/accumulate stage (5), the AU 133 and data writes via the Z-bus 122 can share timing 167 because the two are mutually exclusive: the output of the AU 133 is set only in Wr 135, and the AU 133 is not used when the Z-bus 122 is used. Executing instructions in this sequence makes it largely unnecessary to write programs that account for complicated delays, so efficient microprograms can be produced even with a high-level language compiler.
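The five-stage sequence above can be turned into a small timing sketch. It assumes one instruction is issued per machine cycle and that a result becomes readable only in the cycle after its write stage. Under those assumptions the first instruction able to read a result is the one issued three steps later, which agrees with the three-instruction-step delay stated in the operation overview. The names are hypothetical.

```python
STAGES = ["fetch", "decode", "read", "execute", "write"]


def stage_cycle(issue_cycle, stage):
    """Cycle in which an instruction issued at issue_cycle is in `stage`."""
    return issue_cycle + STAGES.index(stage)


def earliest_consumer(producer_issue):
    """First issue slot whose read stage falls after the producer's write."""
    write_done = stage_cycle(producer_issue, "write")
    consumer = producer_issue + 1
    while stage_cycle(consumer, "read") <= write_done:
        consumer += 1
    return consumer
```

An instruction issued at cycle 0 writes at cycle 4, so `earliest_consumer(0)` is 3: the instruction issued at cycle 3 reads at cycle 5.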

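[0042] The microinstructions of this processor are, for example, as shown in FIG. 6, and form a one-word horizontal instruction set in which every instruction is 48 bits long. Rather than designating in parallel every internal resource that can operate simultaneously, this instruction set uses a function code that defines, for each instruction, the combination of resource operations in each stage. This simplifies the writing of microinstructions. The instruction set is broadly divided into load 170, branch 171, one-source operation 172 and two-source operation 173. Corresponding to the function code, the codes that control source and destination are set: the source code 174, the destination code 175, the source 0 code 176 and the source 1 code 177. When one of these codes targets data memory, it serves as the addressing designation code for the corresponding AGU 156 in the address generation unit 103. This distinction is made by the resource code. With this instruction set, the addressing mode can be switched and settings such as the normalization shift value can be changed for each operation instruction, so even complex signal processing algorithms can be programmed with minimal loss.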
As shown in (1), it is a 1-word horizontal type instruction set with a word length of 48 bits. In this instruction set, a function code that defines a combination of resource operations of each stage is used for each instruction instead of instructing internal resources that can operate simultaneously in parallel. This simplifies the description of microinstructions. This instruction set is roughly divided into a load 170, a branch 171, a 1-source operation 172, and a 2-source operation 173. Source code 174, destination code 175, source 0 code corresponding to the function code and controlling the source / destination. 176 and source 1 code 177 are set. Each of these codes is an addressing instruction code for the corresponding AGU 156 in the address generator 103 when the data memory is targeted. This identification is done by the resource code. With this instruction set, for example, the addressing mode can be switched and the settings such as the normalized shift value can be changed for each operation instruction, and it is possible to describe with a minimum loss even when programming a complicated signal processing algorithm.

[0043] For example, when the binary tree search shown in FIG. 10 is executed as in the conventional example, the calculation of the degree of approximation can be programmed on this processor as follows.
    rep N { subaa sc0, sc1, wr_x }    repeat N times
        sc0: input vector address control
        sc1: reference vector address control
        wr_x: working register designation
This takes N + 1 machine cycles, and repeating it twice yields the degree of approximation of the reference vectors for direction 0 and direction 1. The process of then determining which has the greater degree of approximation and obtaining the node number of the next stage can be written as follows.
    cmp·ge wr0, wr1        compare and set the result in tcsr 138
    mvr ar00, ar01    }    initialize the address pointers of sc0 and sc1
    mvr ar10, ar11    }
    adsl 1, tcsr, wr2      calculate the next-node reference vector address (2n + 1; the 1 is preset in wr2)
    nop
    nop
    mvr wr2 ar12
    (7 instructions in total)
The number of machine cycles required per stage is therefore 2N + 9. This is clearly a highly efficient process that nearly matches the ideal value, and the program is also concise.
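The cycle count quoted above follows from two repeat loops of N + 1 cycles each plus the seven-instruction selection sequence, and can be checked with a few lines. The function names and the whole-search estimate, which assumes every tree level costs the same and uses a hypothetical `depth` parameter, are illustrative additions.

```python
def cycles_per_stage(n):
    """Machine cycles for one tree level: 2 * (N + 1) + 7 = 2N + 9."""
    approximation = 2 * (n + 1)  # two rep loops, N + 1 cycles each
    selection = 7                # cmp, mvr, mvr, adsl, nop, nop, mvr
    return approximation + selection


def cycles_for_search(n, depth):
    """Rough total for a search over `depth` levels (assumed uniform)."""
    return depth * cycles_per_stage(n)
```

For a vector length of N = 16, one level costs 41 cycles under this count.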

[0044] The above embodiment was described with a word length of 24 bits and an address space of 16 MW (24 bits), but other word lengths and data formats may be used. The embodiment described a binary tree search, but the same effect is obtained for other signal processing algorithms. Clearly, the detailed specifications of the embodiment are unrelated to the essence of the invention and do not limit its scope.

【００４５】[0045]

[Effects of the Invention] As described above, according to the first and third inventions, data input/output is controlled in block units between the external data memory connection unit and the internal data memory via the direct memory transfer bus, which enables high-speed data transfer and raises the efficiency of data transfer to external memory. According to the second invention, designating an arbitrary rectangular portion of the two-dimensional data address space of the external data memory makes it possible to perform two-dimensional data transfer between the external data memory and the internal data memory. (See p. 13.) According to the fourth invention, the external data memory is divided into two parts, one used as high-speed memory and the other as low-speed memory, which improves data transfer efficiency. The efficiency of data transfer to external memory can therefore be raised still further.

[Brief description of drawings]

【図１】この発明の一実施例によるディジタル信号処
理プロセッサの構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a digital signal processor according to an embodiment of the present invention.

【図２】この発明の図１中の演算部の構成を示す図で
ある。FIG. 2 is a diagram showing a configuration of an arithmetic unit in FIG. 1 of the present invention.

【図３】この発明の図１中の内部データメモリ構成を
説明する図である。FIG. 3 is a diagram for explaining the internal data memory configuration in FIG. 1 of the present invention.

【図４】この発明の図１中のアドレス生成部の構成を
示す図である。FIG. 4 is a diagram showing a configuration of an address generation unit in FIG. 1 of the present invention.

【図５】この発明の図１に示したディジタル信号処理
プロセッサの命令実行タイミングを説明する図である。5 is a diagram illustrating instruction execution timing of the digital signal processor shown in FIG. 1 of the present invention.

【図６】この発明の図１に示したディジタル信号処理
プロセッサのマイクロ命令セットの例を示す図である。6 is a diagram showing an example of a micro instruction set of the digital signal processor shown in FIG. 1 of the present invention.

【図７】従来のディジタル信号処理プロセッサの一例
であるＤＳＳＰ１の構成を示すブロック図である。FIG. 7 is a block diagram showing a configuration of a DSSP1 which is an example of a conventional digital signal processor.

【図８】図７のＤＳＳＰ１の命令実行タイミングを説
明する図である。8 is a diagram illustrating instruction execution timing of the DSSP1 of FIG.

【図９】ＤＳＳＰ１のマイクロ命令セットを示す図で
ある。FIG. 9 is a diagram showing a microinstruction set of DSSP1.

【図１０】２進木探索の動作を説明する図である。FIG. 10 is a diagram illustrating an operation of a binary tree search.

【図１１】図１０における参照ベクトルのデータメモ
リ内での配置列を示す図である。FIG. 11 is a diagram showing the arrangement of the reference vectors of FIG. 10 in the data memory.

[Explanation of symbols]

１００プログラムバス、１０１ＷＣＳ、１０２シ
ーケンス制御部、１０３アドレス生成部、１０４デ
ータバス、１０５外部データメモリＩ／Ｆ、１０６
演算部、１０７Ｍ０、１０８Ｍ１、１０９ＤＭＡ
制御部、１１０ＤＭＡバス、１１１外部データバス、
１２０Ｘ−バス、１２１Ｙ−バス、１２２Ｚ−バ
ス。尚、図中、同一符号は同一、又は相当部分を示す。100 program bus, 101 WCS, 102 sequence control unit, 103 address generation unit, 104 data bus, 105 external data memory I/F, 106 arithmetic unit, 107 M0, 108 M1, 109 DMA control unit, 110 DMA bus, 111 external data bus, 120 X-bus, 121 Y-bus, 122 Z-bus. In the drawings, the same reference numerals denote the same or corresponding parts.

フロントページの続き (51)Int.Cl.⁶ 識別記号 庁内整理番号 ＦＩ 技術表示箇所 Ｇ０６Ｆ 17/10 15/78 ５１０Ｆ
Continuation of the front page: (51) Int. Cl.⁶, identification code, internal reference number, FI, technical display location: G06F 17/10 15/78 510 F

Claims

[Claims]

1. A digital signal processor comprising a plurality of data input buses for transmitting input data signals, an internal data memory that receives and stores the respective input data signals from the plurality of data input buses, an arithmetic unit that receives the input data signals output from the internal data memory and performs arithmetic processing, and an address generation unit that controls the operations of the arithmetic unit and the data memory, the digital signal processor further comprising: an external data memory provided externally; an external data memory connection unit that reads data from the plurality of data input buses and writes data to the external data memory; a direct memory transfer bus that connects a read/write port of the external data memory connection unit to the external data memory; and a direct memory transfer control unit that controls input/output of data in block units between the external data memory connection unit and the internal data memory via the direct memory transfer bus.

2. The digital signal processor according to claim 1, wherein the direct memory transfer control unit, as its address instruction to the external data memory connection unit, designates a rectangular portion of k rows × L columns (k and L are positive integers) within a two-dimensional data address space of m rows × n columns (m and n are positive integers), designates an arbitrary start address for the external data memory, and performs two-dimensional data transfer between the external data memory and the internal data memory.

3. The digital signal processor according to claim 1, wherein the direct memory transfer control unit performs data input/output with the external data memory in parallel with internal arithmetic processing, in units of rectangular blocks of k rows × L columns.

4. The digital signal processor according to claim 1, wherein the external data memory connection unit divides the external data memory into two regions: when one region is addressed, it is treated as a high-speed memory that completes a read/write in one machine cycle, and when the other region is addressed, it is treated as a low-speed memory for which the connection unit waits until an external read/write completion signal is detected.
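The access rule of claim 4 can be modelled roughly as below. This is a minimal sketch, assuming that a single address boundary separates the two regions and that the external completion signal can be represented by a callable polled once per machine cycle. `FAST_LIMIT`, `access_cycles`, `ready`, and `max_wait` are hypothetical names introduced for illustration; none of them come from the patent.

```python
FAST_LIMIT = 0x8000  # assumed boundary: addresses below it form the high-speed region


def access_cycles(address, ready, max_wait=1000):
    """Return how many machine cycles one read/write at `address` takes.

    `ready(cycle)` stands in for the external read/write completion signal
    and is only consulted for the low-speed region.
    """
    if address < FAST_LIMIT:
        return 1  # high-speed region: completes in one machine cycle
    cycle = 1
    while not ready(cycle):  # low-speed region: wait for the completion signal
        cycle += 1
        if cycle > max_wait:
            raise TimeoutError("no completion signal from low-speed memory")
    return cycle
```

An access to the high-speed region always costs one cycle in this model, while an access to the low-speed region costs as many cycles as the external device needs before it raises its completion signal.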