JP6292324B2

JP6292324B2 - Arithmetic processing unit

Info

Publication number: JP6292324B2
Application number: JP2017000742A
Authority: JP
Inventors: 和浩吉村; 毅葛; 一生堀尾
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-01-05
Filing date: 2017-01-05
Publication date: 2018-03-14
Anticipated expiration: 2033-04-22
Also published as: JP2017059273A

Description

The embodiments discussed in this specification relate to an arithmetic processing device.

In recent years, as the communication traffic of mobile terminals such as smartphones and tablet computers has grown, higher-speed wireless communication schemes have attracted attention. One such scheme, LTE (Long Term Evolution), has become widespread, and standardization of LTE-Advanced, a higher-performance next-generation mobile communication system, has also been completed, with various proposals being made toward its practical use.

When LTE-Advanced is applied, for example, an enormous amount of matrix computation is performed as wireless communication baseband processing. This is not limited to LTE-Advanced; the same applies to various wireless communication schemes (standards), including WiMAX 2 (Worldwide Interoperability for Microwave Access 2) and schemes currently in use.

In general, the amount of matrix computation in wireless communication baseband processing grows in proportion to the communication speed. In LTE-Advanced, for example, matrix operations account for a large share of the total computation.

To execute matrix computation (a kind of stream processing) at high speed, a stream engine is suitable: a memory holding the matrix data is connected in series with an arithmetic unit, matrix operations are performed on the data read from the memory, and the results are written back to the memory.

Accordingly, an arithmetic processing device (arithmetic processing system) that combines a base processor, which is a general-purpose processor, with a coprocessor having a stream engine has been proposed for performing, for example, LTE-Advanced wireless communication baseband processing.

Various arithmetic processing systems that combine a base processor with a coprocessor having a stream engine have been proposed in the past.

Japanese Laid-open Patent Publication No. 2011-197774
Japanese Laid-open Patent Publication No. 08-069377

As described above, a combination of a base processor and a coprocessor having a stream engine has been proposed as an arithmetic processing system that performs wireless communication baseband processing.

In such an arithmetic processing system, when a stream instruction (a coprocessor instruction) is executed, for example, the base processor monitors the state of the coprocessor, transfers data, and controls execution through handshaking, which incurs overhead. This overhead is called, for example, communication cycle overhead.

Furthermore, if an interrupt occurs while the stream engine of the coprocessor is executing stream processing, for example, the interrupt processing must wait until that stream processing has completed.

That is, if the coprocessor is busy when an interrupt occurs, the base processor waits until the coprocessor becomes idle, and the communication cycle overhead increases further.

According to one embodiment, there is provided an arithmetic processing device including an arithmetic unit that executes operations, a stream engine that executes stream processing, and an instruction issuing unit that issues instructions to the arithmetic unit and the stream engine. The instructions that the instruction issuing unit issues to the stream engine are step instructions, and each pipeline stage of the stream engine executes one process in accordance with one step instruction.

The disclosed arithmetic processing device has the effect of reducing cycle overhead and thereby speeding up processing.

FIG. 1 is a block diagram illustrating an example of an arithmetic processing device.
FIG. 2 is a block diagram illustrating an example of the arithmetic processing device according to the present embodiment.
FIG. 3 is a diagram for explaining the operation of the arithmetic processing device of the present embodiment.
FIG. 4 is a diagram for explaining the stop operation of the stream engine in the arithmetic processing device of the present embodiment.
FIG. 5 is a diagram for explaining an example of the effect of the stream engine stop operation described with reference to FIG. 4.
FIG. 6 is a diagram for explaining an example of the operation of the read circuit in the arithmetic processing device of the present embodiment.
FIG. 7 is a diagram for explaining another example of the operation of the read circuit in the arithmetic processing device of the present embodiment.
FIG. 8 is a diagram for explaining an example of the operation of the execution circuit in the arithmetic processing device of the present embodiment.
FIG. 9 is a diagram for explaining another example of the operation of the execution circuit in the arithmetic processing device of the present embodiment.
FIG. 10 is a diagram for explaining an example of the operation of the write circuit in the arithmetic processing device of the present embodiment.
FIG. 11 is a diagram for explaining another example of the operation of the write circuit in the arithmetic processing device of the present embodiment.
FIG. 12 is a diagram for explaining an example of parameter information in the arithmetic processing device of the present embodiment.
FIG. 13 is a diagram (part 1) for explaining the step instruction in the arithmetic processing device of the present embodiment.
FIG. 14 is a diagram (part 2) for explaining the step instruction in the arithmetic processing device of the present embodiment.
FIG. 15 is a diagram for explaining a modification of the step instruction in the arithmetic processing device of the present embodiment.
FIG. 16 is a diagram (part 1) for explaining the microinstruction in the arithmetic processing device of the present embodiment.
FIG. 17 is a diagram (part 2) for explaining the microinstruction in the arithmetic processing device of the present embodiment.
FIG. 18 is a diagram for explaining access control by microinstructions in the arithmetic processing device of the present embodiment.
FIG. 19 is a diagram illustrating how a microinstruction is embedded in a VLIW instruction in the arithmetic processing device of the present embodiment.
FIG. 20 is a diagram for explaining the prologue processing of the VLIW instruction illustrated in FIG. 19.
FIG. 21 is a diagram for explaining the epilogue processing of the VLIW instruction illustrated in FIG. 19.

Before the embodiments of the arithmetic processing device are described in detail, an example of an arithmetic processing device and its problems will be described with reference to FIG. 1.

FIG. 1 is a block diagram illustrating an example of an arithmetic processing device; it shows an arithmetic processing device (arithmetic processing system) that combines a base processor, which is a general-purpose processor, with a coprocessor having a stream engine.

In FIG. 1, reference symbol IF denotes the instruction fetch stage, ID the instruction decode stage, and RR/II the register read and instruction issue stage.

Reference symbol EX denotes the execution stage, MA the memory access stage, and RW the register write stage. The arithmetic processing system illustrated in FIG. 1 includes, for example, a base processor 100, which is a general-purpose processor, and a coprocessor 300 that includes a stream engine 200.

In the base processor 100, in the IF stage, the instruction reading unit 101 fetches (reads) an instruction from the instruction memory 108; in the ID stage, the instruction interpreting unit 102 receives the instruction read by the instruction reading unit 101 and decodes (interprets) it.

In the RR/II stage, the register reading unit 103 reads the register 110, and the instruction issuing unit 104 issues the instruction interpreted by the instruction interpreting unit 102 to the arithmetic unit 105.

In the EX stage, the arithmetic unit 105 executes an operation in accordance with the instruction issued from the instruction issuing unit 104; in the MA stage, the memory access unit 106 performs a load (read) or store (write) access to the memory (data memory) 109.

In the RW stage, the register writing unit 107 writes the result of the operation by the arithmetic unit 105, or the data loaded from the data memory 109, to the register 110.

As indicated by reference symbol P100 in FIG. 1, the base processor 100 pipelines execution with each register-to-memory or register-to-arithmetic-unit process treated as one instruction.

In the coprocessor 300, in the IF stage, the instruction reading unit 301 reads an instruction from the instruction memory 108; in the ID stage, the instruction interpreting unit 302 receives and interprets the instruction read by the instruction reading unit 301.

In the RR/II stage, the register reading unit 303 reads the register 310, and the instruction issuing unit 304 issues the instruction interpreted by the instruction interpreting unit 302 to the stream engine 200. The stream engine 200 includes an arithmetic unit 205 and a memory access unit 206 that performs load or store accesses to the data memory 400.

As indicated by reference symbol P200 in FIG. 1, the instruction from the instruction issuing unit 304 to the stream engine 200 is a stream instruction; once one stream instruction is issued, pipelined execution continues until one stream process between the memory and the arithmetic unit is complete.

That is, in the EX and MA stages, the arithmetic unit 205 and the memory access unit 206 of the stream engine 200 keep processing, in accordance with the stream instruction issued from the instruction issuing unit 304, until the stream processing is complete. In the RW stage, the register writing unit 307 writes the data (operation result) stream-processed by the stream engine 200 to the register 310.

In FIG. 1, reference symbol P150 indicates the processing that the base processor 100 performs for the coprocessor 300, for example, handshaking with the coprocessor 300 when a stream instruction of the coprocessor 300 is issued. That is, the base processor 100, for example, monitors the state of the coprocessor 300, controls its execution, and controls data transfer to it.

The arithmetic processing system described with reference to FIG. 1, which combines the base processor 100 with the coprocessor 300 having the stream engine 200, suffers from cycle overhead when the stream engine 200 executes stream processing.

That is, when a stream instruction (a coprocessor instruction) is executed, the base processor 100 uses handshaking to monitor the state of the coprocessor 300, transfer data to and from the coprocessor 300, and control the execution of the coprocessor 300.

As a result, overhead (communication cycle overhead) arises between the base processor 100 and the coprocessor 300. In addition, if an interrupt occurs while the stream engine 200 of the coprocessor 300 is executing stream processing, for example, the system waits until that stream processing has completed, and the communication cycle overhead increases further.

Hereinafter, the arithmetic processing device of the present embodiment will be described in detail with reference to the accompanying drawings. FIG. 2 is a block diagram illustrating an example of the arithmetic processing device according to the present embodiment. As is clear from a comparison of FIG. 2 with FIG. 1 described above, the arithmetic processing device (processor) 1 illustrated in FIG. 2 includes a configuration corresponding to the base processor 100 in FIG. 1 and, in addition, incorporates a stream engine 2.

That is, as illustrated in FIG. 2, the processor 1 includes a register 10, an instruction reading unit 11, an instruction interpreting unit 12, a register reading unit 13, an instruction issuing unit 14, an arithmetic unit 15, a memory access unit 16, a register writing unit 17, an instruction memory 18, and a data memory 19. The instruction issuing unit 14 issues instructions not only to the arithmetic unit 15 but also to the stream engine 2 (for example, step instructions).

The stream engine 2 includes a POP unit 21 that reads data from the data memory 4 and writes it to the registers 221 and 222, and an EXEC unit 23 that executes stream processing on the data written to the registers 221 and 222 and writes the result to the register 24. The stream engine 2 further includes a PUSH unit 25 that writes the data held in the register 24 to the data memory 4.

In FIG. 2, reference symbols IF, ID, RR/II, EX, MA, and RW denote the same stages as those described with reference to FIG. 1.

That is, in the IF stage, the instruction reading unit 11 fetches (reads) an instruction from the instruction memory 18; in the ID stage, the instruction interpreting unit 12 receives the instruction fetched by the instruction reading unit 11 and decodes (interprets) it.

In the RR/II stage, the register reading unit 13 reads the register 10, and the instruction issuing unit 14 issues the instruction interpreted by the instruction interpreting unit 12 to the arithmetic unit 15 and the stream engine 2.

In the EX stage, the arithmetic unit 15 executes an operation in accordance with the instruction issued from the instruction issuing unit 14, and the stream engine 2 executes stream processing in accordance with the instruction issued from the instruction issuing unit 14. As described above, the instructions from the instruction issuing unit 14 to the stream engine 2 are step instructions.

In the MA stage, the memory access unit 16 performs a load or store access to the memory (data memory) 19. Also in the MA stage, the stream engine 2 (the POP unit 21 or the PUSH unit 25) performs a load (read) or store (write) access to the memory (data memory) 4.

In the RW stage, the register writing unit 17 writes the result of the operation by the arithmetic unit 15, or the data loaded from the data memory 19, to the register 10; it also writes the data stream-processed by the stream engine 2 to the register 10.

FIG. 3 is a diagram for explaining the operation of the arithmetic processing device of the present embodiment. As is clear from a comparison of reference symbol P1 in FIG. 3 with reference symbol P100 in FIG. 1, the portion of the processor 1 that corresponds to the base processor 100 in FIG. 1 pipelines execution with each register-to-memory or register-to-arithmetic-unit process treated as one instruction.

As indicated by reference symbols P21 to P23 in FIG. 3, the stream engine 2 built into the processor 1 executes processing step by step in accordance with the step instructions issued from the instruction issuing unit 14.

Process P21 is the process in which the POP unit 21 of the stream engine 2 reads data from the data memory 4 and writes it to the registers 221 and 222. Process P22 is the process in which the EXEC unit 23 executes stream processing on the data written to the registers 221 and 222 and writes the result to the register 24.

Process P23 is the process in which the PUSH unit 25 writes the data held in the register 24 to the data memory 4. These processes P21 to P23 are pipelined in accordance with the step instructions issued from the instruction issuing unit 14.

This specification uses as an example the case in which the stream engine 2 handles the three processes P21 to P23 with three step instructions (one round per three step instructions). This is merely an example, however; one round may of course consist of four or more processes, and stream processing may be executed by repeating that round many times.
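The step-driven pipeline of processes P21 to P23 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented circuit: the class `StreamEngine`, its field names, and the element-wise multiply used for the EXEC stage are all hypothetical, and two Python lists stand in for the streams held in data memory 4. Each call to `step()` plays the role of one step instruction, and every stage performs at most one process per call, so an element needs three step instructions to travel from memory back to memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StreamEngine:
    """Toy model: one step instruction lets POP, EXEC and PUSH each do one process."""
    src_a: List[int]                                  # first input stream (stand-in for data memory 4)
    src_b: List[int]                                  # second input stream
    op: Callable[[int, int], int]                     # EXEC operation (assumed, not from the source)
    dst: List[int] = field(default_factory=list)      # output region of data memory 4
    reg_in: Optional[Tuple[int, int]] = None          # registers 221 / 222
    reg_out: Optional[int] = None                     # register 24
    pos: int = 0                                      # next element for POP to read

    def step(self) -> None:
        """Execute one step instruction; stages are updated back to front."""
        # P23: PUSH writes the content of register 24 to memory.
        if self.reg_out is not None:
            self.dst.append(self.reg_out)
        # P22: EXEC consumes registers 221/222 and fills register 24.
        self.reg_out = self.op(*self.reg_in) if self.reg_in is not None else None
        # P21: POP reads the next pair of elements into registers 221/222.
        if self.pos < len(self.src_a):
            self.reg_in = (self.src_a[self.pos], self.src_b[self.pos])
            self.pos += 1
        else:
            self.reg_in = None


engine = StreamEngine([1, 2, 3], [10, 20, 30], op=lambda a, b: a * b)
for _ in range(5):            # 3 elements + 2 extra steps to drain the pipeline
    engine.step()
print(engine.dst)             # [10, 40, 90]
```

Updating the stages from PUSH back to POP inside `step()` mimics the simultaneous register update of a hardware pipeline without needing shadow copies of the registers.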

FIG. 4 is a diagram for explaining the stop operation of the stream engine in the arithmetic processing device of the present embodiment. For example, if an interrupt occurs while the stream engine 2 built into the processor 1 is executing stream processing, the instruction issuing unit 14 stops issuing step instructions to the stream engine 2.

When the instruction issuing unit 14 stops issuing step instructions to the stream engine 2 in this way, all of the processes P21 to P23 in the stream engine 2 stop. That is, the POP unit 21 stops process P21, in which it reads data from the data memory 4 and writes it to the registers 221 and 222.

Likewise, the EXEC unit 23 stops process P22, in which it executes stream processing on the data written to the registers 221 and 222 and writes the result to the register 24. The PUSH unit 25 stops process P23, in which it writes the data held in the register 24 to the data memory 4.

Because the arithmetic processing device of the present embodiment controls the operation of the stream engine 2 at fine granularity through step instructions, it can stop stream processing immediately and handle the interrupt when an interrupt occurs during stream processing.

That is, according to the arithmetic processing device of the present embodiment, the stream engine 2 can be stopped immediately by, for example, ceasing to issue step instructions when an interrupt occurs. In other words, once instruction issue stops, each pipeline stage of the stream engine 2 (processes P21 to P23) stops autonomously, which makes it possible to reduce cycle overhead and speed up processing.
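The stop behaviour can be illustrated with a small sketch: because the engine only changes state when a step instruction arrives, withholding step instructions freezes it in place. Everything here is a hypothetical model. `SteppedEngine`, `issue`, and the doubling operation are assumptions, and resuming the interrupted stream afterwards is shown only to demonstrate that no state is lost; the source text itself describes only the stop.

```python
class SteppedEngine:
    """Minimal stand-in for a stream engine: state changes only inside step()."""

    def __init__(self, data):
        self.data = list(data)
        self.stages = [None, None]   # [input-side register, output-side register]
        self.pos = 0
        self.out = []

    def step(self):
        if self.stages[1] is not None:          # PUSH
            self.out.append(self.stages[1])
        # EXEC (doubling is an assumed placeholder operation)
        self.stages[1] = None if self.stages[0] is None else self.stages[0] * 2
        if self.pos < len(self.data):           # POP
            self.stages[0] = self.data[self.pos]
            self.pos += 1
        else:
            self.stages[0] = None


def issue(engine, n_steps, interrupt_at=None):
    """Issue up to n_steps step instructions; stop issuing when the interrupt arrives."""
    for cycle in range(n_steps):
        if interrupt_at is not None and cycle == interrupt_at:
            return cycle             # nothing else is needed: the engine just holds its state
        engine.step()
    return n_steps


eng = SteppedEngine([1, 2, 3, 4])
issued = issue(eng, 6, interrupt_at=3)          # interrupt after 3 step instructions
frozen = (list(eng.stages), eng.pos, list(eng.out))
# ... the interrupt would be handled here; the engine receives no step instructions ...
issue(eng, 6 - issued)                          # resume with the remaining step instructions
print(frozen, eng.out)
```

No drain or flush protocol is modelled, because in this scheme the partially processed elements simply remain in the pipeline registers until the next step instruction.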

FIG. 5 is a diagram for explaining an example of the effect of the stream engine stop operation described with reference to FIG. 4. FIG. 5(a) shows the operation of the arithmetic processing system illustrated in FIG. 1, and FIG. 5(b) shows the operation of the arithmetic processing device described with reference to FIG. 4.

As preconditions, assume that one stream process takes 200 cycles (clock cycles), that the latency of the operation data path is 10 cycles, and that the parameter information used for one stream process is 320 bits wide.

It is also assumed that data transfer between the outside and the memory overlaps with stream processing, so that the data transfer cycles are hidden. Furthermore, in FIG. 5(a), the data path between the base processor 100 and the coprocessor 300 is 32 bits wide, and the parameter information is transferred from the base processor 100 to the coprocessor 300 in 10 cycles.

Accordingly, in FIG. 5(a), the communication cycle overhead is, for example, 10 [cycles] (data transfer) + 10 [cycles] (operation data path) = 20 [cycles].

In FIG. 5(b), the data paths are tightly coupled, so the parameter information is assumed to be transferred in one cycle. In this specification, tight coupling does not mean that multiple processors coupled at the bus level access a common memory; it means that a common instruction issuing unit 14 issues instructions to, and thereby controls, both the arithmetic unit 15 and the stream engine 2.

Accordingly, in FIG. 5(b), the communication cycle overhead is, for example, 1 [cycle] (data transfer) + 10 [cycles] (operation data path) = 11 [cycles].
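The two overhead figures follow from the same simple relation, transfer cycles plus data-path latency, which the sketch below spells out. The function name `communication_overhead` is hypothetical. For FIG. 5(b) the text states only that the transfer takes one cycle; the 320-bit path width used here to reproduce that single cycle is an assumption.

```python
import math


def communication_overhead(param_bits: int, path_bits: int, datapath_latency: int) -> int:
    """Cycles to move the parameter information plus the operation data-path latency."""
    transfer_cycles = math.ceil(param_bits / path_bits)
    return transfer_cycles + datapath_latency


# FIG. 5(a): 320-bit parameters over a 32-bit base-processor/coprocessor path.
loose = communication_overhead(320, 32, 10)
# FIG. 5(b): one-cycle transfer, modelled here as an assumed 320-bit-wide path.
tight = communication_overhead(320, 320, 10)
print(loose, tight)   # 20 11
```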

First, as illustrated in FIG. 5(a), if an interrupt occurs in the arithmetic processing system of FIG. 1 at, for example, the 50th cycle of the third stream process (A2), the other stream process (B0) is executed only after the third stream process has run to completion.

The arithmetic processing system therefore needs 200 + 20 + 200 + 20 + 50 + 150 + 20 + 200 = 860 [cycles] to complete the other stream process (B0).

In contrast, if an interrupt occurs in the arithmetic processing device (processor) 1 of the present embodiment described with reference to FIG. 4 at, for example, the 50th cycle of the third stream process (A2), the third stream process is stopped immediately and the other stream process (B0) is executed.

The processor 1 of the present embodiment therefore needs only 200 + 11 + 200 + 11 + 50 + 11 + 200 = 683 [cycles] to complete the other stream process (B0).

That is, for the same work, the processor 1 of the present embodiment reduces the cycle count from 860 [cycles] to 683 [cycles], a speedup of 177 [cycles].
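The two totals can be reproduced term by term with a short calculation. The function and constant names below are hypothetical; the numbers are the ones given in the text (200-cycle stream processes, an interrupt at cycle 50 of A2, and overheads of 20 and 11 cycles).

```python
STREAM = 200        # cycles per stream process
INTERRUPT_AT = 50   # cycle within A2 at which the interrupt arrives


def coprocessor_cycles(overhead: int = 20) -> int:
    """FIG. 5(a): A2 must run to completion (50 + 150 cycles) before B0 can start."""
    remaining = STREAM - INTERRUPT_AT
    return (STREAM + overhead          # A0
            + STREAM + overhead        # A1
            + INTERRUPT_AT + remaining + overhead   # A2, finished despite the interrupt
            + STREAM)                  # B0


def integrated_cycles(overhead: int = 11) -> int:
    """FIG. 5(b): A2 is stopped at the interrupt and B0 starts right away."""
    return (STREAM + overhead          # A0
            + STREAM + overhead        # A1
            + INTERRUPT_AT + overhead  # A2, stopped after 50 cycles
            + STREAM)                  # B0


saved = coprocessor_cycles() - integrated_cycles()
print(coprocessor_cycles(), integrated_cycles(), saved)   # 860 683 177
```

Of the 177 cycles saved, 150 come from not having to finish A2 and 27 from the three smaller overheads (3 x 9 cycles).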

FIG. 5 merely illustrates one example of stream processing. Needless to say, the speedup becomes even greater as the number of processing cycles per stream instruction increases or as interrupts occur more frequently during stream processing.

FIG. 6 is a diagram for explaining an example of the operation of the read circuit in the arithmetic processing device of the present embodiment, and FIG. 7 is a diagram for explaining another example of the operation of the read circuit in the arithmetic processing device of the present embodiment.

As illustrated in FIGS. 6 and 7, the read circuit 210 includes the POP unit 21 and the registers 221 and 222, and the data memory 4 includes memory units 41 and 42. The memory units 41 and 42 represent, for example, banked memory regions at different addresses (head addresses) in the data memory 4; two physically separate memories are of course not required.

As illustrated in FIG. 6, in the read circuit 210, the POP unit 21 reads first data from the memory unit (first bank) 41 of the data memory 4 by specifying a head address and a stream length, and stores it in the register 221.

In addition, in the read circuit 210, the POP unit 21 reads second data from the memory unit (second bank) 42 of the data memory 4 by specifying a head address and a stream length, and stores it in the register 222. This processing of the read circuit 210 corresponds to, for example, process P21 in the arithmetic processing device of FIG. 3 described above.

That is, the POP unit 21 reads stream data from the data memory 4 and places (stores) it in the registers (pipeline registers) 221 and 222 located between the read stage (POP unit 21) and the execution stage (EXEC unit 23) of the stream processing, so that pipeline processing is carried out.

In this way, by reading stream data from the data memory 4, which is banked into the first bank 41 and the second bank 42, with a head address and a stream length specified, the problems of memory port count and cycle overhead can be resolved.
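Addressing a stream by head address and stream length can be sketched as a slice over a flat memory. The helper `pop_stream`, the 32-word memory contents, and the bank head addresses 0 and 16 are all made up for illustration; only the idea of "head address plus stream length per bank" comes from the text.

```python
def pop_stream(memory, head, length):
    """Return the stream of `length` words starting at address `head`."""
    return memory[head:head + length]


data_memory = list(range(100, 132))    # 32 words of made-up contents
BANK1_HEAD, BANK2_HEAD = 0, 16         # assumed head addresses of banks 41 and 42

first = pop_stream(data_memory, BANK1_HEAD, 4)     # destined for register 221
second = pop_stream(data_memory, BANK2_HEAD, 4)    # destined for register 222

# One pair per step instruction would be latched into registers 221 / 222.
pairs = list(zip(first, second))
print(pairs[0])    # (100, 116)
```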

Alternatively, as illustrated in FIG. 7, data read from the memory units (first and second banks) 41 and 42 by, for example, a DMA (Direct Memory Access) 5 can be supplied to the read circuit 210 through FIFO (First In First Out) buffers 61 and 62. That is, the data transfer from the data memory 4 can be left to the DMA 5, and the read data can be taken from the FIFO buffers 61 and 62.
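The FIFO-based variant can be modelled with `collections.deque`: a DMA stand-in fills the two buffers, and each POP process simply takes the oldest word from each. The function names `dma_fill` and `pop_step`, the memory contents, and the addresses are hypothetical; a real DMA would run concurrently with the pipeline, whereas this sketch fills the buffers up front for simplicity.

```python
from collections import deque


def dma_fill(memory, head, length, fifo):
    """Stand-in for DMA 5: copy a stream from memory into a FIFO buffer."""
    fifo.extend(memory[head:head + length])


def pop_step(fifo_a, fifo_b):
    """One POP process: take the oldest word from each FIFO (for registers 221 / 222)."""
    return fifo_a.popleft(), fifo_b.popleft()


data_memory = list(range(32))          # made-up contents
fifo61, fifo62 = deque(), deque()
dma_fill(data_memory, 0, 4, fifo61)    # from the first bank (assumed head address 0)
dma_fill(data_memory, 16, 4, fifo62)   # from the second bank (assumed head address 16)

reg221, reg222 = pop_step(fifo61, fifo62)
print(reg221, reg222, len(fifo61))     # 0 16 3
```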

FIG. 8 is a diagram for explaining an example of the operation of the execution circuit in the arithmetic processing device of this embodiment. As shown in FIG. 8, the execution circuit 230 includes the EXEC unit 23 and the register 24.

In the execution circuit 230, the EXEC unit 23 performs stream processing on the data written to the registers 221 and 222 and writes the operation result to the register 24. This processing by the execution circuit 230 corresponds, for example, to the process P22 in the arithmetic processing device of FIG. 3 described above.

In other words, the EXEC unit 23 performs stream processing on the data fed into the registers 221 and 222 and feeds the operation result into the register (pipeline register) 24 between the EXEC unit 23 and the PUSH unit 25, so that the processing is pipelined.

FIG. 9 is a diagram for explaining another example of the operation of the execution circuit in the arithmetic processing device of this embodiment, in which the execution circuit 230 is made up of multi-stage EXEC units 231 to 233 and registers 241 to 243.

In this case, the read circuit 210 has four registers, 221a, 221b, 222a, and 222b, corresponding to the two first-stage EXEC units 231 and 232.

The execution circuit 230 likewise has three registers 241 to 243 for storing the operation results of the three EXEC units 231 to 233. The execution circuit shown in FIG. 9 is merely an example, and various other configurations can of course be applied.

Thus, the execution circuit 230 (arithmetic unit data path) may have a multi-stage configuration, and pipeline processing can be performed by feeding an operation result every cycle into the register (pipeline register) 243 between the EXEC unit 233 and the PUSH unit 25.
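
A two-level arrangement of this kind can be sketched as follows, purely as an illustration. The embodiment does not specify the operations performed by the EXEC units 231 to 233; the sketch assumes two multiplications followed by an addition, as might occur in a matrix product, and the function name run_two_level is hypothetical.

```python
from operator import add, mul


def run_two_level(operand_groups, op_231=mul, op_232=mul, op_233=add):
    """Model FIG. 9 as a two-level pipeline and return the values that
    would enter register 243, one per cycle once the pipeline is full."""
    reg_241 = reg_242 = None      # results latched by the first-level units
    results = []
    # One extra cycle (None) drains the last first-level result.
    for group in list(operand_groups) + [None]:
        # Second level: consume what the first level latched last cycle.
        if reg_241 is not None:
            results.append(op_233(reg_241, reg_242))
        # First level: operate on registers 221a/222a and 221b/222b.
        if group is not None:
            a1, b1, a2, b2 = group
            reg_241, reg_242 = op_231(a1, b1), op_232(a2, b2)
        else:
            reg_241 = reg_242 = None
    return results


results = run_two_level([(1, 2, 3, 4), (5, 6, 7, 8)])
```

Each result appears one cycle after its operands enter the first level, and a new operand group can be accepted every cycle.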

FIG. 10 is a diagram for explaining an example of the operation of the write circuit in the arithmetic processing device of this embodiment, and FIG. 11 is a diagram for explaining another example of the operation of the write circuit.

As shown in FIG. 10, the write circuit 250 includes the PUSH unit 25 and writes the operation result stored in the register 24 to the memory unit 43 of the data memory 4. That is, the PUSH unit 25 takes the output data from the pipeline register 24 between the EXEC unit 23 and the PUSH unit 25 and writes it to, for example, a memory area indicated by a start address and a stream length.

This processing by the write circuit 250 corresponds, for example, to the process P23 in the arithmetic processing device of FIG. 3 described above. Here, the memory unit 43 can be, for example, a memory area of the data memory 4 that differs from the memory units 41 and 42.

The write circuit 250 shown in FIG. 10 writes the operation result stored in the register 24 directly to the memory unit 43. In contrast, the write circuit 250 shown in FIG. 11 writes the operation result stored in the register 24 to the FIFO buffer 7, and the DMA 8 transfers the data written to the FIFO buffer 7 to the memory unit 43.

That is, the write circuit 250 shown in FIG. 11 writes the operation results stored in the register 24 to the FIFO buffer 7 in order, and leaves the data transfer from the FIFO buffer 7 to the memory unit 43 (data memory 4) to the DMA 8.

FIG. 12 is a diagram for explaining an example of parameter information in the arithmetic processing device of this embodiment. The parameter information used for stream processing, for example the start address (ai) and stream length (li) of each stream (i), the operation opcode (o), and the operation mode (m), can be expressed by a single instruction having a long bit length (set instruction: set).

This set instruction (parameter information) is read from the instruction memory 18 and loaded (set) into the parameter register 140 all at once, as indicated by reference sign P10. Each pipeline stage (the POP unit 21, the EXEC unit 23, and the PUSH unit 25) then refers to the parameter information in the parameter register 140 and executes in a pipelined manner, as indicated by reference sign P11.
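
As an illustration only, the parameter information carried by one set instruction and its collective loading into the parameter register can be modelled as below. The field layout, the class names SetInstruction and ParameterRegister, and the example values "mul" and "real" are assumptions made for this sketch; the embodiment does not define an encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SetInstruction:
    starts: Tuple[int, ...]    # ai: start address of each stream i
    lengths: Tuple[int, ...]   # li: stream length of each stream i
    opcode: str                # o: operation opcode (assumed to be a name)
    mode: str                  # m: operation mode (assumed to be a name)


class ParameterRegister:
    """Models parameter register 140: all fields are replaced together."""

    def __init__(self) -> None:
        self.params: Optional[SetInstruction] = None

    def set(self, instruction: SetInstruction) -> None:
        # P10: the whole parameter set is loaded in a single operation.
        self.params = instruction


preg = ParameterRegister()
preg.set(SetInstruction(starts=(0, 0, 64), lengths=(16, 16, 16),
                        opcode="mul", mode="real"))

# P11: each stage looks up only the fields it needs.
pop_view = (preg.params.starts[:2], preg.params.lengths[:2])
exec_view = (preg.params.opcode, preg.params.mode)
push_view = (preg.params.starts[2], preg.params.lengths[2])
```

Keeping every parameter in one immutable record mirrors the idea that a single long instruction configures all pipeline stages before streaming begins.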

FIGS. 13 and 14 are diagrams for explaining step instructions in the arithmetic processing device of this embodiment. As shown in FIGS. 13 and 14, the arithmetic processing device (stream engine 2) of this embodiment can be controlled by step instructions.

That is, as indicated by reference sign P20, a step instruction is read from the instruction memory 18, and executing that step instruction controls the processes P21 to P23 of the pipeline stages of the stream engine 2. The step instructions used are, for example, written in advance by a programmer.

Here, the step instructions step 1 to step N are read from the instruction memory 18 in order and issued from the instruction issuing unit 14 to the stream engine 2, and the pipeline processes P21 to P23 are executed.

As shown in FIG. 13, a step instruction is issued from the instruction issuing unit 14 to the stream engine 2, and with one step instruction the POP unit 21, the EXEC unit 23, and the PUSH unit 25 each execute one process (P21, P22, and P23, respectively).

That is, as shown in FIG. 14(a), the process P21 is the process in which the POP unit 21 reads data from the data memory 4 and writes it to the registers 221 and 222. As shown in FIG. 14(b), the process P22 is the process in which the EXEC unit 23 performs stream processing on the data written to the registers 221 and 222 and writes the result to the register 24.

Further, as shown in FIG. 14(c), the process P23 is the process in which the PUSH unit 25 writes the data written to the register 24 to the data memory 19. These processes P21 to P23 are executed in a pipelined manner in accordance with the step instructions issued from the instruction issuing unit 14.
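
The effect of a step instruction can be illustrated with a small software model. The following Python sketch is not taken from the embodiment: it assumes an element-wise operation on two input streams, a single destination list standing in for the data memory, and the hypothetical class name StreamEngine.

```python
from operator import mul


class StreamEngine:
    """Three-stage model: POP -> registers 221/222 -> EXEC -> register 24 -> PUSH."""

    def __init__(self, stream1, stream2, op):
        self.stream1, self.stream2, self.op = stream1, stream2, op
        self.next_index = 0     # next element the POP stage will read
        self.reg_in = None      # models registers 221 and 222
        self.reg_out = None     # models register 24
        self.written = []       # models the destination memory area

    def step(self):
        """One step instruction: every stage performs one process.
        Stages are evaluated back to front so that each stage sees the
        value latched by the preceding stage in the previous step."""
        if self.reg_out is not None:                       # P23 (PUSH)
            self.written.append(self.reg_out)
        self.reg_out = (self.op(*self.reg_in)              # P22 (EXEC)
                        if self.reg_in is not None else None)
        if self.next_index < len(self.stream1):            # P21 (POP)
            self.reg_in = (self.stream1[self.next_index],
                           self.stream2[self.next_index])
            self.next_index += 1
        else:
            self.reg_in = None


engine = StreamEngine([1, 2, 3], [4, 5, 6], mul)
for _ in range(5):      # 3 elements plus 2 steps to fill and drain the pipeline
    engine.step()
```

In this model, N step instructions process N - 2 elements, because two steps are spent filling and draining the three-stage pipeline.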

FIG. 15 is a diagram for explaining a modification of the step instruction in the arithmetic processing device of this embodiment. In FIG. 13 described above, the N step instructions step 1 to step N are read from the instruction memory 18 as they are and issued from the instruction issuing unit 14 to the stream engine 2.

In contrast, in the modification shown in FIG. 15, the step instruction is combined with an instruction dedicated to loop processing (zero-overhead loop instruction) that efficiently executes consecutive repetitive processing (loop processing).

That is, the N step instructions step 1 to step N can be replaced by a zero-overhead loop instruction (loop N step), so that the instruction sequence does not grow. Even with the zero-overhead loop instruction, when an interrupt occurs, for example, the stream processing stops immediately at the step being executed.
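
The saving in instruction memory can be illustrated with a minimal sketch. The tuple encoding of instructions and the function name issue are assumptions made for this example; the embodiment only gives the mnemonic form loop N step.

```python
def issue(program):
    """Yield the instructions actually issued to the stream engine.
    A ("loop", count, body) entry repeats its body without any extra
    branch or counter instructions appearing in the program."""
    for instruction in program:
        if instruction[0] == "loop":
            _, count, body = instruction
            for _ in range(count):
                yield body
        else:
            yield instruction[0]


N = 5
unrolled = [("step",)] * N          # step 1 ... step N stored individually
looped = [("loop", N, "step")]      # a single stored instruction

issued_unrolled = list(issue(unrolled))
issued_looped = list(issue(looped))
```

Both programs issue the same N step instructions, but the looped form occupies one entry in instruction memory regardless of N.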

FIGS. 16 and 17 are diagrams for explaining microinstructions in the arithmetic processing device of this embodiment. As shown in FIG. 16, the instructions issued from the instruction issuing unit 14 to the stream engine 2 are microinstructions.

That is, as indicated by reference sign P30 in FIG. 16, microinstructions are read from the instruction memory 18, and executing them controls the processes P21 to P23 of the pipeline stages of the stream engine 2.

For example, a pop instruction is assigned to the process P21 shown in FIG. 17(a), an exec instruction is assigned to the process P22 shown in FIG. 17(b), and a push instruction is assigned to the process P23 shown in FIG. 17(c), and each process is executed by its microinstruction. This allows the processes P21 to P23 of the pipeline stages to be controlled individually by microinstructions.
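
As an illustration of such individual control, the following Python sketch runs a stage only when its microinstruction is issued. It is a simplified model under stated assumptions: one operand pair per pop, an element-wise operation, and the hypothetical class name MicroEngine.

```python
from operator import add


class MicroEngine:
    def __init__(self, operand_pairs, op):
        self.pending = list(operand_pairs)  # data still waiting in memory
        self.op = op
        self.reg_in = None                  # models registers 221 and 222
        self.reg_out = None                 # models register 24
        self.written = []                   # models the destination memory

    def issue(self, micro_ops):
        """Run only the stages whose microinstruction is present.
        Stages are evaluated back to front, as in a clocked pipeline."""
        if "push" in micro_ops and self.reg_out is not None:   # P23
            self.written.append(self.reg_out)
            self.reg_out = None
        if "exec" in micro_ops and self.reg_in is not None:    # P22
            self.reg_out = self.op(*self.reg_in)
            self.reg_in = None
        if "pop" in micro_ops and self.pending:                # P21
            self.reg_in = self.pending.pop(0)


engine = MicroEngine([(1, 2), (3, 4)], add)
engine.issue({"pop"})
state_after_pop = (engine.reg_in, engine.reg_out)
engine.issue({"exec"})
engine.issue({"push"})
```

Issuing the three microinstructions one at a time moves a single element through the pipeline while the second operand pair stays in memory, which a single step instruction could not express.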

FIG. 18 is a diagram for explaining access control by microinstructions in the arithmetic processing device of this embodiment.

Here, FIG. 18(a) shows the case where the pop, exec, and push instructions are all issued, FIG. 18(b) shows the case where the pop instruction is stopped, and FIG. 18(c) shows the case where the push instruction is stopped. The arithmetic processing device is provided with the DMAs 5 and 8 and the FIFO buffers 61, 62, and 7, as in FIGS. 7 and 11 described above.

First, as shown in FIG. 18(a), when the pop, exec, and push instructions are all issued, the processes P21 to P23 of the pipeline stages are executed every cycle.

Next, as shown in FIG. 18(b), when the pop instruction is stopped, that is, when only the exec and push instructions are executed, the POP unit 21 stops reading data from the FIFO buffers 61 and 62.

As a result, the FIFO buffers 61 and 62 become full through the data transfer by the DMA (input DMA) 5, and the DMA 5 detects the full state of the FIFO buffers 61 and 62 and stops automatically. That is, stopping the pop instruction, which is a microinstruction, stops the pipeline processing of the stream engine 2.

Further, as shown in FIG. 18(c), when the push instruction is stopped, that is, when only the pop and exec instructions are executed, the PUSH unit 25 stops reading data from the register 24 and storing it in the FIFO buffer 7.

As a result, the FIFO buffer 7 becomes empty, and the DMA (output DMA) 8 detects the empty state of the FIFO buffer 7 and stops automatically. That is, stopping the push instruction, which is a microinstruction, stops the pipeline processing of the stream engine 2.

Thus, by using the pop, exec, and push microinstructions, the DMAs 5 and 8 can control memory access autonomously even when, for example, an interrupt occurs. That is, the control of data transfer between the memory and the arithmetic unit can be simplified, and the amount of hardware for memory access control can be reduced.
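
The self-regulating behaviour of the two DMAs can be sketched as follows. This is an illustrative model only: the FIFO depth FIFO_CAPACITY and the function names input_dma_tick and output_dma_tick are assumptions, since the embodiment gives neither a depth nor an interface.

```python
from collections import deque

FIFO_CAPACITY = 4  # hypothetical depth; the embodiment does not state one


def input_dma_tick(memory, fifo):
    """Input DMA 5: move one element from memory into the FIFO.
    Returns False (stalled) when the FIFO is full or memory is exhausted."""
    if len(fifo) >= FIFO_CAPACITY or not memory:
        return False
    fifo.append(memory.popleft())
    return True


def output_dma_tick(fifo, memory):
    """Output DMA 8: move one element from the FIFO into memory.
    Returns False (stalled) when the FIFO is empty."""
    if not fifo:
        return False
    memory.append(fifo.popleft())
    return True


# Case of FIG. 18(b): no pop is issued, so nothing drains the input FIFO.
source = deque(range(10))
in_fifo = deque()
input_moves = [input_dma_tick(source, in_fifo) for _ in range(6)]

# Case of FIG. 18(c): no push is issued, so nothing refills the output FIFO.
out_fifo = deque([100, 101])
dest = []
output_moves = [output_dma_tick(out_fifo, dest) for _ in range(4)]
```

In both cases the DMA stalls purely from the FIFO state, without any explicit stop command, and it resumes as soon as the corresponding microinstruction is issued again.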

FIG. 19 is a diagram showing how microinstructions are embedded (packed) in VLIW instructions in the arithmetic processing device of this embodiment. As described with reference to FIGS. 16 to 18, when microinstructions are used, embedding them in, for example, VLIW (Very Long Instruction Word) instructions allows the processes to be executed simultaneously, which reduces the number of execution cycles.

That is, embedding a plurality of microinstructions in a VLIW instruction reduces the number of instructions in loop processing and also reduces the number of loop execution cycles. It also makes effective use of the instruction set architecture of the base processor (assumed to be a VLIW processor: arithmetic processing device 1).

FIG. 19 shows M microinstructions being packed into N VLIW instructions. The prologue processing by the VLIW 1 to VLIW 3 instructions and the epilogue processing by the VLIW N-2 to VLIW N instructions are described below with reference to FIGS. 20 and 21.

FIG. 20 is a diagram for explaining the prologue processing of the VLIW instructions shown in FIG. 19. FIG. 20(a) shows the processing of the VLIW 1 instruction, FIG. 20(b) shows the processing of the VLIW 2 instruction, and FIG. 20(c) shows the processing of the VLIW 3 instruction.

Here, as shown in FIG. 19, the prologue processing starts the stopped stream engine 2 and is accomplished by executing three instructions: VLIW 1 [pop], VLIW 2 [pop, exec], and VLIW 3 [pop, exec, push].

First, as shown in FIG. 20(a), only the pop instruction is executed by the VLIW 1 instruction. That is, the pop instruction causes the POP unit 21 to execute the process P21 of reading data from the data memory 4 and writing it to the registers 221 and 222. The registers 221 and 222 are thereby loaded with the data on which the EXEC unit 23 operates.

Next, as shown in FIG. 20(b), the pop and exec instructions are executed by the VLIW 2 instruction. That is, the pop instruction executes the process P21 described above, and the exec instruction causes the EXEC unit 23 to execute the process P22 of performing stream processing on the data written to the registers 221 and 222 and writing the result to the register 24.

As a result, the registers 221 and 222 are loaded with the data on which the EXEC unit 23 operates, and the register 24 is loaded with the operation result data that the PUSH unit 25 writes to the data memory 4.

Then, as shown in FIG. 20(c), the pop, exec, and push instructions are executed by the VLIW 3 instruction. That is, the pop instruction executes the process P21, the exec instruction executes the process P22, and the push instruction causes the PUSH unit 25 to execute the process P23 of writing the operation result data written to the register 24 to the data memory 4.

After this prologue processing, and until the epilogue processing described with reference to FIG. 21, the pipeline processing by the processes P21 to P23 continues to be executed by instructions identical to the VLIW 3 instruction (VLIW 4 instruction, VLIW 5 instruction, and so on).

FIG. 21 is a diagram for explaining the epilogue processing of the VLIW instructions shown in FIG. 19. FIG. 21(a) shows the processing of the VLIW N-2 instruction, FIG. 21(b) shows the processing of the VLIW N-1 instruction, and FIG. 21(c) shows the processing of the VLIW N instruction.

Here, as shown in FIG. 19, the epilogue processing is the reverse of the prologue processing described with reference to FIG. 20: it stops the running stream engine 2. The epilogue processing is accomplished by executing three instructions: VLIW N-2 [pop, exec, push], VLIW N-1 [exec, push], and VLIW N [push].

First, as shown in FIG. 21(a), the pop, exec, and push instructions are executed by the VLIW N-2 instruction. The VLIW N-2 instruction is the same as the VLIW 3 instruction described with reference to FIG. 20(c), that is, the same as the pipeline processing executed continuously by the processes P21 to P23.

Next, as shown in FIG. 21(b), the exec and push instructions are executed by the VLIW N-1 instruction. That is, omitting the pop instruction stops the process P21 in which the POP unit 21 reads data from the data memory 4 and writes it to the registers 221 and 222. As a result, the registers 221 and 222 become empty.

Then, as shown in FIG. 21(c), only the push instruction is executed by the VLIW N instruction. That is, omitting the pop and exec instructions leaves not only the registers 221 and 222 but also the register 24 empty.
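
The prologue, steady-state, and epilogue pattern described with reference to FIGS. 19 to 21 can be generated mechanically. The following Python sketch is an illustration under one assumption not stated in the embodiment: that each data element needs exactly one pop, one exec, and one push, so that K elements give M = 3K microinstructions packed into N = K + 2 VLIW instructions. The function name vliw_schedule is hypothetical.

```python
def vliw_schedule(num_elements):
    """Return the list of VLIW bundles (tuples of microinstruction names)
    for a three-stage pipeline processing num_elements elements."""
    bundles = []
    for k in range(num_elements + 2):
        bundle = []
        if k < num_elements:            # element k is read
            bundle.append("pop")
        if 1 <= k <= num_elements:      # element k-1 is operated on
            bundle.append("exec")
        if k >= 2:                      # element k-2 is written back
            bundle.append("push")
        bundles.append(tuple(bundle))
    return bundles


schedule = vliw_schedule(4)
micro_count = sum(len(bundle) for bundle in schedule)
```

For four elements the schedule is [pop], [pop, exec], two full [pop, exec, push] bundles, [exec, push], and [push], which matches the VLIW 1 to VLIW 3 prologue and the VLIW N-2 to VLIW N epilogue with N = 6.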

Controlling the stream engine 2 with the three microinstructions pop, exec, and push is merely an example. It goes without saying that various changes are possible, such as adding further microinstructions or applying different microinstructions.

In the embodiment described above, an arithmetic processing device that performs matrix operations in LTE-Advanced and the like has been described as an example. However, this embodiment is not limited to arithmetic processing devices applied to such wireless communication devices and can be applied widely to various arithmetic processing devices.

Although embodiments have been described above, all examples and conditions described herein are given to aid understanding of the invention and of the inventive concepts as applied to the technology, and the examples and conditions specifically described are not intended to limit the scope of the invention. Nor does such description in the specification indicate advantages or disadvantages of the invention. Although embodiments of the invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made without departing from the spirit and scope of the invention.

Regarding the embodiments including the examples above, the following supplementary notes are further disclosed.
(Supplementary Note 1)
An arithmetic processing device comprising an arithmetic unit that executes operations and a stream engine that executes stream processing, wherein a data path of the arithmetic unit and a data path of the stream engine are tightly coupled.

(Supplementary Note 2)
The arithmetic processing device according to Supplementary Note 1, further comprising an instruction issuing unit that issues instructions, wherein the instruction issuing unit issues instructions to the arithmetic unit and also issues instructions to the stream engine.

(Supplementary Note 3)
The arithmetic processing device according to Supplementary Note 2, wherein the stream engine includes:
a read circuit that reads data from a memory;
an execution circuit that performs stream processing on the read data; and
a write circuit that writes the operation result of the stream processing to the memory.

(Supplementary Note 4)
The arithmetic processing device according to Supplementary Note 3, wherein the read circuit includes a POP unit and a first register, and the POP unit reads data from a first memory unit of the memory indicated by a start address and a stream length and stores the data in the first register.

(Supplementary Note 5)
The arithmetic processing device according to Supplementary Note 4, wherein the execution circuit includes an EXEC unit and a second register, and the EXEC unit performs stream processing on the data stored in the first register and stores the operation result of the stream processing in the second register.

(Supplementary Note 6)
The arithmetic processing device according to Supplementary Note 5, wherein the execution circuit includes a plurality of EXEC units arranged in levels and a plurality of third registers provided between the EXEC units of the respective levels.

(Supplementary Note 7)
The arithmetic processing device according to Supplementary Note 5 or 6, wherein the write circuit includes a PUSH unit, and the PUSH unit writes the operation result stored in the second register to a second memory unit of the memory indicated by a start address and a stream length.

(Supplementary Note 8)
The arithmetic processing device according to any one of Supplementary Notes 2 to 7, wherein the instruction that the instruction issuing unit issues to the stream engine is a step instruction, and each pipeline stage of the stream engine executes one process in accordance with one step instruction.

(Supplementary Note 9)
The arithmetic processing device according to Supplementary Note 8, wherein the parameter information used for the stream processing is expressed by a single set instruction having a long bit length.

(Supplementary Note 10)
The arithmetic processing device according to Supplementary Note 9, wherein the parameter information used for the stream processing includes a start address of each stream, a stream length, and an operation mode.

(Supplementary Note 11)
The arithmetic processing device according to Supplementary Note 8, further comprising a parameter register into which the parameter information used for the stream processing is set collectively, wherein each pipeline stage of the stream engine executes in a pipelined manner with reference to the parameter information in the parameter register.

(Supplementary Note 12)
The arithmetic processing device according to any one of Supplementary Notes 2 to 7, wherein the instructions that the instruction issuing unit issues to the stream engine are microinstructions having a short bit length that are obtained by decomposing a step instruction and that control the operation of each pipeline stage of the stream engine, and each pipeline stage executes its processing independently in accordance with the corresponding microinstruction.

(Supplementary Note 13)
The arithmetic processing device according to Supplementary Note 12, further comprising a first FIFO buffer provided between the memory and the read circuit, wherein the memory is DMA-controlled, and stopping a first microinstruction that controls the processing of the read circuit that reads data from the memory brings the first FIFO buffer to a full state and stops the pipeline processing of the stream engine.

(Supplementary Note 14)
The arithmetic processing device according to Supplementary Note 12, further comprising a second FIFO buffer provided between the write circuit and the memory, wherein the memory is DMA-controlled, and stopping a second microinstruction that controls the processing of the write circuit that writes data to the memory brings the second FIFO buffer to an empty state and stops the pipeline processing of the stream engine.

(Supplementary Note 15)
The arithmetic processing device according to any one of Supplementary Notes 12 to 14, wherein, when the arithmetic unit is controlled by VLIW instructions, the microinstructions that control the operation of each pipeline stage of the stream engine are embedded in the VLIW instructions.

1 Processor
2, 200 Stream engine
4, 400 Data memory
5, 8 DMA
7, 61, 62 FIFO buffer
10, 110, 310 Register
11, 101, 301 Instruction reading unit
12, 102, 302 Instruction interpreting unit
13, 103, 303 Register reading unit
14, 104, 304 Instruction issuing unit
15, 105 Arithmetic unit
16, 106 Memory access unit
17, 107 Register writing unit
18, 108 Instruction memory
19, 109 Data memory
21 POP unit
23 EXEC unit
24, 221, 222, 221a, 221b, 222a, 222b, 241 to 243 Register
25 PUSH unit
41 to 43 Memory unit
100 Base processor
140 Parameter register
210 Read circuit
230 Execution circuit
250 Write circuit
300 Coprocessor

Claims

An arithmetic processing device comprising:
an arithmetic unit that executes operations;
a stream engine that executes stream processing; and
an instruction issuing unit that issues instructions to the arithmetic unit and to the stream engine, wherein
the instruction that the instruction issuing unit issues to the stream engine is a step instruction, and
each pipeline stage of the stream engine executes one process in accordance with one step instruction.

The arithmetic processing device according to claim 1, wherein the parameter information used for the stream processing is expressed by a single set instruction having a long bit length.

The arithmetic processing device according to claim 2, wherein the parameter information used for the stream processing includes a start address of each stream, a stream length, and an operation mode.

The arithmetic processing device according to claim 1, further comprising a parameter register into which the parameter information used for the stream processing is set collectively, wherein each pipeline stage of the stream engine executes in a pipelined manner with reference to the parameter information in the parameter register.