JP5598114B2 - Arithmetic unit

Info

Publication number: JP5598114B2
Application number: JP2010141377A
Authority: JP
Inventors: 毅葛; 好正竹部
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-06-22
Filing date: 2010-06-22
Publication date: 2014-10-01
Anticipated expiration: 2030-06-22
Also published as: JP2012008622A

Description

The present invention relates to an arithmetic unit.

Some processors used in supercomputers include an arithmetic unit such as those shown in FIG. 8 and FIG. 9.

The arithmetic unit shown in FIG. 8 includes four execution units p0 to p3, each capable of executing ALU processing, MUL processing, and LDST processing. For each instruction input to the instruction buffer, one execution unit pX (X = 0 to 3) operates a plurality of times. Here, ALU processing means addition, logical operations, and the like; MUL processing means multiplication and the like; and LDST processing means load processing, store processing, and the like.

More specifically, the control path of this arithmetic unit is a unit (circuit) that can make each execution unit pX, which is capable of executing the various kinds of processing, perform one specific kind of processing a plurality of times (over a plurality of cycles) while changing the operands (the registers in the register file to be read or written). The scheduler of this arithmetic unit fetches an instruction from the instruction buffer every cycle and controls the control path so that an execution unit pX that is not executing processing (or whose processing completes in the current cycle) starts an operation spanning a plurality of cycles.

Like the arithmetic unit of FIG. 8, the arithmetic unit shown in FIG. 9 is a unit in which, for each instruction input to the instruction buffer, one execution unit pX (X = 0 to 7) operates a plurality of times. However, instead of execution units having all processing functions (FIG. 8), this arithmetic unit includes execution units p0 to p7 that each have only a single processing function: execution units p0 to p3 capable of ALU processing only, execution units p4 and p5 capable of MUL processing only, and execution units p6 and p7 capable of LDST processing only.

JP 2004-38751 A (Japanese Unexamined Patent Application Publication No. 2004-38751)

J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach", 2nd edition, Morgan Kaufmann Publishers (USA), 1996, Appendix B "Vector Processors"

The only function basically required of the scheduler of an arithmetic unit in which each execution unit has a single processing function (FIG. 9) is to identify a free execution unit (one that is not executing processing, or whose processing completes in the current cycle) that can serve as the issue destination of an instruction. The scheduler of this arithmetic unit is therefore easy to design and manufacture. However, this arithmetic unit has the drawback of requiring a register file with a large number of ports, that is, its circuit scale is unavoidably large.

Likewise, the only function required of the scheduler of an arithmetic unit in which each execution unit has all processing functions (FIG. 8) is to identify a free execution unit, so the scheduler of this arithmetic unit is also easy to design and manufacture. Moreover, because its register file needs comparatively few ports, the configuration of FIG. 8 is preferable to that of FIG. 9 from the viewpoint of reducing the manufacturing cost and power consumption of the arithmetic unit.

However, the ALU processing function is normally used more frequently than the other processing functions. If the configuration of FIG. 8 is adopted, the result is an arithmetic unit in which the LDST and MUL processing functions of some execution units are used only infrequently, that is, an arithmetic unit containing circuits that are not used efficiently.

Reducing the number of circuits that are not used efficiently lowers the manufacturing cost and power consumption of the arithmetic unit. It is therefore conceivable to build the arithmetic unit, as shown in FIG. 10, from execution units p0 and p1 capable of ALU processing and MUL processing and execution units p2 and p3 capable of ALU processing and LDST processing.

However, an arithmetic unit with this configuration suffers from a problem that does not arise in an arithmetic unit in which each execution unit has all processing functions (FIG. 8).

Specifically, with the configuration of FIG. 8, an arithmetic unit in which every pipeline works efficiently (the idle time of each pipeline is short) can be realized simply by installing a scheduler that issues each instruction to a free pipeline. Here, a pipeline is the portion consisting of an execution unit and the circuit in the control path that makes it operate a plurality of times.

In contrast, with the configuration of FIG. 10, a scheduler that has only that level of instruction issue capability cannot realize an arithmetic unit in which every pipeline works efficiently.

To be concrete, assume that the scheduler of the arithmetic unit of FIG. 10 searches, for each instruction, for a free pipeline capable of processing (executing) that instruction in a predetermined order and issues the instruction to the pipeline it finds. Assume also that it searches pipelines p0, p1, p2, and p3 in that order for ALU instructions, pipelines p0 and p1 in that order for MUL instructions, and pipelines p2 and p3 in that order for LDST instructions.

Now consider the case where the following instructions are input, in this order, to the instruction buffer of such an arithmetic unit: an ALU instruction vadd that requires eight additions, a MUL instruction vmul that requires eight multiplications, an LDST instruction vload that requires eight loads, and a MUL instruction vmul that requires eight multiplications.

In this case, as shown in FIG. 11, the first instruction vadd is issued to pipeline p0, and then the instructions vmul and vload are issued in turn to pipelines p1 and p2. Consequently, when the scheduler fetches the fourth instruction vmul, the only free pipeline is p3, which cannot process MUL instructions. A pipeline capable of processing MUL instructions does not become free until the ninth cycle, so the fourth instruction vmul is issued to pipeline p0 in the ninth cycle.
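The stall described above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions that are not spelled out in the text: at most one instruction is issued per cycle, a pipeline that accepts an instruction in cycle c can accept the next one in cycle c + 8, and the function and variable names are hypothetical.

```python
# Search order per instruction class, as assumed for the FIG. 10 configuration.
SEARCH_ORDER = {
    "ALU": ["p0", "p1", "p2", "p3"],
    "MUL": ["p0", "p1"],
    "LDST": ["p2", "p3"],
}


def simulate_fixed_priority(program, repeat=8):
    """Return (cycle, instruction, pipeline) issue events for a fixed search order."""
    # First cycle in which each pipeline can accept a new instruction.
    free_at = {p: 1 for p in ("p0", "p1", "p2", "p3")}
    events = []
    cycle = 1
    for name, kind in program:
        while True:
            # Take the first capable pipeline that is free in this cycle.
            target = next((p for p in SEARCH_ORDER[kind] if free_at[p] <= cycle), None)
            if target is not None:
                break
            cycle += 1  # stall: no capable pipeline is free yet
        free_at[target] = cycle + repeat  # busy for `repeat` cycles
        events.append((cycle, name, target))
        cycle += 1  # at most one instruction is issued per cycle
    return events


program = [("vadd", "ALU"), ("vmul", "MUL"), ("vload", "LDST"), ("vmul", "MUL")]
for event in simulate_fixed_priority(program):
    print(event)
```

Under these assumptions the fourth instruction is reported as issued to p0 in cycle 9, while p3 stays idle throughout.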

That the above sequence of instructions is processed in this way means that the pipelines of this arithmetic unit are not being used efficiently. The reason is that, as shown in FIG. 12, these instructions could be issued one per cycle if the first instruction vadd were issued to pipeline p3 and the instructions vmul, vload, and vmul were then issued in turn to pipelines p1, p2, and p0.
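To compare the two issue orders directly, the following sketch computes the issue cycle of each instruction when the pipeline for every instruction is chosen in advance. The timing model (one issue per cycle, a pipeline busy for eight cycles after accepting an instruction) and the helper name `issue_cycles` are assumptions made for illustration.

```python
# Instruction classes each pipeline of the FIG. 10 configuration can process.
CAPABLE = {
    "ALU": {"p0", "p1", "p2", "p3"},
    "MUL": {"p0", "p1"},
    "LDST": {"p2", "p3"},
}


def issue_cycles(program, assignment, repeat=8):
    """Issue cycle of each instruction when instruction i is forced onto assignment[i]."""
    free_at = {}  # pipeline -> first cycle in which it can accept new work
    cycle = 1
    cycles = []
    for (name, kind), pipe in zip(program, assignment):
        if pipe not in CAPABLE[kind]:
            raise ValueError(f"{pipe} cannot process {name}")
        cycle = max(cycle, free_at.get(pipe, 1))  # stall until the chosen pipeline is free
        cycles.append(cycle)
        free_at[pipe] = cycle + repeat
        cycle += 1  # at most one instruction is issued per cycle
    return cycles


program = [("vadd", "ALU"), ("vmul", "MUL"), ("vload", "LDST"), ("vmul", "MUL")]
print(issue_cycles(program, ["p0", "p1", "p2", "p0"]))  # order of FIG. 11
print(issue_cycles(program, ["p3", "p1", "p2", "p0"]))  # order of FIG. 12
```

With this model the first assignment yields issue cycles 1, 2, 3, 9 and the second yields 1, 2, 3, 4, which matches the difference the text describes.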

Thus, with the configuration of FIG. 10, a scheduler of comparatively simple construction cannot realize an arithmetic unit in which every pipeline works efficiently. Yet it is extremely difficult to design a scheduler that can determine that the instructions in the instruction buffer should be issued to the pipelines in the order shown in FIG. 11. Moreover, such a scheduler inevitably has a large circuit scale.

An object of the disclosed technology is therefore to provide an arithmetic unit that has a plurality of pipelines performing multi-cycle operation, can be manufactured at low cost, and can operate each pipeline efficiently without being equipped with a scheduler of complicated construction.

To solve the above problem, an arithmetic unit according to one aspect of the disclosed technology includes: a plurality of pipelines that perform multi-cycle operation, including one or more first pipelines having a function of executing first-type processing and a function of executing second-type processing, and one or more second pipelines having a function of executing first-type processing and a function of executing third-type processing; and a control circuit having a function of making a second pipeline that is ready to start new processing take over a first-type loop process that a first pipeline has already started, the first-type loop process being one that completes by executing the first-type processing a plurality of times.

With the above configuration, it is possible to realize an arithmetic unit that has a plurality of pipelines performing multi-cycle operation, can be manufactured at low cost, and can operate each pipeline efficiently without being equipped with a scheduler of complicated construction.

A schematic configuration diagram of the arithmetic unit according to the embodiment.
An explanatory diagram of the control information stored in the control information registers of the arithmetic unit.
A flowchart of the instruction issue processing executed by the scheduler.
An explanatory diagram of the control performed by the scheduler.
An explanatory diagram of the control performed by the scheduler.
An explanatory diagram of the control performed by the scheduler.
A schematic configuration diagram of an existing arithmetic unit having a plurality of pipelines with all processing functions.
An explanatory diagram of a modification of an existing arithmetic unit having a plurality of pipelines with a single processing function.
An explanatory diagram of a modification/improvement of an existing arithmetic unit having a plurality of pipelines with all processing functions.
A schematic configuration diagram of an existing arithmetic unit having a plurality of pipelines with a single processing function.
An explanatory diagram of a configuration that it would be desirable to adopt for the arithmetic unit.
A diagram for explaining a problem that may arise when the configuration of FIG. 10 is adopted.
An explanatory diagram for explaining a problem that may arise when the configuration of FIG. 10 is adopted.

First, an outline of the arithmetic unit 10 according to the embodiment is described with reference to FIG. 1 and FIG. 2. FIG. 1 is a schematic configuration diagram of the arithmetic unit 10, and FIG. 2 is an explanatory diagram of the control information stored in the control information registers 21_X (X = 0 to 3) of the arithmetic unit 10.

As shown in FIG. 1, the arithmetic unit 10 according to the embodiment includes an instruction buffer 11, a scheduler 12, a control path 13, and a data path 14.

The data path 14 is the unit (functional block, circuit) that actually performs the various kinds of processing on data. The data path 14 includes a register file 25 and four execution units p0 to p3.

The register file 25 is a set of registers (thirty-two 64-bit registers in this embodiment) provided with a plurality of read ports (eight in this embodiment) and a plurality of write ports (not shown).

The execution units p0 and p1 are both units capable of executing ALU processing (addition, logical operations, and the like) and MUL processing (multiplication and the like); they are hereinafter also referred to as MUL-type execution units. The execution units p2 and p3 are both units capable of executing ALU processing and LDST processing (load processing, store processing, and the like); they are hereinafter also referred to as LDST-type execution units.

Each execution unit pX (X = 0 to 3) in the data path 14 completes one round of processing within one cycle (the instruction issue period of the scheduler 12, described later). Each execution unit pX can perform only one process per cycle and cannot execute a plurality of processes simultaneously. For convenience of illustration, FIG. 1 shows the operation result of each execution unit pX outside the register file 25, but in the data path 14 the operation result of each execution unit pX is stored in a specific register in the register file 25.

The instruction buffer 11 is a buffer (a kind of FIFO memory) that stores the instructions received by the arithmetic unit 10 in chronological order.

The scheduler 12 is a unit that reads (fetches) the instruction at the head of the instruction buffer 11 (the oldest received instruction) every cycle and makes the data path 14 (and the control path 13) start processing according to the instruction read. Details of the scheduler 12 are given later. In principle, an instruction accepted by the arithmetic unit 10 (an instruction processed by the scheduler 12) is one that can be carried out by operating a certain execution unit pX a plurality of times. More specifically, it is in principle an instruction that can be carried out by repeating a process such as addition (a process that one of the execution units pX can execute) a plurality of times while changing its operands.

The control path 13 is a unit that controls the data path 14 on the basis of control information (described in detail later) supplied from the scheduler 12. The control path 13 includes four control information registers 21_0 to 21_3, four instruction update circuits 22_0 to 22_3, four two-input multiplexers 23_0 to 23_3, and four three-input multiplexers 24_0 to 24_3.

Each control information register 21_X (X = 0 to 3) of the control path 13 is a register for storing control information structured as shown in FIG. 2, that is, control information (32 bits in this embodiment) consisting of execution unit control information, operand information, the current loop count, and so on.

The scheduler 12 sets the initial value of this control information, and the instruction update circuit 22_X updates its contents periodically (at the transition to the next cycle).

Specifically, the execution unit control information in the control information held in the control information register 21_X is supplied to the execution unit pX (see FIG. 1) to specify the processing (the kind of processing) to be performed in the next cycle. The operand information includes the designations (src0, src1, and dst in FIG. 2) of the registers in the register file 25 that are to be read or written during the kind of processing specified by the execution unit control information. As shown schematically in FIG. 1, the operand information in each control information register 21_X is supplied to the register file 25, for example as the designation of the registers that output data to the execution unit pX (FIG. 1 shows only the portions corresponding to src0 and src1, as two arrows).

The current loop count (FIG. 2) is the information that the instruction update circuit 22_X refers to in order to determine whether the control information in the control information register 21_X needs to be updated (that is, whether the execution unit pX must also operate in the next cycle). When the execution unit pX must also operate in the next cycle, the instruction update circuit 22_X takes the control information read from the control information register 21_X, changes the operand information and the current loop count, and outputs the result (writes it back to the control information register 21_X).
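The behavior of the control information and its per-cycle update can be pictured with the following sketch. It is illustrative only: the text does not give the field widths, how the total number of iterations is held, or how the operands change, so the `total` field, the 0-based loop count, and the stride of one register per iteration are all assumptions, as are the names `ControlInfo` and `update`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ControlInfo:
    op: str          # execution unit control information (kind of processing)
    src0: int        # operand information: first source register
    src1: int        # operand information: second source register
    dst: int         # operand information: destination register
    loop_count: int  # current loop count (0-based index of the iteration described)
    total: int       # ASSUMPTION: total number of iterations for the instruction


def update(info: ControlInfo) -> Optional[ControlInfo]:
    """Sketch of an instruction update circuit: next-cycle control information, or None when done."""
    if info.loop_count + 1 >= info.total:
        return None  # the unit does not need to operate in the next cycle
    # ASSUMPTION: every operand advances by one register per iteration.
    return replace(
        info,
        src0=info.src0 + 1,
        src1=info.src1 + 1,
        dst=info.dst + 1,
        loop_count=info.loop_count + 1,
    )


# Walk one eight-iteration add through the update circuit.
info: Optional[ControlInfo] = ControlInfo("add", src0=0, src1=8, dst=16, loop_count=0, total=8)
steps = []
while info is not None:
    steps.append((info.src0, info.src1, info.dst))
    info = update(info)
print(steps)
```

Because the updated record is a complete description of the remaining work, it can in principle be written to any register that feeds a unit capable of the same kind of processing.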

Each two-input multiplexer 23_X in the control path 13 (FIG. 1) is provided so that both the setting of control information in the control information register 21_X by the scheduler 12 and the updating of that control information by the instruction update circuit 22_X are possible.

Each three-input multiplexer 24_X is provided so that the control information that the instruction update circuit 22_X would set in the control information register 21_X can instead be set in another control information register 21_Y (Y ≠ X; described in detail later).

As is clear from the circuit configuration (the connections with the other components) shown in FIG. 1, the three-input multiplexer 24_0 allows the updated control information for the LDST-type execution unit p2 or p3 (that is, the output of the instruction update circuit 22_2 or 22_3) to be set in the control information register 21_0. Likewise, the three-input multiplexer 24_1 allows the updated control information for the LDST-type execution unit p2 or p3 to be set in the control information register 21_1.

The three-input multiplexers 24_2 and 24_3 allow the updated control information for the MUL-type execution unit p0 or p1 (the output of the instruction update circuit 22_0 or 22_1) to be set in the control information registers 21_2 and 21_3, respectively.
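The connectivity described for the three-input multiplexers can be summarized as a small lookup. The text names only the two cross-group inputs of each multiplexer; treating the third input as the pipeline's own instruction update circuit is an assumption, and the function names are hypothetical.

```python
from typing import List


def mux24_inputs(x: int) -> List[int]:
    """Indices k of the instruction update circuits 22_k feeding three-input multiplexer 24_x."""
    # Pipelines 0 and 1 are MUL-type, pipelines 2 and 3 are LDST-type.
    other_group = [2, 3] if x in (0, 1) else [0, 1]
    # ASSUMPTION: the third input is the pipeline's own update circuit 22_x.
    return [x] + other_group


def can_take_over(source: int, destination: int) -> bool:
    """True if register 21_destination can receive the output of update circuit 22_source."""
    return source != destination and source in mux24_inputs(destination)


for x in range(4):
    print(f"24_{x} selects among update circuits {mux24_inputs(x)}")
```

Under this reading, work can move between a MUL-type pipeline and an LDST-type pipeline in either direction, but not between two pipelines of the same type.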

On this basis, the configuration and operation of the arithmetic unit 10 are described in more detail below. In the following description, the portion consisting of the control information register 21_X, the instruction update circuit 22_X, the execution unit pX, and so on (that is, the execution unit pX together with the structure that makes it operate a plurality of times) is referred to as pipeline pX. Pipelines p0 and p1 (the pipelines containing the MUL-type execution units p0 and p1) are referred to as MUL-type pipelines p0 and p1, and pipelines p2 and p3 are referred to as LDST-type pipelines p2 and p3.

FIG. 3 shows a flowchart of the instruction issue processing that the scheduler 12 executes to issue one instruction in the instruction buffer 11 to one of the pipelines pX. In principle the scheduler 12 starts this instruction issue processing every cycle (the exception being when step S105 is executed). Actual issue constraints include various other factors such as register interference, but for simplicity this flow assumes that only resource conflicts exist.

That is, when the predetermined timing arrives and the scheduler 12 starts this instruction issue processing, it first reads the instruction at the head of the instruction buffer 11 (the oldest received instruction) (step S101).

Next, the scheduler 12 checks whether a pipeline pX to which the read instruction can be issued is free, examining the pipelines in the order of the priorities assigned to them in advance for each kind of instruction (step S102). A pipeline pX is "free" when it can start new processing in the next cycle (either the pipeline pX is not performing any processing, or the processing it is executing ends in the current cycle). The pipelines pX to which an instruction can be issued are (see FIG. 1) pipelines p0 to p3 for ALU instructions (add instructions, logical operation instructions, and the like), pipelines p0 and p1 for MUL instructions (multiply instructions and the like), and pipelines p2 and p3 for LDST instructions (load instructions, store instructions, and the like).
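Step S102 amounts to a priority-ordered search over the capable pipelines. The sketch below models each pipeline by the number of cycles of work it has left, counting the current cycle, so that 0 means idle and 1 means the work ends in the current cycle. The concrete priority orders are an assumption (the passage only says they are assigned in advance), and the names are hypothetical.

```python
from typing import Dict, Optional

# ASSUMPTION: priority order per instruction class; the text does not list it here.
PRIORITY = {
    "ALU": ["p0", "p1", "p2", "p3"],
    "MUL": ["p0", "p1"],
    "LDST": ["p2", "p3"],
}


def is_free(remaining: int) -> bool:
    """Free means idle (0) or finishing in the current cycle (1)."""
    return remaining <= 1


def find_pipeline(kind: str, remaining: Dict[str, int]) -> Optional[str]:
    """Return the highest-priority free pipeline that can process `kind`, or None."""
    for pipe in PRIORITY[kind]:
        if is_free(remaining[pipe]):
            return pipe
    return None


remaining = {"p0": 5, "p1": 1, "p2": 0, "p3": 3}
print(find_pipeline("MUL", remaining))   # p1 finishes in the current cycle
print(find_pipeline("LDST", remaining))  # p2 is idle
```

A result of None corresponds to the NO branch of step S103.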

If a free pipeline pX to which the read instruction can be issued exists (step S103: YES), the scheduler 12 issues the instruction to that pipeline pX (step S108). More specifically, in step S108 the scheduler 12 sets control information (see FIG. 2) corresponding to the read instruction in the control information register 21_X of the pipeline pX found to be free in step S102.

Having finished step S108, the scheduler 12 ends the processing of FIG. 3 for the time being and waits for the timing at which that processing is to start again (when one cycle has elapsed and the next instruction in the instruction buffer 11 is to be processed).

On the other hand, if no free pipeline pX to which the read instruction can be issued exists (step S103: NO), the scheduler 12 determines whether the process takeover condition is satisfied (step S104).

When the read instruction is a MUL instruction, the process takeover condition is that "a free LDST-type pipeline and a MUL-type pipeline executing ALU processing exist" (at least one of the LDST-type pipelines p2 and p3 is free, and at least one of the MUL-type pipelines p0 and p1 is executing ALU processing). When the read instruction is an LDST instruction, the condition is that "a free MUL-type pipeline and an LDST-type pipeline executing ALU processing exist" (at least one of the MUL-type pipelines p0 and p1 is free, and at least one of the LDST-type pipelines p2 and p3 is executing ALU processing).
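The process takeover condition can be written as a predicate over the pipeline state. This is a sketch: the state representation (a dictionary of the processing class in progress on each pipeline plus a set of free pipelines) and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional, Set

MUL_PIPES = ("p0", "p1")
LDST_PIPES = ("p2", "p3")


def takeover_possible(kind: str, running: Dict[str, Optional[str]], free: Set[str]) -> bool:
    """Evaluate the takeover condition for a fetched instruction of class `kind`.

    running maps each pipeline to the class of processing in progress (or None);
    free is the set of pipelines that can start new processing in the next cycle.
    """
    if kind == "MUL":
        same_type, other_type = MUL_PIPES, LDST_PIPES
    elif kind == "LDST":
        same_type, other_type = LDST_PIPES, MUL_PIPES
    else:
        return False  # the condition is defined only for MUL and LDST instructions
    has_free_destination = any(p in free for p in other_type)
    has_alu_source = any(running.get(p) == "ALU" for p in same_type)
    return has_free_destination and has_alu_source


# State when the fourth instruction of the vadd, vmul, vload, vmul example is fetched.
running = {"p0": "ALU", "p1": "MUL", "p2": "LDST", "p3": None}
print(takeover_possible("MUL", running, free={"p3"}))
```

In the example state, p0 is running the ALU work of vadd and p3 is free, so a MUL instruction satisfies the condition.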

When the process takeover condition is satisfied, the scheduler 12 also identifies one pipeline executing ALU processing (a MUL-type pipeline when processing a MUL instruction, an LDST-type pipeline when processing an LDST instruction) as the takeover source pipeline, and one free pipeline (an LDST-type pipeline when processing a MUL instruction, a MUL-type pipeline when processing an LDST instruction) as the takeover destination pipeline. When two pipelines qualify as the takeover source or the takeover destination, one of them is selected by a preset algorithm.

In short, in step S104 the scheduler 12 determines that the process takeover condition is satisfied when the read instruction is a MUL instruction, at least one of the LDST-type pipelines p2 and p3 is free, and at least one of the MUL-type pipelines p0 and p1 is executing ALU processing. In making this determination, the scheduler 12 identifies one MUL-type pipeline found to be executing ALU processing as the takeover source pipeline and one LDST-type pipeline found to be free as the takeover destination pipeline.

The scheduler 12 also determines that the process takeover condition is satisfied when the read instruction is an LDST instruction, at least one of the MUL-type pipelines p0 and p1 is free, and at least one of the LDST-type pipelines p2 and p3 is executing ALU processing. In making this determination, the scheduler 12 identifies one LDST-type pipeline found to be executing ALU processing as the takeover source pipeline and one MUL-type pipeline found to be free as the takeover destination pipeline.

When the scheduler 12 determines that the process takeover condition is satisfied (step S104: YES), it sets the updated control information of the takeover source pipeline in the control information register 21_X of the takeover destination pipeline (step S106).

More specifically, when the read instruction is a MUL instruction, the scheduler 12 controls the three-input multiplexer 24_L so that the control information updated by the instruction update circuit 22_K of the process takeover source pipeline pK (K = 0 or 1) is set in the control information register 21_L for the process takeover destination pipeline pL (L = 2 or 3). When the read instruction is an LDST instruction, the scheduler 12 controls the three-input multiplexer 24_L so that the control information updated by the instruction update circuit 22_K of the process takeover source pipeline pK (K = 2 or 3) is set in the control information register 21_L for the process takeover destination pipeline pL (L = 0 or 1).

Next, the scheduler 12 issues the instruction read from the instruction buffer 11 to the process takeover source pipeline pK (step S107); that is, it writes control information into the control information register 21_K.
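Steps S106 and S107 can be pictured as a single update of the control information registers. The following Python sketch is illustrative only and is not taken from the embodiment: the function name `takeover_and_issue`, the dictionary representation of the registers 21_X, and the (operation, remaining iterations) tuples used as control information are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

Ctrl = Optional[Tuple[str, int]]  # assumed form: (operation, remaining iterations)


def takeover_and_issue(regs: Dict[str, Ctrl],
                       updated: Dict[str, Ctrl],
                       src: str, dst: str,
                       new_ctrl: Ctrl) -> Dict[str, Ctrl]:
    """Return the register contents for the next cycle after S106 and S107.

    regs    : current contents of the control information registers 21_X
    updated : outputs of the instruction update circuits 22_X
    src/dst : process takeover source / destination pipelines
    """
    if regs[dst] is not None:
        raise ValueError("takeover destination pipeline must be free")
    # By default every register receives its own updated control information.
    nxt = {p: updated.get(p) for p in regs}
    # S106: the multiplexer of dst selects the updated control information of src.
    nxt[dst] = updated[src]
    # S107: the newly read instruction is written to the register of src.
    nxt[src] = new_ctrl
    return nxt


regs = {"p0": ("vadd", 6), "p1": ("vmul", 7), "p2": ("vload", 8), "p3": None}
updated = {"p0": ("vadd", 5), "p1": ("vmul", 6), "p2": ("vload", 7), "p3": None}
print(takeover_and_issue(regs, updated, "p0", "p3", ("vmul", 8)))
```

In this sketch the two writes happen in the same call, which corresponds to moving the control information and issuing the instruction in the same cycle.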

The scheduler 12 then ends the processing of FIG. 3 for the time being and waits until the processing of FIG. 3 is to be executed again.

On the other hand, when the scheduler 12 determines in step S104 that the process takeover condition is not satisfied (NO), it waits for the processing of any one of the pipelines to complete, that is, for any one of the pipelines to become able to start new processing (step S105).

When the processing of any one of the pipelines completes, the scheduler 12 restarts the processing from step S102. Although not shown explicitly in the flowchart, the processing that the scheduler 12 executes in steps S102 and S103 in this case determines whether the pipeline whose processing has completed can accept the instruction already read from the instruction buffer 11 (the instruction that could not be issued in the previous cycle).
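The decision flow of steps S102 to S107 described above can be summarized in a short sketch. The Python code below is a simplified illustration under stated assumptions, not the scheduler circuit itself: the names `SUPPORTS` and `schedule_step` are hypothetical, and the "preset algorithm" for choosing among several candidates is assumed here to be "lowest pipeline number first".

```python
from typing import Dict, Optional, Tuple

# Processing functions of each pipeline: p0 and p1 are MUL pipelines,
# p2 and p3 are LDST pipelines, and every pipeline can execute ALU processing.
SUPPORTS = {
    "p0": {"ALU", "MUL"}, "p1": {"ALU", "MUL"},
    "p2": {"ALU", "LDST"}, "p3": {"ALU", "LDST"},
}


def schedule_step(kind: str, running: Dict[str, Optional[str]]) -> Tuple[str, ...]:
    """Decide what to do with one read instruction of class `kind`.

    `running` maps each pipeline to the class of processing it is executing
    ("ALU", "MUL" or "LDST"), or to None when the pipeline is free.
    """
    # S102/S103: is there a free pipeline that can execute the instruction?
    for p, r in running.items():
        if r is None and kind in SUPPORTS[p]:
            return ("issue", p)
    # S104: process takeover condition.
    sources = [p for p, r in running.items() if r == "ALU" and kind in SUPPORTS[p]]
    destinations = [p for p, r in running.items() if r is None]
    if kind != "ALU" and sources and destinations:
        # S106 then S107: move control information, then issue to the source.
        return ("takeover", sources[0], destinations[0])
    # S105: wait until some pipeline completes its processing.
    return ("wait",)


state = {"p0": "ALU", "p1": "MUL", "p2": "LDST", "p3": None}
print(schedule_step("MUL", state))
```

A free pipeline that reaches the S104 check necessarily lacks the function required by the instruction, so it belongs to the other pipeline type, as the takeover condition requires.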

Although this should already be clear from the above description, the operation (function) of the arithmetic unit 10 is described here using a specific example.

Consider a case where the same instructions as those described earlier, vadd, vmul, vload, and vmul, are input in this order to the instruction buffer 11 of the arithmetic unit 10.

In this case, when each instruction up to vload is processed, the branch to the YES side is taken in step S103 of FIG. 3. Therefore, depending on the priority settings for the pipelines, vadd, vmul, and vload are issued to, for example, the pipelines p0, p1, and p2, respectively.

When vadd, vmul, and vload have been issued to the pipelines p0, p1, and p2, respectively, the only pipeline that is free when the fourth instruction vmul is processed is the LDST pipeline p3, which cannot perform MUL processing. In this case, therefore, the scheduler 12 determines that no pipeline exists to which the instruction can be issued (step S103; NO) and then determines whether the process takeover condition is satisfied (step S104).

Because vmul is a MUL instruction, the LDST pipeline p3 is free, and the MUL pipeline p0 is executing ALU processing (vadd), the scheduler 12 determines in step S104 that the process takeover condition is satisfied. Also in step S104, the scheduler 12 identifies the MUL pipeline p0, which is executing ALU processing, as the process takeover source pipeline, and the free LDST pipeline p3 as the process takeover destination pipeline.

Thereafter, the scheduler 12 performs the processing of step S106. That is, the scheduler 12 controls the three-input multiplexer 24₃ so that the control information updated by the instruction update circuit 22₀ of the process takeover source pipeline p0 is set in the control information register 21₃ for the process takeover destination pipeline p3.

As already described, each pipeline pX of the arithmetic unit 10 repeats, cycle by cycle, processing that differs only in its operands. Every pipeline pX can execute ALU processing, and the pipeline that the scheduler 12 identifies as the process takeover source pipeline is one that is executing ALU processing. Therefore, if the control information for the next cycle of the pipeline p0 is used as the control information for the next cycle of the pipeline p3, the remainder of the processing that the process takeover source pipeline p0 had been performing ("vadd" in FIG. 4) is thereafter performed by the process takeover destination pipeline p3 ([1]), as shown schematically in FIG. 4.
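The reason a move of control information is sufficient can be checked with a small model. The Python sketch below is illustrative only: the fields of `Ctrl`, the operand-advancing rule in `update` (standing in for an instruction update circuit 22_X), and the register file layout are assumptions for this example and are not specified in this form by the embodiment. It shows that running three iterations on one pipeline and the remaining five on another leaves the register file in the same state as running all eight on one pipeline.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Ctrl:
    """Assumed control information: operation, operand registers, loop count."""
    op: str
    src_a: int
    src_b: int
    dst: int
    remaining: int


def update(c: Ctrl) -> Ctrl:
    """Assumed behaviour of an update circuit: advance operands, count down."""
    return replace(c, src_a=c.src_a + 1, src_b=c.src_b + 1,
                   dst=c.dst + 1, remaining=c.remaining - 1)


def run(c: Ctrl, cycles: int, regfile: List[int]) -> Ctrl:
    """Execute up to `cycles` additions; return control info for the next cycle."""
    for _ in range(cycles):
        if c.remaining == 0:
            break
        regfile[c.dst] = regfile[c.src_a] + regfile[c.src_b]
        c = update(c)
    return c


start = Ctrl("vadd", src_a=0, src_b=8, dst=16, remaining=8)

whole = list(range(16)) + [0] * 8      # all eight additions on one pipeline
run(start, 8, whole)

split = list(range(16)) + [0] * 8      # three on p0, then five on p3
moved = run(start, 3, split)           # control information handed to p3
run(moved, 5, split)

assert split == whole
```

Because the control information carries everything that distinguishes one cycle from the next, the destination pipeline needs no other state from the source pipeline in this model.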

More specifically, as shown schematically in FIGS. 5A and 5B, the pipeline p3 performs the remainder of the processing that the pipeline p0 had been performing (in this case, processing that requires the addition processing (ALU processing) to be repeated five times).

After completing the processing of step S106, the scheduler 12 issues vmul to the process takeover source pipeline p0, whose control information has been moved, as shown schematically in FIG. 4 ([2]). Consequently, as shown in FIGS. 5A and 5B, the pipeline p0 starts the processing for the fourth instruction vmul in the fourth cycle, immediately after that instruction is read.
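The example above can be replayed with a small cycle-level simulation. The Python sketch below is a simplified model under stated assumptions, not a description of the actual circuit: the names `simulate`, `KIND`, `SUPPORTS`, and `ITERATIONS` are hypothetical, candidates are chosen lowest pipeline number first, one instruction is read per cycle, and each instruction is assumed to need eight iterations (a value chosen so that five additions remain at the fourth cycle, as in the description).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

KIND = {"vadd": "ALU", "vmul": "MUL", "vload": "LDST"}
SUPPORTS = {
    "p0": {"ALU", "MUL"}, "p1": {"ALU", "MUL"},
    "p2": {"ALU", "LDST"}, "p3": {"ALU", "LDST"},
}
ITERATIONS = 8  # assumed number of cycles each instruction occupies a pipeline


@dataclass
class Ctrl:
    op: str
    remaining: int


def simulate(program: List[str]) -> List[str]:
    regs: Dict[str, Optional[Ctrl]] = {p: None for p in SUPPORTS}
    pending = list(program)
    log: List[str] = []
    cycle = 0
    while pending or any(regs.values()):
        cycle += 1
        if pending:
            op = pending[0]
            kind = KIND[op]
            free = [p for p in regs if regs[p] is None]
            usable = [p for p in free if kind in SUPPORTS[p]]
            donors = [p for p in regs if regs[p] is not None
                      and KIND[regs[p].op] == "ALU" and kind in SUPPORTS[p]]
            if usable:                      # S103; YES
                regs[usable[0]] = Ctrl(op, ITERATIONS)
                log.append(f"cycle {cycle}: issue {op} to {usable[0]}")
                pending.pop(0)
            elif free and donors:           # S104; YES, then S106 and S107
                src, dst = donors[0], free[0]
                regs[dst] = regs[src]
                regs[src] = Ctrl(op, ITERATIONS)
                log.append(f"cycle {cycle}: move {regs[dst].op} "
                           f"({regs[dst].remaining} left) from {src} to {dst}, "
                           f"issue {op} to {src}")
                pending.pop(0)
            # otherwise the instruction stays pending (S105)
        for p in regs:                      # every busy pipeline does one iteration
            c = regs[p]
            if c is not None:
                c.remaining -= 1
                if c.remaining == 0:
                    regs[p] = None
    return log


for line in simulate(["vadd", "vmul", "vload", "vmul"]):
    print(line)
```

In this model the fourth log entry records the move of vadd from p0 to p3 together with the issue of the second vmul to p0, so the second vmul does not have to wait for a MUL pipeline to finish.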

As is clear from the above description, the arithmetic unit 10 according to the embodiment is somewhat inferior in performance to the arithmetic unit of FIG. 6 because it can execute fewer MUL processes and LDST processes simultaneously. However, MUL processing and LDST processing generally do not occur frequently in programs, so roughly equivalent performance can be expected for many programs. FIG. 6 shows, in more detail (at the same level as the configuration of the arithmetic unit 10 shown in FIG. 1), the configuration of the arithmetic unit already described (FIG. 8), which includes four pipelines p0 to p3 each having all of the processing functions.

The scheduler 12 and the control path 13 of the arithmetic unit 10 can be realized by making simple modifications to the scheduler and the control path, respectively, of the arithmetic unit of FIG. 6. The data path 14 of the arithmetic unit 10 requires fewer elements and less wiring to manufacture than the data path of the arithmetic unit of FIG. 6, and it contains fewer circuits that, like those for MUL processing and LDST processing, are used infrequently yet require a large number of elements.

Therefore, the configuration adopted in the arithmetic unit 10 can be said to realize, at lower cost than before, an arithmetic unit that has roughly the same performance as a conventional one and consumes less power.

The arithmetic unit 10 is somewhat inferior in performance to the configuration shown in FIG. 9, but it needs only a register file with a relatively small number of ports. The arithmetic unit of FIG. 9, by contrast, requires a register file with more ports (16 read ports and 8 write ports), and its area therefore becomes very large.

Therefore, the configuration adopted in the arithmetic unit 10 is also superior in cost performance to the configuration of FIG. 9.

The configuration shown in FIG. 7 is obtained by modifying the configuration shown in FIG. 9 so that it functions as "an arithmetic unit including a scheduler that can issue at most four instructions to the data path at the same time (a scheduler that can operate no more than four pipelines)". If a four-input multiplexer is provided between the register file and each pipeline in this way, a register file with the same number of read ports as the register file 25 can be used in an arithmetic unit of the type that includes a plurality of pipelines each having a single processing function. However, information in the data path has more bits (is wider) than information in the control path, and the configuration shown in FIG. 7 also requires more multiplexers. Moreover, to use a register file with exactly the same configuration as the register file 25 in the arithmetic unit of FIG. 7, multiplexers must also be provided on the output side of each pipeline.

Therefore, the configuration adopted in the arithmetic unit 10 can be said to be superior to the configuration of FIG. 7 as well.

As described above, the configuration adopted in the arithmetic unit 10 according to the embodiment can realize an arithmetic unit with lower manufacturing cost and lower power consumption than any of the other configurations. Therefore, if the arithmetic unit of a processor for a supercomputer or of a processor for general electronic equipment (such as a DSP) is given the above configuration, a processor with lower manufacturing cost and lower power consumption than existing ones can be realized.

<Modifications>
The arithmetic unit 10 described above can be modified in various ways. For example, although performance decreases slightly, the arithmetic unit 10 can be modified so that only the movement of control information from an LDST pipeline to a MUL pipeline, or only the movement of control information from a MUL pipeline to an LDST pipeline, is possible.

The scheduler 12 can also be modified to move the control information and to issue the instruction to the process takeover source pipeline in separate cycles. The arithmetic unit 10 can also be modified into a unit such as a vector unit or a SIMD (Single Instruction Multiple Data) unit (a unit in which control information may be set in a plurality of control information registers 21_X at the same time). Furthermore, the arithmetic unit 10 can be modified into a unit whose number of pipelines differs from that described above, a unit in which each pipeline (execution stage) is a vector unit, a SIMD unit, or the like, a unit in which the functions of the pipelines (the combination of processing functions that each pipeline has) differ from those described above, and so on.

In the arithmetic unit 10, the control for having another pipeline take over processing that a pipeline has already started (processing that repeats ALU processing) is performed by a circuit whose main components are the scheduler 12 and the three-input multiplexers 24₀ to 24₃. However, a circuit having another configuration can also be adopted as this circuit.

DESCRIPTION OF SYMBOLS
10 Arithmetic unit
11 Instruction buffer
12 Scheduler
13 Control path
14 Data path
21₀ to 21₃ Control information registers
22₀ to 22₃ Instruction update circuits
23₀ to 23₃ Two-input multiplexers
24₀ to 24₃ Three-input multiplexers
25 Register file

Claims

An arithmetic unit comprising:
a plurality of pipelines for multi-cycle operation, including one or more first pipelines each having a function of executing a first type process and a function of executing a second type process, and one or more second pipelines each having a function of executing the first type process and a function of executing a third type process; and
a control circuit having a function of causing a second pipeline that is ready to start a new process to take over a first type loop process, which is completed by executing the first type process a plurality of times, that a first pipeline has already started.

The arithmetic unit, wherein the control circuit further has a function of causing a first pipeline that is ready to start a new process to take over a first type loop process that a second pipeline has already started.

The arithmetic unit according to claim 2, wherein the control circuit:
when the process to be started in one of the pipelines is a second type loop process, which is completed by executing the second type process a plurality of times, no first pipeline is ready to start a new process, and there exist a second pipeline that is ready to start a new process and a first pipeline that is executing a first type loop process, causes the second pipeline that is ready to start a new process to take over the first type loop process being executed by that first pipeline, and then causes that first pipeline to start the second type loop process; and
when the process to be started in one of the pipelines is a third type loop process, which is completed by executing the third type process a plurality of times, no second pipeline is ready to start a new process, and there exist a first pipeline that is ready to start a new process and a second pipeline that is executing a first type loop process, causes the first pipeline that is ready to start a new process to take over the first type loop process being executed by that second pipeline, and then causes that second pipeline to start the third type loop process.

The arithmetic unit according to claim 2, wherein the control circuit comprises:
a plurality of control information registers, each associated with a specific pipeline, in each of which control information defining the operation of the corresponding pipeline in the next cycle is set every cycle; and
a control information moving circuit having a function of moving the control information in the control information register for any first pipeline to the control information register for any second pipeline, and a function of moving the control information in the control information register for any second pipeline to the control information register for any first pipeline.

The arithmetic unit according to claim 4, wherein the control circuit includes a scheduler that:
when the process to be started in one of the pipelines is a second type loop process, which is completed by executing the second type process a plurality of times, no first pipeline is ready to start a new process, and there exist a second pipeline that is ready to start a new process and a first pipeline that is executing a first type loop process, controls the control information moving circuit so that the control information set in the control information register for the first pipeline that is executing the first type loop process is moved to the control information register for the second pipeline that is ready to start a new process, and then sets, in the control information register for that first pipeline, control information for the second type loop process to be started; and
when the process to be started in one of the pipelines is a third type loop process, which is completed by executing the third type process a plurality of times, no second pipeline is ready to start a new process, and there exist a first pipeline that is ready to start a new process and a second pipeline that is executing a first type loop process, controls the control information moving circuit so that the control information set in the control information register for the second pipeline that is executing the first type loop process is moved to the control information register for the first pipeline that is ready to start a new process, and then sets, in the control information register for that second pipeline, control information for the third type loop process to be started.