JP7393519B2

JP7393519B2 - Arithmetic device and method

Info

Publication number: JP7393519B2
Application number: JP2022505966A
Authority: JP
Inventors: 成司西村
Original assignee: Denso Corp; NSI Texe Inc
Current assignee: Denso Corp; NSI Texe Inc
Priority date: 2020-03-11
Filing date: 2021-03-03
Publication date: 2023-12-06
Anticipated expiration: 2041-03-03
Also published as: WO2021182222A1; JPWO2021182222A1

Description

Cross-reference to related applications

This application is based on Patent Application No. 2020-042169, filed on March 11, 2020, and claims the benefit of priority thereto; the entire contents of that patent application are incorporated herein by reference.

The present disclosure relates to an arithmetic device and an arithmetic method.

Conventionally, arithmetic devices have been used that include a plurality of functional units, each being an arithmetic unit that processes instructions, and that perform pipeline processing (hereinafter also referred to as a "multi-stage arithmetic pipeline") on a series of input instructions with the plurality of functional units.

For example, Patent Document 1 discloses a configuration in which pipeline processing is performed by a plurality of functional units such as a multiplier MUL and an adder ADD. In the configuration described in Patent Document 1, the multiplier MUL multiplies the data of each pair of corresponding elements of the simultaneously input values "a" and "b" and sequentially outputs the products to the adder ADD, and the adder ADD sequentially adds each output product to its own previous output.

Patent Document 1: JP 2012-69081 A

Here, an arithmetic device that executes instructions (operations) by chaining with a multi-stage arithmetic pipeline incurs overhead for pipeline start-up and shutdown, and when executing a complex operation that combines addition, multiplication, and the like, it may have to start up and shut down the pipeline multiple times. In addition, when a complex operation is executed, some functional units may have to wait for other functional units to finish before performing their own operations. In other words, chaining with a pipeline could not always perform operations efficiently.

An object of the present disclosure is to provide an arithmetic device and an arithmetic method that can perform operations by chaining using a pipeline more efficiently.

An arithmetic device according to one aspect of the present disclosure includes a pipeline containing a plurality of functional units having the same function and performs operations by chaining. The arithmetic device includes: a determination unit that determines whether there is a functional unit that is not executing an instruction; and a control unit that, while any one of the functional units is executing an instruction, causes a functional unit that is not executing an instruction to execute an instruction that can be executed in parallel with the instruction execution by that functional unit.

According to the present invention, operations by chaining using a pipeline can be performed more efficiently.

The above and other objects, features, and advantages of the present disclosure will become clearer from the following detailed description with reference to the accompanying drawings. In the drawings:
FIG. 1 is a schematic configuration diagram of an arithmetic device according to an embodiment.
FIG. 2 is a schematic diagram showing chaining in the embodiment.
FIG. 3 is a schematic diagram showing chaining in the embodiment.
FIG. 4 is a flowchart showing the flow of chaining operation processing in the embodiment.

Embodiments of the present disclosure are described below with reference to the drawings. The embodiment described below is one example of implementing the present disclosure and does not limit the present disclosure to the specific configuration described. In implementing the present disclosure, a specific configuration suited to the embodiment may be adopted as appropriate.

FIG. 1 is a schematic configuration diagram of the arithmetic device 10 of this embodiment. The arithmetic device 10 includes a pipeline containing a plurality of functional units 12 and performs operations by chaining.

The functional units 12 are arithmetic units such as, for example, LD, which copies data from memory to a register; ST, which copies data from a register to memory; ADD (adder), which has an addition function; MUL (multiplier), which has a multiplication function; and DIV (divider), which has a division function. These functional units 12 are provided in a multi-pipe vector arithmetic unit (hereinafter referred to as the "vector arithmetic unit") 20 that executes pipeline processing.

Next, pipeline processing by the arithmetic device 10 of this embodiment is described.

First, as an example, assume a vector arithmetic unit 20 in which the number of pipeline stages (hereinafter referred to as the "pipeline stage count") is four and the number of functional units 12 is five (LD, ST, ADD, MUL, and DIV). In the following description, the hardware state of the vector arithmetic unit 20 with four pipeline stages is referred to as the default mode.

The default mode can be reconfigured into mode 1 and mode 2, which have fewer pipeline stages. Mode 1 has two pipeline stages and eight functional units 12 (LD, ST, two ADDs, two MULs, and two DIVs). Mode 2 has one pipeline stage and 14 functional units 12 (LD, ST, four ADDs, four MULs, and four DIVs). Thus, in mode 1 and mode 2, which have fewer stages than the default mode, there are multiple functional units 12 having the same function, such as ADD, MUL, and DIV.
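The three modes can be written down as a small lookup table, which makes the unit counts easy to check. The following is a minimal Python sketch; the names `MODES` and `total_units` are illustrative assumptions that do not appear in the disclosure, while the stage and unit counts are the ones stated in the description.

```python
# Hypothetical mode table. The keys and function names are illustrative;
# the stage counts and unit counts are the values given in the description.
MODES = {
    "default": {"stages": 4, "units": {"LD": 1, "ST": 1, "ADD": 1, "MUL": 1, "DIV": 1}},
    "mode1": {"stages": 2, "units": {"LD": 1, "ST": 1, "ADD": 2, "MUL": 2, "DIV": 2}},
    "mode2": {"stages": 1, "units": {"LD": 1, "ST": 1, "ADD": 4, "MUL": 4, "DIV": 4}},
}


def total_units(mode):
    """Return the number of functional units available in the given mode."""
    return sum(MODES[mode]["units"].values())


for name in MODES:
    print(name, MODES[name]["stages"], total_units(name))
```

The totals are 5, 8, and 14 units for the default mode, mode 1, and mode 2 respectively, matching the figures in the text.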

In the arithmetic device 10 of this embodiment, as an example, all variations of the hardware mode (combinations of the pipeline stage count and the number of functional units 12) are defined in advance, and the current hardware mode is held as a flag value in a dedicated register (mode register). The hardware mode is set by a dedicated control instruction.

In this way, the arithmetic device 10 of this embodiment can reconfigure the pipeline from the default mode so that it includes a plurality of functional units 12 having the same function. When one functional unit 12 is executing an instruction, the arithmetic device 10 causes another functional unit 12 that is not executing an instruction to execute an instruction that can be executed in parallel. For this purpose, the arithmetic device 10 has a function (Out-of-Order function) of determining, while a certain functional unit 12 is executing an instruction, whether other functional units 12 are executing instructions.

To implement the Out-of-Order function, each functional unit 12 of this embodiment has a state register that holds an identifier indicating whether an instruction is being executed. Based on the state registers, the arithmetic device 10 determines which functional units 12 are not executing an instruction and assigns instructions to the individual functional units 12. In this embodiment, as an example, the state register of a functional unit 12 that is executing an instruction is "1", and that of a functional unit 12 that is not executing an instruction is "0".
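The state-register lookup can be illustrated with a few bit operations. This is a sketch under assumptions: packing the state bits into one integer and the helper names `find_free_unit`, `mark_busy`, and `mark_idle` are illustrative choices, not something the disclosure specifies. Only the encoding ("1" for executing, "0" for not executing) comes from the text.

```python
def find_free_unit(state_bits, unit_indices):
    """Return the first unit index whose state bit is 0, or None if all are busy."""
    for index in unit_indices:
        if (state_bits >> index) & 1 == 0:
            return index
    return None


def mark_busy(state_bits, index):
    """Set the state bit to 1: the unit is executing an instruction."""
    return state_bits | (1 << index)


def mark_idle(state_bits, index):
    """Clear the state bit to 0: the unit is not executing an instruction."""
    return state_bits & ~(1 << index)


# Assumed layout for this example: bits 0 and 1 belong to the two ADD units of mode 1.
ADD_UNITS = [0, 1]
state = 0
first = find_free_unit(state, ADD_UNITS)   # first ADD is free
state = mark_busy(state, first)
second = find_free_unit(state, ADD_UNITS)  # second ADD is free
state = mark_busy(state, second)
print(first, second, find_free_unit(state, ADD_UNITS))
```

Once both bits are set, the lookup returns `None`, which corresponds to the case where no ADD unit is available for a further ADD instruction.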

Since the total number of functional units 12 contributing to an operation (the product of the pipeline stage count and the number of corresponding functional units 12) is constant, the state register only needs enough bits to cover the total number of functional units 12. In the above example, the maximum pipeline stage count is four and there are three types of functional units 12 that exist in multiples (ADD, MUL, and DIV), so 3 × 4 = 12 bits suffice for the state register. To execute the Out-of-Order function, it is also necessary to determine whether the LD and ST functional units 12 are executing, so LD and ST each have a 1-bit state register as well. The LD and ST state registers need only 1 bit each regardless of the pipeline stage count.
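The bit count above is simple arithmetic, shown here as a short Python sketch. The function name `state_register_bits` and its parameters are hypothetical; the inputs (a maximum of four stages, three replicated unit types, and one bit each for LD and ST) are taken from the description.

```python
def state_register_bits(max_stages, replicated_kinds, single_kinds):
    """Return (bits for replicated unit types, bits for single units such as LD and ST)."""
    replicated_bits = max_stages * len(replicated_kinds)
    single_bits = len(single_kinds)  # 1 bit each, independent of the stage count
    return replicated_bits, single_bits


replicated, single = state_register_bits(4, ["ADD", "MUL", "DIV"], ["LD", "ST"])
print(replicated, single)
```

For the example in the text this gives 12 bits for ADD, MUL, and DIV plus 2 bits for LD and ST.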

Next, as an example, chaining of the operation "((A+B)+C)×D" in mode 1 is described with reference to FIG. 2. In mode 1, as described above, there are two each of the ADD, MUL, and DIV functional units 12. The vector register width corresponding to each value (A, B, C, D) in FIG. 2 and in FIG. 3, described later, is fixed, for example at 64 bits or 32 bits.

In mode 1, an ADD instruction that writes the result of "A+B" to a register (intermediate register) is issued first. At the time this ADD instruction is issued, no functional unit 12 is executing an instruction yet, so the state registers of all functional units 12 are "0". Issuing the ADD instruction sets the state register of the functional unit 12 corresponding to the first ADD to "1".

In the next instruction cycle, an ADD instruction that adds "C" to the result of the first ADD instruction is issued without waiting for the first ADD instruction to complete. Because the state register of the second ADD is "0", it is determined that there is an ADD that is not executing an instruction; the state register of the second ADD is set to "1", and chaining with the first ADD instruction is performed.

In the following instruction cycle, a multiplication instruction that multiplies "(A+B)+C" by "D" is issued without waiting for the two preceding ADD instructions to finish. Because the state registers of the MULs are "0" at this point, the state register of the first MUL is set to "1", and chaining with the two preceding ADD instructions is performed.
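The issue sequence for "((A+B)+C)×D" can be traced with a small Python sketch. It models only the state registers and one issue per instruction cycle; the function `issue_chain`, the unit labels such as `ADD0`, and the temporaries `t1` to `t3` are illustrative assumptions. No instruction completes during the three cycles shown, which mirrors the text's "without waiting for completion".

```python
def issue_chain(program, unit_counts):
    """Issue one instruction per cycle to the first idle unit of the required kind.

    Returns (log, state). The log stops at the first instruction that finds
    no idle unit; that entry carries None as its unit.
    """
    state = {kind: [0] * count for kind, count in unit_counts.items()}
    log = []
    for cycle, (kind, text) in enumerate(program):
        bits = state[kind]
        if 0 not in bits:
            log.append((cycle, text, None))  # no idle unit of this kind
            break
        unit = bits.index(0)
        bits[unit] = 1  # state register "1": executing
        log.append((cycle, text, f"{kind}{unit}"))
    return log, state


PROGRAM = [("ADD", "t1 = A + B"), ("ADD", "t2 = t1 + C"), ("MUL", "t3 = t2 * D")]
MODE1 = {"ADD": 2, "MUL": 2, "DIV": 2}

log, state = issue_chain(PROGRAM, MODE1)
for entry in log:
    print(entry)
print(state)
```

With the mode 1 unit counts, the three instructions go to the first ADD, the second ADD, and the first MUL in consecutive cycles. With a single ADD unit, as in the default mode, the second ADD finds no idle unit in the next cycle.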

Next, chaining of the operation "(A+B)×(C+D)" in mode 1 is described with reference to FIG. 3. The operations "(A+B)" and "(C+D)" can be executed independently of each other. Therefore, one ADD performs "(A+B)" while the other ADD performs "(C+D)". That is, these two ADD instructions are issued simultaneously without either waiting for the other to finish. In mode 1, the hardware resources therefore allow these two ADD instructions to be assigned at the same time.

In the example of FIG. 3, one MUL functional unit 12 remains free. If the following instructions include a MUL instruction that has no dependency on the above chaining operation, that MUL instruction can be executed at the same time. As a result, the utilization of the arithmetic functional units can be made higher than in the default mode, and higher effective performance can be achieved with the same hardware resources (number of arithmetic units) as before.

Each functional unit 12 of this embodiment also includes a mask register 30 (see FIGS. 2 and 3) that indicates the execution state of an instruction. A functional unit 12 that executes the next instruction using the operation results of a plurality of other functional units 12 executes that instruction after the mask registers 30 of those other functional units 12 indicate that instruction execution has finished.

More specifically, the mask register 30 is provided in correspondence with the vector register length, and its bits are rewritten from "0" to "1" as the operation progresses. When the operation in a functional unit 12 completes, its mask register 30 is all "1". Whether the operations by the plurality of functional units 12 that executed the preceding instructions have completed is determined from the AND (logical product) of their mask registers 30. That is, the functional unit 12 that executes the next instruction does not operate until the operations by the plurality of functional units 12 executing the preceding instructions have finished.

In the example of FIG. 3, the register area for "A+B" becomes all "1" when the "A+B" operation completes, and the register area for "C+D" becomes all "1" when the "C+D" operation completes. The "(A+B)×(C+D)" operation starts once both the "A+B" and "C+D" operations have completed.
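The completion check by AND of mask registers can be sketched as follows. Representing a mask register 30 as a Python list with one bit per vector element, and the function name `all_done`, are assumptions made for illustration; the rule itself (start the dependent operation only when the AND of the producers' masks is all "1") is the one described in the text.

```python
def all_done(masks):
    """Return True when the bitwise AND of all producer mask registers is all ones.

    `masks` is a non-empty list of equal-length bit lists, one per producing unit.
    """
    combined = [1] * len(masks[0])
    for mask in masks:
        combined = [c & m for c, m in zip(combined, mask)]
    return all(bit == 1 for bit in combined)


mask_a_plus_b = [1, 1, 1, 1]  # "A+B" has processed every vector element
mask_c_plus_d = [1, 1, 0, 0]  # "C+D" is still in progress
print(all_done([mask_a_plus_b, mask_c_plus_d]))  # MUL must keep waiting

mask_c_plus_d = [1, 1, 1, 1]  # "C+D" has now completed
print(all_done([mask_a_plus_b, mask_c_plus_d]))  # MUL may start
```

The first check is false and the second is true, so the multiplication in "(A+B)×(C+D)" would begin only after both additions have finished.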

Consequently, even if the plurality of functional units 12 execute the preceding instructions asynchronously, the functional unit 12 that executes the next instruction waits for them to complete before operating, so the next instruction can be executed without causing an error.

Even when the pipeline that executes the next instruction has more stages than the pipeline that executes the preceding instruction, for example when the preceding pipeline has two stages and the following pipeline has four stages, the progress of the instruction (operation) in the preceding pipeline is determined by the mask register 30 in the same way as described above. Therefore, even when a chaining operation is performed by a combination of pipelines with different stage counts, instructions can be executed without causing an error.

In this way, the arithmetic device 10 of this embodiment reduces the overhead of pipeline start-up and shutdown by reconfiguring the four-stage pipeline into a pipeline with fewer stages (a two-stage pipeline). The two-stage pipeline then includes a plurality of (at least two) functional units 12 having the same function.

Specifically, when the operation "((A+B)+C)×D" is performed with the four-stage pipeline described above, the operation "A+B=E" must be performed in one chaining, after which "(E+C)×D" must be performed in a new chaining. The four-stage pipeline therefore has to start up and shut down twice, incurring the overhead twice. In addition, the result of "A+B=E" must first be stored in memory, and the result "E" must then be read back from memory when "(E+C)×D" is computed, which makes the processing inefficient.

In contrast, the two-stage pipeline has two each of the functional units 12 having the same function (ADD, MUL, and DIV). While a certain functional unit 12 is executing an instruction, the presence or absence of another functional unit 12 that is not executing an instruction is determined (Out-of-Order function), and that other functional unit 12 is caused to execute an instruction that can be executed in parallel with the instruction execution by the first functional unit 12.

As a result, the two-stage pipeline can perform the operation "((A+B)+C)×D" in a single chaining, so the pipeline start-up and shutdown (overhead) is incurred only once. Moreover, the two-stage pipeline does not need to store an intermediate result in memory as the four-stage pipeline does, so processing is more efficient.
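The difference in the number of chainings can be illustrated with a deliberately simplified model. The assumption here, which is not stated in this form in the disclosure, is that one chaining can use each physical functional unit at most once, so a new chaining starts whenever an operation needs a unit type that is already fully used. The function `count_chains` is hypothetical, and LD and ST are ignored.

```python
def count_chains(ops, unit_counts):
    """Count how many chainings (pipeline start-up/shutdown pairs) a sequence needs.

    Simplified model: within one chaining, each unit type can be used at most
    as many times as there are units of that type.
    """
    chains = 0
    used = {}
    for kind in ops:
        if not used or used.get(kind, 0) >= unit_counts[kind]:
            chains += 1  # start a new chaining
            used = {}
        used[kind] = used.get(kind, 0) + 1
    return chains


OPS = ["ADD", "ADD", "MUL"]  # ((A+B)+C)×D
print(count_chains(OPS, {"ADD": 1, "MUL": 1}))  # four-stage default mode
print(count_chains(OPS, {"ADD": 2, "MUL": 2}))  # two-stage mode 1
```

Under this model the default mode needs two chainings ("A+B", then "(E+C)×D") and mode 1 needs one, which agrees with the comparison in the text.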

Thus, by reducing the number of pipeline stages and configuring the pipeline to include a plurality of functional units 12 having the same function, as in this embodiment, the overhead time required for pipeline start-up and shutdown can be reduced and operations can be made more efficient.

To execute chaining with such a pipeline, the arithmetic device 10 of this embodiment includes an execution determination unit 22 and a control unit 24, as shown in FIG. 1.

The execution determination unit 22 is the component that executes the Out-of-Order function; it determines whether there is a functional unit 12 that is not executing an instruction. The execution determination unit 22 makes this determination based on the state registers.

The control unit 24 causes a functional unit 12 that is not executing an instruction to execute an executable instruction. Specifically, while any one of the functional units 12 is executing an instruction, the control unit 24 of this embodiment causes a functional unit 12 that is not executing an instruction to execute an instruction that can be executed in parallel with that instruction execution.

The arithmetic device 10 performs control according to the pipeline stage count, in other words, according to the mode. For example, the Out-of-Order function is not executed with the four-stage pipeline (default mode) but is executed with the two-stage or one-stage pipeline (mode 1 or mode 2). In other words, the Out-of-Order function is executed for a pipeline that includes a plurality of functional units 12 having the same function. The mode is selected as appropriate according to the series of operations to be executed by the vector arithmetic unit 20.

However, the configuration is not limited to this, and the Out-of-Order function may be executed even in the default mode. That is, simultaneous execution may be allowed not only among functional units 12 having the same function but also among all the different functional units 12 (LD, ST, MUL, ADD, and DIV), as long as they have no dependency on one another.

The arithmetic device 10 of this embodiment stores instructions awaiting execution by the functional units 12 in an instruction standby buffer 14. When there is a functional unit 12 that can execute an instruction stored in the instruction standby buffer 14, instructions are read sequentially from the instruction standby buffer 14 and executed by the functional units 12. This allows instructions to be assigned efficiently to functional units 12 that are not executing an instruction.

FIG. 4 is a flowchart showing the flow of the chaining operation process that executes the Out-of-Order function. This process is executed by a program stored in a recording medium of the arithmetic device 10. Executing the program carries out the method corresponding to the program.

Because the chaining operation process shown in FIG. 4 executes the Out-of-Order function, the mode register is checked before the process starts to determine whether the current mode can execute the Out-of-Order function. If it cannot, normal chaining operation processing without the Out-of-Order function is executed. Alternatively, the hardware is reconfigured into a mode that can execute the Out-of-Order function.
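The mode check performed before the process of FIG. 4 might look like the following Python sketch. The table `OOO_CAPABLE`, the function `select_processing`, the `allow_reconfigure` flag, and the choice of mode 1 as the reconfiguration target are all assumptions for illustration; the disclosure only states that either normal chaining is used or the hardware is reconfigured into a capable mode.

```python
# Which modes execute the Out-of-Order function in the main example of the text:
# the four-stage default mode does not, mode 1 and mode 2 do.
OOO_CAPABLE = {"default": False, "mode1": True, "mode2": True}


def select_processing(mode_register, allow_reconfigure=False):
    """Return (mode to use, kind of chaining processing to run)."""
    if OOO_CAPABLE[mode_register]:
        return mode_register, "out_of_order_chaining"
    if allow_reconfigure:
        # Hypothetical policy: reconfigure into mode 1.
        return "mode1", "out_of_order_chaining"
    return mode_register, "normal_chaining"


print(select_processing("default"))
print(select_processing("default", allow_reconfigure=True))
print(select_processing("mode2"))
```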

First, in step S100, the execution determination unit 22 checks the state register of each functional unit 12 and determines whether there is a functional unit 12 that is not executing an instruction (Out-of-Order function). The functional unit 12 in question here is one that can execute the given instruction. For example, if the given instruction is an ADD instruction, the execution determination unit 22 determines whether there is a functional unit 12 that can execute that ADD instruction.

In the next step S102, if there is a functional unit 12 that is not executing an instruction, the process proceeds to step S106; if there is no such functional unit 12, the process proceeds to step S104.

In step S104, because no functional unit 12 can execute the instruction, the instruction is queued in the instruction standby buffer 14 as an instruction awaiting execution, and the process returns to step S100.

In step S106, the control unit 24 assigns the instruction to a functional unit 12 that is not executing an instruction.

In the next step S108, the control unit 24 sets the state register of the functional unit 12 to which the instruction was assigned to "1".

In the next step S110, the functional unit 12 executes the assigned instruction.

In the next step S112, the control unit 24 determines whether the functional unit 12 has completed execution of the assigned instruction; if so, the process proceeds to step S114, and if not, to step S116. When a plurality of functional units 12 are executing instructions, step S112 determines for each functional unit 12 whether execution of its instruction has completed.

In step S114, the control unit 24 sets the state register of the functional unit 12 that completed its instruction to "0", and the process proceeds to step S116.

In step S116, the control unit 24 determines whether there is a next instruction; if there is, the process returns to step S100 and the steps are executed for the next instruction.

If it is determined in step S116 that there is no next instruction, all of the input series of operation instructions have completed, so this chaining process ends.
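Steps S100 to S116 can be approximated by a small cycle-based simulation in Python. This is a sketch under stated assumptions rather than the flowchart itself: all instructions start in the standby buffer, at most one instruction is issued per cycle, every instruction takes a fixed hypothetical `latency`, and data dependencies between instructions are not modeled (only unit availability is). The function name `run_chaining` is illustrative.

```python
from collections import deque


def run_chaining(instructions, unit_counts, latency=2):
    """Return a list of (issue cycle, unit kind, unit index) for each instruction."""
    for kind in instructions:
        if unit_counts.get(kind, 0) == 0:
            raise ValueError(f"no functional unit can execute {kind}")

    state = {kind: [0] * count for kind, count in unit_counts.items()}  # state registers
    remaining = {}                 # (kind, index) -> cycles left for a busy unit
    waiting = deque(instructions)  # stands in for the instruction standby buffer 14
    trace = []
    cycle = 0
    while waiting or remaining:
        # S112/S114: units that have finished get their state register reset to 0.
        for unit in [u for u, left in remaining.items() if left == 0]:
            state[unit[0]][unit[1]] = 0
            del remaining[unit]
        # S100/S102: is a unit of the required kind idle? If not, the
        # instruction stays queued (S104) and is retried in the next cycle.
        if waiting and 0 in state[waiting[0]]:
            kind = waiting.popleft()
            index = state[kind].index(0)       # S106: assign the instruction
            state[kind][index] = 1             # S108: state register set to 1
            remaining[(kind, index)] = latency
            trace.append((cycle, kind, index))
        # S110: every busy unit executes for one cycle.
        for unit in remaining:
            remaining[unit] -= 1
        cycle += 1
    return trace  # S116: no next instruction, the process ends


OPS = ["ADD", "ADD", "MUL"]
print(run_chaining(OPS, {"ADD": 2, "MUL": 2}))
print(run_chaining(OPS, {"ADD": 1, "MUL": 1}))
```

With two ADD units the three instructions issue in cycles 0, 1, and 2. With one ADD unit the second ADD has to wait until the first has finished, so the issue cycles become 0, 2, and 3.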

In this way, while any one of the functional units 12 is executing an instruction, the arithmetic device 10 of this embodiment causes a functional unit 12 that is not executing an instruction to execute an instruction that can be executed in parallel with that instruction execution. The arithmetic device 10 of this embodiment can thereby perform operations by chaining using a pipeline more efficiently.

Although the present disclosure has been described using the above embodiment, the technical scope of the present disclosure is not limited to the scope described in that embodiment. Various changes or improvements can be made to the embodiment without departing from the gist of the disclosure, and forms incorporating such changes or improvements are also included in the technical scope of the present disclosure.

For example, the above embodiment describes reconfiguring a four-stage pipeline into a two-stage or one-stage pipeline, but the present disclosure is not limited to this. For example, a pipeline with five or more stages may be reconfigured into a pipeline with fewer stages. Alternatively, without any concept of pipeline reconfiguration, the vector arithmetic unit 20 may be configured with a pipeline fixed at, for example, two stages.

Claims

An arithmetic device (10) that includes a pipeline comprising a plurality of each of functional units (12) having a plurality of types of functions and that performs operations by chaining, the arithmetic device comprising:
a determination unit (22) that determines whether there is a functional unit that is not executing an instruction; and
a control unit (24) that reconfigures the pipeline into a pipeline with fewer stages so that, while any one of the functional units is executing an instruction, a functional unit that is not executing an instruction executes an instruction that can be executed in parallel with the instruction execution by that functional unit.

The arithmetic device according to claim 1, wherein
each functional unit includes a mask register (30) that indicates the execution state of the instruction, and
a functional unit that executes a next instruction using the operation results of a plurality of other functional units executes the next instruction after the mask registers of the plurality of other functional units indicate that execution of the instruction has finished.

The arithmetic device according to claim 1 or claim 2, wherein
an identifier that identifies whether the instruction is being executed is set for each functional unit, and
the determination unit determines, based on the identifier, the functional unit that is not executing the instruction.

The arithmetic device according to any one of claims 1 to 3, wherein
instructions awaiting execution by the functional units are stored in a storage medium (14), and
when there is a functional unit capable of executing an instruction stored in the storage medium, the instructions are sequentially read from the storage medium and executed by the functional unit.

An arithmetic method that performs operations by chaining using a pipeline comprising a plurality of each of functional units having a plurality of types of functions, the method comprising:
a first step of determining whether there is a functional unit that is not executing an instruction; and
a second step of reconfiguring the pipeline into a pipeline with fewer stages so that, while any one of the functional units is executing an instruction, a functional unit that is not executing an instruction executes an instruction that can be executed in parallel with the instruction execution by that functional unit.