JP2008181179A

JP2008181179A - Processor and arithmetic unit

Info

Publication number: JP2008181179A
Application number: JP2007012316A
Authority: JP
Inventors: Motohiko Okabe; 基彦岡部
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-01-23
Filing date: 2007-01-23
Publication date: 2008-08-07

Abstract

PROBLEM TO BE SOLVED: To achieve high-speed arithmetic execution in which slow memory access does not become a bottleneck.

SOLUTION: A processor 1 that performs arithmetic processing based on instructions and operands stored in a main memory 2 comprises: a register 5; an internal memory 6; a fetch unit 4 that operates on a first operating clock and transfers instructions and operands from the main memory 2 to the internal memory 6 for storage; and an arithmetic execution unit 3 that operates on a second operating clock different from the first operating clock and performs arithmetic operations using the register 5 based on the instructions and operands stored in the internal memory 6.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a processor and to an arithmetic device that uses the processor.

An apparatus that performs arithmetic processing in electronic equipment consists of a processor, memory, and other peripheral circuits, and ever higher speed is constantly demanded of it. Microprocessors based on architectures such as CISC or RISC increase processor speed with pipeline processing, in which the instruction cycle is divided into stages, and with high-speed access through a hierarchical memory that includes a cache memory. Another architecture is the reconfigurable processor, which enables high-speed arithmetic processing through dynamic reconfiguration.

In pipeline processing, the execution of each instruction is divided into multiple stages, and the stages of successive instructions are executed in parallel. The number of stages differs from processor to processor. In general, more stages yield higher speed, but they also complicate the processor hardware and enlarge the circuit. One example divides execution into six stages: instruction fetch (IF), decode (ID), operand address calculation (OA), operand fetch (OF), instruction execution (EX), and result storage (RS).
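The speed-up from pipelining can be illustrated by counting cycles. The sketch below assumes an ideal six-stage pipeline with the stage names listed above, one clock cycle per stage, and no stalls; the function names are hypothetical.

```python
STAGES = ("IF", "ID", "OA", "OF", "EX", "RS")

def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles to finish n instructions in an ideal pipeline (one cycle per stage, no stalls)."""
    if n_instructions <= 0:
        return 0
    # The first instruction fills the pipeline; each later one completes one cycle after it.
    return n_stages + n_instructions - 1

def sequential_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles if each instruction must finish all stages before the next one starts."""
    return max(n_instructions, 0) * n_stages

print(pipelined_cycles(100), sequential_cycles(100))  # 105 600
```

Under these assumptions, 100 instructions take 105 cycles instead of 600. Any stall, such as a slow memory access in IF or OF, adds cycles to the pipelined figure.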

In a memory hierarchy, for example, the levels from the one closest to the processor's execution unit outward are registers, cache memory, main memory, and auxiliary storage. Registers are the fastest to access but have the smallest capacity; auxiliary storage has the largest capacity but the slowest access.

Because access to the main memory outside the processor is slow compared with registers, which operate at the processor's maximum internal frequency, one known method stores the instructions and operands read from main memory during the fetch phase of the instruction cycle in a cache memory inside the processor (see, for example, Patent Document 1). A cache hit can then be served quickly, but a cache miss requires an access to main memory, which is slower than the cache.
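The effect of hits and misses on access time can be expressed with a common simplified model: a hit costs the cache access time and a miss costs the main-memory access time. This model and the function name are assumptions for illustration, not taken from the patent.

```python
def average_access_time(hit_rate: float, t_cache: float, t_main: float) -> float:
    """Average time per access in a simple model: hits cost t_cache, misses cost t_main."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * t_cache + (1.0 - hit_rate) * t_main

# Hypothetical figures: cache 1 time unit, main memory 10 time units, 90 % hit rate.
print(average_access_time(0.9, 1.0, 10.0))
```

With these hypothetical figures the average is about 1.9 time units, almost double the cache time even though only 10 % of accesses miss.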

A reconfigurable processor implements operations as hardware circuits inside the processor. It handles different operations by transferring operation content stored in main memory to the processor and switching the hardware circuits.

Patent Document 1: JP-A-6-242951

In pipeline processing, the instruction fetch (IF) and operand fetch (OF) stages require memory access. In an instruction cycle that misses the cache, the instructions and operands read from main memory are stored in the cache. If the stored data would exceed the cache's maximum capacity, existing entries are overwritten according to a predetermined algorithm. Consequently, the pipeline achieves ideal parallel execution only when every access hits the cache; in practice, misses occur at a rate determined by the cache hit rate, each miss triggers a main-memory access, and these accesses become the bottleneck that limits execution speed.
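The text does not name the replacement algorithm. The sketch below assumes least-recently-used (LRU) replacement, a common choice, to show how a working set slightly larger than the cache can make every access miss. The class and its names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement (one possible policy)."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> cached value, oldest use first
        self.hits = 0
        self.misses = 0

    def access(self, address: int, main_memory: dict) -> int:
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address]
        self.misses += 1                     # slow path: read from main memory
        value = main_memory[address]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = value
        return value

mem = {a: a * 10 for a in range(5)}
cache = LRUCache(capacity=4)
for _ in range(3):          # loop three times over 5 addresses
    for a in range(5):
        cache.access(a, mem)
print(cache.hits, cache.misses)  # 0 15
```

Cycling over five addresses with room for four, LRU always evicts the address needed next, so all 15 accesses go to main memory. With a capacity of five, only the first five accesses would miss.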

In a reconfigurable processor, the capacity of the hardware circuit is limited. Operations that fit in the hardware circuit run at high speed, but operations that exceed its capacity require main-memory access and circuit switching and therefore cannot be accelerated. Enlarging the hardware circuit directly raises processor cost, and its maximum capacity is also physically limited.

Accordingly, an object of the present invention is to enable high-speed arithmetic execution in which slow memory access does not become a bottleneck.

To solve the above problems, the present invention provides a processor that performs arithmetic processing based on instructions and operands stored in a main memory, the processor comprising: a register; an internal memory; a fetch unit that operates on a first operating clock and transfers instructions and operands from the main memory to the internal memory for storage; and an arithmetic execution unit that operates on a second operating clock different from the first operating clock and performs operations using the register based on the instructions and operands stored in the internal memory.

The present invention also provides an arithmetic device comprising: a main memory; and a processor that includes a register, an internal memory, a fetch unit that operates on a first operating clock and transfers instructions and operands from the main memory to the internal memory for storage, and an arithmetic execution unit that operates on a second operating clock different from the first operating clock and performs operations using the register based on the instructions and operands stored in the internal memory.

According to the present invention, high-speed arithmetic execution in which slow memory access does not become a bottleneck is possible.

Embodiments of the arithmetic device according to the present invention are described below with reference to the drawings. Identical or similar components are given the same reference numerals, and duplicate descriptions are omitted. The following embodiments are merely examples, and the present invention is not limited to them.

FIG. 1 is a block diagram of an embodiment of the arithmetic device according to the present invention.

The arithmetic device 10 of this embodiment includes a processor 1 and a main memory 2. The processor 1 includes an arithmetic execution unit 3, a fetch unit 4, a register 5, and an internal memory 6, which are connected to one another. The processor 1 is connected to the main memory 2 through the fetch unit 4. The main memory 2 and the internal memory 6 each have an area for instructions and an area for operands, referred to as instruction areas 21 and 61 and operand areas 22 and 62, respectively.

The arithmetic execution unit 3 performs operations using the internal memory 6 and the register 5. The fetch unit 4 transfers data between the main memory 2 and the internal memory 6.

The arithmetic execution unit 3 and the fetch unit 4 each form a pipeline.

FIG. 2 shows the instruction cycles of the arithmetic execution unit and the fetch unit in this embodiment. The horizontal axis of FIG. 2 represents time.

The arithmetic execution unit 3 operates with a six-stage instruction cycle 7. The six stages are instruction fetch (IF), instruction decode (ID), operand address calculation (OA), operand fetch (OF), instruction execution (EX), and result storage (RS). Stages at the same vertical position in FIG. 2 belong to the same instruction cycle.

The fetch unit 4 operates with a five-stage instruction cycle 8. The five stages are: fetching an instruction from the main memory 2 and transferring it to the internal memory 6 (IF2); decoding the instruction within the fetch unit 4 (ID2); calculating the operand address within the fetch unit 4 (OA2); fetching the operand from the main memory 2 and transferring it to the internal memory 6 (OF2); and storing an operation result held in the internal memory 6 into the main memory 2 (RS2).
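Since FIG. 2 is not reproduced here, the sketch below prints text rows in the style of a pipeline chart for the two instruction cycles, using the stage names from the text. Each horizontal step is one period of that unit's own clock, so the two charts are not on a common time axis. The helper name is hypothetical.

```python
EXEC_STAGES = ("IF", "ID", "OA", "OF", "EX", "RS")    # instruction cycle 7
FETCH_STAGES = ("IF2", "ID2", "OA2", "OF2", "RS2")    # instruction cycle 8

def timing_rows(stages, n_instructions):
    """One text row per instruction cycle; row i starts i clock periods after row 0."""
    width = max(len(s) for s in stages)
    rows = []
    for i in range(n_instructions):
        idle = ["." * width] * i                 # periods before this cycle enters the pipeline
        busy = [s.ljust(width) for s in stages]  # one stage per clock period
        rows.append(" ".join(idle + busy))
    return rows

for row in timing_rows(EXEC_STAGES, 3):
    print(row)
for row in timing_rows(FETCH_STAGES, 3):
    print(row)
```

Read down any column of the output to see which stage each in-flight instruction cycle occupies during that clock period.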

The six stages of the instruction cycle of the arithmetic execution unit 3 run on the first operating clock, and the five stages of the instruction cycle of the fetch unit 4 run on the second operating clock. Because the fetch unit 4 accesses the main memory 2, which is slower than the internal memory 6, its second operating clock has a lower frequency than the first. The arithmetic execution unit 3 and the fetch unit 4 therefore operate asynchronously.
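The decoupling between the slow fetch clock and the fast execution clock can be illustrated with a small timing model. The text does not say how many instructions the fetch unit moves per cycle, so the model assumes a fixed number per fetch cycle (for example, through a burst access). All names and figures are hypothetical.

```python
def exec_finish_times(n, fetch_period, words_per_fetch, exec_period):
    """Completion time of each instruction in a simplified two-clock model.

    The fetch unit delivers `words_per_fetch` instructions into internal memory at the
    end of every `fetch_period`; the execution unit runs one instruction per
    `exec_period` and waits only if the next instruction is not yet in internal memory.
    """
    finish = []
    t = 0  # time at which the execution unit becomes free
    for i in range(n):
        ready = (i // words_per_fetch + 1) * fetch_period  # arrival in internal memory
        t = max(t, ready) + exec_period
        finish.append(t)
    return finish

print(exec_finish_times(8, 4, 4, 1)[-1])  # 12: fetch keeps up, execution never waits after fill
print(exec_finish_times(8, 4, 1, 1)[-1])  # 33: one word per slow cycle, execution waits every time
```

In this model the execution unit runs at its own clock rate only while the fetch unit's throughput (`words_per_fetch / fetch_period`) keeps pace with it, which is why prefetching well ahead of execution matters.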

FIG. 3 shows a map of the instruction area 21 of the main memory 2 in this embodiment.

Instruction fetch addresses form a contiguous region from a given address up to a branch instruction, so instructions can be fetched quickly by burst access to the main memory 2. If a branch instruction does not branch, the region can still be treated as contiguous; if it does branch, instructions must be fetched starting from the address indicated by the branch instruction's operand.
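The contiguous run that a single burst can cover can be sketched as follows. The program representation (a dict from word address to an `(opcode, operand)` pair, one instruction per address) and the conditional-branch opcode `"BRC"` are hypothetical.

```python
def sequential_run(program, start):
    """Addresses covered by one burst: from `start` up to and including the first branch."""
    run = []
    addr = start
    while addr in program:
        run.append(addr)
        opcode, _operand = program[addr]
        if opcode == "BRC":  # hypothetical conditional branch ends the contiguous run
            break
        addr += 1
    return run

program = {
    0: ("LD", 100), 1: ("ADD", 101), 2: ("BRC", 10), 3: ("ST", 102),
    10: ("SUB", 103), 11: ("HALT", None),
}
print(sequential_run(program, 0))   # [0, 1, 2]
print(sequential_run(program, 10))  # [10, 11]
```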

Therefore, in stage OF2 of its instruction cycle, when the instruction is a branch instruction, the fetch unit 4 stores in the internal memory 6 both the instructions for the not-taken case and the instructions at the branch destination indicated by the operand. As a result, the pipeline processing of the arithmetic execution unit 3 can continue without delay even when branch instructions are executed.
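Prefetching both paths of a branch can be sketched as below: the fetch side follows the run up to the branch, then collects both the fall-through run and the run at the branch destination. The program format and the `"BRC"` opcode are the same hypothetical conventions as a simple instruction map, not the patent's encoding.

```python
def blocks_to_prefetch(program, start):
    """Run ending at the first branch, plus both successor runs (not taken and taken)."""

    def run_from(addr):
        run = []
        while addr in program:
            run.append(addr)
            if program[addr][0] == "BRC":  # hypothetical conditional-branch opcode
                break
            addr += 1
        return run

    first = run_from(start)
    if not first or program[first[-1]][0] != "BRC":
        return {"first": first}          # no branch reached: nothing more to prefetch
    branch_addr = first[-1]
    target = program[branch_addr][1]     # branch destination given by the operand
    return {
        "first": first,
        "not_taken": run_from(branch_addr + 1),
        "taken": run_from(target),
    }

program = {0: ("LD", 100), 1: ("BRC", 10), 2: ("ST", 102), 10: ("SUB", 103)}
print(blocks_to_prefetch(program, 0))
# {'first': [0, 1], 'not_taken': [2], 'taken': [10]}
```

Because both successor runs are already in internal memory, the execution side can continue on either path without waiting for main memory.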

When the operand of a branch instruction specifies the register 5 as the address, the fetch unit 4 reads the register 5 to obtain the branch destination address in the main memory 2 and then accesses that address.
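Resolving a branch destination that may be either a direct address or a register can be sketched as follows. The operand forms (`int` for a direct address, `("reg", n)` for register n) are hypothetical.

```python
def branch_target(operand, registers):
    """Branch destination in main memory: an int is a direct address; ("reg", n) reads register n."""
    if isinstance(operand, int):
        return operand
    kind, index = operand
    if kind != "reg":
        raise ValueError(f"unsupported operand form: {operand!r}")
    return registers[index]  # the fetch unit reads the register to get the address

regs = [0x00, 0x00, 0x80, 0x00]
print(hex(branch_target(0x40, regs)))        # 0x40
print(hex(branch_target(("reg", 2), regs)))  # 0x80
```

The text does not say how the fetch unit handles a register that an instruction still in the execution pipeline is about to overwrite; this sketch simply reads the current value.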

FIG. 4 shows a map of the operand area 22 of the main memory 2 in this embodiment.

The address from which an operand is fetched in the main memory 2 is contained in the instruction. Except for table-processing instructions that access a contiguous region, the address differs from instruction to instruction, so random access is required. Because the fetch unit 4 itself fetches instructions from the main memory 2 and transfers them to the internal memory 6 (IF2), decodes them (ID2), and calculates operand addresses (OA2), it knows each operand address, however random, and can transfer the operands to the internal memory 6.
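Because the fetch unit decodes each instruction itself, it can list the operand addresses to transfer before execution begins. The sketch below assumes three hypothetical operand forms: a direct address (`int`), a register holding the address (`("reg", n)`), and a table instruction over a contiguous region (`("table", base, length)`).

```python
def operand_addresses(instructions, registers):
    """Main-memory operand addresses needed by decoded instructions, in first-use order, without duplicates."""
    needed = []
    for _opcode, operand in instructions:
        if operand is None:                    # instruction has no memory operand
            continue
        if isinstance(operand, int):           # direct (random) address in the instruction
            needed.append(operand)
        elif operand[0] == "reg":              # address held in a register
            needed.append(registers[operand[1]])
        elif operand[0] == "table":            # table instruction: contiguous region
            _, base, length = operand
            needed.extend(range(base, base + length))
        else:
            raise ValueError(f"unknown operand form: {operand!r}")
    return list(dict.fromkeys(needed))         # drop duplicates, keep first-use order

instrs = [("LD", 0x200), ("ADD", ("reg", 1)), ("SUM", ("table", 0x300, 3)),
          ("ST", 0x200), ("NOP", None)]
regs = [0, 0x250]
print([hex(a) for a in operand_addresses(instrs, regs)])
# ['0x200', '0x250', '0x300', '0x301', '0x302']
```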

In addition, the fetch unit 4 converts operand addresses in the main memory 2 into operand addresses in the internal memory 6, so that the arithmetic execution unit 3 can execute instructions using the operands transferred to and stored in the internal memory 6. When the register 5 holds an operand address in the main memory 2, the fetch unit 4 reads the register 5 to obtain that address and then accesses it in the main memory 2.
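The address conversion can be sketched as a translation table built while operands are copied: each main-memory operand address gets the next free internal-memory slot, and direct operand addresses in the instructions are rewritten to those slots. The slot-allocation policy and all names are assumptions; the text does not specify how internal addresses are assigned.

```python
def translate_and_copy(addresses, main_memory, internal_base=0):
    """Map each main-memory operand address to an internal slot and copy its value there."""
    mapping = {}   # main-memory address -> internal-memory address
    internal = {}  # internal-memory address -> operand value
    for addr in addresses:
        if addr not in mapping:
            slot = internal_base + len(mapping)  # next free slot, first come first served
            mapping[addr] = slot
            internal[slot] = main_memory[addr]   # transfer the operand
    return mapping, internal

def rewrite_operands(instructions, mapping):
    """Replace direct main-memory operand addresses with internal-memory addresses."""
    # Only int operands are direct addresses here; other operand forms are left unchanged.
    return [(op, mapping[arg] if isinstance(arg, int) else arg) for op, arg in instructions]

main = {0x200: 7, 0x250: 9}
mapping, internal = translate_and_copy([0x200, 0x250, 0x200], main)
print(mapping, internal)  # {512: 0, 592: 1} {0: 7, 1: 9}
print(rewrite_operands([("LD", 0x200), ("ST", 0x250), ("NOP", None)], mapping))
# [('LD', 0), ('ST', 1), ('NOP', None)]
```

Looking up `mapping[arg]` directly, rather than falling back to the original address, makes a missing translation fail loudly instead of silently mixing the two address spaces.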

In this way, the arithmetic execution unit 3 executes operations using instructions and operands that the fetch unit 4 has transferred to the internal memory 6 in advance. Its instruction cycle is therefore not bound to the access cycle of the slow external main memory 2, and pipeline processing can continue at the maximum operating frequency inside the processor 1. In other words, high-speed arithmetic execution in which slow memory access does not become a bottleneck is possible.

FIG. 1 is a block diagram of an embodiment of the arithmetic device according to the present invention.
FIG. 2 shows the instruction cycles of the arithmetic execution unit and the fetch unit in an embodiment of the arithmetic device according to the present invention.
FIG. 3 shows a map of the instruction area of the main memory in an embodiment of the arithmetic device according to the present invention.
FIG. 4 shows a map of the operand area of the main memory in an embodiment of the arithmetic device according to the present invention.

Explanation of symbols

1 ... processor
2 ... main memory
3 ... arithmetic execution unit
4 ... fetch unit
5 ... register
6 ... internal memory
7, 8 ... instruction cycle
10 ... arithmetic device
21 ... instruction area
22 ... operand area
61 ... instruction area
62 ... operand area

Claims

1. A processor that performs arithmetic processing based on instructions and operands stored in a main memory, the processor comprising:
a register;
an internal memory;
a fetch unit that operates on a first operating clock and transfers instructions and operands from the main memory to the internal memory for storage; and
an arithmetic execution unit that operates on a second operating clock different from the first operating clock and performs operations using the register based on the instructions and operands stored in the internal memory.

2. The processor according to claim 1, wherein, when an instruction transferred to and stored in the internal memory is a branch instruction, the fetch unit transfers to and stores in the internal memory both the instructions and operands for the case where no branch is taken and the instructions and operands from the branch destination address indicated by the operand of the branch instruction.

3. The processor according to claim 2, wherein the fetch unit converts the main-memory address of the branch destination indicated by the operand of the branch instruction into an address in the internal memory.

4. The processor according to claim 2, wherein, when the operand of the branch instruction specifies the register as the address, the fetch unit reads the register to obtain the branch destination address in the main memory.

5. The processor according to claim 1, wherein the fetch unit performs, as a pipeline, instruction fetch from the main memory and instruction transfer to the internal memory, instruction decoding, operand address calculation, operand fetch from the main memory and operand transfer to the internal memory, and storage into the main memory of operation results held in the internal memory.

6. The processor according to any one of claims 1 to 5, wherein the fetch unit converts operand addresses in the main memory into operand addresses in the internal memory.

7. The processor according to claim 1, wherein, when the register is used as an operand address in the main memory, the fetch unit reads the register to obtain the operand address in the main memory.

8. An arithmetic device comprising:
a main memory; and
a processor including a register, an internal memory, a fetch unit that operates on a first operating clock and transfers instructions and operands from the main memory to the internal memory for storage, and an arithmetic execution unit that operates on a second operating clock different from the first operating clock and performs operations using the register based on the instructions and operands stored in the internal memory.