JP2010262600A

JP2010262600A - Processor

Info

Publication number: JP2010262600A
Application number: JP2009114896A
Authority: JP
Inventors: Kohei Oikawa; 恒平及川
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-05-11
Filing date: 2009-05-11
Publication date: 2010-11-18

Abstract

PROBLEM TO BE SOLVED: To provide a processor capable of raising the time efficiency with which its arithmetic units are used, thereby improving the performance of serial processing.

SOLUTION: The processor 1 has: a clock control circuit 21 which outputs a control clock signal ECLK generated on the basis of a supplied clock signal CLK according to a control signal S_CYCLE for controlling the clock signal CLK; and a plurality of serially connected ALUs 22-25. In an operation stage where a plurality of operations are executed by the plurality of ALUs 22-25, the plurality of operations to be serially executed are executed in one execution cycle by changing execution cycles on the basis of the control clock signal ECLK from the clock control circuit 21.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a processor, and more particularly to a processor that can change the number of execution cycles of an operation by means of a control clock signal.

Conventionally, in information devices such as personal computers and embedded devices, the processor is one of the important factors that determine performance. Higher clock frequencies and parallel processing have been widely used to improve processor performance. In recent years in particular, raising the clock frequency has become difficult because of power consumption and heat, and the emphasis has shifted to parallel processing. However, there are also many sequential operations that cannot be parallelized and must be executed in order, and the processing time of these operations also needs to be optimized.

As described above, one problem in executing sequential arithmetic processing on a synchronously clocked processor is the inefficiency of operation time. Some operations are additions or subtractions on many bits, while others are logical operations on few bits. The time these operations require differs greatly, yet when they are executed on a microprocessor each must often be given the same one clock period.

Here, attention is paid only to the operation stage of the processor, and it is assumed that the preceding and following instruction fetch, memory control, and the like do not affect the behavior of the operation stage. For example, six cycles are spent executing six operations A to F, but some of these operations use almost all of one cycle time while others can be completed in a very short time. For operations that can be completed in a very short time, the conventional configuration therefore has the problem that there is wasted time during which no operation is being executed.

There are also processors configured so that only specific composite operations, such as a multiply-accumulate operation, can be executed in one cycle. This configuration is applied, for example, to improve the performance of multimedia processing, and it also has the effect of reducing the wasted time described above. However, because the composite operation must fit within one cycle time, the clock period tends to become long. As a result, the wasted time may increase for some types of operation.

In view of this, an information processing apparatus that prevents the occurrence of hazards caused by differences in instruction execution time has been proposed (see, for example, Patent Document 1). The proposed information processing apparatus changes the period of the clock signal on the basis of the number of execution cycles of the instruction code, and executes the instruction code according to the clock signal whose period has been changed.

However, the proposed information processing apparatus masks the clock signal so as to execute instructions efficiently when a single operation takes a plurality of cycles; it does not raise the utilization of the arithmetic units by executing a plurality of operations in series at one time.

Patent Document 1: JP 2006-126893 A

An object of the present invention is to provide a processor capable of raising the time efficiency with which its arithmetic units are used and improving the performance of sequential processing.

According to one aspect of the present invention, there can be provided a processor that executes an instruction code read from a memory, the processor comprising: a control clock generation unit that outputs a control clock signal generated on the basis of a supplied clock signal in accordance with clock signal control information for controlling the clock signal; and a plurality of arithmetic units connected in series, wherein, in an operation stage in which a plurality of operations are executed by the plurality of arithmetic units in accordance with operation mode information including information on the content of the arithmetic processing executed by each of the plurality of arithmetic units and information on the data required for that arithmetic processing, the plurality of operations executed in series are executed within one execution cycle by changing the execution cycle on the basis of the control clock signal from the control clock generation unit.

According to the processor of the present invention, the time efficiency with which the arithmetic units are used can be raised and the performance of sequential processing can be improved.

FIG. 1 is a block diagram showing the configuration of a processor system according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram for explaining an example of an instruction code.
FIG. 3 is a flowchart for explaining an example of the flow of processing for generating an instruction code.
FIG. 4 is an explanatory diagram for explaining an example of the configuration of the ALU 22.
FIG. 5 is an explanatory diagram for explaining an example of the correspondence between mode information and arithmetic processing.
FIG. 6 is an explanatory diagram for explaining an example of the operation of the operation stage executed by the operation unit 15a.
FIG. 7 is a block diagram showing the configuration of a processor system according to a second embodiment of the present invention.
FIG. 8 is an explanatory diagram for explaining an example of an instruction code.
FIG. 9 is an explanatory diagram for explaining an example of the connection of ALUs 31 to 38.
FIG. 10 is an explanatory diagram for explaining an example of the connection of ALUs 31 to 38.
FIG. 11 is an explanatory diagram for explaining an example of the configuration of the ALU 31.
FIG. 12 is an explanatory diagram for explaining an example of mode information and arithmetic processing.
FIG. 13 is an explanatory diagram for explaining an example of arithmetic processing.
FIG. 14 is a block diagram showing the configuration of a processor system according to a third embodiment of the present invention.
FIG. 15 is an explanatory diagram for explaining an example of the information held by each of entries 0 to 15.
FIG. 16 is a flowchart for explaining an example of the flow of processing for generating an instruction code.
FIG. 17 is a flowchart for explaining an example of the flow of processing for generating an instruction code.
FIG. 18 is a block diagram showing the configuration of a processor system according to a fourth embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
(First embodiment)
First, the configuration of the processor system according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the processor system according to the first embodiment of the present invention.

As shown in FIG. 1, the processor system 100 includes a processor 1, a clock generator 101, a memory interface (hereinafter referred to as a memory I/F) 102, a bus 103, and a memory 104.

The clock generator 101 generates a clock signal CLK having a predetermined frequency and supplies the generated clock signal CLK to the processor 1. The processor 1 reads an instruction code stored in the memory 104 via the bus 103 and the memory I/F 102, and executes predetermined arithmetic processing based on the read instruction code. The processor 1 also stores the result obtained by executing the predetermined arithmetic processing in the memory 104 via the memory I/F 102 and the bus 103.

The processor 1 includes an instruction fetch unit 11, an instruction decode unit 12, three pipeline registers 13a, 13b, and 13c, a data register 14, operation units 15a and 15b, a selector 16, a memory access unit 17, and a write-back unit 18.

The processor 1 of the present embodiment has a five-stage pipeline structure. The pipeline consists of a fetch stage executed by the instruction fetch unit 11, a decode stage executed by the instruction decode unit 12, an operation stage executed by the operation unit 15a or the operation unit 15b, a memory access stage executed by the memory access unit 17, and a write-back stage executed by the write-back unit 18.

The operation unit 15a includes a clock control circuit 21 and a plurality of, here four, serially connected arithmetic units (hereinafter referred to as ALUs) 22, 23, 24, and 25. The operation unit 15b includes a general-purpose ALU 26.

The instruction fetch unit 11 reads an address stored in a program counter (not shown). The instruction fetch unit 11 then reads the instruction code stored at that address from the memory 104 and outputs the read instruction code to the instruction decode unit 12.

The instruction decode unit 12 decodes the instruction code read by the instruction fetch unit 11 and outputs the decode result to the pipeline register 13a or 13b. When decoding of the bits in the instruction code that indicate the instruction operation (described later) shows that the instruction code specifies the instruction operation of the present embodiment, the instruction decode unit 12 outputs the decode result to the pipeline register 13a. In that case, the instruction decode unit 12 also outputs to the selector 16 a selection signal for selecting the output of the operation unit 15a.

On the other hand, when the instruction code specifies a normal instruction operation, the instruction decode unit 12 outputs the decode result to the pipeline register 13b. In that case, the instruction decode unit 12 also outputs to the selector 16 a selection signal for selecting the output of the operation unit 15b.

The pipeline register 13a is controlled by a control clock signal ECLK from the clock control circuit 21, which will be described later. On the rising edge of the control clock signal ECLK, it captures the decode result from the instruction decode unit 12 and outputs the captured decode result to the operation unit 15a.

The operation unit 15a executes predetermined arithmetic processing based on the decode result from the pipeline register 13a and outputs the result to the selector 16.

Although the connection is not illustrated, the pipeline register 13b is likewise controlled by the control clock signal ECLK from the clock control circuit 21, which will be described later. On the rising edge of the control clock signal ECLK, it captures the decode result from the instruction decode unit 12 and outputs the captured decode result to the operation unit 15b.

The operation unit 15b executes predetermined arithmetic processing based on the data from the pipeline register 13b and outputs the result to the selector 16.

Based on the selection signal from the instruction decode unit 12, the selector 16 selects either the result output from the operation unit 15a or the result output from the operation unit 15b, and outputs the selected result to the pipeline register 13c.

The pipeline register 13c is controlled by the control clock signal ECLK from the clock control circuit 21, which will be described later. On the rising edge of the control clock signal ECLK, it captures the result from the selector 16 and outputs the captured result to the memory access unit 17.

The memory access unit 17 performs memory access via the memory I/F 102 and the bus 103 and outputs the operation result to the write-back unit 18.

The write-back unit 18 writes the operation result to the data register 14 or to another data register (not shown).

Here, the detailed configuration of the operation unit 15a will be described. The clock control circuit 21 is supplied with the clock signal CLK from the clock generator 101 and the control signal S_CYCLE from the pipeline register 13a. The clock control circuit 21 stalls the clock signal CLK by the number of clocks indicated by the control signal S_CYCLE, which is the clock signal control information, and thereby generates the control clock signal ECLK. Stalling here means masking the clock signal CLK. That is, when the value indicated by the control signal S_CYCLE is 0, the clock control circuit 21 does not stall the clock signal and outputs the clock signal CLK as the control clock signal ECLK. When the value indicated by the control signal S_CYCLE is 1, it generates and outputs a control clock signal ECLK in which the clock signal CLK is stalled by one clock. In this way, the clock control circuit 21 constitutes a control clock generation unit that outputs the control clock signal ECLK generated on the basis of the supplied clock signal CLK in accordance with the control signal S_CYCLE for controlling the clock signal CLK.
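The masking behavior described above can be illustrated by a minimal behavioral sketch in Python. It is not a circuit description. The function name `gate_clock` is hypothetical, and the sketch assumes that an instruction whose S_CYCLE value is s has s rising edges of CLK masked before the next rising edge is passed through as ECLK; the exact timing at which S_CYCLE is sampled is not specified in the text.

```python
def gate_clock(stall_values):
    """Return one entry per CLK rising edge: True if the edge reaches ECLK.

    stall_values[k] is the S_CYCLE value of the k-th instruction in the
    operation stage (sampling point is an assumption of this sketch).
    """
    edges = []
    for s_cycle in stall_values:
        edges.extend([False] * s_cycle)  # masked (stalled) CLK edges
        edges.append(True)               # CLK edge passed through as ECLK
    return edges


# S_CYCLE = 0, 1, 0: the second instruction occupies two CLK periods.
print(gate_clock([0, 1, 0]))  # [True, False, True, True]
```

With S_CYCLE fixed at 0 every CLK edge is passed through, so ECLK is identical to CLK, which matches the unstalled case described above.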

The clock control circuit 21 supplies the generated control clock signal ECLK to the pipeline registers 13a, 13b, and 13c. Note that the pipeline registers 13a and 13b may instead be supplied with the clock signal CLK from the clock generator 101 in place of the control clock signal ECLK from the clock control circuit 21.

The ALUs 22 to 25 are supplied from the pipeline register 13a with mode information i0 to i3 and with immediate values v0 to v3, which are data, respectively. The mode information i0 to i3 specifies the arithmetic processing to be executed by each of the ALUs 22 to 25, and the immediate values v0 to v3 are the immediate values required for that processing. The mode information i0 to i3 and the immediate values v0 to v3 thus constitute the operation mode information supplied to each of the ALUs 22 to 25. The ALU 22 is also supplied with the data A0 stored in the data register 14.

Based on the mode information i0, the ALU 22 performs predetermined arithmetic processing on the data A0 stored in the data register 14 and the immediate value v0, and outputs the resulting data to the ALU 23. Based on the mode information i1, the ALU 23 performs predetermined arithmetic processing on the result from the ALU 22 and the immediate value v1, and outputs the result to the ALU 24. Based on the mode information i2, the ALU 24 performs predetermined arithmetic processing on the result from the ALU 23 and the immediate value v2, and outputs the result to the ALU 25. Based on the mode information i3, the ALU 25 performs predetermined arithmetic processing on the result from the ALU 24 and the immediate value v3, and outputs the result to the selector 16. The ALUs 22 to 25 contain no sequential circuits such as flip-flops in the operation path and are built from combinational circuits.

Here, the instruction code executed by the processor 1 will be described. FIG. 2 is an explanatory diagram for explaining an example of the instruction code.

As shown in FIG. 2, the instruction code has a 32-bit bit field. Bits 31 to 26 indicate the instruction operation, bits 25 and 24 indicate the stall information, and bits 23 to 0 indicate the mode information and immediate values.

When bits 31 to 26 are all 1, the instruction code specifies the instruction operation of the present embodiment; otherwise, it specifies a normal instruction operation. Accordingly, the instruction decode unit 12 outputs the decode result to the pipeline register 13a when bits 31 to 26 are all 1, and to the pipeline register 13b when they are not all 1.

Bits 25 and 24 represent the number of stall cycles during instruction execution. A field value of 0 means that execution of the operation takes one cycle, a value of 1 means two cycles, a value of 2 means three cycles, and a value of 3 means four cycles. Based on bits 25 and 24, the instruction decode unit 12 generates the control signal S_CYCLE described above and supplies it to the clock control circuit 21 via the pipeline register 13a.

Bits 23 to 0 represent the ALU mode information (described later) and the immediate values. With 3 bits of mode information and a 3-bit immediate value forming one set, the field holds four sets, one for each of the ALUs 22 to 25.

Bits 5 to 3 and bits 2 to 0 are the mode information i0 and the immediate value v0 for the ALU 22, respectively. Bits 11 to 9 and bits 8 to 6 are the mode information i1 and the immediate value v1 for the ALU 23. Bits 17 to 15 and bits 14 to 12 are the mode information i2 and the immediate value v2 for the ALU 24. Bits 23 to 21 and bits 20 to 18 are the mode information i3 and the immediate value v3 for the ALU 25.
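The bit layout described above can be sketched as a pair of packing and unpacking helpers in Python. The function names `encode` and `decode` and the constant `FUSED_OPCODE` are hypothetical names introduced only for illustration; the field positions follow the layout given in the text.

```python
FUSED_OPCODE = 0b111111  # bits 31-26 all ones select the serial-ALU path


def encode(stall, ops):
    """Pack a stall count (0-3) and four (mode, imm) pairs into 32 bits."""
    assert 0 <= stall <= 3 and len(ops) == 4
    word = (FUSED_OPCODE << 26) | (stall << 24)
    for k, (mode, imm) in enumerate(ops):
        assert 0 <= mode <= 7 and 0 <= imm <= 7
        # ALU k uses bits 6k+5..6k+3 for mode and bits 6k+2..6k for imm.
        word |= (mode << (6 * k + 3)) | (imm << (6 * k))
    return word


def decode(word):
    """Return (is_fused, stall, ops); stall and ops are None otherwise."""
    if (word >> 26) & 0x3F != FUSED_OPCODE:
        return False, None, None  # normal instruction: not interpreted here
    stall = (word >> 24) & 0x3
    ops = [((word >> (6 * k + 3)) & 0x7, (word >> (6 * k)) & 0x7)
           for k in range(4)]
    return True, stall, ops
```

Index k = 0 corresponds to the ALU 22 and k = 3 to the ALU 25. The first element of the returned tuple plays the role of the routing decision between the pipeline registers 13a and 13b.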

Here, the processing for generating such an instruction code will be described. FIG. 3 is a flowchart for explaining an example of the flow of processing for generating an instruction code.

First, p = 0 and r = 3 are set as initial values (step S1). Here, r is the number of ALUs 22 to 25 in the operation unit 15a minus 1. Next, q = p + r is calculated (step S2). It is then determined whether lines p to q of the input code can be converted into an instruction executable by the operation unit 15a (step S3). Conversion is judged possible when all of the instructions on lines p to q can be executed by the ALUs 22 to 25, the instruction pattern on lines p to q can be arranged as sequential processing, and none of the instructions on lines p + 1 to q is the target of a jump instruction.

If it is determined in step S3 that conversion is not possible (NO), it is determined whether p and q have the same value (step S4). If p and q are not the same (NO), q = q − 1 is calculated (step S5) and the process returns to step S3. If p and q are the same, the instruction on line p is output unchanged (step S6) and the process proceeds to step S9.

On the other hand, if it is determined in step S3 that conversion is possible (YES), lines p to q are converted and the stall count is calculated (step S7), and the converted instruction is output (step S8). Next, p = q + 1 is calculated (step S9), and it is determined whether p is past the last line (step S10). If p is not past the last line, the process returns to step S2 and the above processing is repeated. If p is past the last line (YES), the process ends. The instruction code shown in FIG. 2 is generated by the above processing.
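The flow of steps S1 to S10 can be sketched in Python as a greedy grouping loop. This is only an illustration under stated assumptions: `group_lines` and the predicate `can_convert` are hypothetical names, the predicate stands in for the three-part check of step S3, the conversion and stall calculation of step S7 are not modeled, and clamping q to the last line is an assumption because the flowchart description does not say how the end of the input is handled.

```python
def group_lines(lines, can_convert, max_ops=4):
    """Greedy grouping of input lines following steps S1 to S10.

    can_convert(block) is a caller-supplied stand-in for step S3.
    Returns a list of (kind, block) pairs, kind being "fused" or "plain".
    """
    out = []
    p, r = 0, max_ops - 1                         # S1
    while p < len(lines):                         # S10
        q = min(p + r, len(lines) - 1)            # S2 (clamp is an assumption)
        while True:
            block = lines[p:q + 1]
            if can_convert(block):                # S3
                out.append(("fused", block))      # S7, S8
                break
            if p == q:                            # S4
                out.append(("plain", [lines[p]])) # S6
                break
            q -= 1                                # S5
        p = q + 1                                 # S9
    return out


# Usage with a toy predicate: only these mnemonics are treated as fusable.
FUSABLE = {"add", "sub", "shl", "shr", "and", "or", "xor", "abs"}
result = group_lines(["add", "shr", "and", "load", "or"],
                     lambda block: all(op in FUSABLE for op in block))
print(result)
```

The loop always tries the longest candidate block first and shrinks it by one line at a time, so each output instruction covers as many input lines as the predicate allows.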

Here, the configuration of the ALUs 22 to 25 will be described. Since the ALUs 22 to 25 all have the same configuration, the ALU 22 is described as a representative example. FIG. 4 is an explanatory diagram for explaining an example of the configuration of the ALU 22.

As shown in FIG. 4, the ALU 22 has terminals m, n, i, and x. The terminal m is supplied with the data A0 stored in the data register 14. The terminals m of the ALUs 23, 24, and 25 are supplied with the outputs of the preceding ALUs 22, 23, and 24, respectively. The terminal n is supplied with the immediate value v0 for the ALU 22 from the pipeline register 13a. The terminal i is supplied with the mode information i0 for the ALU 22 from the pipeline register 13a. Based on the mode information i0, the ALU 22 performs one of the operations of FIG. 5, described later, on the data supplied to the terminals m and n. The result of this operation is output from the terminal x to the terminal m of the next-stage ALU 23.

FIG. 5 is an explanatory diagram for explaining an example of the correspondence between mode information and arithmetic processing. The ALU 22 executes the arithmetic processing that corresponds to the mode information i0 as shown in FIG. 5. For example, when "000" is input as the 3-bit mode information i0, the ALU 22 performs m + n, that is, it adds the data supplied to the terminals m and n. Similarly, when "001" is input as the 3-bit mode information i0, the ALU 22 performs m − n, that is, it subtracts the data supplied to the terminal n from the data supplied to the terminal m.

The ALU 22 also executes a left shift when the mode information i0 is "010", a right shift when it is "011", an AND operation when it is "100", an OR operation when it is "101", an EXOR operation when it is "110", and an absolute value operation when it is "111".
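The correspondence between mode information and operations can be written as a small lookup table. The following Python sketch is only an illustration: the names `ALU_OPS` and `alu` are hypothetical, the word width is not modeled, and the absolute value operation is assumed to act on the terminal m input and to ignore terminal n, since the text does not say which operand it uses.

```python
# Mode information (3 bits) -> operation on terminal inputs m and n.
ALU_OPS = {
    0b000: lambda m, n: m + n,    # addition
    0b001: lambda m, n: m - n,    # subtraction
    0b010: lambda m, n: m << n,   # left shift of m by n
    0b011: lambda m, n: m >> n,   # right shift of m by n
    0b100: lambda m, n: m & n,    # AND
    0b101: lambda m, n: m | n,    # OR
    0b110: lambda m, n: m ^ n,    # EXOR
    0b111: lambda m, n: abs(m),   # absolute value (operand choice assumed)
}


def alu(mode, m, n):
    """Return the value at terminal x for the given mode and inputs."""
    return ALU_OPS[mode](m, n)
```

Because each entry is a pure function of m and n, the table mirrors the purely combinational nature of the ALUs 22 to 25.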

Here, the operation of the processor 1 will be described using the calculation Z = ((A0 + 2) >> 1) & 7 as an example.
When this calculation is executed by an ordinary processor, three cycles are required: one each for the addition, the shift operation, and the AND operation. To realize it with the processor 1 of the present embodiment, the settings may be, for example,
(i0, v0) = (3'b000, 3'b010)
(i1, v1) = (3'b011, 3'b001)
(i2, v2) = (3'b100, 3'b111)
(i3, v3) = (3'b101, 3'b000)

Based on the mode information i0, the ALU 22 adds the data A0 from the data register 14 and the immediate value v0, and outputs the sum to the ALU 23. Based on the mode information i1, the ALU 23 shifts this sum right by the value of the immediate value v1 and outputs the result to the ALU 24. Based on the mode information i2, the ALU 24 performs an AND operation on this result and the immediate value v2. The ALU 25 executes an OR operation with an immediate value of 0 and therefore outputs the result from the ALU 24 unchanged; this step can be omitted when the above calculation is executed by an ordinary microprocessor.
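The setting described above can be checked with a short Python sketch of the four-stage chain. The names `run_chain` and `SETTINGS` are hypothetical, only the four modes used in this example are modeled, and the input value 9 is an arbitrary test value rather than a value from the text.

```python
# Only the modes used in the worked example are modeled here.
OPS = {
    0b000: lambda m, n: m + n,   # addition
    0b011: lambda m, n: m >> n,  # right shift
    0b100: lambda m, n: m & n,   # AND
    0b101: lambda m, n: m | n,   # OR
}

# (mode, immediate) for ALU 22, 23, 24 and 25, in that order.
SETTINGS = [(0b000, 0b010), (0b011, 0b001), (0b100, 0b111), (0b101, 0b000)]


def run_chain(a0, settings):
    """Feed a0 through the serially connected ALUs and return the output."""
    value = a0
    for mode, imm in settings:
        value = OPS[mode](value, imm)  # output x becomes the next input m
    return value


a0 = 9
z = run_chain(a0, SETTINGS)
print(z, ((a0 + 2) >> 1) & 7)
```

The final OR with an immediate value of 0 leaves the value unchanged, which is why the fourth stage acts as a pass-through in this example.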

Normally, the period of the clock signal CLK must be set to at least the worst-case delay time of the ALU 22, 23, 24, or 25. This worst-case delay time is the longest operation time, which depends on the instruction, that is, on the mode information and the input data. However, the time an ALU 22, 23, 24, or 25 actually needs for an operation can be shorter than one clock period, for example when an operation that finishes quickly is used or when the upper bits of the input data are fixed at 0. If, in the calculation above, the total time taken by the operations of the ALUs 22, 23, 24, and 25 is two clock periods, a conventional microprocessor, which needs three cycles, wastes one cycle of time.

In contrast, when the total time required for the operations is two clock periods, the processor 1 of the present embodiment sets the value of the stall information in the instruction code to 1. Based on the control signal S_CYCLE generated from the stall information, the clock control circuit 21 then generates a control clock signal ECLK in which the clock signal CLK is stalled for one clock. By controlling the pipeline register 13c with this control clock signal ECLK, the processor 1 can execute the above calculation in two cycles.
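The relationship between the total operation time and the stall information can be sketched as below. The function names and the use of nanoseconds are assumptions for illustration; the text only states that a total of two clock periods corresponds to a stall value of 1.

```python
import math

def stall_value(total_delay_ns, clock_period_ns):
    """Return the stall information: execution cycles needed minus one."""
    cycles = max(1, math.ceil(total_delay_ns / clock_period_ns))
    return cycles - 1

def wasted_cycles(num_operations, total_delay_ns, clock_period_ns):
    """Cycles lost when each operation is given a full clock cycle of its own."""
    return num_operations - (stall_value(total_delay_ns, clock_period_ns) + 1)

# Three serial operations whose combined delay is two clock periods.
stall = stall_value(total_delay_ns=20, clock_period_ns=10)
saved = wasted_cycles(num_operations=3, total_delay_ns=20, clock_period_ns=10)
```

With these numbers the stall value is 1 and one cycle is recovered, in line with the example in the text.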

FIG. 6 is an explanatory diagram illustrating an example of the operation of the operation stage executed by the arithmetic unit 15a.

As shown in FIG. 6(a), the operation stage executed by the arithmetic unit 15a conventionally executes the three serially executed operations A, B, and C within three execution cycles based on the clock signal CLK.

In contrast, as shown in FIG. 6(b), the operation stage executed by the arithmetic unit 15a of the present embodiment changes the execution cycle based on the control clock signal ECLK from the clock control circuit 21, so that the three serially executed operations A, B, and C are executed within one execution cycle. As a result, when the total time required for the operations A, B, and C is two clock periods, the operation stage can execute the operations A, B, and C in two cycles based on the control clock signal ECLK.

As described above, the processor 1 specifies the number of cycles required for a plurality of operations by the stall information and generates the control clock signal ECLK based on this stall information. The pipeline register 13c captures the operation result in accordance with this control clock signal ECLK. As a result, the idle time in which no operation is being executed, which was conventionally wasted, can be eliminated, and the processing of the microprocessor can be sped up.
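As a rough behavioural sketch of the control clock signal ECLK, the function below marks which pulses of CLK are passed on to the pipeline register for a given stall value. The exact waveform and phase of ECLK are not specified in this passage, so the choice to pass the last pulse of each group of stall + 1 pulses is an assumption, as is the function name.

```python
def eclk_pulses(stall, num_clk_cycles):
    """Return a 0/1 list: 1 where a CLK pulse is passed as ECLK, 0 where it is stalled."""
    period = stall + 1  # CLK cycles that make up one execution cycle
    return [1 if (cycle + 1) % period == 0 else 0
            for cycle in range(num_clk_cycles)]

no_stall = eclk_pulses(stall=0, num_clk_cycles=4)   # every pulse passes
one_stall = eclk_pulses(stall=1, num_clk_cycles=4)  # every second pulse passes
```

Under this model the pipeline register captures the result once per stall + 1 clock cycles, so a stall value of 1 yields one capture every two cycles.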

Therefore, the processor of the present embodiment can improve the time efficiency with which the arithmetic units are used and improve the performance of sequential processing.

(Second Embodiment)
Next, a second embodiment will be described. FIG. 7 is a block diagram showing the configuration of a processor system according to the second embodiment of the present invention. In FIG. 7, components identical to those in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

The processor system 100a of the present embodiment is configured using a processor 1a in place of the processor 1 of FIG. 1. Although omitted from the figure for simplicity, the memory I/F 102 is connected to the memory 104 via the bus 103 as in FIG. 1.

The processor 1a of the present embodiment is configured using an arithmetic unit 15c in place of the arithmetic unit 15a of FIG. 1. Although omitted from the figure for simplicity, as in FIG. 1, for a normal instruction operation the operation result obtained through the pipeline register 13b and the arithmetic unit 15b is supplied to one input terminal of the selector 16.

The arithmetic unit 15c includes the clock control circuit 21, a crossbar switch 30, and a plurality of ALUs, here eight ALUs 31 to 38.

The pipeline register 13a outputs to the crossbar switch 30 a connection information signal XB_CONF generated from the bits, described later, that indicate the connection information of the crossbar switch 30.

The crossbar switch 30 is supplied with the data A0 from the data register 14 and with data A1 and data A2 from data registers (not shown). The crossbar switch 30 is also supplied with the immediate values v0 to v7 from the pipeline register 13a. Based on the connection information signal XB_CONF, the crossbar switch 30 changes the connections of the ALUs 31 to 38 as shown in FIGS. 9A and 9B described later. The crossbar switch 30 thus constitutes a switching unit capable of connecting the plurality of ALUs 31 to 38 to one another in a plurality of connection patterns.

FIG. 8 is an explanatory diagram illustrating an example of an instruction code. As shown in FIG. 8, the instruction code has a 64-bit bit field.

The instruction code of the present embodiment is the instruction code of FIG. 2 with bits indicating the connection information of the crossbar switch 30 added. Bits 57 and 56, which indicate the connection information of the crossbar switch 30, change the connection of the crossbar switch 30 and thereby the connections of the ALUs 31 to 38.

Bits 63 to 61 indicate the instruction operation. When bits 63 to 61 are all 1, the instruction is an instruction operation of the present embodiment; when bits 63 to 61 are not all 1, it is a normal instruction operation. Accordingly, the instruction decode unit 12 outputs the decoding result to the pipeline register 13a when bits 63 to 61 are all 1, and outputs the decoding result to the pipeline register 13b (not shown) when bits 63 to 61 are not all 1.

Bits 60 to 58 represent the number of stall cycles during instruction execution. A value of 0 in this bit field means that execution of the operation takes one cycle, and a value of 7 means that it takes eight cycles.

Bits 55 to 0 represent the ALU mode information and immediate values described later. With 4 bits of mode information and a 3-bit immediate value forming one set, there are eight sets of bit fields, one for each of the ALUs 31 to 38.

Bits 6 to 3 and bits 2 to 0 are the mode information i0 and the immediate value v0 for the ALU 31, respectively. Similarly, bits 55 to 52 and bits 51 to 49 are the mode information i7 and the immediate value v7 for the ALU 38, respectively.
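The field layout of the 64-bit instruction code can be illustrated with a small decoder. This is a sketch under stated assumptions: the function name `decode` and the dictionary keys are hypothetical, and the eight 7-bit sets are assumed to be packed contiguously, with the set for the ALU 31 in bits 6 to 0 and the set for the ALU 38 in bits 55 to 49.

```python
def decode(instruction):
    """Split a 64-bit instruction code into the fields described for FIG. 8."""
    special = ((instruction >> 61) & 0b111) == 0b111  # bits 63-61 all 1
    stall = (instruction >> 58) & 0b111               # bits 60-58
    xb_conf = (instruction >> 56) & 0b11              # bits 57-56
    alus = []
    for k in range(8):                                # k = 0 is ALU 31, k = 7 is ALU 38
        group = (instruction >> (7 * k)) & 0x7F       # assumed contiguous 7-bit sets
        alus.append({"mode": group >> 3, "imm": group & 0b111})
    return {"special": special, "cycles": stall + 1,
            "xb_conf": xb_conf, "alus": alus}

# Example word: stall field 2, XB_CONF 01, ALU 38 mode 1000 with immediate 5,
# ALU 31 mode 0100 with immediate 1 (all values chosen only for illustration).
word = (0b111 << 61) | (2 << 58) | (1 << 56) | (0b1000 << 52) | (5 << 49) | (0b0100 << 3) | 1
fields = decode(word)
```

The `cycles` entry reflects the rule that a stall field of 0 means one execution cycle and a field of 7 means eight.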

Such an instruction code can be generated by a flowchart similar to that of FIG. 3. In the determination of step S3 in FIG. 3, however, conversion is judged possible when all the instructions in lines p to q are executable by the ALUs, the instruction pattern of lines p to q can be configured with the crossbar switch 30, and none of the instructions in lines p+1 to q is the destination of a jump instruction.

FIGS. 9A and 9B are explanatory diagrams illustrating examples of the connections of the ALUs 31 to 38. When the connection information signal XB_CONF is 00, the crossbar switch 30 connects the ALUs 31 to 38 as shown in FIG. 9A; when the connection information signal XB_CONF is 01, it connects them as shown in FIG. 9B. When the connection information signal XB_CONF is 10 or 11, the crossbar switch 30 likewise changes the connections of the ALUs 31 to 38 to respective predetermined connections.

The configuration of the ALUs 31 to 38 will now be described. Since the ALUs 31 to 38 all have the same configuration, the ALU 31 is described as a representative example. FIG. 10 is an explanatory diagram illustrating an example of the configuration of the ALU 31.

As shown in FIG. 10, the ALU 31 is configured by adding a terminal c to the ALU 22 of FIG. 3. Condition setting information is supplied to this terminal c, so the ALU 31 can execute conditional operations based on the condition setting information.

The terminal i is supplied with the mode information i0 for the ALU 31 from the pipeline register 13a. Each of the terminals m, n, and c is supplied with data from one of the data A0 to A2, the immediate values v0 to v7, and the outputs of the other ALUs 32 to 38, depending on the connection state of the crossbar switch 30. In the ALU 31, the operation of FIG. 11, described later, is executed on the data supplied to the terminals m and n based on the mode information i0 and the condition setting information. The result of this operation is output from the terminal x to the destination determined by the connection state of the crossbar switch 30.

FIG. 11 is an explanatory diagram illustrating examples of mode information and the corresponding arithmetic processing. The ALU 31 executes the arithmetic processing corresponding to the mode information i0 shown in FIG. 11. For example, when "1000" is input as the 4-bit mode information i0, the ALU 31 executes the arithmetic processing c ? (m + n) : m. This "c ? (m + n) : m" is a conditional operation: if the value input to the terminal c is 1, the data supplied to the terminals m and n are added and the sum is output from the terminal x. If the value input to the terminal c is not 1, the data input to the terminal m is output from the terminal x.
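The conditional operation for mode information "1000" can be written as a one-line function. The function name is hypothetical, and only this single mode is modelled because it is the only entry of FIG. 11 spelled out in the text.

```python
def alu_conditional_add(c, m, n):
    """Mode "1000": output m + n when terminal c is 1, otherwise pass m through."""
    return m + n if c == 1 else m

taken = alu_conditional_add(1, 5, 2)      # condition met: the sum is output
not_taken = alu_conditional_add(0, 5, 2)  # condition not met: m is output
```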

Here, the operation of the processor 1a will be described using the following calculation as an example.
if((A0&1)==1)
  Z=((~A1)>>1)&7
else
  Z=((A2&2)>>1)&7
When this calculation is executed by an ordinary processor, it takes eight cycles in the worst case, and even a processor that can parallelize ideally needs four cycles.

When this calculation is executed by parallel processing, the circuit shown in FIG. 9B may be used. That is, supplying 01 to the crossbar switch 30 as the connection information signal XB_CONF configures the circuit shown in FIG. 9B.

The processing of the above calculation by the circuit configured in this way will now be described. FIG. 12 is an explanatory diagram illustrating an example of the arithmetic processing.
The ALU 31 performs an AND operation on the data A0 and the immediate value v0, and outputs the result to the terminal c of the ALU 38 as condition setting information.

The ALU 32 performs an EXOR operation on the data A1 and the immediate value v1, and outputs the result to the ALU 33. The ALU 33 shifts this result right by the number of bits indicated by the immediate value v2 and outputs the result to the ALU 34. The ALU 34 performs an AND operation on this result and the immediate value v3, and outputs the result to the terminal m of the ALU 38. That is, the ALUs 32 to 34 execute the part ((~A1)>>1)&7 of the above calculation, and the result is output to the ALU 38.

The ALU 35 performs an AND operation on the data A2 and the immediate value v4, and outputs the result to the ALU 36. The ALU 36 shifts this result right by the number of bits indicated by the immediate value v5 and outputs the result to the ALU 37. The ALU 37 performs an AND operation on this result and the immediate value v6, and outputs the result to the terminal n of the ALU 38. That is, the ALUs 35 to 37 execute the part ((A2&2)>>1)&7 of the above calculation, and the result is output to the ALU 38.

The ALU 38 outputs the result input to the terminal m when the condition setting input from the ALU 31, that is, the result of the AND operation on the data A0 and the immediate value v0, is 1, and outputs the result input to the terminal n when that result is not 1.
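The data flow through the ALUs 31 to 38 can be checked against the original expression with a short model. Several points are assumptions for the sake of the sketch: the 8-bit data width, the immediate v1 being all ones across that width so that the EXOR yields the bitwise NOT, and the ALU 35 being modelled as an AND with v4 = 2, as the expression (A2&2) requires.

```python
WIDTH = 8                   # assumed data width
MASK = (1 << WIDTH) - 1     # assumed value of v1: all ones, so EXOR acts as NOT

def evaluate(a0, a1, a2):
    """Model the FIG. 9B style connection: two parallel chains and a selector."""
    cond = a0 & 1           # ALU 31: A0 AND v0 (v0 = 1)
    t = a1 ^ MASK           # ALU 32: A1 EXOR v1 gives ~A1 within WIDTH bits
    t = t >> 1              # ALU 33: shift right by v2 = 1
    m = t & 7               # ALU 34: AND with v3 = 7
    e = a2 & 2              # ALU 35: AND with v4 = 2 (assumed, see lead-in)
    e = e >> 1              # ALU 36: shift right by v5 = 1
    n = e & 7               # ALU 37: AND with v6 = 7
    return m if cond == 1 else n   # ALU 38: select terminal m or terminal n

def reference(a0, a1, a2):
    """The calculation exactly as written in the example."""
    if (a0 & 1) == 1:
        return ((~a1) >> 1) & 7
    return ((a2 & 2) >> 1) & 7

z = evaluate(0b0001, 0b1010, 0b0010)
```

Both chains are evaluated regardless of the condition, which is what allows the selection by the ALU 38 to replace the branch.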

If the total operation time of the ALUs 35, 36, 37, and 38 is three clock periods and this is the longest path of the whole calculation, a conventional processor wastes one cycle.

In contrast, when the total time required for the operations is three clock periods, the processor 1a of the present embodiment sets the value of the stall information in the instruction code to 2. Based on the control signal S_CYCLE generated from the stall information, the clock control circuit 21 then generates a control clock signal ECLK in which the clock signal CLK is stalled for two clocks. By controlling the pipeline register 13c with this control clock signal ECLK, the processor 1a can execute the above calculation in three cycles.

As described above, the processor 1a supplies the connection information signal XB_CONF to the crossbar switch 30 so that the ALUs 31 to 38 can be connected in series or in parallel. As a result, the processor 1a can increase the degree of freedom in connecting the ALUs 31 to 38 and can speed up the arithmetic processing.

(Third Embodiment)
Next, a third embodiment will be described. FIG. 13 is a block diagram showing the configuration of a processor system according to the third embodiment of the present invention. In FIG. 13, components identical to those in FIG. 7 are denoted by the same reference numerals, and their description is omitted.

The processor system 100b of the present embodiment is configured using a processor 1b in place of the processor 1 of FIG. 1.

The processor 1b of the present embodiment holds information corresponding to the instruction code shown in FIG. 8 inside the processor 1b as table information.
As shown in FIG. 13, the processor 1b is configured by adding a configuration table 41 to the processor 1a of FIG. 7.

The configuration table 41 has 16 entries, 0 to 15. Each of the entries 0 to 15 holds the number of stall cycles, the connection information of the crossbar switch 30, and the mode information and immediate value of each of the ALUs 31 to 38.

When an instruction code specifying one of the entries 0 to 15 of the configuration table 41 is supplied, the instruction decode unit 12 outputs information identifying the specified entry to the configuration table 41. Based on this information, the configuration table 41 outputs the number of stall cycles, the connection information of the crossbar switch 30, and the mode information i0 to i7 and immediate values v0 to v7 of the ALUs 31 to 38 held in the specified entry to the pipeline register 13a. The pipeline register 13a thereby supplies the number of stall cycles, the connection information of the crossbar switch 30, and the mode information i0 to i7 and immediate values v0 to v7 of the ALUs 31 to 38 to the arithmetic unit 15c, and the arithmetic unit 15c performs the predetermined arithmetic processing as in the second embodiment.

Because the processor 1b has the configuration table 41 in this way, the instruction code no longer needs to carry the number of stall cycles, the connection information of the crossbar switch 30, or the mode information i0 to i7 and immediate values v0 to v7 of the ALUs 31 to 38. The instruction code therefore only needs a bit field large enough to specify one of the 16 kinds of information in the entries 0 to 15, so the bit field of the instruction code can be shortened.
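The table lookup can be pictured with the following sketch. The class and field names are hypothetical, and the example entry values are chosen only for illustration; the point is that an instruction needs just enough bits to index the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigEntry:
    stall: int      # number of stall cycles
    xb_conf: int    # connection information of the crossbar switch 30
    modes: tuple    # mode information i0..i7
    imms: tuple     # immediate values v0..v7

class ConfigTable:
    """Sketch of a configuration table with a fixed number of entries."""

    def __init__(self, size=16):
        self.entries = [None] * size

    def write(self, index, entry):
        self.entries[index] = entry

    def lookup(self, index):
        entry = self.entries[index]
        if entry is None:
            raise KeyError(f"entry {index} is not configured")
        return entry

def index_bits(num_entries):
    """Bits an instruction code needs to select one of num_entries entries."""
    return max(1, (num_entries - 1).bit_length())

table = ConfigTable()
table.write(3, ConfigEntry(stall=2, xb_conf=1, modes=(0,) * 8, imms=(0,) * 8))
selected = table.lookup(3)
needed = index_bits(16)
```

For 16 entries the index needs 4 bits, compared with the 61 bits of stall, connection, mode, and immediate fields carried by the instruction code of FIG. 8.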

Although the configuration table 41 holds 16 kinds of information here, it may hold more than 16. In addition, the processor 1b may be provided with an instruction for rewriting the information in the entries 0 to 15 of the configuration table 41, or the information in the entries 0 to 15 may be made rewritable from outside the processor 1b. As a result, the processor 1b can have more connection patterns for the ALUs 31 to 38 than the processor 1a and can execute operations efficiently.

FIG. 14 is an explanatory diagram illustrating an example of the information held in each of the entries 0 to 15.

Like the instruction code of FIG. 8, each of the entries 0 to 15 holds the number of stall cycles, the connection information of the crossbar switch 30, and the mode information and immediate value of each of the ALUs 31 to 38.

Because there is less need to keep the bit length short than when the connection information of the crossbar switch 30 is carried in an instruction code, the figure shows a configuration in which every connection pattern can be realized. For example, XB_0m in bits 3 to 0 represents the signal connected to the terminal m of the ALU 31. The terminal m of the ALU 31 may be connected to any one of the data A0 to A2, the immediate value v0, and the outputs of the ALUs 32 to 38, so 4 bits suffice to express every connection pattern. Likewise, 4 bits each suffice for the terminal n and the terminal c of the ALU 31. The connection information for the ALU 31 is therefore 12 bits, and the connection information for all eight ALUs 31 to 38 is 96 bits.
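The bit counts in this paragraph follow from simple arithmetic, reproduced below. The constant and function names are illustrative only.

```python
def select_bits(num_sources):
    """Bits needed to pick one of num_sources possible input signals."""
    return max(1, (num_sources - 1).bit_length())

DATA_INPUTS = 3      # data A0 to A2
OWN_IMMEDIATE = 1    # the ALU's own immediate value (v0 for the ALU 31)
OTHER_ALUS = 7       # outputs of the seven other ALUs
TERMINALS = 3        # terminals m, n, and c
NUM_ALUS = 8         # ALUs 31 to 38

sources = DATA_INPUTS + OWN_IMMEDIATE + OTHER_ALUS   # 11 candidate signals
bits_per_terminal = select_bits(sources)
bits_per_alu = TERMINALS * bits_per_terminal
total_bits = NUM_ALUS * bits_per_alu
```

Eleven candidate signals fit in 4 bits, giving 12 bits per ALU and 96 bits for the whole crossbar.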

FIGS. 15 and 16 are flowcharts illustrating an example of the flow of the instruction code generation processing. In FIG. 15, processes identical to those in FIG. 3 are denoted by the same reference numerals, and their description is omitted.

First, p = 0, r = 3, and s = 0 are set as initial values (step S11). When lines p to q have been converted and the stall count calculated in step S7, it is determined whether any of the instruction entries 0 to s-1 holds a matching instruction (step S12). If there is a matching instruction, the determination is YES, the instruction with the matching instruction entry number is output (step S13), and the process proceeds to step S9. If there is no matching instruction, the determination is NO, and the converted instruction is output using s as the instruction entry number (step S14). Next, s = s + 1 is calculated (step S15), and the process proceeds to step S9. Through the above processing, assembler code including instruction entry data is generated.
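Steps S12 to S15 amount to assigning each distinct converted instruction an entry number and reusing that number when the same instruction appears again. The sketch below shows only this part of the flow; the function name and the representation of converted instructions as plain strings are assumptions.

```python
def assign_entries(converted_instructions):
    """Return (entries, output): the entry data and the entry number used per instruction."""
    entries = []   # instruction entry data; the list index is the entry number
    output = []    # entry number emitted for each converted instruction
    for instr in converted_instructions:
        if instr in entries:                        # step S12: matching entry exists
            output.append(entries.index(instr))     # step S13: reuse its number
        else:
            output.append(len(entries))             # step S14: use s as the number
            entries.append(instr)                   # step S15: s becomes s + 1
    return entries, output

entries, output = assign_entries(["x", "y", "x", "z", "y"])
```

Here the counter s of the flowchart is represented implicitly by the current length of the entry list.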

Turning to FIG. 16, the assembler code including the instruction entry data generated by the processing of FIG. 15 is read (step S21), and t = 2 is set (step S22). Next, it is determined whether the number of instructions in the instruction entry data fits in the configuration table 41 of the processor 1b (step S23). If it fits, the determination is YES and the processing ends. If it does not fit, the determination is NO, and it is determined whether the instruction entry data contains an instruction that uses t ALUs (step S24). If such an instruction exists, the determination is YES; one instruction that uses t ALUs is selected from the instruction entry data, deleted from the instruction entry data, and the output code is written back to the original instructions (step S25), after which the process returns to step S23. If no instruction uses t ALUs, the determination is NO, t = t + 1 is calculated (step S26), the process proceeds to step S25, and the above processing is repeated.
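The loop of FIG. 16 can be sketched as follows. This is an interpretation rather than a transcription: after incrementing t the sketch checks again whether an instruction using t ALUs exists before removing one, and the guard on `max_alus` is added so that the loop always terminates. The function name, the tuple representation of entries, and the choice of the first candidate as the one to remove are all assumptions.

```python
def fit_to_table(entries, capacity=16, max_alus=8):
    """entries: list of (name, alu_count). Returns (kept, written_back)."""
    kept = list(entries)
    written_back = []      # instructions restored to their original form
    t = 2                                                # step S22
    while len(kept) > capacity:                          # step S23
        candidates = [e for e in kept if e[1] == t]      # step S24
        if candidates:
            victim = candidates[0]                       # step S25: pick one
            kept.remove(victim)
            written_back.append(victim)
        else:
            t += 1                                       # step S26
            if t > max_alus:
                raise ValueError("no removable instruction entry found")
    return kept, written_back

kept, written_back = fit_to_table(
    [("a", 3), ("b", 2), ("c", 4), ("d", 2)], capacity=2)
```

Starting from t = 2 means that the entries combining the fewest ALUs, which presumably gain the least from the table, are given up first.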

Although the table is provided here in the processor 1a of the second embodiment, it may instead be provided in the processor 1 of the first embodiment. In that case, the connection information of the crossbar switch 30 held in each of the entries 0 to 15 may simply be removed.

As described above, the processor 1b holds the number of stall cycles, the connection information of the crossbar switch 30, and the mode information and immediate value of each of the ALUs 31 to 38 in the configuration table 41. As a result, the processor 1b can shorten the bit field of the instruction code compared with the processor 1a of the second embodiment. Furthermore, the processor 1b offers a higher degree of freedom in connecting the ALUs 31 to 38 than the processor 1a of the second embodiment and can efficiently execute a wider variety of operation patterns.

(Fourth Embodiment)
Next, a fourth embodiment will be described. FIG. 17 is a block diagram showing the configuration of a processor system according to the fourth embodiment of the present invention. In FIG. 17, components identical to those in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

In recent years, demands for low power consumption and low heat generation have grown further, for example in systems for portable devices, and finer power control is required. The processor system 100c of the present embodiment therefore adds to the processor system 100 of the first embodiment the ability to change the frequency or operating voltage of the processor 1, enabling finer power control as well as optimization of the arithmetic processing.

As shown in FIG. 17, the processor system 100c is configured by adding a clock control unit 105 and a power supply control unit 106 to the processor system 100 of FIG. 1, and a mode signal is supplied to the processor 1. The processor system 100c may use the processor 1a of the second embodiment or the processor 1b of the third embodiment in place of the processor 1.

The memory 104 stores a program for the low power consumption mode, a program for the high-speed operation mode, and a program for normal operation.

The mode signal supplied to the processor 1 is, for example, a low power consumption mode signal, a high-speed operation mode signal, or a normal operation mode signal. When a mode signal is supplied to the processor 1, the processor 1 switches to the mode corresponding to that mode signal.

When the low power consumption mode signal is supplied to the processor 1, the clock control unit 105 performs control to lower the frequency of the processor 1. The processor 1 then reads the program for the low power consumption mode from the memory 104 and executes it.

For example, the clock frequency of the processor 1 must be kept low to reduce power consumption, but certain operations may still need to be executed at high speed. In this case, for the specific operation sequence, the instruction code is changed so that the number of stall cycles in the instruction code is increased and more ALUs operate at once. By having the processor 1 execute this instruction code, the arithmetic processing can be optimized while the clock frequency remains lowered.

On the other hand, when the high-speed operation mode signal is supplied to the processor 1, the power supply control unit 106 performs control to raise the voltage of the processor 1. The processor 1 then reads the program for the high-speed operation mode from the memory 104 and executes it.

For example, it may be desirable to raise the voltage of the processor 1 so that certain operations execute at high speed while keeping the clock frequency low to suit the system. In this case, for the specific operation sequence, the number of stall cycles in the instruction code is reduced. By having the processor 1 execute this instruction code, the arithmetic processing can be optimized while the clock frequency remains lowered.

As described above, the processor system 100c of the present embodiment controls the frequency of the processor 1 with the clock control unit 105, or controls the voltage of the processor 1 with the power supply control unit 106. As a result, the processor system 100c can change the frequency or operating voltage of the processor 1 to perform finer power control and to optimize the arithmetic processing.

The steps in the flowcharts in this specification may be executed in a changed order, several at the same time, or in a different order at each execution, as long as this does not conflict with their nature.

The present invention is not limited to the embodiments described above, and various changes and modifications can be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS: 1, 1a, 1b ... Processor, 11 ... Instruction fetch unit, 12 ... Instruction decode unit, 13a, 13b, 13c ... Pipeline register, 14 ... Data register, 15a, 15b, 15c ... Arithmetic unit, 16 ... Selector, 17 ... Memory access unit, 18 ... Write-back unit, 21 ... Clock control circuit, 22-26 ... ALU, 30 ... Crossbar switch, 31-38 ... ALU, 41 ... Configuration table, 100, 100a, 100b, 100c ... Processor system, 101 ... Clock generator, 102 ... Memory I/F, 103 ... Bus, 104 ... Memory, 105 ... Clock control unit, 106 ... Power supply control unit.

Claims

1. A processor for executing an instruction code read from a memory, comprising:
a control clock generation unit that outputs a control clock signal generated based on a supplied clock signal in accordance with clock signal control information for controlling the clock signal; and
a plurality of arithmetic units connected in series,
wherein, in an arithmetic stage in which a plurality of operations are executed by the plurality of arithmetic units according to operation mode information that includes information on the content of the arithmetic processing executed by each of the plurality of arithmetic units and data information necessary for the arithmetic processing, the plurality of operations executed in series are executed in one execution cycle by changing the execution cycle based on the control clock signal from the control clock generation unit.

2. The processor according to claim 1, wherein the plurality of arithmetic units connected in series are configured so that the connections among them can be changed,
the processor further comprising a switching unit capable of connecting the plurality of arithmetic units in a plurality of connection patterns,
wherein the switching unit switches the connections among the plurality of arithmetic units to the connection pattern, among the plurality of connection patterns, that corresponds to connection information.

3. The processor according to claim 1, wherein the instruction code includes the clock signal control information for the control clock generation unit and the operation mode information of the plurality of arithmetic units.

4. The processor according to claim 1, further comprising a storage unit that stores one or more sets of the clock signal control information for the control clock generation unit and the operation mode information of the plurality of arithmetic units,
wherein the instruction code includes a code that specifies the information stored in the storage unit.

5. The processor according to claim 1, wherein the operating frequency or operating voltage of the processor can be changed, and
the clock signal control information for the control clock generation unit differs according to the changed operating frequency or operating voltage.