JP2012059195A

JP2012059195A - Multi-thread processor

Info

Publication number: JP2012059195A
Application number: JP2010204308A
Authority: JP
Inventors: Shohei Nomoto; 祥平野本; Yuki Kobayashi; 悠記小林
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2010-09-13
Filing date: 2010-09-13
Publication date: 2012-03-22

Abstract

PROBLEM TO BE SOLVED: To solve the problem that a conventional multi-thread processor cannot deliver sufficient throughput. SOLUTION: A multi-thread processor has: an instruction supply section 10 including a first instruction buffer 231 for storing a first instruction code and second instruction buffers 232-23m for storing second instruction codes; an instruction selector 11 for selecting an instruction code issued from the first and second instruction buffers; an instruction decoder 12 for decoding the instruction code selected by the instruction selector 11; and an instruction execution section 13 for performing information processing based on a decode result. The instruction supply section 10 has a thread control part 24 for storing a first instruction code in the first instruction buffer preferentially and, when the number of first instruction codes stored in the first instruction buffer becomes more than twice the maximum number of instruction codes that the instruction supply section 10 can issue in parallel, storing a second instruction code in the second instruction buffer.

Description

The present invention relates to a multi-thread processor, and more particularly to a multi-thread processor that executes a high-priority thread and a low-priority thread.

A processor reads a program and the data to be processed from an external memory and performs data processing. Such a processor advances data processing by processing instruction codes serially. The processor includes an instruction supply unit, an instruction decoder, and an instruction execution unit. The instruction supply unit fetches the program from the external memory. The instruction decoder decodes an instruction supplied from the instruction supply unit and generates control information for the instruction execution unit. The instruction execution unit processes data stored in the external memory based on the control information.

However, there is strong demand for higher processing capability in processors. For this reason, many techniques for improving processing capability have been proposed. Examples of such techniques include the VLIW (Very Long Instruction Word) processor and the multi-thread processor.

A VLIW processor processes a plurality of instructions in parallel. The VLIW processor has an instruction supply unit, an instruction decoder, and an instruction execution unit. That is, its basic configuration is the same as that of a general processor. However, in the VLIW processor, the instruction supply unit fetches a plurality of instructions from the external memory, and the instruction decoder decodes the plurality of instructions to generate a plurality of pieces of control information. The instruction execution unit then processes a plurality of data items in parallel based on the plurality of pieces of control information. Because average instruction-level parallelism (ILP) is limited, the performance improvement achievable with a VLIW processor is also limited.

A multi-thread processor can exceed the ILP limit of the VLIW processor by executing a plurality of independent programs (threads) in parallel. The multi-thread processor has an instruction selector in addition to an instruction supply unit, an instruction decoder, and an instruction execution unit. In the multi-thread processor, the instruction supply unit fetches instructions of a plurality of independent threads. The instruction supply unit then extracts, at runtime, instructions that can be executed simultaneously from among the instructions of the plurality of threads it supplies. The instruction selector supplies the instructions read from the instruction supply unit to the instruction decoder.

The multi-thread processor is provided with an instruction cache and instruction buffers. The instruction cache stores instructions fetched from the external memory. Instruction buffers are provided according to the number of threads. Each instruction buffer is supplied with the instructions of its corresponding thread from the instruction cache. In other words, the instruction buffers store the instructions scheduled for processing on a per-thread basis. The multi-thread processor processes a plurality of threads in parallel by reading instructions from the instruction buffers as needed. In this multi-thread processor, processing performance can be improved by improving the efficiency with which the instructions accumulated in the instruction buffers are processed.

Patent Document 1 proposes a method of selecting the thread to be processed based on a frequency specified by a software program. Non-Patent Document 1 proposes a round-robin method that selects a plurality of threads cyclically, a method that selects a thread whose branch has been resolved, and a method that selects a thread whose instruction buffer holds few instructions.

Patent Document 2 discloses a technique for a multi-thread processor in which, in order to prevent a program from stopping while waiting for an instruction to be read from memory (a memory stall), a management unit is provided for the instruction fetch buffers, and when the number of instructions accumulated in a fetch buffer becomes small, the urgency of reading the instructions corresponding to that fetch buffer is raised.

Patent Document 1: JP 2010-86130 A
Patent Document 2: JP 2006-195705 A

Non-Patent Document 1: "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor", International Symposium on Computer Architecture, 1996

However, the thread selection methods described in Patent Document 1 and Non-Patent Document 1 cannot achieve sufficient processing capability in a VLIW processor capable of multi-thread operation. An example is a case where one program consists of a main process (main thread) that takes a long time to process and a sub-process (sub thread) that uses the result of the main thread and can be processed in a short time. In this case, to achieve optimum performance, it is desirable that the main thread and the sub thread finish at the same time.

In Patent Document 1, the thread to be selected is determined based only on the frequency with which each thread is selected. Therefore, in the thread selection method described in Patent Document 1, when the high-priority thread executes few instructions simultaneously, more instructions are fetched into the instruction buffer than are executed, and instructions waiting to be processed accumulate in the instruction buffer. As a result, instructions stay in the instruction buffer for an unnecessarily long time, and instructions that cannot be stored in the instruction buffer are discarded, which is wasteful. In other words, the thread selection method described in Patent Document 1 wastes the instruction supply capability of the instruction cache.

Further, in the thread selection method described in Patent Document 1, instructions may be accumulated in the instruction buffer corresponding to a low-priority thread even when the instruction buffer corresponding to the high-priority thread does not hold enough instructions to execute. In that case, the instruction buffer corresponding to the high-priority thread runs out of instructions to execute, processing stalls, and execution of the high-priority thread is excessively suppressed.

The thread selection methods described in Non-Patent Document 1 are premised on treating all threads equally. Therefore, when the main thread and the sub thread differ in processing time (or in the number of threads to be processed), efficient processing cannot be performed.

Patent Document 2 likewise does not take priority into account for each thread or each fetch buffer, so low-priority instructions may be processed first and processing of the high-priority thread may be delayed. This problem is particularly pronounced when the ratio of instruction codes is extremely skewed, for example when the high-priority thread processes far more instruction codes than the low-priority thread.

In short, with the techniques described in Patent Documents 1 and 2 and Non-Patent Document 1, a VLIW processor capable of multi-thread operation cannot deliver sufficient processing performance when executing a program that has a main thread and a sub thread.

One aspect of the multi-thread processor according to the present invention includes: an instruction supply unit comprising a first instruction buffer that stores a first instruction code belonging to a first thread and a second instruction buffer that stores a second instruction code belonging to a second thread; an instruction selector that selects, from among the instruction codes issued from the first and second instruction buffers, the instruction code to be transmitted to a subsequent-stage circuit; an instruction decoder that decodes the instruction code selected by the instruction selector to generate execution instruction information; and an instruction execution unit that performs information processing based on the execution instruction information. The instruction supply unit has a thread control unit that preferentially stores the first instruction code in the first instruction buffer, and stores the second instruction code in the second instruction buffer when the number of first instruction codes stored in the first instruction buffer becomes at least twice the maximum number of instruction codes that the instruction supply unit can issue in parallel.

Another aspect of the multi-thread processor according to the present invention is a multi-thread processor that executes, in a time-division manner, a first instruction code belonging to a first thread and a second instruction code belonging to a second thread, and includes: a first instruction buffer that stores the first instruction code belonging to the first thread; a second instruction buffer that stores the second instruction code belonging to the second thread; and a thread control unit that preferentially stores the first instruction code in the first instruction buffer, and stores the second instruction code in the second instruction buffer when the number of first instruction codes stored in the first instruction buffer exceeds twice the maximum number of instructions that the first buffer can issue in parallel.

In the multi-thread processor according to the present invention, the thread control unit stores the second instruction code in the second instruction buffer when the number of first instruction codes stored in the first instruction buffer exceeds twice the maximum number of instructions that the first buffer can issue in parallel. As a result, in the multi-thread processor according to the present invention, even while instruction codes are being accumulated in the second instruction buffer, the instruction codes accumulated in the first instruction buffer can be prevented from stalling.
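The storage rule of the thread control unit can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the circuit disclosed in the embodiments: the function name `next_fetch_target` and the `strict` flag are hypothetical, and the flag exists only because the text states the threshold both as "at least twice" and as "more than twice".

```python
def next_fetch_target(first_count: int, nmi: int, strict: bool = False) -> str:
    """Return which instruction buffer is filled next (hypothetical helper).

    first_count: number of first instruction codes held in the first buffer
    nmi:         maximum number of instruction codes issued in parallel
    strict:      False -> threshold is "at least twice" nmi
                 True  -> threshold is "more than twice" nmi
    """
    threshold = 2 * nmi
    if strict:
        reached = first_count > threshold
    else:
        reached = first_count >= threshold
    # The first (high-priority) buffer is filled preferentially; the second
    # buffer is filled only once the first buffer holds enough codes.
    return "second" if reached else "first"
```

With a maximum of two parallel instruction codes, the first buffer keeps priority until it holds four codes (or five under the strict reading), so the high-priority thread always has at least two issue cycles' worth of codes in reserve before fetch bandwidth is given to another thread.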

According to the multi-thread processor of the present invention, processing performance can be improved by accumulating instruction codes in the instruction buffers efficiently.

FIG. 1 is a block diagram of a multi-thread processor according to the first embodiment.
FIG. 2 is a conceptual diagram showing the data storage structure of an instruction cache according to the first embodiment.
FIG. 3 is a timing chart showing an instruction fetch procedure of the instruction cache according to the first embodiment.
FIG. 4 is a block diagram of a thread control unit according to the first embodiment.
FIG. 5 is a timing chart showing an instruction issue sequence of an instruction supply unit according to the first embodiment.
FIG. 6 is a block diagram showing the multi-thread processor in the operating state corresponding to thread 3 in FIG. 5.
FIG. 7 is a block diagram showing the multi-thread processor in the operating state corresponding to thread 4 in FIG. 5.
FIG. 8 is a block diagram showing the multi-thread processor in the operating state corresponding to thread 5 in FIG. 5.
FIG. 9 is a timing chart showing an instruction issue sequence of the instruction supply unit according to the first embodiment.
FIG. 10 is a block diagram showing the multi-thread processor in the operating state corresponding to thread 3 in FIG. 9.
FIG. 11 is a block diagram showing the multi-thread processor in the operating state corresponding to thread 4 in FIG. 9.
FIG. 12 is a block diagram showing the multi-thread processor in the operating state corresponding to thread 5 in FIG. 9.
FIG. 13 is a timing chart showing an instruction issue sequence of an instruction supply unit with a general thread control unit.
FIG. 14 is a block diagram of a thread control unit according to the second embodiment.
FIG. 15 is a timing chart showing an instruction issue sequence of an instruction supply unit according to the second embodiment.
FIG. 16 is a block diagram of a multi-thread processor according to the third embodiment.
FIG. 17 is a block diagram of a waiting control unit according to the third embodiment.
FIG. 18 is a block diagram of a control signal generation unit according to the third embodiment.
FIG. 19 is a block diagram of a control signal generation unit according to the third embodiment.
FIG. 20 is a block diagram of a thread control unit according to the third embodiment.
FIG. 21 is a timing chart showing an instruction issue sequence of an instruction supply unit according to the third embodiment.
FIG. 22 is a sequence diagram showing how the fetched thread switches in the multi-thread processor according to the third embodiment.

Embodiment 1

Embodiments of the present invention will be described below with reference to the drawings. The multi-thread processor according to the present invention is a VLIW processor in which each fetched instruction contains a plurality of instruction codes. A VLIW processor can process a plurality of instruction codes in parallel. The following description uses, as an example, a VLIW processor that executes a maximum of two instructions simultaneously in one cycle.

FIG. 1 shows a block diagram of a multi-thread processor 1 according to the first embodiment. As shown in FIG. 1, the multi-thread processor 1 includes an instruction supply unit 10, an instruction selector 11, an instruction decoder 12, and an instruction execution unit 13. In the multi-thread processor 1, the instruction supply unit 10 reads a program from an external memory. The instruction supply unit 10 issues instruction codes of the program it has read. The multi-thread processor 1 performs data processing by processing data stored in the external memory based on the instruction codes issued from the instruction supply unit 10.

The instruction supply unit 10 includes a first instruction buffer that stores a first instruction code belonging to a first thread (for example, a first program) and a second instruction buffer that stores a second instruction code belonging to a second thread (for example, a second program). The instruction supply unit 10 issues a plurality of instruction codes at a time. The plurality of instruction codes may consist only of codes belonging to the first thread, or may be a mixture of codes belonging to the first and second threads. In the present embodiment, the first thread is a main thread that takes a long time to complete, and the second thread is a sub thread that completes in a short time. The sub thread either uses the calculation result of the main thread or prepares processing needed for the calculation of the main thread. Furthermore, the sub thread may consist of a plurality of threads. Details of the instruction supply unit 10 will be described later.

The instruction selector 11 selects, from among the instruction codes issued from the first and second instruction buffers, the instruction codes to be transmitted to the subsequent-stage circuit. More specifically, the instruction selector 11 selects the instruction codes to be transmitted to the subsequent-stage circuit so that the plurality of instruction codes issued by the instruction supply unit 10 form a combination in which no two codes use the same arithmetic unit.
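The selection constraint can be sketched as follows. This is a hypothetical greedy policy written only to illustrate the "no shared arithmetic unit" condition; the text does not specify the actual selection algorithm, and the function name `select_codes`, the unit labels, and the assumption that candidates arrive in priority order are all illustrative.

```python
def select_codes(candidates, max_issue=2):
    """Pick up to max_issue instruction codes with no shared arithmetic unit.

    candidates: list of (code_name, unit_name) pairs, assumed to be listed
                in priority order (main-thread codes first).
    """
    selected = []
    used_units = set()
    for name, unit in candidates:
        if len(selected) == max_issue:
            break
        # Skip a code whose arithmetic unit is already claimed this cycle.
        if unit not in used_units:
            selected.append(name)
            used_units.add(unit)
    return selected
```

For example, if Im1 and Im2 both need the same unit while Isa1 needs a different one, the sketch forwards Im1 together with Isa1 and leaves Im2 for a later cycle.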

The instruction decoder 12 decodes the instruction code selected by the instruction selector 11 and generates execution instruction information. The instruction execution unit 13 reads the data to be processed from the external memory. The instruction execution unit 13 then processes the read data based on the execution instruction information generated by the instruction decoder 12. After that, the instruction execution unit 13 writes the processing result back to the external memory.

One feature of the multi-thread processor 1 according to the first embodiment lies in how the instruction supply unit 10 controls the number of instruction codes accumulated in the instruction buffers. The instruction supply unit 10 is therefore described in detail below.

As shown in FIG. 1, the instruction supply unit 10 includes program counters 201 to 20m, an address selector 21, an instruction cache 22, instruction buffers 231 to 23m, and a thread control unit 24. Program counters are provided according to the number of executable threads (for example, programs). The example shown in FIG. 1 supports m threads (m is an integer). Each of the program counters 201 to 20m increases its count value as its thread progresses. This count value functions as a pointer indicating the address in the instruction cache at which the instruction codes belonging to the thread are stored. In the following description, the program counter 201 is provided for the first thread (for example, the main thread), and its count value is referred to as the first count value. The program counters 202 to 20m are provided for the second threads (for example, sub threads), and their count values are referred to as second count values.

The address selector 21 selects and outputs either the first count value or the second count value in accordance with an address selection signal AdSel.

The instruction cache 22 reads the programs associated with the threads from the external memory and stores them. Each program consists of a plurality of instruction codes. In the following description, an instruction code of the program associated with the first thread is referred to as a first instruction code, and an instruction code of a program associated with a second thread is referred to as a second instruction code. The instruction cache 22 fetches first instruction codes into the first instruction buffer 231 in accordance with the first count value, and fetches second instruction codes into the second instruction buffers 232 to 23m in accordance with the second count values.

The instruction cache 22 fetches a plurality of instruction codes in accordance with the input count value. The plurality of instruction codes belong to the same thread and are stored at consecutive addresses in the instruction cache 22. In the multi-thread processor 1 according to the first embodiment, the maximum number NMI of instruction codes that the instruction supply unit 10 can issue simultaneously is set to 2. Accordingly, in the multi-thread processor 1, the instruction cache 22 fetches two instruction codes at a time. FIG. 2 shows a conceptual diagram of the data storage structure of the instruction cache 22. As shown in FIG. 2, the instruction cache 22 stores instruction codes for each thread. In the example shown in FIG. 2, the first thread (for example, thread 1) consists of instruction codes Im1 to Im16, a second thread (for example, thread 2) consists of instruction codes Isa1 to Isa6, and another second thread (for example, thread m) consists of instruction codes Isx1 to Isx8. The instruction cache 22 uses the input count value as a pointer and fetches the data stored at the address indicated by the pointer and at the address that follows it.
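The pointer-based fetch can be sketched in Python as follows. This is a simplified model, assuming a 1-based count value that indexes directly into a per-thread list of instruction codes; the dictionary `INSTRUCTION_CACHE`, the thread keys, and the function `fetch` are hypothetical names, and real cache addressing is not modeled.

```python
# Per-thread contents following the example of FIG. 2.
INSTRUCTION_CACHE = {
    "thread1": [f"Im{i}" for i in range(1, 17)],   # Im1 .. Im16
    "thread2": [f"Isa{i}" for i in range(1, 7)],   # Isa1 .. Isa6
    "threadm": [f"Isx{i}" for i in range(1, 9)],   # Isx1 .. Isx8
}

def fetch(thread: str, count_value: int, nmi: int = 2) -> list:
    """Fetch nmi consecutive instruction codes starting at count_value.

    count_value is treated as 1-based, so count value 1 points at the
    first instruction code of the thread.
    """
    codes = INSTRUCTION_CACHE[thread]
    start = count_value - 1
    return codes[start:start + nmi]
```

Calling `fetch("thread1", 1)` returns Im1 and Im2, the code at the pointer and the code at the following address.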

The instruction buffers 231 to 23m are provided for the respective threads. In the present embodiment, the instruction buffer 231 functions as the first instruction buffer provided for the first thread. The instruction buffers 232 to 23m function as the second instruction buffers provided for the second threads. In the multi-thread processor 1 according to the first embodiment, the capacity of the first instruction buffer is set to at least twice the maximum number NMI of instruction codes that the instruction supply unit 10 can issue simultaneously. In the example shown in FIG. 1, NMI is set to 2. Therefore, in the example shown in FIG. 1, the instruction buffer 231 used as the first instruction buffer is configured to hold four instructions. The first and second instruction buffers may have the same capacity, or the second instruction buffers may have a smaller capacity than the first instruction buffer.

Because the multi-thread processor 1 according to the first embodiment is a VLIW processor, the way instructions are fetched from the instruction cache into an instruction buffer differs from the general scheme that fetches a single instruction code. FIG. 3 shows a timing chart of the instruction fetch procedure in the multi-thread processor 1 when there is only one thread to be processed. The instruction fetch procedure for a single thread is described below with reference to FIG. 3. In the multi-thread processor 1, processing advances every operation cycle. FIG. 3 shows an example in which only thread 1 is processed and the count value of the program counter 201 increases.

In the example shown in FIG. 3, the program counter 201 is reset in cycle 0. In cycle 1, the count value of the program counter 201 becomes 1. At this point, no instruction code is stored in the instruction buffer 231, so the instruction cache 22 fetches two instruction codes. In the example shown in FIG. 3, the instruction cache 22 fetches the instruction code Im1 corresponding to the count value "1" and the instruction code Im2 that follows Im1. The instruction cache 22 fetches two instruction codes at once because the maximum number NMI of simultaneously issued instructions of the instruction supply unit 10 is 2. The bandwidth between the instruction cache 22 and the instruction buffer 231 is set wide enough to transfer the instruction codes that are fetched simultaneously.

Subsequently, in cycle 2, the count value of the program counter 201 increases. The count increase has a maximum value of 2, corresponding to the number of instruction codes the instruction cache 22 fetches simultaneously. Where Hmax is the maximum number of instructions that the instruction buffer 231 can hold, HNI is the number of instructions held in the instruction buffer 231, and NII is the number of instruction codes issued by the instruction supply unit 10, the count increase is expressed as: count increase = Hmax - (HNI - NII). In the example shown in FIG. 3, the count increase calculated in cycle 1 is 4, which exceeds the maximum, so the count increase applied in cycle 2 is 2. Accordingly, in cycle 2, the count value of the program counter 201 becomes 3. The instruction cache 22 then fetches the instruction codes Im3 and Im4 in accordance with the count value "3". Also in cycle 2, the instruction codes Im1 and Im2 fetched in cycle 1 are stored in the instruction buffer 231, and the instruction code Im1 is issued.
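The count-increase calculation can be written as a short Python sketch. The function name `count_increase` and its parameter names are hypothetical; `hni` stands for the number of instructions held in the instruction buffer, and the cap of 2 follows the stated maximum count increase.

```python
def count_increase(hmax: int, hni: int, nii: int, max_increase: int = 2):
    """Return (raw, applied) count increase for the program counter.

    hmax: maximum number of instructions the buffer can hold
    hni:  number of instructions held in the buffer in this cycle
    nii:  number of instruction codes issued in this cycle
    """
    raw = hmax - (hni - nii)            # free slots after this cycle's issue
    applied = min(raw, max_increase)    # limited by simultaneous fetch width
    return raw, applied
```

With Hmax = 4, an empty buffer gives a raw value of 4 that is limited to 2, while a buffer holding two codes with one issued gives a raw value of 3, again limited to 2.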

Next, in cycle 3, because the count increment computed in cycle 2 is 3, the count value of the program counter 201 increases by the maximum increment of 2 and becomes 5. The instruction cache 22 fetches the instruction codes Im5 and Im6 corresponding to the count value "5". In cycle 2, the instruction code Im1 was issued and the instruction codes Im3 and Im4 were fetched. The instruction buffer 231 therefore holds three instruction codes in cycle 3: Im2, Im3, and Im4. In the example shown in FIG. 3, the instruction code Im2 is issued in cycle 3. As a result, the count increment at the completion of cycle 3 is 2.

Next, in cycle 4, because the count increment in cycle 3 is 2, the count value of the program counter 201 increases by 2 and becomes 7. The instruction cache 22 fetches the instruction codes Im7 and Im8 corresponding to the count value "7". In cycle 3, the instruction code Im2 was issued and the instruction codes Im5 and Im6 were fetched. The instruction buffer 231 therefore holds four instruction codes in cycle 4: Im3, Im4, Im5, and Im6. In the example shown in FIG. 3, the instruction code Im3 is issued in cycle 4. As a result, the count increment at the completion of cycle 4 is 1.

Next, in cycle 5, because the count increment in cycle 4 is 1, the count value of the program counter 201 increases by 1 and becomes 8. The instruction cache 22 fetches the instruction codes Im8 and Im9 corresponding to the count value "8". In cycle 4, the instruction code Im3 was issued and the instruction codes Im7 and Im8 were fetched. In cycle 5, however, the instruction buffer 231 has room for only one more instruction code, so the instruction code Im8 is discarded in cycle 5. The instruction buffer 231 therefore holds four instruction codes in cycle 5: Im4, Im5, Im6, and Im7. In the example shown in FIG. 3, the instruction code Im4 is issued in cycle 5. As a result, the count increment at the completion of cycle 5 is 1.

Next, in cycle 6, because the count increment in cycle 5 is 1, the count value of the program counter 201 increases by 1 and becomes 9. The instruction cache 22 fetches the instruction codes Im9 and Im10 corresponding to the count value "9". In cycle 5, the instruction code Im4 was issued and the instruction codes Im8 and Im9 were fetched. In cycle 6, however, the instruction buffer 231 has room for only one more instruction code, so the instruction code Im9 is discarded in cycle 6. The instruction buffer 231 therefore holds four instruction codes in cycle 6: Im5, Im6, Im7, and Im8. In the example shown in FIG. 3, the instruction codes Im5 and Im6 are issued in cycle 6. As a result, the count increment at the completion of cycle 6 is 2.

Next, in cycle 7, because the count increment in cycle 6 is 2, the count value of the program counter 201 increases by 2 and becomes 11. The instruction cache 22 fetches the instruction codes Im11 and Im12 corresponding to the count value "11". In cycle 6, the instruction codes Im5 and Im6 were issued and the instruction codes Im9 and Im10 were fetched. The instruction buffer 231 therefore holds four instruction codes in cycle 7: Im7, Im8, Im9, and Im10. In the example shown in FIG. 3, the instruction codes Im7 and Im8 are issued in cycle 7. As a result, the count increment at the completion of cycle 7 is 2.

Next, in cycle 8, because the count increment in cycle 7 is 2, the count value of the program counter 201 increases by 2 and becomes 13. The instruction cache 22 fetches the instruction codes Im13 and Im14 corresponding to the count value "13". In cycle 7, the instruction codes Im7 and Im8 were issued and the instruction codes Im11 and Im12 were fetched. The instruction buffer 231 therefore holds four instruction codes in cycle 8: Im9, Im10, Im11, and Im12. In the example shown in FIG. 3, the instruction code Im9 is issued in cycle 8. As a result, the count increment at the completion of cycle 8 is 1.

In this way, the program counter 201 advances its count value according to the maximum number of instructions Hmax that the instruction buffer 231 can hold and the number of instruction codes issued by the instruction supply unit 10. The instruction cache 22 always fetches the maximum number of instruction codes it can fetch, regardless of whether the fetched instruction codes will be held in the instruction buffer 231. Alternatively, the instruction cache 22 may be configured to fetch no more than the maximum number of instruction codes, depending on whether the fetched instruction codes can be held in the instruction buffer 231. The control method for the program counter 201 and the instruction cache 22 described above is only an example, and other control methods may also be applied.
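The schedule walked through for FIG. 3 can be replayed with a few lines of code. The following Python sketch is illustrative only: the function name, the tuple layout, and the per-cycle issue counts passed in the usage line are assumptions taken from the example above (Hmax = 4 is inferred from the count increment of 4 computed with an empty buffer), and the code is not part of the disclosed circuit.

```python
def simulate_fetch(issued_per_cycle, h_max=4, max_step=2, start_pc=1):
    """Replay a FIG. 3 style fetch schedule (hypothetical helper).

    issued_per_cycle[i] is the number of instruction codes issued in
    cycle i + 1. Returns one tuple per cycle:
    (cycle, count_value, held, issued, raw_increment).
    """
    rows = []
    pc = start_pc   # count value of the program counter in cycle 1
    held = 0        # instruction codes held in the buffer during the cycle
    for cycle, issued in enumerate(issued_per_cycle, start=1):
        if issued > held:
            raise ValueError("cannot issue more codes than the buffer holds")
        # Count increment = Hmax - (held - issued), before the cap is applied.
        raw_increment = h_max - (held - issued)
        rows.append((cycle, pc, held, issued, raw_increment))
        # The counter advances by at most max_step; the same number of
        # fetched codes fits in the buffer, and any excess is discarded.
        step = min(max_step, raw_increment)
        held = held - issued + step
        pc += step
    return rows


if __name__ == "__main__":
    for row in simulate_fetch([0, 1, 1, 1, 1, 2, 2, 1]):
        print(row)
```

With the issue counts of cycles 1 to 8 from the example (0, 1, 1, 1, 1, 2, 2, 1), the count values come out as 1, 3, 5, 7, 8, 9, 11, 13, matching the walkthrough.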

Next, the thread control unit 24, which controls which thread's instruction codes are stored in the instruction buffers 231 to 23m of the instruction supply unit 10, will be described in detail. The thread control unit 24 according to the first embodiment preferentially stores first instruction codes (instruction codes belonging to thread 1) in the first instruction buffer (for example, the instruction buffer 231). When the number of first instruction codes stored in the instruction buffer 231 reaches at least twice the maximum number of instruction codes that the instruction supply unit 10 can issue in parallel, the thread control unit 24 stores second instruction codes (instruction codes belonging to threads 2 to m) in the second instruction buffers (for example, the instruction buffers 232 to 23m). In the present embodiment, the thread control unit 24 outputs the address selection signal AdSel to the address selector 21. The thread control unit 24 switches the count value output by the address selector between the first count value and the second count value according to the number of instruction codes stored in the first instruction buffer. This switching is controlled by changing which program counter the address selection signal AdSel designates. Through this control, the thread control unit 24 controls how instruction codes accumulate in the instruction buffers.

In the multi-thread processor 1 according to the first embodiment, the instruction cache 22 outputs a fetch number signal NFI indicating the number of instruction codes it has fetched, the instruction buffers 231 to 23m output an instruction holding number signal NHI indicating the number of instruction codes they hold, and the instruction decoder 12 outputs an instruction issue number signal NII indicating the number of instruction codes issued from the instruction supply unit. The thread control unit 24 switches the state of the address selection signal AdSel based on the fetch number signal NFI, the instruction holding number signal NHI, and the instruction issue number signal NII. In the present embodiment, the fetch number signal NFI, the instruction holding number signal NHI, and the instruction issue number signal NII all indicate numbers of instruction codes belonging to the main thread (for example, thread 1).

FIG. 4 is a block diagram of the thread control unit 24. As shown in FIG. 4, the thread control unit 24 includes a high thread selection unit 30, a round robin selection unit 31, and a low thread selection unit 32. In the present embodiment, the address selection signal AdSel output by the thread control unit 24 consists of a plurality of signals, one for each program counter. The thread control unit 24 controls the address selector 21 by setting to the high level (for example, 1) the address selection signal corresponding to the thread whose count value is to be passed to the instruction cache.

The high thread selection unit 30 places the address selection signal in the state that selects the first count value (for example, the count value of the program counter 201), that is, the state in which the address selection signal AdSel[1] is 1, during the period in which the number of instruction codes stored in the first instruction buffer is less than twice the maximum number NMI of instructions that the instruction supply unit 10 can issue in parallel. More specifically, let NFI be the value indicated by the fetch number signal NFI, NHI the value indicated by the instruction holding number signal NHI, NII the value indicated by the instruction issue number signal NII, and NMI the maximum number of instructions that the instruction supply unit 10 can issue simultaneously. The high thread selection unit 30 sets the address selection signal AdSel[1] corresponding to the main thread to 1 (the selected state) while the condition of expression (1) is not satisfied, and sets it to 0 (the non-selected state) when the condition is satisfied.
NFI + NHI ≥ NII + 2 × NMI ... (1)
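Expression (1) reduces to a one-line predicate. The sketch below is a hedged illustration, not the disclosed circuit: the function name is hypothetical, and it assumes, as in the FIG. 5 walkthrough, that the main thread is selected in exactly those cycles where expression (1) does not hold.

```python
def main_thread_selected(nfi, nhi, nii, nmi):
    """Return True when AdSel[1] should be 1 (main thread selected).

    nfi, nhi, nii are the main-thread fetch, holding, and issue counts
    of the current cycle; nmi is the maximum parallel issue count.
    """
    # Expression (1): the buffer will hold at least 2 * NMI codes next cycle.
    condition_1 = nfi + nhi >= nii + 2 * nmi
    # Assumption: the main thread keeps the fetch slot until (1) holds.
    return not condition_1
```

For example, with NMI = 2, the values NFI = 2, NHI = 2, NII = 1 give 4 < 5, so the main thread stays selected, whereas NFI = 2, NHI = 3, NII = 1 give 5 ≥ 5, so the fetch slot passes to a secondary thread.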

The round robin selection unit 31 cyclically selects among threads 2 to m, which are defined as secondary threads. The address selection signal AdSel[1] of the main thread (for example, thread 1) is input to the round robin selection unit 31. Each time the address selection signal AdSel[1] indicates the non-selected state (for example, 0), the round robin selection unit 31 switches the thread to be placed in the selected state.

When the address selection signal AdSel[1] places the first count value in the non-selected state, the low thread selection unit 32 places the address selection signals AdSel[2] to AdSel[m] in a state that selects a second count value. More specifically, the low thread selection unit 32 has, as gating circuits, AND circuits equal in number to the secondary threads. While the address selection signal AdSel[1] is 0, these AND circuits output the output signals of the round robin selection unit 31 as the address selection signals AdSel[2] to AdSel[m].
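The three blocks of FIG. 4 can be modelled together as a small state machine that produces a one-hot AdSel vector. This is a minimal behavioural sketch under stated assumptions: the class and method names are hypothetical, the round robin pointer is assumed to start at thread 2 and to advance after each cycle in which AdSel[1] is 0, and the main thread is assumed to be selected while expression (1) does not hold (as in the FIG. 5 walkthrough).

```python
class ThreadControl:
    """Behavioural model of thread control unit 24 (illustrative only)."""

    def __init__(self, num_threads, nmi):
        if num_threads < 2:
            raise ValueError("need one main and at least one secondary thread")
        self.num_threads = num_threads
        self.nmi = nmi
        self._rr = 2  # secondary thread the round robin unit points at

    def ad_sel(self, nfi, nhi, nii):
        """Return [AdSel[1], ..., AdSel[m]] for the current cycle."""
        # High thread selection: 1 while expression (1) is not satisfied.
        main = 0 if nfi + nhi >= nii + 2 * self.nmi else 1
        # Round robin selection: one-hot over threads 2..m.
        rr_bits = [1 if t == self._rr else 0
                   for t in range(2, self.num_threads + 1)]
        # Low thread selection: AND gating with the inverse of AdSel[1].
        low = [bit & (1 - main) for bit in rr_bits]
        if main == 0:
            # Advance the pointer so the next grant goes to the next thread.
            self._rr = self._rr + 1 if self._rr < self.num_threads else 2
        return [main] + low
```

With three threads and NMI = 2, successive grants to secondary threads alternate between thread 2 and thread 3, while every other cycle returns `[1, 0, 0]`.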

Next, the operation of the multi-thread processor 1 according to the first embodiment, which includes the thread control unit 24, will be described. In the multi-thread processor 1, the thread control unit 24 controls how instruction codes belonging to the main thread accumulate in the instruction buffer, which provides at least two effects. The first effect is improved execution efficiency of the main thread. The second effect is improved efficiency in fetching instruction codes from the instruction cache into the instruction buffers. The first effect is described first.

FIG. 5 is a timing chart showing the operation of the multi-thread processor 1 for explaining the first effect. For each of cycles 0 to 8, FIG. 5 shows the thread selected by the address selection signal AdSel together with the number of instruction codes fetched, held, and issued in that cycle. In the example shown in FIG. 5, the number of threads handled is three. The numbers in each column of the timing chart are instruction code counts in the order thread 1 / thread 2 / thread 3. Thread 1 is the main thread, and threads 2 and 3 are secondary threads. In the example shown in FIG. 5, the maximum number NMI of instruction codes that the instruction supply unit 10 can issue simultaneously is set to 2.

In FIG. 5, operation starts in cycle 0. In cycle 0, no instruction codes are fetched from the instruction cache 22, and no instruction codes are held in the instruction buffers 231 to 233. Evaluating expression (1) in cycle 0 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied. Therefore, in cycle 0, thread 1 is selected by the address selection signal AdSel.

In cycle 1, because thread 1 was selected by the address selection signal AdSel in cycle 0, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 1, the instruction codes fetched by the instruction cache 22 have not yet been stored in the instruction buffer. Evaluating expression (1) in cycle 1 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied. Therefore, in cycle 1, thread 1 is selected by the address selection signal AdSel.

In cycle 2, because thread 1 was selected by the address selection signal AdSel in cycle 1, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. Also in cycle 2, the instruction codes fetched by the instruction cache 22 in cycle 1 are stored in the instruction buffer 231 provided for thread 1, so the instruction holding number NHI is 2/0/0. Furthermore, in cycle 2, one instruction code belonging to thread 1 is issued from the instruction supply unit 10, so the instruction issue number NII is 1/0/0. Here, the number of instructions issued by the instruction supply unit 10 is at most 2 and is determined by the types of arithmetic units used by the instruction codes held in the instruction buffers 231 to 233. That is, when the two instruction codes at the head of the instruction buffer 231 use the same arithmetic unit, the simultaneous issue number is 1, and when they use different arithmetic units, the simultaneous issue number is 2. Evaluating expression (1) in cycle 2 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied (2 + 2 < 1 + 2 × 2). Therefore, in cycle 2, thread 1 is selected by the address selection signal AdSel.

In cycle 3, because thread 1 was selected by the address selection signal AdSel in cycle 2, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 2, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/0/0, and the instruction issue number NII was 1/0/0, so the instruction holding number NHI in cycle 3 is 3/0/0. Furthermore, in cycle 3, one instruction code belonging to thread 1 is issued from the instruction supply unit 10, so the instruction issue number NII is 1/0/0. Evaluating expression (1) in cycle 3 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is satisfied (2 + 3 ≥ 1 + 2 × 2). Therefore, in cycle 3, thread 2 is selected by the address selection signal AdSel.

In cycle 4, because thread 2 was selected by the address selection signal AdSel in cycle 3, the instruction cache 22 fetches two instruction codes belonging to thread 2. The fetch number NFI is therefore 0/2/0. In cycle 3, the fetch number NFI was 2/0/0, the instruction holding number NHI was 3/0/0, and the instruction issue number NII was 1/0/0, so the instruction holding number NHI in cycle 4 is 4/0/0. Furthermore, in cycle 4, two instruction codes belonging to thread 1 are issued from the instruction supply unit 10, so the instruction issue number NII is 2/0/0. Evaluating expression (1) in cycle 4 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied. Therefore, in cycle 4, thread 1 is selected by the address selection signal AdSel.

In cycle 5, because thread 1 was selected by the address selection signal AdSel in cycle 4, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 4, the fetch number NFI was 0/2/0, the instruction holding number NHI was 4/0/0, and the instruction issue number NII was 2/0/0, so the instruction holding number NHI in cycle 5 is 2/2/0. Furthermore, in cycle 5, one instruction code belonging to thread 1 and one belonging to thread 2 are issued from the instruction supply unit 10, so the instruction issue number NII is 1/1/0. Evaluating expression (1) in cycle 5 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied. Therefore, in cycle 5, thread 1 is selected by the address selection signal AdSel.

In cycle 6, because thread 1 was selected by the address selection signal AdSel in cycle 5, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 5, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/2/0, and the instruction issue number NII was 1/1/0, so the instruction holding number NHI in cycle 6 is 3/1/0. Furthermore, in cycle 6, one instruction code belonging to thread 1 and one belonging to thread 2 are issued from the instruction supply unit 10, so the instruction issue number NII is 1/1/0. Evaluating expression (1) in cycle 6 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is satisfied. Therefore, in cycle 6, thread 3 is selected by the address selection signal AdSel.

In cycle 7, because thread 3 was selected by the address selection signal AdSel in cycle 6, the instruction cache 22 fetches two instruction codes belonging to thread 3. The fetch number NFI is therefore 0/0/2. In cycle 6, the fetch number NFI was 2/0/0, the instruction holding number NHI was 3/1/0, and the instruction issue number NII was 1/1/0, so the instruction holding number NHI in cycle 7 is 4/0/0. Furthermore, in cycle 7, two instruction codes belonging to thread 1 are issued from the instruction supply unit 10, so the instruction issue number NII is 2/0/0. Evaluating expression (1) in cycle 7 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied. Therefore, in cycle 7, thread 1 is selected by the address selection signal AdSel.

In cycle 8, because thread 1 was selected by the address selection signal AdSel in cycle 7, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 7, the fetch number NFI was 0/0/2, the instruction holding number NHI was 4/0/0, and the instruction issue number NII was 2/0/0, so the instruction holding number NHI in cycle 8 is 2/0/2. Furthermore, in cycle 8, one instruction code belonging to thread 1 and one belonging to thread 3 are issued from the instruction supply unit 10, so the instruction issue number NII is 1/0/1. Evaluating expression (1) in cycle 8 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is not satisfied. Therefore, in cycle 8, thread 1 is selected by the address selection signal AdSel.
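The cycle-by-cycle bookkeeping of FIG. 5 can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the names `replay` and `ISSUED` are hypothetical, the per-cycle issue counts are copied from the walkthrough, the holding number is updated as held − issued + fetched, the cache always fetches NMI codes for the selected thread, buffer overflow is not modelled, and the round robin pointer is assumed to start at thread 2.

```python
NMI = 2  # maximum number of instruction codes issued per cycle in FIG. 5

# Issue counts (thread 1, thread 2, thread 3) for cycles 0 to 8.
ISSUED = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (1, 0, 0), (2, 0, 0),
          (1, 1, 0), (1, 1, 0), (2, 0, 0), (1, 0, 1)]


def replay(issue_counts, num_threads=3, nmi=NMI):
    """Return (NFI, NHI, NII, selected_thread) for every cycle."""
    nfi = [0] * num_threads   # codes fetched in the current cycle
    nhi = [0] * num_threads   # codes held in the buffers in the current cycle
    rr = 1                    # 0-based index of the next secondary thread
    trace = []
    for nii in issue_counts:
        # Expression (1), evaluated with the main-thread values only.
        if nfi[0] + nhi[0] >= nii[0] + 2 * nmi:
            selected = rr
            rr = rr + 1 if rr + 1 < num_threads else 1
        else:
            selected = 0
        trace.append((tuple(nfi), tuple(nhi), tuple(nii), selected + 1))
        # Next cycle: held = held - issued + fetched, per thread.
        nhi = [h - i + f for h, i, f in zip(nhi, nii, nfi)]
        # The thread selected now is the one fetched in the next cycle.
        nfi = [nmi if t == selected else 0 for t in range(num_threads)]
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(replay(ISSUED)):
        print(cycle, *row)
```

Run on the issue counts above, the sketch selects threads 1, 1, 1, 2, 1, 1, 3, 1, 1 in cycles 0 to 8 and reproduces the holding numbers of the walkthrough, including 4/0/0 in cycle 7 and 2/0/2 in cycle 8.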

In the example shown in FIG. 5, instruction codes belonging to thread 2, which is defined as a secondary thread, are fetched in cycle 4. In the multi-thread processor 1, the thread control unit 24 realizes the operations of cycles 3 to 5, which yields the first effect of improving the execution efficiency of the main thread. The operations of cycles 3 to 5 are therefore described more specifically with reference to block diagrams of the multi-thread processor 1. FIGS. 6 to 8 are block diagrams of the multi-thread processor 1 showing the operations of cycles 3 to 5 of FIG. 5.

FIG. 6 is a block diagram of the multi-thread processor 1 while it performs the operation of cycle 3 of FIG. 5. As shown in FIG. 6, in cycle 3 the instruction codes Im2, Im3, and Im4 are stored in the instruction buffer 231. The instruction supply unit 10 issues the instruction code Im2. The instruction cache 22 fetches the instruction codes Im5 and Im6 based on the count value of the program counter 201.

FIG. 7 is a block diagram of the multi-thread processor 1 while it performs the operation of cycle 4 of FIG. 5. Evaluating expression (1) in cycle 3 with the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1 shows that the condition of expression (1) is satisfied. That is, in cycle 4 the instruction buffer 231 holds twice as many instruction codes as the maximum issue number NMI. The thread control unit 24 therefore uses the address selection signal AdSel to control the threads so that, in cycle 4, the instruction codes Isa1 and Isa2 belonging to thread 2 are stored in the instruction buffer 232. Meanwhile, in cycle 4, the instruction supply unit 10 issues the instruction codes Im3 and Im4.

FIG. 8 is a block diagram of the multi-thread processor 1 while it performs the operation of cycle 5 of FIG. 5. As shown in FIG. 8, in the multi-thread processor 1 no instruction codes are fetched into the instruction buffer 231 in cycle 4 (the operation cycle shown in FIG. 7). However, through the operation up to cycle 3, the instruction buffer 231 has accumulated twice as many instruction codes as the maximum issue number NMI. Therefore, even when as many instruction codes as the maximum issue number NMI are issued in cycle 4, the instruction buffer 231 still holds enough instruction codes to cover the maximum number that may be issued in the next cycle. In other words, execution of the secondary thread does not affect execution of the main thread at all. Then, because the address selection signal AdSel selects thread 1 in cycle 4, the multi-thread processor 1 stores the instruction codes Im7 and Im8 belonging to thread 1 in the instruction buffer 231 in cycle 5. Also in cycle 5, the instruction code Im5 is issued from the instruction buffer 231 and the instruction code Isa1 is issued from the instruction buffer 232. Because the instruction codes Im7 and Im8 are stored in the instruction buffer 231 in cycle 5, the instruction buffer 231 continues, from cycle 6 onward, to hold enough instruction codes to cover the maximum number that may be issued.

Next, the second effect, which is improved efficiency in fetching instruction codes from the instruction cache into the instruction buffers, will be described. FIG. 9 is a timing chart of the operation of the multi-thread processor 1 used to explain the second effect. For each of cycles 0 to 8, FIG. 9 shows the thread selected by the address selection signal AdSel together with the fetch number, the instruction holding number, and the instruction issue number at that time. The example of FIG. 9 handles three threads. The numbers in each column of the timing chart are instruction code counts in the order thread 1 / thread 2 / thread 3. Thread 1 is the main thread, and threads 2 and 3 are secondary threads. In the example of FIG. 9, the maximum number NMI of instruction codes that the instruction supply unit 10 can issue simultaneously is set to 2.

In FIG. 9, operation starts at cycle 0. In cycle 0, no instruction code is fetched from the instruction cache 22, and the instruction buffers 231 to 233 hold no instruction codes. When equation (1) above is evaluated in cycle 0 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. Thread 1 is therefore selected by the address selection signal AdSel in cycle 0.

In cycle 2, because thread 1 was selected by the address selection signal AdSel in cycle 1, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. Also in cycle 2, the instruction codes that the instruction cache 22 fetched in cycle 1 are stored in the instruction buffer 231 provided for thread 1, so the instruction holding number NHI is 2/0/0. Furthermore, the instruction supply unit 10 issues two instruction codes belonging to thread 1 in cycle 2, so the instruction issue number NII is 2/0/0. The number of instructions that the instruction supply unit 10 issues is at most 2 and is determined by the types of arithmetic units used by the instruction codes held in the instruction buffers 231 to 233. Specifically, if the first two instruction codes in the instruction buffer 231 use the same arithmetic unit, one instruction is issued; if they use different arithmetic units, two are issued simultaneously. When equation (1) above is evaluated in cycle 2 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. Thread 1 is therefore selected by the address selection signal AdSel in cycle 2.

In cycle 3, because thread 1 was selected by the address selection signal AdSel in cycle 2, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 2, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/0/0, and the instruction issue number NII was 2/0/0, so the instruction holding number NHI in cycle 3 is 2/0/0. Furthermore, the instruction supply unit 10 issues one instruction code belonging to thread 1 in cycle 3, so the instruction issue number NII is 1/0/0. When equation (1) above is evaluated in cycle 3 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. Thread 1 is therefore selected by the address selection signal AdSel in cycle 3.

In cycle 4, because thread 1 was selected by the address selection signal AdSel in cycle 3, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 3, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/0/0, and the instruction issue number NII was 1/0/0, so the instruction holding number NHI in cycle 4 is 3/0/0. Furthermore, the instruction supply unit 10 issues one instruction code belonging to thread 1 in cycle 4, so the instruction issue number NII is 1/0/0. When equation (1) above is evaluated in cycle 4 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is satisfied. Thread 2 is therefore selected by the address selection signal AdSel in cycle 4.

In cycle 5, because thread 2 was selected by the address selection signal AdSel in cycle 4, the instruction cache 22 fetches two instruction codes belonging to thread 2. The fetch number NFI is therefore 0/2/0. In cycle 4, the fetch number NFI was 2/0/0, the instruction holding number NHI was 3/0/0, and the instruction issue number NII was 1/0/0, so the instruction holding number NHI in cycle 5 is 4/0/0. Furthermore, the instruction supply unit 10 issues two instruction codes belonging to thread 1 in cycle 5, so the instruction issue number NII is 2/0/0. When equation (1) above is evaluated in cycle 5 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. Thread 1 is therefore selected by the address selection signal AdSel in cycle 5.

In cycle 6, because thread 1 was selected by the address selection signal AdSel in cycle 5, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 5, the fetch number NFI was 0/2/0, the instruction holding number NHI was 4/0/0, and the instruction issue number NII was 2/0/0, so the instruction holding number NHI in cycle 6 is 2/2/0. Furthermore, the instruction supply unit 10 issues two instruction codes belonging to thread 2 in cycle 6, so the instruction issue number NII is 0/2/0. When equation (1) above is evaluated in cycle 6 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is satisfied. Thread 3 is therefore selected by the address selection signal AdSel in cycle 6.

In cycle 7, because thread 3 was selected by the address selection signal AdSel in cycle 6, the instruction cache 22 fetches two instruction codes belonging to thread 3. The fetch number NFI is therefore 0/0/2. In cycle 6, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/2/0, and the instruction issue number NII was 0/2/0, so the instruction holding number NHI in cycle 7 is 4/0/0. Furthermore, the instruction supply unit 10 issues two instruction codes belonging to thread 1 in cycle 7, so the instruction issue number NII is 2/0/0. When equation (1) above is evaluated in cycle 7 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. Thread 1 is therefore selected by the address selection signal AdSel in cycle 7.

In cycle 8, because thread 1 was selected by the address selection signal AdSel in cycle 7, the instruction cache 22 fetches two instruction codes belonging to thread 1. The fetch number NFI is therefore 2/0/0. In cycle 7, the fetch number NFI was 0/0/2, the instruction holding number NHI was 4/0/0, and the instruction issue number NII was 2/0/0, so the instruction holding number NHI in cycle 8 is 2/0/2. Furthermore, the instruction supply unit 10 issues one instruction code belonging to thread 1 and one belonging to thread 3 in cycle 8, so the instruction issue number NII is 1/0/1. When equation (1) above is evaluated in cycle 8 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. Thread 1 is therefore selected by the address selection signal AdSel in cycle 8.
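
The holding numbers in the walk-through above can be checked mechanically. The following Python sketch assumes that each thread's holding number evolves as NHI(next cycle) = NHI + NFI - NII, which is inferred from the stated values rather than quoted from the description. The function names `next_holding` and `holding_table` are illustrative only.

```python
# Counts are (thread 1, thread 2, thread 3), copied from the FIG. 9 walk-through.
NFI = {2: (2, 0, 0), 3: (2, 0, 0), 4: (2, 0, 0), 5: (0, 2, 0),
       6: (2, 0, 0), 7: (0, 0, 2), 8: (2, 0, 0)}  # fetched in each cycle
NII = {2: (2, 0, 0), 3: (1, 0, 0), 4: (1, 0, 0), 5: (2, 0, 0),
       6: (0, 2, 0), 7: (2, 0, 0), 8: (1, 0, 1)}  # issued in each cycle


def next_holding(nhi, nfi, nii):
    """Assumed update rule: held next cycle = held + fetched - issued, per thread."""
    return tuple(h + f - i for h, f, i in zip(nhi, nfi, nii))


def holding_table(first_cycle=2, first_nhi=(2, 0, 0)):
    """Return {cycle: NHI}, starting from the holding number stated for cycle 2."""
    table = {first_cycle: first_nhi}
    for cycle in range(first_cycle, max(NFI)):
        table[cycle + 1] = next_holding(table[cycle], NFI[cycle], NII[cycle])
    return table
```

Calling `holding_table()` yields one holding-number tuple per cycle from 2 to 8, which can be compared against the NHI values given for each cycle in the text.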

In the example shown in FIG. 9, instruction codes belonging to thread 2, which is defined as a secondary thread, are fetched in cycle 4. By having the thread control unit 24 carry out the operations of cycles 3 to 5, the multi-thread processor 1 achieves the second effect of improving the efficiency of fetching instruction codes from the instruction cache into the instruction buffers. The operations of cycles 3 to 5 are therefore described more specifically using block diagrams of the multi-thread processor 1. FIGS. 10 to 12 are block diagrams of the multi-thread processor 1 showing the operations of cycles 3 to 5 in FIG. 9.

FIG. 10 is a block diagram of the multi-thread processor 1 performing the operation of cycle 3 in FIG. 9. As shown in FIG. 10, in cycle 3 the instruction codes Im3 and Im4 are held in the instruction buffer 231. The instruction supply unit 10 issues the instruction code Im3. The instruction cache 22 fetches the instruction codes Im5 and Im6 based on the count value of the program counter 201.

Next, FIG. 11 is a block diagram of the multi-thread processor 1 performing the operation of cycle 4 in FIG. 9. When equation (1) above is evaluated in cycle 3 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is not satisfied. The thread control unit 24 therefore uses the address selection signal AdSel to control the threads so that, in cycle 4, the instruction codes Im7 and Im8 belonging to thread 1 are stored in the instruction buffer 231. Meanwhile, in cycle 4 the instruction supply unit 10 issues the instruction code Im4.

Next, FIG. 12 is a block diagram of the multi-thread processor 1 performing the operation of cycle 5 in FIG. 9. When equation (1) above is evaluated in cycle 4 from the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII of thread 1, the condition of equation (1) is satisfied. In other words, in cycle 5 the instruction buffer 231 holds twice the maximum issue number NMI of instruction codes. The thread control unit 24 therefore uses the address selection signal AdSel to control the threads so that, in cycle 5, the instruction codes Isa1 and Isa2 belonging to thread 2 are stored in the instruction buffer 232. Meanwhile, in cycle 5 the instruction supply unit 10 issues the instruction codes Im5 and Im6.

In general, the instruction-level parallelism (ILP) of the instruction codes generated from a single thread is limited. Because of this limit, the number of instructions fetched is, on average, larger than the number of instructions issued. The multi-thread processor 1 according to the first embodiment therefore accumulates, as described above, the instruction codes corresponding to the difference between the fetch number and the instruction issue number. The multi-thread processor 1 also continues fetching instruction codes into the instruction buffer of the main thread until that buffer holds enough instruction codes to sustain two consecutive issue cycles at the maximum issue number. As a result, the multi-thread processor 1 prevents issuance of main-thread instruction codes from stalling while a fetch into a secondary-thread instruction buffer is performed in place of a fetch into the main-thread instruction buffer. In other words, the multi-thread processor 1 can use the fetch capability of the instruction cache efficiently.

In general, when a single thread is considered in a VLIW processor capable of multi-thread operation, the number of instructions fetched is on average larger than the number issued. As described above, the multi-thread processor 1 according to the first embodiment accumulates in the instruction buffer, for the main thread, as many instruction codes as the difference between the instruction fetch number and the instruction issue number. Moreover, the thread control unit 24 stores instruction codes in the instruction buffer of a secondary thread only when the instruction buffer of the main thread can continue issuing instruction codes for at least two cycles without any further fetch.
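
The selection rule described above can be sketched as a small Python model. It assumes that equation (1), defined earlier in this description, has the form NFI + NHI - NII >= 2 x NMI for the main thread; this form is inferred because it reproduces every selection in the FIG. 9 walk-through. It also assumes that the secondary threads take turns when the condition holds. The class and method names are hypothetical.

```python
from itertools import cycle


class FetchThreadSelector:
    """Hypothetical model of the fetch-thread choice made by the thread control unit."""

    def __init__(self, num_threads, nmi):
        # num_threads must be at least 2 (one main thread, one or more secondary).
        self.nmi = nmi
        # Assumption: secondary threads 2..num_threads are served in turn.
        self._secondary = cycle(range(2, num_threads + 1))

    def main_is_buffered(self, nfi, nhi, nii):
        # Assumed form of equation (1), using the main thread's counts only.
        return nfi + nhi - nii >= 2 * self.nmi

    def select(self, nfi, nhi, nii):
        """Return the number of the thread whose instruction codes are fetched next."""
        if self.main_is_buffered(nfi, nhi, nii):
            return next(self._secondary)
        return 1
```

Feeding the thread 1 counts of cycles 2 to 8 from FIG. 9 into `select` with `nmi=2` gives thread 1 in every cycle except cycles 4 and 6, where threads 2 and 3 are chosen, matching the timing chart description.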

With the configuration and control described above, the multi-thread processor 1 does not stop issuing instruction codes even when fetching into the instruction buffer of the main thread is suspended. In other words, the multi-thread processor 1 can improve the processing efficiency of the main thread. In addition, because this configuration and control bring the number of issued instructions close to the number of fetched instructions, the multi-thread processor 1 can improve the fetch efficiency of the instruction cache.

The improvement in processing efficiency of the multi-thread processor 1 is now explained quantitatively. In a general multi-thread processor, the number of instruction codes that an instruction buffer can store is set equal to the maximum instruction issue number. A general multi-thread processor also selects the thread to be processed in a predetermined order (for example, round robin). FIG. 13 is a timing chart showing the operation of such a general multi-thread processor.

In the example shown in FIG. 13, thread 1 is the main thread, and threads 2 and 3 are secondary threads. The thread selection sequence repeats as follows: thread 1 is selected twice, then thread 2 is selected, then thread 1 is selected twice again, and then thread 3 is selected. In this example, the maximum instruction holding number is limited to 2.

In the example shown in FIG. 13, two instruction codes belonging to thread 1 are fetched in cycle 1. In cycle 2, these two instruction codes are held in the instruction buffer and one of them is issued. Also in cycle 2, two more instruction codes belonging to thread 1 are fetched.

In cycle 3, the instruction codes of thread 1 fetched in cycle 2 are to be stored in the instruction buffer, but the instruction code that was not issued in cycle 2 still remains there. As a result, only one of the instruction codes fetched in cycle 2 is stored in the instruction buffer in cycle 3, and the other is discarded. Also in cycle 3, one instruction code belonging to thread 1 is issued, and two instruction codes belonging to thread 2 are fetched. In cycle 4, the instruction holding number is therefore 1/2/0.

If an attempt is made in cycle 4 to issue two instruction codes belonging to thread 1, only one is available because the instruction holding number for thread 1 is 1. A stall therefore occurs in cycle 4, in which an instruction code of thread 1 cannot be issued. Six instruction codes belonging to thread 1 are fetched in the period up to cycle 4 and one instruction code could not be issued, so the conventional multi-thread processor suffers a performance loss of about 16% (= 1/6) for thread 1 up to cycle 4.
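
The shortfall in cycle 4 follows from the limited buffer capacity and can be reproduced with a short Python sketch. The model below is an assumption made for illustration: instruction codes fetched in one cycle arrive in the buffer in the next cycle, and any that do not fit are discarded. The function names are hypothetical, and the per-cycle issue and fetch counts for thread 1 are taken from the FIG. 13 description above.

```python
def advance(held, issued, fetched, capacity):
    """Advance one thread's instruction buffer by one cycle.

    held, issued and fetched are this cycle's counts; fetched codes arrive in
    the buffer next cycle. Returns (held_next, discarded).
    """
    remaining = held - issued
    stored = min(fetched, capacity - remaining)
    return remaining + stored, fetched - stored


def thread1_until_cycle4(capacity=2):
    """Return (holding number in cycle 4, codes discarded) for thread 1."""
    held, discarded_total = 2, 0  # thread 1 holds two codes in cycle 2
    # (issued, fetched) for thread 1 in cycles 2 and 3.
    for issued, fetched in [(1, 2), (1, 0)]:
        held, discarded = advance(held, issued, fetched, capacity)
        discarded_total += discarded
    return held, discarded_total
```

With the conventional capacity of 2, the sketch leaves one instruction code in the buffer at cycle 4 after discarding one, so two instruction codes cannot be issued. Under the same assumptions, a capacity of 4 leaves two instruction codes available and discards none.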

In addition, because of the limit on average instruction-level parallelism (ILP), the number of instructions fetched is larger than the number issued. In the example shown in FIG. 13, two instructions are fetched in cycle 2 whereas only one is issued. In other words, one of the instruction codes fetched in cycle 2 is wasted. In the conventional multi-thread processor, 16 instruction codes are fetched up to cycle 4 and two of them are discarded, so the instruction fetch capability is degraded by 12.5% (= 2/16).

In the multi-thread processor 1 according to the first embodiment, the instruction buffer corresponding to the main thread can issue instruction codes for two or more cycles without an instruction fetch, so the performance degradation seen in the conventional multi-thread processor does not occur.

Note that, in the multi-thread processor 1 according to the first embodiment, making the capacity of the instruction buffers for the secondary threads smaller than that of the instruction buffer for the main thread reduces the circuit area of the instruction buffers provided for the multiple threads.

Embodiment 2
The second embodiment describes another form of the thread control unit 24. FIG. 14 is a block diagram of a thread control unit 24a, which is this other form. As shown in FIG. 14, the thread control unit 24a is the thread control unit 24 with a low thread forcible processing unit added. The low thread forcible processing unit counts the number of operation cycles in which the address selection signal AdSel[1] selects the first count value and, when that number reaches a specified number, causes the address selection signals AdSel[2] to AdSel[m] to select the second count value. In the thread control unit 24a, the low thread forcible processing unit consists of a counter 33 and an AND circuit 34.

The counter 33 counts the number of operation cycles in which the address selection signal AdSel[1] indicates the selected state. When the number of operation cycles reaches the specified number, the counter 33 asserts a suppression signal (a state that directs the address selection signal AdSel[1] to the non-selected state). While the suppression signal is asserted, the AND circuit 34 holds the address selection signal AdSel[1] in the non-selected state; while the suppression signal is negated, the AND circuit 34 passes the output signal of the high thread selection unit 30 through as the address selection signal AdSel[1].

Next, the operation of the multi-thread processor 1a having the thread control unit 24a will be described. FIG. 15 is a timing chart showing the operation of the multi-thread processor 1a. In the example shown in FIG. 15, the specified number for the counter 33 is set to 2. As shown in FIG. 15, the thread control unit 24a selects a secondary thread in the operation cycle in which the count value CNT reaches 2, regardless of the values of the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII.

If the thread control unit 24 switches the selected thread through the address selection signal AdSel based only on the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII, there can be cases in which instruction codes belonging to a secondary thread are never fetched. With the thread control unit 24a according to the second embodiment, however, instruction codes belonging to a secondary thread are fetched once every predetermined number of operation cycles. The multi-thread processor 1a according to the second embodiment can therefore prevent the processing of a secondary thread from remaining unprocessed.
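
The forced selection of a secondary thread can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the circuit of FIG. 14: the count is assumed to restart whenever thread 1 is not selected, and the forced cycle is assumed to be the one immediately after the count reaches the specified number. The exact timing in FIG. 15 may differ. The class name `LowThreadForcer` and its method are hypothetical.

```python
class LowThreadForcer:
    """Hypothetical model of counter 33 and AND circuit 34."""

    def __init__(self, limit):
        self.limit = limit  # specified number of main-thread cycles
        self.count = 0      # CNT: consecutive cycles with AdSel[1] selected

    def adsel1(self, high_select):
        """Return AdSel[1] for this cycle given the high thread selector output."""
        suppress = self.count >= self.limit
        # AND circuit 34: pass high_select only while suppression is negated.
        selected = high_select and not suppress
        # Assumption: CNT restarts whenever thread 1 is not selected.
        self.count = self.count + 1 if selected else 0
        return selected
```

With `limit=2` and a high thread selector that always requests thread 1, the model selects thread 1 for two cycles and then yields one cycle to a secondary thread, repeating indefinitely, so a secondary thread is fetched at a bounded interval.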

Embodiment 3
The third embodiment describes a multi-thread processor that, while executing a plurality of threads, performs a waiting operation in which one thread waits for another thread to finish a computation before proceeding. This waiting process is useful, for example, when the main thread uses a computation result of a secondary thread. The following description takes, as an example of the operation and of the multi-thread processor, a case in which thread 1 (the main thread) waits for thread 2 (a secondary thread) to finish its computation before proceeding.

FIG. 16 is a block diagram of the multi-thread processor 2 according to the third embodiment. As shown in FIG. 16, the multi-thread processor 2 is the multi-thread processor 1 with a waiting control unit 25 added and with a thread control unit 24b used in place of the thread control unit 24.

The waiting control unit 25 outputs to the thread control unit 24b a control signal SC that suppresses the address selection signal AdSel from selecting a given address value. When the first count value PC1 reaches a preset first switching threshold C1, the waiting control unit 25 places the control signal SC in a thread suppression state, which prevents the address selection signal AdSel from selecting the thread corresponding to the first count value PC1. When the second count value PC2 subsequently reaches a preset second switching threshold C2, the waiting control unit 25 releases the thread suppression state of the control signal SC. In the example shown in FIG. 16, the count value PC1 of the program counter 201 is used as the first count value, and the count value PC2 of the program counter 202 is used as the second count value.

FIG. 17 is a block diagram of the waiting control unit 25. As shown in FIG. 17, the waiting control unit 25 has comparators 401 and 402 and control signal generation units 411 and 412. The comparator 401 receives the count value PC1 and the switching threshold C1. The comparator 402 receives the count value PC2 and the switching threshold C2. Each of the comparators 401 and 402 asserts a thread switching notification signal CR (for example, sets it to 1) when its input count value matches its switching threshold. The switching thresholds C1 and C2 are set by another circuit (not shown). In the example shown in FIG. 17, the comparator 401 outputs a thread switching notification signal CR1, and the comparator 402 outputs a thread switching notification signal CR2. The thread switching notification signal CR1 is supplied to the control signal generation unit 411, and the thread switching notification signal CR2 is supplied to the control signal generation units 411 and 412.

When the thread corresponding to a comparator does not require waiting processing, the waiting processing for that thread can be disabled by setting the switching threshold to a value that the count value can never take. For example, when the initial count value of the program counter is 1, the switching threshold may be set to 0.
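For illustration only, the comparator behavior can be sketched in software. The sketch is not part of the disclosed circuit; the function name `comparator` and the sample count values are assumptions introduced here. It shows that a switching threshold of 0 never asserts the notification signal when the count value starts at 1 and only increases.

```python
def comparator(count: int, threshold: int) -> int:
    """Return 1 (asserted) only when the count value equals the switching threshold."""
    return 1 if count == threshold else 0

# Assumed for the example: the program counter starts at 1 and counts upward
counts = range(1, 6)

# A reachable threshold (3) asserts the notification signal exactly once
cr_enabled = [comparator(pc, 3) for pc in counts]

# A threshold the counter can never take (0) keeps the signal negated
cr_disabled = [comparator(pc, 0) for pc in counts]
```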

The control signal generation unit 411 receives the thread switching notification signals CR1 and CR2 and generates a control signal SC[1], which is the control signal corresponding to thread 1. The control signal generation unit 411 asserts the control signal SC[1] in response to the thread switching notification signal CR1 being asserted. Further, when the thread switching notification signal CR2 is asserted while the thread switching notification signal CR1 is asserted, the control signal generation unit 411 changes the control signal SC[1] from the asserted state to the negated state.

FIG. 18 shows a detailed circuit diagram of the control signal generation unit 411. As shown in FIG. 18, the control signal generation unit 411 includes AND circuits 511, 512, and 61. The AND circuit 511 outputs the logical product of the thread switching notification signal CR1 and the waiting set value S1. The AND circuit 512 outputs the logical product of the inverted value of the thread switching notification signal CR2 and the waiting set value S2. The AND circuit 61 outputs the logical product of the outputs of the AND circuits 511 and 512 as the control signal SC[1].
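As an illustrative sketch under stated assumptions, the gate structure of FIG. 18 can be modeled with single-bit integers. The function name `control_signal_411` and the active-high encoding (1 = asserted) are assumptions made for this example, not details taken from the figure.

```python
def control_signal_411(cr1: int, cr2: int, s1: int, s2: int) -> int:
    """Model SC[1] from single-bit inputs (assumed encoding: 1 = asserted)."""
    and_511 = cr1 & s1          # AND circuit 511: CR1 AND S1
    and_512 = (cr2 ^ 1) & s2    # AND circuit 512: (NOT CR2) AND S2
    return and_511 & and_512    # AND circuit 61 combines both outputs

# Enumerate CR1/CR2 with the waiting set values S1 = S2 = 1
table = {(cr1, cr2): control_signal_411(cr1, cr2, 1, 1)
         for cr1 in (0, 1) for cr2 in (0, 1)}
```

With S1 = S2 = 1, SC[1] in this model is asserted only while CR1 is asserted and CR2 is not, which matches the assert and negate behavior described for the control signal generation unit 411.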

The control signal generation unit 412 receives the thread switching notification signal CR2 and generates a control signal SC[2], which is the control signal corresponding to thread 2. The control signal generation unit 412 asserts the control signal SC[2] in response to the thread switching notification signal CR2 being asserted.

FIG. 19 shows a detailed circuit diagram of the control signal generation unit 412. As shown in FIG. 19, the control signal generation unit 412 includes an AND circuit 521. The AND circuit 521 outputs the logical product of the thread switching notification signal CR2 and the waiting set value S2 as the control signal SC[2].

The waiting set values S1 and S2 are, for example, values of 0 or 1 and are set according to the kind of waiting processing that threads 1 and 2 perform. In the present embodiment, when the count value PC1 of thread 1 has reached the switching threshold C1 while the count value PC2 of thread 2 has not reached the switching threshold C2, fetching of instruction codes for thread 1 is suppressed so that thread 1 waits for the processing of thread 2. The waiting set values S1 and S2 are therefore set to 1. When no waiting processing is performed, the waiting set values S1 and S2 are set to 0. The waiting set values S1 and S2 are set by another circuit (not shown).
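The interaction of the comparators, the control signal generation units, and the waiting set values can be sketched as one function. This is a minimal model under assumptions: the name `waiting_control_unit`, the tuple return value, and the sample thresholds p = 10 and q = 20 are hypothetical and chosen only for the example.

```python
def waiting_control_unit(pc1, pc2, c1, c2, s1=1, s2=1):
    """Return (SC[1], SC[2]) for the given count values and thresholds."""
    cr1 = int(pc1 == c1)                   # comparator 401
    cr2 = int(pc2 == c2)                   # comparator 402
    sc1 = (cr1 & s1) & ((cr2 ^ 1) & s2)    # control signal generation unit 411
    sc2 = cr2 & s2                         # control signal generation unit 412
    return sc1, sc2

P, Q = 10, 20  # hypothetical switching thresholds C1 and C2

before_wait = waiting_control_unit(9, 5, P, Q)           # PC1 has not reached p
waiting = waiting_control_unit(P, 5, P, Q)               # PC1 == p, PC2 != q
released = waiting_control_unit(P, Q, P, Q)              # PC2 has reached q
disabled = waiting_control_unit(P, 5, P, Q, s1=0, s2=0)  # waiting turned off
```

In this model thread 1 is suppressed only in the `waiting` case, and setting S1 and S2 to 0 keeps both control signals negated.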

Next, the thread control unit 24b will be described. The thread control unit 24b is the thread control unit 24a with an added function of controlling the address selection signal AdSel based on the control signal SC. Specifically, when the control signal SC is asserted, the thread control unit 24b controls the address selection signal AdSel so that only threads other than the thread corresponding to the asserted control signal SC are selected.

FIG. 20 shows a block diagram of the thread control unit 24b. As shown in FIG. 20, the thread control unit 24b includes a round robin selection unit 35 in place of the round robin selection unit 31 of the thread control unit 24a, and an AND circuit 36 in place of the AND circuit 34.

The round robin selection unit 35 generates, according to the address selection signal AdSel[1], a signal that cyclically selects the threads belonging to the secondary threads. The round robin selection unit 35 also receives the control signal SC[2]. While the control signal SC[2] is asserted, the round robin selection unit 35 cyclically places the signals corresponding to the threads other than thread 2 in the selected state.
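A software sketch of this selection rule follows, for illustration only. The function `next_secondary`, its arguments, and the use of thread numbers 2 and 3 as the secondary threads are assumptions made for the example; the actual circuit of the round robin selection unit 35 is not described at this level.

```python
def next_secondary(last, secondary=(2, 3), sc2=0):
    """Return the next secondary thread after `last` in cyclic order.

    Thread 2 is skipped while SC[2] is asserted (sc2 == 1).
    Returns None if no secondary thread is eligible.
    """
    order = list(secondary)
    # Start the search just after the previously selected thread
    start = order.index(last) + 1 if last in order else 0
    for step in range(len(order)):
        candidate = order[(start + step) % len(order)]
        if sc2 and candidate == 2:
            continue  # selection of thread 2 is suppressed
        return candidate
    return None

normal = [next_secondary(2), next_secondary(3)]
suppressed = [next_secondary(2, sc2=1), next_secondary(3, sc2=1)]
```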

The AND circuit 36 receives the suppression signal output from the counter 33, the output signal of the high thread selection unit 30, and the control signal SC[1]. During a period in which the suppression signal or the control signal SC[1] is asserted, the AND circuit 36 places the address selection signal AdSel[1] in the non-selected state. During a period in which both the suppression signal and the control signal SC[1] are negated, the AND circuit 36 outputs the output signal of the high thread selection unit 30 as the address selection signal AdSel[1].
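The gating performed by the AND circuit 36 can be sketched as follows. This is a hedged model: it assumes active-high signals (1 = asserted or selected) and inverting inputs for the suppression signal and SC[1], neither of which is stated explicitly in the text. The function name `and_circuit_36` is hypothetical.

```python
def and_circuit_36(high_select: int, suppress: int, sc1: int) -> int:
    """Return AdSel[1]: the high thread selection output, gated off while
    the suppression signal or SC[1] is asserted (assumed active-high)."""
    return high_select & (suppress ^ 1) & (sc1 ^ 1)

# All input combinations, keyed as (high_select, suppress, sc1)
adsel1 = {(h, s, c): and_circuit_36(h, s, c)
          for h in (0, 1) for s in (0, 1) for c in (0, 1)}
```

Only the combination (1, 0, 0) yields a selected AdSel[1] in this model, which corresponds to the period in which both the suppression signal and SC[1] are negated.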

Next, the operation of the multi-thread processor 2 according to the third embodiment will be described. FIG. 21 is a timing chart showing the operation of the multi-thread processor 2. In the example shown in FIG. 21, three threads are processed: thread 1 is the main thread, and threads 2 and 3 are secondary threads. The predetermined value of the counter 33 is 7, the switching threshold C1 is p (p is an integer), the switching threshold C2 is q (q is an integer), and the waiting set values S1 and S2 are 1. That is, the example in FIG. 21 shows the case in which fetching for thread 1 is suppressed when thread 1 has finished and thread 2 has not yet finished a predetermined operation at that point.

In the example shown in FIG. 21, thread 1 starts in cycle 0. In cycle 0, no instruction code is fetched from the instruction cache 22, and no instruction code is held in the instruction buffers 231 to 233. When expression (1) described above is evaluated in cycle 0 using the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII corresponding to thread 1, the condition of expression (1) is not satisfied. Therefore, in cycle 0, thread 1 is selected by the address selection signal AdSel.

In cycle 1, because thread 1 was selected by the address selection signal AdSel in cycle 0, the instruction cache 22 fetches two instruction codes belonging to thread 1. That is, the fetch number NFI is 2/0/0. In cycle 1, the instruction codes fetched by the instruction cache 22 have not yet been stored in the instruction buffers. When expression (1) described above is evaluated in cycle 1 using the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII corresponding to thread 1, the condition of expression (1) is not satisfied. Therefore, in cycle 1, thread 1 is selected by the address selection signal AdSel. Further, because the address selection signal AdSel[1] is in the selected state in cycle 1, the count value CNT of the counter 33 increases by 1.

From cycle 2 to cycle n, the multi-thread processor 2 according to the third embodiment operates in the same manner as the multi-thread processor 1a according to the second embodiment. When processing reaches cycle n, the multi-thread processor 2 fetches two instruction codes belonging to thread 1. That is, the fetch number NFI in cycle n is 2/0/0. As a result of the processing up to cycle n-1, the instruction holding number NHI in cycle n is 2/0/0. Further, in cycle n, the instruction supply unit 10 issues two instruction codes belonging to thread 1, so the instruction issue number NII is 2/0/0. When expression (1) described above is evaluated in cycle n using the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII corresponding to thread 1, the condition of expression (1) is not satisfied. Therefore, in cycle n, thread 1 is selected by the address selection signal AdSel. Further, because the address selection signal AdSel[1] is in the selected state in cycle n, the count value CNT increases by 1 to 2.

In cycle n+1, because thread 1 was selected by the address selection signal AdSel in cycle n, the instruction cache 22 fetches two instruction codes belonging to thread 1. That is, the fetch number NFI is 2/0/0. In cycle n, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/0/0, and the instruction issue number NII was 2/0/0. Therefore, the instruction holding number NHI in cycle n+1 is 2/0/0. Further, in cycle n+1, the instruction supply unit 10 issues two instruction codes belonging to thread 1, so the instruction issue number NII is 2/0/0. When expression (1) described above is evaluated in cycle n+1 using the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII corresponding to thread 1, the condition of expression (1) is not satisfied. Therefore, in cycle n+1, thread 1 is selected by the address selection signal AdSel. Further, because the address selection signal AdSel[1] is in the selected state in cycle n+1, the count value CNT increases by 1.

In cycle n+2, because thread 1 was selected by the address selection signal AdSel in cycle n+1, the instruction cache 22 fetches two instruction codes belonging to thread 1. That is, the fetch number NFI is 2/0/0. In cycle n+1, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/0/0, and the instruction issue number NII was 2/0/0. Therefore, the instruction holding number NHI in cycle n+2 is 2/0/0. Further, in cycle n+2, the instruction supply unit 10 issues one instruction code belonging to thread 1, so the instruction issue number NII is 1/0/0.

Here, in the example shown in FIG. 21, the count value PC1 of the program counter 201 (for example, the first address value) is assumed to reach p in cycle n+2. Accordingly, although the condition of expression (1) is not satisfied when it is evaluated in cycle n+2 using the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII corresponding to thread 1, the control signal SC[1] is asserted in cycle n+2 and selection of thread 1 by the address selection signal AdSel is suppressed. In the example shown in FIG. 21, instead of the address selection signal AdSel[1], the address selection signal AdSel[2], which selects the second count value, becomes 1. That is, thread 2 is selected in cycle n+2. In cycle n+2, the count value CNT of the counter 33 has not reached the predetermined value of 7; however, while the control signal SC[1] is asserted, the AND circuit 36 places the address selection signal AdSel[1] in the non-selected state regardless of the state of the suppression signal from the counter 33. Further, because the address selection signal AdSel[1] is in the non-selected state in cycle n+2, the count value CNT is reset.

In cycle n+3, because thread 2 was selected by the address selection signal AdSel in cycle n+2, the instruction cache 22 fetches two instruction codes belonging to thread 2. That is, the fetch number NFI is 0/2/0. In cycle n+2, the fetch number NFI was 2/0/0, the instruction holding number NHI was 2/0/0, and the instruction issue number NII was 1/0/0. Therefore, the instruction holding number NHI in cycle n+3 is 3/0/0. Further, in cycle n+3, the instruction supply unit 10 issues two instruction codes belonging to thread 1, so the instruction issue number NII is 2/0/0. Because the control signal SC[1] is asserted in cycle n+3, thread 3 is selected by the address selection signal AdSel in cycle n+3.

In cycle n+4, because thread 3 was selected by the address selection signal AdSel in cycle n+3, the instruction cache 22 fetches two instruction codes belonging to thread 3. That is, the fetch number NFI is 0/0/2. In cycle n+3, the fetch number NFI was 0/2/0, the instruction holding number NHI was 3/0/0, and the instruction issue number NII was 2/0/0. Therefore, the instruction holding number NHI in cycle n+4 is 1/2/0. Further, in cycle n+4, the instruction supply unit 10 issues one instruction code belonging to thread 1 and one instruction code belonging to thread 2, so the instruction issue number NII is 1/1/0. In cycle n+4, thread 2 is selected because the control signal SC[1] is asserted and the thread selected in cycle n+3 was thread 3.

In cycle n+5, because thread 2 was selected by the address selection signal AdSel in cycle n+4, the instruction cache 22 fetches two instruction codes belonging to thread 2. That is, the fetch number NFI is 0/2/0. In cycle n+4, the fetch number NFI was 0/0/2, the instruction holding number NHI was 1/2/0, and the instruction issue number NII was 1/1/0. Therefore, the instruction holding number NHI in cycle n+5 is 0/1/2. Further, in cycle n+5, the instruction supply unit 10 issues one instruction code belonging to thread 2 and one instruction code belonging to thread 3, so the instruction issue number NII is 0/1/1.
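The holding numbers quoted for cycles n+3 to n+5 are consistent with a simple per-thread bookkeeping rule: the next holding number equals the current holding number plus the fetch number minus the issue number. That rule is inferred here from the quoted figures and is an assumption of this sketch, as are the names `next_holding` and `trace`. The tuples are ordered thread 1, thread 2, thread 3.

```python
def next_holding(nhi, nfi, nii):
    """Assumed update rule: NHI(next) = NHI + NFI - NII, applied per thread."""
    return tuple(h + f - i for h, f, i in zip(nhi, nfi, nii))

# (NFI, NHI, NII) per cycle, copied from the timing walk-through
trace = {
    "n+2": ((2, 0, 0), (2, 0, 0), (1, 0, 0)),
    "n+3": ((0, 2, 0), (3, 0, 0), (2, 0, 0)),
    "n+4": ((0, 0, 2), (1, 2, 0), (1, 1, 0)),
    "n+5": ((0, 2, 0), (0, 1, 2), (0, 1, 1)),
}

cycles = list(trace)
# Compare the predicted NHI of each following cycle with the stated value
checks = []
for cur, nxt in zip(cycles, cycles[1:]):
    nfi, nhi, nii = trace[cur]
    checks.append(next_holding(nhi, nfi, nii) == trace[nxt][1])
```

Under the assumed rule, every entry of `checks` is true for the figures given for cycles n+2 through n+5.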

Here, in the example shown in FIG. 21, the count value PC2 (for example, the second count value) reaches q in cycle n+5 as a result of the fetching of instruction codes belonging to thread 2. Therefore, in cycle n+5, the thread switching notification signal CR2 is asserted while the thread switching notification signal CR1 is asserted. In response to the thread switching notification signal CR2 switching to the asserted state, the control signal SC[1] switches to the negated state and the control signal SC[2] switches to the asserted state. Consequently, in cycle n+5, thread 1 can again be selected by the address selection signal AdSel, and the condition of expression (1) is not satisfied when it is evaluated using the fetch number NFI, the instruction holding number NHI, and the instruction issue number NII corresponding to thread 1. Accordingly, in cycle n+5, thread 1 is selected by the address selection signal AdSel.

FIG. 22 is a sequence diagram showing the switching of the selected thread during cycles n to n+5 of FIG. 21. As shown in FIG. 22, when the count value PC1 reaches p in cycle n+2, selection of the main thread is suppressed, and the thread to be processed is selected from among the secondary threads. In the present embodiment, the secondary threads are selected in a round robin manner. Therefore, during cycles n+2 to n+4, thread 2 and thread 3 are selected cyclically. Then, when the count value PC2 reaches q in cycle n+5, the control signal SC[1] is negated and thread 1 is selected. Through this processing, the multi-thread processor 2 can fetch the instruction codes that follow the instruction code corresponding to the count value PC1 of p after the processing of thread 2 has completed.
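The thread selection order over cycles n to n+5 can be reproduced with a small sketch. It assumes, as in the per-cycle description of FIG. 21, that the condition of expression (1) is never satisfied and that the counter 33 never reaches its predetermined value, so that only SC[1] decides between the main thread and the secondary threads. The function `select_threads` and the per-cycle SC[1] list are hypothetical.

```python
def select_threads(sc1_per_cycle, secondary=(2, 3)):
    """Return the selected thread for each cycle.

    Thread 1 is selected while SC[1] is negated; while SC[1] is asserted,
    the secondary threads are selected in round robin order.
    """
    selected = []
    rr = 0  # index of the next secondary thread in round robin order
    for sc1 in sc1_per_cycle:
        if sc1:
            selected.append(secondary[rr])
            rr = (rr + 1) % len(secondary)
        else:
            selected.append(1)
    return selected

# SC[1] for cycles n, n+1, ..., n+5: asserted from n+2 through n+4
sequence = select_threads([0, 0, 1, 1, 1, 0])
```

The resulting order is thread 1, 1, 2, 3, 2, 1, matching the selections stated for the individual cycles.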

As described above, the multi-thread processor 2 according to the third embodiment includes the waiting control unit 25. The waiting control unit 25 generates, based on the count values of the program counters, the control signal SC that suppresses selection of a specific thread. The multi-thread processor 2 can thereby suppress selection of some threads according to the progress of each thread.

In a multi-thread processor, the main thread may proceed with processing on the condition that a secondary thread has finished. In such a case, the main thread executes an infinite loop to detect the end of the secondary thread. Conventionally, executing this infinite loop required fetching the instruction codes of the loop, and those fetches impede the fetching of instruction codes belonging to the secondary thread.

In contrast, the multi-thread processor 2 according to the third embodiment can execute waiting processing conditioned on the end of a secondary thread without fetching instruction codes for an infinite loop in the main thread. That is, the multi-thread processor 2 according to the third embodiment can improve the instruction fetch capability for the thread that is the target of the waiting processing.

Further, if the instruction codes of a secondary thread are fetched based only on the number of instruction codes accumulated in the instruction buffer corresponding to the main thread, no instruction code accumulates in that instruction buffer once the instruction codes of the main thread have ended. In such a case, the thread control unit cannot generate an address selection signal AdSel that selects the second count value corresponding to the secondary thread. In the multi-thread processor 2 according to the third embodiment, however, the end of the main thread is detected based on the count value of the program counter 201, and after the main thread ends, selection of the main thread is suppressed so that the secondary threads can be executed.

The third embodiment has described an example in which thread 1 waits for the end of thread 2. However, the waiting processing can also be performed by setting, in the waiting control unit 25, the count value at which thread 1 needs to wait and the count values at which the processing of threads 2 and 3 ends. In this case, thread 1 can resume processing on the condition that both thread 2 and thread 3 have ended (or on the condition that either thread has ended). That is, the waiting scheme and the method of setting the waiting conditions can be modified in various ways.
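One way such a multi-thread waiting condition could be expressed is sketched below. The text does not specify a circuit for this variation, so the function `sc1_multi`, its `mode` parameter, and the list of per-thread notification signals are all hypothetical and serve only to contrast the "both ended" and "either ended" conditions.

```python
def sc1_multi(cr1, cr_secondary, mode="all"):
    """Return 1 while thread 1 must keep waiting, 0 otherwise.

    cr1          -- 1 when thread 1 has reached its waiting point
    cr_secondary -- list of 0/1 flags, 1 when a secondary thread has ended
    mode         -- "all": resume when every listed thread has ended
                    "any": resume when at least one listed thread has ended
    """
    if mode == "all":
        done = all(cr_secondary)
    elif mode == "any":
        done = any(cr_secondary)
    else:
        raise ValueError("mode must be 'all' or 'any'")
    return int(bool(cr1) and not done)

# Thread 2 has ended, thread 3 has not
wait_for_both = sc1_multi(1, [1, 0], mode="all")
wait_for_either = sc1_multi(1, [1, 0], mode="any")
```

In this example thread 1 keeps waiting under the "all" condition and is released under the "any" condition.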

Note that the present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit.

AdSel: address selection signal
NFI: fetch number
NHI: instruction holding number
NII: instruction issue number
NMI: maximum issue number
C1, C2: switching threshold
S1, S2: waiting set value
SC: control signal
1, 1a, 2: multi-thread processor
10: instruction supply unit
11: instruction selector
12: instruction decoder
13: instruction execution unit
21: address selector
22: instruction cache
24, 24a, 24b: thread control unit
25: waiting control unit
30: high thread selection unit
31, 35: round robin selection unit
32: low thread selection unit
33: counter
34, 36, 61: AND circuit
511, 512, 521: AND circuit
201-20m: program counter
231-23m: instruction buffer
231: instruction buffer
232: instruction buffer
401-402: comparator
411-412: control signal generation unit

Claims

A multi-thread processor comprising:
an instruction supply unit including a first instruction buffer that stores a first instruction code belonging to a first thread and a second instruction buffer that stores a second instruction code belonging to a second thread;
an instruction selector that selects, from among the instruction codes issued from the first and second instruction buffers, an instruction code to be transmitted to a subsequent circuit;
an instruction decoder that decodes the instruction code selected by the instruction selector to generate execution instruction information; and
an instruction execution unit that performs information processing based on the execution instruction information,
wherein the instruction supply unit includes a thread control unit that preferentially stores the first instruction code in the first instruction buffer and, when the number of first instruction codes stored in the first instruction buffer becomes at least twice the maximum number of instruction codes that the instruction supply unit can issue in parallel, stores the second instruction code in the second instruction buffer.

The multi-thread processor according to claim 1, wherein the instruction supply unit includes:
a first program counter that is provided for the first thread and increases a first count value as the first thread progresses;
a second program counter that is provided for the second thread and increases a second count value as the second thread progresses;
an address selector that selects and outputs one of the first count value and the second count value in response to an address selection signal;
an instruction cache that reads the first and second instruction codes from an external memory and stores them, outputs the first instruction code to the first instruction buffer according to the first count value, and outputs the second instruction code to the second instruction buffer according to the second count value; and
the thread control unit, which switches which of the first count value and the second count value the address selection signal designates according to the number of instruction codes stored in the first instruction buffer.

The multi-thread processor according to claim 2, wherein the thread control unit includes a high thread selection unit that places the address selection signal in a state of selecting the first count value during a period in which the number of instruction codes stored in the first instruction buffer is less than twice the maximum number of instruction codes that the instruction supply unit can issue in parallel.

The multi-thread processor according to claim 2 or 3, wherein the thread control unit includes a low thread selection unit that places the address selection signal in a state of selecting the second count value when the address selection signal places the first count value in a non-selected state.

The multi-thread processor according to claim 4, wherein the thread control unit includes a low-thread forcible processing unit that counts the number of operation cycles in which the address selection signal indicates a state of selecting the first count value and places the address selection signal in a state of selecting the second count value.

The multi-thread processor according to claim 3, further comprising a waiting unit that outputs, to the thread control unit, a control signal that suppresses the address selection signal from entering a state of selecting a predetermined address value, places the control signal in a thread suppression state that prevents the address selection signal from selecting the thread corresponding to the first count value when the first count value reaches a preset first switching threshold, and thereafter releases the thread suppression state of the control signal when the second count value reaches a preset second switching threshold.

The first thread is a main thread, the second thread is a sub thread, and the sub thread performs an operation using a calculation result by the main thread. The multithread processor according to item.

The multi-thread processor according to any one of claims 1 to 7, wherein the second instruction buffer is set to have a storage capacity smaller than that of the first instruction buffer.

The multi-thread processor according to claim 1, wherein the first instruction buffer stores a plurality of the first instruction codes in parallel, the second instruction buffer stores a plurality of the second instruction codes in parallel, and the instruction execution unit processes in parallel a plurality of pieces of execution instruction information decoded from a plurality of instruction codes.

A multi-thread processor that executes, in a time-division manner, a first instruction code belonging to a first thread and a second instruction code belonging to a second thread, the multi-thread processor comprising:
a first instruction buffer that stores the first instruction code belonging to the first thread;
a second instruction buffer that stores the second instruction code belonging to the second thread; and
a thread control unit that preferentially stores the first instruction code in the first instruction buffer and, when the number of first instruction codes stored in the first instruction buffer becomes at least twice the maximum number of instruction codes that the first instruction buffer can issue in parallel, stores the second instruction code in the second instruction buffer.