JP2006285683A

JP2006285683A - Information processing apparatus, arithmetic processing device and memory access control method

Info

Publication number: JP2006285683A
Application number: JP2005105244A
Authority: JP
Inventors: Kenji Ezoe; 健司江副
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2005-03-31
Filing date: 2005-03-31
Publication date: 2006-10-19
Anticipated expiration: 2025-03-31
Also published as: JP4788177B2

Abstract

PROBLEM TO BE SOLVED: To streamline the operation of a CPU by predicting traffic on a bus connecting the CPU and an MMU.

SOLUTION: An instruction decoder 13 decodes an instruction into an operation code and sends it to a scalar request count calculation part 42. The scalar request count calculation part 42 calculates an expected value of issued memory request count and sends it to an arithmetic unit 48 via a registration-based calculation control part 46 and the like. When data reading from a main storage part 21 is completed, a completion-based calculation control part 47 calculates an executed memory request count and sends it to the arithmetic unit 48. The arithmetic unit 48 adds up a value retained in a request counter 49 and the values sent from the registration-based calculation control part 46 and completion-based calculation control part 47 and stores the result in the request counter 49. According to the value stored in the request counter 49, control parts 51 to 54 in an operation control part 50 output various control signals for optimizing the operation of memory access.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an information processing apparatus, an arithmetic processing device, and a memory access control method, and more particularly to an information processing apparatus that includes a pipelined arithmetic processing device having a cache memory, to a pipelined arithmetic processing device, and to a memory access control method.

In general, pipeline processing is broadly divided into an instruction fetch stage, an instruction decode stage, a data read stage, an instruction execution stage, and a write stage, and processing proceeds through the stages in this order.

The instruction fetch stage reads the instruction, the instruction decode stage decodes it, the data read stage prepares the data required for the operation, the instruction execution stage performs the actual operation, and the write stage writes the operation result.

If an instruction or data is not present in the cache memory, it must be read from main memory. If instructions and data continue to be read from main memory, the processing capability of the arithmetic processing device degrades.

As one technique for making effective use of the processing capability of an arithmetic processing device, Patent Document 1 discloses estimating statistics of memory system interaction characteristics between contexts in a computer system and applying them to instruction scheduling and to optimizing the placement of instructions and data.
Patent Document 1: Japanese Patent Laid-Open No. 11-353231

However, the technique disclosed in Patent Document 1 merely uses the estimated statistics of the memory system interaction characteristics to make effective use of the processing capability of the arithmetic processing device on a per-program basis. It therefore cannot make the arithmetic processing device operate efficiently, in a general-purpose manner, across a wide variety of programs.

In addition, improvements in memory access speed continue to lag behind improvements in the operating frequency of arithmetic processing devices. As a result, the number of pipeline stages is increasing, and the time from the decode stage, located upstream in the pipeline, to the stage at which a memory request is actually output tends to grow. A means of making effective use of this ever-lengthening time has become necessary.

The present invention has been made in view of the above problems. Its object is to provide an information processing apparatus, an arithmetic processing device, and a memory access control method that operate more efficiently than conventional ones by predicting, from the operands of an instruction obtained at instruction decode time, the number of memory requests that will be issued, and by using this calculated value.

An arithmetic processing device according to a first aspect of the present invention is an information processing apparatus comprising a control unit and a memory, wherein the control unit comprises:
an instruction decoder that extracts an opcode and operands from an instruction;
instruction type determining means for determining whether the operand extracted by the instruction decoder is an operand of an instruction that accesses the memory;
addition value calculating means for calculating, when the instruction type determining means determines that the operand is an operand of an instruction that accesses the memory, an expected value of the number of requests issued when that instruction accesses the memory; and
integrating means for integrating the values calculated by the addition value calculating means.

According to the present invention, the expected value of the number of memory requests exchanged between the arithmetic processing device and the memory for memory access is calculated, and the calculated expected values are integrated. Using this integrated value, the information processing apparatus can be operated so that data is exchanged smoothly between the arithmetic processing device and the memory.

In the above arithmetic processing device, the control unit preferably comprises subtraction value calculating means that, after an access to the memory has completed, calculates as a subtraction value the expected value that the addition value calculating means calculated for the instruction on which the access was based.
In this case, the integrating means integrates the values calculated by the addition value calculating means and the subtraction value calculating means.

In the above arithmetic processing device, the memory is a memory whose stored contents are refreshed at a predetermined interval, or at shorter intervals, in order to retain information.
In this case, the control unit preferably comprises:
integrated value determining means for determining whether the value calculated by the integrating means is 0; and
refresh instruction means for outputting to the memory a refresh instruction signal that causes the stored contents of the memory to be refreshed when the integrated value determining means determines that the value calculated by the integrating means is 0, and for outputting the refresh instruction signal to the memory at the predetermined interval when the value is determined not to be 0.

In this case, when it is predicted that no memory request will occur, the refresh instruction means outputs an instruction to refresh the stored contents of the memory, and the stored contents are refreshed. This reduces the memory access wait time caused by refreshing the stored contents of the memory.

The above information processing apparatus may operate in synchronization with a clock signal.
In this case, the control unit preferably comprises clock control means that outputs a clock control signal raising the frequency of the clock signal above a base frequency when the integrated value determining means determines that the value calculated by the integrating means is 0, and that outputs a clock control signal returning the frequency of the clock signal to the base frequency when the value is determined not to be 0.

In the above information processing apparatus, the memory may operate in synchronization with a memory clock signal.
In this case, the control unit preferably comprises memory clock control means that outputs a memory clock control signal lowering the frequency of the clock signal below a base frequency when the integrated value determining means determines that the value calculated by the integrating means is 0, and that outputs a memory clock control signal returning the frequency of the clock signal to the base frequency when the value is determined not to be 0.

In the above information processing apparatus, the control unit may comprise request control means that changes, in accordance with the integrated value calculated by the integrated value calculating means, the maximum number of memory access requests that can be in transit between the control unit and the memory at the same time.

According to the present invention, for example, when there are many memory access requests, the maximum number of memory access requests that may be issued between the control unit and the memory can be lowered to hold back the issuance of memory access requests; when there are few, the maximum can be raised so that issuance is not restricted.
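The request control described above can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the text gives no thresholds or limits, so `threshold`, `reduced_limit`, `normal_limit`, and the function name are all hypothetical.

```python
def max_outstanding_requests(accumulated, threshold=8.0,
                             reduced_limit=2, normal_limit=8):
    """Pick the cap on simultaneous memory access requests.

    accumulated is the integrated expected request count; every
    default value here is an illustrative assumption.
    """
    if accumulated > threshold:
        return reduced_limit  # many requests expected: hold back issuance
    return normal_limit       # few requests expected: do not restrict


print(max_outstanding_requests(12.0))
print(max_outstanding_requests(1.5))
```

A real design might use several thresholds or a continuous mapping; the two-level rule is chosen only because it mirrors the "many" and "few" cases named in the text.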

In the above information processing apparatus, the information processing apparatus preferably comprises a cache memory.
In this case, the control unit comprises miss rate calculating means for calculating a cache miss rate of the cache memory,
the instruction type determining means further determines whether the operand extracted by the instruction decoder belongs to a scalar-type instruction, and
the addition value calculating means calculates the expected value in accordance with the cache miss rate calculated by the miss rate calculating means when the instruction type determining means determines that the operand is an operand of a scalar-type instruction.

In the above information processing apparatus, the control unit may comprise DLL control means that outputs a DLL control signal instructing readjustment of the delay lock loop of the memory bus connecting the control unit and the memory when the integrated value determining means determines that the value calculated by the integrating means is 0, and that outputs the DLL control signal at a predetermined interval when the value is determined not to be 0.

An arithmetic processing device according to a second aspect of the present invention is an arithmetic processing device that exchanges data with a memory via a memory bus, comprising:
an instruction decoder that extracts an opcode and operands from an instruction;
instruction type determining means for determining whether the operand extracted by the instruction decoder is an operand of an instruction that accesses the memory via the memory bus;
addition value calculating means for calculating, when the instruction type determining means determines that the operand is an operand of an instruction that accesses the memory, an expected value of the number of requests issued when that instruction accesses the memory; and
integrating means for integrating the values calculated by the addition value calculating means.

A memory access control method according to a third aspect of the present invention is a method in which an arithmetic processing device that exchanges data with a memory via a memory bus performs:
a decoding step of extracting an opcode and operands from an instruction;
a determining step of determining whether the operand extracted in the decoding step is an operand of an instruction that accesses the memory via the memory bus;
an addition value calculating step of calculating, when the determining step determines that the operand is an operand of an instruction that accesses the memory, an expected value of the number of requests issued when that instruction accesses the memory;
an integrating step of integrating the values calculated in the addition value calculating step; and
a control step of optimizing the memory access of the arithmetic processing device in accordance with the value integrated in the integrating step.

According to the present invention, the number of memory access requests that the arithmetic processing device will issue when reading data from or writing data to the memory is predicted from the result of decoding in the decode stage, and the operation of subsequent stages can be controlled efficiently in accordance with the predicted value.

Embodiments of the present invention are described below with reference to the drawings.
In this embodiment, the instructions that require data to be exchanged between the arithmetic processing device and the memory are load instructions and store instructions. Load and store instructions can be classified into scalar instructions, which involve only a single register, and vector instructions, which involve multiple registers. Each instruction, including load and store instructions, consists of one opcode and zero or more (at most three) operands. Depending on the opcode, some operands are specified implicitly. Memory accesses are performed in instruction execution order.
In the following description, "address" on its own means a main memory address.

FIG. 1 is a block diagram showing the configuration of an arithmetic processing device and its peripheral devices according to an embodiment of the present invention. The arithmetic processing device and its peripheral devices form one component of an information processing apparatus such as a mainframe or a personal computer.
Lines whose arrowhead is a straight line (hereinafter, data lines) are signal lines that carry operation data (instructions and data in the ordinary sense). Lines whose arrowhead is a triangle (hereinafter, control signal lines) are signal lines that carry the various control signals within the arithmetic processing device 1. Graphics and external input/output elements that are not directly related to the present invention are omitted.

The arithmetic processing device 1 of this embodiment consists of logic circuits and sequential circuits implemented on silicon. In synchronization with a clock signal, the arithmetic processing device 1 interprets and executes the instruction sequence stored in the main storage unit 21, operating on data stored in the main storage unit 21 or data input from an input device (not shown), performs the specified processing, and writes the results to the main storage unit 21 or outputs them to an output device (not shown). Part of the contents of the main storage unit 21 is held temporarily in the cache memory 23. In practice, the arithmetic processing device 1 does not read and write instruction sequences and data directly in the main storage unit 21; it reads and writes the instruction sequences and data held in the cache memory 23.

As shown in FIG. 1, the arithmetic processing device 1 comprises an instruction fetch unit 11, an instruction decoder 13, a read stage 15, an arithmetic unit 17, a write stage 19, a register group 29, a cache miss rate calculation unit 41, a scalar request count calculation unit 42, a reply receiving unit 43, a set value storage unit 44, a registration-time calculation control unit 46, a completion-time calculation control unit 47, an arithmetic unit 48, a request counter 49, an operation control unit 50, a table management unit 121, and a memory request management table 127. Each of these elements is built from a combination of logic circuits and sequential circuits. The peripheral devices include an operation clock signal generator 60 and a memory clock signal generator 61.

The instruction fetch unit 11 reads from the cache memory 23 the instruction stored at the address specified by a program counter (not shown) and passes it to the instruction decoder 13.

The instruction decoder 13 determines from the opcode which operation is requested, and from the operands which register or address holds the data to be used and in which register or address the result of the arithmetic unit 17 is to be stored. It also determines from the operands whether there is a possibility that an access (read or write) to the main storage unit 21 will occur. If it determines that such an access may occur, it records the opcode obtained by decoding in the memory request management table 127 as request management information. This condition counts as satisfied even if the access ends up being served by the cache memory 23. The instruction decoder 13 sends the opcode obtained by decoding, as opcode information, to the scalar request count calculation unit 42, the registration-time calculation control unit 46, and the table management unit 121.

The read stage 15 reads the data needed by the subsequent execution stage from the register or address specified by the operands, drawing on the cache memory 23, the main storage unit 21, and the register group 29. When data is read from the cache memory 23 or the main storage unit 21, the read is requested through the memory management unit 25.

The arithmetic unit 17 handles the execution stage, performing the operation specified by the opcode, such as addition, subtraction, or a conditional test. It then sends the result to the write stage 19.

The write stage 19 writes the operation result to the register in the register group 29 specified by the operand, or to the specified address in the main storage unit 21.

The register group 29 comprises a plurality of registers that temporarily hold the results of the arithmetic unit 17.

The cache miss rate calculation unit 41 receives cache hit and cache miss information from the memory management unit 25, calculates the cache miss rate over a predetermined number of cache determinations (for example, 1000), and sends it to the scalar request count calculation unit 42. The cache miss rate calculation unit 41 calculates the cache miss rate repeatedly.
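The windowed measurement described above could be modelled as follows. This is a minimal software sketch, not the circuit itself: the class and method names are hypothetical, and holding the previous result until a new window completes is an assumption the text does not spell out.

```python
class CacheMissRateCalculator:
    """Hypothetical model of the cache miss rate calculation unit 41."""

    def __init__(self, window=1000):
        self.window = window    # cache determinations per measurement
        self.lookups = 0        # determinations seen in the current window
        self.misses = 0         # misses seen in the current window
        self.miss_rate = 0.0    # result of the last completed window

    def record(self, hit):
        """Record one hit/miss notification and return the current rate."""
        self.lookups += 1
        if not hit:
            self.misses += 1
        if self.lookups == self.window:
            # Window complete: publish the rate and start a new window.
            self.miss_rate = self.misses / self.window
            self.lookups = 0
            self.misses = 0
        return self.miss_rate


calc = CacheMissRateCalculator(window=4)
for hit in (True, False, True, False):
    rate = calc.record(hit)
print(rate)
```

Restarting the counters after each window is what makes the calculation repeat, so the published rate tracks changes in program behaviour.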

The scalar request count calculation unit 42 determines whether the opcode information sent from the instruction decoder 13 indicates a load instruction (scalar instruction). If it does, the scalar request count calculation unit 42 multiplies the cache miss rate sent from the cache miss rate calculation unit 41 by the number of memory requests issued when the cache is filled, and sends the resulting value to the table management unit 121. If it does not, the scalar request count calculation unit 42 sends the value 0 to the table management unit 121.
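The calculation just described can be written as a short formula. The symbols are introduced here for illustration and do not appear in the original text: $r_{\text{miss}}$ is the measured cache miss rate, $N_{\text{fill}}$ is the number of memory requests issued per cache fill, and $E$ is the expected request count sent to the table management unit 121.

```latex
E =
\begin{cases}
r_{\text{miss}} \times N_{\text{fill}} & \text{for a scalar load instruction} \\
0 & \text{otherwise}
\end{cases}
```

For instance, with an illustrative miss rate of $r_{\text{miss}} = 0.05$ and $N_{\text{fill}} = 4$ requests per fill, $E = 0.05 \times 4 = 0.2$.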

The reply receiving unit 43 receives the completion signal that the memory management unit 25 outputs when it has read data stored in the main storage unit 21 into the cache memory 23, and forwards it to the table management unit 121.

The set value storage unit 44 stores, for each opcode, the number of memory requests that may be issued when a vector store or load instruction is executed.

The table management unit 121 controls the registration and deletion of entries in the memory request management table 127. When it receives opcode information from the instruction decoder 13 and the instruction indicated is a scalar load/store instruction, it receives the expected number of memory requests from the scalar request count calculation unit 42 and registers it in the memory request management table 127 in association with the opcode information. When the instruction indicated is a vector load/store instruction, it looks up the number of memory requests issued by that instruction in the set value storage unit 44 and registers it in the memory request management table 127 in association with the opcode information. When it receives a completion signal from the reply receiving unit 43, the table management unit 121 records in the corresponding entry that the data has been read from the main storage unit 21 or the cache memory 23.
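The registration rule above can be sketched as follows. This is a minimal illustration, not the patented circuit: the opcode mnemonics (`LDS`, `STS`, `VLD`, `VST`), the per-opcode counts, and the function name are all hypothetical.

```python
# Hypothetical contents of the set value storage unit 44:
# memory requests that a vector load/store may issue, keyed by opcode.
VECTOR_REQUEST_COUNTS = {"VLD": 16, "VST": 16}

# Hypothetical mnemonics for scalar load/store instructions.
SCALAR_MEMORY_OPCODES = {"LDS", "STS"}


def request_count_to_register(opcode, scalar_expected_value):
    """Return the request count to record for an opcode, or None.

    scalar_expected_value stands in for the value supplied by the
    scalar request count calculation unit 42.
    """
    if opcode in SCALAR_MEMORY_OPCODES:
        return scalar_expected_value
    if opcode in VECTOR_REQUEST_COUNTS:
        return float(VECTOR_REQUEST_COUNTS[opcode])
    return None  # not a memory instruction: nothing is registered


print(request_count_to_register("VLD", 0.2))
```

Scalar instructions take a probabilistic estimate because they may hit in the cache, whereas vector instructions use a fixed per-opcode figure from the lookup table.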

The memory request management table 127 records, in order and as request management information, each instruction that may issue a memory request when the arithmetic processing device 1 reads data from or writes data to the main storage unit 21 or the cache memory 23, together with associated items such as the number of memory requests that instruction issues. The table is used cyclically: once request management information has been registered in the last entry, the next registration goes to the first entry. Each entry is returned to the unregistered state and reused after the read or write corresponding to its request management information has finished. The number of entries in the memory request management table 127 (eight in FIG. 2) is chosen with enough margin that entries can be reused without destroying the contents of entries still in use.

As shown in FIG. 2, each entry of the memory request management table 127 contains an “entry number”, “instruction type information”, “in-use information”, a “request count”, and “reply information”.
“Entry number” identifies each entry in the memory request management table 127.
“Instruction type information” records an opcode that may issue a memory request. In FIG. 2 the instructions are written in natural language for readability, but they are actually recorded in binary. An entry marked “empty” (entry number 7) is unused and stores, for example, an undefined opcode or the opcode of a so-called NOP instruction.
“In-use information” indicates whether the entry is in use: “1” means in use and “0” means unused. It is updated from “0” to “1” when “instruction type information” is registered in the entry.
“Request count” is the expected number of memory requests issued by the instruction corresponding to the instruction type information.
“Reply information” indicates whether the data needed to process the instruction corresponding to the instruction type information has been read from the main storage unit 21 or the cache memory 23: “1” means the data has been read and “0” means the request is still outstanding.
When both “in-use information” and “reply information” are “1”, the entry is returned to the unused state, like entry number 7, at the direction of the table management unit 121 after a predetermined time has elapsed.
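The entry layout and cyclic reuse described above can be modelled in software roughly as follows. This is a sketch under assumptions: the field and method names are hypothetical, entries are numbered from 0, and the "predetermined time" before an entry is freed is represented by an explicit `release` call.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    number: int             # "entry number"
    opcode: str = "empty"   # "instruction type information"
    in_use: int = 0         # "in-use information": 1 = in use
    requests: float = 0.0   # "request count" (expected value)
    reply: int = 0          # "reply information": 1 = data read


class MemoryRequestTable:
    """Hypothetical model of the memory request management table 127."""

    def __init__(self, size=8):
        self.entries = [Entry(n) for n in range(size)]
        self.next_index = 0  # entries are filled cyclically

    def register(self, opcode, requests):
        entry = self.entries[self.next_index]
        if entry.in_use:
            raise RuntimeError("next entry is still in use")
        entry.opcode, entry.in_use = opcode, 1
        entry.requests, entry.reply = requests, 0
        self.next_index = (self.next_index + 1) % len(self.entries)
        return entry.number

    def complete(self, number):
        """Mark the reply as received; return the entry's request count."""
        self.entries[number].reply = 1
        return self.entries[number].requests

    def release(self, number):
        """Return a finished entry to the unused state."""
        entry = self.entries[number]
        if entry.in_use == 1 and entry.reply == 1:
            self.entries[number] = Entry(number)


table = MemoryRequestTable(size=2)
first = table.register("scalar load", 0.5)
table.complete(first)
table.release(first)
print(table.entries[first])
```

Raising an error when the next entry is still busy is a choice made for this sketch; the text instead sizes the table so that this situation does not arise.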

Returning to FIG. 1, the registration-time calculation control unit 46 calculates the value to be added when request management information is registered in the memory request management table 127 and passes it to the arithmetic unit 48. The value to be added is determined either by referring to the “request count” of the entry whose number is sent from the table management unit 121 or in accordance with the opcode information received from the instruction decoder 13.

The completion-time calculation control unit 47 calculates, for an instruction registered in the memory request management table 127, the value to be subtracted when the memory access completes, converts it to a negative number, and passes it to the arithmetic unit 48. The value to be subtracted is the value registered in the “request count” of the entry whose “reply information” has changed from “0” to “1”.

The computing unit 48 is an adder. It adds the value sent from the registration-time calculation control unit 46, the value sent from the end-time calculation control unit 47 (treated as a negative number), and the value stored in the request counter 49, and stores the sum in the request counter 49.

The request counter 49 is a register that stores the result of the addition by the computing unit 48. The operation control unit 50 reads the value stored in the request counter 49.
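
The interaction between the computing unit 48 and the request counter 49 can be sketched as a single update step. The Python below is an illustrative assumption, not the disclosed circuit: the class name `RequestCounter` and the parameter names `registered` and `completed` are hypothetical.

```python
class RequestCounter:
    """Hypothetical software model of computing unit 48 plus request counter 49."""

    def __init__(self) -> None:
        self.value = 0  # contents of request counter 49

    def update(self, registered: int = 0, completed: int = 0) -> int:
        """Apply one update step and return the new counter value.

        registered: request count passed by registration-time unit 46.
        completed:  request count passed by end-time unit 47; it is negated
                    here, mirroring the input that is treated as negative.
        """
        self.value = self.value + registered + (-completed)
        return self.value
```

Because both inputs are summed in one step, a registration and a completion that arrive together are handled by the same addition.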

The operation control unit 50 generates various control signals that control the operation of the arithmetic processing unit 1 or the main storage unit 21 according to the count value of the request counter 49. Functionally, the operation control unit 50 comprises a refresh timing control unit 51, a delay lock loop (DLL) control unit 52, a clock control unit 53, and a prefetch preload control unit 54.

The refresh timing control unit 51 generates and outputs a refresh signal, which instructs the main storage unit 21 to refresh its stored contents, at least once per predetermined interval (the interval depends on the time constant of the memory cells in the main storage unit 21). In addition, when the value read from the request counter 49 is 0, it generates a refresh signal immediately.
While its stored contents are being refreshed, the main storage unit 21 cannot service memory accesses. Refreshing the main storage unit 21 during gaps in which there is no memory access therefore reduces the number of memory accesses that arrive during a refresh, and improves the processing performance of the arithmetic processing unit 1.

The DLL control unit 52 generates and outputs an adjustment signal, which instructs readjustment of the DLL, at least once per predetermined interval. This interval is determined experimentally and empirically, and is shorter than the longest time for which each unit can still complete its operation within one clock even if variations arise in the various signals because of changes in the operating environment of the arithmetic processing unit 1. In addition, when the value read from the request counter 49 is 0, it generates an adjustment signal immediately. Here, readjusting the DLL means synchronizing the various signals in the arithmetic processing unit 1 with the clock signal by adjusting the delay time of the delay elements. The targets of DLL readjustment are the data buses A and B between the arithmetic processing unit 1 and the memory management unit 25.
Readjusting the DLL during gaps in which there is no memory access reduces the increase in memory access time caused by DLL readjustment, which improves the processing performance of the arithmetic processing unit 1.
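
The rule shared by the refresh timing control unit 51 and the DLL control unit 52 can be sketched as a single predicate. The following Python is an illustrative assumption, not the disclosed circuit: the function name `should_signal`, its parameters, and the use of abstract time units are hypothetical.

```python
def should_signal(counter_value: int, now: float, last_signal: float,
                  max_interval: float) -> bool:
    """Return True when a refresh (or DLL adjustment) signal should be generated.

    The signal is generated opportunistically when no memory requests are
    expected (counter_value == 0), and unconditionally once max_interval
    has elapsed since the previous signal.
    """
    if counter_value == 0:
        return True
    return now - last_signal >= max_interval
```

The sketch follows the text literally and therefore signals on every evaluation while the counter stays at 0; any minimum spacing between signals is outside what is modeled here.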

The clock control unit 53 outputs a clock control signal whose voltage level follows the count value of the request counter 49 and which controls the frequency of the clock signal generated by the operation clock signal generation unit 60. For example, it adjusts the voltage level so that the frequency of the clock signal is raised above the reference frequency when the value of the request counter 49 is 0, and is returned to the reference frequency when the value of the request counter 49 is not 0.
This makes it possible to reduce the power consumption of the arithmetic processing unit 1.
The clock control signal output by the clock control unit 53 also controls, at a voltage level that follows the count value of the request counter 49, the frequency of the memory clock signal generated by the memory clock signal generation unit 61. For example, by outputting the clock control signal with the voltage level adjusted as described above, the clock control unit 53 lowers the frequency of the memory clock signal below the basic frequency when the value of the request counter 49 is 0, and returns it to the basic frequency, or raises it above the basic frequency, when the value of the request counter 49 is not 0. This reduces the turnaround time of memory accesses.
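
As a rough sketch of this policy, the two clock targets can be derived from the counter value as shown below. The Python is an assumption for illustration: it models frequencies directly rather than the voltage level of the clock control signal, the function name `clock_targets` and the scaling factors `cpu_boost` and `mem_cut` are hypothetical, and for a nonzero counter it picks the option of returning the memory clock to its base value rather than raising it.

```python
def clock_targets(counter_value: int, cpu_base_hz: float, mem_base_hz: float,
                  cpu_boost: float = 1.1, mem_cut: float = 0.9):
    """Return (cpu_hz, mem_hz) for the current request counter value.

    cpu_boost and mem_cut are hypothetical factors; the text only states
    that the frequencies move up or down within a permitted range.
    """
    if counter_value == 0:
        # No memory requests expected: raise the operation clock and
        # lower the memory clock.
        return cpu_base_hz * cpu_boost, mem_base_hz * mem_cut
    # Requests expected: both clocks return to their base values.
    return cpu_base_hz, mem_base_hz
```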

The prefetch preload control unit 54 controls the number of requests issued when prefetching or preloading instruction sequences and data sequences. For example, it sets the number of issued requests to 3 when 0 ≤ counter value < 100, to 2 when 100 ≤ counter value < 1000, and to 1 when 1000 ≤ counter value.
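
The example thresholds can be written as a small lookup. The Python below is a minimal sketch; the function name `prefetch_issue_count` is hypothetical, and rejecting negative counter values is an assumption, since the text only covers values of 0 or more.

```python
def prefetch_issue_count(counter_value: int) -> int:
    """Map the request counter value to the number of prefetch/preload requests."""
    if counter_value < 0:
        # Assumption: a negative counter value is treated as an error.
        raise ValueError("counter value must be 0 or more")
    if counter_value < 100:
        return 3   # light load: issue the most requests
    if counter_value < 1000:
        return 2   # moderate load
    return 1       # heavy load: issue the fewest requests
```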

The operation clock signal generation unit 60 generates a clock signal at a frequency corresponding to the voltage of the clock control signal received from the clock control unit 53 and supplies it to each unit of the arithmetic processing unit 1. The frequency of the clock signal generated by the operation clock signal generation unit 60 therefore rises and falls according to the value of the request counter 49. However, the frequency changes only within a predetermined range around the reference frequency, set within the operating range in which no operation error occurs.

The memory clock signal generation unit 61 generates a memory clock signal at a frequency corresponding to the voltage of the clock control signal received from the clock control unit 53 and supplies it to the main storage unit 21, the cache memory 23, and the memory management unit 25. The frequency of the memory clock signal generated by the memory clock signal generation unit 61 therefore rises and falls according to the value of the request counter 49. However, the frequency changes only within a predetermined range around the reference frequency, set within the operating range in which no operation error occurs.

The main storage unit 21 is composed of, for example, DRAM (Dynamic Random Access Memory), and stores the program (instruction sequence) executed by the arithmetic processing unit 1 and the data sequences needed to execute the program. The main storage unit 21 refreshes its stored contents each time it receives a refresh signal from the refresh timing control unit 51.

The cache memory 23 is composed of, for example, SRAM (Static Random Access Memory), and stores cached data such as instruction sequences held in the main storage unit 21. Data is read into the cache memory 23 in units of cache lines. For example, when the cache line size (length) is 64 bytes, the number of memory requests issued on a read is 8. For a store instruction, the number of memory requests issued depends on whether it is a scalar store instruction or a vector store instruction: it is 1 for a scalar store instruction, and varies with the size of the data to be stored for a vector store instruction.

The memory management unit (MMU) 25 sits between the arithmetic processing unit 1, the cache memory 23, and the main storage unit 21. It determines, among other things, which part of the main storage unit 21 the data stored in the cache memory 23 is a copy of, and which area of the cache memory 23 is to be overwritten.

When reading data, the MMU 25 determines whether the data stored at the address specified by the instruction fetch unit 11, the read stage 15, or the write stage 19 is held in the cache memory 23. If it is, the MMU 25 reads the data from the cache memory 23; if it is not, the MMU 25 reads the data from the main storage unit 21 and stores a copy in the cache memory 23. When writing data, the MMU 25 stores the data in the main storage unit 21 and, if the data at the specified address is held in the cache memory 23, also stores it in the cache memory 23. When the read or write is finished, the MMU 25 sends a completion signal indicating this to the reply receiving unit 43.
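
The read and write paths described for the MMU 25 can be sketched as follows. This Python model is an illustration under simplifying assumptions: the class name `Mmu` and its attributes are hypothetical, memory is modeled as a dictionary keyed by address, cache lines and eviction are ignored, and the completion signal is reduced to a counter.

```python
class Mmu:
    """Hypothetical model of the read and write-through paths of MMU 25."""

    def __init__(self, main_memory: dict) -> None:
        self.main = main_memory   # stands in for main storage unit 21
        self.cache = {}           # stands in for cache memory 23
        self.completions = 0      # completion signals sent to reply receiving unit 43

    def read(self, addr):
        if addr in self.cache:            # hit: serve from the cache
            data = self.cache[addr]
        else:                             # miss: read main storage, keep a copy
            data = self.main[addr]
            self.cache[addr] = data
        self.completions += 1
        return data

    def write(self, addr, data) -> None:
        self.main[addr] = data            # always store to main storage
        if addr in self.cache:            # update the cached copy if present
            self.cache[addr] = data
        self.completions += 1
```

Note that a write to an address that is not cached does not allocate a cache entry in this sketch, matching the description above.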

Of the operations of the arithmetic processing unit 1 and its peripheral devices according to this embodiment, the part that estimates the number of memory requests according to the opcode information is described below with reference to the drawings.

First, with reference to FIG. 3, the operations of the scalar request number calculation unit 42, the table management unit 121, the memory request management table 127, the registration-time calculation control unit 46, and the computing unit 48 during the registration process, which registers request management information in the memory request management table 127, are described.

The registration process starts when the instruction decoder 13 decodes an instruction. The instruction decoder 13 sends the opcode obtained by decoding the instruction, as opcode information, to the scalar request number calculation unit 42, the table management unit 121, and the registration-time calculation control unit 46 (step S101).

The table management unit 121 determines whether the instruction corresponding to the opcode in the received opcode information is an instruction that may access the main storage unit 21 (a load instruction or a store instruction) (step S102). If it determines that the instruction may access the main storage unit 21 (step S102: YES), the table management unit 121 creates request management information based on the opcode information, registers it in the memory request management table 127 (step S103), and sends the entry number of the registered entry to the registration-time calculation control unit 46. For the “number of requests”, it selects the value calculated by the scalar request number calculation unit 42 if the opcode corresponds to a scalar instruction, and the value read from the setting value storage unit 44 if the opcode corresponds to a vector instruction. It then changes the “in-use information” of the entry in which the request management information was registered to “1”, meaning in use. If it determines that the instruction is not one that may access the main storage unit 21 (step S102: NO), the table management unit 121 does nothing and the process proceeds to step S104.

Next, the registration-time calculation control unit 46 reads, from the memory request management table 127, the number of requests stored in the entry corresponding to the entry number notified by the table management unit 121, and sends it to the computing unit 48. Finally, the computing unit 48 adds the value sent from the registration-time calculation control unit 46 to the value stored in the request counter 49 (step S104), stores the result in the request counter 49, and the registration process ends.
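
Steps S101 to S104 can be summarized in a short sketch. The Python below is illustrative only: the opcode names, the helper `new_table`, the function `register_instruction`, and the dictionary layout of an entry are all hypothetical, and raising an error when no entry is free is an assumption, since the text does not describe that case.

```python
MEMORY_OPCODES = {"LD", "ST", "VLD", "VST"}  # hypothetical opcode names
VECTOR_OPCODES = {"VLD", "VST"}


def new_table(size):
    """Create an empty table with `size` unused entries."""
    return [{"in_use": 0, "type": None, "requests": 0, "reply": 0}
            for _ in range(size)]


def register_instruction(opcode, table, counter, scalar_estimate, vector_setting):
    """Model of steps S102-S104; returns (entry number or None, new counter)."""
    if opcode not in MEMORY_OPCODES:          # S102: NO, nothing is registered
        return None, counter
    # Scalar instructions use the calculated estimate (unit 42);
    # vector instructions use the stored setting value (unit 44).
    requests = vector_setting if opcode in VECTOR_OPCODES else scalar_estimate
    entry_no = next((i for i, e in enumerate(table) if e["in_use"] == 0), None)
    if entry_no is None:
        raise RuntimeError("no free entry")   # assumption: not covered by the text
    table[entry_no] = {"in_use": 1, "type": opcode,
                       "requests": requests, "reply": 0}   # S103
    return entry_no, counter + requests       # S104
```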

Next, with reference to FIG. 4, the operations of the reply receiving unit 43, the table management unit 121, the memory request management table 127, the end-time calculation control unit 47, and the computing unit 48 during the deletion process, which deletes request management information from the memory request management table 127, are described.

The deletion process starts when the MMU 25 sends the reply receiving unit 43 a completion signal indicating that reading of the requested data has finished. The table management unit 121 receives the completion signal from the reply receiving unit 43 (step S201). As described above, in this embodiment opcode information and completion signals are received in the same order, so the table management unit 121 can determine which entry of the memory request management table 127 should have its “reply information” set to “1” by referring to the “reply information” of all entries. Having identified that entry, the table management unit 121 updates its “reply information” to “1” (step S202). The table management unit 121 then sends the entry number of the entry whose data read or write has finished to the end-time calculation control unit 47.

Next, the end-time calculation control unit 47 reads, from the memory request management table 127, the number of requests stored in the entry corresponding to the entry number notified by the table management unit 121, and sends it to the computing unit 48. Finally, the computing unit 48 subtracts the value sent from the end-time calculation control unit 47 from the value stored in the request counter 49 (step S203), stores the result in the request counter 49, and the deletion process ends.
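
Steps S201 to S203 can be sketched in the same spirit. The Python below is an assumption for illustration: the function name `complete_oldest` is hypothetical, and a queue of entry numbers in registration order is substituted for the scan of all “reply information” fields described in the text, which is possible because completion signals arrive in the same order as registrations.

```python
from collections import deque


def complete_oldest(table, pending, counter):
    """Handle one completion signal; returns (entry number, new counter).

    table:   list of entries, each a dict with "requests" and "reply" keys.
    pending: deque of entry numbers in registration order (hypothetical helper).
    """
    entry_no = pending.popleft()              # S201: oldest outstanding entry
    table[entry_no]["reply"] = 1              # S202: mark the reply as received
    requests = table[entry_no]["requests"]
    return entry_no, counter - requests       # S203: subtract its request count
```

When every registered entry has completed, the counter returns to the value it had before those entries were registered.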

In practice, the registration process and the deletion process run in parallel, so the computing unit 48 adds the value sent from the registration-time calculation control unit 46, the value sent from the end-time calculation control unit 47 (treated as a negative number), and the value stored in the request counter 49, and stores the sum in the request counter 49.

The registration and deletion processes described above make it possible to calculate the number of memory requests travelling over the buses A and B that connect the arithmetic processing unit 1 and the MMU 25. The calculated number of memory requests represents a predicted value of the load (degree of congestion) on the buses A and B. To access the main storage unit 21, the MMU 25 must compute a physical address from a virtual address, so there is a delay between the time the instruction decoder 13 decodes an instruction and the time the request is actually issued. With the configuration shown in this embodiment, the number of requests is therefore known earlier than it would be if requests were counted at the stage where they are actually issued.

According to the value held in the request counter 49, the control units 51 to 54 in the operation control unit 50 operate as follows.

The refresh timing control unit 51 reads the value stored in the request counter 49 and, if it determines that the value is 0, generates a refresh signal instructing a refresh of the stored contents of the main storage unit 21 and outputs it to the main storage unit 21. Even if the value read from the request counter 49 does not become 0, it generates a refresh signal and outputs it to the main storage unit 21 once a predetermined time has elapsed since the previous refresh.

The DLL control unit 52 reads the value stored in the request counter 49 and, if it determines that the value is 0, generates an adjustment signal instructing readjustment of the DLL and outputs it to the buses A and B. Even if the value read from the request counter 49 does not become 0, it generates an adjustment signal and outputs it to the buses A and B once a predetermined time has elapsed since the previous DLL readjustment instruction was issued.

The clock control unit 53 reads the value stored in the request counter 49 and, if it determines that the value is 0, adjusts the voltage level so as to raise the frequency of the clock signal and outputs the clock control signal to the operation clock signal generation unit 60. If the value read is not 0, the clock control unit 53 adjusts the voltage level so as to return the frequency of the operation clock signal to the basic frequency and outputs the clock control signal to the operation clock signal generation unit 60. Under the same clock control signal, the memory clock signal generation unit 61 lowers the frequency of the memory clock signal below the basic frequency when the value read is 0, and returns it to the basic frequency, or raises it above the basic frequency, when the value read is not 0.

The prefetch preload control unit 54 reads the value stored in the request counter 49 and sends a control signal to the instruction decoder 13, the read stage 15, and the write stage 19 so that the number of memory requests issued is 3 when the value is 0 or more and less than 100, 2 when it is 100 or more and less than 1000, and 1 when it is 1000 or more.

As described above, the information processing apparatus according to the embodiment of the present invention makes memory access more efficient by controlling, in response to fluctuations in memory access to the main storage unit 21, the timing at which the main storage unit 21 is refreshed, the timing of DLL readjustment, temporary raising and lowering of the operating frequency, and the number of memory requests issued, and can thereby raise the operating frequency of the arithmetic processing unit 1.

The present invention is not limited to the above embodiment, and various modifications and applications are possible.
For example, in the above embodiment the instructions that cause memory accesses are of two types, load instructions and store instructions, but they are not limited to these. The above operation may also be performed for only one of vector instructions and scalar instructions. Furthermore, although external input/output instructions were not considered in the above embodiment, they can be taken into account by extending the technique described there.

Although the case where data is written to the cache memory 23 by the write-through method has been described as an example, the write-back method may also be used. The cache memory 23 may also be built into the arithmetic processing unit 1.

In the above embodiment, DRAM, which requires a refresh operation, was described as an example of the main storage unit 21, but SRAM (Static RAM) may be used instead, in which case the refresh operation is unnecessary. When cache memory exists in multiple levels, such as a secondary cache, the count of the memory request management table 127 may be increased or decreased when a memory access may occur between the arithmetic processing unit 1 and the main storage unit 21.

The present invention can also be applied in combination with known techniques for raising the operating frequency of the arithmetic processing unit 1, such as register renaming and forwarding of operation results at the execution stage.

The setting value storage unit 44 may be omitted. In that case, the table management unit 121 generates the number of memory requests that may be issued according to the type of opcode contained in the opcode information.

The determination conditions used by the control units 51 to 54 in the operation control unit 50 in the above embodiment are examples, and any thresholds may be used for the determination.

The request counter 49 may be a register accessible by privileged or non-privileged instructions. In that case, programs can use the result of the request count prediction, for example for instruction scheduling by a program that takes into account the load on the bus between the arithmetic processing unit 1 and the MMU 25.

In the above embodiment, the number of requests issued by a (scalar) load instruction is predicted from the most recent cache miss rate. Instead, the apparatus may be configured to read this number from a memory that stores the average number of requests issued per instruction, calculated in advance from experimental trials. In this case, the value stored in this memory may be fixed by factory setting, or may be variable by DIP switch input or jumper plug setting.

A cumulative number of requests can also be calculated by removing the reply receiving unit 43 and the end-time calculation control unit 47 from the above embodiment so that the computing unit 48 performs no subtraction.

The arithmetic processing unit 1 according to the embodiment of the present invention can be realized with an ordinary computer system rather than dedicated hardware. For example, an information processing apparatus that emulates the above processing can be built by installing a program for executing the above processing into the storage unit of a general-purpose computer from a medium (such as a CD-ROM) that stores the program.

When the above functions are realized by division of roles between an OS (Operating System) and an application, or by cooperation between the OS and the application, only the portion other than the OS may be stored on the medium.

The program may also be superimposed on a carrier wave and distributed over a communication network. For example, the program may be posted on a bulletin board system (BBS) on a communication network and distributed via the network. The apparatus may then be configured so that the above processing is executed by starting this program and running it under the control of the operating system in the same way as other application programs.

FIG. 1 is a block diagram of an arithmetic processing unit and peripheral devices according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the memory request management table of FIG. 1.
FIG. 3 is a flowchart for explaining the process of registration in the memory request management table according to the embodiment of the present invention.
FIG. 4 is a flowchart for explaining the process of deletion from the memory request management table according to the embodiment of the present invention.

Explanation of symbols

1 Arithmetic processing unit
11 Instruction fetch unit
13 Instruction decoder
15 Read stage
17 Computing unit
19 Write stage
21 Main storage unit
23 Cache memory
25 Memory management unit (MMU)
29 Register group
41 Cache miss rate calculation unit
42 Scalar request number calculation unit
43 Reply receiving unit
46 Registration-time calculation control unit
47 End-time calculation control unit
48 Computing unit
49 Request counter
50 Operation control unit
51 Refresh timing control unit
52 Delay lock loop (DLL) control unit
53 Clock control unit
54 Prefetch preload control unit
60 Operation clock signal generation unit
61 Memory clock signal generation unit
121 Table management unit
127 Memory request management table

Claims

An information processing apparatus comprising:
a control unit; and
a memory,
wherein the control unit comprises:
an instruction decoder that extracts an opcode and an operand from an instruction;
instruction type determining means for determining whether or not the operand extracted by the instruction decoder is an operand of an instruction that accesses the memory;
addition value calculating means for calculating, when the instruction type determining means determines that the operand is an operand of an instruction that accesses the memory, an expected value of the number of requests issued when this instruction accesses the memory; and
integrating means for integrating the values calculated by the addition value calculating means.

The information processing apparatus according to claim 1, wherein
the control unit comprises subtraction value calculating means for calculating, after an access to the memory is completed, the expected value that the addition value calculating means calculated for the instruction that caused this access, as a subtraction value, and
the integrating means integrates the values calculated by the addition value calculating means and the subtraction value calculating means.
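
The add-on-issue, subtract-on-completion bookkeeping described in the two claims above can be sketched in software as follows. This is only an illustrative model, not the claimed hardware: the class name `RequestCounter`, the methods `register` and `complete`, and the instruction identifiers are hypothetical names introduced for this sketch.

```python
class RequestCounter:
    """Illustrative model of the integrating means (names are hypothetical)."""

    def __init__(self):
        self.value = 0       # integrated value held by the counter
        self._pending = {}   # instruction id -> expected request count

    def register(self, instr_id, expected_requests):
        # Addition value: expected number of requests for a decoded
        # instruction that accesses the memory.
        self._pending[instr_id] = expected_requests
        self.value += expected_requests

    def complete(self, instr_id):
        # Subtraction value: the same expected count, removed once the
        # access caused by this instruction has finished.
        self.value -= self._pending.pop(instr_id)


counter = RequestCounter()
counter.register("load_a", 4)
counter.register("load_b", 1)
counter.complete("load_a")
print(counter.value)  # prints 1
```

Under this model the counter returns to 0 exactly when every registered access has completed, which is the condition the later claims test.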

The information processing apparatus according to claim 1, wherein
the memory is a memory whose stored contents are refreshed at a predetermined interval, or at an interval shorter than the predetermined interval, in order to continue storing information, and
the control unit comprises:
integrated value determining means for determining whether or not the value calculated by the integrating means is 0; and
refresh instruction means for outputting to the memory a refresh instruction signal for refreshing the stored contents of the memory when the integrated value determining means determines that the value calculated by the integrating means is 0, and for outputting the refresh instruction signal to the memory at the predetermined interval when it is determined that the value calculated by the integrating means is not 0.

The information processing apparatus according to claim 1, 2, or 3, wherein
the information processing apparatus operates in synchronization with a clock signal, and
the control unit comprises clock control means for outputting a clock control signal for raising the frequency of the clock signal above a fundamental frequency when the integrated value determining means determines that the value calculated by the integrating means is 0, and for outputting a clock control signal for returning the frequency of the clock signal to the fundamental frequency when it is determined that the value calculated by the integrating means is not 0.

The information processing apparatus according to claim 1, wherein
the memory operates in synchronization with a memory clock signal, and
the control unit comprises memory clock control means for outputting a memory clock control signal for reducing the frequency of the memory clock signal below a fundamental frequency when the integrated value determining means determines that the value calculated by the integrating means is 0, and for outputting a memory clock control signal for returning the frequency of the memory clock signal to the fundamental frequency when it is determined that the value calculated by the integrating means is not 0.
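
The refresh and clock claims above all branch on whether the integrated value is 0. A minimal sketch of that decision is given below; it is an assumption-laden illustration rather than the claimed circuit. The names `ControlSignals`, `control_signals`, and the `refresh_due` flag (standing in for "the predetermined interval has elapsed") are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlSignals:
    refresh_now: bool          # issue a refresh instruction signal to the memory
    cpu_clock_boost: bool      # raise the clock signal above the fundamental frequency
    memory_clock_reduce: bool  # lower the memory clock below the fundamental frequency


def control_signals(request_count, refresh_due):
    """Derive control outputs from the integrated request count.

    refresh_due is a hypothetical flag meaning the predetermined
    refresh interval has elapsed.
    """
    idle = request_count == 0
    return ControlSignals(
        # Refresh opportunistically when idle; otherwise only on the interval.
        refresh_now=idle or refresh_due,
        # Boost the processor clock only while no memory requests are expected.
        cpu_clock_boost=idle,
        # Slow the memory clock only while no memory requests are expected.
        memory_clock_reduce=idle,
    )


print(control_signals(0, refresh_due=False))
print(control_signals(3, refresh_due=False))
```

When the count is non-zero, both clocks stay at (or return to) their fundamental frequencies and refresh falls back to the fixed interval.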

The information processing apparatus according to any one of claims 1 to 5, wherein the control unit comprises request control means for changing, according to the value calculated by the integrating means, the maximum number of memory access requests that can be simultaneously transmitted and received between the control unit and the memory.

The information processing apparatus according to any one of claims 1 to 6, wherein
the information processing apparatus includes a cache memory,
the control unit comprises miss rate calculating means for calculating a cache miss rate of the cache memory,
the instruction type determining means further determines whether or not the operand extracted by the instruction decoder is an operand of a scalar type instruction, and
the addition value calculating means calculates the expected value according to the cache miss rate calculated by the miss rate calculating means when the instruction type determining means determines that the operand is an operand of a scalar type instruction.
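
One way to read "calculating the expected value according to the cache miss rate" is to weight the request count by the probability that a scalar access misses the cache. The sketch below assumes exactly that; the function name `expected_requests`, the parameter `requests_if_uncached`, and the treatment of non-scalar instructions as always reaching memory are assumptions of this example, not statements from the claims.

```python
def expected_requests(is_scalar, cache_miss_rate, requests_if_uncached=1):
    """Expected number of memory requests for one memory-access instruction.

    Assumption: a scalar access reaches the memory only on a cache miss,
    so its expected count is scaled by the miss rate. Non-scalar accesses
    are assumed here to always issue their requests.
    """
    if is_scalar:
        return cache_miss_rate * requests_if_uncached
    return float(requests_if_uncached)


print(expected_requests(True, 0.25))      # prints 0.25
print(expected_requests(False, 0.25, 8))  # prints 8.0
```

Because the result is an expectation, the integrated value can be fractional under this reading.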

The arithmetic processing apparatus according to claim 1, wherein the control unit further comprises a DLL control unit that outputs a DLL control signal for readjusting a delay lock loop of a memory bus connecting the control unit and the memory when the integrated value determining means determines that the value calculated by the integrating means is 0, and that outputs the DLL control signal at a predetermined interval, so that the delay lock loop of the memory bus is readjusted at that interval, when it is determined that the value calculated by the integrating means is not 0.

An arithmetic processing apparatus that exchanges data with a memory via a memory bus, comprising:
an instruction decoder that extracts an opcode and an operand from an instruction;
instruction type determining means for determining whether or not the operand extracted by the instruction decoder is an operand of an instruction that accesses the memory via the memory bus;
addition value calculating means for calculating, when the instruction type determining means determines that the operand is an operand of an instruction that accesses the memory, an expected value of the number of requests issued when this instruction accesses the memory; and
integrating means for integrating the values calculated by the addition value calculating means.

A memory access control method for an arithmetic processing unit that exchanges data with a memory via a memory bus, the method comprising:
a decoding step of extracting an opcode and an operand from an instruction;
a determination step of determining whether or not the operand extracted in the decoding step is an operand of an instruction that accesses the memory via the memory bus;
an addition value calculating step of calculating, when the determination step determines that the operand is an operand of an instruction that accesses the memory, an expected value of the number of requests issued when this instruction accesses the memory;
an integration step of integrating the values calculated in the addition value calculating step; and
a control step of optimizing the memory access of the arithmetic processing unit according to the value integrated in the integration step.