JP2009217385A

JP2009217385A - Processor and multiprocessor

Info

Publication number: JP2009217385A
Application number: JP2008058471A
Authority: JP
Inventors: Tomoko Ishibashi; 朋子石橋
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-03-07
Filing date: 2008-03-07
Publication date: 2009-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a processor and a multiprocessor that can acquire, for each function, cache miss profile information from the actual run of a target program.
SOLUTION: The processor 11 includes: a local memory 23; a clock counter that counts a clock signal; a function call control unit 41 that, on detecting a function call, outputs the counter value of the clock counter, a program counter value, and information on the jump destination or return destination address to the local memory 23; and a cache miss control unit 42 that, on detecting a cache miss, outputs the counter value of the clock counter, the program counter value, and information on the access address to the local memory 23.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a processor and a multiprocessor, and more particularly to a processor and a multiprocessor having a cache miss profiling function.

Cache memory (hereinafter also simply called a cache) has long been used to increase the processing speed of programs running on a processor. To use a cache effectively, it is essential to investigate the program's temporal locality (that is, the rate at which data is reused and how that reuse is distributed in time) and its spatial locality (that is, how unevenly accesses are distributed over data storage locations), and then to reduce the overhead of cache misses.

Conventionally, a cache miss event detector and a cache miss performance measurement register are used to investigate how cache misses occur. When the cache miss event detector detects a cache miss event in the processor, the cache miss performance measurement register (hereinafter called the cache miss counter) is incremented, so that the number of cache misses can be measured.

The number of cache misses held in the cache miss counter can be read by a command written in the target program. Because there is only one cache miss counter, the number of cache misses is measured by reading the counter at arbitrary positions in the target program and saving the values elsewhere. For example, to measure the number of cache misses in each function of the target program, the counter value is read before the function executes and again after it executes, and the difference between the two values gives the number of cache misses in that function. In this way, cache profile information can be obtained for the measurement target, for example each function.
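The conventional before-and-after measurement can be sketched as follows. This is only an illustrative software model: the class `CacheMissCounter`, the helper `measure`, and the functions `func_a` and `func_b` with their miss counts are assumptions made for the example and do not appear in the source.

```python
# Hypothetical model of the conventional approach: a single cache miss
# counter that instrumentation code reads before and after each function.

class CacheMissCounter:
    """Stand-in for the single hardware cache miss counter register."""

    def __init__(self):
        self.value = 0

    def record_miss(self, n=1):
        # In real hardware the event detector would increment the register.
        self.value += n

    def read(self):
        return self.value


def measure(counter, func, *args):
    """Return (result, misses), where misses is the counter delta across func."""
    before = counter.read()
    result = func(*args)
    after = counter.read()
    return result, after - before


counter = CacheMissCounter()


def func_a():
    counter.record_miss(3)  # pretend three misses happened inside func_a
    return "a"


def func_b():
    counter.record_miss(1)  # pretend one miss happened inside func_b
    return "b"


# Per-function miss counts obtained purely from counter differences.
misses_per_function = {f.__name__: measure(counter, f)[1] for f in (func_a, func_b)}
```

Note that the `measure` wrapper is itself extra code inserted around each call, which is the kind of instrumentation the following paragraphs identify as a problem.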

With the method described above, however, obtaining cache profile information requires rewriting the target program in advance to include code that measures the number of cache misses, and then recompiling it. Because of compiler optimization, the memory layout of the code differs between the target program that includes the measurement code and the target program that does not, so the result is likely to differ from the profile information of actual operation.

Methods that extract memory access states by software simulation have the problem that the information obtained differs from the behavior of the real system. To address this, a CPU memory access analysis device has been proposed that outputs information on the CPU's memory access state to the outside without affecting the behavior of the system (see, for example, Patent Document 1).

In the proposed technique, the CPU memory access analysis device detects process switches and, on a cache miss, selects an event identifier and the virtual address output from the CPU and outputs them as cache miss information.

However, that technique cannot provide information on how often and at what timing cache misses occur for each function.
Patent Document 1: JP 2006-285430 A

The present invention has been made in view of the above problems, and its object is to provide a processor and a multiprocessor that can obtain, for each function, cache miss profile information from the actual operation of a target program.

According to one aspect of the present invention, there can be provided a processor comprising: a memory; a clock counter that counts a clock signal; a function call control unit that, on detecting a function call, outputs the counter value of the clock counter, a program counter value, and information on the jump destination or return destination address to the memory; and a cache miss control unit that, on detecting a cache miss, outputs the counter value of the clock counter, the program counter value, and information on the access address to the memory.

According to the processor and multiprocessor of the present invention, cache miss profile information from the actual operation of a target program can be obtained for each function.

Embodiments of the present invention will be described below with reference to the drawings.
First, the configuration of the processor system according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the processor system according to the present embodiment.

The processor system 1 includes a processor 11 and a main memory 12. The processor 11 is connected to a system bus 14 via a bus interface (hereinafter abbreviated as I/F) 13, and the main memory 12 is also connected to the system bus 14.

The processor 11 includes a core processor 21, an event control unit 22, and a local memory 23 serving as a temporary storage memory. The local memory 23 is a RAM that stores data temporarily.
The core processor 21 includes a decoder 31, an instruction cache 32, a data cache 33, and an interrupt controller 34.
The core processor 21 reads instructions and data stored in the main memory 12 via the system bus 14 and executes them. The instructions and data read from the main memory 12 are temporarily stored in the instruction cache 32 and the data cache 33, respectively. The decoder 31 reads and decodes instructions from the instruction cache 32, and the core processor 21 executes each decoded instruction on the necessary data read from the data cache 33. The core processor 21 outputs the execution result of the instruction to a predetermined area of the main memory 12, where it is stored.

A program to be executed by the processor 11 is stored in an external storage device or the like (not shown), and the program read from that external storage device is stored in the main memory 12. Instructions and data read from the main memory 12 are temporarily stored in the instruction cache 32 and the data cache 33, and the processing of the program stored in the main memory 12 is carried out as the instructions decoded by the decoder 31 are executed one after another.

When a predetermined event occurs during execution of the program, interrupt processing is executed. The interrupt controller 34 controls the execution of that interrupt processing.

The event control unit 22 is a circuit unit formed of hardware circuits and includes a function call control unit 41, a cache miss control unit 42, and a data control unit 43.

The core processor 21 has a flag setting unit for enabling or disabling the function of the event control unit 22, and the function of the event control unit 22 can be set to enabled (ENABLE) or disabled (DISABLE) by the flag set in the flag setting unit. Enabling the event control unit 22 therefore makes it possible to obtain the cache miss profile information described later.

The function call control unit 41 detects, as a predetermined event, the execution of an instruction decoded by the decoder 31 when that instruction is a branch instruction, a jump instruction, or a return instruction. The decoder 31 outputs the decoded instruction code to the function call control unit 41. When the function call control unit 41 detects that the executed instruction is one of these predetermined instructions, it records the program counter value at the time of detection, the jump destination or return destination address, and the time, that is, the clock counter value, as log data in the local memory 23, which serves as the temporary storage memory. In other words, the function call control unit 41 constitutes a function call control unit that, on detecting a function call, outputs the counter value of the clock counter 51, the value of the program counter (hereinafter abbreviated as PC), and information on the jump destination or return destination address to the local memory 23.

The cache miss control unit 42 detects cache misses in the instruction cache 32 and the data cache 33 as predetermined events. That is, when an access to the instruction cache 32 or the data cache 33 results in a cache miss, the cache miss control unit 42 detects that cache miss as a predetermined event. On detecting such an event, the cache miss control unit 42 outputs the program counter value at the time of detection, the accessed address, that is, the access address, and the time, that is, the clock counter value, as log data to the local memory 23, which serves as the temporary storage memory.

A threshold for the amount of data stored in the local memory 23 is set in the data control unit 43. The data control unit 43 monitors the amount of data recorded in the local memory 23. When the amount of data stored in the local memory 23 exceeds the set threshold, the data control unit 43 outputs a predetermined interrupt signal to the interrupt controller 34. The interrupt controller 34 is an interrupt control unit that executes the corresponding interrupt routine and stores the log data of the local memory 23 in a predetermined data storage area 12a of the main memory 12. In this way, the cache miss profile data is stored in the main memory 12 automatically. The main memory 12 is a data storage memory that stores the cache miss profile data. In other words, the data control unit 43 is a control unit that, on detecting that the amount of data stored in the local memory 23 has become equal to or greater than a predetermined threshold, transfers the data stored in the local memory 23 to the main memory 12, which is the data storage memory.

Next, the configuration of each control unit described above will be described in more detail.
FIG. 2 is a block diagram for explaining the configuration of the processor 11 in more detail.

For the core processor 21, in addition to the configuration described with reference to FIG. 1, FIG. 2 shows a cache miss detection unit 35, a PC register 36 that holds the PC value, a jump destination register 37 that holds the jump destination address of an instruction, and a return destination register 38 that holds the return destination address of an instruction.

When a cache miss occurs in the instruction cache 32 or the data cache 33, the cache miss detection unit 35 outputs cache miss event information and information on the access address at the time the cache miss occurred in the data cache.

The event control unit 22 includes a counter 49 in addition to the function call control unit 41, the cache miss control unit 42, and the data control unit 43.

The function call control unit 41 includes an instruction comparison unit 44 and a function information output unit 45. The instruction comparison unit 44 receives the instruction code decoded by the decoder 31 and the instruction codes of the branch instruction, the jump instruction, and the return instruction. The instruction comparison unit 44 compares the decoded instruction code with each of these instruction codes and, when the decoded instruction code matches the instruction code of a branch instruction, a jump instruction, or a return instruction, outputs a match signal to the function information output unit 45.

The function information output unit 45 receives the data of the PC register 36, the jump destination register 37, and the return destination register 38. It also receives the clock counter value from the clock counter 51, which counts a clock (hereinafter abbreviated as CLK) signal.

When a predetermined instruction is detected, the function call control unit 41 outputs to the local memory 23, via the counter 49, the clock counter value and PC value at the time of detection, an indication that the event is a function call, and information on the jump destination or return destination address.
The counter 49 counts up by a value corresponding to the amount of log data output from the function call control unit 41.
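The behavior of the function call control unit described above can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the circuit itself: the opcode values, the `CoreState` container, the `"FUNC_CALL"` label, and the rule that a return instruction logs the return destination register while other matched instructions log the jump destination register are all illustrative choices that the source does not specify.

```python
from dataclasses import dataclass

# Hypothetical opcode encoding; the source does not define one.
BRANCH, JUMP, RETURN, ADD = 0x10, 0x11, 0x12, 0x01
WATCHED = {BRANCH, JUMP, RETURN}


@dataclass
class CoreState:
    clock: int        # value of clock counter 51
    pc: int           # value of PC register 36
    jump_target: int  # value of jump destination register 37
    return_addr: int  # value of return destination register 38


def function_call_record(opcode, state):
    """Return a log tuple if the decoded opcode is a watched one, else None."""
    # Role of instruction comparison unit 44: compare against watched opcodes.
    if opcode not in WATCHED:
        return None
    # Role of function information output unit 45: assemble the record.
    # Assumption: RETURN logs the return address, others log the jump target.
    target = state.return_addr if opcode == RETURN else state.jump_target
    return (state.clock, state.pc, "FUNC_CALL", target)


state = CoreState(clock=1200, pc=0x400, jump_target=0x800, return_addr=0x404)
jump_record = function_call_record(JUMP, state)
return_record = function_call_record(RETURN, state)
ignored = function_call_record(ADD, state)
```

An ordinary instruction such as the hypothetical `ADD` produces no record, so only control transfers contribute to the log.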

The cache miss control unit 42 includes a cache miss information output unit 46. The cache miss information output unit 46 receives, from the cache miss detection unit 35, event information and access address information as a cache miss detection signal. It also receives the PC value from the PC register 36 and the clock counter value from the clock counter 51.

The cache miss control unit 42 outputs to the local memory 23, via the counter 49, the clock counter value and PC value at the time the cache miss was detected, an indication that the event is a cache miss, and information on the access address at which the cache miss occurred.

The counter 49 counts up by a value corresponding to the amount of log data output from the cache miss control unit 42.

The data control unit 43 includes a threshold comparison unit 47 and a threshold holding unit 48. The threshold comparison unit 47 compares the predetermined threshold held in the threshold holding unit 48 with the count value of the counter 49, which indicates the amount of log data stored in the local memory 23, and outputs a predetermined interrupt signal to the interrupt controller 34 when the count value becomes equal to or greater than the threshold. As described above, on receiving that interrupt signal, the interrupt controller 34 executes the transfer of data from the local memory 23 to the main memory 12. The counter 49 is reset once the data has been transferred.
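The threshold-triggered transfer can be illustrated with a small software model. The class name `LogBuffer`, the byte-based counting, the 32-byte threshold, and the sample records are assumptions for the sketch; the source only states that the counter advances by the amount of log data and that a transfer occurs when the count is equal to or greater than the threshold.

```python
class LogBuffer:
    """Sketch of local memory 23 with counter 49 and the threshold check."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes  # value held by threshold holding unit 48
        self.local = []                   # records waiting in local memory 23
        self.count = 0                    # counter 49: bytes currently buffered
        self.main_memory = []             # data storage area 12a in main memory 12
        self.flushes = 0                  # number of simulated interrupts

    def append(self, record):
        self.local.append(record)
        self.count += len(record.encode("utf-8"))
        # Threshold comparison unit 47: fire when count >= threshold.
        if self.count >= self.threshold:
            self._interrupt_flush()

    def _interrupt_flush(self):
        # Models the interrupt routine: move everything, then reset counter 49.
        self.main_memory.extend(self.local)
        self.local.clear()
        self.count = 0
        self.flushes += 1


buf = LogBuffer(threshold_bytes=32)
for i in range(5):
    buf.append(f"{100 + i},0x40,MISS,0x80")  # each record is 18 bytes long
```

With 18-byte records and a 32-byte threshold, every second record triggers a transfer, so the fifth record remains in the local buffer until more data arrives.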

Alternatively, the threshold comparison unit 47 may monitor the amount of data in the local memory 23 directly and compare that amount with the threshold.

The event control unit 22 is supplied with the power supply Vdd through a switch (hereinafter abbreviated as SW) 52. As described above, the core processor 21 has a flag setting unit 53. When the flag setting unit 53 is set to enable the event control unit 22, the flag setting unit 53 supplies a switch signal to the SW 52 so as to turn the SW 52 on. The SW 52 controls the supply of power to the function call control unit 41 and the cache miss control unit 42. The flag setting unit 53 is a function setting unit that sets the functions of the function call control unit 41 and the cache miss control unit 42 to enabled or disabled. That is, the SW 52 and the flag setting unit 53 together constitute a function setting unit that enables or disables the functions of the function call control unit 41 and the cache miss control unit 42 based on the signal set in the flag setting unit 53.

In the above configuration, when the target program is executed, the processor 11 operates as follows.
When a branch instruction, a jump instruction, or a return instruction is detected during execution of the target program, the function call control unit 41 outputs to the local memory 23, via the counter 49, the clock counter value and PC value at the time of detection, an indication that the event is a function call, and information on the jump destination or return destination address.
For example, the function call control unit 41 outputs this information in CSV format as follows.

(clock counter value), (PC value), (function call), (jump destination or return destination address)

When a cache miss is detected during execution of the target program, the cache miss control unit 42 outputs to the local memory 23, via the counter 49, the clock counter value and PC value at the time the cache miss was detected, an indication that the event is a cache miss, and information on the access address at which the cache miss occurred.
For example, the cache miss control unit 42 outputs this information in CSV format as follows.

(clock counter value), (PC value), (cache miss), (access address)

Accordingly, each time a predetermined instruction is executed or a cache miss occurs, the information described above is output to the local memory 23 via the counter 49, for example in CSV format, and accumulates there as log information.
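To make the two CSV record layouts concrete, the following sketch writes and parses such records. The event labels `FUNC_CALL` and `CACHE_MISS`, the hexadecimal formatting of addresses, and the sample values are assumptions; the source only names the four fields and their order.

```python
import csv
import io


def to_csv_line(clock, pc, event, address):
    """Format one log record: clock counter value, PC value, event, address."""
    return f"{clock},{pc:#x},{event},{address:#x}"


def parse_log(text):
    """Parse CSV log text into (clock, pc, event, address) tuples."""
    rows = []
    for clock, pc, event, address in csv.reader(io.StringIO(text)):
        rows.append((int(clock), int(pc, 16), event, int(address, 16)))
    return rows


log_text = "\n".join([
    to_csv_line(1000, 0x400, "FUNC_CALL", 0x800),    # jump destination 0x800
    to_csv_line(1012, 0x804, "CACHE_MISS", 0x2000),  # access address 0x2000
])
records = parse_log(log_text)
```

Because both event types share the same four-column layout, a single parser can handle the merged log and distinguish records by the third field.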

The information from the function call control unit 41 and the cache miss control unit 42 accumulates in the local memory 23 and, once a predetermined amount of data has accumulated, is transferred by the interrupt controller 34 to the data storage area 12a of the main memory 12.

Because the log information stored in the data storage area 12a of the main memory 12 includes the jump destination and return destination addresses, functions can be identified from those addresses. By analyzing the accumulated log data, the number of cache misses can therefore be counted for each function.
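One possible way to carry out such an analysis offline is sketched below. It attributes each cache miss record to the function whose address range contains the recorded PC value, using a symbol table of the kind debug information could provide. The symbol table contents, the event labels, and the sample records are hypothetical, and attributing by PC value is one assumed strategy rather than the method prescribed by the source.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x400, "main"), (0x800, "func_a"), (0xC00, "func_b")]
STARTS = [start for start, _ in SYMBOLS]


def function_at(pc):
    """Map an address to the function whose start address precedes it."""
    i = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else "<unknown>"


def misses_per_function(records):
    """Count CACHE_MISS records per function, keyed by the recorded PC value."""
    counts = Counter()
    for clock, pc, event, address in records:
        if event == "CACHE_MISS":
            counts[function_at(pc)] += 1
    return counts


records = [
    (1000, 0x410, "FUNC_CALL", 0x800),
    (1012, 0x804, "CACHE_MISS", 0x2000),
    (1030, 0x820, "CACHE_MISS", 0x2040),
    (1050, 0x830, "FUNC_CALL", 0xC00),
    (1061, 0xC08, "CACHE_MISS", 0x3000),
]
result = dict(misses_per_function(records))
```

Sorting `result` by count in descending order would give a ranking of functions by cache miss count.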

Next, analysis examples using the cache miss profile data obtained as described above will be described.
1) Correlation between functions and cache misses
For example, FIG. 3 shows a table of function names and the number of cache misses for each function. The target program includes a plurality of functions. FIG. 3 shows that 112797 cache misses occurred during execution of the function "func_a", 42607 during execution of "func_b", and 20364 during execution of "func_c".

That is, the number of cache misses can be shown for each function. The name of the function being executed and the name of the function being transferred to can be obtained from the PC value and the jump destination or return destination address recorded at branch, jump, and return instructions, together with debug information. Spatial locality can be determined from this address information. A function with a large number of cache misses also incurs a large miss penalty, so reducing its cache misses is expected to be highly effective in improving the processing speed of the program as a whole.

Tuning the functions in descending order of cache miss count yields the largest reduction in cache miss overhead, so in this example tuning should start with the function "func_a".

2) Temporal correlation between functions and the occurrence of cache misses
For example, FIG. 4 shows table data of functions, clock counter values, and access addresses. FIG. 5 is a scatter plot of the accessed real addresses over time. The horizontal axis is the time of function calls and returns, that is, the clock counter value, and the vertical axis is the real memory address accessed at function calls and returns.

In FIG. 5, each vertical solid line CM marks a point in time at which a cache miss occurred. Because FIG. 5 thus shows the timing of cache misses in chronological order, the temporal locality of the cache misses can be determined.

The log data output from the function call control unit 41 and the cache miss control unit 42 includes clock counter values. As shown in FIG. 5, this gives both the times of branches, jumps, and returns and the times at which cache misses occurred, so function transitions can be associated with the occurrence of cache misses.
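As a rough illustration of how the recorded clock counter values expose temporal behavior, the sketch below groups cache miss records into fixed-width time windows. The window width of 100 clock counts, the event labels, and the sample records are assumptions for the example; the source presents this kind of information graphically rather than prescribing a computation.

```python
from collections import Counter


def misses_per_window(records, window):
    """Count CACHE_MISS records per time window, keyed by window start clock."""
    counts = Counter()
    for clock, pc, event, address in records:
        if event == "CACHE_MISS":
            counts[clock // window * window] += 1
    return dict(sorted(counts.items()))


records = [
    (1000, 0x410, "FUNC_CALL", 0x800),   # function transition, not a miss
    (1012, 0x804, "CACHE_MISS", 0x2000),
    (1030, 0x820, "CACHE_MISS", 0x2040),
    (1061, 0x808, "CACHE_MISS", 0x2000),
    (2500, 0xC08, "CACHE_MISS", 0x3000),
]
histogram = misses_per_window(records, window=100)
```

A window with many misses shortly after a `FUNC_CALL` record suggests that the transition into that function is where the misses cluster.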

Looking at the cache misses in time range R1 of FIG. 5, which corresponds to the data in time range R1 shown in FIG. 4, it can be seen that functions placed on the same cache line are accessed alternately, so a cache miss occurs on every access. In particular, when cache misses like those in time range R1 occur repeatedly, placing the functions in memory more efficiently is expected to reduce cache misses substantially.
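The effect of two functions competing for the same cache line can be reproduced with a toy simulation. The sketch assumes a direct-mapped cache with 32-byte lines and 128 lines, and uses made-up addresses; the source does not state the cache organization, so these parameters are purely illustrative.

```python
def count_misses(addresses, line_size=32, num_lines=128):
    """Simulate a direct-mapped cache and count misses for an address trace."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block number
        index = block % num_lines   # cache line the block maps to
        tag = block // num_lines    # identifies which block occupies the line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


CACHE_BYTES = 32 * 128                          # 4096 bytes of cache
func_x = 0x1000
func_y_conflicting = func_x + CACHE_BYTES       # same line index, different tag
func_y_relocated = func_x + CACHE_BYTES + 32    # shifted to the next line

# Alternating accesses to the two functions, ten rounds each.
conflicting_misses = count_misses([func_x, func_y_conflicting] * 10)
relocated_misses = count_misses([func_x, func_y_relocated] * 10)
```

In the conflicting layout every access evicts the other function's line, whereas after relocation only the two initial compulsory misses remain.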

FIG. 6 and FIG. 7 show, respectively, the table data of functions, clock counter values, and access addresses, and the scatter plot of accessed real addresses over time, after the memory placement of the functions has been changed to reduce cache misses of the kind shown in FIG. 5. As FIG. 7 shows, the number of cache misses in range R2, which corresponds to range R1 in FIG. 4, is greatly reduced.

As described above, the processor according to the present embodiment can obtain, for each function, cache miss profile information from the actual operation of the target program without providing a register such as the conventional cache miss counter and without including cache miss measurement code in the program.

In particular, because the output log information includes time information, both the location and the time of each cache miss can be identified. As a result, when tuning the target program, optimizing the memory layout of the program can lead to a reduction in the number of cache misses.

Log information is held temporarily in the local memory on the processor and is transferred to the data storage memory once it reaches a certain threshold, so log information can be stored faster than if every output were transferred directly to the data storage memory.

The processor according to the embodiment described above is also applicable to a multiprocessor in which a plurality of processors are provided on a semiconductor chip. In that case, each processor of the multiprocessor is provided with the function call control unit 41, the cache miss control unit 42, the data control unit 43, and the local memory 23 described above. FIG. 8 shows a configuration example of a multiprocessor. As shown in FIG. 8, the multiprocessor 61 has a plurality of processors 11, which are connected to one another by a bus 62.

The present invention is not limited to the embodiment described above, and various changes and modifications can be made without departing from the gist of the present invention.

FIG. 1 is a block diagram showing the configuration of a processor system according to an embodiment of the present invention.
FIG. 2 is a block diagram for explaining in more detail the configuration of a processor according to the embodiment of the present invention.
FIG. 3 is a diagram showing a table of function names and the number of cache misses for each function according to the embodiment of the present invention.
FIG. 4 is a diagram showing table data of functions, clock counter values, and access addresses according to the embodiment of the present invention.
FIG. 5 is a scatter plot of accessed real addresses over time according to the embodiment of the present invention.
FIG. 6 is a diagram showing table data of functions, clock counter values, and access addresses according to the embodiment of the present invention after the memory placement of functions has been changed to reduce cache misses of the kind shown in FIG. 5.
FIG. 7 is a scatter plot of accessed real addresses over time according to the embodiment of the present invention after the memory placement of functions has been changed to reduce cache misses of the kind shown in FIG. 5.
FIG. 8 is a diagram showing a configuration example of a multiprocessor according to the embodiment of the present invention.

Explanation of symbols

1 processor system, 11 processor, 12 main memory, 13 bus interface, 14 system bus, 21 core processor, 22 event control unit, 23 local memory, 31 decoder, 32 instruction cache, 33 data cache, 34 interrupt controller, 41 function call control unit, 42 cache miss control unit, 43 data control unit, 51 clock counter, 52 switch, 53 flag setting unit, 61 multiprocessor, 62 bus

Claims

1. A processor comprising:
a memory;
a clock counter that counts a clock signal;
a function call control unit that, on detecting a function call, outputs a counter value of the clock counter, a program counter value, and information on a jump destination or return destination address to the memory; and
a cache miss control unit that, on detecting a cache miss, outputs the counter value of the clock counter, the program counter value, and information on an access address to the memory.

2. The processor according to claim 1, further comprising a data control unit that, on detecting that the amount of data stored in the memory has become equal to or greater than a predetermined threshold, transfers the data stored in the memory to a data storage memory.

3. The processor according to claim 1, further comprising an interrupt control unit, wherein the transfer of the data stored in the memory is performed by the interrupt control unit.

4. The processor according to any one of claims 1 to 3, further comprising a function setting unit that enables or disables the functions of the function call control unit and the cache miss control unit based on a set signal.

5. A multiprocessor having a plurality of processors, each processor comprising:
a memory;
a clock counter that counts a clock signal;
a function call control unit that, on detecting a function call, outputs a counter value of the clock counter, a program counter value, and information on a jump destination or return destination address to the memory; and
a cache miss control unit that, on detecting a cache miss, outputs the counter value of the clock counter, the program counter value, and information on an access address to the memory.