JP2015210574A

JP2015210574A - Information processor, processing method and processing program

Info

Publication number: JP2015210574A
Application number: JP2014090249A
Authority: JP
Inventors: 昌生山本; Masao Yamamoto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-04-24
Filing date: 2014-04-24
Publication date: 2015-11-24

Abstract

PROBLEM TO BE SOLVED: To provide an information processor capable of collecting information used for performance analysis. SOLUTION: The information processor includes: a measurement section 23 that uses performance measurement counters 21a-21c to measure a CPU cycle event and an executed-instruction-count event and calculates the number of cycles per instruction; a sampling section 24 that samples plural pieces of branch information in a program to be analyzed; an analysis section 27 that identifies the actual execution path on a disassembly list based on the branch information sampled by the sampling section 24, extracts basic blocks each running from the instruction immediately after a given branch to the next branching instruction, and holds correlation information T between basic block identification information and the number of assembler instructions, based on a count of the assembler instructions included in each extracted basic block; and a calculation section 28 that calculates the execution time by multiplying the number of assembler instructions of each basic block included in the correlation information T by the number of cycles per instruction.

Description

The present invention relates to an information processing apparatus, a processing method, and a processing program.

A performance profiler is one known program performance analysis method for evaluating the performance of a program executed on a computer. A performance profiler is a performance analysis program that analyzes program performance by collecting information while the program runs.
The performance profiler collects various types of information at program execution time and, in particular, measures how often functions are called and how long the calls take.

FIG. 12 is a diagram schematically illustrating the functional configuration of a performance profiler, FIG. 13 is a diagram illustrating its measurement phase, and FIG. 14 is a diagram illustrating an analysis result.
As shown in FIG. 12, the performance profiler includes a PMC (Performance Monitoring Counter) and a sampling driver. For example, the PMC function is realized by a CPU (Central Processing Unit), and the sampling driver is implemented as a kernel function.

The PMC generates an interrupt (overflow interrupt, sampling interrupt) to the sampling driver at regular intervals. The PMC generates this interrupt by means of a register counter overflow interrupt. In the example shown in FIG. 12, the PMC generates an overflow interrupt every 1 ms.
When an interrupt arrives from the PMC, the sampling driver collects information on the running program to be analyzed. The collected program information is, for example, the process ID and instruction address of the program being executed.

FIG. 13 illustrates a state in which three programs A, B, and C are executed in the order A, C, B, A. In response to the overflow interrupt generated every 1 ms (1 ms rate), the sampling driver collects information on the program being executed by the CPU at that moment.
Statistical analysis is performed on the collected program information. In the example shown in FIG. 14, the running programs are shown sorted by CPU usage rate. This makes it possible, for example, to know the time each program took to process.
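The statistical step described above can be pictured with a minimal sketch. It assumes each sample is reduced to a program name and that CPU usage is estimated as each program's share of the samples; the function name `cpu_usage_by_program` and the sample data are hypothetical and do not come from the publication.

```python
from collections import Counter


def cpu_usage_by_program(samples):
    """Estimate each program's CPU share from periodic samples.

    `samples` holds one program name per sampling interrupt.
    Returns (program, share) pairs sorted by descending share.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by sample count (descending), then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n / total) for name, n in ranked]


# Hypothetical samples taken every 1 ms while A, C, B, A ran in turn.
samples = ["A", "A", "C", "B", "B", "B", "A", "A"]
usage = cpu_usage_by_program(samples)
```

A program that finishes between two interrupts never appears in `samples`, which is the limitation the following paragraphs address.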

JP 11-143735 A
Japanese Patent Laid-Open No. 7-21061
JP 2004-30514 A

In recent years, however, as computer performance has improved, the execution time of the programs to be analyzed has become shorter; for example, a response may be required within 1 ms. That is, in the example shown in FIG. 13, the execution time of each running program, shown along the horizontal axis, becomes shorter, and there is a problem that overflow interrupts from the PMC may fail to collect enough of the information needed for time-series analysis.

One object of the present invention is to enable execution time to be calculated with high accuracy.
The invention is not limited to this object. Achieving operational effects that derive from the configurations shown in the embodiments described later, and that cannot be obtained with conventional techniques, can also be regarded as another object of the present invention.

To this end, the information processing apparatus includes: a measurement unit that uses performance measurement counters to measure a CPU cycle event and an executed-instruction-count event and calculates the number of cycles per instruction; a collection unit that collects a plurality of pieces of branch information in an analysis target program; an analysis unit that, based on the branch information collected by the collection unit, identifies the actual execution path on a disassembly list created from the analysis target program, extracts basic blocks each running from the instruction immediately after a given branch to the next branching instruction, and holds correspondence information between basic block identification information and the number of assembler instructions, based on the result of counting the assembler instructions included in each extracted basic block; and a calculation unit that calculates the execution time by multiplying the number of assembler instructions of each basic block included in the correspondence information by the number of cycles per instruction.

According to one embodiment, execution time can be calculated with high accuracy.

FIG. 1 is a diagram showing the functional configuration of an information processing apparatus as an example of an embodiment.
FIG. 2 is a diagram showing the hardware configuration of the information processing apparatus as an example of the embodiment.
FIG. 3 is a diagram illustrating branch information created by the branch trace support unit in the information processing apparatus as an example of the embodiment.
FIG. 4 is a diagram illustrating a disassembly list in the information processing apparatus as an example of the embodiment.
FIGS. 5(a) and 5(b) are diagrams for explaining a method of generating basic block address information in the information processing apparatus as an example of the embodiment.
FIG. 6 is a flowchart outlining a program performance analysis method in the information processing apparatus as an example of the embodiment.
FIG. 7 is a flowchart explaining the basic block extraction method used by the basic block extraction unit of the information processing apparatus as an example of the embodiment.
FIG. 8 is a diagram showing a modification of the basic block address information in the information processing apparatus as an example of the embodiment.
FIG. 9 is a flowchart explaining a modification of the basic block extraction method used by the basic block extraction unit of the information processing apparatus as an example of the embodiment.
FIG. 10 is a diagram illustrating part of the data collected when the disassembly list shown in FIG. 4 is executed.
FIG. 11 is a diagram illustrating part of the basic block information generated from the collected data illustrated in FIG. 10.
FIG. 12 is a diagram schematically showing the functional configuration of a performance profiler.
FIG. 13 is a diagram explaining the measurement phase of the performance profiler.
FIG. 14 is a diagram illustrating an analysis result.

Hereinafter, embodiments of the information processing apparatus, processing method, and processing program will be described with reference to the drawings. The embodiments described below are merely examples, and there is no intention to exclude the application of modifications and techniques not explicitly described in them. That is, the present embodiment can be implemented with various modifications without departing from its spirit. The figures are not meant to imply that only the components shown in them are provided; other functions and the like may be included.

FIG. 1 is a diagram illustrating the functional configuration of an information processing apparatus as an example of an embodiment, and FIG. 2 is a diagram illustrating its hardware configuration.
The information processing apparatus 1 according to the present embodiment is a computer that realizes various functions by executing programs.
As shown in FIG. 2, the information processing apparatus 1 includes a CPU 201, a memory 202, a display 205, a keyboard 206, a mouse 207, a medium reading device 208, and a storage device 209.

The display 205 is a display device that displays various information, for example a liquid crystal display device or a CRT (Cathode Ray Tube) display device.
The mouse 207 and the keyboard 206 are input devices that an operator uses to perform various inputs.
The memory 202 is a storage device including a ROM (Read Only Memory) and a RAM (Random Access Memory). A software program for program performance analysis and data for that program are written in the ROM of the memory 202. The software program in the memory 202 is read and executed by the CPU 201 as appropriate.

The RAM of the memory 202 is used as a primary storage memory or a working memory. It is a storage device that temporarily stores various data and programs, and it has a memory area (not shown).
When the CPU 201 executes a program, data and programs are temporarily stored and expanded in this memory area. For example, the memory area stores the basic block information tables T and T′ described later, the disassembly list, and the like.

The storage device 209 is a storage device such as a hard disk drive (HDD) or an SSD (Solid State Drive) and stores various data. The storage device 209 also stores an OS (Operating System), the object files of the program to be analyzed, and the like.
The medium reading device 208 is configured so that a recording medium RM can be loaded into it. With the recording medium RM loaded, the medium reading device 208 can read the information recorded on the recording medium RM. In this example, the recording medium RM is portable. The recording medium RM is a computer-readable recording medium, for example a flexible disk, a CD (CD-ROM, CD-R, CD-RW, etc.), a DVD (DVD-ROM, DVD-RAM, DVD-R, DVD+R, DVD-RW, DVD+RW, HD DVD, etc.), a Blu-ray disc, a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory.

The CPU 201 is a processing device that performs various kinds of control and computation, and it realizes various functions by executing the OS and programs stored in the memory 202. For example, the CPU 201 realizes the function of the performance analysis unit 211, as shown in FIG. 1.
That is, the CPU 201 functions as the performance analysis unit 211 by executing the performance analysis program.

The program (processing program) for realizing the function of the performance analysis unit 211 is provided, for example, in a form recorded on the recording medium RM described above. The computer reads the program from the recording medium RM, transfers it to an internal or external storage device, stores it there, and uses it. Alternatively, the program may be recorded in a storage device (recording medium) such as a magnetic disk, an optical disc, or a magneto-optical disk and provided from that storage device to the computer via a communication path.

To realize the function of the performance analysis unit 211, the program stored in the internal storage device (the RAM or ROM of the memory 202 in this embodiment) is executed by the microprocessor of the computer (the CPU 201 in this embodiment). The computer may instead read and execute the program recorded on the recording medium.
The performance analysis unit 211 analyzes the performance of programs executed on the information processing apparatus 1.

As shown in FIG. 1, the performance analysis unit 211 provides the functions of a branch trace support unit 22, a CPI actual measurement unit 23, a branch trace collection unit 24, an object collection unit 25, a disassembly unit 26, a basic block analysis unit 27, a basic block execution time calculation unit 28, and PMCs 21a to 21c.
The PMCs 21a to 21c are counters (performance event monitoring counters) that monitor and count predetermined events, and they generate interrupts by means of a register counter overflow interrupt. This interrupt may be referred to as an overflow interrupt or a sampling interrupt.

The PMC 21a measures the CPU cycle event and counts the number of frequency-synchronized cycles. The PMC 21b measures the executed-instruction-count event and counts the number of instructions.
The count results (count values) of the PMCs 21a and 21b are stored in a predetermined storage area (not shown) such as a register.
The PMC 21c counts a fixed number of cycles. The PMC 21c is set, for example based on the clock frequency of the CPU 201, to output an overflow interrupt each time a predetermined time elapses (for example, every 1 ms). The overflow interrupt output from the PMC 21c is input to the CPI actual measurement unit 23 and the branch trace collection unit 24, each of which uses it as a sampling trigger. That is, the overflow interrupt notification (sampling interrupt) from the PMC 21c serves as the trigger for information acquisition by the CPI actual measurement unit 23 and the branch trace collection unit 24.
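As a rough illustration of how a fixed-cycle counter can be tied to a time interval, the sketch below converts a sampling interval into a cycle count. The 2 GHz clock, the 48-bit counter width, and the convention that the counter counts upward and overflows when it wraps are all assumptions made for this example; the publication does not specify them, and the function names are hypothetical.

```python
def overflow_period_cycles(clock_hz: int, interval_us: int) -> int:
    """Number of CPU cycles that elapse in `interval_us` microseconds."""
    return clock_hz * interval_us // 1_000_000


def counter_preset(period_cycles: int, counter_bits: int = 48) -> int:
    """Initial value for an up-counting counter so that it wraps
    (overflows) after `period_cycles` increments.

    The up-counting behaviour and the 48-bit width are assumptions.
    """
    if not 0 < period_cycles < (1 << counter_bits):
        raise ValueError("period does not fit in the counter")
    return (1 << counter_bits) - period_cycles


# Assumed 2 GHz clock and the 1 ms interval mentioned in the text.
period = overflow_period_cycles(2_000_000_000, 1000)
preset = counter_preset(period)
```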

The functions of the PMCs 21a to 21c are realized by known functions of the CPU, and a detailed description of them is omitted.
The CPI actual measurement unit 23 functions as a measurement unit that, triggered by the overflow interrupt from the PMC 21c, collects the number of CPU cycles counted by the PMC 21a and the number of executed instructions counted by the PMC 21b. That is, the CPI actual measurement unit 23 actually measures the CPU cycle event and the executed-instruction-count event in the information processing apparatus 1.

The counter values of the PMCs 21a and 21b are reset each time the CPI actual measurement unit 23 acquires data.
The CPI actual measurement unit 23 also calculates the CPI from the number of CPU cycles and the number of executed instructions.
The CPI is the number of CPU cycles required to execute one instruction (the average number of execution cycles per instruction). The CPI actual measurement unit 23 calculates the CPI by dividing the measured number of CPU cycles by the measured number of executed instructions. A CPI calculated in this way from measured values of the number of CPU cycles and the number of executed instructions may be referred to as the measured CPI.
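The division described above can be written as a short sketch. It assumes each sampling interrupt yields one pair of counter readings (cycles, instructions) and that the counters are reset after every read, so each pair already covers exactly one interval; the function name `measured_cpi` and the sample values are hypothetical.

```python
def measured_cpi(cpu_cycles, instructions):
    """Measured CPI = CPU cycles / executed instructions.

    Returns None when no instruction was counted in the interval,
    since the ratio is undefined in that case.
    """
    if instructions <= 0:
        return None
    return cpu_cycles / instructions


# Hypothetical (cycles, instructions) readings, one per sampling interrupt.
samples = [(3_000_000, 2_000_000), (2_400_000, 1_200_000)]
cpis = [measured_cpi(cycles, insts) for cycles, insts in samples]
```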

The measured CPI calculated by the CPI actual measurement unit 23 is passed to the basic block execution time calculation unit 28 described later.
The branch trace support unit 22 collects the branch addresses in the program to be analyzed (the executed program). For example, the branch trace support unit 22 identifies branch addresses by detecting branch instructions in the program (for example, jump, call, and return).

The branch trace support unit 22 also collects the branch information that accompanies program execution by using a branch trace support function such as LBR (Last Branch Record) or BTS (Branch Trace Store). Specifically, the branch trace support unit 22 uses the branch trace support function to acquire the source (the address of the branch instruction from which the branch was taken) and the target (the address of the instruction branched to) of each executed branch, and saves them in a predetermined storage area.

FIG. 3 is a diagram illustrating the branch information created by the branch trace support unit 22 in the information processing apparatus 1 as an example of the embodiment. FIG. 3 shows an example in which LBR is used as the branch information.
For each branch that occurs in a program executed by the CPU 201, the branch trace support unit 22 acquires the source (the address of the branch instruction) and the target (the address of the branch destination instruction) and records them in a predetermined register.

As shown in FIG. 3, the branch trace support unit 22 records the instruction addresses (IP: Instruction Pointer) of the source (From) and the target (To) acquired for each branch as a pair (branch pair) in a dedicated cyclic register stack.
The branch trace support function of the branch trace support unit 22 can be realized by a known method.

For example, in an Intel (registered trademark) CPU, 16 pairs of branch addresses are recorded in a register stack called the LBR, and an index register called TOS (Top Of Stack), which indicates the most recent recording position, is updated at the same time. The TOS holds, as an index, one of the numbers 1 to 15 corresponding to the 16 pairs of branch addresses registered in the cyclic register stack.
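Because the register stack is cyclic, the recorded pairs have to be read starting just after the TOS position to recover chronological order. The sketch below illustrates this under several assumptions: the stack is modeled as a Python list that is already full, the TOS is a zero-based list index chosen purely as this sketch's own convention, and the function name `lbr_in_order` is hypothetical. It is not a description of any real register-access API.

```python
def lbr_in_order(entries, tos):
    """Return branch pairs from oldest to newest.

    `entries` is the cyclic stack as a list of (from_ip, to_ip) pairs
    in register order; `tos` is the zero-based index of the most
    recently written pair. Assumes every slot holds a valid pair.
    """
    n = len(entries)
    # The oldest pair sits in the slot right after TOS; wrap around.
    return [entries[(tos + 1 + i) % n] for i in range(n)]


# A hypothetical 4-slot stack whose newest pair is in slot 2.
stack = [(0x10, 0x20), (0x24, 0x40), (0x44, 0x60), (0x08, 0x10)]
ordered = lbr_in_order(stack, tos=2)
```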

The branch trace support function allows, for example, filtering by the type of branch instruction to be collected (unconditional branch, conditional branch, call, return, etc.) and by privilege level (OS mode, user mode), as well as issuing instructions to start and stop collection.
The function of the branch trace support unit 22 described above can be realized by a known method, and a detailed description of it is omitted. The branch addresses collected by the branch trace support unit 22 are passed to the branch trace collection unit 24.

The following describes an example in which LBR is used as the branch trace support function.
The branch trace collection unit 24 collects the branch information (the branch addresses) recorded in registers or the like by the branch trace support unit 22. That is, the branch trace collection unit 24 collects the execution history of programs on the information processing apparatus 1: it reads the branch information recorded in registers or the like by the branch trace support unit 22 and passes it to the basic block analysis unit 27. The branch trace collection unit 24 functions as a collection unit that collects a plurality of pieces of branch information in the analysis target program.

The object collection unit 25 acquires the object files of the program (file) to be analyzed. It reads each target object file from the storage device in which it is stored, such as the storage device 209, and copies (collects) it into a predefined folder (processing folder).
The disassembly unit 26 creates a disassembly list (instruction sequence) by disassembling each object file collected by the object collection unit 25.

FIG. 4 is a diagram illustrating a disassembly list in the information processing apparatus 1 as an example of the embodiment. The disassembly list illustrated in FIG. 4 shows part (test4 and test5) of a program that calls simple arithmetic functions in a nested manner in the order test1(), test2(), test3(), test4(), test5().
The disassembly unit 26 converts the program code into a disassembly list (instruction sequence) in a format, as shown in FIG. 4, that lets a person follow the flow of processing visually.

The basic block analysis unit 27 extracts basic blocks from the disassembly list of the program to be analyzed and counts the number of instructions (Instruction Count) included in each basic block. As shown in FIG. 1, the basic block analysis unit 27 includes a basic block extraction unit 271 and an instruction counting unit 272.
The basic block extraction unit 271 creates basic block information T based on the branch information collected by the branch trace collection unit 24.

FIGS. 5(a) and 5(b) are diagrams for explaining a method of generating basic block address information in the information processing apparatus 1 as an example of the embodiment; FIG. 5(a) illustrates the structure of the branch information, and FIG. 5(b) shows the structure of the basic block information T.
A basic block is the instruction sequence, within the executed program, between target N-1 and source N of the branch information shown in FIG. 5(a).

A basic block is a block of instructions that executed straight through the instruction sequence without branching partway, that is, the range (block) from the instruction immediately after one branch to the next instruction that branches. In branch information consisting of LBR data, the range from branch target N-1 to source N corresponds to this basic block. The basic block may also be referred to as an executed instruction block.

The basic block extraction unit 271 identifies the actual execution path on the disassembly list based on the branch information collected by the branch trace support unit 22. As shown in FIG. 5(b), the basic block extraction unit 271 registers the address values of target N-1 and source N read from the branch information in the basic block information T as the start address and the end address of a basic block.

The instruction sequence between target N-1 and source N is extracted as a basic block, and every instruction in that range has been retired (executed) once. None of the branches within that range, however, has been taken.
In the example shown in FIG. 5(b), the basic block information T is configured as a table that associates a start address and an end address with information identifying each basic block (for example, basic blocks 1 and 2). The start address and the end address are extracted from the branch information as a pair and recorded in the basic block information. The start address and end address forming a pair represent the range of one basic block in the disassembly list and function as basic block identification information.
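The pairing of target N-1 with source N can be sketched as follows. The sketch assumes the branch pairs are already in chronological order and represents each table row as a (start, end) tuple; merging repeated ranges into a single row is an assumption of this example rather than something stated above, and the function name `extract_basic_blocks` is hypothetical.

```python
def extract_basic_blocks(branch_pairs):
    """Derive basic block ranges from chronological branch pairs.

    `branch_pairs` is a list of (source, target) addresses. The block
    between branch N-1 and branch N starts at target[N-1] and ends at
    source[N]. Repeated ranges are kept only once (an assumption).
    """
    blocks = []
    seen = set()
    for (_, prev_target), (next_source, _) in zip(branch_pairs, branch_pairs[1:]):
        block = (prev_target, next_source)
        if block not in seen:
            seen.add(block)
            blocks.append(block)
    return blocks


# Hypothetical branch pairs; addresses are illustrative only.
pairs = [(0x100, 0x200), (0x210, 0x300), (0x320, 0x200), (0x210, 0x300)]
blocks = extract_basic_blocks(pairs)
```

With 16 recorded branch pairs, this pairing yields at most 15 basic block ranges per sample.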

As shown in FIG. 5(b), the basic block information T also records, for each basic block, the number of instructions included in that block as an analysis result. This number is counted by the instruction counting unit 272 described later and recorded in the basic block information T. Hereinafter, the basic block information T may be referred to as the basic block information table T.
The instruction counting unit 272 counts the number of instructions included in each basic block extracted by the basic block extraction unit 271. It counts the number of instructions (number of assembler instructions) in each basic block from the actual execution path on the disassembly list corresponding to that block.

The instruction count unit 272 refers to the basic block information (basic block information table) T created by the basic block extraction unit 271 and extracts each basic block from the disassembly list of the analysis target program by reading the corresponding pair of start address and end address.
The instruction count unit 272 then counts the number of instructions in each extracted basic block.
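The counting step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the disassembly list is assumed to be a list of (address, mnemonic) pairs, the end address is assumed to be inclusive (it holds the branch instruction that closes the block), and the names `count_instructions`, `disasm`, and `block_table` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical disassembly list: (address, mnemonic) pairs sorted by address.
DisasmList = List[Tuple[int, str]]


def count_instructions(disasm: DisasmList, start: int, end: int) -> int:
    """Count instructions whose address lies in [start, end], inclusive."""
    return sum(1 for addr, _ in disasm if start <= addr <= end)


# Made-up instruction stream for illustration only.
disasm: DisasmList = [
    (0x400C00, "push"),
    (0x400C01, "mov"),
    (0x400C04, "cmp"),
    (0x400C08, "jne"),
    (0x400C0A, "add"),
    (0x400C0E, "ret"),
]

# Hypothetical table T: block id -> (start address, end address).
block_table: Dict[int, Tuple[int, int]] = {
    1: (0x400C00, 0x400C08),
    2: (0x400C0A, 0x400C0E),
}

counts = {
    block_id: count_instructions(disasm, start, end)
    for block_id, (start, end) in block_table.items()
}
print(counts)
```

A linear scan keeps the sketch short; a real tool would more likely index the disassembly list by address.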

The basic block execution time calculation unit 28 calculates an execution time by multiplying the executed instruction count (IC: Instruction Count) of each basic block in the basic block information table T by the measured CPI calculated by the CPI actual measurement unit 23. Hereinafter, the executed instruction count may be referred to simply as the instruction count or IC, and the measured CPI simply as CPI.
Multiplying CPI by IC yields the number of CPU cycles (CPU Cycles, CPU Times), which expresses the time performance of the CPU. Time performance is an index that expresses performance as the time required for a given process, and CPU time performance is the time the CPU required for a given process.

In this embodiment, the number of CPU cycles is used to express time performance, but this is not a limitation, and other units such as seconds may be used. Cycles can be converted to seconds by calculating "Cycles / CPU frequency (Hz)".
The basic block execution time calculation unit 28 also records the calculated execution time back into the basic block information table T for each corresponding basic block.
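The unit conversion mentioned above, Cycles / CPU frequency (Hz), can be written out as a short sketch. The function name `cycles_to_seconds` and the 2.0 GHz clock are assumptions chosen for illustration and do not appear in the source.

```python
def cycles_to_seconds(cycles: float, cpu_hz: float) -> float:
    """Convert a CPU cycle count to seconds: cycles / frequency (Hz)."""
    if cpu_hz <= 0:
        raise ValueError("CPU frequency must be positive")
    return cycles / cpu_hz


# Example: a block that took 36.1 cycles on an assumed 2.0 GHz CPU.
ASSUMED_CPU_HZ = 2.0e9
seconds = cycles_to_seconds(36.1, ASSUMED_CPU_HZ)
print(f"{seconds * 1e9:.2f} ns")
```

The conversion assumes a fixed clock; on a CPU with dynamic frequency scaling, the frequency in effect during the sample would have to be used.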

An outline of the program performance analysis method in the information processing apparatus 1, configured as described above as an example of the embodiment, will be described with reference to the flowchart (steps S1 and S2) shown in FIG. 6.
First, in the measurement phase of step S1, the CPI actual measurement unit 23 and the branch trace collection unit 24 collect information, triggered by the overflow interrupt that the PMC 21c raises at every sampling interval.

That is, the CPI actual measurement unit 23 reads the number of cycles and the number of executed instructions from the PMCs 21a and 21b and calculates the measured CPI from these values.
The branch trace collection unit 24 collects the branch information (the branch addresses) recorded in registers or the like by the branch trace support unit 22. In doing so, the branch trace collection unit 24 collects the branch information for a plurality of branches in the disassembly list in a single batch.

Further, the object collection unit 25 collects the object file of the program that was running.
Next, in the analysis phase of step S2, the disassembly unit 26 creates a disassembly list (instruction sequence list) from the object file collected by the object collection unit 25. The basic block extraction unit 271 of the basic block analysis unit 27 then identifies the execution path on the instruction sequence list created by the disassembly unit 26 from the branch information collected by the branch trace collection unit 24. In other words, it identifies the basic blocks that were actually executed when the analysis target program ran. The basic block extraction unit 271 creates the basic block information table T. The method by which the basic block extraction unit 271 identifies and extracts basic blocks will be described later with reference to the flowchart shown in FIG. 7.

Further, the instruction count unit 272 obtains the number of actually executed instructions of each identified basic block by counting its instructions in the instruction sequence on the disassembly list.
The basic block execution time calculation unit 28 then obtains the execution time of each basic block by multiplying the measured CPI calculated by the CPI actual measurement unit 23 in step S1 by the instruction count of that block obtained by the instruction count unit 272 in step S2. The execution time obtained in this way for each basic block is recorded in a storage area such as the storage device 209 in association with the information identifying the basic block. The execution time information of each basic block is used, for example, for statistical analysis and is output as an analysis result such as that illustrated in FIG. 14.

Next, the basic block extraction method used by the basic block extraction unit 271 of the information processing apparatus 1 as an example of the embodiment will be described with reference to the flowchart (steps A1 to A6) shown in FIG. 7.
FIG. 7 shows a method of extracting basic blocks from LBR data and illustrates the case where the LBR stack is traversed in time order starting from the oldest entry. The branch trace collection unit 24 collects the 16 pairs of branch addresses that the branch trace support unit 22 recorded without any particular filter set.

In step A1, the basic block extraction unit 271 sets the variable N to the value of the TOS register and sets the variable i to an initial value of 1.
In step A2, the basic block extraction unit 271 determines whether wraparound in the LBR cyclic register is required by checking whether N + 2 is smaller than 16, the number of entries in the LBR register stack.

If N + 2 is less than 16 (see the YES route of step A2), in step A3 the basic block extraction unit 271 refers to the LBR record (branch information) for branch N and reads the values of target N + 1 and source N + 2. It then records, in the basic block information T, the value of target N + 1 as the start address of basic block N and the value of source N + 2 as the end address of basic block N. N is then incremented.

In step A5, the basic block extraction unit 271 checks whether i is less than 16. If i is 16 or more (see the NO route of step A5), the process ends. If i is less than 16 (see the YES route of step A5), i is incremented in step A6 and the process returns to step A2.
On the other hand, if N + 2 is 16 or more as a result of the determination in step A2 (see the NO route of step A2), wraparound processing in the LBR cyclic register is performed in step A4.

That is, the basic block extraction unit 271 refers to the LBR record (branch information) for branch N and reads the values of target 15 and source 0. It then records, in the basic block information T, the value of target 15 as the start address of basic block N and the value of source 0 as the end address of basic block N. N is then set to "-1", and the process proceeds to step A5.

Thereafter, the instruction count unit 272 counts the number of instructions included in each basic block and records it in the basic block information T shown in FIG. 5B, which completes the basic block information T.
In the embodiment described above, the basic block information T associates each basic block with a start address, an end address, and an instruction count, but the information is not limited to these items.
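The pairing of LBR records into basic blocks can be sketched as below. This is a simplified illustration under stated assumptions rather than a transcription of the FIG. 7 flowchart: the entry following the top of stack (TOS) is assumed to be the oldest record, modular indexing stands in for the explicit wraparound step A4, and only adjacent records are paired, so 16 records yield 15 blocks (the flowchart's own iteration count may differ). The names `extract_basic_blocks`, `sources`, and `targets` are hypothetical, and the addresses are made up.

```python
from typing import List, Tuple


def extract_basic_blocks(
    sources: List[int], targets: List[int], tos: int
) -> List[Tuple[int, int]]:
    """Pair target[k] (block start) with source[k + 1] (block end).

    Records are visited from oldest to newest, wrapping around the
    cyclic register stack with modular arithmetic.
    """
    depth = len(sources)
    if len(targets) != depth or not 0 <= tos < depth:
        raise ValueError("inconsistent LBR snapshot")
    blocks = []
    k = (tos + 1) % depth  # assumed position of the oldest record
    for _ in range(depth - 1):
        nxt = (k + 1) % depth
        # Execution ran from the destination of branch k up to the
        # instruction that took branch k + 1.
        blocks.append((targets[k], sources[nxt]))
        k = nxt
    return blocks


# Made-up snapshot of a 16-entry LBR stack with TOS = 5.
sources = [0x1000 + 0x10 * i for i in range(16)]
targets = [0x2000 + 0x10 * i for i in range(16)]
blocks = extract_basic_blocks(sources, targets, tos=5)
print(len(blocks), [hex(a) for a in blocks[0]])
```

The last block produced ends at the source address held in the TOS entry, which is the most recent branch in the snapshot.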

FIG. 8 is a diagram illustrating a modification of the basic block information in the information processing apparatus 1 as an example of the embodiment.
In the basic block information T′ shown in FIG. 8, a function name is additionally associated with the basic block information T shown in FIG. 5B.
The function name in the basic block information T′ indicates the function symbol executed in the basic block. For example, a symbol map of the object file to be processed is created in advance, and after the basic block extraction unit 271 has extracted a basic block, the basic block analysis unit 27 looks up the address value of target N + 1 of that basic block in the symbol map to obtain the corresponding function name. The symbol map can be created by a known technique, such as running the nm command, and a detailed description is omitted.
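The symbol map lookup can be sketched as follows. The sketch assumes the symbol map is given as text lines in the default `nm` layout (hexadecimal address, type letter, name) and keeps only text-section symbols; the sample symbol addresses are invented for illustration, and the helper names `build_symbol_map` and `lookup_function` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def build_symbol_map(nm_lines: List[str]) -> List[Tuple[int, str]]:
    """Parse 'address type name' lines, keeping text (code) symbols."""
    symbols = []
    for line in nm_lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("T", "t"):
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup_function(symbols: List[Tuple[int, str]], addr: int) -> Optional[str]:
    """Return the function whose start address is the nearest at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else None


# Invented nm-style output; only the block address 0x400c9a and the
# name test5 come from the example in the text.
nm_lines = [
    "0000000000400b50 T main",
    "0000000000400c10 T test5",
    "0000000000400d00 T test6",
    "0000000000601040 D data_var",
]
symbols = build_symbol_map(nm_lines)
print(lookup_function(symbols, 0x400C9A))
```

Because this lookup knows only function start addresses, any address above the last function is attributed to that function; a stricter version would also use symbol sizes.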

Including the function name in the basic block information T′ makes it possible to refer to the execution history in units of functions, which is highly convenient. For example, when a loop is repeated by a for statement, the LBR register stack fills up quickly.
When behavior analysis in units of functions is sufficient for the performance analysis of a program, the basic block information T′ is particularly convenient. In this case, it is desirable to collect only call/return branches by using the filtering function of a known branch trace support function.

A modification of the basic block extraction method used by the basic block extraction unit 271 of the information processing apparatus 1 as an example of the embodiment will be described with reference to the flowchart (steps A1 to A6, A31, and A41) shown in FIG. 9. The basic block extraction method according to the flowchart shown in FIG. 9 generates the basic block information T′ shown in FIG. 8.
FIG. 9 also shows a method of extracting basic blocks from LBR data and illustrates the case where the LBR stack is traversed in time order starting from the oldest entry. Here, the branch trace collection unit 24 collects 16 pairs of branch addresses that the branch trace support unit 22 has filtered down to call/return branches only.

The branch trace collection unit 24 collects one set (16 pairs) of LBR data, and a symbol map of the object file is created in advance.
In step A1, the basic block extraction unit 271 sets the variable N to the value of the TOS register and sets the variable i to an initial value of 1.
In step A2, the basic block extraction unit 271 determines whether wraparound in the LBR cyclic register is required by checking whether N + 2 is smaller than 16, the number of entries in the LBR register stack.

If N + 2 is less than 16 (see the YES route of step A2), in step A3 the basic block extraction unit 271 refers to the LBR record (branch information) for branch N and reads the values of target N + 1 and source N + 2. It then records, in the basic block information T′, the value of target N + 1 as the start address of basic block N and the value of source N + 2 as the end address of basic block N. N is then incremented.

In step A31, the basic block analysis unit 27 converts target N + 1 into a function name by looking it up in the symbol map and records the function name in the basic block information T′.
In step A5, the basic block extraction unit 271 checks whether i is less than 16. If i is 16 or more (see the NO route of step A5), the process ends. If i is less than 16 (see the YES route of step A5), i is incremented in step A6 and the process returns to step A2.

On the other hand, if N + 2 is 16 or more as a result of the determination in step A2 (see the NO route of step A2), wraparound processing in the LBR cyclic register is performed in step A4.
That is, the basic block extraction unit 271 refers to the LBR record (branch information) for branch N and reads the values of target 15 and source 0. It then records, in the basic block information T′, the value of target 15 as the start address of basic block N and the value of source 0 as the end address of basic block N. N is then set to "-1".

In step A41, the basic block analysis unit 27 converts target 15 into a function name by looking it up in the symbol map and records the function name in the basic block information T′. The process then proceeds to step A5.
Thereafter, the instruction count unit 272 counts the number of instructions included in each basic block and records it in the basic block information T′ shown in FIG. 8, which completes the basic block information T′.

FIG. 10 is a diagram illustrating part of the data collected when the disassembly list shown in FIG. 4 is executed, and FIG. 11 is a diagram illustrating part of the basic block information T′ generated from the collected data illustrated in FIG. 10. In the example shown in FIG. 11, the basic block information T′ is shown together with the execution time (Cycles; CPI × instruction count) calculated by the basic block execution time calculation unit 28 using the present method and the execution time (Cycles) obtained by a conventional simulation method. The execution time "19" of the conventional simulation method is calculated by multiplying the instruction count by "0.5", the ideal CPI value according to the specification (38 × 0.5 = 19).

The collected data shown in FIG. 10 illustrates three samples, with the data acquired for one sample shown on each row. In FIG. 10, detailed data for the second and third samples from the top is omitted.
For example, as the result collected by the CPI actual measurement unit 23, the instruction count is "312007" (see symbol P1) and the cycle count is "297491" (see symbol P2). The measured CPI is therefore 297491 / 312007 = 0.95 (see symbol P3). The latest TOS is "5".

As shown in FIG. 11, the start address (target 0) of basic block 1 is "400c9a" and its end address (source 1) is "400c17". The function name of the basic block in this range is "test5", and its instruction count is 38. In this case, CPI × instruction count (Cycles) is 0.95 × 38 = 36.1.
This Cycles value reflects actual execution more closely than the ideal value "19" given by the conventional simulation method.
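The arithmetic of this example can be reproduced with a few lines. All figures are taken from the text (312007 instructions, 297491 cycles, 38 instructions in the block, ideal CPI 0.5); the variable names are hypothetical, and rounding the CPI to two decimals before multiplying is an assumption made to match the 0.95 shown in the text.

```python
# Counter values from the sample in FIG. 10.
instructions = 312007   # executed instruction count (symbol P1)
cycles = 297491         # CPU cycle count (symbol P2)

# Measured CPI = cycles / instructions (symbol P3).
cpi = cycles / instructions

# Basic block 1 of FIG. 11 contains 38 instructions.
block_instruction_count = 38

# Execution time in cycles using the measured CPI, rounded as in the text.
measured_cycles = round(cpi, 2) * block_instruction_count

# Conventional estimate using the ideal CPI from the specification.
IDEAL_CPI = 0.5
ideal_cycles = IDEAL_CPI * block_instruction_count

print(f"CPI = {cpi:.2f}")
print(f"measured = {measured_cycles:.1f} cycles, ideal = {ideal_cycles:.0f} cycles")
```

With these inputs the measured estimate is roughly twice the ideal one, which is the gap the passage points to.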

As described above, according to the information processing apparatus 1 as an example of the embodiment, the basic block execution time calculation unit 28 calculates a measured value of the execution time for each basic block. This makes it possible to determine, for the analysis target program, the time required to process each of the basic blocks that constitute the program.
With branch trace support functions such as LBR and BTS, the time at which each branch pair in the branch information collected by one sampling was recorded is unknown. The information processing apparatus 1 nevertheless makes the execution time of each basic block available, which is highly convenient.

Even for processes that complete in a short time, the execution time can be calculated reliably in units of basic blocks, which enables highly accurate calculation of execution times and the collection of the information needed for performance analysis. For example, high-definition analysis of ultra-short processes of less than 100 μs becomes possible.
Because the time required to process each basic block can be determined, it is easy to find out, for example, in which process a wait (delay) occurred, which is highly convenient.

At this time, the CPI actual measurement unit 23, triggered by the overflow interrupt from the PMC 21c, reads the number of cycles and the number of executed instructions from the PMCs 21a and 21b and calculates the measured CPI from these values. That is, the basic block execution time calculation unit 28 can calculate the measured execution time of each basic block using the measured CPI calculated at each sampling. For example, when a cache miss or the like occurs, the measured CPI, being the ratio of cycles to instructions, temporarily rises in response. The analysis therefore reflects the actual processing state of the CPU 201 and is highly reliable.

Triggered by the overflow interrupt from the PMC 21c, the branch trace collection unit 24 can collect the branch information for the 16 LBR pairs in a single sampling. This enables high-definition analysis. For example, the resolution improves 17-fold compared with the conventional approach of using a performance profiler.
In addition, compared with the known instruction tracer method that uses a branch trap, which collects branch information by raising an interrupt at every branch, the collection overhead can be reduced to, for example, about 1/1000.

The disclosed technology is not limited to the embodiment described above and can be implemented with various modifications without departing from the spirit of the embodiment. The configurations and processes of the embodiment can be selected as needed or combined as appropriate.
For example, the embodiment described above shows an example in which the branch trace support unit 22 acquires the branch information using LBR, but this is not a limitation, and various modifications are possible. For example, the branch trace support unit 22 may acquire the branch information using BTS.

When BTS is used instead of LBR, it is desirable, when extracting basic blocks, to use the address of the storage area that holds the branch pairs or the like for the comparison, instead of comparing against 16, the number of entries in the LBR register stack (see FIGS. 7 and 9).
In the embodiment described above, the information processing apparatus 1 includes the object collection unit 25 and the disassembly unit 26, collects the object file of the analysis target program, and creates the disassembly list, but this is not a limitation. For example, an already generated disassembly list may be input from outside, and various other modifications are possible.

The above disclosure enables those skilled in the art to implement and manufacture the embodiment.
The following appendices are further disclosed with respect to the above embodiment.
(Appendix 1)
An information processing apparatus comprising:
a measurement unit that measures a CPU cycle event and an executed instruction count event using performance measurement counters and calculates a unit instruction cycle count;
a collection unit that collects a plurality of pieces of branch information in an analysis target program;
an analysis unit that, based on the branch information collected by the collection unit, identifies an actually executed path on a disassembly list created from the analysis target program, extracts basic blocks each extending from the instruction immediately after a given branch up to and including the next branching instruction, and holds correspondence information between basic block identification information and an assembler instruction count based on the result of counting the assembler instructions included in each extracted basic block; and
a calculation unit that calculates an execution time by multiplying the assembler instruction count of each basic block included in the correspondence information by the unit instruction cycle count.

(Appendix 2)
The information processing apparatus according to appendix 1, further comprising:
a collection unit that collects an object file of the analysis target program; and
a disassembly unit that creates the disassembly list from the object file collected by that collection unit.

(Appendix 3)
The information processing apparatus according to appendix 1 or 2, wherein the collection unit collects the plurality of pieces of branch information created by a branch trace support function by reading them in a single information collection.
(Appendix 4)
A processing method comprising:
a process of measuring a CPU cycle event and an executed instruction count event using performance measurement counters and calculating a unit instruction cycle count;
a process of collecting a plurality of pieces of branch information in an analysis target program;
a process of, based on the collected branch information, identifying an actually executed path on a disassembly list created from the analysis target program, extracting basic blocks each extending from the instruction immediately after a given branch up to and including the next branching instruction, and holding correspondence information between basic block identification information and an assembler instruction count based on the result of counting the assembler instructions included in each extracted basic block; and
a process of calculating an execution time by multiplying the assembler instruction count of each basic block included in the correspondence information by the unit instruction cycle count.

(Appendix 5)
The processing method according to appendix 4, further comprising:
a process of collecting an object file of the analysis target program; and
a process of creating the disassembly list from the collected object file.
(Appendix 6)
The processing method according to appendix 4 or 5, wherein the plurality of pieces of branch information created by a branch trace support function are collected by reading them in a single information collection.

(Appendix 7)
A processing program that causes a computer to execute processing comprising:
measuring a CPU cycle event and an executed instruction count event using performance measurement counters and calculating a unit instruction cycle count;
collecting a plurality of pieces of branch information in an analysis target program;
based on the collected branch information, identifying an actually executed path on a disassembly list created from the analysis target program, extracting basic blocks each extending from the instruction immediately after a given branch up to and including the next branching instruction, and holding correspondence information between basic block identification information and an assembler instruction count based on the result of counting the assembler instructions included in each extracted basic block; and
calculating an execution time by multiplying the assembler instruction count of each basic block included in the correspondence information by the unit instruction cycle count.

(Appendix 8)
The processing program according to appendix 7, further causing the computer to execute processing comprising:
collecting an object file of the analysis target program; and
creating the disassembly list from the collected object file.

(Appendix 9)
The processing program according to appendix 7 or 8, further causing the computer to execute processing of collecting the plurality of pieces of branch information created by a branch trace support function by reading them in a single information collection.

1 Information processing apparatus
21a, 21b, 21c PMC
22 Branch trace support unit
23 CPI actual measurement unit
24 Branch trace collection unit
25 Object collection unit
26 Disassembly unit
27 Basic block analysis unit
28 Basic block execution time calculation unit
271 Basic block extraction unit
272 Instruction count unit
T, T′ Basic block information

Claims

An information processing apparatus comprising:
a measurement unit that uses a performance measurement counter to measure a CPU cycle event and an executed instruction count event, and calculates a unit instruction cycle count;
a collection unit that collects a plurality of pieces of branch information in an analysis target program;
an analysis unit that, based on the branch information collected by the collection unit, identifies the actual execution path on a disassembly list created from the analysis target program, extracts basic blocks each running from the instruction immediately after a predetermined branch to the next branch instruction, and, based on the result of counting the number of assembler instructions included in each extracted basic block, holds correspondence information between basic block identification information and the number of assembler instructions; and
a calculation unit that calculates an execution time by multiplying the unit instruction cycle count by the number of assembler instructions of each basic block included in the correspondence information.
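For illustration only, the flow recited above can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the representation of branch information as `(from_address, to_address)` pairs in execution order, and the representation of the disassembly list as a sorted list of instruction start addresses are all hypothetical. The sketch also assumes that "the instruction immediately after a branch" is the branch target, and it reports the result in CPU cycles.

```python
from bisect import bisect_left


def unit_instruction_cycles(cpu_cycles, executed_instructions):
    """Unit instruction cycle count (cycles per instruction) from two counter readings."""
    return cpu_cycles / executed_instructions


def count_block_instructions(branches, instr_addrs):
    """Build the correspondence information: block identification -> instruction count.

    branches    : hypothetical list of (from_address, to_address), oldest first
    instr_addrs : sorted instruction start addresses taken from the disassembly list
    A block is identified here by (first address, address of the next branch).
    """
    table = {}
    for (_, start), (end, _) in zip(branches, branches[1:]):
        lo = bisect_left(instr_addrs, start)
        hi = bisect_left(instr_addrs, end)
        # Skip records that do not fall on known instruction boundaries.
        if lo >= len(instr_addrs) or instr_addrs[lo] != start:
            continue
        if hi >= len(instr_addrs) or instr_addrs[hi] != end or hi < lo:
            continue
        table[(start, end)] = hi - lo + 1  # inclusive of the closing branch
    return table


def block_execution_cycles(table, cpi):
    """Execution time of each block, in cycles: instruction count x cycles per instruction."""
    return {block: count * cpi for block, count in table.items()}
```

Converting the per-block cycle counts to wall-clock time would additionally require the CPU clock frequency, which is not part of this sketch.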

The information processing apparatus according to claim 1, further comprising:
an object collection unit that collects an object file of the analysis target program; and
a disassembly unit that creates the disassembly list from the object file collected by the object collection unit.
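As an illustration of how a disassembly list might be consumed, the following Python sketch parses a text listing into an address-to-instruction map. The listing format is an assumption: lines are taken to have the tab-separated shape `address:<TAB>bytes<TAB>mnemonic operands`, similar to the output of common disassemblers, and the function name `parse_disassembly_list` is hypothetical. The source does not specify a listing format or a disassembler.

```python
import re

# Assumed line shape: "  4004d6:<TAB>55 <TAB>push   %rbp"
# Lines without a mnemonic field (symbol headers, byte continuation lines) do not match.
_INSTR_LINE = re.compile(r"^\s*([0-9a-f]+):\t([0-9a-f ]+)\t(\S.*)$")


def parse_disassembly_list(text):
    """Return {instruction address: instruction text} for every instruction line."""
    instructions = {}
    for line in text.splitlines():
        match = _INSTR_LINE.match(line)
        if match:
            instructions[int(match.group(1), 16)] = match.group(3).strip()
    return instructions
```

The sorted keys of the returned map give the instruction start addresses needed to count the assembler instructions between two addresses.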

The information processing apparatus according to claim 1, wherein the collection unit collects the plurality of pieces of branch information created by a branch trace support function by reading them out in a single information collection.
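To illustrate why a single read of several branch records is useful, the following Python sketch models the branch trace support function as a fixed-size ring of `(from_address, to_address)` records. The ring model, the `newest_index` parameter, and both function names are hypothetical, and the ring is assumed to be completely filled with valid records. One read of N consecutive records yields N − 1 candidate basic blocks.

```python
def read_branch_stack(ring, newest_index):
    """Return all records of the ring in execution order (oldest first) from one read.

    ring         : hypothetical list of (from_address, to_address) records
    newest_index : position of the most recently written record
    """
    n = len(ring)
    return [ring[(newest_index + 1 + i) % n] for i in range(n)]


def blocks_from_snapshot(records):
    """Pair each branch target with the address of the next branch taken."""
    return [
        (to_addr, next_from)
        for (_, to_addr), (next_from, _) in zip(records, records[1:])
    ]
```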

A processing method comprising:
measuring a CPU cycle event and an executed instruction count event using a performance measurement counter, and calculating a unit instruction cycle count;
collecting a plurality of pieces of branch information in an analysis target program;
identifying, based on the collected branch information, the actual execution path on a disassembly list created from the analysis target program, extracting basic blocks each running from the instruction immediately after a predetermined branch to the next branch instruction, and holding, based on the result of counting the number of assembler instructions included in each extracted basic block, correspondence information between basic block identification information and the number of assembler instructions; and
calculating an execution time by multiplying the unit instruction cycle count by the number of assembler instructions of each basic block included in the correspondence information.

A processing program that causes a computer to execute a process comprising:
measuring a CPU cycle event and an executed instruction count event using a performance measurement counter, and calculating a unit instruction cycle count;
collecting a plurality of pieces of branch information in an analysis target program;
identifying, based on the collected branch information, the actual execution path on a disassembly list created from the analysis target program, extracting basic blocks each running from the instruction immediately after a predetermined branch to the next branch instruction, and holding, based on the result of counting the number of assembler instructions included in each extracted basic block, correspondence information between basic block identification information and the number of assembler instructions; and
calculating an execution time by multiplying the unit instruction cycle count by the number of assembler instructions of each basic block included in the correspondence information.