JP2011076389A

JP2011076389A - Information management program, information management method and information management device

Info

Publication number: JP2011076389A
Application number: JP2009227467A
Authority: JP
Inventors: Takayuki Matsui; 孝行松井; Satoshi Ogiwara; 聡荻原
Original assignee: Fujitsu Frontech Ltd
Current assignee: Fujitsu Frontech Ltd
Priority date: 2009-09-30
Filing date: 2009-09-30
Publication date: 2011-04-14
Anticipated expiration: 2029-09-30
Also published as: JP5313101B2

Abstract

PROBLEM TO BE SOLVED: To reduce the time required to identify the cause of a performance problem.
SOLUTION: An information management program causes a computer to function as: a generation means that generates performance information indicating the processing status of a processing unit, on the basis of trace information of the processing unit that executes processing, and associates the performance information with the trace information from which it was generated; a detection means that detects a predetermined event on the basis of the generated performance information; and an output means that, when the predetermined event is detected, outputs the performance information from which the event was detected and the trace information corresponding to that performance information.
COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an information management program, an information management method, and an information management apparatus for managing information.

For a processing device that includes a processing unit executing predetermined processing based on a program, one way to investigate the cause of a performance problem in the processing unit, such as a temporary degradation in the response of some processes, is to analyze performance information that indicates the processing status of the processing unit.

The performance information is, for example, information provided by the OS (Operating System) that manages the processing device. Specifically, it is generated by a counter that monitors the processing unit and counts values arising from the execution of processes in the processing unit, such as the number of write requests and the write request size.

JP 2006-227999 A
JP-A-6-59944
JP 2002-288005 A

However, the counter does not generate information that identifies the executed process or who caused the process to be executed. As a result, analyzing the performance information could not reveal whose processing, or what kind of processing, caused the performance problem. Depending on the type of performance problem, identifying the cause could therefore take a great deal of time.

In view of these points, an object of the present invention is to provide an information management program, an information management method, and an information management apparatus that reduce the time required to identify the cause of a performance problem.

To achieve the above object, the following information management program is provided.
The information management program causes a computer to function as: a generation means that generates performance information indicating the processing status of a processing unit based on trace information of the processing unit that executes processing, and associates the performance information with the trace information from which it was generated; a detection means that detects a predetermined event based on the generated performance information; and an output means that, when the predetermined event is detected, outputs the performance information from which the event was detected and the trace information corresponding to that performance information.

According to the disclosed information management program, the time required to identify the cause of a performance problem can be reduced.

FIG. 1 is a diagram illustrating an example of the information management apparatus according to the first embodiment.
FIG. 2 is a flowchart illustrating an example of a processing procedure of the information management apparatus according to the first embodiment.
FIG. 3 is a block diagram illustrating an example of the hardware of the information management apparatus according to the second embodiment.
FIG. 4 is a block diagram illustrating an example of the functions of the information management apparatus according to the second embodiment.
FIG. 5 is a diagram illustrating an example of trace information.
FIG. 6 is a diagram illustrating an example of trace information.
FIG. 7 is a diagram illustrating an example of trace information.
FIG. 8 is a diagram illustrating an example of a conversion table.
FIG. 9 is a diagram illustrating an example of performance information for the first period.
FIG. 10 is a diagram illustrating an example of performance information for the second period.
FIG. 11 is a flowchart illustrating an example of a processing procedure of the information management apparatus according to the second embodiment.
FIG. 12 is a flowchart illustrating an example of a processing procedure of the information management apparatus according to the second embodiment.
FIG. 13 is a diagram explaining an example of an investigation.
FIG. 14 is a diagram explaining an example of an investigation.
FIG. 15 is a diagram explaining an example of an investigation.
FIG. 16 is a diagram explaining an example of an investigation.

Hereinafter, embodiments will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a diagram illustrating an example of an information management apparatus according to the first embodiment. The information management apparatus 10 includes a processing unit 11, a trace information storage unit 12, a generation unit 13, a detection unit 14, and an output unit 15.

The processing unit 11 executes predetermined processing based on a program. The trace information storage unit 12 stores trace information indicating the history of processing in the processing unit 11. Here, the trace information includes information that identifies the processing and information that indicates who caused the processing to be executed. The trace information is provided by, for example, the OS (Operating System) that manages the information management apparatus 10.

The generation unit 13 generates performance information based on the trace information stored in the trace information storage unit 12. The performance information indicates the processing status of the processing unit 11 and includes, for example, indices of response speed and processing speed. For example, the performance information includes the number of write requests, the write request size, the CPU consumption rate, the memory usage, the IO operation rate, the number of read requests, and the read request size. Furthermore, the generation unit 13 associates the generated performance information with the trace information from which it was generated.

The detection unit 14 detects a predetermined event based on the performance information generated by the generation unit 13. The predetermined event is, for example, an anomalous event indicating an abnormal state that differs greatly from the normal state. In many cases, such an anomalous event manifests itself as a symptom of a performance problem.

When the detection unit 14 detects a predetermined event, the output unit 15 outputs the performance information from which the event was detected, together with the trace information corresponding to that performance information, to, for example, a storage unit or the display screen of a display device (not illustrated).

Note that the information management apparatus 10 does not necessarily have to include the processing unit 11 and the trace information storage unit 12. For example, the processing unit 11 and the trace information storage unit 12 can be provided in an external processing device connected to the information management apparatus 10 via a network or the like. In this case, the information management apparatus 10 acquires the trace information from the external processing device via the network or the like.

Next, the operation of the information management apparatus 10 will be described. FIG. 2 is a flowchart illustrating an example of a processing procedure of the information management apparatus according to the first embodiment.
[Step S11] When processing starts, the generation unit 13 generates performance information based on the trace information stored in the trace information storage unit 12.

[Step S12] The generation unit 13 associates the performance information generated in step S11 with the trace information from which it was generated.
[Step S13] The detection unit 14 attempts to detect a predetermined event based on the performance information generated in step S11. If a predetermined event is detected, the process proceeds to step S14. If no predetermined event is detected, the process ends.

[Step S14] When the detection unit 14 has detected a predetermined event in step S13, the output unit 15 outputs the performance information from which the event was detected and the trace information corresponding to that performance information, and the process ends.
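Steps S11 to S14 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation described in this publication: the names `TraceRecord`, `PerformanceInfo`, `generate`, `detect`, and `run`, the record fields, and the single threshold rule used as the "predetermined event" are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    # Hypothetical, simplified trace record (field names are assumptions).
    time: float
    process_id: str
    function: str
    size: int


@dataclass
class PerformanceInfo:
    write_requests: int
    write_bytes: int
    source: list  # trace records this summary was generated from (S12)


def generate(trace):
    """S11 and S12: summarise the trace and keep a link back to it."""
    writes = [r for r in trace if r.function == "fwrite"]
    return PerformanceInfo(
        write_requests=len(writes),
        write_bytes=sum(r.size for r in writes),
        source=list(trace),
    )


def detect(perf, max_write_bytes):
    """S13: assumed rule, flag when the written bytes exceed a threshold."""
    return perf.write_bytes > max_write_bytes


def run(trace, max_write_bytes):
    """Return (performance info, originating trace) on detection, else None."""
    perf = generate(trace)
    if detect(perf, max_write_bytes):
        return perf, perf.source  # S14: output both together
    return None
```

Because the summary carries a reference to its source records, whoever receives the output can go straight from the flagged value to the process IDs in the trace.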

The cause of a performance problem is investigated by examining the information output by the output unit 15. For example, when the output unit 15 outputs the information to the display screen of a display device, the user can carry out the investigation by viewing that screen.

In this way, the information management apparatus 10 associates the performance information with the trace information from which it was generated and, when a predetermined event is detected in the performance information, outputs the performance information from which the event was detected together with the trace information corresponding to that performance information.

As a result, when investigating the cause of a performance problem, the investigation can be traced back to the trace information. Because the trace information includes information that identifies the processing and information that indicates who caused the processing to be executed, examining the trace information makes it possible to determine in a short time whose processing, and what kind of processing, caused the performance problem.

Furthermore, because the information management apparatus 10 outputs only the performance information and trace information for which a predetermined event was detected, the scope of the investigation can be narrowed and the investigation time can be reduced further.

Next, a more concrete example of the information management apparatus 10 will be described as the second embodiment.
[Second Embodiment]
An information management apparatus according to the second embodiment will be described. First, the hardware of the information management apparatus will be described. FIG. 3 is a block diagram illustrating an example of the hardware of the information management apparatus according to the second embodiment.

The information management apparatus 100 as a whole is controlled by a CPU (Central Processing Unit) 101. A RAM (Random Access Memory) 102, a hard disk drive (HDD) 103, a graphic processing unit 104, and an input interface 105 are connected to the CPU 101 via a bus 106.

The RAM 102 temporarily stores at least part of the OS program and the application programs to be executed by the CPU 101. The RAM 102 also stores various data required for processing by the CPU 101. The HDD 103 stores the OS, application programs, and various data. A monitor 107 is connected to the graphic processing unit 104, which displays images on the screen of the monitor 107 in accordance with commands from the CPU 101. Input devices such as a keyboard 108a and a mouse 108b are connected to the input interface 105, which transmits signals sent from the keyboard 108a and the mouse 108b to the CPU 101 via the bus 106.

The processing functions of the information management apparatus 100 can be realized with this hardware. Next, the functions of the information management apparatus 100 will be described. FIG. 4 is a block diagram illustrating an example of the functions of the information management apparatus according to the second embodiment.

The information management apparatus 100 includes a processing unit 110, a trace information storage unit 120, a performance information generation unit 130, a performance information generation unit 140, an event detection unit 150, an information output unit 160, and a storage unit 170.

The processing unit 110 executes processing, such as predetermined processes and threads, based on a program. The processing unit 110 includes, for example, an API layer 111 that executes APIs (Application Program Interfaces), a cache layer 112 that temporarily stores information, and an IO layer 113 that inputs and outputs information.

The trace information storage unit 120 stores trace information indicating the history of processing in the processing unit 110. Here, the trace information storage unit 120 stores separate trace information for each of the API layer 111, the cache layer 112, and the IO layer 113.

Here, the trace information includes information that identifies a process or thread and information that indicates who caused the process or thread to be executed. The trace information is provided by, for example, the OS that manages the information management apparatus 100. Note that the trace information stored in the trace information storage unit 120 is overwritten by the next trace information after, for example, a certain time (about one minute) has elapsed.

The performance information generation unit 130 generates performance information based on the trace information stored in the trace information storage unit 120. The performance information indicates the processing status of the processing unit 110 and includes, for example, indices of response speed and processing speed. Specifically, the performance information includes, for example, the number of write requests, the write request size, the CPU consumption rate, the memory usage, and the IO operation rate. Furthermore, the performance information generation unit 130 stores the generated performance information in association with the trace information from which it was generated.

Specifically, the performance information generation unit 130 includes a conversion unit 131, a trace information saving unit 132, and a counter 133. The conversion unit 131 in turn includes a conversion table storage unit 134 that stores a conversion table for converting trace information into performance information.

Based on the trace information stored in the trace information storage unit 120, the conversion unit 131 refers to the conversion table stored in the conversion table storage unit 134 and performs a conversion operation that increments the counter 133. The conversion unit 131 performs this conversion operation once every first period (for example, 30 seconds). Each time, the conversion unit 131 takes the trace information for the immediately preceding first period (for example, 30 seconds) as the target of the conversion operation.

The trace information saving unit 132 stores the trace information for the first period that was the target of the conversion operation by the conversion unit 131. The trace information stored in the trace information saving unit 132 is erased from the trace information saving unit 132 once a predetermined period (for example, 10 minutes) has elapsed since it was stored.

The counter 133 stores values representing the performance information and counts according to the conversion operation of the conversion unit 131. That is, when the conversion unit 131 performs the conversion operation that increments the counter 133 based on the trace information for the first period, the performance information for the first period is generated and stored in the counter 133. The counter 133 consists of, for example, a plurality of counter units, one for each type of performance information. For example, the counter 133 includes a counter unit for the number of write requests, a counter unit for the write request size, a counter unit for the CPU consumption rate, and so on.

The generated performance information for the first period is stored in association with the trace information from which it was generated, so that the originating trace information can be identified from the performance information for the first period. This can be achieved, for example, by attaching to the performance information for the first period a search key for the originating trace information. The counter 133 is cleared to zero when, for example, the conversion unit 131 starts the next conversion operation after the first period has elapsed.

Next, the performance information generation unit 140 acquires and accumulates the performance information for the first period stored in the counter 133. Once the accumulated group of first-period performance information covers the second period, the unit aggregates the group to generate performance information for the second period. The performance information generation unit 140 then stores the generated performance information for the second period in association with the group of first-period performance information from which it was aggregated.

Specifically, the performance information generation unit 140 includes an aggregation unit 141, a performance information saving unit 142, and a counter 143. The aggregation unit 141 acquires the performance information for the first period stored in the counter 133 of the performance information generation unit 130 each time it is stored, that is, once every first period, and accumulates it as a group of first-period performance information.

Furthermore, once the accumulated group of first-period performance information covers the second period (for example, 10 minutes), the aggregation unit 141 aggregates the group by type of performance information, for example by number of write requests, write request size, and CPU consumption rate, and combines it into a single set. This generates the performance information for the second period (for example, 10 minutes).

If acquiring the performance information for the first period stored in the counter 133 causes the accumulated group of first-period performance information to exceed the second period, the aggregation unit 141 deletes the oldest accumulated first-period performance information. It then aggregates, by type of performance information, the second period's worth of first-period performance information, including the newly acquired portion.
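The accumulate, drop-oldest, and re-aggregate behaviour described for the aggregation unit 141 can be sketched as a fixed-length sliding window. This is a minimal sketch under assumptions: the class name `WindowAggregator`, its parameters, and the dictionary-based snapshot format are hypothetical, and plain summation is only appropriate for count-type values such as the number of write requests or the write request size (a rate such as CPU consumption would presumably need averaging instead).

```python
from collections import deque


class WindowAggregator:
    """Keeps the most recent first-period snapshots covering one second period."""

    def __init__(self, first_period_s=30, second_period_s=600):
        # Number of first-period snapshots that make up one second period.
        self.capacity = second_period_s // first_period_s
        # A bounded deque discards the oldest snapshot automatically.
        self.window = deque(maxlen=self.capacity)

    def add(self, snapshot):
        """Add one first-period snapshot; return per-type totals once full."""
        self.window.append(dict(snapshot))
        if len(self.window) < self.capacity:
            return None  # not yet a full second period
        totals = {}
        for snap in self.window:
            for name, value in snap.items():
                totals[name] = totals.get(name, 0) + value
        return totals
```

With the example periods in the text (30 seconds and 10 minutes), the window holds 20 snapshots; the snapshots remaining in `window` are the aggregation source that the second-period totals would be associated with.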

The performance information saving unit 142 stores the group of first-period performance information from which the aggregation unit 141 generated the performance information for the second period. The group of first-period performance information stored in the performance information saving unit 142 is erased from the performance information saving unit 142 once a predetermined period (for example, seven days) has elapsed since it was stored.

The counter 143 stores the performance information for the second period generated by the aggregation unit 141. The performance information for the second period is stored in association with the group of first-period performance information from which it was aggregated, so that this group can be identified from the performance information for the second period. This can be achieved, for example, by attaching to the performance information for the second period a search key for the originating group of first-period performance information. The counter 143 is cleared to zero when, for example, the aggregation unit 141 starts the next round of aggregation after the first period has elapsed.

Next, the event detection unit 150 compares the performance information for the first period stored in the counter 133 with the performance information for the second period stored in the counter 143, and detects a predetermined event based on the comparison result. The predetermined event is, for example, an anomalous event in which the performance information for the first period shows an abnormal value relative to the performance information for the second period. A concrete example is a case where the write request size over 30 seconds is very large, exceeding 10 MB, even though the CPU consumption rate over 10 minutes is not especially high, at less than 20%.

When the performance information for the first period shows an abnormal value relative to the broader performance information for the second period, which is longer than the first period, it is highly likely that a performance problem occurred in the processing during that first period.

Here, the event detection unit 150 includes a condition storage unit 151 that stores the conditions under which an event is identified as anomalous. The event detection unit 150 refers to the conditions stored in the condition storage unit 151 and detects a predetermined event from the comparison result.

In addition, if the events that appear when a performance problem occurs are known in advance, storing conditions that identify those events in the condition storage unit 151 makes it possible to improve the accuracy with which events accompanying a performance problem are detected.
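The stored conditions and the comparison of first-period against second-period performance information can be illustrated with a small rule table. The sketch below encodes only the concrete example given in the text (a 10-minute CPU consumption rate below 20% combined with a 30-second write request size above 10 MB). The names `CONDITIONS` and `detect_events`, the dictionary keys, and the reading of 10 MB as 10 × 1024 × 1024 bytes are assumptions made for illustration.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Each condition is (description, predicate over short- and long-period data).
# "short" is the first-period (e.g. 30 s) performance information and
# "long" is the second-period (e.g. 10 min) performance information.
CONDITIONS = [
    (
        "large write burst under low CPU load",
        lambda short, long: long["cpu_percent"] < 20
        and short["write_bytes"] > 10 * MB,
    ),
]


def detect_events(short, long, conditions=CONDITIONS):
    """Return the descriptions of all conditions that currently hold."""
    return [name for name, holds in conditions if holds(short, long)]
```

Keeping the conditions as data rather than hard-coded branches mirrors the idea of a condition storage unit: a condition for a known problem signature can be added without changing the detection logic.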

When the event detection unit 150 detects a predetermined event, the information output unit 160 acquires the performance information for the second period stored in the counter 143, the performance information for the first period stored in the counter 133, and the trace information for the immediately preceding first period stored in the trace information saving unit 132. The information output unit 160 then outputs the acquired performance information for the second period, performance information for the first period, and trace information to the storage unit 170.

The information output unit 160 may additionally acquire the group of first-period performance information stored most recently in the performance information saving unit 142 and the trace information stored in the trace information storage unit 120, and output them to the storage unit 170.

The performance information for the second period, the performance information for the first period, and the trace information saved in the storage unit 170 are displayed on, for example, the monitor 107 of the information management apparatus 100 illustrated in FIG. 3. At this time, for example, the performance information for the second period, the performance information for the first period, and the trace information are displayed in association with one another.

Next, the trace information stored in the trace information storage unit 120 will be described.
FIGS. 5 to 7 are diagrams illustrating examples of trace information.
Trace information 121, which indicates the processing history of the API layer 111 of the processing unit 110, is illustrated in FIG. 5, for example. The trace information 121 includes the time at which the processing was executed (TIME) and a process ID and thread ID that identify the process and thread. The trace information 121 further includes a caller address indicating who caused the processing to be executed, a type field identifying whether the record is the history of a processing request or of a processing response, a function name indicating the kind of processing, and parameters indicating the content of the processing.

For example, when program A (process N1, thread M1) writes 32 MB to file A, as shown in the top row of the trace information 121, the time at which the processing was executed is stored in TIME, "N1" is stored as the process ID, and "M1" is stored as the thread ID. In addition, "xxx1", the caller address of this processing, is stored as the caller address, "call" is stored as the type, "fwrite" is stored as the function name, and "file A, 1, 32M, 1" or the like is stored as the parameters.
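As an illustration, the API-layer record in this example could be modelled as a small immutable data structure. The sketch is hypothetical: the class name `ApiTraceRecord`, the field names, the `parse_size` helper, the placeholder timestamp, and the interpretation of "32M" as 32 × 1024 × 1024 bytes are assumptions, since the text does not specify storage formats or the actual time value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiTraceRecord:
    time: str            # TIME: when the processing was executed
    process_id: str      # identifies the process
    thread_id: str       # identifies the thread
    caller_address: str  # indicates who caused the processing
    kind: str            # type field: request ("call") or response
    function: str        # function name, e.g. "fwrite"
    params: tuple        # parameters describing the processing


def parse_size(text):
    """Convert a size such as '32M' to bytes (K/M/G suffixes, assumed binary)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


# The top-row example from the text; the timestamp is a placeholder.
record = ApiTraceRecord(
    time="00:00:00",
    process_id="N1",
    thread_id="M1",
    caller_address="xxx1",
    kind="call",
    function="fwrite",
    params=("file A", "1", "32M", "1"),
)
```

The process ID, thread ID, and caller address are exactly the fields that a plain counter discards, which is why keeping the record linked to the derived performance information matters for the investigation.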

Trace information 122, which indicates the processing history of the cache layer 112 of the processing unit 110, is illustrated in FIG. 6, for example. The trace information 122 includes the time at which the processing was executed (TIME) and a requester process ID and requester thread ID that identify the process and thread that requested the processing. The trace information 122 further includes a caller address indicating who caused the processing to be executed, a type field identifying whether the record is the history of a processing request or of a processing response, a function name indicating the kind of processing, and parameters indicating the content of the processing.

The trace information 123, which indicates the processing history of the IO layer 113 of the processing unit 110, is illustrated in FIG. 7, for example. The trace information 123 includes the time (TIME) at which a process was executed, and a request source process ID and a request source thread ID that identify the process and thread that requested the processing. The trace information 123 further includes a caller address that indicates who invoked the process, a classification that identifies whether the entry is a history of a processing request or of a processing response, a function name that indicates the type of process, and a parameter that indicates the processing content.
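The three trace formats described above share the same set of fields, so a single record type can illustrate them. The following is a minimal sketch only: the class name `TraceRecord`, the field names, the `layer` label, and the timestamp value are illustrative assumptions and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TraceRecord:
    """One row of trace information (hypothetical layout, modeled on FIGS. 5 to 7)."""
    layer: str                   # assumed label: "API", "cache" or "IO"
    time: float                  # TIME: when the process was executed
    process_id: str              # for the cache and IO layers: request source process ID
    thread_id: str               # for the cache and IO layers: request source thread ID
    caller_address: str          # who invoked the process
    classification: str          # "call" for a request; the response label is not given in the text
    function_name: str           # type of process, e.g. "fwrite"
    parameters: Tuple[str, ...]  # processing content


# Top row of the trace information 121: program A writes 32 MB to file A.
# The time value 0.0 is a placeholder, since the text does not state it.
top_row = TraceRecord(
    layer="API",
    time=0.0,
    process_id="N1",
    thread_id="M1",
    caller_address="xxx1",
    classification="call",
    function_name="fwrite",
    parameters=("file A", "1", "32M", "1"),
)
```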

Next, the conversion table stored in the conversion table storage unit 134 will be described. FIG. 8 is a diagram illustrating an example of the conversion table. The conversion table is divided by layer of the processing unit 110; here it is divided into, for example, an API layer, a cache layer, and an IO layer. For each layer, the conversion table specifies the function name of the trace information to be converted, the destination counter units in the counter 133, and the content of the conversion operation.

In the API layer column, for example, the function name is "fwrite", which indicates writing, and the destination counter units are the number of write requests (total), the number of write requests (process), the write request size (total), and the write request size (process). The conversion operations are "add 1 to the number of write requests (total)", "add 1 to the number of write requests (process ID)", "add the third parameter to the write request size (total)", and "add the third parameter to the write request size (process ID)".

In the cache layer column, for example, the function name is "cwrite", which indicates writing, and the destination counter units are the number of write requests (logical device) and the write request size (logical device). The conversion operations are "add 1 to the number of write requests (logical device)" and "add the third parameter to the write request size (logical device)".

In the IO layer column, for example, the function name is "iowrite", which indicates writing, and the destination counter units are the number of write requests (physical device) and the write request size (physical device). The conversion operations are "add 1 to the number of write requests (physical device)" and "add the third parameter to the write request size (physical device)".

For example, for trace information of the API layer 111 whose function name is "fwrite", the following conversion operation is performed based on the conversion table (API layer). That is, 1 is added to the counter unit indicating the number of write requests (total) and to the counter unit indicating the number of write requests (process ID) in the counter 133. In addition, the size of the written information is added to the counter unit indicating the write request size (total) and to the counter unit indicating the write request size (process ID) in the counter 133.
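The table-driven conversion described above can be sketched as follows. This is an illustration under stated assumptions, not the conversion unit 131 itself: the dictionary layout of `CONVERSION_TABLE`, the counter key names, the helper `parse_size`, and the reading of "32M" as 32 × 1024 × 1024 bytes are all hypothetical.

```python
from collections import defaultdict

MB = 1024 * 1024  # assumption: "M" means mebibytes


def parse_size(text):
    """Parse a size parameter such as "32M" into bytes (suffix handling is assumed)."""
    text = text.strip()
    if text.upper().endswith("M"):
        return int(text[:-1]) * MB
    return int(text)


# Hypothetical encoding of FIG. 8: layer -> function name -> conversion operations.
# Each operation is (counter name, scope, amount), where amount is "one" or "param3".
CONVERSION_TABLE = {
    "API": {"fwrite": [
        ("write_requests", "total", "one"),
        ("write_requests", "process", "one"),
        ("write_request_size", "total", "param3"),
        ("write_request_size", "process", "param3"),
    ]},
    "cache": {"cwrite": [
        ("write_requests", "logical_device", "one"),
        ("write_request_size", "logical_device", "param3"),
    ]},
    "IO": {"iowrite": [
        ("write_requests", "physical_device", "one"),
        ("write_request_size", "physical_device", "param3"),
    ]},
}


def apply_conversion(counters, layer, function_name, process_id, parameters):
    """Count up the counter units named by the table for one trace record."""
    for name, scope, amount in CONVERSION_TABLE.get(layer, {}).get(function_name, []):
        # Per-process counter units are keyed by the process ID of the record.
        key = (name, scope, process_id) if scope == "process" else (name, scope)
        counters[key] += 1 if amount == "one" else parse_size(parameters[2])
    return counters


counters = defaultdict(int)
apply_conversion(counters, "API", "fwrite", "N1", ["file A", "1", "32M", "1"])
```

Keeping the operations in a table rather than in code mirrors the role of the conversion table storage unit 134: a new function name can be supported by adding a table entry.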

Next, the performance information for the first period stored in the counter 133 will be described. FIG. 9 is a diagram illustrating an example of the performance information for the first period.
In the performance information for the first period, a counter value is associated with each type of performance information, that is, with each counter unit. Here, the types of performance information include, for example, the number of write requests and the write request size for all processes as a whole, for each process ID, for each logical device, and for each physical device. Each of these is associated with a counter value and a counter value per unit time.

The performance information for the first period further includes a generation period, which indicates the period required to generate that performance information. In the example shown in FIG. 9, the first period is 30 seconds, so the generation period is 30 seconds. A generation time is also attached to the performance information for the first period. This generation time can be used, for example, as a search key when searching for the corresponding trace information.
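As a small worked illustration of the relationship between a counter value, the generation period, and the counter value per unit time, the sketch below divides each counter value by the generation period in seconds. The text does not state this formula explicitly, so the division is an assumption, as are the function name `per_unit_time` and the sample figure of 540 MB.

```python
def per_unit_time(counter_values, generation_period_s):
    """Return each counter value divided by the generation period (assumed formula)."""
    if generation_period_s <= 0:
        raise ValueError("generation period must be positive")
    return {name: value / generation_period_s for name, value in counter_values.items()}


# Hypothetical first-period counters: 540 MB written in 90 requests over 30 seconds.
first_period = {"write_request_size_mb": 540, "write_requests": 90}
rates = per_unit_time(first_period, generation_period_s=30)
```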

Next, the performance information for the second period stored in the counter 143 will be described. FIG. 10 is a diagram illustrating an example of the performance information for the second period.
The performance information for the second period has the same format as the performance information for the first period shown in FIG. 9. In the example shown in FIG. 10, the second period is 10 minutes, that is, 600 seconds, so the generation period is 600 seconds. A generation time is also attached to the performance information for the second period. This generation time can be used, for example, as a search key when searching for the corresponding group of first performance information.

Next, the processing procedure of the information management apparatus 100 will be described. FIGS. 11 and 12 are flowcharts illustrating an example of the processing procedure of the information management apparatus according to the second embodiment. The description begins with FIG. 11.

The process starts every first period (for example, every 30 seconds).
[Step S110] The conversion unit 131 counts up the counter 133 based on the trace information for the first period stored in the trace information storage unit 120, and generates performance information for the first period. At this time, the conversion unit 131 refers to, for example, the conversion table stored in the conversion table storage unit 134.

[Step S120] The trace information saving unit 132 stores the trace information for the first period from which the performance information for the first period was generated in step S110.
[Step S130] The counter 133 stores the performance information for the first period generated in step S110 in association with the trace information from which it was generated.

[Step S140] When the performance information for the first period is stored in the counter 133 in step S130, the aggregation unit 141 acquires the performance information for the first period from the counter 133 and accumulates it. The subsequent steps are described with reference to FIG. 12.

[Step S150] The aggregation unit 141 determines whether the group of first-period performance information accumulated in step S140 has reached the second period (for example, 10 minutes). If it has, the process proceeds to step S160. If it has not, the process ends.

[Step S160] The aggregation unit 141 aggregates the group of first-period performance information to generate performance information for the second period.
[Step S170] The performance information saving unit 142 stores the group of first-period performance information from which the performance information for the second period was aggregated in step S160.

[Step S180] The counter 143 stores the performance information for the second period generated in step S160 in association with the group of first-period performance information from which it was aggregated.
[Step S190] The event detection unit 150 compares the performance information for the first period stored in the counter 133 in step S130 with the performance information for the second period stored in the counter 143 in step S180.

[Step S200] The event detection unit 150 detects a predetermined event based on the comparison result of step S190. At this time, the event detection unit 150 detects the predetermined event by referring to, for example, the conditions stored in the condition storage unit 151. If the predetermined event is detected, the process proceeds to step S210. If the predetermined event is not detected, the process ends.

[Step S210] The information output unit 160 outputs, to the storage unit 170, the second performance information stored in the counter 143 in step S180, the first performance information stored in the counter 133 in step S130, and the trace information stored in the trace information saving unit 132 in step S120. The process then ends. At this time, the information output unit 160 may additionally output to the storage unit 170 the group of first-period performance information stored in the performance information saving unit 142 in step S170 and the trace information stored in the trace information storage unit 120. The cause of a performance problem is investigated by examining the performance information for the second period, the group of first-period performance information, and the trace information stored in the storage unit 170.
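Steps S110 to S210 can be sketched as a single routine invoked once per first period. This is a simplified illustration, not the apparatus itself: the class name `InformationManager`, the callables `convert` and `detect`, the dictionary layouts, and the decision to clear the accumulated group after each aggregation are all assumptions, since the text does not say whether the second period slides or restarts.

```python
from collections import Counter

FIRST_PERIOD_S = 30     # example value from the text
SECOND_PERIOD_S = 600   # example value from the text
WINDOWS_PER_SECOND_PERIOD = SECOND_PERIOD_S // FIRST_PERIOD_S


class InformationManager:
    def __init__(self, convert, detect):
        self.convert = convert   # stands in for the conversion unit 131
        self.detect = detect     # stands in for the event detection unit 150
        self.accumulated = []    # first-period entries held by the aggregation unit 141
        self.saved = []          # stands in for the storage unit 170

    def on_first_period(self, trace_records):
        """Run one pass of S110 to S210; return True if an event was output."""
        # S110 to S130: generate first-period performance information and keep it
        # associated with the trace information it was generated from.
        first = {"performance": self.convert(trace_records), "trace": list(trace_records)}
        # S140: accumulate the first-period performance information.
        self.accumulated.append(first)
        # S150: end unless a full second period has been accumulated.
        if len(self.accumulated) < WINDOWS_PER_SECOND_PERIOD:
            return False
        # S160 to S180: aggregate the group into second-period performance information.
        total = Counter()
        for entry in self.accumulated:
            total.update(entry["performance"])
        group, self.accumulated = self.accumulated, []  # assumed: start a new group
        second = dict(total)
        # S190 to S200: compare the latest first-period values with the aggregate.
        if not self.detect(first["performance"], second):
            return False
        # S210: output the second and first performance information and the trace.
        self.saved.append({
            "second": second,
            "first": first["performance"],
            "trace": first["trace"],
            "first_group": [entry["performance"] for entry in group],
        })
        return True


# Usage with toy stand-ins: count records, and flag a window that alone accounts
# for more than half of the requests in the whole second period.
manager = InformationManager(
    convert=lambda records: {"write_requests": len(records)},
    detect=lambda first, second: first["write_requests"] * 2 > second["write_requests"],
)
results = [manager.on_first_period(["iowrite"]) for _ in range(19)]
results.append(manager.on_first_period(["iowrite"] * 100))
```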

Here, an example of investigating the cause of a performance problem from the information stored in the storage unit 170 will be described. FIGS. 13 to 16 are diagrams illustrating an example of the investigation.
FIG. 13 shows the 30-second performance information stored in the counter 133 and the 10-minute performance information stored in the counter 143. As shown in FIG. 13, suppose that the event detection unit 150 detects, from the 30-second and 10-minute performance information, an anomalous event in which the write request size for magnetic disk #1 over the 30 seconds is 18 MB/second even though the CPU consumption rate over the 10 minutes is only 11%, which is not particularly high. Here, magnetic disk #1 refers to a part of the area of the magnetic disk.

When this anomalous event is detected, the information output unit 160 acquires from the counter 133 the performance information for the first period in which the event was detected, and further acquires from the trace information saving unit 132 the trace information corresponding to that performance information. The information output unit 160 then outputs the acquired performance information for the first period and the trace information to the storage unit 170.

When the event detection unit 150 detects an anomalous event, the information management apparatus 100 notifies the user that the event has been detected, for example by displaying alarm information on the monitor 108 shown in FIG. 3. The user examines the storage unit 170 in response to this notification. The user may also examine the storage unit 170 periodically without relying on a notification from the information management apparatus 100, or may do so after noticing a problem such as slow operation while using the information management apparatus 100.

When examining the storage unit 170, the user refers to and analyzes the trace information for the same time window that is associated with the 30-second performance information. FIG. 14 shows the corresponding trace information of the IO layer. The trace information 123 shown in FIG. 14 records writes to magnetic disk #1. Because this anomalous event concerns the write request size, all cache buffers that were written are extracted from the write trace information (function name: iowrite). Here, cache buffers #2, #3, and #4 are extracted. Each of the cache buffers #2 to #4 refers to a part of the area of the cache buffer.

Next, based on the extracted cache buffers #2 to #4, the user examines the trace information 122 of the cache layer 112 for the same time window. If the extracted cache buffers #2 to #4 are not found in this trace information 122, the user goes back to the trace information 122 for the preceding 30 seconds, and then to the 30 seconds before that. Here, the extracted cache buffers #2 to #4 are found in the trace information 122 of the cache layer 112 for the preceding 30 seconds. FIG. 15 shows that trace information of the cache layer.

Next, the user analyzes the trace information 121 of the API layer 111 for the same time window as the trace information 122 of the cache layer 112 in which the cache buffers #2 to #4 were found. FIG. 16 shows that trace information of the API layer. The analysis shows that, at the same time as the cache buffers #2 to #4 were found, the corresponding process/thread issued an "fwrite" of as much as 32 MB in a single call. An inquiry to the developer of the process, identified from the caller address, revealed that the event occurred because a mistake in setting a program operating environment parameter caused the initial allocation of a file to be performed with an inappropriate size.
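The layer-by-layer investigation above (IO layer, then cache layer, then API layer) can be sketched as three lookups. The record layout below is hypothetical: the specification does not say how cache buffers, devices, or requesters are encoded in the trace parameters, so the dictionary keys, the function names, and the rule of stopping at the first window that contains any of the extracted buffers are assumptions made for illustration.

```python
def buffers_written(io_trace, device):
    """Collect the cache buffers written to a device by iowrite records."""
    return {r["buffer"] for r in io_trace
            if r["function"] == "iowrite" and r["device"] == device}


def find_cache_window(cache_windows, buffers):
    """Search cache-layer windows, newest first, for the extracted buffers.

    Returns (window index, matching records), or (None, []) if nothing matches.
    """
    for index, window in enumerate(cache_windows):
        hits = [r for r in window if r["buffer"] in buffers]
        if hits:
            return index, hits
    return None, []


def api_calls_for(api_window, cache_hits):
    """Return fwrite records issued by the processes/threads behind the cache hits."""
    requesters = {(r["process"], r["thread"]) for r in cache_hits}
    return [r for r in api_window
            if r["function"] == "fwrite" and (r["process"], r["thread"]) in requesters]


# Toy data: index 0 is the window of the anomalous event, index 1 the preceding 30 seconds.
io_trace = [{"function": "iowrite", "device": "disk#1", "buffer": b} for b in ("#2", "#3", "#4")]
cache_windows = [
    [{"function": "cwrite", "buffer": "#9", "process": "N2", "thread": "M2"}],
    [{"function": "cwrite", "buffer": b, "process": "N1", "thread": "M1"} for b in ("#2", "#3", "#4")],
]
api_windows = [
    [{"function": "fwrite", "process": "N2", "thread": "M2", "caller": "xxx2", "size": "1M"}],
    [{"function": "fwrite", "process": "N1", "thread": "M1", "caller": "xxx1", "size": "32M"}],
]

buffers = buffers_written(io_trace, "disk#1")
index, hits = find_cache_window(cache_windows, buffers)
suspects = api_calls_for(api_windows[index], hits) if index is not None else []
```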

In this way, the information management apparatus 100 stores the performance information for the first period in association with the trace information from which it was generated, and, when a predetermined event is detected in the performance information for the first period, outputs the stored performance information for the first period and the trace information to the storage unit 170.

As a result, when investigating the cause of a performance problem, the investigation can go all the way back to the trace information. Because the trace information includes information that identifies the process and thread and information that indicates who invoked the processing, examining the trace information makes it possible to identify in a short time who caused the performance problem and through what kind of processing.

Furthermore, in the information management apparatus 100, a predetermined event is detected from the performance information for the first period by comparing it with the performance information for the second period, which is obtained by accumulating and aggregating first-period performance information over the second period. This makes it possible to detect local abnormal values in the performance information for the first period and to improve detection accuracy.
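One way to picture the comparison is to normalize both values to a per-second rate and flag the short window when its rate is far above the long-window rate. This is only a sketch: the conditions held in the condition storage unit 151 are not specified in the text, so the ratio test, the default threshold of 3.0, and the function name `is_local_anomaly` are assumptions.

```python
def is_local_anomaly(first_value, first_period_s, second_value, second_period_s,
                     ratio_threshold=3.0):
    """Flag a first-period value whose rate is well above the second-period rate."""
    first_rate = first_value / first_period_s
    second_rate = second_value / second_period_s
    if second_rate == 0:
        # No activity over the long window: any short-window activity stands out.
        return first_rate > 0
    return first_rate / second_rate >= ratio_threshold


# Hypothetical figures: 540 MB in 30 s (18 MB/s) against 1200 MB in 600 s (2 MB/s).
spike = is_local_anomaly(540, 30, 1200, 600)
steady = is_local_anomaly(60, 30, 1200, 600)
```

Comparing rates rather than raw counter values keeps the 30-second and 600-second windows on the same scale, which is why a short burst that barely moves the 10-minute total can still be detected.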

Furthermore, in the information management apparatus 100, the performance information for the second period, the performance information for the first period, and the trace information stored in the storage unit 170 are limited to information for which a predetermined event was detected. The scope of the investigation can therefore be narrowed, further reducing investigation time. This configuration also allows the storage capacity of the storage unit 170 to be kept small.

As described above, the processing described so far can be realized by causing a computer to execute a predetermined program. In that case, a program describing the processing to be realized is provided. The program can be recorded on a computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical discs, magneto-optical recording media, and semiconductor memories. Magnetic recording devices include hard disk devices, flexible disks (FD), and magnetic tapes (MT). Optical discs include DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable), and CD-RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk).

To distribute the program, for example, portable recording media such as DVDs and CD-ROMs on which the program is recorded are sold. The program can also be stored in a storage device of a server computer and transferred from the server computer to other computers via a network.

A computer that executes the program stores, for example, the program recorded on a portable recording medium or the program transferred from the server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. In addition, each time a program is transferred from the server computer, the computer can execute processing according to the received program.

DESCRIPTION OF SYMBOLS
10 Information management apparatus
11 Processing unit
12 Trace information storage unit
13 Generation unit
14 Detection unit
15 Output unit

Claims

An information management program that causes a computer to function as:
generating means for generating performance information indicating a processing status of a processing unit based on trace information of the processing unit that executes processing, and associating the performance information with the trace information from which it was generated;
detecting means for detecting a predetermined event based on the generated performance information; and
output means for outputting, when the predetermined event is detected, the performance information in which the event was detected and the trace information corresponding to the performance information.

The information management program according to claim 1, wherein
the generating means generates first performance information for a first period based on the trace information for the first period, and associates the first performance information with the trace information from which it was generated,
the detecting means compares the first performance information with second performance information for a second period longer than the first period, and detects the predetermined event based on a comparison result, and
the output means outputs, when the predetermined event is detected, the first performance information in which the event was detected and the trace information corresponding to the first performance information.

The information management program according to claim 2, wherein the second performance information is obtained by aggregating a plurality of the performance information including the first performance information.

The information management program according to claim 1, wherein the output means outputs the performance information in which the event was detected and the trace information corresponding to the performance information to a storage unit.

The information management program according to any one of claims 1 to 4, wherein the performance information includes an index indicating a response speed and a processing speed.

The information management program according to claim 5, wherein the performance information includes a write request count and a write request size.

The information management program according to claim 1, wherein the generating means generates the performance information with reference to a conversion table for converting the trace information into the performance information.

The information management program according to claim 1, wherein the detecting means detects the predetermined event with reference to a condition identified as specific information.

An information management method in which a computer:
generates performance information indicating a processing status of a processing unit based on trace information of the processing unit that executes processing, and associates the performance information with the trace information from which it was generated;
detects a predetermined event based on the generated performance information; and
outputs, when the predetermined event is detected, the performance information in which the event was detected and the trace information corresponding to the performance information.

An information management apparatus comprising:
a generation unit that generates performance information indicating a processing status of a processing unit based on trace information of the processing unit that executes processing, and associates the performance information with the trace information from which it was generated;
a detection unit that detects a predetermined event based on the generated performance information; and
an output unit that outputs, when the predetermined event is detected, the performance information in which the event was detected and the trace information corresponding to the performance information.