JP5384136B2 - Failure analysis support system

Info

Publication number: JP5384136B2
Application number: JP2009036082A
Authority: JP
Inventors: 岳彦長野; 知彦茂岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-02-19
Filing date: 2009-02-19
Publication date: 2014-01-08
Anticipated expiration: 2029-02-19
Also published as: JP2010191738A

Description

The present invention relates to a failure analysis support system comprising a failure detection target device on which analysis target software is installed, and a failure detection device that detects failures occurring in the target software on the basis of information provided by the failure detection target device.

Conventionally, in the technical field of computer systems, a proposal has been made with the aim of providing an easy-to-use method of recording execution logs that reduces CPU load, a method of displaying the execution logs, a computer system using the display method, and a recording medium on which the display method is programmed and recorded. In that proposal, the state transitions in the execution log of a program in a computer system are displayed in time series, and the width along the time axis is varied according to the frequency of state transitions per unit time. In addition to the application (program), the proposal provides a lowest-priority task that checks whether each execution log entry once recorded in the log buffer is needed; entries are marked according to the result of this check, and only unmarked entries are transferred to another computer system (see, for example, Patent Document 1). Also known in this technical field are a method of analyzing a failure while stepping through execution and a method of outputting debug information and analyzing it afterwards (see, for example, Non-Patent Document 1). In addition, a method is known that uses a general event tracer to execute the trace that acquires information on the behavior of the analysis target (see, for example, Non-Patent Document 2). Furthermore, a method is known that uses a publicly available tool to implement the drawing (see, for example, Non-Patent Document 3).

Patent Document 1: Japanese Patent Laid-Open No. 10-333938
Non-Patent Document 1: Acquisition of execution history using SWAP mechanism (Information Processing Society of Japan research report [System Software and Operating System], Vol. 97, No. 20 (19970227))
Non-Patent Document 2: Measuring and Characterizing System Behavior Using Kernel-Level Event Logging
Non-Patent Document 3: http://sourceforge.net/projects/timedoctor

In recent years, as so-called consumer devices such as mobile phone terminals and television receivers (that is, embedded devices or embedded home appliance products) have become multifunctional, the scale of the software installed in them has tended to grow. In general, as software grows, the number of latent bugs grows with it, and defects caused by those latent bugs become more likely to surface after a product carrying the software has shipped. To avoid this situation, research is under way on software development methodologies such as model-based development (a development method in which specifications are expressed as models that can be simulated, and each process consists of repeated verification and correction by simulating the model), which aim to make large-scale software development both high in quality and efficient. Research is also under way on methodologies such as the software product line (a method of developing software by subdividing it into small units called domains). Developing large-scale software with methods such as model-based development or the software product line can suppress the number of bugs to some extent. However, none of these methods removes bugs from large-scale software completely. One reason is that, even when such a new method is introduced, past assets (conventional embedded devices and the conventional software installed on them) are reused as they are to develop the large-scale software, so latent bugs are carried over. Another reason is that the new methods are not applied thoroughly: the software to be developed is highly diverse because it is installed in many kinds of devices, and software engineers are required to develop these many kinds of software quickly, which places a heavy burden on them.

Conventionally, when a bug occurs in software, one of the two methods shown in Non-Patent Document 1 is used as a countermeasure: analyzing the failure while stepping through execution, or outputting debug information and analyzing it afterwards. The former, however, relies on interactive analysis and is therefore unsuited to debugging applications (programs) with timing constraints, such as timing-dependent behavior. The former also has the drawback that, unless the location of the bug is roughly known or its cause is known, the problem cannot be solved without running the target program again and again to reproduce the bug and finding its cause by trial and error. With the latter method, on the other hand, if the debug information needed for analysis cannot be narrowed down, the volume of output becomes enormous and the analysis itself becomes difficult. As a countermeasure for this case, a method has been proposed, as shown in Patent Document 1, for displaying the event log of a computer system: it makes the displayed log easier to read by dynamically changing the scale interval of the time axis both where events per unit time are dense and where they are sparse.

However, although the proposed method can improve the readability of the displayed log by increasing the density of the time-axis scale where the event log is sparse, it does not reduce the amount of data the user has to analyze. Consequently, when tracing of the target software runs for a long time and the amount of acquired event log grows accordingly, the time the user needs to analyze the event log inevitably grows in proportion to its volume. The method therefore cannot shorten the time a user needs to analyze the enormous event log produced by a long trace.

Accordingly, an object of the present invention is to provide a failure analysis support system that can shorten the time a user needs to analyze data when checking software for failures, even when a long-running trace of the software has produced an enormous amount of data.

A failure analysis support system according to the present invention comprises a failure detection target device on which analysis target software is installed, and a failure detection device that detects failures occurring in the analysis target software on the basis of information provided by the failure detection target device. The failure detection target device has a trace execution unit that executes a trace of the analysis target software. The failure detection device has: an information conversion unit that converts the information on the analysis target software traced by the trace execution unit, as output from the failure detection target device, into information in a format that can be analyzed efficiently; an information analysis unit that analyzes the information output from the information conversion unit on the basis of a selected failure detection method; and an information visualization processing unit that, in response to an information display request from the user, displays the information obtained from the analysis by the information analysis unit as visualized information.

In a preferred embodiment of the present invention, the information analysis unit calculates, from the result of analyzing the information, the time at which a failure occurred in the analysis target software.

In another embodiment, the selected failure detection method is one designated by the user from among a plurality of failure detection methods set in advance.

In another embodiment, the information converted by the information conversion unit is information on the trace results produced by the trace execution unit per fixed time interval.

In another embodiment, the information visualization processing unit uses information on the name of the failure detection method designated by the user as a key to look up the corresponding information among the information converted into the efficiently analyzable format by the information conversion unit. It extracts from that information the information on the failure occurrence time in the analysis target software and, on the basis of that time information, retrieves from the trace result information the information for the region near the location where the failure was detected.

In yet another embodiment, the information visualization processing unit uses the information for the region near the failure detection location, retrieved from the trace result information, together with the information converted into the efficiently analyzable format and output from the information conversion unit, to perform two kinds of processing: processing that draws the trace results for the region near the failure detection location, and processing that highlights the visualized image information produced by that drawing.

According to the present invention, it is possible to provide a failure analysis support system that can shorten the time a user needs to analyze data when checking software for failures, even when a long-running trace of the software has produced an enormous amount of data.

A functional block diagram showing the overall configuration of a failure analysis support system according to an embodiment of the present invention.
A functional block diagram showing the internal configuration of the failure detection unit shown in FIG. 1.
A functional block diagram showing the internal configuration of the visualization processing unit shown in FIG. 1.
An explanatory diagram showing an example of the data structure of metadata generated from CPU usage efficiency in the failure analysis support system shown in FIG. 1.
An explanatory diagram showing an example of the data structure of metadata generated from per-process CPU usage efficiency in the failure analysis support system shown in FIG. 1.
An explanatory diagram showing one form of a peak detection result visualized by the visualization processing unit shown in FIG. 3.
A flowchart showing an example of the sequence of peak detection processing performed by the failure detection processing unit when detecting peak locations from the CPU load.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a functional block diagram showing the overall configuration of a failure analysis support system according to an embodiment of the present invention.

As shown in FIG. 1, the failure analysis support system includes a target system 100 and a host computer (hereinafter referred to as the "host device") 300. The target system 100 and the host device 300 are connected either through a communication network such as an intranet or the Internet, or through a general (standard) signal line so that each can directly receive the electrical signals transmitted by the other.

The target system 100 includes analysis target software 1, a trace execution unit 3, and a trace result transmission unit 5. Here, the target system 100 is a multifunctional consumer device of the kind described above (an embedded device such as a mobile phone terminal or a television receiver). The target system 100 incorporates hardware with information processing functions such as arithmetic processing, information storage, and information input/output, and this hardware carries the large-scale software that realizes the multiple functions. The analysis target software 1 is this large-scale software. The trace execution unit 3 is realized on the arithmetic processing hardware by software held in the information storage hardware, and the trace result transmission unit 5 is realized on the information input/output hardware.

The trace execution unit 3 takes the analysis target software 1 as input and executes a trace on it to acquire information on its behavior, and outputs the acquired behavior information to the trace result transmission unit 5 as information on the trace result of the analysis target software 1. In this embodiment, a general event tracer such as that disclosed in Non-Patent Document 1 is used as the trace execution unit 3. Alternatively, the program counter trace function of a general CPU, acquired with an ICE (in-circuit emulator: debugging hardware that controls the CPU from outside to halt the whole system and inspect the contents of memory), may be used as the trace execution unit 3.

The trace result transmission unit 5 receives the trace result information output from the trace execution unit 3 and transmits it to the host device 300.

The host device 300 analyzes the analysis target software 1 on the basis of the trace information, output from the target system 100, of the software running on that system, that is, the analysis target software 1. The host device 300 includes a meta information recording unit 7, a trace result recording unit 9, a trace result writing unit 11, a trace result receiving unit 13, a meta information writing unit 15, a failure detection unit 17, a visualization processing unit 19, a user request collection unit 21, and a user request holding unit 23. The hardware resource of the host device 300 corresponding to the meta information recording unit 7 and the trace result recording unit 9 is the memory provided in the host device 300. The trace result writing unit 11, the meta information writing unit 15, the failure detection unit 17, the visualization processing unit 19, the user request collection unit 21, and the user request holding unit 23 are realized on the CPU (the arithmetic processing hardware) by software (application programs) held in that memory (the information storage hardware). The trace result receiving unit 13 is realized on the input/output interface (the information input/output hardware) of the host device 300.

The trace result receiving unit 13 receives the trace result information transmitted from the trace result transmission unit 5 of the target system 100 and outputs it to the trace result writing unit 11. The trace result writing unit 11 receives the trace result information output from the trace result receiving unit 13 and writes it to the trace result recording unit 9. The trace result recording unit 9 holds the trace result information written by the trace result writing unit 11 and, in response to a trace result read request from the failure detection unit 17, outputs the held trace result information to the failure detection unit 17 and to the visualization processing unit 19.

The user request collection unit 21 receives the information on the plurality of failure detection methods output from the failure detection unit 17 and outputs it, in a form the user can recognize, to, for example, a display unit that is a man-machine interface under the control of the CPU of the host device 300. When the user 24 operates the operation unit of the host device 300, which is also a man-machine interface, and a command signal is input from the operation unit, the user request collection unit 21 determines from the command signal which of the displayed failure detection methods the user 24 has designated. As a result of this determination, the user request collection unit 21 outputs the specific failure detection method selected from among the plurality of methods to the user request holding unit 23 as the failure detection method designated by the user 24.

The user request holding unit 23 holds, as a user request, the failure detection method designated by the user 24 and output from the user request collection unit 21, and outputs the information on that method to the failure detection unit 17 in response to a read request from the failure detection unit 17. Likewise, in response to a read request from the visualization processing unit 19, the user request holding unit 23 outputs the held failure detection method to the visualization processing unit 19.

The meta information recording unit 7 holds the meta information written by the meta information writing unit 15 (the information on failures and the information on analysis described later) and outputs the held meta information to the visualization processing unit 19 in response to a meta information read request from the visualization processing unit 19.

The failure detection unit 17 holds information on a plurality of failure detection methods in advance and outputs this information to the user request collection unit 21 at an appropriate time, for example when the host device 300 starts up. The failure detection unit 17 receives the information on the designated failure detection method output from the user request holding unit 23 and the trace result information output from the trace result recording unit 9. It then analyzes the trace result information on the basis of the failure detection method information and thereby detects failures that occurred in the trace result. The failure detection unit 17 outputs the information on the detected failures and the information on the analysis of the trace result to the meta information writing unit 15.

The meta information writing unit 15 receives the failure information and the analysis information output from the failure detection unit 17 and writes them to the meta information recording unit 7 as meta information. The visualization processing unit 19 receives the meta information output from the meta information recording unit 7, the trace result information output from the trace result recording unit 9, and the information on the designated failure detection method output from the user request holding unit 23. On the basis of the designated failure detection method, it then performs processing to visualize (render as a visible image) the location in the analysis target software 1 where the failure occurred, as derived from the meta information, and the state in its vicinity.

FIG. 2 is a functional block diagram showing the internal configuration of the failure detection unit 17 shown in FIG. 1.

As described above, the failure detection unit 17 analyzes the trace result information of the analysis target software 1 recorded in the trace result recording unit 9 on the basis of the information, held in the user request holding unit 23, on the failure detection method designated by the user 24, and calculates from the analysis result the time at which a failure occurred in the analysis target software 1. As shown in FIG. 2, the failure detection unit 17 includes a failure detection method recording unit 25, a processed data storage unit 27, a failure detection method retrieval unit 29, a trace result retrieval unit 31, a trace result processing unit 33, and a failure detection processing unit 35.

The failure detection method recording unit 25 holds a plurality of failure detection processing programs, one for each of the plurality of failure detection methods, and outputs to the failure detection method retrieval unit 29 the program that the retrieval unit 29 has looked up from among them. The failure detection method retrieval unit 29 receives the information on the designated failure detection method output from the user request holding unit 23 and, on the basis of that information, looks up the corresponding program among the failure detection processing programs held in the failure detection method recording unit 25. It then receives that one program from the failure detection method recording unit 25 and outputs it to the failure detection processing unit 35.
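The lookup of a detection program by method name can be sketched as a small registry. This is only an illustration under assumptions: the class `DetectionMethodStore`, the method name `"cpu_peak"`, and the record fields `time` and `cpu_usage` are hypothetical and do not come from the specification.

```python
from typing import Callable, Dict, List

# Hypothetical detector signature: processed records in, failure times out.
Detector = Callable[[List[dict]], List[float]]


class DetectionMethodStore:
    """Illustrative stand-in for the failure detection method recording unit 25."""

    def __init__(self) -> None:
        self._programs: Dict[str, Detector] = {}

    def register(self, name: str, program: Detector) -> None:
        # Store one detection program under its method name.
        self._programs[name] = program

    def names(self) -> List[str]:
        # The list that would be offered to the user for selection.
        return sorted(self._programs)

    def lookup(self, name: str) -> Detector:
        # Role of the retrieval unit 29: method name -> program.
        try:
            return self._programs[name]
        except KeyError:
            raise KeyError(f"no failure detection program named {name!r}") from None


def cpu_peak(records: List[dict]) -> List[float]:
    # Assumed example rule: flag every record at or above 90 % CPU usage.
    return [r["time"] for r in records if r["cpu_usage"] >= 0.9]


store = DetectionMethodStore()
store.register("cpu_peak", cpu_peak)
selected = store.lookup("cpu_peak")
```

Keeping the programs behind a name-keyed store means the user's choice can be held as a plain string, which matches the way the method name is later reused as a lookup key for the meta information.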

After the failure detection method retrieval unit 29 has looked up the failure detection processing program, the trace result retrieval unit 31 reads, from the trace result information of the analysis target software 1 recorded in the trace result recording unit 9, the trace result information for each fixed time interval, and continues doing so until the processing of the analysis target software 1 is complete. It outputs the information it has read (the per-interval trace result information) to the trace result processing unit 33. The trace result processing unit 33 reads the per-interval trace result information output from the trace result retrieval unit 31 and converts it into information in an efficiently analyzable format (that is, metadata), which it writes to the processed data storage unit 27. The processed data storage unit 27 temporarily holds the information written by the trace result processing unit 33 and outputs it to the failure detection processing unit 35 in response to a data read request from the failure detection processing unit 35.
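One way to picture the conversion into metadata is to reduce raw trace records to one CPU usage value per fixed interval. The sketch below assumes that the trace consists of run slices with start and end times; the `RunSlice` type, the field names, and the choice of CPU usage as the metric are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RunSlice:
    """Hypothetical trace record: a process ran on the CPU from start to end."""
    start: float
    end: float
    process: str


def to_metadata(slices: List[RunSlice], interval: float) -> List[dict]:
    """Reduce run slices to one CPU usage value per fixed time interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    busy: Dict[int, float] = {}
    for s in slices:
        idx = int(s.start // interval)
        # A slice may span several intervals; split its time between them.
        while idx * interval < s.end:
            lo = max(s.start, idx * interval)
            hi = min(s.end, (idx + 1) * interval)
            busy[idx] = busy.get(idx, 0.0) + (hi - lo)
            idx += 1
    # Intervals without any activity are simply omitted from the result.
    return [
        {"time": i * interval, "cpu_usage": busy[i] / interval}
        for i in sorted(busy)
    ]


example = to_metadata(
    [RunSlice(0.0, 0.5, "a"), RunSlice(0.75, 1.25, "b")],
    interval=1.0,
)
```

The result has one small record per interval regardless of how many raw events fell inside it, which is what makes later detection passes cheaper than scanning the full trace.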

The failure detection processing unit 35 reads the one failure detection processing program output from the failure detection method assigning unit 29 and also reads the information output from the processed data storage unit 27. It then starts the program and, based on the information read, executes processing to detect failures (data) in the analysis target software 1. The failures (data) detected are output from the failure detection processing unit 35 to the meta information writing unit 15.
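The hand-off among units 25, 29, and 35 can be pictured as a lookup in a table of detection programs followed by a call to the program found. The following is only a minimal sketch under assumptions: the record layout, the function names, and the 80% threshold are hypothetical and do not come from the specification.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one processed trace sample as a plain dict.
Record = Dict[str, float]
Detector = Callable[[List[Record]], List[Record]]


def detect_cpu_peaks(records: List[Record]) -> List[Record]:
    """Illustrative detection program: flag samples at or above 80% (assumed threshold)."""
    return [r for r in records if r["cpu_usage"] >= 80.0]


# Plays the role of recording unit 25: detection method name -> detection program.
DETECTION_PROGRAMS: Dict[str, Detector] = {"CPU usage rate": detect_cpu_peaks}


def assign_detection_program(method_name: str) -> Detector:
    """Plays the role of assigning unit 29: retrieve the program for the designated method."""
    try:
        return DETECTION_PROGRAMS[method_name]
    except KeyError:
        raise ValueError(f"unknown failure detection method: {method_name!r}") from None


def run_detection(method_name: str, processed: List[Record]) -> List[Record]:
    """Plays the role of processing unit 35: run the retrieved program on the processed data."""
    program = assign_detection_program(method_name)
    return program(processed)


# Stand-in for the contents of the processed data storage unit 27.
processed_data = [
    {"time": 2, "cpu_usage": 50.0},
    {"time": 3, "cpu_usage": 80.0},
    {"time": 4, "cpu_usage": 90.0},
]
failures = run_detection("CPU usage rate", processed_data)
```

Keeping the programs in a table keyed by method name means a new detection method can be added without touching the code that runs it, which is one plausible reading of why the specification separates the recording, assigning, and processing roles.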

FIG. 3 is a functional block diagram showing the internal configuration of the visualization processing unit 19 shown in FIG. 1.

As shown in FIG. 3, the visualization processing unit 19 includes a meta information assigning unit 37, a trace result assigning unit 39, a display location extraction unit 41, and a drawing execution unit 43. The meta information assigning unit 37 retrieves the corresponding meta information from the meta information held in the meta information recording unit 7, using as a key the information on the failure detection method name (that is, the failure detection processing program name) designated by and output from the display location extraction unit 41. The retrieved meta information is output from the meta information assigning unit 37 to the display location extraction unit 41.

The trace result assigning unit 39 retrieves, from the trace result information held in the trace result recording unit 9, the trace result information near the location where a failure in the analysis target software 1 was detected, using as a key the time information designated by the display location extraction unit 41. The retrieved trace result information is output from the trace result assigning unit 39 to the display location extraction unit 41. The display location extraction unit 41 reads the information on the failure detection method designated by the user 24, which is output from the user request holding unit 23, that is, the failure detection processing program corresponding to the designation by the user 24, and outputs the information on the name of that program to the meta information assigning unit 37 as the information on the designated failure detection method name. The display location extraction unit 41 also reads the meta information output from the meta information assigning unit 37 and, by starting the failure detection processing program, extracts from the meta information the time information of the failure that occurred in the analysis target software 1, and outputs the extracted time information to the trace result assigning unit 39. The display location extraction unit 41 further reads the trace result information retrieved by and output from the trace result assigning unit 39 as the extraction result for the trace results near the failure detection location. It then outputs the extraction result and the meta information to the drawing execution unit 43.
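The two keyed lookups performed for the visualization (meta information by detection method name, then trace results by failure time) can be sketched as follows. This is an illustration under assumptions only: the dictionary layouts, the function names, and the window of one time unit on either side of the failure are hypothetical, since the specification does not say how wide the "vicinity" is.

```python
from typing import Dict, List


def failure_times(meta: Dict) -> List[int]:
    """Role of display location extraction unit 41: times whose failure flag is on."""
    return [t for t, flag in zip(meta["times"], meta["failure_flags"]) if flag]


def trace_near(trace: List[Dict], t: int, window: int = 1) -> List[Dict]:
    """Role of trace result assigning unit 39: trace records within `window` of time t."""
    return [e for e in trace if abs(e["time"] - t) <= window]


# Stand-in for meta information recording unit 7, keyed by detection method name.
meta_store = {
    "CPU usage rate": {
        "times": [0, 1, 2, 3, 4, 5],
        "failure_flags": [False, False, False, True, True, False],
    }
}
# Stand-in for trace result recording unit 9 (event names are made up).
trace_store = [{"time": t, "event": f"event-{t}"} for t in range(6)]

meta = meta_store["CPU usage rate"]          # lookup by method name (unit 37)
times = failure_times(meta)                  # time information extracted by unit 41
nearby = {t: trace_near(trace_store, t) for t in times}  # lookups by time (unit 39)
```

The pair `(nearby, meta)` corresponds to what unit 41 passes to the drawing execution unit 43: the extraction result plus the meta information.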

The drawing execution unit 43 receives the extraction result and the meta information output from the display location extraction unit 41 and, based on them, performs processing to draw the trace results near the failure detection location and processing to highlight the failure detection location in the drawn result. The visualized information produced by this processing is output from the drawing execution unit 43 to the screen display unit 45 for display.

In this embodiment, the drawing execution unit 43 is assumed to be implemented with, for example, a visualization function such as that disclosed in Non-Patent Document 2 cited above, a public tool such as that disclosed in Non-Patent Document 3, or an equivalent of either.

FIG. 4 is an explanatory diagram showing an example of the data structure of metadata generated from the CPU usage efficiency in the failure analysis support system shown in FIG. 1.

As shown in FIG. 4, the metadata includes a failure detection method information recording column 51, a time information recording column 53, a CPU usage rate information recording column 55, and a failure information registration column 57. The failure detection method information recording column 51 records the failure detection method designated by the user 24; in this embodiment it contains “CPU usage rate”. The time information recording column 53 records the elapsed times at which the CPU usage rate was actually measured in the system using the “CPU usage rate” failure detection method; in this embodiment it contains “0, 1, 2, 3, 4, 5, ..., n”. The CPU usage rate information recording column 55 records the CPU usage rate at each time (time point) recorded in the time information recording column 53. In this embodiment it contains “10%” for times “0” and “1”, “50%” for time “2”, “80%” for time “3”, “90%” for time “4”, “40%” for time “5”, and “20%” for time “n”. The failure information registration column 57 holds a flag indicating whether a failure that occurred in the analysis target software 1 has been registered. In this embodiment the flag is “on” for times “3” and “4”.
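One way to hold the four columns of FIG. 4 in memory is a small record type with parallel lists. This is a sketch, not the layout the specification prescribes: the class name, field names, and the length check are assumptions, and the elided entries between time 5 and time n are left out because the source does not give them.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class CpuUsageMetadata:
    method: str                 # column 51: failure detection method
    times: List[int]            # column 53: time information
    cpu_usage: List[int]        # column 55: CPU usage rate in percent
    failure_flags: List[bool]   # column 57: True where a failure is registered

    def __post_init__(self) -> None:
        # The three per-time columns must line up entry for entry.
        if not (len(self.times) == len(self.cpu_usage) == len(self.failure_flags)):
            raise ValueError("columns 53, 55, and 57 must have the same length")

    def rows(self) -> Iterator[Tuple[int, int, bool]]:
        """Yield (time, usage, flag) triples, one per time point."""
        return zip(self.times, self.cpu_usage, self.failure_flags)


# Values for times 0 through 5 as given for FIG. 4.
fig4 = CpuUsageMetadata(
    method="CPU usage rate",
    times=[0, 1, 2, 3, 4, 5],
    cpu_usage=[10, 10, 50, 80, 90, 40],
    failure_flags=[False, False, False, True, True, False],
)
flagged = [(t, u) for t, u, f in fig4.rows() if f]
```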

FIG. 5 is an explanatory diagram showing an example of the data structure of metadata generated from the per-process CPU usage efficiency in the failure analysis support system shown in FIG. 1.

As shown in FIG. 5, the metadata includes a failure detection method information recording column 61, a time information recording column 63, an operating process ID recording column 65, a CPU occupation time information recording column 67, and a failure information registration column 69. Like the failure detection method information recording column 51 shown in FIG. 4, the failure detection method information recording column 61 records the failure detection method designated by the user 24; in this embodiment it contains “CPU usage rate”. Like the time information recording column 53 shown in FIG. 4, the time information recording column 63 records the elapsed times at which the CPU usage rate was actually measured in the system using the “CPU usage rate” failure detection method. Each item of time information recorded in the time information recording column 63 is associated with the corresponding item recorded in the time information recording column 53 shown in FIG. 4. The operating process ID recording column 65 records the IDs (that is, identification information) of the processes that were running during the time recorded in the time information recording column 63; in this embodiment it contains 3, 5, 1, 2, 4, 5 as the IDs of the processes that were running at time “3”.

The CPU occupation time information recording column 67 records the CPU occupation time of each process identified by the IDs recorded in the operating process ID recording column 65. The CPU occupation time shows which process occupied the CPU, at what timing and for how long, within the time recorded in the time information recording column 63. In this embodiment, the column contains 75 ms, 100 ms, 40 ms, 300 ms, 100 ms, 200 ms, 50 ms, 70 ms, 50 ms, and 15 ms. Reading FIG. 5 from left to right, the first entry (75 ms) is the time for which the process with ID = 3 occupied the CPU, and the second (100 ms) indicates that the CPU was idle. The third (40 ms) is the time for which the process with ID = 5 occupied the CPU, and the fourth (300 ms) is the time for which the process with ID = 1 occupied the CPU. The fifth (100 ms) indicates that the CPU was idle, and the sixth (200 ms) is the time for which the process with ID = 1 occupied the CPU. The seventh (50 ms) is the time for which the process with ID = 2 occupied the CPU, and the eighth (70 ms) is the time for which the process with ID = 4 occupied the CPU. Finally, the ninth (50 ms) indicates that the CPU was idle, and the tenth (15 ms) is the time for which the process with ID = 5 occupied the CPU.

Like the failure information registration column 57 shown in FIG. 4, the failure information registration column 69 holds a flag indicating whether a failure that occurred in the analysis target software 1 has been registered. In this embodiment the flag is “on” at the position corresponding to the fourth entry (300 ms) of time “3”.
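The per-process metadata of FIG. 5 for time “3” can be modeled as an ordered list of CPU occupation segments. The sketch below is an assumption about representation only: the `Segment` type, the use of `None` for the idle state, and the helper names are hypothetical. The durations, process IDs, and the flagged fourth entry are the ones given in the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    pid: Optional[int]      # process ID from column 65; None stands for the idle CPU
    duration_ms: int        # CPU occupation time from column 67
    failure: bool = False   # flag from column 69


# The ten entries described for time "3", in left-to-right order.
segments: List[Segment] = [
    Segment(3, 75), Segment(None, 100), Segment(5, 40), Segment(1, 300, failure=True),
    Segment(None, 100), Segment(1, 200), Segment(2, 50), Segment(4, 70),
    Segment(None, 50), Segment(5, 15),
]


def occupation_by_pid(segs: List[Segment]) -> Dict[int, int]:
    """Total CPU occupation time per process, ignoring idle segments."""
    totals: Dict[int, int] = {}
    for s in segs:
        if s.pid is not None:
            totals[s.pid] = totals.get(s.pid, 0) + s.duration_ms
    return totals


def segment_start_ms(segs: List[Segment], index: int) -> int:
    """Offset of a segment from the start of the unit time (sum of earlier durations)."""
    return sum(s.duration_ms for s in segs[:index])


totals = occupation_by_pid(segments)
flagged_index = next(i for i, s in enumerate(segments) if s.failure)
flagged_start = segment_start_ms(segments, flagged_index)
```

The ten durations add up to 1000 ms, so each time point in this example can be read as a one-second window.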

FIG. 6 is an explanatory diagram showing one form of the peak detection result visualized by the visualization processing unit 19 shown in FIG. 3.

In the peak detection result shown in FIG. 6, FIG. 6(a) is a visualization (graph) of the metadata shown in FIG. 4. In FIG. 6(a), the vertical axis shows the superimposed predefined image files (that is, the CPU usage rate rendered by the program in an easily understood form, shown as a staircase waveform), and the horizontal axis is the time axis (t). The region 71 enclosed by an ellipse indicates the location of the failure that occurred in the analysis target software 1. The ellipse marks the highlighted location.

In the peak detection result shown in FIG. 6, FIG. 6(b) is a visualization (graph) of the metadata shown in FIG. 5, which is a part of the metadata shown in FIG. 4 (that is, of FIG. 6(a)). In FIG. 6(b), the vertical axis is labeled IDLE and PID1 through PID5 (again indicating that the predefined image files are superimposed), and the horizontal axis is labeled 0, 200, 400, 600, 800, 1000, and 1200. Like the region 71, the region 73 enclosed by an ellipse indicates the location of the failure that occurred in the analysis target software 1. The region 73 (that is, the highlighted location) can be drawn by superimposing a predefined image file centered on the time information described above.

FIG. 7 is a flowchart showing an example of the sequence of peak detection processing performed by the failure detection processing unit 35 when detecting peak locations from the CPU load. The flowchart concerns a method for detecting a sharp rise (peak) in the CPU usage rate, which is one of the performance failures that can occur in the analysis target software 1. The processing shown in FIG. 7 supports both batch processing of the entire data set and divided processing, in which the data is divided into unit-time portions and each portion is processed separately.

In FIG. 7, the failure detection processing unit 35 first reads one event's worth of data from the trace data stored in the processed data storage unit 27 (step S81). Next, it checks whether the data read is the head data (step S82). If it is (YES in step S82), the failure detection processing unit 35 records the event occurrence time contained in that data in a predetermined storage area as the head time (step S83) and proceeds to step S84. If it is not (NO in step S82), the unit proceeds directly to step S84.

Next, the unit checks whether the data read in step S81 is the final data of the trace data (step S84). If it is not (NO in step S84), the unit checks whether the data read in step S81 is a process switching event (step S85). Here, a process means a process as seen at the OS level, that is, one unit of processing from the point of view of the OS. If the data is not a process switching event (NO in step S85), the unit returns to step S81. If it is a process switching event (YES in step S85), the unit proceeds to step S86.

Next, the failure detection processing unit 35 checks whether the data read in step S81 is the first process switching event (step S86). If it is (YES in step S86), the unit proceeds to step S87: it calculates the execution time (for the first process switching event) from the head time recorded in step S83 and the occurrence time of the data read in step S81, saves the calculated execution time in a predetermined storage area, and proceeds to step S89.

If, on the other hand, the check finds that the data is not the first process switching event (NO in step S86), the unit proceeds to step S88: it calculates the execution time from the start time of the process that was running and the occurrence time of the data read in step S81, saves the calculated execution time in a predetermined storage area, and proceeds to step S89. The failure detection processing unit 35 then records, in a predetermined storage area, the ID (n) of the process that starts after the switch, which is recorded inside the process switching event that was read (step S89). When this is done, the unit records the occurrence time of the data read in step S81 in a predetermined storage area as the start time of the process whose ID (n) was recorded in step S89 (that is, the process that starts after the switch) (step S90), and then returns to step S81.

If the check in step S84 finds that the data read in step S81 is the final data of the trace data (YES in step S84), the failure detection processing unit 35 calculates the CPU usage rate for each process by referring to the CPU usage recorded for each process and the execution time of the trace (step S91). Next, it calculates the CPU usage rate of the system as a whole from the per-process CPU usage rates calculated in step S91 (step S92). The unit then determines (detects) the peaks in the CPU usage rate from the per-process CPU usage rates obtained in step S91 and the system-wide CPU usage rate obtained in step S92 (step S93). Any method set by the user 24 may be used for this determination. Conceivable examples include determining the peaks from the ten locations with the highest CPU usage rate, or sampling only the CPU usage rates that exceed a preset threshold and determining the peaks from among the sampled values. The CPU usage rate peak data generated in this way is output from the failure detection processing unit 35 to the meta information writing unit 15 (step S94), which completes the series of processing operations shown in FIG. 7.
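The flow of FIG. 7 can be sketched in a few functions. This is an illustrative reading, not the patented implementation, and several points are assumptions: the event layout (`time`, `kind`, `next_pid`), the `IDLE` and `UNKNOWN` identifiers, crediting the interval up to the final record to the process still running, and summing per-process rates to obtain the system-wide rate. The two peak functions correspond to the two example strategies mentioned for step S93.

```python
from typing import Dict, List, Tuple

IDLE = 0       # assumed ID for the idle state
UNKNOWN = -1   # assumed ID for whatever ran before the first switch event


def cpu_usage(events: List[dict]) -> Tuple[Dict[int, float], float]:
    """Return (per-process usage in %, system usage in %) for one batch of events."""
    if not events:
        return {}, 0.0
    head_time = events[0]["time"]                  # S82/S83: record the head time
    running, started = UNKNOWN, head_time
    run_ms: Dict[int, float] = {}
    for ev in events[:-1]:                         # S84: the final record ends the loop
        if ev["kind"] != "switch":                 # S85: only process switches matter
            continue
        # S87/S88: execution time of the process that ran until this switch
        run_ms[running] = run_ms.get(running, 0.0) + (ev["time"] - started)
        running, started = ev["next_pid"], ev["time"]   # S89, S90
    last_time = events[-1]["time"]
    # Assumption: the tail up to the final record belongs to the running process.
    run_ms[running] = run_ms.get(running, 0.0) + (last_time - started)
    total = last_time - head_time
    if total <= 0:
        return {}, 0.0
    per_process = {pid: 100.0 * ms / total
                   for pid, ms in run_ms.items() if pid != IDLE}   # S91
    return per_process, sum(per_process.values())                  # S92


def peaks_over_threshold(usage_by_time: Dict[int, float], threshold: float) -> List[int]:
    """S93, threshold variant: times whose usage is at or above the threshold."""
    return [t for t, u in sorted(usage_by_time.items()) if u >= threshold]


def top_n_peaks(usage_by_time: Dict[int, float], n: int) -> List[int]:
    """S93, top-N variant: the n times with the highest usage, in time order."""
    ranked = sorted(usage_by_time, key=usage_by_time.get, reverse=True)
    return sorted(ranked[:n])


# Switch events reconstructed from the FIG. 5 durations, assuming the window starts at 0 ms.
switches = [(0, 3), (75, IDLE), (175, 5), (215, 1), (515, IDLE),
            (615, 1), (815, 2), (865, 4), (935, IDLE), (985, 5)]
events = [{"time": t, "kind": "switch", "next_pid": p} for t, p in switches]
events.append({"time": 1000, "kind": "end"})
per_process, system = cpu_usage(events)

# System-wide values per time point as given for FIG. 4; the threshold of 80 is assumed.
fig4_usage = {0: 10, 1: 10, 2: 50, 3: 80, 4: 90, 5: 40}
peak_times = peaks_over_threshold(fig4_usage, 80)
```

Calling `cpu_usage` once on the whole trace corresponds to the batch mode described for FIG. 7, while calling it once per unit-time slice and collecting the system-wide values corresponds to the divided mode.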

The preferred embodiment of the present invention has been described above, but it is an example given to explain the invention and is not intended to limit the scope of the invention to this embodiment alone. The present invention can also be implemented in various other forms.

Description of symbols
1 Analysis target software
3 Trace execution unit
5 Trace result transmission unit
7 Meta information recording unit
9 Trace result recording unit
11 Trace result writing unit
13 Trace result receiving unit
15 Meta information writing unit
17 Failure detection unit
19 Visualization processing unit
21 User request collection unit
23 User request holding unit
25 Failure information detection method recording unit
27 Processed data storage unit
29 Failure detection method assigning unit
31 Trace result assigning unit
33 Trace result processing unit
35 Failure detection processing unit
37 Meta information assigning unit
39 Trace result assigning unit
41 Display location extraction unit
43 Drawing execution unit
45 Screen display unit
100 Target system
300 Host device

Claims

A failure analysis support system comprising:
a failure-detected device on which analysis target software is installed; and
a failure detection device that detects a failure that has occurred in the analysis target software, based on information provided from the failure-detected device,
wherein the failure-detected device has
a trace execution unit that executes a trace of the analysis target software,
the failure detection device has
an information changing unit that changes information relating to the trace result of the analysis target software traced by the trace execution unit, which is output from the failure-detected device, into information in a format with high analysis efficiency,
an information analysis unit that analyzes the information output from the information changing unit based on a selected failure detection method, and
an information visualization processing unit that, triggered by an information display output request from a user, displays and outputs information obtained as a result of the analysis by the information analysis unit as visualized information, and
the information visualization processing unit retrieves the corresponding information from the information changed into the format with high analysis efficiency by the information changing unit, using as a key information relating to the name of the failure detection method designated by the user, extracts from the retrieved information time information relating to the occurrence of the failure in the analysis target software, and, based on the time information, extracts information on a portion near the failure detection location from the trace result of the analysis target software.

The failure analysis support system according to claim 1,
wherein the information visualization processing unit performs, from the information on the portion near the failure detection location extracted from the information relating to the trace result of the analysis target software and from the information changed into the format with high analysis efficiency output from the information changing unit, processing for drawing the trace result of the portion near the failure detection location and processing for applying highlighting to the visualized image information after the drawing processing has been performed.