JP5948291B2

JP5948291B2 - Monitoring information analyzing apparatus and method

Info

Publication number: JP5948291B2
Application number: JP2013168226A
Authority: JP
Inventors: 達明木村; 暁渡邉; 剛豊野; 西松　研; 研西松
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-08-13
Filing date: 2013-08-13
Publication date: 2016-07-06
Anticipated expiration: 2033-08-13
Also published as: JP2015036891A

Description

The present invention relates to a monitoring information analysis apparatus and method, and in particular to a monitoring information analysis apparatus and method for extracting information useful to a user from the monitoring information output by a wide variety of machines.

Today, mainly to reduce costs, devices and software from different manufacturers and with different roles are monitored and managed centrally. Each of these devices and software products has its own mechanism for outputting monitoring information (hereinafter referred to as "logs") in its own format, and these logs are used when monitoring and managing the equipment. As information equipment has advanced, log information has become more complex and larger in scale, so an efficient monitoring method is needed.

Against this background, analysis platforms exist that simplify log analysis (see, for example, Non-Patent Document 1). However, using this technology requires prior knowledge of what each log signifies and of the contents of its log messages, which makes it unsuitable for huge and complex logs.

A failure information management method is also known in which a log number is assigned to each error log that has occurred, the number is retained for a certain period, and a notification is issued each time the same error log number occurs again. However, a machine-generated log contains, in addition to the message part that describes the event that actually occurred, values and words specific to each occurrence, such as error codes, ID numbers, locations, and times. As a result, the error log is often not uniquely determined even when the underlying event is the same. Logs that indicate the same event therefore end up being managed under different numbers, which is inefficient and of little use.

Another technique targets the syslog generated by network devices such as routers and provides a way to identify templates when a certain amount of format information, such as the vendor, message type, error code, and detailed message content, is given in advance. It proposes displaying digest information of syslog by using, for example, the positional relationship of the routers (see, for example, Non-Patent Document 2). However, this technique requires prior knowledge at least at the information collection stage. For example, one must know which vendor the logs are collected from and which part of a message indicates the error code; beyond simply providing a template identification mechanism, manual input and preparation are required in advance. In addition, because the syslog digest display function is specialized for syslog, it cannot be applied to other monitoring log information.

Non-Patent Document 1: Splunk, http://www.splunk.com/
Non-Patent Document 2: T. Qiu, Z. Ge, D. Pei, J. Wang, J. Xu, "What Happened in my Network? Mining Network Events from Router Syslogs", in IMC, 2010.

There is also a method that extracts templates from general logs, not limited to syslog, without using prior information. FIG. 1 shows an example of log templates and messages. This method performs batch processing, taking all the data as input each time templates are extracted. Specifically, in the example in the first line of FIG. 1, message logs split by spaces are read, words are clustered based on scores assigned using all the log messages received so far, and clusters with low scores are treated as parameters and replaced with "*".

However, because the above method is a batch method that requires all the data as input each time templates are extracted, it cannot follow log formats that change from moment to moment, nor can it handle logs that appear only rarely. A technique is therefore needed that extracts log templates sequentially and can cope with changes in the generation rules and contents of logs.

The present invention has been made in view of the above points, and its object is to provide a monitoring information analysis apparatus and method capable of sequentially extracting the template portions of logs without the user knowing in advance the generation rules of the machine-generated logs.

According to one aspect, there is provided a monitoring information analysis apparatus for extracting useful information from log information output from monitored devices, the apparatus comprising:
cluster extraction means that acquires a processed log message, in which blanks for splitting have been inserted around specific symbols and particles and portions clearly recognizable as parameters have been deleted, and that clusters similar messages by classifying each word in the acquired processed log message, at its word position, into one of predetermined cases defined according to how likely the word is to be part of a template, calculating the similarity between the acquired processed log message and each cluster, selecting the cluster with the highest similarity, adding the acquired processed log message to that cluster, and storing the cluster in cluster storage means; and
template extraction means that reads a cluster from the cluster storage means and extracts a template by assigning a label to word positions that satisfy predetermined conditions.

According to one aspect, the template portions of logs can be extracted sequentially, by extracting the necessary information sequentially, without the user directly knowing in advance the generation rules for the format of the machine-generated monitoring information.

FIG. 1 is an example of template generation according to the prior art.
FIG. 2 is a configuration example of the monitoring information analysis apparatus in one embodiment of the present invention.
FIG. 3 is an example of log messages and a template in one embodiment of the present invention.
FIG. 4 is a flowchart of the template identification procedure in one embodiment of the present invention.
FIG. 5 is an example of the method of storing cluster data and the method of adding elements to a cluster in one embodiment of the present invention.
FIG. 6 is an example of application to a system in the first embodiment of the present invention (offline use).
FIG. 7 is an example of application to a system in the second embodiment of the present invention (online use).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 2 shows a configuration example of the monitoring information analysis apparatus in one embodiment of the present invention.

The monitoring information analysis apparatus 100 shown in FIG. 2 includes a cluster extraction unit 110, a template generation unit 120, and a cluster storage unit 130. Although not shown, it also holds in memory a counter of messages read and a counter of the number of clusters.

The cluster extraction unit 110 acquires log messages and clusters them. The log messages may be input sequentially or read from a log DB.

The template generation unit 120 extracts the portion of each log that serves as its template by analyzing only the message body of the log. The template generation unit 120 performs templating processing, cluster separation processing, and cluster joining processing.

FIG. 3 shows an example of log messages and a template in one embodiment of the present invention.

An input message contains a timestamp, the name of the monitoring device that generated the log, and the message body.

It is assumed that blanks are inserted in advance around specific symbols and particles in the message so that words can be split at blanks. Portions that are clearly parameters, such as times and IP addresses, are also removed in advance. A message that has undergone this processing is called a processed message.
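The preprocessing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular expressions for timestamps and IPv4 addresses, the set of splitting symbols, and the function name `preprocess` are all assumptions chosen for the example (the source names only times and IP addresses as obvious parameters, and "=" and "," as symbols in FIG. 3).

```python
import re

# Assumed patterns for "obvious parameters"; the source names only times and IP addresses.
TIMESTAMP = re.compile(
    r"\b\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\b|\b\d{2}:\d{2}:\d{2}\b"
)
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Assumed set of splitting symbols; FIG. 3 of the source uses "=" and ",".
SPLIT_SYMBOLS = re.compile(r"([=,])")


def preprocess(message: str) -> str:
    """Return a processed message: obvious parameters removed, symbols spaced out."""
    message = TIMESTAMP.sub(" ", message)
    message = IPV4.sub(" ", message)
    message = SPLIT_SYMBOLS.sub(r" \1 ", message)
    # Collapse runs of whitespace so that splitting on blanks yields clean words.
    return " ".join(message.split())
```

With these assumed patterns, `preprocess("2013-08-13 10:15:00 LINK down,port=3 from 192.168.0.1")` yields `"LINK down , port = 3 from"`.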

The words of a processed message are classified as follows.

・Case 1: the word consists only of digits;
・Case 2: the word consists of digits and letters;
・Case 3: the word consists only of digits and specific symbols;
・Case 4: the word consists of symbols and letters;
・Case 5: the word consists only of letters;
・Case 6: the word consists only of symbols.
However, if the message is in Japanese, hiragana, katakana, and kanji may be counted as letters.

The above classification groups words by how likely they are to be included in a template. The user can adjust this likelihood by defining additional cases for specific words (such as "down" or "up"). For simplicity, the following description uses only the cases defined above.
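The six cases can be sketched as a small classifier. This is an illustrative sketch under stated assumptions: the function name `classify_word` and the `custom_cases` parameter are hypothetical, every character that is neither a digit nor a letter is treated as a symbol, and a word mixing digits, letters, and symbols is assigned to Case 2 because the source does not define that combination. Python's `str.isalpha` also accepts hiragana, katakana, and kanji, which matches the note on Japanese messages.

```python
from typing import Dict, Optional


def classify_word(word: str, custom_cases: Optional[Dict[str, int]] = None) -> int:
    """Return the case number of one word of a processed message."""
    if not word:
        raise ValueError("empty word")
    # User-defined cases for specific words (e.g. "down", "up") take priority.
    if custom_cases and word in custom_cases:
        return custom_cases[word]
    has_digit = any(ch.isdigit() for ch in word)
    has_alpha = any(ch.isalpha() for ch in word)
    # Assumption: anything that is neither a digit nor a letter is a symbol.
    has_symbol = any(not (ch.isdigit() or ch.isalpha()) for ch in word)
    if has_digit and has_alpha:
        return 2  # assumption: also used when symbols are mixed in
    if has_digit:
        return 3 if has_symbol else 1
    if has_alpha:
        return 4 if has_symbol else 5
    return 6
```

For example, `classify_word("LINK")` returns 5 and `classify_word("=")` returns 6, while `classify_word("down", {"down": 7})` returns the user-defined case 7.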

The cluster extraction unit 110 and the template generation unit 120 are described below.

FIG. 4 is a flowchart of the template identification procedure in one embodiment of the present invention.

As shown in FIG. 3, the cluster extraction unit 110 receives processed messages from which the timestamp has been removed in advance and in which spaces have been added around specific symbol characters ("=" and "," in the example of FIG. 3).

As the initial state, the set C of currently known clusters is set up in the cluster storage unit 130 as the empty set (C(0)). The first processed message is read and taken as the target message, and the message-read counter in memory (not shown) is incremented (step 101).

The cluster extraction unit 110 first splits the target message at blanks and obtains the total number of words L (step 102).

Clusters are managed per total word count. If L has already appeared, the corresponding entry is identified; otherwise L is newly added to the cluster set C (step 104).

Next, the words in the processed message are analyzed and classified into the cases. Words consisting only of symbols, that is, those classified as "Case 6", are extracted together with their positions; this is called the symbol list s of the processed message. For example, the first word consists only of digits and is therefore "Case 1", and the second word, "LINK", consists only of letters and is therefore "Case 5".

Within each total word count L, clusters are managed in memory (not shown) by unique symbol list s (step 105). Clusters are therefore looked up using L and the identified symbol list s, and if no entry exists yet (step 106, No), a new one is created (step 107).
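The two-level lookup by total word count L and symbol list s can be sketched as a dictionary key. The names `bucket_key` and `build_index` are hypothetical, and treating every non-alphanumeric word as a "Case 6" word is an assumption made to keep the example self-contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (total word count L, symbol list s as (position, symbol) pairs)
Key = Tuple[int, Tuple[Tuple[int, str], ...]]


def bucket_key(processed_message: str) -> Key:
    """Return (L, s) for a processed message split at blanks."""
    words = processed_message.split()
    symbols = tuple(
        (pos, w) for pos, w in enumerate(words)
        if all(not ch.isalnum() for ch in w)  # symbol-only word ("Case 6")
    )
    return len(words), symbols


def build_index(messages: Iterable[str]) -> Dict[Key, List[str]]:
    """Group processed messages by (L, s); each group holds the cluster candidates."""
    index: Dict[Key, List[str]] = defaultdict(list)
    for msg in messages:
        index[bucket_key(msg)].append(msg)
    return index
```

Messages such as "LINK down , port = 3" and "LINK up , port = 7" share the key `(6, ((2, ","), (4, "=")))`, so only clusters under that key need to be compared with them.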

If clusters matching the total word count L and the symbol list s exist (step 106, Yes), they become the cluster candidates. The similarity between the target message and each candidate cluster is calculated, and the cluster with the highest similarity is selected, stored in the cluster storage unit 130, and the cluster-count counter in memory (not shown) is incremented (step 108). If the similarity between the cluster and the message is below a threshold E (step 109, No), the message is regarded as matching no cluster, a new cluster is created in the cluster storage unit 130 from the target processed message, and the cluster count is incremented (step 110). Otherwise (step 109, Yes), the target message is added to the cluster (step 111).
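Steps 108 to 111 can be sketched as a single assignment function. This is a simplified sketch: clusters are represented as dictionaries from word position to a set of words, the similarity measure is passed in as a callable, and `overlap` is only a placeholder measure for the demonstration, not the weighted score defined in the source. The names `assign`, `new_cluster`, and `overlap` are hypothetical.

```python
from typing import Callable, Dict, List, Set

Cluster = Dict[int, Set[str]]


def new_cluster(words: List[str]) -> Cluster:
    """Create a cluster holding exactly the words of one message."""
    return {pos: {w} for pos, w in enumerate(words)}


def assign(words: List[str], candidates: List[Cluster],
           similarity: Callable[[Cluster, List[str]], float],
           threshold_e: float) -> int:
    """Add the message to the most similar candidate, or open a new cluster.

    Returns the index of the cluster that received the message.
    """
    best_idx, best_sim = -1, -1.0
    for idx, cluster in enumerate(candidates):
        sim = similarity(cluster, words)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx < 0 or best_sim < threshold_e:
        candidates.append(new_cluster(words))  # no cluster is similar enough
        return len(candidates) - 1
    for pos, w in enumerate(words):
        candidates[best_idx].setdefault(pos, set()).add(w)
    return best_idx


def overlap(cluster: Cluster, words: List[str]) -> float:
    """Placeholder similarity: fraction of positions whose word is already known."""
    hits = sum(w in cluster.get(pos, ()) for pos, w in enumerate(words))
    return hits / max(len(words), 1)
```

With `threshold_e = 0.5`, "LINK down port 1" and "LINK down port 2" land in the same cluster, while "FAN failed slot 9" opens a second one.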

In this example, as shown in FIG. 5, a cluster internally holds the character information and word positions of words other than "Case 6". When a processed message is newly added to the cluster, only the unique words at each word position are retained. If the number of words at the same position exceeds a threshold α, words are deleted starting from the oldest.
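One way to hold this per-position word information is sketched below. The class name `Cluster`, its fields, and the use of an ordered dictionary to track insertion order are assumptions made for illustration; the source specifies only that unique words are kept per position and that the oldest are dropped once the count exceeds α.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    alpha: int  # threshold on the number of distinct words kept per position
    words_at: Dict[int, "OrderedDict[str, None]"] = field(default_factory=dict)
    hits: int = 0  # number of messages added so far

    def add(self, words: List[str]) -> None:
        """Record one processed message, keeping unique words per position."""
        self.hits += 1
        for pos, word in enumerate(words):
            if all(not ch.isalnum() for ch in word):
                continue  # symbol-only words ("Case 6") are not stored here
            seen = self.words_at.setdefault(pos, OrderedDict())
            if word not in seen:
                seen[word] = None
            while len(seen) > self.alpha:
                seen.popitem(last=False)  # drop the oldest word first
```

After adding "LINK down port 1", "LINK down port 2", and "LINK down port 3" to `Cluster(alpha=2)`, position 3 holds only "2" and "3", while positions 0 to 2 each hold a single word.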

The calculation of the distance between a cluster and a processed message is described below.

In the following calculation, $w_i$ ($i = 1, 2, 3, 4$) is a weight representing how likely a word of case $i$ is to be part of a template. From the definitions of the cases, $w_1 \le w_2 \le w_3 \le w_4$.

Because a cluster holds the word information at each word position, the $\Phi_i$ that takes the highest value at each word position is calculated, and a case is assigned to each word position accordingly. Let $c_{ki}$ then be the number of word positions of case $i$ in cluster $k$. The sum of $w_i c_{ki}$ over $i$ is the maximum score obtainable by matching the cluster; this is called the "maximum cluster score".

Next, each word of the target processed message other than "Case 6" words is checked to see whether it is contained in the cluster at its word position. If it is, the case of the word is determined and the corresponding weight is added to the match score. That is, if $x_i$ is the number of words of case $i$ in the target message that match the cluster, the match score is the sum of $w_i x_i$ over $i$.

The match score after all words have been examined divided by the maximum cluster score, that is,

$$\frac{\sum_i w_i x_i}{\sum_i w_i c_{ki}}$$

is defined as the distance between the cluster and the processed message.

Defining the distance in this way allows the similarity to be constructed so that it is low when only digits match, as in "Case 1", and high when words composed only of letters match, as in "Case 2".
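The ratio of match score to maximum cluster score can be sketched as follows (the source calls this quantity both a "distance" and a "similarity"; larger values mean a closer match). The weight values, the extension of the weights to five cases, and the names `WEIGHTS`, `word_case`, and `similarity` are assumptions for illustration; the source constrains only the ordering of the weights.

```python
from typing import Dict, List, Set

# Hypothetical weights per case (assumption). Case 6 words are never scored.
WEIGHTS: Dict[int, float] = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.8, 5: 1.0}


def word_case(word: str) -> int:
    """Compact case classifier (symbols = anything that is not a digit or letter)."""
    digit = any(ch.isdigit() for ch in word)
    alpha = any(ch.isalpha() for ch in word)
    symbol = any(not (ch.isdigit() or ch.isalpha()) for ch in word)
    if digit and alpha:
        return 2
    if digit:
        return 3 if symbol else 1
    if alpha:
        return 4 if symbol else 5
    return 6


def similarity(cluster: Dict[int, Set[str]], words: List[str],
               weights: Dict[int, float] = WEIGHTS) -> float:
    """Match score divided by the maximum cluster score."""
    # Maximum cluster score: the best weight available at each word position.
    max_score = sum(
        max(weights.get(word_case(w), 0.0) for w in seen)
        for seen in cluster.values() if seen
    )
    if max_score == 0:
        return 0.0
    # Match score: weights of message words found in the cluster at the same position.
    match_score = sum(
        weights.get(word_case(w), 0.0)
        for pos, w in enumerate(words)
        if w in cluster.get(pos, ())
    )
    return match_score / max_score
```

For a cluster `{0: {"LINK"}, 1: {"down"}, 3: {"port"}, 5: {"1", "2"}}` and the message "LINK down , port = 7", the three letter-only words match and the digit does not, so the result is 3.0 / 3.1 with these assumed weights. A mismatch on a digit costs little, while a mismatch on a letter-only word would cost much more.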

Next, the templating processing of clustered messages in the template generation unit 120 is described. The following describes a method of generating a one-line template containing asterisks (*) from a cluster; however, the invention is not limited to this example, and a cluster itself can be treated as the equivalent of a template, as a "template cluster".

(1) First, a cluster is read from the cluster storage unit 130 and checked against the following conditions.

Condition 1-1: Among the words held in the cluster, the number of words associated with some word position exceeds the threshold α.

Condition 1-2: The hit count of the cluster is greater than or equal to a threshold f. Here, the hit count is the number of logs added to the cluster so far.

Condition 1-3: There are β or more word positions that have only one word element.

A template is extracted by assigning the parameter label "(", which marks a parameter location, to each word position that exceeds the threshold in condition 1-1 above.

For a location to which a parameter label has been assigned, only the case is retained. From then on, when the distance to the cluster is calculated, word matching is not checked at that location and the score is awarded as an unconditional match.
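Conditions 1-1 to 1-3 and the labeling step can be sketched as follows. This is an illustrative sketch, not the patented procedure: the function name `extract_template` and its argument layout are hypothetical, parameter positions are rendered as "*" following the one-line template described earlier, and positions that hold several words without exceeding α are rendered as alternatives joined by "|", which is an assumption because the source does not say how such positions are displayed.

```python
from typing import Dict, List, Optional, Set


def extract_template(words_at: Dict[int, Set[str]], symbols: Dict[int, str],
                     length: int, hits: int,
                     alpha: int, f: int, beta: int) -> Optional[List[str]]:
    """Return the template as a word list, or None if the conditions are not met."""
    param_positions = {p for p, ws in words_at.items() if len(ws) > alpha}  # 1-1
    fixed_positions = [p for p, ws in words_at.items() if len(ws) == 1]
    if not param_positions:            # condition 1-1
        return None
    if hits < f:                       # condition 1-2
        return None
    if len(fixed_positions) < beta:    # condition 1-3
        return None
    template: List[str] = []
    for pos in range(length):
        if pos in symbols:
            template.append(symbols[pos])       # symbol-only word from the symbol list
        elif pos in param_positions:
            template.append("*")                # parameter label
        else:
            # One word: fixed template text. Several words: shown as alternatives (assumption).
            template.append("|".join(sorted(words_at.get(pos, ()))))
    return template
```

For a cluster with words `{0: {"LINK"}, 1: {"down"}, 3: {"port"}, 5: {"1", "2", "3", "4"}}`, symbols `{2: ",", 4: "="}`, 10 hits, and thresholds α = 3, f = 5, β = 2, the result joined by spaces is "LINK down , port = *".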

Next, the cluster separation processing of the template generation unit 120 is described.

Cluster separation processing separates a cluster when clustering has placed logs that should be regarded as different templates into the same cluster. First, it is checked whether the cluster satisfies the following conditions.

Condition 2-1: Conditions 1-1 and 1-2 are satisfied.

Condition 2-2: There is a word position at which the number of associated words is at least two and at most γ.

A cluster satisfying these conditions contains, by condition 2-1, a location that should be a parameter, but by condition 2-2 it was judged to be a single cluster even though it also contains a location that cannot be regarded as a parameter. The cluster is therefore divided at the location matching condition 2-2 and separated into new clusters.
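A possible reading of the separation step is sketched below. Because a cluster keeps only per-position word sets and not the original messages, this sketch splits at the first position matching condition 2-2 and copies the word sets of all other positions into each new cluster; that copying rule and the name `split_cluster` are assumptions, not something the source specifies.

```python
from typing import Dict, List, Set


def split_cluster(words_at: Dict[int, Set[str]], hits: int,
                  alpha: int, f: int, gamma: int) -> List[Dict[int, Set[str]]]:
    """Split a cluster at a position holding 2..gamma words, if the conditions hold."""
    has_param = any(len(ws) > alpha for ws in words_at.values())   # condition 1-1
    split_positions = sorted(
        p for p, ws in words_at.items() if 2 <= len(ws) <= gamma   # condition 2-2
    )
    if not (has_param and hits >= f and split_positions):
        return [words_at]  # conditions not met: leave the cluster as it is
    pos = split_positions[0]
    parts = []
    for word in sorted(words_at[pos]):
        part = {p: set(ws) for p, ws in words_at.items()}  # copy other positions
        part[pos] = {word}                                 # one word per new cluster
        parts.append(part)
    return parts
```

For example, a cluster `{0: {"LINK"}, 1: {"up", "down"}, 2: {"1", "2", "3", "4"}}` with α = 3 and γ = 2 splits into one cluster for "LINK down *" and one for "LINK up *", since position 1 holds only two words and is unlikely to be a parameter.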

In addition, the internal parameters $\Phi_i$ are updated when clusters are separated.

The method of updating $\Phi_i$ is described below.

The calculation of the distance between a processed message and a cluster is equivalent to the following linear classification problem.

Using this property, the parameters can be updated by applying an online algorithm for linear classification problems. Various online algorithms exist; one example is Passive-Aggressive, described in Non-Patent Document 3 (K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online Passive-Aggressive Algorithms," Journal of Machine Learning Research 7 (2006), pp. 551-585).
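For orientation, the basic Passive-Aggressive update for binary classification from the cited paper can be sketched as follows. This is the generic algorithm, not the specific formulation used in this patent: how the feature vector `x` and the label `y` are derived from clusters and messages is not recoverable from the text here, so treating `x` as a vector of per-case match counts would be an assumption. The function name `pa_update` is hypothetical.

```python
from typing import List


def pa_update(w: List[float], x: List[float], y: int) -> List[float]:
    """One Passive-Aggressive step for a linear classifier with label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)          # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)                     # passive: the example is already satisfied
    tau = loss / norm_sq                   # aggressive: smallest step that fixes it
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

Starting from `w = [0.0, 0.0]` with `x = [1.0, 2.0]` and `y = 1`, one step gives approximately `[0.2, 0.4]`, after which the same example produces no further change.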

Next, the cluster joining processing in the template generation unit 120 is described.

Cluster joining processing promotes templating by joining clusters anew when clusters that should have been judged to be the same cluster have been split excessively.

First, it is checked whether the cluster satisfies the following conditions.

Condition 3-1: The hit count of the cluster (the value of the cluster-count counter) is less than or equal to a threshold.

Condition 3-2: Among clusters with the same number of words and the same symbol list, there is a cluster at a short distance.

Condition 3-3: At least a fixed number of messages have been read (see the value of the message-read counter).

Here, by conditions 3-1 and 3-3, the cluster's hit count is low relative to the number of messages read; if a highly similar cluster nevertheless exists, as in condition 3-2, over-separation of clusters can be assumed. The clusters matching condition 3-2 are therefore joined.

Clusters are joined simply by taking, at each word position, the union of the word sets of the two clusters.
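The joining step amounts to a per-position set union, which can be sketched in a few lines. The function name `merge_clusters` and the dictionary representation of a cluster are assumptions made for illustration.

```python
from typing import Dict, Set


def merge_clusters(a: Dict[int, Set[str]], b: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Join two clusters by taking the union of their word sets at each position."""
    return {
        pos: a.get(pos, set()) | b.get(pos, set())
        for pos in sorted(a.keys() | b.keys())
    }
```

Joining `{0: {"LINK"}, 1: {"down"}, 2: {"1"}}` with `{0: {"LINK"}, 1: {"down"}, 2: {"7"}}` gives a single cluster whose position 2 holds both "1" and "7"; the inputs are left unmodified.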

When clusters are joined, the parameters are also updated, in the same way as in cluster separation processing.

In the template generation unit 120, the templating processing and the cluster joining and separation processing each require cluster hit counts, so they are assumed to run at arbitrary points after a certain amount of log data has accumulated: for example, every day or every week, every time a fixed number of messages has been processed, or when the hit count of a cluster exceeds a threshold.

In addition, when the user wants to force parameters to be learned, the user can separate or join clusters by supplying information to existing clusters. In this case, the processing is performed regardless of conditions 2-* and 3-*.

Examples that apply the above configuration and processing are shown below.

When the present invention is applied to a system (offline or online), the main components are a DB that stores the template clusters of logs that occurred in the past, an analysis engine that uses this DB to cluster monitoring information sequentially, and a user interface that outputs the results to the user and through which the user assigns labels and the like beforehand or afterwards. These can be implemented in the forms shown in FIG. 6 and FIG. 7.

FIG. 6 assumes that logs gathered by the collection device have been stored in the log DB in advance. FIG. 7 shows a form in which logs generated from moment to moment are clustered as they arrive and their occurrence is monitored while the log templates are updated sequentially.

Here, it is assumed that the log collection device 1 is prepared in advance and that the monitored devices $P_0$ to $P_n$ each have a mechanism for generating logs.
[First embodiment]
The input logs may come from any kind of device; one example is the syslog of routers used for network monitoring. For this case, the offline form of FIG. 6 is described.

The logs generated by the network devices are collected centrally and stored in the log DB 2. The cluster extraction engine 110 receives this information and sequentially updates the log template information in the cluster DB 130. Information about logs that occurred in the past, such as the times at which a particular log template occurred, can be obtained through the log DB 2. The user updates and adds templates offline.

[Second embodiment]
The first embodiment considered the case where log information is accumulated in the log DB 2 in advance and templates are extracted offline, but templates can also be extracted online from logs as they arrive from moment to moment. This use is described below with reference to FIG. 7.

For each piece of log information sent from the log collection device 1, the cluster to which it belongs is identified. Filtering rules and labeling rules are set for the clusters in advance, and with these the user can keep track of the logs as they are generated from moment to moment. In this case too, as in the first embodiment, the user can update or add templates at any time via the user interface while logs arrive sequentially.

The operation of each component of the monitoring information analysis apparatus in the above embodiment can be implemented as a program, which can be installed and executed on a computer used as the monitoring information analysis apparatus (for example, as the cluster extraction engine or the template shaping and display unit), or distributed via a network.

The present invention is not limited to the above embodiments, and various modifications and applications are possible within the scope of the claims.

1 Log collection device
2 Log DB
3 Log information display unit
4 Template display rule
100 Monitoring information analysis apparatus
110 Cluster extraction unit, cluster extraction engine
120 Template generation unit
130 Cluster storage unit, cluster DB

Claims

A monitoring information analysis apparatus for extracting useful information from log information output from a monitored device, comprising:
cluster extraction means for acquiring a processed log message, in which blanks for splitting have been inserted around specific symbols and particles and portions clearly recognizable as parameters have been deleted, and for clustering similar messages by classifying each word in the acquired processed log message, at its word position, into one of predetermined cases defined according to how likely the word is to be part of a template, calculating the similarity between the acquired processed log message and each cluster, selecting the cluster with the highest similarity, adding the acquired processed log message to that cluster, and storing the cluster in cluster storage means; and
template extraction means for reading a cluster from the cluster storage means and extracting a template by assigning a label to a word position that satisfies a predetermined condition.

The monitoring information analysis apparatus, wherein the cluster extraction means includes
means for splitting the processed log message at blanks, obtaining the total number of words, classifying the words of the processed log message into the cases, extracting the classified characters and their character positions, selecting as cluster candidates the clusters that match the total number of words and the characters and character positions, calculating the similarity between each cluster candidate and the processed log message, and selecting the cluster with the highest similarity.

The monitoring information analysis apparatus according to claim 1 or 2, wherein the cluster extraction means includes means for classifying each word, when classifying the words into the predetermined cases, into one of the following:
the word consists only of digits;
the word consists of digits and letters;
the word consists only of digits and specific symbols;
the word consists of symbols and letters;
the word consists only of letters;
the word consists only of symbols.

The monitoring information analysis apparatus according to claim 1, wherein the template extraction means includes means for adding a label indicating a parameter location to the word position and extracting a template when the cluster satisfies the following:
among the words held in the cluster, the number of words associated with a certain word position exceeds a predetermined threshold;
the hit count of the cluster, which is the number of logs added so far to the cluster in the cluster storage means, is greater than or equal to a predetermined threshold; and
there is only one word element at a predetermined number of word positions.

The monitoring information analysis apparatus according to claim 4, wherein the template extraction means further includes cluster separation means that, when
among the words held in the cluster, the number of words associated with a certain word position exceeds a predetermined threshold;
the hit count of the cluster, which is the number of logs added so far to the cluster in the cluster storage means, is greater than or equal to a predetermined threshold; and
there is a location where the number of words associated with a certain word position is two or more and not more than a predetermined value,
divides the original cluster by re-forming clusters so that the word groups at the location where the number of words associated with a word position is two or more and not more than the predetermined value are contained separately.

The monitoring information analysis apparatus according to claim 4, wherein the template extraction means further includes cluster joining means that, when
there is a cluster whose hit count, which is the number of logs added so far to the cluster in the cluster storage means, is less than or equal to a predetermined threshold;
there is a cluster with a high similarity to the cluster whose hit count is less than or equal to the predetermined threshold; and
at least a fixed number of processed log messages have been read,
joins the cluster whose hit count is less than or equal to the predetermined threshold with the highly similar cluster by combining the sets of words included at each word position.

A monitoring information analysis method for extracting useful information from log information output from a monitored device, wherein, in an apparatus having cluster storage means for storing clusters, cluster extraction means, and template extraction means, the following steps are performed:
a cluster extraction step in which the cluster extraction means acquires a processed log message, in which blanks for splitting have been inserted around specific symbols and particles and portions clearly recognizable as parameters have been deleted, and clusters similar messages by classifying each word in the acquired processed log message, at its word position, into one of predetermined cases defined according to how likely the word is to be part of a template, calculating the similarity between the acquired processed log message and each cluster, selecting the cluster with the highest similarity, adding the acquired processed log message to that cluster, and storing the cluster in the cluster storage means; and
a template extraction step in which the template extraction means reads a cluster from the cluster storage means and extracts a template by assigning a label to a word position that satisfies a predetermined condition.