JP2017146832A

JP2017146832A - Unusual log detection system and unusual log detection method

Info

Publication number: JP2017146832A
Application number: JP2016028956A
Authority: JP
Inventors: 幸紀南田; Yukinori Minamida; 弘柴田; Hiroshi Shibata
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-02-18
Filing date: 2016-02-18
Publication date: 2017-08-24

Abstract

PROBLEM TO BE SOLVED: To detect an abnormal log by determining the similarity of log sequences in a way that reflects their characteristics. SOLUTION: An abnormal log detection system 1 comprises an extraction unit 102, a calculation unit 103 and a detection unit 104. The extraction unit extracts, from the logs of an information system, log sequences of a predetermined number of lines (two or more), shifting the starting line by one line at a time. The calculation unit shifts one extracted log sequence one line at a time against another log sequence, counts the matching lines at each shift to create a log matching line count series, and calculates the product of the area and the kurtosis of that series as the similarity. The detection unit detects an abnormality in the log sequence based on the calculated similarity. SELECTED DRAWING: Figure 1

Description

The present invention relates to an abnormal log detection system and an abnormal log detection method.

Various techniques that use log information created and stored in an information system are known.

One known technique converts the non-numeric information in log data, and in a predetermined detection target pattern, into numeric information. It then pairs a partial sequence of the converted detection target pattern with a partial sequence of the converted log data, and calculates the similarity of that pair using a single score matrix based on dynamic programming (Patent Document 1).

Another known technique collects normal models from terminals and integrates them into one normal model. It then analyzes logs with that model and detects anomalies by calculating an abnormality degree, which indicates how abnormal an operation is (Patent Document 2).

Patent Document 1: JP 2011-138422 A
Patent Document 2: JP 2015-108898 A

Non-Patent Document 1: Toby Segaran, "Programming Collective Intelligence" (Japanese edition, 集合知プログラミング), O'Reilly Japan, July 25, 2008, pp. 31-41
Non-Patent Document 2: Yoshinori Uesaka and Kazuhiko Ozeki, "Pattern Recognition and Learning Algorithms" (パターン認識と学習のアルゴリズム), Bun-ichi Sogo Shuppan, May 1990, pp. 91-108

In anomaly detection using logs, it would be useful to learn the normal state by monitoring the logs, and then to detect abnormal logs using the learned normal state.

One conceivable way to detect log anomalies is to learn the log one line at a time. However, when each line is learned in isolation, an anomaly that appears only in a sequence of logs spanning multiple lines cannot be detected.

Another conceivable way to compare sequences is to represent each sequence as a bag-of-words vector and judge similarity from the distance between sequences (Euclidean distance or cosine distance). However, a bag-of-words vector ignores both the distance between elements and their order. Log sequences, by contrast, follow regular orderings: the more closely two sequences agree in order, the more similar they are. Using bag-of-words vectors to compare log sequences therefore does not yield results that reflect the characteristics of log sequences (see Non-Patent Document 1).

In addition, a log interleaves the series of entries output by various functions. It is therefore preferable to compare log sequences with a method that can calculate similarity even when logs output by different functions are mixed together, rather than simply comparing the sequences as they are.

DP (dynamic programming) matching is another conceivable way to compare sequences. With DP matching, the number of optimally matching lines can be obtained while preserving order, even when other logs are interleaved. In log sequences, however, similarity should be higher the more lines match consecutively. DP matching tolerates discontinuous matches too readily and, as is, cannot assign a high similarity to consecutive matches (see Non-Patent Document 2).

The disclosed embodiments have been made in view of the above. Their object is to detect abnormal logs by determining the similarity of log sequences in a way that reflects the characteristics of log sequences.

The disclosed abnormal log detection system and abnormal log detection method work in three stages:

1. They extract, from the log of an information system, log sequences of a predetermined number of lines (two or more), shifting the starting line by one line at a time.
2. They shift one extracted log sequence one line at a time against another log sequence and count the matching lines at each shift, creating a log matching line count series. The product of the area and the kurtosis of that series is calculated as the similarity.
3. Based on the calculated similarity, they detect an abnormality in that log sequence.

The disclosed abnormal log detection system and abnormal log detection method can detect abnormal logs, because they determine the similarity of log sequences in a way that reflects the characteristics of log sequences.

FIG. 1 is a diagram illustrating an example of the configuration of an abnormal log detection system according to a first embodiment.
FIG. 2 is a diagram for explaining an overview of processing in the abnormal log detection system according to the first embodiment.
FIG. 3 is a flowchart illustrating an example of the overall flow of the abnormal log detection processing according to the first embodiment.
FIG. 4 is a diagram for explaining an example of the log sequence extraction method according to the first embodiment.
FIG. 5 is a diagram for explaining an example of the method of calculating the similarity between log sequences in the abnormal log detection system according to the first embodiment.
FIG. 6 is a diagram for explaining how the two log sequences shown in FIG. 5 are matched against each other.
FIG. 7 is a diagram for explaining a specific example of comparing log sequences and calculating their similarity in the first embodiment.
FIG. 8 is a diagram for explaining the abnormality detection processing according to the first embodiment.
FIG. 9 is a diagram illustrating an example of a computer on which the abnormal log detection device is realized by executing a program.

Embodiments of the disclosed system and method are described in detail below with reference to the drawings. The invention is not limited by these embodiments, and the embodiments may be combined as appropriate.

(First embodiment)
FIG. 1 is a diagram illustrating an example of the configuration of an abnormal log detection system 1 according to the first embodiment. The abnormal log detection system 1 includes an abnormal log detection device 10 and an information processing system 20, which are communicably connected via a network 30.

The abnormal log detection device 10 acquires logs from the information processing system 20 and analyzes them. It distinguishes logs that reflect the normal state from abnormal logs that deviate from the normal state, and thereby detects abnormal logs.

The information processing system 20 executes information processing. Its configuration and functions are not particularly limited: it may be any system that executes predetermined information processing and accumulates logs according to that processing. The information processing system 20 transmits the logs it generates and accumulates to the abnormal log detection device 10.

The transmission timing is not particularly limited. However, to monitor the state of the information processing system 20 continuously and keep it running stably, the system is preferably configured to transmit each log to the abnormal log detection device 10 as soon as it is generated.

Although FIG. 1 shows the abnormal log detection device 10 and the information processing system 20 as separate units, the abnormal log detection device 10 may instead be incorporated into the information processing system 20 that it monitors.

The network 30 may be any communication network, such as the Internet, a WAN (Wide Area Network), or a LAN (Local Area Network); its type is not particularly limited. The network 30 may be wired, wireless, or a combination of both.

FIG. 2 is a diagram for explaining an overview of processing in the abnormal log detection system 1 according to the first embodiment. As shown in FIG. 2, the abnormal log detection device 10 acquires the log output by the information processing system 20. The log consists of multiple lines.

The abnormal log detection device 10 accumulates the acquired log and executes a learning process. Through this process it distinguishes the normal state of the log from abnormal states that deviate from it.

The device then compares logs subsequently output by the information processing system 20 with the learned logs, detects abnormal logs that deviate from the normal state, and outputs (notifies) information on them externally. This allows the maintainer of the information processing system 20 to notice an abnormality and respond, for example by asking a developer to analyze it.

In this way, the abnormal log detection system 1 of the first embodiment can detect logs that deviate from the normal state and notify the maintainer early. The maintainer can then act early, for example by having a developer analyze the abnormal log in detail. This helps prevent trouble in the information processing system 20 and keeps its operation stable.

Returning to FIG. 1, the following describes an example of the configuration of the abnormal log detection device 10 and an example of the flow of its abnormal log detection processing.

(Example of the configuration of the abnormal log detection device 10)
The abnormal log detection device 10 includes a control unit 100, a storage unit 200, and a communication unit 300.

The control unit 100 controls the abnormal log detection processing in the abnormal log detection device 10. It can be implemented with an electronic circuit such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit), or with an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array). The control unit 100 also has a storage unit that stores control data and programs defining the processing procedures of the abnormal log detection device 10. By running these programs, the control unit 100 functions as various processing units.

The control unit 100 includes an acquisition unit 101, an extraction unit 102, a calculation unit 103, and a detection unit 104.

The acquisition unit 101 acquires logs from the information processing system 20, stores them in the storage unit 200, and passes them to the extraction unit 102.

The extraction unit 102 extracts log sequences of a predetermined number of lines from the log acquired by the acquisition unit 101. Specifically, it extracts that many lines as one log sequence, then shifts the start position by one line and extracts the next. The extraction unit 102 treats each extracted log sequence of n lines (n is a natural number of 2 or more) as an n-dimensional vector (an n-dimensional log sequence).

The calculation unit 103 calculates the similarity between a log sequence extracted by the extraction unit 102 and other, previously extracted log sequences. The similarity calculation is described in detail later.

The detection unit 104 detects an abnormality in a log sequence based on the similarities calculated by the calculation unit 103. For example, the detection unit 104 looks at the similarities between a log sequence and multiple other log sequences, and takes a predetermined number of the highest ones:

- If those top similarities are at or below a threshold (that is, the similarity is low), it determines that the log sequence contains an abnormal log.
- If those top similarities are greater than the threshold, it determines that the log sequence contains no abnormal log.

The detection unit 104 outputs the determination result outside the abnormal log detection device 10.
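A minimal Python sketch of this decision rule follows. The function name `is_abnormal` and its parameters are hypothetical. The text does not say what happens when only some of the top similarities exceed the threshold. This sketch therefore assumes a log sequence is normal only when every one of its `top_k` highest similarities exceeds the threshold (the second condition above), and abnormal otherwise. It also assumes that a sequence with fewer than `top_k` comparisons is abnormal.

```python
from typing import Iterable


def is_abnormal(similarities: Iterable[float], top_k: int, threshold: float) -> bool:
    """Return True if a log sequence should be flagged as containing an abnormal log.

    Assumed reading of the rule: the sequence is normal only if each of its top_k
    highest similarities to other log sequences exceeds the threshold.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    ranked = sorted(similarities, reverse=True)
    if len(ranked) < top_k:
        return True  # assumption: too few comparisons to vouch for the sequence
    # The top_k-th highest value is the smallest of the top_k similarities.
    return ranked[top_k - 1] <= threshold


print(is_abnormal([0.67, 0.40, 0.078], top_k=2, threshold=0.3))  # False
print(is_abnormal([0.67, 0.078], top_k=2, threshold=0.3))        # True
```

The values of `top_k` and `threshold` here are illustrative only. The text describes both as predetermined but gives no concrete numbers.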

The storage unit 200 is a storage device that stores various data. It may be a rewritable semiconductor memory such as RAM (Random Access Memory) or flash memory; the device used as the storage unit 200 is not particularly limited.

The storage unit 200 includes a log storage unit 201, a log sequence storage unit 202, a similarity storage unit 203, and a detection result storage unit 204.

The log storage unit 201 stores the logs acquired from the information processing system 20 by the acquisition unit 101.

The log sequence storage unit 202 stores the log sequences extracted by the extraction unit 102.

The similarity storage unit 203 stores the similarities between log sequences calculated by the calculation unit 103.

The detection result storage unit 204 stores the log sequences that the detection unit 104 has determined to contain an abnormal log.

The units in the storage unit 200 may be integrated or distributed as appropriate, and the structure of the stored information may also be changed as appropriate.

The communication unit 300 handles communication between the abnormal log detection device 10 and the outside. It transmits information on the abnormal logs detected by the detection unit 104 to the outside, and receives the logs output by the information processing system 20. Its specific configuration is not particularly limited as long as it enables external communication. The communication unit 300 also provides the functions of an input unit and an output unit for inputting and outputting information.

(Example of the flow of abnormal log detection processing)
FIG. 3 is a flowchart illustrating an example of the overall flow of the abnormal log detection processing according to the first embodiment.

The abnormal log detection device 10 processes logs as follows:

1. It starts monitoring the information processing system 20 in response to a predetermined trigger, such as power-on or an instruction input (step S41).
2. The acquisition unit 101 acquires the log output by the information processing system 20 via the communication unit 300 and stores it in the log storage unit 201 (step S42).
3. The extraction unit 102 extracts n lines of the acquired log as an n-dimensional log sequence, shifting the start by one line at a time (step S43).
4. The calculation unit 103 calculates the similarity between the extracted log sequence and other previously extracted log sequences (step S44).
5. The detection unit 104 determines that the log sequence contains an abnormal log when the top predetermined number of similarities are at or below the threshold, and that it contains no abnormal log when they are greater than the threshold. It then outputs the detection result externally (step S45).

This completes one example of the overall flow of the abnormal log detection processing.
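Steps S42 to S45 can be pictured as a streaming loop. The sketch below is a hypothetical Python structure, not the device's actual implementation:

- `monitor` keeps a sliding window of the last `n` lines.
- It compares each full window with every earlier window using a caller-supplied `similarity` function. In this document, that function would be the area-times-kurtosis similarity.
- It applies an assumed reading of the top-k threshold rule.
- The `warmup` parameter is also an assumption. It lets the first windows only be stored, loosely mirroring the learning process described for FIG. 2, so that early windows are not flagged merely for lack of history.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

Window = Tuple[str, ...]


def monitor(
    lines: Iterable[str],
    n: int,
    similarity: Callable[[Window, Window], float],
    top_k: int,
    threshold: float,
    warmup: int = 0,
) -> Iterator[Tuple[Window, bool]]:
    """Yield (log sequence, is_abnormal) for each new n-line window (steps S42-S45)."""
    window: Deque[str] = deque(maxlen=n)
    history: List[Window] = []  # plays the role of log sequence storage unit 202
    for line in lines:                 # S42: acquire the next log line
        window.append(line)
        if len(window) < n:
            continue                   # not enough lines for a full window yet
        current = tuple(window)        # S43: window shifted by one line
        if len(history) >= warmup:
            # S44: similarity to every previously extracted log sequence
            sims = sorted((similarity(current, past) for past in history), reverse=True)
            # S45 (assumed reading): normal only if all top_k similarities exceed the threshold
            abnormal = len(sims) < top_k or sims[top_k - 1] <= threshold
            yield current, abnormal
        history.append(current)        # every window is kept for later comparisons


def same_position_ratio(a: Window, b: Window) -> float:
    """Toy stand-in for the area x kurtosis similarity: share of equal lines."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


for seq, bad in monitor(list("ABABABAXAB"), 2, same_position_ratio, top_k=1, threshold=0.5, warmup=2):
    if bad:
        print(seq)  # prints ('A', 'X') then ('X', 'A')
```

Keeping abnormal windows in the history is another assumption. The text does not say whether flagged sequences are excluded from later comparisons.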

(Example of the log sequence extraction method)
FIG. 4 is a diagram for explaining an example of the log sequence extraction method according to the first embodiment. As shown in FIG. 4, assume that the log "A, B, C, D, E, F, G, ..." has been output by the information processing system 20 and acquired by the abnormal log detection device 10. Each capital letter in FIG. 4 represents one line. Assume also that the abnormal log detection device 10 has been configured in advance to treat five lines as one log sequence.

The extraction unit 102 proceeds as follows:

- It extracts the first five lines, "A, B, C, D, E", as one log sequence.
- It extracts the next log sequence starting one line later: the five lines "B, C, D, E, F" beginning at the second line.
- Shifting by one more line, it extracts the five lines "C, D, E, F, G" beginning at the third line.

In this way, the extraction unit 102 cuts a predetermined number of lines out of the acquired log, shifting the start by one line each time, and extracts them as log sequences.
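A minimal Python sketch of this sliding-window extraction follows. Each log line is treated as an opaque string. The function name `extract_log_sequences` is hypothetical, and any preprocessing of lines before extraction is outside what this section describes.

```python
from typing import List, Sequence, Tuple


def extract_log_sequences(lines: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Cut overlapping n-line log sequences, advancing the start by one line each time."""
    if n < 2:
        raise ValueError("n must be 2 or more")
    # The window starting at index i covers lines[i:i + n]; the last full window
    # starts at len(lines) - n, so a log shorter than n lines yields no sequences.
    return [tuple(lines[i:i + n]) for i in range(len(lines) - n + 1)]


for seq in extract_log_sequences(["A", "B", "C", "D", "E", "F", "G"], 5):
    print(seq)
# ('A', 'B', 'C', 'D', 'E')
# ('B', 'C', 'D', 'E', 'F')
# ('C', 'D', 'E', 'F', 'G')
```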

(Example of the log sequence similarity calculation method)
FIG. 5 is a diagram for explaining an example of the method of calculating the similarity between log sequences in the abnormal log detection system according to the first embodiment. FIG. 6 is a diagram for explaining how the two log sequences shown in FIG. 5 are matched against each other.

As an example, FIG. 5 shows the calculation of the similarity between the 5-line log sequence a "Z, A, B, Y, C" and the 5-line log sequence b "A, B, C, D, E". The calculation unit 103 matches the two log sequences against each other while shifting one line at a time:

- When a and b are aligned at their first lines, no lines match ((1) in FIG. 6).
- When b is shifted one line to the right and matched against a, "A, B" match ((2) in FIG. 6).
- When b is shifted one more line, two lines to the right in total, "C" matches ((3) in FIG. 6).

In this way, the calculation unit 103 shifts one of the two log sequences one line at a time, matches it against the other, and counts the matching lines.

Based on the matching results, the calculation unit 103 creates a series d of matching line counts. In the example of FIG. 5, it creates the log matching line count series d = (0, 0, 1, 2, 0, 0, 0, 0, 0).

The calculation unit 103 sets in advance the order in which the two log sequences are matched, and arranges the matching line counts in that order to form the series as a vector. In the example of FIG. 6, offsets "1" to "9" are defined:

- Offset "1" shifts log sequence b four lines to the right of log sequence a.
- Offset "2" shifts it three lines to the right.
- Offset "3" shifts it two lines to the right.
- Continuing the pattern, offset "5" aligns the two sequences, and offsets "6" to "9" shift b one to four lines to the left.

The matching line count for each offset is counted and vectorized. Arranging the counts in the order of offsets "1" to "9", as in FIG. 6, yields the log matching line count series d shown in FIG. 5.
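The offset scheme can be sketched in Python as follows. The name `match_count_series` is hypothetical. Offsets are 1-based as in FIG. 6: offset i shifts b by n - i lines to the right of a, where a negative value means a shift to the left. Lines are compared by exact string equality, which is an assumption.

```python
from typing import List, Sequence


def match_count_series(a: Sequence[str], b: Sequence[str]) -> List[int]:
    """Return d = (d_1, ..., d_{2n-1}) for two n-line log sequences a and b."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both log sequences must have the same number of lines")
    d = []
    for i in range(1, 2 * n):           # offsets 1 .. 2n-1
        shift = n - i                   # > 0: b to the right, 0: aligned, < 0: b to the left
        count = 0
        for j in range(n):              # 0-based line index in a
            jb = j - shift              # line of b that sits opposite a[j]
            if 0 <= jb < n and a[j] == b[jb]:
                count += 1
        d.append(count)
    return d


print(match_count_series(list("ZABYC"), list("ABCDE")))  # [0, 0, 1, 2, 0, 0, 0, 0, 0]
```

For n-line sequences the series always has 2n - 1 entries, one per relative shift at which the two sequences still overlap.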

The calculation unit 103 then calculates the similarity m as the product of two quantities of the log matching line count series: its area s (normalized by the length of the series) and its kurtosis k. For example, suppose there are an n-dimensional (n-line) log sequence A and an n-dimensional log sequence B:

$$A = (a_1, a_2, \ldots, a_n), \qquad B = (b_1, b_2, \ldots, b_n)$$

When the similarity between A and B is calculated, the log matching line count series d(A, B) is defined as

$$d(A,B) = (d_1, d_2, \ldots, d_{2n-1}), \qquad d_i = \sum_{j=\max(1,\,n-i+1)}^{\min(n,\,2n-i)} \delta\left(a_j,\, b_{j+i-n}\right)$$

Here δ(x, y) is 1 if lines x and y match and 0 otherwise. So d_i counts the matching lines when B is shifted n - i lines to the right of A, where a negative value means a shift to the left.

The area s and the kurtosis k of the log matching line count series are defined by Equations (1) and (2), respectively. In Equation (2), v denotes the variance of the series; s is also the mean of the d_i.

$$s = \frac{1}{2n-1} \sum_{i=1}^{2n-1} d_i \qquad (1)$$

$$k = \frac{1}{v} \cdot \frac{1}{2n-1} \sum_{i=1}^{2n-1} (d_i - s)^4, \qquad v = \frac{1}{2n-1} \sum_{i=1}^{2n-1} (d_i - s)^2 \qquad (2)$$

Since the similarity m between log sequences A and B is the product of the area s and the kurtosis k, it is expressed by Equation (3).

$$m(A,B) = s \cdot k \qquad (3)$$

The distance between log sequences A and B is the reciprocal of the similarity, as expressed by Equation (4).

$$\mathrm{dist}(A,B) = \frac{1}{m(A,B)} \qquad (4)$$
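The following Python sketch computes Equations (1) to (4) from a log matching line count series. The function names are hypothetical.

One point of interpretation applies. The sketch computes the kurtosis as the fourth central moment divided by the variance v, because that form reproduces the worked values k = 2 and k ≈ 0.70 given for FIG. 7. The textbook kurtosis divides by v² instead and would give different numbers (4.5 for the first example).

Two further choices are assumptions made so that the functions never fail: a series with zero variance gets kurtosis 0, and zero similarity gives an infinite distance.

```python
import math
from typing import Sequence


def area(d: Sequence[int]) -> float:
    """Equation (1): sum of the series normalized by its length 2n - 1."""
    return sum(d) / len(d)


def kurtosis(d: Sequence[int]) -> float:
    """Equation (2): fourth central moment of the series divided by its variance v."""
    mean = area(d)  # the normalized area is also the mean of the series
    v = sum((x - mean) ** 2 for x in d) / len(d)
    if v == 0:
        return 0.0  # assumption: a flat series (e.g. no matching lines) gets k = 0
    fourth_moment = sum((x - mean) ** 4 for x in d) / len(d)
    return fourth_moment / v


def similarity(d: Sequence[int]) -> float:
    """Equation (3): m = s * k."""
    return area(d) * kurtosis(d)


def distance(d: Sequence[int]) -> float:
    """Equation (4): reciprocal of the similarity (infinite when m = 0)."""
    m = similarity(d)
    return math.inf if m == 0 else 1.0 / m


d_h1 = [0, 0, 1, 2, 0, 0, 0, 0, 0]  # H1 vs L (FIG. 7 (A))
d_h2 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # H2 vs L (FIG. 7 (B))
print(round(similarity(d_h1), 3), round(similarity(d_h2), 3))  # 0.667 0.078
```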

Based on the above, and with reference to FIG. 7, consider three example log sequences: H1 = (Z, A, B, Y, C), H2 = (U, V, A, W, Y), and L = (A, B, C, D, E). We calculate the similarity between H1 and L and between H2 and L. FIG. 7 is a diagram for explaining a specific example of comparing log sequences and calculating their similarity in the first embodiment.

FIG. 7(A) shows the comparison of log sequences H1 and L, matched while shifting one line at a time:

- When L is shifted one line to the right of H1, the lines "A, B" match.
- When L is shifted two lines to the right of H1, the line "C" matches.
- No lines match at any other shift.

The log matching line count series of H1 and L is therefore expressed by Equation (5).

$$d(H_1, L) = (0, 0, 1, 2, 0, 0, 0, 0, 0) \qquad (5)$$

For the series in Equation (5), Equation (1) gives the area s = 0.33 and Equation (2) gives the kurtosis k = 2. The similarity between H1 and L is therefore m(H1, L) = 0.67.

FIG. 7(B) shows the comparison of log sequences H2 and L, matched while shifting one line at a time. The line "A" matches when L is shifted two lines to the right of H2, and no lines match at any other shift. The log matching line count series of H2 and L is therefore expressed by Equation (6).

$$d(H_2, L) = (0, 0, 1, 0, 0, 0, 0, 0, 0) \qquad (6)$$

For the series in Equation (6), Equation (1) gives the area s = 0.11 and Equation (2) gives the kurtosis k = 0.70. The similarity between H2 and L is therefore m(H2, L) = 0.078.
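The figures above can be checked by hand. The derivation below uses the form of Equation (2) in which the fourth central moment is divided by the variance v, with the mean of each series equal to its area s. This is the reading under which the stated values k = 2 and k ≈ 0.70 come out exactly.

$$
\begin{aligned}
&d(H_1,L):\; s = \tfrac{3}{9} = \tfrac{1}{3},\quad
v = \tfrac{1}{9}\Big[7\big(\tfrac{1}{3}\big)^2 + \big(\tfrac{2}{3}\big)^2 + \big(\tfrac{5}{3}\big)^2\Big] = \tfrac{4}{9},\\
&\qquad \tfrac{1}{9}\Big[7\big(\tfrac{1}{3}\big)^4 + \big(\tfrac{2}{3}\big)^4 + \big(\tfrac{5}{3}\big)^4\Big] = \tfrac{8}{9},\quad
k = \tfrac{8/9}{4/9} = 2,\quad m = \tfrac{1}{3}\cdot 2 \approx 0.67\\
&d(H_2,L):\; s = \tfrac{1}{9},\quad
v = \tfrac{1}{9}\Big[8\big(\tfrac{1}{9}\big)^2 + \big(\tfrac{8}{9}\big)^2\Big] = \tfrac{8}{81},\\
&\qquad \tfrac{1}{9}\Big[8\big(\tfrac{1}{9}\big)^4 + \big(\tfrac{8}{9}\big)^4\Big] = \tfrac{152}{2187},\quad
k = \tfrac{152/2187}{8/81} = \tfrac{19}{27} \approx 0.70,\quad m = \tfrac{19}{243} \approx 0.078
\end{aligned}
$$

The corresponding distances from Equation (4) are 1/m = 1.5 for H1 and 243/19 ≈ 12.8 for H2.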

Comparing the two pairs, H2 and L have both a smaller area and a smaller kurtosis than H1 and L, and therefore a smaller similarity. In other words, log sequence H1 is evaluated as more similar to log sequence L than log sequence H2 is.

As described above, the abnormal log detection device 10 of the first embodiment extracts log sequences by shifting the first line one line at a time. It compares each log sequence with other log sequences to derive log matching line count series, and from these calculates the similarity between log sequences. Consequently, even if a log sequence contains interleaved logs generated by different functions, the device can calculate the similarity between log sequences while taking the order and continuity of the logs into account.

（異常検出処理の一例） (Example of abnormality detection processing)
図８は、第１の実施形態にかかる異常検出処理について説明するための図である。検出部１０４が異常ログを含むログ系列を検出する手法としてはたとえば、ＬＯＦ（Local Outlier Factor）等の外れ値検出法を利用することができる。まず、図８を用いて、外れ値を検出する手法について説明する。 FIG. 8 is a diagram for explaining the abnormality detection process according to the first embodiment. As a method for the detection unit 104 to detect a log series containing an abnormal log, an outlier detection method such as LOF (Local Outlier Factor) can be used, for example. First, a method for detecting outliers will be described with reference to FIG. 8.
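The text names LOF only as one possible outlier detector. The sketch below is a plain-Python implementation of the standard LOF definition that works directly on a precomputed distance matrix, which suits this setting because log series are compared through pairwise distances rather than coordinates. The choice of k, the function name `lof_scores`, and the toy one-dimensional positions are assumptions for illustration; scores well above 1 indicate points in a sparser region than their neighbours.

```python
import math
from typing import List, Sequence


def lof_scores(dist: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local Outlier Factor of each point, from a symmetric distance matrix."""
    n = len(dist)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    for i in range(n):
        for j in range(n):
            if i != j and not dist[i][j] > 0:
                raise ValueError("off-diagonal distances must be positive")
    # k nearest neighbours of each point (excluding itself); ties beyond k are dropped
    neighbours = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        neighbours.append([j for _, j in others[:k]])
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k))  # 0.0 if a reachability distance is inf
    # LOF: mean ratio of the neighbours' density to the point's own density
    scores = []
    for i in range(n):
        if lrd[i] == 0.0:
            scores.append(math.inf)
        else:
            scores.append(sum(lrd[j] for j in neighbours[i]) / k / lrd[i])
    return scores


positions = [0.0, 1.0, 2.0, 10.0]  # toy stand-ins for four log series
dist = [[abs(a - b) for b in positions] for a in positions]
print(round(lof_scores(dist, k=2)[3], 2))  # 4.96
```

scikit-learn's `sklearn.neighbors.LocalOutlierFactor` also accepts a precomputed distance matrix via `metric="precomputed"`, which may be preferable for large numbers of log series.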

上述のようにログ系列を抽出し、類似度を算出した場合に、各ログ系列間の距離を、類似度の逆数で表現することができる（上記式（４））。このようにして得た各ログ系列間の距離をグラフ化すると、例えば図８に示すグラフのようになる。図８のグラフには、クラスタ１とクラスタ２が示される。クラスタ１に含まれるログ系列ｐ、ｑは過去に抽出されたログ系列である。また、クラスタ２に含まれるログ系列ｒも過去に抽出されたログ系列である。新たに抽出されたログ系列は、過去に抽出されたログ系列との距離を評価し、過去に抽出されたログ系列によって形成されるクラスタ（集団）に近ければ、過去に抽出されたログ系列と同様のログ系列である、と評価することができる。逆に、新たに抽出されたログ系列が、それまでに抽出されたログ系列により形成される何れのクラスタからも遠ければ、通常状態から逸脱する異常ログを含むログ系列である、と評価することができる。 When log series are extracted and their similarities calculated as described above, the distance between two log series can be expressed as the reciprocal of their similarity (Expression (4) above). Plotting the distances between log series obtained in this way yields, for example, the graph shown in FIG. 8, which shows cluster 1 and cluster 2. The log series p and q in cluster 1 were extracted in the past, as was the log series r in cluster 2. For a newly extracted log series, its distance from the previously extracted log series is evaluated: if it is close to a cluster (group) formed by previously extracted log series, it can be evaluated as similar to those log series. Conversely, if it is far from every cluster formed by the log series extracted so far, it can be evaluated as a log series containing an abnormal log that deviates from the normal state.
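A minimal sketch of the reciprocal distance, assuming similarities are non-negative (as a product of an area and a kurtosis would normally be); mapping a similarity of 0 to an infinite distance is an added assumption, since the text does not say how that case is handled. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def distance_from_similarity(m: float) -> float:
    """Distance as the reciprocal of similarity; a similarity of 0 maps to infinity."""
    if m < 0:
        # Assumption: similarities are non-negative (area and kurtosis both >= 0).
        raise ValueError("similarity is assumed to be non-negative")
    return math.inf if m == 0 else 1.0 / m


def distance_matrix(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise distances from a symmetric similarity matrix; the diagonal is set to 0."""
    n = len(sim)
    return [[0.0 if i == j else distance_from_similarity(sim[i][j]) for j in range(n)]
            for i in range(n)]


print(round(distance_from_similarity(0.078), 2))  # 12.82
```

For example, the similarity m(H2, L) = 0.078 from FIG. 7(B) corresponds to a distance of about 12.82 under this definition.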

図８の例において、たとえば、新しくログ系列ａが抽出されたとする。そしてログ系列ａとログ系列ｐおよびログ系列ｑとの距離をそれぞれ算出したところ、ログ系列ａがグラフに示す位置に配置されたとする。この場合、ログ系列ａとログ系列ｐ、ｑとの距離は、同じクラスタ１を構成する他のログ系列とログ系列ｐ、ｑとの距離とあまり変わらない。つまり、ログ系列ａは、クラスタ１を構成する通常状態のログ系列である、と評価することができる。 In the example of FIG. 8, suppose that a new log series a is extracted, and that calculating its distances to the log series p and q places it at the position shown in the graph. In this case, the distances from a to p and q differ little from the distances between p, q and the other log series that make up cluster 1. That is, the log series a can be evaluated as a normal-state log series belonging to cluster 1.

他方、図８の例において、新しくログ系列ｂが抽出されたとする。そしてログ系列ｂと、ログ系列ｑおよびログ系列ｒとの距離をそれぞれ算出したところ、ログ系列ｂがグラフに示す位置に配置されたとする。この場合、ログ系列ｂとログ系列ｑとの距離は、クラスタ１に属する他のログ系列とログ系列ｑとの距離と比較して遠い。また、ログ系列ｂとログ系列ｒとの距離は、クラスタ２に属する他のログ系列とログ系列ｒとの距離と比較して遠い。したがって、ログ系列ｂは、クラスタ１およびクラスタ２のいずれにも属さない、通常状態を逸脱したログ系列である、と評価することができる。 On the other hand, suppose that a new log series b is extracted in the example of FIG. 8, and that calculating its distances to the log series q and r places it at the position shown in the graph. In this case, the distance between b and q is large compared with the distances between q and the other log series in cluster 1, and the distance between b and r is likewise large compared with the distances between r and the other log series in cluster 2. Therefore, the log series b can be evaluated as a log series that belongs to neither cluster 1 nor cluster 2 and deviates from the normal state.

このように、各ログ系列について算出される距離を用いて、当該ログ系列が通常状態から逸脱しているか否かを判定することができる。 In this way, it is possible to determine whether or not the log sequence deviates from the normal state using the distance calculated for each log sequence.

異常検出処理の具体的な手順は、算出部１０３が算出した類似度または距離に基づいて検出するのであれば、特に限定されない。また、異常検出のために使用する閾値等は、過去のログ系列から算出された類似度や距離に基づいて設定すればよい。閾値の設定には機械学習等を用いてもよい。また閾値は、ログ系列から新たに算出される類似度や距離に基づいて、逐次更新していくものとしてもよい。 The specific procedure of the abnormality detection process is not particularly limited as long as it is detected based on the similarity or distance calculated by the calculation unit 103. Further, the threshold value used for abnormality detection may be set based on the similarity or distance calculated from the past log series. Machine learning or the like may be used for setting the threshold. The threshold value may be sequentially updated based on the similarity or distance newly calculated from the log series.
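The text leaves the threshold rule open (machine learning or the like, optionally updated as new values arrive). One simple possibility, shown purely as an assumed example and not as the patent's rule, is a "mean plus c standard deviations" threshold over past distances, updated online with Welford's algorithm. The class name `RunningThreshold` and the default `c = 3.0` are hypothetical.

```python
import math


class RunningThreshold:
    """Hypothetical threshold: mean + c * sample std of the distances seen so far.

    Updated one value at a time with Welford's online algorithm, so past
    distances do not need to be stored.
    """

    def __init__(self, c: float = 3.0) -> None:
        self.c = c
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, distance: float) -> None:
        if not math.isfinite(distance):
            return  # skip infinite distances (zero similarity); they would poison the mean
        self.count += 1
        delta = distance - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (distance - self.mean)

    @property
    def value(self) -> float:
        if self.count < 2:
            return math.inf  # too little history: flag nothing yet
        return self.mean + self.c * math.sqrt(self._m2 / (self.count - 1))

    def is_abnormal(self, distance: float) -> bool:
        return distance > self.value


t = RunningThreshold(c=2.0)
for d in [10.0, 12.0, 11.0, 13.0, 9.0]:  # e.g. distances of past normal log series
    t.update(d)
print(t.is_abnormal(20.0), t.is_abnormal(12.5))  # True False
```

Whether distances that were themselves flagged as abnormal should be fed back into `update` is a design choice; excluding them keeps the threshold anchored to the normal state.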

（第１の実施形態の効果） (Effects of the first embodiment)
第１の実施形態にかかる異常ログ検出システムは、抽出部と、算出部と、検出部とを備える。抽出部は、情報システムのログから、先頭を一行ずつずらして２行以上の所定行数のログ系列を抽出する。算出部は、抽出した１のログ系列を１行ずつずらしながら、他のログ系列と一致する行数を数えてログ一致行数系列を作成し、作成したログ一致行数系列の面積と尖度との積を、類似度として算出する。検出部は、算出した類似度に基づき、１のログ系列の異常を検出する。このため、異常ログ検出システムは、ログ系列の特性を反映してログ系列の類似度を判定し、異常ログを検出することができる。また、異常ログ検出システムは、ログ系列に複数機能に起因するログが含まれている場合であっても、ログ系列間の類似度を算出することができる。また、異常ログ検出システムは、ログの並び順や連続性を反映した類似度を算出することができる。 The abnormality log detection system according to the first embodiment includes an extraction unit, a calculation unit, and a detection unit. The extraction unit extracts, from the log of an information system, log series each consisting of a predetermined number of lines (two or more), shifting the start line one line at a time. The calculation unit shifts one extracted log series one line at a time against another log series, counts the matching lines at each shift to create a log matching line-count series, and calculates the product of the area and the kurtosis of that series as the similarity. The detection unit detects an abnormality in the one log series based on the calculated similarity. The abnormality log detection system can therefore detect abnormal logs by judging the similarity of log series in a way that reflects their characteristics. It can also calculate the similarity between log series even when a log series contains logs originating from a plurality of functions, and the calculated similarity reflects the order and continuity of the logs.

また、第１の実施形態にかかる異常ログ検出システムにおいて、抽出部は、少なくとも２つのｎ行数のログ系列を抽出し、算出部は、２つのｎ行数のログ系列のログ一致行数系列について、式（１）により定義される面積ｓと、式（２）（ただし式中、ｖは分散）により定義される尖度ｋと、の積を、類似度として算出する。このため、異常ログ検出システムは、ログ系列に含まれるログの並び順や連続性も加味しつつ、ログ系列間の類似度を算出することができる。また、ログ系列に複数機能に起因するログが含まれていても、ログ系列間の類似度を算出することができる。 In the abnormality log detection system according to the first embodiment, the extraction unit extracts at least two log series of n lines each, and the calculation unit calculates, as the similarity, the product m = s × k of the area s defined by Expression (1) and the kurtosis k defined by Expression (2) (where v is the variance), both taken over the log matching line-count series of the two n-line log series. The abnormality log detection system can therefore calculate the similarity between log series while taking into account the order and continuity of the logs they contain. Further, the similarity between log series can be calculated even when a log series contains logs resulting from a plurality of functions.
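The structure "similarity = area × kurtosis of the matching-count series" can be expressed with the two measures kept pluggable. Expressions (1) and (2) are not reproduced in this section, so the `area_placeholder` and `kurtosis_placeholder` functions below are HYPOTHETICAL stand-ins (a normalized sum and a standard fourth-moment ratio with v as the population variance); they should not be expected to reproduce the values s = 0.11 and k = 0.70 given for FIG. 7(B), and the real definitions would be passed in their place.

```python
from typing import Callable, Sequence


def area_placeholder(counts: Sequence[int], n: int) -> float:
    # HYPOTHETICAL stand-in for Expression (1): matched lines summed over all
    # shifts, normalized by n * n.
    return sum(counts) / (n * n)


def kurtosis_placeholder(counts: Sequence[int], n: int) -> float:
    # HYPOTHETICAL stand-in for Expression (2): the moment ratio mu4 / v**2 of
    # the count values, with v the population variance; 0.0 when v == 0.
    # n is unused here and is accepted only so both measures share one signature.
    mean = sum(counts) / len(counts)
    v = sum((c - mean) ** 2 for c in counts) / len(counts)
    if v == 0:
        return 0.0
    mu4 = sum((c - mean) ** 4 for c in counts) / len(counts)
    return mu4 / v ** 2


Measure = Callable[[Sequence[int], int], float]


def similarity(counts: Sequence[int], n: int,
               area: Measure = area_placeholder,
               kurtosis: Measure = kurtosis_placeholder) -> float:
    """Similarity m = s * k of one log matching line-count series."""
    return area(counts, n) * kurtosis(counts, n)


print(round(similarity([0, 0, 0, 0, 1], n=3), 4))  # 0.3611
```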

また、第１の実施形態にかかる異常ログ検出システムにおいて、算出部はさらに、類似度に基づき、１のログ系列と他のログ系列との間の距離を算出し、検出部は、類似度に基づき算出された距離に基づき、１のログ系列の異常を検出する。このため、異常ログ検出システムは、距離によってあらわされるログ系列間の類似度に基づき、容易に通常状態を逸脱するログ系列を検出することができる。 In the abnormality log detection system according to the first embodiment, the calculation unit further calculates a distance between the one log series and another log series based on the similarity, and the detection unit detects an abnormality in the one log series based on the distance calculated from the similarity. The abnormality log detection system can therefore easily detect log series that deviate from the normal state, based on the similarity between log series expressed as a distance.

また、第１の実施形態にかかる異常ログ検出システムにおいて、算出部は、類似度の逆数を距離として算出し、検出部は、１のログ系列と他のログ系列との距離が、他のログ系列間の距離よりも所定値以上長い場合に、１のログ系列の異常を検出する。このため、異常ログ検出システムは、容易に通常状態を逸脱するログ系列を検出することができる。 In the abnormality log detection system according to the first embodiment, the calculation unit calculates the reciprocal of the similarity as the distance, and the detection unit detects an abnormality in the one log series when the distance between the one log series and the other log series is longer, by a predetermined value or more, than the distance between the other log series. The abnormality log detection system can therefore easily detect log series that deviate from the normal state.
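The criterion "longer than the distance between the other log series by a predetermined value or more" admits several readings. A minimal sketch, assuming the comparison is made between averages (the mean distance from the target series to the reference series versus the mean pairwise distance among the reference series), with `margin` standing in for the predetermined value; the function name and the toy distances are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def is_abnormal(dist: Sequence[Sequence[float]], target: int,
                others: Sequence[int], margin: float) -> bool:
    """Return True if the target's mean distance to `others` exceeds the mean
    pairwise distance among `others` by at least `margin` (assumed reading)."""
    if len(others) < 2:
        raise ValueError("at least two reference log series are needed")
    to_others = sum(dist[target][j] for j in others) / len(others)
    pairs = list(combinations(others, 2))
    among = sum(dist[i][j] for i, j in pairs) / len(pairs)
    return to_others - among >= margin


# Toy distances between four log series (series 3 lies far from the rest).
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(is_abnormal(dist, 3, [0, 1, 2], margin=3.0))  # True
print(is_abnormal(dist, 1, [0, 2, 3], margin=3.0))  # False
```

In practice the matrix would hold reciprocal-similarity distances, and the reference indices would be log series already judged normal, such as the members of one cluster in FIG. 8.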

また、上記のように構成した異常ログ検出システムは、情報システムを自動的に監視して、ログの通常状態を学習し、通常状態を逸脱するログを検出することができる。このため、異常ログ検出システムは、通常と異なる異常な振る舞いをするログを早期に検出して、情報処理システムの不具合に早期に対応し、情報処理システムの安定した運用を実現することを可能にする。 Further, the abnormality log detection system configured as described above can automatically monitor the information system, learn the normal state of its logs, and detect logs that deviate from that normal state. This makes it possible to detect logs exhibiting unusual, abnormal behavior at an early stage, to respond early to faults in the information processing system, and thereby to achieve stable operation of the information processing system.

（システム構成等） (System configuration etc.)
また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況等に応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。さらに、各装置にて行なわれる各処理機能は、その全部または任意の一部が、ＣＰＵ（Central Processing Unit）および当該ＣＰＵにて解析実行されるプログラムにて実現され、あるいは、ワイヤードロジックによるハードウェアとして実現され得る。 Each component of each illustrated apparatus is functionally conceptual and need not be physically configured as illustrated. In other words, the specific form in which each apparatus is distributed or integrated is not limited to that shown in the figures; all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed in each apparatus may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by that CPU, or as hardware based on wired logic.

また、本実施形態において説明した各処理のうち、自動的におこなわれるものとして説明した処理の全部または一部を手動的におこなうこともでき、あるいは、手動的におこなわれるものとして説明した処理の全部または一部を公知の方法で自動的におこなうこともできる。この他、上記文書中や図面中で示した処理手順、制御手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。 In addition, among the processes described in the present embodiment, all or part of the processes described as being automatically performed can be manually performed, or the processes described as being manually performed can be performed. All or a part can be automatically performed by a known method. In addition, the processing procedure, control procedure, specific name, and information including various data and parameters shown in the above-described document and drawings can be arbitrarily changed unless otherwise specified.

（プログラム） (Program)
一実施形態として、異常ログ検出装置１０は、パッケージソフトウェアやオンラインソフトウェアとして上記の監視を実行する異常ログ検出プログラムを所望のコンピュータにインストールさせることによって実装できる。例えば、上記の異常ログ検出プログラムを情報処理装置に実行させることにより、情報処理装置を異常ログ検出装置１０として機能させることができる。ここで言う情報処理装置には、デスクトップ型またはノート型のパーソナルコンピュータが含まれる。また、その他にも、情報処理装置にはスマートフォン、携帯電話機やＰＨＳ（Personal Handyphone System）等の移動体通信端末、さらには、ＰＤＡ（Personal Digital Assistants）等のスレート端末等がその範疇に含まれる。 As an embodiment, the abnormality log detection apparatus 10 can be implemented by installing an abnormality log detection program for executing the above monitoring as package software or online software on a desired computer. For example, the information processing apparatus can function as the abnormality log detection apparatus 10 by causing the information processing apparatus to execute the above-described abnormality log detection program. The information processing apparatus referred to here includes a desktop or notebook personal computer. In addition, the information processing apparatus includes mobile communication terminals such as smartphones, mobile phones and PHS (Personal Handyphone System), and slate terminals such as PDA (Personal Digital Assistants).

また、異常ログ検出装置１０は、ユーザが使用する端末装置をクライアントとし、当該クライアントに上記の異常ログ検出に関するサービスを提供するサーバ装置として実装することもできる。例えば、異常ログ検出装置１０は、ログを入力とし、検出した異常ログを含むログ系列を出力とする異常ログ検出サービスを提供するサーバ装置として実装される。この場合、異常ログ検出装置１０は、Ｗｅｂサーバとして実装することとしてもよいし、アウトソーシングによって上記の異常ログ検出に関するサービスを提供するクラウドとして実装することとしてもかまわない。 In addition, the abnormality log detection device 10 can be implemented as a server device that uses a terminal device used by a user as a client and provides the client with the above-described service related to abnormality log detection. For example, the abnormality log detection device 10 is implemented as a server device that provides an abnormality log detection service that receives a log as an input and outputs a log sequence including the detected abnormality log. In this case, the abnormality log detection device 10 may be implemented as a Web server, or may be implemented as a cloud that provides the above-described service relating to abnormality log detection by outsourcing.

図９は、プログラムが実行されることにより、異常ログ検出装置が実現されるコンピュータの一例を示す図である。コンピュータ１０００は、例えば、メモリ１０１０、ＣＰＵ１０２０を有する。また、コンピュータ１０００は、ハードディスクドライブインタフェース１０３０、ディスクドライブインタフェース１０４０、シリアルポートインタフェース１０５０、ビデオアダプタ１０６０、ネットワークインタフェース１０７０を有する。これらの各部は、バス１０８０によって接続される。 FIG. 9 is a diagram illustrating an example of a computer in which an abnormality log detection apparatus is realized by executing a program. The computer 1000 includes a memory 1010 and a CPU 1020, for example. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

メモリ１０１０は、ＲＯＭ（Read Only Memory）１０１１およびＲＡＭ（Random Access Memory）１０１２を含む。ＲＯＭ１０１１は、例えば、ＢＩＯＳ（Basic Input Output System）等のブートプログラムを記憶する。ハードディスクドライブインタフェース１０３０は、ハードディスクドライブ１０９０に接続される。ディスクドライブインタフェース１０４０は、ディスクドライブ１１００に接続される。例えば磁気ディスクや光ディスク等の着脱可能な記憶媒体が、ディスクドライブ１１００に挿入される。シリアルポートインタフェース１０５０は、例えばマウス１１１０、キーボード１１２０に接続される。ビデオアダプタ１０６０は、例えばディスプレイ１１３０に接続される。 The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores a boot program such as BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example. The video adapter 1060 is connected to the display 1130, for example.

ハードディスクドライブ１０９０は、例えば、ＯＳ１０９１、アプリケーションプログラム１０９２、プログラムモジュール１０９３、プログラムデータ１０９４を記憶する。すなわち、異常ログ検出装置１０の各処理を規定するプログラムは、コンピュータ１０００により実行可能なコードが記述されたプログラムモジュール１０９３として実装される。プログラムモジュール１０９３は、例えばハードディスクドライブ１０９０に記憶される。例えば、異常ログ検出装置１０における機能構成と同様の処理を実行するためのプログラムモジュール１０９３が、ハードディスクドライブ１０９０に記憶される。なお、ハードディスクドライブ１０９０は、ＳＳＤ（Solid State Drive）により代替されてもよい。 The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process of the abnormality log detection apparatus 10 is implemented as a program module 1093 in which a code executable by the computer 1000 is described. The program module 1093 is stored in the hard disk drive 1090, for example. For example, a program module 1093 for executing processing similar to the functional configuration in the abnormality log detection device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

また、上述した実施形態の処理で用いられる設定データは、プログラムデータ１０９４として、例えばメモリ１０１０やハードディスクドライブ１０９０に記憶される。そして、ＣＰＵ１０２０が、メモリ１０１０やハードディスクドライブ１０９０に記憶されたプログラムモジュール１０９３やプログラムデータ１０９４を必要に応じてＲＡＭ１０１２に読み出して実行する。 The setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 and executes them as necessary.

なお、プログラムモジュール１０９３やプログラムデータ１０９４は、ハードディスクドライブ１０９０に記憶される場合に限らず、例えば着脱可能な記憶媒体に記憶され、ディスクドライブ１１００等を介してＣＰＵ１０２０によって読み出されてもよい。あるいは、プログラムモジュール１０９３およびプログラムデータ１０９４は、ネットワーク（ＬＡＮ（Local Area Network）、ＷＡＮ（Wide Area Network）等）を介して接続された他のコンピュータに記憶されてもよい。そして、プログラムモジュール１０９３およびプログラムデータ１０９４は、他のコンピュータから、ネットワークインタフェース１０７０を介してＣＰＵ１０２０によって読み出されてもよい。 The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in, for example, a removable storage medium and read out by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). The program module 1093 and the program data 1094 may be read by the CPU 1020 from another computer via the network interface 1070.

上記の実施形態やその変形は、本願が開示する技術に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれるものである。 The above embodiments and their modifications are included in the technology disclosed in the present application, and are likewise included in the inventions recited in the claims and their equivalents.

１ 異常ログ検出システム
１０ 異常ログ検出装置
２０ 情報処理システム
３０ ネットワーク
１００ 制御部
１０１ 取得部
１０２ 抽出部
１０３ 算出部
１０４ 検出部
２００ 記憶部
２０１ ログ記憶部
２０２ ログ系列記憶部
２０３ 類似度記憶部
２０４ 検出結果記憶部
３００ 通信部

DESCRIPTION OF SYMBOLS
1 Abnormality log detection system
10 Abnormality log detection apparatus
20 Information processing system
30 Network
100 Control unit
101 Acquisition unit
102 Extraction unit
103 Calculation unit
104 Detection unit
200 Storage unit
201 Log storage unit
202 Log series storage unit
203 Similarity storage unit
204 Detection result storage unit
300 Communication unit

Claims

An abnormal log detection system comprising:
an extraction unit that extracts, from a log of an information system, log series each having a predetermined number of lines of two or more, while shifting the start line one line at a time;
a calculation unit that, while shifting one extracted log series one line at a time, counts the number of lines matching another log series to create a log matching line-count series, and calculates the product of the area and the kurtosis of the created log matching line-count series as a similarity; and
a detection unit that detects an abnormality in the one log series based on the calculated similarity.

The abnormal log detection system according to claim 1, wherein
the extraction unit extracts at least two log series of n lines each, and
the calculation unit calculates, as the similarity, the product of an area s and a kurtosis k (where v is the variance) of the log matching line-count series of the two n-line log series, the area s and the kurtosis k being defined as in Expressions (1) and (2) of the description.

The abnormal log detection system according to claim 1, wherein
the calculation unit further calculates a distance between the one log series and the other log series based on the similarity, and
the detection unit detects an abnormality in the one log series based on the distance calculated from the similarity.

The abnormal log detection system according to claim 3, wherein
the calculation unit calculates the reciprocal of the similarity as the distance, and
the detection unit detects an abnormality in the one log series when the distance between the one log series and the other log series is longer than a distance between the other log series by a predetermined value or more.

An abnormal log detection method in which a computer executes:
an extraction step of extracting, from a log of an information system, log series each having a predetermined number of lines of two or more, while shifting the start line one line at a time;
a calculation step of, while shifting one extracted log series one line at a time, counting the number of lines matching another log series to create a log matching line-count series, and calculating the product of the area and the kurtosis of the created log matching line-count series as a similarity; and
a detection step of detecting an abnormality in the one log series based on the calculated similarity.