JP2019110513A

JP2019110513A - Anomaly detection method, learning method, anomaly detection device, and learning device

Info

Publication number: JP2019110513A
Application number: JP2018117398A
Authority: JP
Inventors: 達海大庭; Tatsumi Oba; 郁大濱; Iku Ohama
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2017-12-15
Filing date: 2018-06-20
Publication date: 2019-07-04
Anticipated expiration: 2038-06-20
Also published as: JP7082533B2

Abstract

To provide, for example, an anomaly detection method which allows accurate identification of anomalous packets, or a learning method for learning anomaly detection models for the accurate identification. SOLUTION: An anomaly detection method includes the steps of: extracting, for each of multiple learning packets obtained, all possible combinations of N-grams in a payload included in the learning packet; counting a first number, which is the number of occurrences of each of all the combinations extracted from the packet in the payloads included in the packets obtained; calculating, as anomaly detection models, first probabilities by performing smoothing processing based on the multiple first numbers; and, when a score calculated for each of the multiple packets exceeds a predetermined threshold that is based on the anomaly detection models stored in a memory, outputting an indication that the packet having the score calculated has an anomaly. SELECTED DRAWING: Figure 14

Description

The present disclosure relates to an anomaly detection method and an anomaly detection device that detect anomalies in a plurality of packets using a learning model, and to a learning method and a learning device for that learning model.

Patent Document 1 discloses a method of detecting anomalies in data using N-grams.

Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2009-523270

The present disclosure provides, among other things, an anomaly detection method capable of accurately identifying anomalous packets, or a learning method for learning an anomaly detection model for such accurate identification.

An anomaly detection method according to one aspect of the present disclosure is executed by an anomaly detection device that detects whether there is an anomaly in communication within a monitoring target, or in communication between the monitoring target and a network to which the monitoring target is connected. The anomaly detection device includes a processor and a memory, and the memory stores an anomaly detection model generated by learning using a plurality of learning packets. In the anomaly detection method, the processor: acquires the plurality of learning packets; for each of the acquired learning packets, extracts all possible first combinations of N data units (N is an integer of 2 or more) from among the plurality of data units obtained by dividing the data string constituting the payload of that learning packet into units of A bits (A is an integer of 1 or more), each first combination consisting of N data units taken either in an order in which they are consecutive in the payload or in an order in which they are spaced B units apart (B is an integer of 1 or more); for each of the first combinations extracted from the plurality of learning packets, counts a first number, which is the number of times that first combination appears in the plurality of learning packets; for each of the extracted first combinations, calculates a first probability, which is the probability that the first combination appears in the plurality of learning packets, by performing smoothing processing based on the plurality of first numbers obtained by the counting; stores the calculated first probabilities in the memory as the anomaly detection model; acquires a plurality of packets; and, for each of the acquired packets, when a score calculated for that packet exceeds a predetermined threshold based on the anomaly detection model stored in the memory, outputs an indication that the packet for which the score was calculated is anomalous.
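The extraction step described above can be illustrated with a minimal Python sketch. The function names `split_units` and `extract_combinations` are hypothetical, and two readings are assumptions rather than statements from the text: a trailing remainder shorter than A bits is dropped, and "spaced B units apart" is taken to mean a stride of B + 1 between the selected data units.

```python
def split_units(payload: bytes, a_bits: int) -> list[int]:
    """Split the payload bit string into consecutive A-bit data units."""
    total_bits = len(payload) * 8
    value = int.from_bytes(payload, "big")
    mask = (1 << a_bits) - 1
    units = []
    # Assumption: a trailing remainder shorter than A bits is dropped.
    for i in range(total_bits // a_bits):
        shift = total_bits - (i + 1) * a_bits
        units.append((value >> shift) & mask)
    return units


def extract_combinations(units: list[int], n: int, skip: int = 0) -> list[tuple[int, ...]]:
    """Return every combination of N data units.

    skip=0 yields consecutive units (ordinary N-grams); skip=B yields units
    that are B units apart (assumed stride of B + 1).
    """
    stride = skip + 1
    span = (n - 1) * stride  # distance from the first to the last selected unit
    return [
        tuple(units[i + j * stride] for j in range(n))
        for i in range(len(units) - span)
    ]


units = split_units(b"\xab\xcd", 4)              # four 4-bit units
pairs = extract_combinations(units, 2)           # consecutive pairs
skipped = extract_combinations(units, 2, skip=1) # pairs one unit apart
```

With A = 8 the data units are simply the payload bytes, which is the byte-level N-gram case discussed later in the description.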

A learning method according to one aspect of the present disclosure is executed by a learning device that learns an anomaly detection model for detecting whether there is an anomaly in communication within a monitoring target, or in communication between the monitoring target and a network to which the monitoring target is connected. The learning device includes a processor and a memory. In the learning method, the processor: acquires a plurality of learning packets; for each of the acquired learning packets, extracts all possible first combinations of N data units (N is an integer of 2 or more) from among the plurality of data units obtained by dividing the data string constituting the payload of that learning packet into units of A bits (A is an integer of 1 or more), each first combination consisting of N data units taken either in an order in which they are consecutive in the payload or in an order in which they are spaced B units apart (B is an integer of 1 or more); for each of the first combinations extracted from the plurality of learning packets, counts a first number, which is the number of times that first combination appears in the plurality of learning packets; for each of the extracted first combinations, calculates a first probability, which is the probability that the first combination appears in the plurality of learning packets, by performing smoothing processing based on the plurality of first numbers obtained by the counting; and stores the calculated first probabilities in the memory as the anomaly detection model.
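As a rough illustration of the counting and smoothing steps, the following Python sketch learns smoothed probabilities for byte-level 2-grams and scores a payload against them. Several choices here are assumptions made only for the example and are not taken from the text: additive (Laplace) smoothing with parameter `alpha`, a score defined as the mean negative log-probability of the payload's 2-grams, and the hypothetical names `learn_model` and `score`.

```python
import math
from collections import Counter


def byte_bigrams(payload: bytes) -> list[tuple[int, int]]:
    """Consecutive 2-grams of bytes (A = 8, N = 2)."""
    return [(payload[i], payload[i + 1]) for i in range(len(payload) - 1)]


def learn_model(learning_payloads, alpha: float = 1.0, vocab: int = 256 ** 2):
    """Count each 2-gram, then apply additive smoothing (an assumed choice)."""
    counts = Counter()
    for payload in learning_payloads:
        counts.update(byte_bigrams(payload))
    denom = sum(counts.values()) + alpha * vocab
    probs = {gram: (k + alpha) / denom for gram, k in counts.items()}
    unseen = alpha / denom  # probability assigned to any 2-gram never observed
    return probs, unseen


def score(payload: bytes, model) -> float:
    """Assumed score: mean negative log-probability of the payload's 2-grams."""
    probs, unseen = model
    grams = byte_bigrams(payload)
    if not grams:
        return 0.0
    return -sum(math.log(probs.get(g, unseen)) for g in grams) / len(grams)


model = learn_model([b"GET /index.html", b"GET /about.html"])
normal = score(b"GET /index.html", model)
odd = score(b"\x90\x90\x90\x90\x90\x90", model)
is_anomalous = odd > normal  # a real threshold would come from the learned model
```

Smoothing matters because a 2-gram that never appeared during learning would otherwise receive probability zero and make the logarithm undefined.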

These general or specific aspects may be realized as a system, a device, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or as any combination of systems, devices, integrated circuits, computer programs, and recording media.

By using the anomaly detection method, learning method, anomaly detection device, and learning device of the present disclosure, anomalous packets can be identified accurately.

FIG. 1 is a schematic diagram of an anomaly detection system according to the embodiment.
FIG. 2 is a block diagram showing an example of the hardware configuration of the anomaly detection device according to the embodiment.
FIG. 3 is a block diagram showing an example of the functional configuration of the anomaly detection device according to the present embodiment.
FIG. 4A is a diagram showing the structure of an Ethernet frame.
FIG. 4B is a diagram showing an example of the packet structure of the TCP protocol.
FIG. 5 is a diagram showing an example of an anomaly detection model held by the anomaly detection model DB.
FIG. 6 is a diagram showing an example of an anomaly detection model held by the anomaly detection model DB.
FIG. 7 is a diagram showing an example of an anomaly detection model held by the anomaly detection model DB.
FIG. 8 is a diagram showing an example of an anomaly detection model held by the anomaly detection model DB.
FIG. 9 is a diagram showing correspondence information in which destination ports are associated with alert thresholds.
FIG. 10 is a flowchart showing an overview of the operation of the anomaly detection device.
FIG. 11 is a flowchart showing an example of the details of the learning process in the anomaly detection device.
FIG. 12 is a flowchart showing an example of the details of the alert threshold determination process.
FIG. 13 is a flowchart showing another example of the details of the alert threshold determination process.
FIG. 14 is a flowchart showing an example of the details of the inspection process in the anomaly detection device.
FIG. 15 is a diagram showing experimental results comparing the anomaly detection method according to the present embodiment with other methods when evaluated on the FTP protocol.
FIG. 16 is a diagram showing experimental results comparing the anomaly detection method according to the present embodiment with other methods when evaluated on the TELNET protocol.

(Findings that formed the basis of the present invention)
[1-1 Background]
In recent years, the threat of cyberattacks on control systems (factories, plants, critical infrastructure, and the like) has grown, and the damage caused by such attacks is on the rise. The following factors have been pointed out as reasons for the increase in damage to control systems.

(1) To improve reliability and ease of control, systems that include control systems have come to be interconnected. Control systems are therefore exposed to the outside, which is thought to have increased cybersecurity threats.

(2) To improve interoperability and versatility, the internal networks of control systems have come to communicate using open protocols such as Modbus, EtherCAT, and BACnet. This is thought to have dramatically increased the likelihood of attacks on control systems, such as malware infection.

(3) Control systems often have no security measures in place, and a single system is often used for decades. As a result, there are frequent cases in which support for the OS or other software ends during the service life so that security patches cannot be applied to PC terminals, or in which antivirus software cannot be installed.

Because items (1) and (2) above bring very large industrial benefits, these trends are expected to keep expanding. To secure control systems in which, as noted in (3), introducing security measures or changing equipment is not easy, the present disclosure focuses on intrusion detection technology at the network level. A network-based intrusion detection system (NIDS) passively monitors the target network and requires no direct changes to equipment. It therefore has the advantage of being easy to introduce even in control systems where availability is paramount.

[1-1-1 Types and properties of intrusion detection systems]
Intrusion detection systems (IDS) are generally classified into host-based IDS (HIDS) and network-based IDS (NIDS). In control systems, NIDS are commonly used. One reason NIDS are preferred in control systems is that production equipment does not have to be modified directly. An NIDS can be introduced regardless of the OS or resources of the control devices being monitored. By contrast, an HIDS such as virus detection software places a heavy load on the terminal during a virus scan, which may slow down production-related software and affect production activity.

NIDS are further divided broadly into signature/rule-based and anomaly-based types, and anomaly-based NIDS are further classified into flow-based and payload-based types. Signature/rule-based NIDS are the most widely used; they raise an alert when they find a specific byte-sequence pattern in a payload or when the traffic volume exceeds a predetermined threshold. A flow-based NIDS observes only packet headers, builds a steady-state model of flow information such as the size and period of traffic on the network, and raises an alert when it detects traffic that deviates from the steady state. Because a flow-based NIDS uses only packet header information, it can detect anomalies even in encrypted communication or in communication that carries no payload. A payload-based NIDS, by contrast, observes packet payload information and judges whether the payload deviates from normal communication content. A flow-based approach may react sensitively to state changes that should not be flagged, such as system maintenance or non-routine file transfers. A payload-based approach is less prone to this, and it may be able to detect sophisticated attacks that are not reflected in flow information.

[1-1-2 Overview and effects of the present disclosure]
The present disclosure describes a new payload-based anomaly detection method. Payload-based anomaly detection was adopted for the following reasons.

・In environments where control systems are used, many operations are automated. However, many non-routine operations occur during manual operation, maintenance, product changeovers, and the like. A flow-based NIDS may flag many of these non-routine operations. A payload-based NIDS, on the other hand, has the advantage that false positives can be avoided as long as the content of the operations themselves is consistent.

・With very elaborately crafted malware that mounts a sophisticated attack which does not show up in flow information, the anomaly may go undetected unless packet payloads are monitored.

・A flow-based NIDS cannot detect anomalies that arise when the control system is operated by a legitimate operator acting with malicious intent, or when a legitimate operator mistakenly enters abnormal parameters, because the resulting flows match normal ones. A payload-based NIDS can detect such anomalies as well.

The anomaly detection method and related aspects of the present disclosure provide the following effects.

・By using N-gram information from packet payloads, the anomaly detection method of the present disclosure achieves high performance (a low false positive rate and a high detection rate).

・The anomaly detection method of the present disclosure does not need to be tuned for the environment in which it is deployed, and is suited to building anomaly detection systems automatically even in very large network environments.

・Compared with PAYL and ANAGRAM, which are existing methods that require relatively little tuning effort, the anomaly detection method of the present disclosure achieves better anomaly detection performance on the FTP and TELNET protocols of the 1999 DARPA IDS Data Set.

[1-2 Basic techniques]
Before describing the anomaly detection method of the present disclosure, the basic techniques it relies on are explained.

[1-2-1 N-gram]
An N-gram is a run of N consecutive elements in sequence data made up of elements such as characters or words. For example, given the sequence ...AGCTTCGA... in a DNA base sequence, the 1-grams appearing in it are ..., A, G, C, T, T, C, G, A, ...; the 2-grams are ..., AG, GC, CT, TT, TC, CG, GA, ...; and the 3-grams are ..., AGC, GCT, CTT, TTC, TCG, CGA, .... Likewise, when the sequence ...to be or not to be... appears in a sentence and each element is taken to be a word, the 1-grams are ..., to, be, or, not, to, be, ...; the 2-grams are ..., to be, be or, or not, not to, to be, ...; and the 3-grams are ..., to be or, be or not, or not to, not to be, ....
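The definition above can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper name `ngrams` is hypothetical and works on any sliceable sequence, such as a string of characters or a list of words.

```python
def ngrams(sequence, n: int) -> list[tuple]:
    """Return every run of n consecutive elements of the sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Characters as elements: the DNA example.
dna_2grams = ["".join(g) for g in ngrams("AGCTTCGA", 2)]
dna_3grams = ["".join(g) for g in ngrams("AGCTTCGA", 3)]

# Words as elements: the sentence example.
words = "to be or not to be".split()
word_2grams = [" ".join(g) for g in ngrams(words, 2)]
word_3grams = [" ".join(g) for g in ngrams(words, 3)]
```

A sequence of length L yields L - N + 1 N-grams, so a sequence shorter than N yields none.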

With N-grams, features can be extracted from sequence data simply by turning the N-gram occurrence counts into a vector. If each element of a sequence can take M possible values and the sequence has length L, there are M^L possible patterns for the sequence. If, for example, the 2-gram occurrence counts are used as features, the sequence can instead be treated as an M^2-dimensional vector, which is much easier to handle. N-grams are used mostly in natural language processing and the life sciences, but past research has shown that they are also effective for payload anomaly detection.
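As a small illustration of this vectorization, the sketch below maps a sequence over an alphabet of M symbols to an M^2-dimensional vector of 2-gram counts. The function name `bigram_count_vector` and the ordering of dimensions (the order produced by `itertools.product`) are choices made for the example only.

```python
from itertools import product


def bigram_count_vector(sequence, alphabet) -> list[int]:
    """Map a sequence to an M^2-dimensional vector of 2-gram occurrence counts."""
    # One dimension per ordered pair of symbols, M * M in total.
    index = {pair: i for i, pair in enumerate(product(alphabet, repeat=2))}
    vector = [0] * len(index)
    for a, b in zip(sequence, sequence[1:]):
        vector[index[(a, b)]] += 1
    return vector


vec = bigram_count_vector("AGCTTCGA", "ACGT")  # M = 4, so 16 dimensions
```

Whatever the length L of the input, the vector always has M^2 entries, which is what makes sequences of different lengths comparable.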

[1-2-2 Sequence generation model using N-grams]
As described above, N-grams are a useful model for handling sequence information. With N-grams, a very simple generative model of sequences can be built. In an N-gram generative model, the probability of element x_i is treated as Pr(x_i | x_{i-(N-1)}, ..., x_{i-1}). That is, the probability that an element is output is assumed to be determined solely by the immediately preceding N-1 elements. This assumption is of course incorrect in most cases, but it is convenient because, given a sequence, one can obtain quantities such as the likelihood that the sequence occurs. For example, when x_1, x_2, ..., x_l is the sequence data of interest, the probability that this data is generated is expressed as Pr(x_1, x_2, ..., x_l), and in a generative model using 2-grams this can be decomposed into a product of probabilities as follows.

$$
\begin{aligned}
&\Pr(x_1, x_2, \ldots, x_l) && \text{(Equation 1)} \\
&= \Pr(x_1) \cdot \Pr(x_2 \mid x_1) \cdot \Pr(x_3 \mid x_1, x_2) \cdots \Pr(x_l \mid x_1, x_2, \ldots, x_{l-1}) && \text{(Equation 2)} \\
&= \Pr(x_1 \mid \mathrm{start}) \cdot \Pr(x_2 \mid x_1) \cdot \Pr(x_3 \mid x_2) \cdots \Pr(x_l \mid x_{l-1}) && \text{(Equation 3)}
\end{aligned}
$$

For the first element only, the probability that x_1 appears at the start is used; the probability of each subsequent element is calculated using the property that it depends only on the element immediately before it. In the simplest approach, letting M be the number of possible element values and k_{x_i, x_j} be the number of times the 2-gram x_i, x_j appeared, the probability of each term is defined by Equation 4.
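The factorization in Equation 3 can be sketched in Python as follows. Equation 4 itself is not reproduced in this text, so the estimate used here, the relative frequency k_{x_i, x_j} divided by the total count of 2-grams starting with x_i, is an assumption about what the simplest approach looks like. The names `fit_bigram`, `sequence_probability`, and the `START` marker are hypothetical.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical marker for the position before the first element


def fit_bigram(sequences):
    """Estimate Pr(x_j | x_i) as a relative frequency (assumed form)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        prev = START
        for x in seq:
            counts[prev][x] += 1
            prev = x
    return {
        prev: {x: k / sum(c.values()) for x, k in c.items()}
        for prev, c in counts.items()
    }


def sequence_probability(seq, model) -> float:
    """Product of conditional probabilities, following Equation 3."""
    p = 1.0
    prev = START
    for x in seq:
        p *= model.get(prev, {}).get(x, 0.0)  # unseen 2-grams get probability 0
        prev = x
    return p


model = fit_bigram(["AGCTTCGA"])
p_seen = sequence_probability("AGC", model)    # every 2-gram was observed
p_unseen = sequence_probability("AAG", model)  # the 2-gram "AA" was never observed
```

With unsmoothed relative frequencies, a single unobserved 2-gram forces the probability of the whole sequence to zero, which is one reason the claimed method applies smoothing to the counts.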

The anomaly detection method of the present disclosure models the N-grams of the byte sequences contained in payloads in order to detect anomalous payload sequences, and it has advantages over the prior art in both detection performance and ease of tuning.

[1-3 Existing methods]
This section introduces existing payload-based anomaly detection methods. The methods described here perform anomaly detection using, for example, the payload of the TCP (Transmission Control Protocol) / UDP (User Datagram Protocol) layer (the TCP/UDP payload portion of a packet having the structure shown in FIG. 4A). The detection target is of course not limited to protocols running over TCP/UDP; anomaly detection can be performed in the same way using the payloads of other protocols. In addition, none of these methods requires prior knowledge. That is, they do not use a protocol-specific parser to extract particular elements of the payload. PAYL, POSEIDON, and ANAGRAM are representative prior examples of using N-grams for payload-based anomaly detection. PAYL and POSEIDON both perform classification using unigrams. ANAGRAM performs classification using N-grams (N = 3, 5, 7, and so on). The three methods are described in turn below.

[1-3-1 PAYL]
PAYL is a method that uses unigram information from the payload sequence, and was proposed by Ke Wang et al. in 2004. PAYL learns separate anomaly detection models for each packet destination IP, packet destination port, and packet payload length (in 1-byte units). In the learning phase, every training payload sequence is converted into a 256-dimensional unigram vector, and the mean and standard deviation of each dimension are accumulated. In the detection phase, the payload under inspection is likewise converted into a 256-dimensional (1-byte) unigram vector, and the simplified Mahalanobis distance between this vector and the mean vector accumulated during learning is calculated as the anomaly score. The anomaly score is calculated using Equation 5.
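A PAYL-style score can be sketched in Python as follows. The simplified Mahalanobis distance is written here in the form given in the PAYL literature, the sum over all 256 byte values of |x_i - mean_i| / (std_i + alpha), where alpha is a smoothing factor that avoids division by zero. The sketch keeps a single model instead of one model per destination IP, port, and payload length, and the function names and the default value of `alpha` are assumptions for illustration.

```python
import math


def byte_frequencies(payload: bytes) -> list[float]:
    """256-dimensional unigram vector: relative frequency of each byte value."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload) or 1  # avoid division by zero for an empty payload
    return [c / n for c in counts]


def train_payl(payloads):
    """Accumulate the per-dimension mean and standard deviation."""
    vectors = [byte_frequencies(p) for p in payloads]
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(256)]
    std = [
        math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vectors) / n)
        for i in range(256)
    ]
    return mean, std


def payl_score(payload: bytes, model, alpha: float = 0.001) -> float:
    """Simplified Mahalanobis distance between the payload and the mean vector."""
    mean, std = model
    x = byte_frequencies(payload)
    return sum(abs(x[i] - mean[i]) / (std[i] + alpha) for i in range(256))


model = train_payl([b"USER alice\r\n", b"USER bob\r\n"])
normal = payl_score(b"USER carol\r\n", model)
shellcode_like = payl_score(b"\x90" * 12, model)
```

Because only byte frequencies are kept, two payloads containing the same bytes in a different order receive the same score.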

In a naive implementation of PAYL, the only hyperparameter (a parameter that a human must set before learning) is the α in Equation 5, so little tuning is needed. Incremental learning on additional data is also easy. PAYL is a simple and effective classifier, but because it uses unigrams, all information about ordering is lost, which is thought to reduce accuracy. To overcome this weakness of unigrams, a variety of methods using N-grams (N ≥ 2) were subsequently devised.

In short, the problem with PAYL is that, because it uses unigrams, its classification accuracy is somewhat lower than that of the methods proposed later.

[1-3-2 POSEIDON]
PAYL split its model by packet payload length because the intent was to separate the anomaly detection model according to the role of each packet. POSEIDON, proposed by Damiano Bolzoni et al. in 2006, starts from the observation that payload length cannot always separate models by packet role, and attempts to split the model using different information. POSEIDON defines a distance measure between payloads, clusters packets that are close under that measure, and uses the resulting cluster information in place of payload length to split the model. A self-organizing map is used as the clustering method. Apart from using this cluster information instead of payload length, POSEIDON is the same anomaly detection method as PAYL. POSEIDON achieves high classification accuracy when an appropriate self-organizing map can be learned. However, a self-organizing map has a very large number of hyperparameters. Getting packets to fall into desirable clusters therefore requires many trials and tuning by cross-validation, so the method has limited practicality.

In short, POSEIDON has two problems. First, the self-organizing map has many hyperparameters, which makes tuning very difficult. Second, learning the self-organizing map requires a great deal of time and computing resources, which makes the method unsuitable for use in real environments.

[1-3-3 ANAGRAM]
ANAGRAM is a method proposed by Ke Wang et al. in 2006 to improve on PAYL. PAYL was found to be vulnerable to mimicry attacks (attacks that try to evade detection by the model), and to avoid this problem ANAGRAM performs N-gram modeling with larger N. The paper introduces a frequency-based method, which also uses the number of occurrences of each N-gram, and a binary-based method, which uses only whether each N-gram has appeared. Of the two, binary-based ANAGRAM is reported to be more accurate because the training data is highly sparse (hereafter, ANAGRAM by itself refers to binary-based ANAGRAM). Because the amount of N-gram information grows exponentially with N, ANAGRAM uses a Bloom filter to hold efficiently the N-grams that appeared in the training payloads. A Bloom filter is fast and does not consume an enormous amount of memory, but the filter size must be chosen in advance to suit the target data. If the filter is too small, N-grams that have never been observed may be mistaken for ones that have been observed; if it is too large, it occupies a large amount of memory. In addition, by its nature, once ANAGRAM has observed a packet containing illegitimate N-grams even a single time, it thereafter treats every N-gram contained in that packet as normal, so its accuracy degrades markedly. This becomes a problem, for example, when the training data contains highly random byte sequences such as encrypted strings or binary data. The anomaly score in ANAGRAM (binary-based) is calculated using Equation 6.
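The binary-based idea can be sketched in Python with a small Bloom filter. The score used here, the fraction of a payload's N-grams that were not seen during training, follows the form described in the ANAGRAM literature. The Bloom filter design (SHA-256 with a one-byte index prefix, 2^16 bits, three hash functions) and all names are assumptions chosen for the example, not details from the text.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; size and hash count must be chosen up front."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive several bit positions by prefixing the item with an index byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def byte_ngrams(payload: bytes, n: int) -> list[bytes]:
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train(payloads, n: int = 3) -> BloomFilter:
    seen = BloomFilter()
    for payload in payloads:
        for gram in byte_ngrams(payload, n):
            seen.add(gram)
    return seen


def anagram_score(payload: bytes, seen: BloomFilter, n: int = 3) -> float:
    """Fraction of the payload's N-grams that were never observed in training."""
    grams = byte_ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in seen) / len(grams)


seen = train([b"USER anonymous\r\n"])
known = anagram_score(b"USER anonymous\r\n", seen)
novel = anagram_score(b"\x90\x90\x90\x90\x90\x90", seen)
```

A Bloom filter never reports a trained N-gram as unseen, but it can report an unseen N-gram as seen; that false positive rate rises as the filter fills up, which is the sizing trade-off noted above.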

一方、ＡＮＡＧＲＡＭ（ｆｒｅｑｕｅｎｃｙ−ｂａｓｅｄ）における異常スコアは、下記の式７により算出される。 On the other hand, the abnormality score in ANAGRAM (frequency-based) is calculated by Equation 7 below.

これらの式からもわかるように、ＡＮＡＧＲＡＭはｂｉｎａｒｙ−ｂａｓｅｄ版、ｆｒｅｑｕｅｎｃｙ−ｂａｓｅｄ版ともに非常にシンプルな手法であり、ハイパーパラメータもほとんど存在しないため扱いやすい。 As these equations show, both the binary-based and the frequency-based versions of ANAGRAM are very simple methods with almost no hyperparameters, which makes them easy to handle.
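The binary-based idea described above can be sketched in a few lines. Equation 6 is not reproduced in this text, so the scoring rule below (the fraction of a payload's N-grams that were never seen during learning) is an assumption based on the usual description of binary-based ANAGRAM. A plain Python set stands in for the Bloom filter, and the function names are hypothetical.

```python
def ngrams(payload: bytes, n: int) -> list[bytes]:
    """Return every run of n consecutive bytes in the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train_binary(payloads: list[bytes], n: int) -> set[bytes]:
    """Record which N-grams appeared at least once.

    A set is used here in place of a Bloom filter (assumption for brevity).
    """
    seen: set[bytes] = set()
    for payload in payloads:
        seen.update(ngrams(payload, n))
    return seen


def binary_score(payload: bytes, seen: set[bytes], n: int) -> float:
    """Fraction of the payload's N-grams never seen in learning (assumed rule)."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0  # payload too short to contain any N-gram
    unseen = sum(1 for g in grams if g not in seen)
    return unseen / len(grams)
```

Because only presence is recorded, a single learning packet that contains unusual N-grams makes all of them look normal afterwards, which is the weakness described above.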

つまり、ＡＮＡＧＲＡＭ（ｆｒｅｑｕｅｎｃｙ−ｂａｓｅｄ、ｂｉｎａｒｙ−ｂａｓｅｄ）の問題点は、下記の３点である。１点目は、ＡＮＡＧＲＡＭでは、頻度に関する情報を落としてしまっているため、不正なパケットやランダム性の高いパケットの影響で正常でないパケットのＮ−ｇｒａｍを観測してしまった場合に、著しい精度の劣化に繋がる点である。２点目は、Ｎ≧４程度の大きなＮに対してＡＮＡＧＲＡＭを用いる場合、ブルームフィルタの利用が不可欠となるため、ブルームフィルタのサイズ設計を行う必要がある点である。３点目は、ＡＮＡＧＲＡＭ（ｆｒｅｑｕｅｎｃｙ−ｂａｓｅｄ）のスコア算出の関数は経験的なものであり、確率論的な妥当性が無い点である。 In short, ANAGRAM (frequency-based and binary-based) has the following three problems. First, because ANAGRAM discards frequency information, observing the N-grams of abnormal packets, whether they come from invalid packets or from highly random packets, leads to a significant loss of accuracy. Second, when ANAGRAM is used for a large N, roughly N ≥ 4, a Bloom filter becomes indispensable, so the size of the Bloom filter must be designed. Third, the score function of ANAGRAM (frequency-based) is empirical and has no probabilistic justification.

以上のことから、本発明者らは、鋭意検討の上、異常なパケットを精度よく特定することができる異常検知方法、学習方法、異常検知装置、および、学習装置を見出すに至った。 From the above, the inventors of the present invention have found out an abnormality detection method, a learning method, an abnormality detection device, and a learning device capable of identifying an abnormal packet with high accuracy, after intensive investigation.

これによれば、ペイロードにおけるデータ単位の並び情報を考慮して異常検知モデルを学習しているため、異常なパケットを精度よく特定することができる。 According to this, since the anomaly detection model is learned in consideration of the alignment information of the data unit in the payload, the anomaly packet can be identified with high accuracy.

また、学習において、スムージング処理を行うことで算出した第１の確率を用いているため、ノイズに対する頑健性を向上させることができる。 Further, in learning, since the first probability calculated by performing the smoothing process is used, the robustness to noise can be improved.

また、前記第１の確率の算出では、前記スムージング処理として、前記第１の数の全てに、正の数を加算することで複数の第２の数を算出し、抽出した前記全ての第１の組み合わせのそれぞれについて算出した前記複数の第２の数に基づいて、前記第１に確率を算出してもよい。 Further, in the calculation of the first probabilities, the smoothing processing may calculate a plurality of second numbers by adding a positive number to all of the first numbers, and the first probabilities may be calculated based on the plurality of second numbers calculated for each of the extracted first combinations.

また、学習において、複数の第１の数の全てに正の数を加算することで算出した複数の第２の数に基づく第１の確率を用いているため、ノイズに対する頑健性を向上させることができる。 In addition, because learning uses first probabilities based on the plurality of second numbers, which are calculated by adding a positive number to all of the plurality of first numbers, robustness to noise can be improved.
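As a minimal sketch of this smoothing step, the function below adds a positive number to every count (giving the "second numbers") and normalizes the result. The parameter name `alpha`, its default of 1.0, and the choice to normalize by the sum of the second numbers are assumptions; the text only says the first probabilities are "based on" the second numbers.

```python
def smoothed_probabilities(counts: list[int], alpha: float = 1.0) -> list[float]:
    """Turn raw counts (first numbers) into smoothed probabilities.

    alpha is the positive number added to every count; the results are the
    second numbers, which are then normalized (assumed) to sum to 1.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    second_numbers = [c + alpha for c in counts]
    total = sum(second_numbers)
    return [s / total for s in second_numbers]
```

With counts `[3, 1, 0, 0]` and `alpha = 1.0`, the second numbers are `[4, 2, 1, 1]`, so a combination that never appeared still receives a small nonzero probability instead of zero.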

また、前記抽出では、Ｎ−ｇｒａｍを用いることで、前記Ｎ個のデータ単位の前記第１の組み合わせを抽出してもよい。 In the extraction, the first combination of the N data units may be extracted by using an N-gram.

また、前記Ｎは、２または３であってもよい。 Also, the N may be 2 or 3.

また、前記出力では、取得した前記複数のパケットのそれぞれについて、（１）当該パケットに含まれるペイロードを構成するデータ列をＡ（Ａは１以上の整数）ビット単位で区切ることにより得られる複数個のデータ単位のうちのＮ（Ｎは２以上の整数）個のデータ単位の取り得る全ての第２の組み合わせであって、当該ペイロードにおける互いに連続している並び順、または、Ｂ（Ｂは１以上の整数）個飛ばしの並び順でのＮ個のデータ単位の第２の組み合わせを抽出し、（２）当該パケットから抽出した前記全ての第２の組み合わせのそれぞれについて、当該第２の組み合わせが、取得した当該パケットが有する前記ペイロードにおいて出現する回数である第３の数をカウントし、（３）当該パケットにおける前記全ての第２の組み合わせのそれぞれについてカウントすることで得られた複数の前記第３の数に基づいて、当該パケットにおいて当該第２の組み合わせが出現する確率である複数の第２の確率を算出し、（４）当該パケットに対して算出した前記複数の第２の確率の対数の総和を前記ペイロードのペイロード長で規定される規定値で除算することでスコアを算出し、（５）当該パケットに対して算出した前記スコアが、前記メモリに記憶されている前記異常検知モデルに基づく所定の閾値を超えている場合、当該スコアが算出されたパケットが異常であることを出力してもよい。 Further, in the output, the following may be performed for each of the plurality of acquired packets: (1) extracting all possible second combinations of N (N is an integer of 2 or more) data units from among the plurality of data units obtained by dividing the data string constituting the payload included in the packet into units of A (A is an integer of 1 or more) bits, each second combination being N data units that are either consecutive in the payload or taken in an order that skips B (B is an integer of 1 or more) data units; (2) counting, for each of the second combinations extracted from the packet, a third number that is the number of times the second combination appears in the payload of the acquired packet; (3) calculating, based on the plurality of third numbers obtained by counting for each of the second combinations in the packet, a plurality of second probabilities, each being the probability that the second combination appears in the packet; (4) calculating a score by dividing the sum of the logarithms of the plurality of second probabilities calculated for the packet by a specified value defined by the payload length of the payload; and (5) when the score calculated for the packet exceeds a predetermined threshold based on the abnormality detection model stored in the memory, outputting that the packet for which the score was calculated is abnormal.

これによれば、ペイロードにおけるデータ単位の並び情報を考慮してスコアを算出しているため、異常なパケットを精度よく特定することができる。 According to this, since the score is calculated in consideration of the alignment information of the data unit in the payload, an abnormal packet can be identified with high accuracy.
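The scoring and threshold steps might look roughly like the sketch below. Several details are assumptions rather than statements from the text: each N-gram is looked up in a dictionary of learned probabilities (`probs`), unseen N-grams fall back to a hypothetical `unseen_prob`, the "specified value defined by the payload length" is taken to be the number of N-grams in the payload, and the log sum is negated so that a larger score means a less likely payload.

```python
import math


def packet_score(payload: bytes, probs: dict[bytes, float],
                 unseen_prob: float, n: int = 2) -> float:
    """Length-normalized negative log-probability of the payload's N-grams."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0  # payload too short to contain any N-gram
    log_sum = sum(math.log(probs.get(g, unseen_prob)) for g in grams)
    # Negated (assumption) so that a larger score means a less likely payload.
    return -log_sum / len(grams)


def is_abnormal(score: float, threshold: float) -> bool:
    """Report the packet as abnormal when its score exceeds the threshold."""
    return score > threshold
```

Dividing by the number of N-grams keeps long and short payloads comparable against a single threshold, which is presumably why the score is normalized by a length-dependent value.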

また、前記メモリは、前記全ての第１の組み合わせのそれぞれにおける前記第１の数に基づく第４の数を前記異常検知モデルとして記憶しており、前記異常検知方法では、前記プロセッサが、さらに、カウントした前記第３の数を用いて、前記異常検知モデルに含まれる前記第４の数を更新してもよい。 Further, the memory stores, as the abnormality detection model, a fourth number based on the first number in each of the first combinations, and in the abnormality detection method, the processor further includes: The fourth number included in the abnormality detection model may be updated using the counted third number.

このため、異常検知モデルを追加学習すること、または、古いデータを削除した異常検知モデルに更新することができる。よって、異常なパケットを精度よく特定することができる。 Therefore, it is possible to additionally learn the anomaly detection model or to update the anomaly detection model from which old data has been deleted. Thus, abnormal packets can be identified with high accuracy.
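One way to read this: if the model stores counts rather than finished probabilities, new packets can be folded in by adding their counts, and old data can be dropped by subtracting the counts it contributed. The sketch below illustrates that reading with `collections.Counter`; the function names and the subtraction-based removal of old data are assumptions, not the patent's stated procedure.

```python
from collections import Counter


def add_counts(model: Counter, packet_counts: Counter) -> None:
    """Additional learning: add a packet's N-gram counts to the stored counts."""
    model.update(packet_counts)


def remove_counts(model: Counter, old_counts: Counter) -> None:
    """Forget old data: subtract the counts an old packet contributed."""
    model.subtract(old_counts)
    # Drop entries that fell to zero or below so the model stays compact.
    for key in [k for k, v in model.items() if v <= 0]:
        del model[key]
```

Probabilities would then be recomputed from the updated counts whenever a score is needed.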

また、前記異常検知方法では、前記プロセッサが、さらに、取得した前記複数の学習用パケットのそれぞれについて、当該学習用パケットが有するヘッダに応じて当該学習用パケットを複数のモデルのいずれか１つに分類し、前記複数のモデルのそれぞれについて、（１）さらに、カウントした前記第１の数を用いて、前記複数の学習用パケットのうち当該モデルに分類された複数の学習用パケットにおいて、前記全ての第１の組み合わせのそれぞれが出現する回数である第５の数を算出し、（２）当該モデルに分類された前記複数の学習用パケットから抽出した前記全ての第１の組み合わせのそれぞれについて、算出した前記第５の数の全てに、正の数を加算することで前記複数の第６の数を算出し、（３）抽出した前記全ての第１の組み合わせのそれぞれについて、算出した前記複数の第６の数に基づいて、当該モデルに分類された前記複数の学習用パケットにおいて当該第１の組み合わせが出現する確率である複数の第１の確率を算出してもよい。 Further, in the abnormality detection method, the processor may further classify each of the plurality of acquired learning packets into any one of a plurality of models according to the header of the learning packet, and, for each of the plurality of models, (1) calculate, using the counted first numbers, a fifth number that is the number of times each of the first combinations appears in the learning packets classified into the model, (2) calculate a plurality of sixth numbers by adding a positive number to all of the fifth numbers calculated for the first combinations extracted from the learning packets classified into the model, and (3) calculate, for each of the extracted first combinations, based on the plurality of sixth numbers, a plurality of first probabilities, each being the probability that the first combination appears in the learning packets classified into the model.

また、前記メモリは、前記複数のモデル毎に、前記所定の閾値を記憶しており、前記異常検知方法では、前記プロセッサが、さらに、取得した前記複数のパケットのそれぞれを、当該パケットが有するヘッダに応じて複数のモデルのいずれか１つに分類し、前記出力では、算出した前記スコアが、当該スコアが算出されたパケットが分類されたモデルに対応する前記所定の閾値を超えている場合、当該パケットが異常であることを出力してもよい。 In addition, the memory stores the predetermined threshold for each of the plurality of models, and in the abnormality detection method, the processor further includes a header having each of the plurality of acquired packets. Classified according to one of a plurality of models, and in the output, if the calculated score exceeds the predetermined threshold value corresponding to the model into which the packet for which the score is calculated is classified It may output that the packet is abnormal.

また、前記複数のモデルのそれぞれは、前記パケットの宛先ＩＰ、宛先ポート、送信元ＩＰ、及びプロトコルの少なくとも１つにより分類されるモデルであってもよい。 Further, each of the plurality of models may be a model classified by at least one of a destination IP, a destination port, a source IP, and a protocol of the packet.
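As an illustration of classifying packets into models by header fields, the sketch below builds a model key from a configurable subset of fields and groups payloads by that key. The `Header` type, its field names, and the default choice of (destination IP, destination port) are hypothetical; the text only requires that at least one of the listed fields be used.

```python
from collections import defaultdict
from typing import NamedTuple


class Header(NamedTuple):
    """Hypothetical parsed header fields; the names are illustrative."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str


def model_key(header: Header,
              fields: tuple[str, ...] = ("dst_ip", "dst_port")) -> tuple:
    """Build the key that selects which model a packet belongs to."""
    return tuple(getattr(header, name) for name in fields)


def group_by_model(packets: list[tuple[Header, bytes]]) -> dict[tuple, list[bytes]]:
    """Group payloads by model key so each model is learned or checked separately."""
    groups: dict[tuple, list[bytes]] = defaultdict(list)
    for header, payload in packets:
        groups[model_key(header)].append(payload)
    return dict(groups)
```

A per-model threshold could be kept in a second dictionary under the same keys, so that each packet's score is compared with the threshold of the model it was classified into.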

また、前記メモリは、前記複数のモデル毎における、前記全ての第１の組み合わせのそれぞれにおける前記第５の数を前記異常検知モデルとして記憶しており、前記異常検知方法では、前記プロセッサが、さらに、カウントした前記第３の数を用いて、前記異常検知モデルに含まれる前記第５の数を更新してもよい。 Further, the memory stores the fifth number in each of the first combinations in each of the plurality of models as the abnormality detection model, and in the abnormality detection method, the processor further calculates The fifth number included in the abnormality detection model may be updated using the counted third number.

本開示の一態様に係る学習方法は、監視対象内での通信、または、前記監視対象と前記監視対象が接続されているネットワークとの間での通信に異常があるか否かを検知するための異常検知モデルを学習する学習装置が実行する学習方法であって、前記学習装置は、プロセッサおよびメモリを備え、前記学習方法では、前記プロセッサが、複数の学習用パケットを取得し、取得した前記複数の学習用パケットのそれぞれについて、当該学習用パケットに含まれるペイロードを構成するデータ列をＡ（Ａは１以上の整数）ビット単位で区切ることにより得られる複数個のデータ単位のうちのＮ（Ｎは２以上の整数）個のデータ単位の取り得る全ての第１の組み合わせであって、当該ペイロードにおける互いに連続している並び順、または、Ｂ（Ｂは１以上の整数）個飛ばしの並び順でのＮ個のデータ単位の第１の組み合わせを抽出し、前記複数の学習用パケットについて抽出した前記全ての第１の組み合わせのそれぞれについて、当該第１の組み合わせが前記複数の学習用パケットにおいて出現する回数である第１の数をカウントし、抽出した前記全ての第１の組み合わせのそれぞれについて、カウントすることで得られた複数の前記第１の数に基づいて、スムージング処理を行うことで前記複数の学習用パケットにおいて当該第１の組み合わせが出現する確率である複数の第１の確率を算出し、算出した前記複数の第１の確率を前記異常検知モデルとして前記メモリに記憶させる。 A learning method according to an aspect of the present disclosure is a learning method executed by a learning device that learns an abnormality detection model for detecting whether there is an abnormality in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected. The learning device includes a processor and a memory. In the learning method, the processor: acquires a plurality of learning packets; for each of the acquired learning packets, extracts all possible first combinations of N (N is an integer of 2 or more) data units from among a plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A (A is an integer of 1 or more) bits, each first combination being N data units that are either consecutive in the payload or taken in an order that skips B (B is an integer of 1 or more) data units; counts, for each of the first combinations extracted for the plurality of learning packets, a first number that is the number of times the first combination appears in the plurality of learning packets; calculates, for each of the extracted first combinations, by performing smoothing processing based on the plurality of first numbers obtained by the counting, a plurality of first probabilities, each being the probability that the first combination appears in the plurality of learning packets; and stores the calculated plurality of first probabilities in the memory as the abnormality detection model.

また、学習において、複数の第１の数の全てに正の数を加算することで算出した複数の第１の数に基づく第１の確率を用いているため、ノイズに対する頑健性を向上させることができる。 In addition, because learning uses first probabilities based on numbers calculated by adding a positive number to all of the plurality of first numbers, robustness to noise can be improved.

なお、これらの全般的または具体的な態様は、システム、方法、集積回路、コンピュータプログラムまたはコンピュータ読み取り可能なＣＤ−ＲＯＭなどの記録媒体で実現されてもよく、システム、方法、集積回路、コンピュータプログラムまたは記録媒体の任意な組み合わせで実現されてもよい。 Note that these general or specific aspects may be realized by a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer readable CD-ROM, a system, a method, an integrated circuit, a computer program Or it may be realized by any combination of recording media.

以下、本発明の一態様に係る異常検知方法、学習方法、異常検知装置、および、学習装置について、図面を参照しながら具体的に説明する。 Hereinafter, an abnormality detection method, a learning method, an abnormality detection device, and a learning device according to an aspect of the present invention will be specifically described with reference to the drawings.

なお、以下で説明する実施の形態は、いずれも本発明の一具体例を示すものである。以下の実施の形態で示される数値、形状、材料、構成要素、構成要素の配置位置及び接続形態、ステップ、ステップの順序などは、一例であり、本発明を限定する主旨ではない。また、以下の実施の形態における構成要素のうち、最上位概念を示す独立請求項に記載されていない構成要素については、任意の構成要素として説明される。 Each embodiment described below shows one specific example of the present invention. Numerical values, shapes, materials, components, arrangement positions and connection forms of components, steps, order of steps, and the like shown in the following embodiments are merely examples, and the present invention is not limited thereto. Further, among the components in the following embodiments, components not described in the independent claim indicating the highest concept are described as arbitrary components.

（実施の形態１）
［２−１異常検知システムの構成］
まず、本実施の形態における異常検知システムの概略構成について説明する。 Embodiment 1
[2-1 Configuration of anomaly detection system]
First, a schematic configuration of the abnormality detection system according to the present embodiment will be described.

図１は、実施の形態に係る異常検知システムの概略図である。 FIG. 1 is a schematic view of an abnormality detection system according to the embodiment.

具体的には、図１において、異常検知システム１は、異常検知装置１００、パケット収集装置２００、および、外部のネットワーク５００に通信接続されている監視対象３００を備える。異常検知システム１では、異常検知装置１００が監視対象３００内での通信、または、監視対象とネットワーク５００との間での通信に異常があるか否かを検知する。 Specifically, in FIG. 1, the abnormality detection system 1 includes an abnormality detection apparatus 100, a packet collection apparatus 200, and a monitoring target 300 communicably connected to an external network 500. In the abnormality detection system 1, the abnormality detection apparatus 100 detects whether there is an abnormality in the communication within the monitoring target 300 or the communication between the monitoring target and the network 500.

監視対象３００は、異常検知の対象となるシステムである。監視対象３００は、例えば、化学プラント、制御システム、車載ネットワークシステムなどである。監視対象３００は、ハブ３１１、３１２、３２１、３２２と、ＳＣＡＤＡ（Supervisory Control And Data Acquisition）３１３と、ＰＬＣ（Programmable Logic Controller）３１４と、ＰＣ（Personal Computer）３１５、３２３、３２４と、ルータ４００とを備える制御システムである。 The monitoring target 300 is a system that is the target of abnormality detection. The monitoring target 300 is, for example, a chemical plant, a control system, or an in-vehicle network system. The monitoring target 300 is a control system that includes hubs 311, 312, 321, and 322, a SCADA (Supervisory Control And Data Acquisition) 313, a PLC (Programmable Logic Controller) 314, PCs (Personal Computers) 315, 323, and 324, and a router 400.

ルータ４００は、監視対象３００と外部のネットワーク５００との間において、データの送受信を中継する通信機器である。ルータ４００は、受信したデータを解析し、解析した結果に基づいてデータの転送経路を選択するなどのデータの転送制御を行う。 The router 400 is a communication device that relays transmission and reception of data between the monitoring target 300 and the external network 500. The router 400 analyzes the received data and performs data transfer control such as selecting a data transfer path based on the analyzed result.

ハブ３１１、３２１は、例えば、スイッチングハブである。ハブ３１１は、ルータ４００、ハブ３１２、ＳＣＡＤＡ３１３、ハブ３２１、および、パケット収集装置２００と通信接続される。ハブ３２１は、ハブ３１１、ハブ３２２、および、パケット収集装置２００と通信接続される。ハブ３１１、３２１は、受信したデータを接続された機器のうち、受信したデータに含まれる宛先情報に基づく機器に転送する。ハブ３１１、３２１は、例えば、受信したデータをコピーし、コピーしたデータを出力するミラーポートを有する。ハブ３１１、３２１は、ミラーポートにおいて、パケット収集装置２００と接続されている。監視対象３００と外部のネットワーク５００との間で送受信される複数のパケットは、ハブ３１１、３２１のミラーポート経由で抽出され、パケット収集装置２００に送信される。 The hubs 311 and 321 are, for example, switching hubs. The hub 311 is communicably connected to the router 400, the hub 312, the SCADA 313, the hub 321, and the packet collection device 200. The hub 321 is communicably connected to the hub 311, the hub 322, and the packet collection device 200. The hubs 311 and 321 transfer the received data to the connected device based on the destination information included in the received data. The hubs 311 and 321 have, for example, a mirror port that copies received data and outputs the copied data. The hubs 311 and 321 are connected to the packet collection device 200 at the mirror port. A plurality of packets transmitted and received between the monitoring target 300 and the external network 500 are extracted via the mirror ports of the hubs 311 and 321 and transmitted to the packet collection device 200.

ハブ３１２、３２２は、例えば、スイッチングハブである。ハブ３１２は、ハブ３１１、ＰＬＣ３１４、および、ＰＣ３１５と通信接続される。ハブ３２２は、ハブ３２１およびＰＣ３２３、３２４と通信接続される。ハブ３１２、３２２は、ハブ３１１、３２１と同様に、受信したデータを接続された機器のうち、受信したデータに含まれる宛先情報に基づく機器に転送する。 The hubs 312 and 322 are, for example, switching hubs. The hub 312 is communicably connected to the hub 311, the PLC 314, and the PC 315. The hub 322 is communicably connected to the hub 321 and the PCs 323 and 324. Similar to the hubs 311 and 321, the hubs 312 and 322 transfer the received data to the connected devices based on the destination information included in the received data.

ＳＣＡＤＡ３１３は、監視対象３００である制御システムのシステム監視、プロセス制御などを行うコンピュータである。 The SCADA 313 is a computer that performs system monitoring, process control, and the like for the control system that is the monitoring target 300.

ＰＬＣ３１４は、各種機械を制御するための制御装置である。 The PLC 314 is a control device for controlling various machines.

ＰＣ３１５は、汎用のコンピュータである。 The PC 315 is a general-purpose computer.

パケット収集装置２００は、監視対象３００のハブ３１１、３２１から送信された複数のパケットを受信し、受信した複数のパケットを記憶する装置である。パケット収集装置２００は、例えば、サーバである。パケット収集装置２００は、例えば１週間などの所定期間にわたって、監視対象３００から複数のパケットを受信し、所定期間分の複数のパケットを記憶する。パケット収集装置２００は、記憶した複数のパケットを異常検知装置１００に送信する。パケット収集装置２００は、また、異常検知装置１００が異常検知モデルを生成するための複数の学習用パケットを記憶していてもよい。複数の学習用パケットは、異常を有していない、正常なパケットにより構成される。 The packet collection device 200 is a device that receives a plurality of packets transmitted from the hubs 311 and 321 of the monitoring target 300 and stores the plurality of received packets. The packet collection device 200 is, for example, a server. The packet collection device 200 receives a plurality of packets from the monitoring target 300 over a predetermined period such as one week, for example, and stores a plurality of packets for a predetermined period. The packet collection device 200 transmits the stored plurality of packets to the abnormality detection device 100. The packet collection device 200 may also store a plurality of learning packets for the abnormality detection device 100 to generate an abnormality detection model. The plurality of learning packets are composed of normal packets that do not have an abnormality.

［２−２異常検知装置の構成］
次に、異常検知装置１００のハードウェア構成について図２を用いて説明する。 [2-2 Configuration of Abnormality Detection Device]
Next, the hardware configuration of the abnormality detection apparatus 100 will be described with reference to FIG.

図２は、実施の形態に係る異常検知装置のハードウェア構成の一例を示すブロック図である。 FIG. 2 is a block diagram showing an example of the hardware configuration of the abnormality detection apparatus according to the embodiment.

図２に示すように、異常検知装置１００は、ハードウェア構成として、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）１０１と、メインメモリ１０２と、ストレージ１０３と、通信ＩＦ（Ｉｎｔｅｒｆａｃｅ）１０４と、入力ＩＦ（Ｉｎｔｅｒｆａｃｅ）１０５と、ディスプレイ１０６とを備える。 As shown in FIG. 2, the abnormality detection apparatus 100 includes a CPU (Central Processing Unit) 101, a main memory 102, a storage 103, a communication IF (Interface) 104, and an input IF (Interface) 105 as a hardware configuration. And a display 106.

ＣＰＵ１０１は、ストレージ１０３等に記憶された制御プログラムを実行するプロセッサの一例である。 The CPU 101 is an example of a processor that executes a control program stored in the storage 103 or the like.

メインメモリ１０２は、ＣＰＵ１０１が制御プログラムを実行するときに使用するワークエリアとして用いられる揮発性の記憶領域、つまりメモリの一例である。 The main memory 102 is an example of a volatile storage area used as a work area used when the CPU 101 executes a control program, that is, an example of a memory.

ストレージ１０３は、制御プログラム、コンテンツなどを保持する不揮発性の記憶領域、つまり、メモリの一例である。 The storage 103 is an example of a non-volatile storage area holding a control program, content and the like, that is, a memory.

通信ＩＦ１０４は、通信ネットワークを介してパケット収集装置２００と通信する通信インタフェースである。通信ＩＦ１０４は、例えば、有線ＬＡＮインタフェースである。なお、通信ＩＦ１０４は、無線ＬＡＮインタフェースであってもよい。また、通信ＩＦ１０４は、ＬＡＮインタフェースに限らずに、パケット収集装置２００との間で通信接続を確立できる通信インタフェースであれば、どのような通信インタフェースであってもよい。 The communication IF 104 is a communication interface that communicates with the packet collection device 200 via a communication network. The communication IF 104 is, for example, a wired LAN interface. The communication IF 104 may be a wireless LAN interface. The communication IF 104 is not limited to the LAN interface, and may be any communication interface as long as it can establish a communication connection with the packet collection device 200.

入力ＩＦ１０５は、例えば、テンキー、キーボード、マウスなどの入力装置である。 The input IF 105 is an input device such as, for example, a ten key, a keyboard, or a mouse.

ディスプレイ１０６は、ＣＰＵ１０１での処理結果を表示する表示装置である。ディスプレイ１０６は、例えば、液晶ディスプレイ、有機ＥＬディスプレイである。 The display 106 is a display device that displays the processing result of the CPU 101. The display 106 is, for example, a liquid crystal display or an organic EL display.

［２−３異常検知装置の機能構成］
次に、異常検知装置１００の機能構成について、図３を用いて説明する。なお、異常検知装置１００は、異常を検知するための異常検知モデルの学習処理も行う学習装置の一例でもある。 [2-3 Functional configuration of abnormality detection device]
Next, the functional configuration of the abnormality detection apparatus 100 will be described with reference to FIG. The abnormality detection apparatus 100 is also an example of a learning apparatus that also performs learning processing of an abnormality detection model for detecting an abnormality.

図３は、本実施の形態における異常検知装置の機能構成の一例を示すブロック図である。 FIG. 3 is a block diagram showing an example of a functional configuration of the abnormality detection device according to the present embodiment.

パケット収集装置２００に蓄積されている複数のパケットからなるデータ２１０は、学習用データ２１１と検査用データ２１２とを含む。 Data 210 composed of a plurality of packets accumulated in the packet collection device 200 includes learning data 211 and inspection data 212.

学習用データ２１１は、取得されたデータ２１０のうちで、機械学習による異常検知モデルを生成するためのデータである。検査用データ２１２は、取得されたデータ２１０のうちで、生成された異常検知モデルを用いて監視対象３００から得られたデータ２１０が異常か否かを判断する異常診断の対象となるデータである。なお、学習用データ２１１には、正常なデータだけでなく、異常なデータも含む取得された複数のパケットを用いることができる。例えば、学習用データ２１１は、データ２１０の始めの所定期間で取得されたデータであり、検査用データ２１２は、学習用データ２１１を取得した所定期間より後の期間において取得されたデータとしてもよい。また、検査用データ２１２は、異常検知モデルを更新するための学習用のデータとして用いられてもよい。 The learning data 211 is data for generating an abnormality detection model by machine learning among the acquired data 210. The inspection data 212 is data to be subjected to abnormality diagnosis to determine whether the data 210 obtained from the monitoring target 300 is abnormal using the generated abnormality detection model among the acquired data 210. . In addition, as the data for learning 211, a plurality of acquired packets including not only normal data but also abnormal data can be used. For example, the learning data 211 may be data acquired in a predetermined period at the beginning of the data 210, and the inspection data 212 may be data acquired in a period after the predetermined period in which the learning data 211 is acquired. . Also, the inspection data 212 may be used as learning data for updating the abnormality detection model.

なお、複数のパケットは、例えば、図４Ｂに示すような、ＴＣＰプロトコルのパケットである。図４Ｂは、ＴＣＰプロトコルのパケット構造の一例を示す図である。ＴＣＰプロトコルの構造は、ＲＦＣ７９３により規定されている。 The plurality of packets are, for example, packets of the TCP protocol as shown in FIG. 4B. FIG. 4B is a diagram showing an example of a packet structure of the TCP protocol. The structure of the TCP protocol is defined by RFC 793.

異常検知装置１００は、取得部１１０と、検知モデル学習部１２０と、異常検知モデルＤＢ（Ｄａｔａｂａｓｅ）１３０と、入力受付部１４０と、アラート閾値算出部１５０と、検知部１６０と、提示部１７０とを備える。 The abnormality detection apparatus 100 includes an acquisition unit 110, a detection model learning unit 120, an abnormality detection model DB (Database) 130, an input reception unit 140, an alert threshold calculation unit 150, a detection unit 160, and a presentation unit 170.

取得部１１０は、パケット収集装置２００から学習用データ２１１としての複数のパケットである複数の学習用パケットを取得する。取得部１１０は、パケット収集装置２００から検査用データ２１２としての複数のパケットを取得してもよい。取得部１１０は、例えば、ＣＰＵ１０１、メインメモリ１０２、ストレージ１０３、および、通信ＩＦ１０４などにより実現される。 The acquisition unit 110 acquires a plurality of learning packets, which are a plurality of packets as the learning data 211, from the packet collection device 200. The acquisition unit 110 may acquire a plurality of packets as the inspection data 212 from the packet collection device 200. The acquisition unit 110 is realized by, for example, the CPU 101, the main memory 102, the storage 103, and the communication IF 104.

検知モデル学習部１２０は、取得部１１０により取得された複数の学習用パケットを用いて学習処理を行うことで、異常検知モデルを生成する。具体的には、検知モデル学習部１２０は、取得部１１０により取得され複数の学習用パケットのそれぞれについて、当該学習用パケットに含まれるペイロードを構成するデータ列をＡ（Ａは１以上の整数）ビット単位で区切ることにより得られる複数個のデータ単位のうちのＮ（Ｎは２以上の整数）個のデータ単位の取り得る全ての組み合わせであって、当該ペイロードにおける互いに連続している並び順でのＮ個のデータ単位の組み合わせを抽出する。ここで、抽出される組み合わせは、第１の組み合わせの一例である。ここで、Ａビット単位は、例えば、８ビット単位、つまり、１バイト単位である。なお、Ａは、８に限らずに、８以外の他の数値であってもよい。検知モデル学習部１２０は、Ｎ−ｇｒａｍを用いることで、Ｎ個のデータ単位の組み合わせを抽出する。ここで、Ｎは、例えば、２または３である。つまり、検知モデル学習部１２０は、２−ｇｒａｍまたは３−ｇｒａｍを用いることで、２個のデータ単位の組み合わせ、または、３個のデータ単位の組み合わせを抽出する。なお、検知モデル学習部１２０は、Ｎ−ｇｒａｍのように互いに連続している並び順でのＮ個のデータ単位の組み合わせに限らずに、Ｂ（Ｂは１以上の整数）個飛ばしの並び順でのＮ個のデータ単位の組み合わせを抽出してもよい。 The detection model learning unit 120 generates an abnormality detection model by performing learning processing using the plurality of learning packets acquired by the acquisition unit 110. Specifically, for each of the plurality of learning packets acquired by the acquisition unit 110, the detection model learning unit 120 divides the data string constituting the payload included in the learning packet into units of A (A is an integer of 1 or more) bits to obtain a plurality of data units, and extracts all possible combinations of N (N is an integer of 2 or more) data units that are consecutive in the payload. The combinations extracted here are an example of the first combinations. The A-bit unit is, for example, an 8-bit unit, that is, a 1-byte unit. Note that A is not limited to 8 and may be another value. The detection model learning unit 120 extracts the combinations of N data units by using N-grams. Here, N is, for example, 2 or 3. That is, the detection model learning unit 120 extracts combinations of two data units or combinations of three data units by using 2-grams or 3-grams. Note that the detection model learning unit 120 is not limited to combinations of N consecutive data units, as with N-grams, and may instead extract combinations of N data units in an order that skips B (B is an integer of 1 or more) data units.
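The extraction of consecutive and skipped combinations can be sketched as follows, assuming A = 8 so that each data unit is one byte. The function name and the `skip` parameter are hypothetical, and "skipping B data units" is interpreted here as leaving B bytes between successive members of a combination; `skip=0` gives ordinary N-grams.

```python
def extract_combinations(payload: bytes, n: int = 2,
                         skip: int = 0) -> list[tuple[int, ...]]:
    """Extract combinations of n byte-sized data units from a payload.

    Members of a combination are (skip + 1) bytes apart, so skip=0 yields
    ordinary N-grams and skip=B leaves B bytes between members (assumed reading).
    """
    step = skip + 1
    span = (n - 1) * step + 1  # bytes covered by one combination
    return [
        tuple(payload[i + k * step] for k in range(n))
        for i in range(len(payload) - span + 1)
    ]
```

For the payload `b"ABCD"`, the 2-grams are (A,B), (B,C), (C,D), while `skip=1` yields (A,C) and (B,D).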

次に、検知モデル学習部１２０は、取得部１１０により取得された複数の学習用パケットの複数のペイロードを構成するデータ列から抽出した全ての組み合わせのそれぞれについて、当該組み合わせが当該学習用パケットにおいて出現する回数である第１の数をカウントする。検知モデル学習部１２０は、抽出した全ての組み合わせのそれぞれについて、カウントすることで得られた複数の第１の数の全てに、正の数を加算することで複数の第２の数を算出する。検知モデル学習部１２０は、抽出した全ての組み合わせのそれぞれについて算出した複数の第２の数に基づいて、複数の学習用パケットにおいて当該第１の組み合わせが出現する確率である複数の第１の確率を算出する。 Next, for each of the combinations extracted from the data strings constituting the payloads of the plurality of learning packets acquired by the acquisition unit 110, the detection model learning unit 120 counts a first number, which is the number of times the combination appears in the learning packets. The detection model learning unit 120 calculates a plurality of second numbers by adding a positive number to all of the plurality of first numbers obtained by this counting. Based on the plurality of second numbers, the detection model learning unit 120 then calculates a plurality of first probabilities, each being the probability that the corresponding first combination appears in the plurality of learning packets.

なお、検知モデル学習部１２０は、複数の第１の数の全てに、正の数を加算することで複数の第２の数を算出し、複数の第２の数に基づいて、複数の第１の確率を算出するとしたが、これに限らない。検知モデル学習部１２０は、例えば、抽出した全ての組み合わせのそれぞれについて算出した複数の第１の数に基づいて、複数の学習用パケットにおいて当該組み合わせが出現する複数の確率を算出し、算出した複数の確率に正の数を加算することで複数の第１の確率を算出してもよい。 Although the detection model learning unit 120 has been described as calculating the plurality of second numbers by adding a positive number to all of the plurality of first numbers and then calculating the plurality of first probabilities from the second numbers, the calculation is not limited to this. For example, the detection model learning unit 120 may calculate, based on the plurality of first numbers, the probability that each extracted combination appears in the plurality of learning packets, and then calculate the plurality of first probabilities by adding a positive number to each of these probabilities.

検知モデル学習部１２０は、取得部１１０により取得された複数の学習用パケットのそれぞれについて、さらに、当該学習用パケットが有するヘッダに応じて当該学習用パケットを複数のモデルのいずれか１つに分類してもよい。検知モデル学習部１２０は、複数の異常検知モデルを保持または学習しても良い。この場合、検知モデル学習部１２０は、この複数の異常検知モデルを、例えばパケットのヘッダに含まれる情報である、宛先ＩＰ、宛先ポート、送信元ＩＰ、及びプロトコルの少なくとも１つの値に応じて切り替えて学習または検査の処理を行う。 For each of the plurality of learning packets acquired by the acquisition unit 110, the detection model learning unit 120 may further classify the learning packet into any one of a plurality of models according to the header of the learning packet. The detection model learning unit 120 may hold or learn a plurality of abnormality detection models. In this case, the detection model learning unit 120 switches among the abnormality detection models for learning or inspection according to the value of at least one of the destination IP, destination port, source IP, and protocol, which are pieces of information included in, for example, the packet header.

複数の学習用パケットを複数のモデルに分類する場合、検知モデル学習部１２０は、複数のモデルのそれぞれについて、カウントした第１の数を用いて、複数の学習用パケットのうち当該モデルに分類された複数の学習用パケットにおいて、全ての組み合わせのそれぞれが出現する回数である第５の数を算出してもよい。そして、検知モデル学習部１２０は、複数のモデルのそれぞれについて、当該モデルに分類された複数の学習用パケットから抽出した全ての組み合わせのそれぞれについて、算出した第５の数の全てに、正の数を加算することで複数の第６の数を算出する。その後、検知モデル学習部１２０は、複数のモデルのそれぞれについて抽出した全ての組み合わせのそれぞれについて、算出した複数の第６の数に基づいて、当該モデルに分類された複数の学習用パケットにおいて当該組み合わせが出現する確率を第１の確率として算出してもよい。 When classifying the plurality of learning packets into a plurality of models, the detection model learning unit 120 may calculate, for each of the models, using the counted first numbers, a fifth number that is the number of times each combination appears in the learning packets classified into that model. Then, for each model, the detection model learning unit 120 calculates a plurality of sixth numbers by adding a positive number to all of the fifth numbers calculated for the combinations extracted from the learning packets classified into that model. After that, for each combination extracted for each model, the detection model learning unit 120 may calculate, based on the plurality of sixth numbers, the probability that the combination appears in the learning packets classified into that model as the first probability.

検知モデル学習部１２０は、例えば、ＣＰＵ１０１、メインメモリ１０２、ストレージ１０３などにより実現される。 The detection model learning unit 120 is realized by, for example, the CPU 101, the main memory 102, the storage 103, and the like.

例えば、検知モデル学習部１２０は、次のような処理を行うことで学習を実行する。 For example, the detection model learning unit 120 performs learning by performing the following process.

In the learning method of the anomaly detection method, a separate model is used for each destination IP and destination port of a packet. In the learning phase, therefore, as many models are initialized as there are unique (destination IP, destination port) pairs. Each model holds a vector x_{i,j} that records the 2-grams of the payloads, and its initial value is the 65536-dimensional zero vector. Then, for every packet to be learned, if the (destination IP, destination port) pair of the packet is (ip_i, port_j), the packet is converted into a 2-gram vector (∈ N^{65536}) by the method below and added to x_{i,j}.

Each packet is converted into a 2-gram vector as follows. Let the byte sequence of the packet payload be {X_1, X_2, X_3, ..., X_L}, where L is the payload length and each X_k (k = 1, ..., L) ∈ {0, ..., 255}. To obtain 2-grams, the following sequence of 2-grams is derived from the byte sequence: {X_1 X_2, X_2 X_3, ..., X_{L-1} X_L}. From this sequence, a 2-gram vector is generated by the following rules:
1. Prepare an empty (all-zero) vector y (∈ N^{65536}).
2. For each 2-gram X_i X_{i+1} (i = 1, ..., L-1), calculate t_i = 256 * X_i + X_{i+1} (t_i ∈ {0, ..., 65535}).
3. For every i = 1, ..., L-1, add 1 to y[t_i] (here y[t_i] denotes the t_i-th element of the vector y).
4. Obtain y as the result of converting the packet into a 2-gram vector.
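The four conversion rules above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `to_2gram_vector` and the use of a plain list for the 65536-dimensional vector are assumptions made here, and indices are 0-based whereas the text counts from 1.

```python
def to_2gram_vector(payload: bytes) -> list:
    """Convert a payload into a 65536-dimensional vector of 2-gram counts."""
    y = [0] * 65536                              # rule 1: all-zero vector
    for i in range(len(payload) - 1):            # one pass per adjacent byte pair
        t = 256 * payload[i] + payload[i + 1]    # rule 2: index of the 2-gram
        y[t] += 1                                # rule 3: increment its count
    return y                                     # rule 4: the conversion result


def add_to_model(x: list, y: list) -> None:
    """Add a packet's 2-gram vector y to the model's count vector x in place."""
    for k, count in enumerate(y):
        x[k] += count
```

A payload of L bytes yields L-1 2-grams, so a payload shorter than two bytes produces the all-zero vector.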

Once all packets have been learned, each model holds a vector x_{i,j} representing how many times each 2-gram has appeared. This vector of 2-gram occurrence counts is an example of the first numbers obtained by counting each of the combinations. The occurrence probability of each 2-gram is calculated from this vector. In the simplest approach, with the 2-gram index k ∈ {0, 1, ..., 65535}, the probability p(g_k) that the 2-gram g_k appears can be expressed by Equation 8 below, where x_{i,j}[k] is the number of times g_k appeared in the packets to be learned. The occurrence probabilities expressed by Equation 8 are an example of the plurality of first probabilities.
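Equation 8 itself is not reproduced in this text. Read as the plain relative frequency that the paragraph describes, it would presumably take the following form; the summation index k' is notation introduced here.

```latex
p(g_k) = \frac{x_{i,j}[k]}{\sum_{k'=0}^{65535} x_{i,j}[k']}
\qquad \text{(Equation 8, reconstructed)}
```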

However, if the probability is expressed by this equation, a 2-gram that never appeared in the packets to be learned is assigned probability 0, and the score diverges under the scoring method described later. Several methods have already been proposed to avoid this; the present embodiment adopts Laplace smoothing, one such smoothing process. Laplace smoothing is a technique for smoothing categorical data: given data x = (x_1, ..., x_d) obtained from a multinomial distribution with N trials, the parameters θ = (θ_1, ..., θ_d) of the distribution are estimated by Equation 9 below.
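Equation 9 is likewise not reproduced in this text. In its standard textbook form, Laplace (additive) smoothing with a positive pseudo-count α estimates each parameter as shown below; it is an assumption, not confirmed by the source, that Equation 9 matches this form.

```latex
\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d}
\qquad (i = 1, \ldots, d)
\qquad \text{(Equation 9, standard form)}
```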

In other words, the occurrence count x_i of every category is padded by α before the relative frequency is taken. Typically α is chosen as a value such as 1, 0.1, or 0.01. Applying this method to the proposed method yields the equation below (Equation 10). Here, α is an example of the positive number to be added.
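Substituting the 2-gram counts x_{i,j}[k] for the category counts and d = 65536 into the standard Laplace estimate gives the form that Equation 10 presumably takes. This is a reconstruction, and the summation index k' is introduced here.

```latex
p(g_k) = \frac{x_{i,j}[k] + \alpha}{\sum_{k'=0}^{65535} x_{i,j}[k'] + 65536\,\alpha}
\qquad \text{(Equation 10, reconstructed)}
```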

The p(g_k) obtained by this equation is regarded as the occurrence probability of the 2-gram g_k. That is, the occurrence probability obtained by Equation 10 is an example of the first probability.
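As a rough sketch of this smoothing step, the Python function below adds α to every count and then normalizes. The name `laplace_smooth` and the list-based representation are illustrative assumptions, not part of the source.

```python
def laplace_smooth(counts, alpha=1.0):
    """Return smoothed probabilities (count + alpha) / (total + alpha * d).

    counts: occurrence count per category (for 2-grams, d = 65536 entries).
    alpha:  the positive number added to every count.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]
```

Because α is positive, a 2-gram that never appeared in training still receives a non-zero probability, so its logarithm stays finite.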

Note that the smoothing process is not limited to Laplace smoothing; another smoothing process such as Kneser-Ney smoothing may be used.

The anomaly detection model DB 130 stores, as the anomaly detection model, the plurality of first probabilities generated, that is, calculated, by the detection model learning unit 120. The anomaly detection model DB 130 may also store, as the anomaly detection model, a fourth number based on the first number for each of the first combinations. The fourth number stored in the anomaly detection model may be the first number, the second number, the fifth number, or the sixth number.

FIGS. 5 to 8 are diagrams showing examples of the anomaly detection models held by the anomaly detection model DB.

The anomaly detection model 131 shown in FIG. 5 consists of data for the following items: model ID, destination IP, destination port, N-gram acquisition target data, N-gram occurrence count, and N-gram occurrence probability. The model ID is an identifier uniquely assigned to each model to distinguish the models from one another. The destination IP is information indicating the destination IP of the packets associated with the model. The destination port is information indicating the destination port of the packets associated with the model. The N-gram acquisition target data is information indicating the data from which N-grams are acquired for the model, for example the payload, that is, the data portion of a packet of each protocol. The N-gram occurrence counts n_1 to n_6 are the occurrence counts of the N-grams in the packets associated with the respective models, and are an example of the sixth number. Because each of n_1 to n_6 gives the occurrence count of every N-gram, it is vector data whose dimension equals the number of all N-grams. For example, n_1 is expressed as [00: 51 times, 01: 12 times, ..., FF: 31 times]. Thus n_k (k is an integer from 1 to 6) is expressed as, for example, [n_k1, n_k2, ..., n_kL]. The N-gram occurrence probabilities Pr_1 to Pr_6 are the occurrence probabilities of the N-grams in the packets associated with the respective models, and are an example of the first probability. Like n_1 to n_6, each of Pr_1 to Pr_6 is vector data whose dimension equals the number of all N-grams. For example, Pr_1 is expressed as [00: 0.1, 01: 0.02, ..., FF: 0.06]. Thus Pr_k (k is an integer from 1 to 6) is expressed as, for example, {Pr_k1, Pr_k2, ..., Pr_kL}. In this way, in the anomaly detection model 131, the models are divided according to the combination of destination IP and destination port.

That is, the detection model learning unit 120 learns for each destination IP and destination port and generates the result as the anomaly detection model 131. The anomaly detection model is learned per destination IP and destination port because packets with the same destination IP and destination port often play similar roles.

The anomaly detection model 132 shown in FIG. 6 is the anomaly detection model 131 of FIG. 5 with a source IP item added. The source IP is information indicating the source IP of the packets associated with the model. In this way, in the anomaly detection model 132, the models are divided according to the combination of the source IP in addition to the destination IP and destination port.

The anomaly detection model 133 shown in FIG. 7 is a model in which the destination port item of the anomaly detection model 132 of FIG. 6 is replaced with a target protocol item. The target protocol is information indicating the protocol of the packets associated with the model. In this way, in the anomaly detection model 133, the models are divided according to the combination of destination IP, source IP, and target protocol.
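The three ways of dividing models described for FIGS. 5 to 7 differ only in which header fields form the key. A small sketch, assuming a hypothetical `Header` record and scheme labels `"fig5"`, `"fig6"`, and `"fig7"` that are not names from the source:

```python
from typing import NamedTuple


class Header(NamedTuple):
    """Header fields used for model selection (illustrative field names)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str


def model_key(h: Header, scheme: str) -> tuple:
    """Return the key that selects a model under the given division scheme."""
    if scheme == "fig5":   # model 131: destination IP and destination port
        return (h.dst_ip, h.dst_port)
    if scheme == "fig6":   # model 132: source IP added
        return (h.src_ip, h.dst_ip, h.dst_port)
    if scheme == "fig7":   # model 133: destination port replaced by protocol
        return (h.src_ip, h.dst_ip, h.protocol)
    raise ValueError(f"unknown scheme: {scheme}")
```

Two packets map to the same model exactly when their keys are equal, so a dictionary keyed by `model_key(...)` can hold the per-model count and probability vectors.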

The anomaly detection model 134 shown in FIG. 8 is the anomaly detection model 131 of FIG. 5 with an alert threshold item added. As described later, the alert threshold is a threshold for detecting an anomaly in a packet and is determined, for example, according to user input. It is the threshold against which the score calculated for a packet is compared. In this way, in the anomaly detection model 134, an alert threshold for detecting packet anomalies may be associated with each model in addition to the model classification.

Note that the alert threshold need not necessarily be associated with the model classification in the way it is in the anomaly detection model 134 shown in FIG. 8.

In the anomaly detection models 131 to 134 shown in FIGS. 5 to 8, the number of models is six, but it is not limited to six and may be any number of two or more other than six.

FIG. 9 is a diagram showing correspondence information in which destination ports and alert thresholds are associated.

As shown in the correspondence information 135 of FIG. 9, the alert threshold need not be associated with each model and may instead be associated with each destination port. That is, an alert threshold may be associated with each model through any one of the anomaly detection models 131 to 133 together with the correspondence information 135.

The anomaly detection model DB 130 may hold any one of the anomaly detection models 131 to 133 together with the correspondence information 135 as a set, or may hold only the anomaly detection model 134.
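The two storage options can be pictured as a lookup that prefers a threshold stored with the model and otherwise falls back to a per-port table. The dictionary layout and the name `resolve_threshold` are assumptions for illustration only.

```python
def resolve_threshold(model: dict, port_thresholds: dict):
    """Return the alert threshold that applies to a model, or None if unset.

    model:           record with "dst_port" and optionally "alert_threshold"
                     (the FIG. 8 style, threshold stored with the model).
    port_thresholds: mapping from destination port to threshold
                     (the FIG. 9 style correspondence information).
    """
    own = model.get("alert_threshold")
    if own is not None:
        return own
    return port_thresholds.get(model["dst_port"])
```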

The anomaly detection model DB 130 is realized by, for example, the storage 103.

The input receiving unit 140 receives input from the user. For example, the input receiving unit 140 receives from the user an input indicating, for the plurality of packets obtained from the monitoring target 300, at least one of the IP range and the port range to be monitored, and the range from which N-grams are extracted. Here, the range from which N-grams are extracted is the data portion of a packet that is subject to inspection, as indicated by the N-gram acquisition target data in the anomaly detection models 131 to 134; for example, it is the payload corresponding to each protocol. The input receiving unit 140 also receives input of a parameter relating to the alert occurrence rate at which alerts are generated. The alert occurrence rate is expressed as, for example, one in every a packets or b times per day, and serves as a threshold for determining that the deviation from the normal state is large, that is, that the packet contains an anomaly. Here, the input receiving unit 140 receives a plurality of such parameters, one for each of the models. Alternatively, the input receiving unit 140 may receive a single parameter common to the models. The input receiving unit 140 may also receive an input indicating an alert threshold. The alert threshold is, for example, a threshold on the score described later, determined with reference to the occurrence probabilities of all the combinations, for determining that the deviation from the normal state is large, that is, that the packet contains an anomaly.

The input receiving unit 140 is realized by, for example, the CPU 101, the main memory 102, the storage 103, and the input IF 105.

The alert threshold calculation unit 150 calculates the alert threshold based on the alert occurrence rate parameter received by the input receiving unit 140 and the scores calculated for the learning packets. The alert threshold calculation unit 150 calculates the score for each learning packet by applying the plurality of first probabilities calculated from the learning packets to Equation 12, described later. For example, it calculates the alert threshold so that the rate of alerts is no greater than the alert occurrence rate specified by the parameter. When a parameter is input for each of the models, it calculates the alert threshold for each model based on that model's parameter. The per-model alert thresholds calculated by the alert threshold calculation unit 150 are stored in the anomaly detection model DB 130 as part of the anomaly detection model. The alert threshold calculation unit 150 is realized by, for example, the CPU 101, the main memory 102, and the storage 103.

The detection unit 160 detects whether each of the plurality of packets acquired by the acquisition unit 110 has an anomaly. Specifically, the detection unit 160 performs the following processes (1) to (6) in order for each of the packets acquired by the acquisition unit 110.

(1) The detection unit 160 extracts all possible second combinations of N data units from among the plurality of data units obtained by dividing the data sequence constituting the payload of the packet into units of A bits, each combination consisting of N data units that are consecutive in the payload. The combinations extracted here are an example of the second combinations. Specifically, like the detection model learning unit 120, the detection unit 160 extracts the combinations of N data units by using N-grams. Here, N is, for example, 2 or 3. That is, the detection unit 160 extracts combinations of two data units or of three data units by using 2-grams or 3-grams. The detection unit 160 is not limited to combinations of N consecutive data units as in N-grams; it may instead extract combinations of N data units taken while skipping B units at a time (B is an integer of 1 or more).
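The extraction in (1), including the skipping variant, might look like the following sketch. It assumes A = 8, so that each data unit is one byte, and it interprets "skipping B units" as leaving B units between successive members of a combination; both the interpretation and the name `extract_combinations` are assumptions.

```python
def extract_combinations(units, n=2, skip=0):
    """List every combination of n data units taken in payload order.

    units: sequence of data units (bytes when A = 8).
    n:     number of units per combination (2 for 2-grams, 3 for 3-grams).
    skip:  units skipped between members; 0 gives ordinary N-grams.
    """
    step = skip + 1                 # distance between successive members
    span = (n - 1) * step           # offset from first member to last member
    return [
        tuple(units[i + j * step] for j in range(n))
        for i in range(len(units) - span)
    ]
```

With `skip=0` a sequence of L units yields L-N+1 combinations, which matches the L-1 2-grams of the learning phase.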

(2) For each of the combinations extracted from the packet, the detection unit 160 counts a third number, which is the number of times the combination appears in the payload of the acquired packet.

(3) For each of the combinations extracted from the packet, the detection unit 160 calculates, based on the plurality of third numbers obtained by the counting, a second probability, which is the probability that the combination appears in the packet. A plurality of second probabilities is thereby obtained.

(4) The detection unit 160 calculates a score by dividing the sum of the logarithms of the plurality of second probabilities calculated for the packet by a prescribed value determined by the payload length of the payload.

(5) The detection unit 160 determines whether the score calculated for the packet exceeds the alert threshold, which is a predetermined threshold based on the anomaly detection model stored in the anomaly detection model DB 130. The detection unit 160 detects that a packet whose score exceeds the alert threshold has an anomaly, and that a packet whose score is at or below the alert threshold has no anomaly.

Like the detection model learning unit 120, the detection unit 160 may classify each of the packets acquired by the acquisition unit 110 into one of the models according to the header of the packet. In this case, the detection unit 160 may determine whether the calculated score exceeds the predetermined threshold corresponding to the model into which the scored packet was classified.

The detection unit 160 is realized by, for example, the CPU 101, the main memory 102, and the storage 103.

For example, the detection unit 160 performs inspection by the following process.

In the anomaly detection method according to the present embodiment, as in PAYL and ANAGRAM, the detection unit 160 calculates an anomaly score for each packet in the inspection phase. Each packet to be scored is converted into a 2-gram vector y (∈ N^{65536}) by the same conversion method as in the learning phase. The converted vector is then scored using the following equation.

The L-th root is taken in Equation 11 so that scores can be compared fairly across payloads of different lengths. Computing this score directly involves exponentiation and is computationally expensive, and a packet with a larger score is to be treated as more anomalous; for both reasons, the negative logarithm of score' above is used as the score. That is, score is calculated by Equation 12 below.
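Equations 11 and 12 are not reproduced in this text. The description (a product of 2-gram probabilities, an L-th root, then a negative logarithm) suggests the following forms. They are a reconstruction under that reading: in particular, weighting each p(g_k) by the packet's count y[k] and dividing by L rather than by L-1 are assumptions.

```latex
\mathrm{score}' = \Bigl( \prod_{k=0}^{65535} p(g_k)^{\,y[k]} \Bigr)^{1/L}
\qquad \text{(Equation 11, reconstructed)}

\mathrm{score} = -\log \mathrm{score}'
             = -\frac{1}{L} \sum_{k=0}^{65535} y[k] \,\log p(g_k)
\qquad \text{(Equation 12, reconstructed)}
```

The second form needs only additions of logarithms, which is consistent with the stated reason for avoiding the direct computation.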

The larger the value of score, the higher the degree of anomaly. This scoring method is more principled than that of ANAGRAM (frequency-based) and, as the evaluation results described later show, performs better than ANAGRAM (frequency-based).
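A sketch of the scoring step under the reading above: sum the log-probabilities of the packet's 2-grams, divide by the payload length L, and negate. The function name `anomaly_score`, the choice of L as the divisor, and returning 0.0 for payloads with no 2-gram are assumptions; `probs` stands for a model's smoothed 65536-entry probability vector.

```python
import math


def anomaly_score(payload: bytes, probs) -> float:
    """Return -(1/L) * sum of log p(g) over the payload's 2-grams."""
    length = len(payload)
    if length < 2:
        return 0.0                  # no 2-gram to score (assumed convention)
    log_sum = 0.0
    for i in range(length - 1):
        t = 256 * payload[i] + payload[i + 1]   # index of the 2-gram
        log_sum += math.log(probs[t])           # probs must be strictly positive
    return -log_sum / length


def is_anomalous(score: float, alert_threshold: float) -> bool:
    """A packet is flagged only when its score exceeds the threshold."""
    return score > alert_threshold
```

Smoothed probabilities are strictly positive, so `math.log` never receives zero here; with unsmoothed probabilities an unseen 2-gram would raise an error, which is the divergence the text mentions.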

The presentation unit 170 outputs an alert, which is information indicating that a packet has an anomaly, for each packet in which the detection unit 160 has detected an anomaly. The presentation unit 170 may also output the calculated score. When presenting the score, the presentation unit 170 may output it regardless of whether an alert is output, and may output the score without outputting an alert. The presentation unit 170 presents the alert to the user by, for example, displaying an image indicating the alert on the display 106. The presentation unit 170 is realized by, for example, the CPU 101, the main memory 102, the storage 103, and the display 106.

If the anomaly detection device 100 has a speaker, the presentation unit 170 may present the alert to the user as sound from the speaker. The presentation unit 170 may also have an information terminal such as a smartphone present the alert by outputting information indicating the alert to the terminal.

[2-4 Operation]
Next, the operation of the anomaly detection device 100 will be described.

FIG. 10 is a flowchart showing an outline of the operation of the anomaly detection device.

First, the anomaly detection device 100 executes the learning process using the plurality of learning packets, which are the learning data 211 acquired by the acquisition unit 110 (S1). As a result, the anomaly detection device 100 generates an anomaly detection model for each of the models. Details of the learning process are described later.

Next, the anomaly detection device 100 executes the alert threshold determination process (S2). As a result, an alert threshold is associated with each model of the anomaly detection model. Details of the alert threshold determination process are described later.

Finally, the anomaly detection device 100 executes the inspection process using the plurality of packets, which are the inspection data 212 acquired by the acquisition unit 110 (S3). The anomaly detection device 100 thereby detects whether each of the packets has an anomaly. Details of the inspection process are described later.

Next, the details of the learning process, that is, the learning method, will be described.

FIG. 11 is a flowchart showing an example of the details of the learning process in the anomaly detection device.

First, in the anomaly detection device 100, the input receiving unit 140 receives an input indicating, for the plurality of packets obtained from the monitoring target 300, at least one of the IP range and the port range to be monitored, and the range from which N-grams are extracted (S11). At this time, the input receiving unit 140 may also receive from the user an input of information indicating whether the protocol of the learning packets needs to be identified. Step S11 need only be executed once and need not be executed each time learning is performed.

Next, the acquisition unit 110 acquires the plurality of learning packets, which are the learning data 211 (S12).

The detection model learning unit 120 then repeats steps S13 to S20 for each of the learning packets.

The detection model learning unit 120 determines whether the protocol of the learning packet needs to be identified in order to perform learning (S13). For example, if the input receiving unit 140 received in step S11 an input of information indicating that protocol identification is necessary, the detection model learning unit 120 determines that protocol identification is necessary; otherwise, it determines that protocol identification is unnecessary. If it determines that protocol identification is necessary (Yes in S13), it proceeds to step S14; if it determines that protocol identification is unnecessary (No in S13), it proceeds to step S15.

In step S14, the detection model learning unit 120 executes protocol identification based on the header of the learning packet being processed, and proceeds to step S15.

In step S15, the detection model learning unit 120 identifies the model to which the learning packet being processed corresponds. It identifies the model according to at least one of the destination IP, destination port, protocol, and source IP obtained by reading the header of the learning packet being processed. Here, the detection model learning unit 120 determines the classification of the model to be identified according to at least one of the IP range and the port range to be monitored and the range from which N-grams are extracted, as received in step S11.

The detection model learning unit 120 determines whether the identified model already exists (S16). That is, it determines whether a learning packet belonging to the identified model has already been encountered. If it determines that the identified model does not yet exist (No in S16), it proceeds to step S17; if it determines that the identified model already exists (Yes in S16), it proceeds to step S18.

In step S17, the detection model learning unit 120 adds the identified model as a new model, and proceeds to step S18.

In step S18, the detection model learning unit 120 extracts the target data portion from the learning packet being processed. Specifically, it extracts the target data portion that is identified based on the input received in step S11 indicating the range from which N-grams are extracted, and that is subject to inspection for the corresponding model.

The detection model learning unit 120 counts the N-gram occurrence counts n_1 to n_6 of the model to which the learning packet being processed belongs (S19). Here, as the N-gram occurrence counts n_1 to n_6, the detection model learning unit 120 counts the fifth number and calculates the sixth number from the fifth number. The detection model learning unit 120 then ends the learning process for the packet being processed.

The detection model learning unit 120 determines whether any of the learning packets remains unlearned (S20). If an unlearned packet exists (Yes in S20), it executes steps S13 to S19 for that packet. If no unlearned packet exists (No in S20), that is, if steps S13 to S19 have been completed for all the learning packets, it ends the learning process.
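The loop of steps S15 to S20 can be condensed into the following sketch. It makes several simplifying assumptions that are not in the source: packets arrive as (destination IP, destination port, payload) tuples, models are keyed by destination IP and port only, protocol identification (S13, S14) is omitted, the whole payload is the target data portion, and Laplace smoothing with a single α is applied once after counting. The name `learn` is hypothetical.

```python
def learn(packets, alpha=1.0):
    """Build per-model 2-gram probability vectors from learning packets.

    packets: iterable of (dst_ip, dst_port, payload) tuples.
    Returns a dict mapping (dst_ip, dst_port) to a 65536-entry list of
    smoothed occurrence probabilities.
    """
    counts = {}
    for dst_ip, dst_port, payload in packets:
        key = (dst_ip, dst_port)                 # S15: identify the model
        if key not in counts:                    # S16: does the model exist?
            counts[key] = [0] * 65536            # S17: add a new model
        x = counts[key]
        for i in range(len(payload) - 1):        # S18/S19: count 2-grams
            x[256 * payload[i] + payload[i + 1]] += 1

    models = {}
    for key, x in counts.items():                # smooth once all packets are counted
        total = sum(x) + alpha * len(x)
        models[key] = [(c + alpha) / total for c in x]
    return models
```

Keeping the raw counts and smoothing at the end means additional batches of learning packets could be folded in later without recomputing from scratch, in line with acquiring packets over several rounds.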

The acquisition unit 110 need not acquire all the learning packets at once; it may acquire them in several batches, for example one learning packet at a time. When the acquisition unit 110 acquires the learning packets in several batches in this way, the anomaly detection device 100 repeats steps S12 to S20.

Next, the alert threshold determination process is described in detail.

FIG. 12 is a flowchart showing an example of the details of the alert threshold determination process.

In the anomaly detection device 100, the input reception unit 140 receives input of a parameter related to the alert occurrence rate at which alerts are generated, and sets the received parameter (S21).

Next, the alert threshold calculation unit 150 calculates scores for the learning packets by applying the plurality of first probabilities calculated for the learning packets to Equation 12 (S22).

Then, the alert threshold calculation unit 150 calculates the alert threshold based on the alert occurrence rate parameter received by the input reception unit 140 and the scores calculated for the learning packets (S23). For example, the alert threshold calculation unit 150 calculates the alert threshold so that the rate at which alerts occur on the learning packets is no more than the alert occurrence rate specified by the parameter.
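One way to realize step S23 is to pick the threshold from the sorted training scores. The sketch below is an assumption about how this could be done, not the method fixed by the text; the function name `alert_threshold` and the tie handling are illustrative.

```python
def alert_threshold(scores, alert_rate):
    """Return a threshold t such that the share of scores strictly above t
    is at most `alert_rate` (a fraction between 0 and 1)."""
    if not 0.0 <= alert_rate <= 1.0:
        raise ValueError("alert_rate must be in [0, 1]")
    ordered = sorted(scores, reverse=True)
    allowed = int(alert_rate * len(ordered))  # packets allowed to raise an alert
    if allowed >= len(ordered):
        return float("-inf")                  # every packet may alert
    # Scores strictly greater than this value number at most `allowed`.
    return ordered[allowed]
```

With this choice, ties at the threshold value do not alert, so the observed alert rate on the training scores can be lower than the requested rate but never higher.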

In the example of FIG. 12, the anomaly detection device 100 calculates the alert threshold from the parameter, but it may instead receive the alert threshold directly from the user, as follows.

FIG. 13 is a flowchart showing another example of the details of the alert threshold determination process.

In the anomaly detection device 100, the input reception unit 140 receives an input indicating the alert threshold (S21A).

The alert threshold calculation unit 150 sets the alert threshold indicated by the input received by the input reception unit 140 as the alert threshold (S22A).

Next, the inspection process, that is, the anomaly detection method, is described in detail.

FIG. 14 is a flowchart showing an example of the details of the inspection process in the anomaly detection device.

In the anomaly detection device 100, the detection model learning unit 120 calculates the N-gram occurrence probabilities Pr_1 to Pr_6 from the N-gram occurrence counts n_1 to n_6 of each of the plurality of models in the anomaly detection model (S31).
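Step S31 turns occurrence counts into probabilities. The exact equation is not reproduced in this section, so the sketch below assumes the standard additive (Laplace) smoothing form with byte-sized units, where every one of the 256^N possible N-grams receives a small pseudo-count beta.

```python
from collections import Counter

def smoothed_probability(counts: Counter, gram: bytes,
                         beta: float = 0.01, n: int = 2) -> float:
    """Additively smoothed occurrence probability of `gram`.

    Assumed form: (count + beta) / (total + beta * 256**n).
    Unseen N-grams get a small non-zero probability instead of zero.
    """
    vocabulary = 256 ** n                 # number of possible byte N-grams
    total = sum(counts.values())
    return (counts[gram] + beta) / (total + beta * vocabulary)
```

The non-zero floor matters later, because a log-likelihood score would be undefined for a probability of exactly zero.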

Next, the acquisition unit 110 acquires a plurality of packets, which are the inspection data 212 (S32).

The detection unit 160 then repeats steps S33 to S41 for each of the plurality of packets.

Steps S33 to S36 executed by the detection unit 160 are the same as steps S13 to S16 executed by the detection model learning unit 120, so their description is omitted.

If the detection unit 160 determines that the identified model already exists (Yes in S36), it proceeds to step S37; if it determines that the identified model does not yet exist (No in S36), it proceeds to step S41.

In step S37, the detection unit 160 extracts the target data portion from the packet being processed. This process is the same as step S18 of the learning process, so its description is omitted.

The detection unit 160 calculates the score of the packet being processed (S38). Specifically, it calculates the score by performing processes (1) to (6) given in the description of the detection unit 160 above.

The detection unit 160 determines whether the score calculated for the packet being processed exceeds the alert threshold that, in the anomaly detection model stored in the anomaly detection model DB 130, is associated with the model of that packet (S39). If the calculated score exceeds the corresponding alert threshold (Yes in S39), the presentation unit 170 presents an alert (S40); if the calculated score is less than or equal to the corresponding alert threshold (No in S39), the process proceeds to step S41.

The detection unit 160 determines whether any of the plurality of packets has not yet been inspected (S41). If an uninspected packet exists (Yes in S41), it executes steps S33 to S40 for that packet. If no uninspected packet exists (No in S41), that is, if steps S33 to S40 have been completed for all packets, it ends the inspection process.
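The inspection loop of FIG. 14 can be sketched as below. Equation 12 is not reproduced in this section, so the score used here (mean negative log of additively smoothed 2-gram probabilities) is an assumption chosen to match the log-likelihood scoring the text describes; `models` and `thresholds` are hypothetical dictionaries keyed by model.

```python
import math
from collections import Counter

def score(payload: bytes, counts: Counter, beta: float = 0.01, n: int = 2) -> float:
    """Assumed anomaly score: mean negative log smoothed N-gram probability."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0
    denom = sum(counts.values()) + beta * 256 ** n
    return -sum(math.log((counts[g] + beta) / denom) for g in grams) / len(grams)

def inspect(packets, models, thresholds):
    """Return (model_key, payload, score) for every packet that should alert."""
    alerts = []
    for key, payload in packets:
        if key not in models:        # No in S36: no model exists, go to S41
            continue
        s = score(payload, models[key])
        if s > thresholds[key]:      # Yes in S39: present an alert (S40)
            alerts.append((key, payload, s))
    return alerts
```

Packets whose model was never trained are skipped rather than flagged, mirroring the No branch of S36.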

Note that the acquisition unit 110 need not acquire all of the packets at once; it may acquire them in several batches, for example one packet at a time. When the acquisition unit 110 acquires the packets in several batches, the anomaly detection device 100 repeats steps S32 to S41.

[3. Effects, etc.]
According to the anomaly detection method of the present embodiment, for the payload included in a packet, all possible combinations of N data units are extracted from among the plurality of data units obtained by dividing the payload into A-bit units, where each combination includes the order of the N data units in the payload; a second probability that each combination appears is calculated; and a score is calculated based on the calculated second probabilities. Because the score is calculated from the probability of occurrence of combinations that include the order of the N data units in the payload, an accurate score that takes ordering information into account can be obtained.

Further, according to the learning method of the present embodiment, the anomaly detection model can be trained additionally, or updated to a model from which old data has been removed. Abnormal packets can therefore be identified accurately.

Thus, the anomaly detection method of the present embodiment is considered to overcome the drawbacks of existing methods. First, PAYL has the drawback of ignoring the ordering information of the byte sequence; the present method avoids this by using N-gram (N = 2, 3) information. ANAGRAM discards information on the number of occurrences of N-grams entirely, whereas the proposed method uses a model that also takes occurrence counts into account. ANAGRAM (frequency-based) does use N-gram frequency information, but its score is calculated heuristically; the proposed method avoids this problem by using Laplace smoothing and a natural scoring based on log-likelihood.

In addition, because the anomaly detection method of the present embodiment only needs to hold the occurrence counts of the N-grams (N = 2, 3) that appear for each model, it is memory-efficient, and, unlike ANAGRAM, there is no need to estimate the size of a Bloom filter.

In addition, the only hyperparameter of the anomaly detection method of the present embodiment is the additive smoothing parameter β used in Laplace smoothing. This parameter may simply be fixed, for example at β = 0.01; empirically, varying this value slightly has been found to have almost no effect on the performance of the anomaly detection model.

In addition, in the anomaly detection method of the present embodiment, as long as the 2-gram occurrence count vector x of each model is stored, it is possible to train an already trained model further (additional learning) and, conversely, to return the model to the state in which it has not learned data it has already learned (forgetting). The forgetting function in particular is a feature not found in other methods. By using it, the model can be kept in a state where only the most recent month of data has been learned, or data from dates and times at which data undesirable as normal data was obtained can be selectively forgotten. This property is useful when actually operating an anomaly detection system. That is, the detection unit 160 may update the fourth number included in the anomaly detection model using the counted third number. For example, the detection unit 160 can add learning data to the anomaly detection model by adding the third number to the fourth number. Also, by adding the newly counted third number and deleting from the fourth number of the anomaly detection model the number counted in a past predetermined period, the anomaly detection model can be kept up to date. Note that the number counted in a past predetermined period may be deleted from the fourth number without adding a newly counted number to the fourth number of the anomaly detection model.
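Because the model is just a vector of occurrence counts, additional learning and forgetting reduce to adding and subtracting counts. The sketch below illustrates this with Python's `collections.Counter`; keeping a separate count vector per period (here `day1`, `day2`) so that it can later be subtracted is an assumption of this example.

```python
from collections import Counter

def add_learning(model: Counter, batch: Counter) -> None:
    """Additional learning: add the newly counted occurrences to the model."""
    model.update(batch)

def forget(model: Counter, batch: Counter) -> None:
    """Forgetting: subtract previously learned occurrences from the model."""
    model.subtract(batch)
    # Remove N-grams whose count has dropped to zero (or below).
    for gram in [g for g, c in model.items() if c <= 0]:
        del model[gram]

# Illustrative per-day count vectors (made up for this sketch).
day1 = Counter({b"ab": 3, b"bc": 1})
day2 = Counter({b"ab": 2, b"cd": 4})

model = Counter()
add_learning(model, day1)
add_learning(model, day2)
forget(model, day1)   # the model now reflects day2 only
```

A rolling one-month window would apply `add_learning` for the newest period and `forget` for the oldest one each time the window advances.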

(Others)
As described above, the anomaly detection device 100 executes the following anomaly detection method.

1. Because using unigrams completely loses information on the ordering of the byte sequence, N-grams (N ≥ 2) are used as features.

2. Rather than completely discarding information on N-gram frequency as ANAGRAM does, information on the number of occurrences of N-grams is used.

3. Noting that the ANAGRAM (frequency-based) method calculates the anomaly score using a simple arithmetic mean, a probabilistic model is assumed and a more theoretically sound method of calculating the anomaly score is used.

4. Because in real environments it is difficult to obtain labeled data with which hyperparameters can be tuned properly, a model with few hyperparameters is used.

The first point is clearly motivated by the wish to exploit the richer information that N-grams (N ≥ 2) carry compared with unigrams. The lower accuracy of PAYL compared with ANAGRAM appears to be due to its use of unigrams.

The second point is similar: information on how many times an N-gram has appeared contains more than information on whether it has ever appeared. Moreover, packets in control system networks are thought to contain many highly random binary sequences; if the judgment rests only on whether an N-gram has ever appeared, an N-gram that happened to occur in random binary data may be regarded as part of a normal sequence.

These first and second properties are the same as those of frequency-based ANAGRAM. However, the ANAGRAM paper states that frequency-based ANAGRAM is clearly inferior to binary-based ANAGRAM. As noted in the third point, this document shows that the problem lay in how frequency-based ANAGRAM calculates the anomaly score, and that, with an appropriate anomaly score calculation, using bigrams (N-grams with N = 2) can yield accuracy exceeding that of PAYL and ANAGRAM.

Like earlier N-gram-based methods, the anomaly detection method of the present embodiment uses N-gram information of the payload sequence as features. The present embodiment uses N = 2, that is, 2-grams. The reason for not using N ≥ 3 is that the information for each N-gram becomes sparse and the occurrence counts become less reliable (this is considered one reason for the low detection performance of ANAGRAM (frequency-based)). When data is abundant, however, the method is expected to achieve high accuracy even with N = 3. With N ≥ 4, N-gram data becomes sparse in realistic settings and is considered impractical.
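The sparsity argument can be made concrete with a little arithmetic. Assuming byte-sized data units (A = 8), the number of distinct N-grams grows as 256^N, so the same amount of training data covers a rapidly shrinking share of the space as N increases. The helper names below are illustrative.

```python
def ngram_space(n: int, unit_bits: int = 8) -> int:
    """Number of distinct N-grams over units of `unit_bits` bits."""
    return (2 ** unit_bits) ** n

def max_coverage(observed_ngrams: int, n: int) -> float:
    """Upper bound on the share of the N-gram space that a given number of
    observed N-gram occurrences could cover (reached only if all differ)."""
    return min(1.0, observed_ngrams / ngram_space(n))

for n in (2, 3, 4):
    print(n, ngram_space(n), max_coverage(1_000_000, n))
```

For one million observed N-grams, the bound is 1.0 for N = 2, about 0.06 for N = 3, and about 0.0002 for N = 4, which is consistent with the text's view that counts for N ≥ 4 are too sparse to be reliable in realistic settings.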

Many anomaly-based detection techniques have a learning phase in which they learn normal communication behavior from data given as the learning period. In the inspection phase, the detection model obtained in the learning phase is used to judge whether a given packet is normal or abnormal. The anomaly detection method of the present embodiment is payload-based, but it also uses header information, because the proposed method switches the anomaly detection model used for learning and inspection according to the destination IP address and destination port. This is because, for example, the payloads observed for the HTTP protocol and the FTP protocol are completely different.

[4. Variation]
In the anomaly detection method according to the above embodiment, scoring can also be performed according to a sequence generation model using N-grams. Here, let x_{i,j}[X_T, X_{T+1}] be the number of occurrences of the 2-gram X_T, X_{T+1} in the model. Then p(X_{T+1} | X_T) is defined by the following equation.

Further, p(X_1) is separately defined by the following equation, where start is a symbol denoting the start of the data.

To obtain this value, the number of occurrences of the first character of the payload must be held for each model during learning.

In the inspection process, the score then follows naturally from Equation 4, as in Equation 15 below.
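The equations of this variation are not reproduced in the text, so the following is only a sketch of one plausible reading: a first-order Markov model over bytes in which both the start distribution p(X_1) and the transition distribution p(X_{T+1} | X_T) are additively smoothed with beta, and the score is the negative log-likelihood of the payload. The class and method names are hypothetical.

```python
import math
from collections import Counter

class SequenceModel:
    """Bigram sequence model over bytes with additive smoothing (assumed form)."""

    def __init__(self, beta: float = 0.01, alphabet: int = 256):
        self.beta = beta
        self.alphabet = alphabet
        self.first = Counter()  # occurrences of each byte as the first byte of a payload
        self.pair = Counter()   # occurrences of each 2-gram (X_T, X_T+1)
        self.lead = Counter()   # occurrences of each byte as X_T of some 2-gram

    def learn(self, payload: bytes) -> None:
        if not payload:
            return
        self.first[payload[0]] += 1
        for a, b in zip(payload, payload[1:]):
            self.pair[(a, b)] += 1
            self.lead[a] += 1

    def p_first(self, a: int) -> float:
        total = sum(self.first.values())
        return (self.first[a] + self.beta) / (total + self.beta * self.alphabet)

    def p_next(self, a: int, b: int) -> float:
        return (self.pair[(a, b)] + self.beta) / (self.lead[a] + self.beta * self.alphabet)

    def score(self, payload: bytes) -> float:
        """Negative log-likelihood of the payload; higher means more anomalous."""
        if not payload:
            return 0.0
        log_likelihood = math.log(self.p_first(payload[0]))
        for a, b in zip(payload, payload[1:]):
            log_likelihood += math.log(self.p_next(a, b))
        return -log_likelihood
```

As the text notes, the count of first bytes (`first` here) has to be stored per model in addition to the 2-gram counts.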

[5. Experiment and evaluation results]
In this experiment, the anomaly detection method of the present embodiment is evaluated against the existing methods PAYL, ANAGRAM (frequency-based), and ANAGRAM (binary-based). ANAGRAM (binary-based) is evaluated with 3-grams, and ANAGRAM (frequency-based) with both 2-grams and 3-grams. The anomaly detection method of the present embodiment uses 2-grams.

[5-1. Data set used in the experiment and evaluation method]
Here, the 1999 DARPA IDS Data Set (hereinafter the DARPA 99 data set) is used. The DARPA 99 data set was collected at MIT Lincoln Labs for IDS evaluation, and all network traffic, including the payload of each packet, is provided in tcpdump format. The data consists of three weeks of training data and two weeks of test data; the training data consists of two weeks of data containing no attacks and one week of data containing attacks. The test data contains attacks on every date. Attack data is grouped into units called instances, each collecting a series of attacks, and the DARPA 99 data set publishes information such as the period during which each attack instance occurred and its target IP address and target port. In this evaluation, each method was trained on the two weeks of training data that contain no attacks, and an anomaly score was calculated for each packet in the two weeks of test data. Because the methods evaluated use N-grams with N = 1, 2, and 3, only packets with a payload length of 3 bytes or more were used for training and testing, so that the comparison is fair.

In this experiment, following the PAYL paper, the evaluation is restricted to attack instances in the DARPA 99 data set whose information appears in the payload, and each method is evaluated per protocol on a graph of instance-based detection rate (vertical axis) against packet-based false detection rate (horizontal axis). Each method only calculates an anomaly score (a scalar value) for each packet, so whether a packet is judged abnormal or normal depends on the score threshold that is set: a packet whose anomaly score exceeds the threshold is judged abnormal, and any other packet is judged normal. The larger the threshold, the lower the false detection rate, but the detection rate also falls. Conversely, the smaller the threshold, the higher the detection rate, but the false detection rate also rises. The two rates are thus in a trade-off.

(Instance-based detection rate)
When one or more of the packets belonging to a particular attack instance are detected, that instance is judged to have been detected. The instance-based detection rate is the proportion of all instances that are detected under this criterion.

(Packet-based false detection rate)
Of the packets judged abnormal by the anomaly detection model, those that remain after excluding the packets included in attack instances are referred to as normal packets. The packet-based false detection rate is the proportion of these normal packets that were erroneously judged abnormal.
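The two evaluation metrics can be computed as in the sketch below. It assumes the usual reading in which "normal packets" are all packets that do not belong to any attack instance, and it represents each packet as a hypothetical `(score, instance_id)` pair with `instance_id` set to `None` for normal packets.

```python
def detection_rates(scored_packets, threshold):
    """Return (instance-based detection rate, packet-based false detection rate).

    A packet is flagged when its score exceeds `threshold`. An attack
    instance counts as detected when at least one of its packets is flagged.
    """
    instances, detected = set(), set()
    normal = false_alarms = 0
    for packet_score, instance_id in scored_packets:
        flagged = packet_score > threshold
        if instance_id is None:          # packet outside every attack instance
            normal += 1
            false_alarms += flagged
        else:
            instances.add(instance_id)
            if flagged:
                detected.add(instance_id)
    detection_rate = len(detected) / len(instances) if instances else 0.0
    false_rate = false_alarms / normal if normal else 0.0
    return detection_rate, false_rate

# Illustrative scores (made up for this sketch).
packets = [(9.0, "A"), (2.0, "A"), (3.0, "B"),
           (1.0, None), (6.0, None), (2.0, None), (1.5, None)]
```

Sweeping `threshold` over a range of values and plotting the resulting pairs gives the detection rate versus false detection rate curve used in the evaluation.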

The DARPA 99 data contains packets of several protocols, but the number of packets and the number of attack instances vary widely between protocols, so only about four protocols (HTTP, FTP, TELNET, and SMTP) are considered to have enough data to be usable for evaluation. The present technique is intended particularly for use in control systems, but the DARPA 99 data contains no packet data for control system protocols. The evaluation was therefore performed on the FTP and TELNET protocols, which, within the DARPA 99 data, are considered relatively close to the control commands seen in control system protocols.

[5-2. Experimental results]
FIG. 15 shows experimental results comparing the anomaly detection method of the present embodiment with other methods when evaluated on the FTP protocol. FIG. 16 shows experimental results comparing the anomaly detection method of the present embodiment with other methods when evaluated on the TELNET protocol.

The result for each method appears as an upward-sloping line, which plots how the result changes as the threshold is varied from small values to large values. As the FTP results show, the proposed method performs as well as or better than the existing 3-gram-based ANAGRAM (binary-based and frequency-based), and clearly better than PAYL and 2-gram ANAGRAM (frequency-based). In the TELNET results, the anomaly detection method of the embodiment shows better detection performance than every other method. This indicates that the anomaly detection method of the embodiment performs comparatively well among anomaly detection methods that need little tuning.

In each of the above embodiments, each component may be configured with dedicated hardware or may be realized by executing a software program suited to that component. Each component may be realized by a program execution unit, such as a CPU or processor, reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory. The software that realizes the anomaly detection method, the learning method, and the like of each of the above embodiments is the following program.

That is, this program causes a computer to execute an anomaly detection method executed by an anomaly detection device that detects whether there is an anomaly in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected, wherein the anomaly detection device includes a processor and a memory, the memory stores an anomaly detection model generated by learning using a plurality of learning packets, and in the anomaly detection method the processor: acquires the plurality of learning packets; for each of the acquired learning packets, extracts all possible first combinations of N (N is an integer of 2 or more) data units from among the plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A (A is an integer of 1 or more) bits, each first combination being N data units in an order in which they are consecutive in the payload or in an order skipping B (B is an integer of 1 or more) data units; for each of all the first combinations extracted for the plurality of learning packets, counts a first number, which is the number of times the first combination appears in the plurality of learning packets; for each of all the extracted first combinations, calculates, by performing smoothing processing based on the plurality of first numbers obtained by the counting, a plurality of first probabilities, each being the probability that the first combination appears in the plurality of learning packets; stores the calculated first probabilities in the memory as the anomaly detection model; acquires a plurality of packets; and, for each of the acquired packets, when the score calculated for the packet exceeds a predetermined threshold based on the anomaly detection model stored in the memory, outputs that the packet for which the score was calculated is abnormal.
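The extraction step in this paragraph generalizes byte bigrams in two ways: the payload is cut into A-bit units, and the N units of a combination may be consecutive or taken while skipping B units. The sketch below shows one reading of that description; dropping trailing bits that do not fill a whole unit, and treating `skip=0` as the consecutive case, are assumptions of this example.

```python
def split_units(payload: bytes, a_bits: int = 8):
    """Split the payload into A-bit data units, returned as integers.

    Trailing bits that do not fill a whole unit are dropped (an assumption).
    """
    bits = "".join(f"{byte:08b}" for byte in payload)
    return [int(bits[i:i + a_bits], 2)
            for i in range(0, len(bits) - a_bits + 1, a_bits)]

def extract_combinations(units, n: int = 2, skip: int = 0):
    """Return every ordered combination of n units taken at a stride of
    skip + 1 positions; skip=0 gives consecutive units."""
    step = skip + 1
    span = (n - 1) * step  # distance between first and last unit of a combination
    return [tuple(units[i:i + span + 1:step])
            for i in range(len(units) - span)]
```

For example, with byte units `[1, 2, 3, 4]`, N = 2 gives `(1, 2), (2, 3), (3, 4)` for consecutive order and `(1, 3), (2, 4)` when one unit is skipped.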

This program also causes a computer to execute a learning method executed by a learning device that learns an anomaly detection model for detecting whether there is an anomaly in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected, wherein the learning device includes a processor and a memory, and in the learning method the processor: acquires a plurality of learning packets; for each of the acquired learning packets, extracts all possible first combinations of N (N is an integer of 2 or more) data units from among the plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A (A is an integer of 1 or more) bits, each first combination being N data units in an order in which they are consecutive in the payload or in an order skipping B (B is an integer of 1 or more) data units; for each of all the first combinations extracted for the plurality of learning packets, counts a first number, which is the number of times the first combination appears in the plurality of learning packets; for each of all the extracted first combinations, calculates, by performing smoothing processing based on the plurality of first numbers obtained by the counting, a plurality of first probabilities, each being the probability that the first combination appears in the plurality of learning packets; and stores the calculated first probabilities in the memory as the anomaly detection model.

The anomaly detection method, anomaly detection device, learning method, and learning device according to one or more aspects of the present invention have been described above based on the embodiment, but the present invention is not limited to this embodiment. Forms obtained by applying to the embodiment various modifications conceivable to those skilled in the art, and forms constructed by combining components of different embodiments, may also be included within the scope of one or more aspects of the present invention, as long as they do not depart from the spirit of the present invention.

The present disclosure is useful as, for example, an anomaly detection method capable of accurately identifying anomalous packets, or a learning method for learning an anomaly detection model used for such accurate identification.

1 Anomaly detection system
100 Anomaly detection device
101 CPU
102 Main memory
103 Storage
104 Communication IF
105 Input IF
106 Display
110 Acquisition unit
120 Detection model learning unit
130 Anomaly detection model DB
131 to 134 Anomaly detection model
135 Correspondence information
140 Input reception unit
150 Alert threshold calculation unit
160 Detection unit
170 Presentation unit
200 Packet collection device
210 Data
211 Learning data
212 Inspection data
300 Monitoring target
311, 312, 321, 322 Hub
313 SCADA
314 PLC
315, 323, 324 PC
400 Router
500 Network

Claims

An anomaly detection method executed by an anomaly detection device that detects whether there is an anomaly in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected, wherein
the anomaly detection device comprises a processor and a memory,
the memory stores an anomaly detection model generated by learning using a plurality of learning packets, and
in the anomaly detection method, the processor:
acquires the plurality of learning packets;
for each of the acquired learning packets, extracts all possible first combinations of N data units (N is an integer of 2 or more) from among a plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A bits (A is an integer of 1 or more), each first combination being N data units arranged either consecutively in the payload or at intervals that skip B data units (B is an integer of 1 or more);
counts, for each of the first combinations extracted from the plurality of learning packets, a first number, which is the number of times the first combination appears in the plurality of learning packets;
calculates, for each of the extracted first combinations, a first probability, which is the probability that the first combination appears in the plurality of learning packets, by performing smoothing processing based on the plurality of first numbers obtained by the counting;
stores the plurality of calculated first probabilities in the memory as the anomaly detection model;
acquires a plurality of packets; and
for each of the acquired packets, when a score calculated for the packet exceeds a predetermined threshold that is based on the anomaly detection model stored in the memory, outputs an indication that the packet for which the score was calculated is anomalous.

The anomaly detection method according to claim 1, wherein, in calculating the first probabilities, the smoothing processing comprises calculating a plurality of second numbers by adding a positive number to each of the plurality of first numbers, and the first probabilities are calculated based on the plurality of second numbers calculated for the respective extracted first combinations.
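
By way of non-limiting illustration, adding a positive number to every first number corresponds to what is commonly called additive (Laplace) smoothing. The following Python sketch shows one way it could be done. The function name `smooth`, the parameter `alpha` (the positive number), and the parameter `n_possible` (the number of all possible combinations, for example 256 to the power N for byte-sized units) are hypothetical. Normalizing by the sum of all second numbers is an assumption; the claim only requires that the first probabilities be based on the second numbers.

```python
from typing import Dict, Hashable, Tuple


def smooth(
    first_numbers: Dict[Hashable, int], n_possible: int, alpha: float = 1.0
) -> Tuple[Dict[Hashable, float], float]:
    """Additive smoothing of raw combination counts.

    Returns (first_probabilities, unseen_probability), where
    unseen_probability is the value assigned to any combination that
    never appeared in the learning packets (its first number is 0).
    """
    if alpha <= 0:
        raise ValueError("alpha must be a positive number")
    # Every possible combination receives alpha, so the sum of all
    # second numbers is the observed total plus alpha * n_possible.
    total = sum(first_numbers.values()) + alpha * n_possible
    probabilities = {
        combo: (count + alpha) / total for combo, count in first_numbers.items()
    }
    return probabilities, alpha / total
```

Because unseen combinations receive a small nonzero probability, the logarithm of a probability is always defined when a score is later computed for a packet containing a combination that never occurred during learning.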

The anomaly detection method according to claim 1, wherein, in the extraction, the first combinations of N data units are extracted using N-grams.

The anomaly detection method according to claim 3, wherein N is 2 or 3.

The anomaly detection method according to any one of claims 1 to 4, wherein, in the outputting, for each of the plurality of acquired packets, the processor:
(1) extracts all possible second combinations of N data units (N is an integer of 2 or more) from among a plurality of data units obtained by dividing the data string constituting the payload included in the packet into units of A bits (A is an integer of 1 or more), each second combination being N data units arranged either consecutively in the payload or at intervals that skip B data units (B is an integer of 1 or more);
(2) counts, for each of the second combinations extracted from the packet, a third number, which is the number of times the second combination appears in the payload of the acquired packet;
(3) calculates a plurality of second probabilities, each being the probability that a second combination appears in the packet, based on the plurality of third numbers obtained by the counting for each of the second combinations in the packet;
(4) calculates the score by dividing the sum of the logarithms of the plurality of second probabilities calculated for the packet by a specified value determined by the payload length of the payload; and
(5) when the score calculated for the packet exceeds the predetermined threshold that is based on the anomaly detection model stored in the memory, outputs an indication that the packet for which the score was calculated is anomalous.
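
By way of non-limiting illustration, steps (1) to (5) could be sketched in Python as follows, under several assumptions that are not stated in the claim. The data units are bytes (A = 8) taken consecutively. The second probability of a combination is looked up from the learned first probabilities and its logarithm is weighted by the third number. The specified value is the number of N-grams in the payload (payload length minus N plus 1). The averaged log-probability is negated so that a less likely payload yields a larger score, which matches the "exceeds the threshold" test. The names `score_packet`, `is_anomalous`, and `unseen_prob` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def score_packet(
    payload: bytes, probs: Dict[bytes, float], unseen_prob: float, n: int = 2
) -> float:
    """Length-normalized negative log-likelihood of a payload.

    probs maps each learned N-gram to its smoothed probability;
    unseen_prob is used for N-grams absent from the model.
    """
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0  # payload shorter than N: nothing to score (assumption)
    third_numbers = Counter(grams)
    log_sum = sum(
        count * math.log(probs.get(gram, unseen_prob))
        for gram, count in third_numbers.items()
    )
    # Sign convention is an assumption: larger score = more anomalous.
    return -log_sum / len(grams)


def is_anomalous(score: float, threshold: float) -> bool:
    """Report an anomaly when the score exceeds the threshold."""
    return score > threshold
```

Dividing by a value tied to the payload length keeps scores of long and short payloads comparable, so that a single threshold can be applied to packets of different sizes.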

The anomaly detection method according to claim 5, wherein
the memory stores, as the anomaly detection model, a fourth number based on the first number for each of the first combinations, and
in the anomaly detection method, the processor further updates the fourth number included in the anomaly detection model using the counted third number.
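
By way of non-limiting illustration, updating stored counts with the counts observed in a newly inspected packet could be sketched in Python as follows. The sketch assumes that the fourth numbers are stored as raw per-combination counts, that the update is a simple addition of the third numbers, and that the data units are bytes taken consecutively. None of these details is fixed by the claim, and the name `update_model` is hypothetical.

```python
from collections import Counter


def update_model(fourth_numbers: Counter, payload: bytes, n: int = 2) -> Counter:
    """Add the N-gram counts of one inspected payload to the stored counts.

    The Counter is updated in place and also returned for convenience.
    """
    third_numbers = Counter(
        payload[i:i + n] for i in range(len(payload) - n + 1)
    )
    fourth_numbers.update(third_numbers)  # Counter.update adds counts
    return fourth_numbers
```

Storing counts rather than only probabilities allows the model to be refreshed incrementally, because smoothed probabilities can be recomputed from the updated counts without revisiting the original learning packets.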

The anomaly detection method according to claim 5, wherein, in the anomaly detection method, the processor further:
classifies each of the plurality of acquired learning packets into one of a plurality of models according to the header of the learning packet; and
for each of the plurality of models,
(1) calculates, using the counted first numbers, a fifth number for each of the first combinations, the fifth number being the number of times the first combination appears in those learning packets, among the plurality of learning packets, that are classified into the model;
(2) calculates a plurality of sixth numbers by adding a positive number to each of the fifth numbers calculated for the first combinations extracted from the learning packets classified into the model; and
(3) calculates, for each of the extracted first combinations, based on the calculated plurality of sixth numbers, a first probability, which is the probability that the first combination appears in the learning packets classified into the model.
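
By way of non-limiting illustration, learning one model per header class could be sketched in Python as follows. The `Packet` type, the choice of destination IP, destination port, and protocol as the classification key, and the function names are all hypothetical. The sketch counts the fifth numbers directly per model rather than deriving them from previously counted first numbers, assumes byte-sized units taken consecutively, and normalizes by the sum of the sixth numbers over all 256 to the power N possible combinations.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record used only for this sketch."""
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes


ModelKey = Tuple[str, int, str]


def model_key(packet: Packet) -> ModelKey:
    """Classify a packet into a model according to its header fields."""
    return (packet.dst_ip, packet.dst_port, packet.protocol)


def learn_per_model(
    packets: Iterable[Packet], n: int = 2, alpha: float = 1.0
) -> Dict[ModelKey, Dict[bytes, float]]:
    """Return, for each model, the smoothed probability of each N-gram."""
    fifth_numbers: Dict[ModelKey, Counter] = defaultdict(Counter)
    for p in packets:
        fifth_numbers[model_key(p)].update(
            p.payload[i:i + n] for i in range(len(p.payload) - n + 1)
        )
    n_possible = 256 ** n
    models: Dict[ModelKey, Dict[bytes, float]] = {}
    for key, counts in fifth_numbers.items():
        # Sixth number = fifth number + alpha, for every possible combination.
        total = sum(counts.values()) + alpha * n_possible
        models[key] = {g: (c + alpha) / total for g, c in counts.items()}
    return models
```

Separating models by header lets traffic with different payload statistics, such as different protocols or destinations, be scored against the distribution that is typical for that kind of traffic.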

The anomaly detection method according to claim 7, wherein
the memory stores the predetermined threshold for each of the plurality of models,
in the anomaly detection method, the processor further classifies each of the plurality of acquired packets into one of the plurality of models according to the header of the packet, and
in the outputting, the indication that the packet is anomalous is output when the calculated score exceeds the predetermined threshold corresponding to the model into which the packet for which the score was calculated is classified.
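
By way of non-limiting illustration, a per-model threshold lookup could be sketched in Python as follows. The key format, the function name `exceeds_model_threshold`, and the `default_threshold` used for packets whose model has no stored threshold are assumptions made for the sketch.

```python
from typing import Dict, Hashable


def exceeds_model_threshold(
    key: Hashable,
    score: float,
    thresholds: Dict[Hashable, float],
    default_threshold: float = float("inf"),
) -> bool:
    """Compare a packet's score with the threshold of its own model.

    key identifies the model the packet was classified into. With the
    default of infinity, packets of an unknown model are never flagged
    (an assumption; a deployment might prefer the opposite).
    """
    return score > thresholds.get(key, default_threshold)
```

With thresholds of 1.5 for one model and 3.0 for another, the same score of 2.0 is reported as anomalous for a packet classified into the first model and as normal for a packet classified into the second.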

9. The anomaly detection method according to claim 7, wherein each of the plurality of models is a model classified according to at least one of a destination IP, a destination port, a source IP, and a protocol of the packet.

The anomaly detection method according to any one of claims 7 to 9, wherein
the memory stores, as the anomaly detection model, the fifth number for each of the first combinations for each of the plurality of models, and
in the anomaly detection method, the processor further updates the fifth number included in the anomaly detection model using the counted third number.

A learning method executed by a learning device that learns an anomaly detection model for detecting whether there is an anomaly in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected, wherein
the learning device comprises a processor and a memory, and
in the learning method, the processor:
acquires a plurality of learning packets;
for each of the acquired learning packets, extracts all possible first combinations of N data units (N is an integer of 2 or more) from among a plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A bits (A is an integer of 1 or more), each first combination being N data units arranged either consecutively in the payload or at intervals that skip B data units (B is an integer of 1 or more);
counts, for each of the first combinations extracted from the plurality of learning packets, a first number, which is the number of times the first combination appears in the plurality of learning packets;
calculates, for each of the extracted first combinations, a first probability, which is the probability that the first combination appears in the plurality of learning packets, by performing smoothing processing based on the plurality of first numbers obtained by the counting; and
stores the plurality of calculated first probabilities in the memory as the anomaly detection model.

An anomaly detection device that detects whether there is an anomaly in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected, wherein
the anomaly detection device comprises a processor and a memory,
the memory stores an anomaly detection model generated by learning using a plurality of learning packets, and
the processor:
acquires the plurality of learning packets;
for each of the acquired learning packets, extracts all possible first combinations of N data units (N is an integer of 2 or more) from among a plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A bits (A is an integer of 1 or more), each first combination being N data units arranged either consecutively in the payload or at intervals that skip B data units (B is an integer of 1 or more);
counts, for each of the first combinations extracted from the plurality of learning packets, a first number, which is the number of times the first combination appears in the plurality of learning packets;
calculates, for each of the extracted first combinations, a first probability, which is the probability that the first combination appears in the plurality of learning packets, by performing smoothing processing based on the plurality of first numbers obtained by the counting;
stores the plurality of calculated first probabilities in the memory as the anomaly detection model;
acquires a plurality of packets; and
for each of the acquired packets, when a score calculated for the packet exceeds a predetermined threshold that is based on the anomaly detection model stored in the memory, outputs an indication that the packet for which the score was calculated is anomalous.

A learning device that learns an anomaly detection model for detecting whether there is an anomaly in communication within a monitoring target or in communication between the monitoring target and a network to which the monitoring target is connected, wherein
the learning device comprises a processor and a memory, and
the processor:
acquires a plurality of learning packets;
for each of the acquired learning packets, extracts all possible first combinations of N data units (N is an integer of 2 or more) from among a plurality of data units obtained by dividing the data string constituting the payload included in the learning packet into units of A bits (A is an integer of 1 or more), each first combination being N data units arranged either consecutively in the payload or at intervals that skip B data units (B is an integer of 1 or more);
counts, for each of the first combinations extracted from the plurality of learning packets, a first number, which is the number of times the first combination appears in the plurality of learning packets;
calculates, for each of the extracted first combinations, a first probability, which is the probability that the first combination appears in the plurality of learning packets, by performing smoothing processing based on the plurality of first numbers obtained by the counting; and
stores the plurality of calculated first probabilities in the memory as the anomaly detection model.