WO2022219787A1 - Labeling device, labeling method, and program - Google Patents
- Publication number
- WO2022219787A1 (PCT/JP2021/015632)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- monitoring data
- label
- labeled
- model
- data
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
Definitions
- FIG. 4 is a diagram showing a functional configuration example of the labeling model 13;
- FIG. 5 is a diagram for explaining initial learning of the labeling model 13;
- FIG. 6 is a diagram showing a configuration example of labeled general-purpose monitoring data;
- FIG. 7 is a diagram for explaining operation using a trained labeling model 13a and re-learning of the trained labeling model 13a;
- FIG. 8 is a diagram for explaining a learning procedure of the labeling model 13.
- the error determination unit 133 receives the feature vector from the classification estimation process observation unit 132 and determines, based on the feature vector, whether the label estimated by the classification estimation unit 131 is "correct" or "wrong".
- the configuration method of the error determination unit 133 is not limited to a specific method.
- for example, the error determination unit 133 can determine whether the label estimated by the classification estimation unit 131 is "correct" or "wrong" by checking whether a specific value of the feature vector (in particular, an output-layer value of the neural network or the number of votes of a random forest) exceeds a threshold.
- the error determination unit 133 may also be configured with a model commonly used in the field of machine learning.
- for example, the error determination unit 133 can be configured with an SVM, a neural network, or the like. When such a model is used, the error determination unit 133 can be implemented by tuning the model parameters through supervised learning.
- FIG. 5 is a diagram for explaining the initial learning of the labeling model 13.
- the learning unit 11 receives, from the user, input of a label for all or part of the general-purpose monitoring data in a set of general-purpose monitoring data provided by the security vendor during a predetermined period up to the time of initial learning (hereinafter referred to as "general-purpose monitoring data group X"), and generates, as training data, a labeled general-purpose monitoring data group X to which the input labels are assigned (S101).
- when the labeling target is limited to a portion of the general-purpose monitoring data, the portion may be selected at random, for example.
- FIG. 6 is a diagram showing a configuration example of labeled general-purpose monitoring data.
- each row in FIG. 6 represents one piece of labeled general-purpose monitoring data.
- one piece of labeled general-purpose monitoring data consists of general-purpose monitoring data and a label.
- general-purpose monitoring data is data that characterizes specific (e.g., malicious) communication data (threat data) and includes, for example, the protocol, source address, source port, destination address, destination port, and communication content data.
- the label value is "unnecessary", "save", "notify", or "block".
- the telecommunications carrier decides whether each piece of general-purpose monitoring data in the general-purpose monitoring data group X is necessary based on other information related to that data (reports of cyberattacks, information on troubles within the carrier, etc.), assigns "unnecessary" to unnecessary general-purpose monitoring data, and, for necessary general-purpose monitoring data, decides the coping method to apply when communication data corresponding to that data is found.
- the setting unit 12 extracts general-purpose monitoring data with a label other than "unnecessary" (hereinafter referred to as "labeled individual monitoring data") from the labeled general-purpose monitoring data group X, and sets each piece of labeled individual monitoring data in the communication security monitoring device 20 (S102). At this time, the setting unit 12 identifies, from the label given to each piece of labeled individual monitoring data, the coping method to apply when communication data corresponding to that data is found, and sets the coping method in the communication security monitoring device 20.
- the learning unit 11 trains the labeling model 13 using the labeled general-purpose monitoring data group X as training data (S103). As a result, a trained labeling model 13a is generated.
- FIG. 7 is a diagram for explaining operation using the trained labeling model 13a and re-learning of the trained labeling model 13a. Re-learning is performed in parallel with operation using the trained labeling model 13a.
- FIG. 7 shows the operation and the re-learning of the trained labeling model 13a when a new set of general-purpose monitoring data (hereinafter referred to as "general-purpose monitoring data group Y") is provided by a security vendor after the trained labeling model 13a has been generated.
- the general-purpose monitoring data group Y may include the general-purpose monitoring data group X or may contain only new data.
- each general-purpose monitoring data included in the general-purpose monitoring data group Y is input to the trained labeling model 13a (S201).
- the trained labeling model 13a outputs, for each piece of general-purpose monitoring data, a label for that data and a correct/incorrect determination result for the label.
- the determination result is either "correct" or "wrong".
- the labeled general-purpose monitoring data whose labels are determined to be "correct" is hereinafter referred to as the "certain labeled general-purpose monitoring data group Y (automatic)", and the labeled general-purpose monitoring data whose labels are determined to be "wrong" is referred to as the "uncertain labeled general-purpose monitoring data group Y".
- the learning unit 11 receives input of a correct label from the user for each piece of labeled general-purpose monitoring data in the uncertain labeled general-purpose monitoring data group Y, and corrects (replaces) the assigned label with the label input by the user (S202).
- as a result, the uncertain labeled general-purpose monitoring data group Y becomes the certain labeled general-purpose monitoring data group Y (manual). That is, because the user manually corrects the labels, labeled general-purpose monitoring data with correct labels is obtained.
- the setting unit 12 extracts general-purpose monitoring data with a label other than "unnecessary" (hereinafter referred to as "labeled individual monitoring data") and sets each piece of labeled individual monitoring data in the communication security monitoring device 20 (S203).
- the learning unit 11 also re-trains the labeling model 13 using the certain labeled general-purpose monitoring data group Y (automatic) and the certain labeled general-purpose monitoring data group Y (manual) as training data (S204). As a result, a re-trained labeling model 13b is generated.
- in this way, the trained labeling model 13a can be re-trained using the certain labeled general-purpose monitoring data group Y as training data.
- the more training data there is, the greater the learning effect and the higher the probability of obtaining correct results. Even if the general-purpose monitoring data group Y contains only new data, new training data is obtained, so re-learning with it can be expected to improve the performance of the labeling model 13.
- the re-trained labeling model 13b is then operated by the same procedure, and re-learning is performed again in the same way.
- FIG. 8 is a diagram for explaining the learning procedure of the labeling model 13.
- the (labeled) general-purpose monitoring data group Z in FIG. 8 is the (labeled) general-purpose monitoring data group X in the case of FIG. 5 (initial learning), and the (certain labeled) general-purpose monitoring data group Y (automatic or manual) in the case of re-learning.
- the learning unit 11 uses the labeled general-purpose monitoring data group Z to cause the classification estimation unit 131 to learn the correspondence between the general-purpose monitoring data and the labels (S301).
- the learning unit 11 inputs the general-purpose monitoring data group Z to the trained classification estimation unit 131 (S302).
- the classification estimation unit 131 outputs a list of the labels estimated for each piece of general-purpose monitoring data in the general-purpose monitoring data group Z (hereinafter referred to as "estimated label list") (S303).
- the classification estimation process observation unit 132 acquires data on the label estimation process for each piece of general-purpose monitoring data (S304), and outputs a feature vector for each (S305).
- the learning unit 11 compares, element by element (that is, for each pair of labels corresponding to the same general-purpose monitoring data), a list of the correct labels assigned to each piece of labeled general-purpose monitoring data in the labeled general-purpose monitoring data group Z (hereinafter referred to as "correct label list") with the estimated label list, and generates a list indicating whether each label in the estimated label list is correct or incorrect (hereinafter referred to as "correct/incorrect list") (S306).
- the correct/incorrect list is a list of 1's and 0's, such as "1011". A 0 indicates a correct label and a 1 indicates an incorrect label.
- the learning unit 11 causes the error determination unit 133 to learn the correspondence between the feature vector list and the correct/incorrect list (S307). The error determination unit 133 is thereby trained. The learning of the error determination unit 133 is also described in detail in Patent Document 2.
- the labeling model 13 can automatically label general-purpose monitoring data. As a result, it is possible to reduce the load of the work of setting the coping method for the monitoring data. Further, by re-learning the labeling model 13, the classification accuracy can be improved.
- the learning unit 11 is also an example of a relearning unit.
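Two of the bookkeeping steps described above can be sketched in Python: building the correct/incorrect list (S306) and splitting model output into the certain and uncertain groups before manual correction (S201 to S202). This is a minimal sketch, not the patent's implementation; the function names and the plain-list data layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def make_correctness_list(correct_labels: Sequence[str],
                          estimated_labels: Sequence[str]) -> List[int]:
    """S306: compare the two label lists element by element.

    Follows the convention stated in the text: 0 = correct label, 1 = incorrect label.
    """
    if len(correct_labels) != len(estimated_labels):
        raise ValueError("label lists must have the same length")
    return [0 if c == e else 1 for c, e in zip(correct_labels, estimated_labels)]


def split_by_determination(labeled: Sequence[Tuple[str, str]],
                           determinations: Sequence[str]):
    """S201-S202: separate model output by its correct/incorrect determination.

    labeled        -- (monitoring_data, estimated_label) pairs (hypothetical layout)
    determinations -- "correct" or "wrong" for each pair
    Returns (certain_automatic, uncertain); the uncertain items are the ones
    a user would relabel by hand.
    """
    certain, uncertain = [], []
    for item, verdict in zip(labeled, determinations):
        if verdict == "correct":
            certain.append(item)
        elif verdict == "wrong":
            uncertain.append(item)
        else:
            raise ValueError(f"unexpected determination: {verdict!r}")
    return certain, uncertain


correct = ["save", "block", "unnecessary", "notify"]
estimated = ["block", "block", "save", "save"]
print(make_correctness_list(correct, estimated))  # [1, 0, 1, 1]
```

The printed list corresponds to the "1011" form mentioned above. In the learning procedure, this list would be paired with the feature vector list to train the error determination unit 133.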
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
This labeling device comprises: a training unit which trains, on the basis of a first monitoring data group in which each of a plurality of pieces of monitoring data each showing characteristics of specific communication data is given a label indicating whether or not the monitoring data is necessary and indicating a method for coping with the specific communication data, a model that receives the monitoring data as an input and outputs the label corresponding to the monitoring data; and a setting unit which sets monitoring data with a label indicating that it is necessary among the first monitoring data group, to a monitoring device that monitors communication data on the basis of the monitoring data, whereby the load of setting work for a method for coping with the monitoring data is reduced.
Description
The present invention relates to a labeling device, a labeling method, and a program.
A communication security monitoring device such as an IDS (Intrusion Detection System) or an IPS (Intrusion Prevention System) is installed in a communication path, monitors communication data, and discovers and deals with malicious communication data (threat data), for example by saving, notifying, or blocking it (FIG. 1).
A list of threat data to be discovered (monitoring data) is set in the IDS/IPS, which discovers threat data by comparing communication data against the monitoring data. Each piece of monitoring data has a coping method set for the time of discovery (save, notify, block, etc.), and the IDS/IPS acts according to that setting.
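The comparison described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `MonitoringData` and `Packet` types, their fields, and the substring match on the payload are hypothetical simplifications, not the matching logic of any real IDS/IPS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MonitoringData:
    """One entry of the monitoring data list (hypothetical, simplified fields)."""
    protocol: str
    dst_port: int
    content: bytes   # byte pattern expected in the threat data
    action: str      # coping method: "save", "notify", or "block"


@dataclass(frozen=True)
class Packet:
    """A simplified stand-in for one unit of communication data."""
    protocol: str
    dst_port: int
    payload: bytes


def find_action(packet: Packet, rules: List[MonitoringData]) -> Optional[str]:
    """Compare the packet against every entry; return the first matching coping method."""
    for rule in rules:
        if (rule.protocol == packet.protocol
                and rule.dst_port == packet.dst_port
                and rule.content in packet.payload):
            return rule.action
    return None  # no monitoring data matched


rules = [MonitoringData("tcp", 80, b"evil", "block")]
print(find_action(Packet("tcp", 80, b"xx-evil-xx"), rules))  # block
```

Because every packet is checked against the list entry by entry in this sketch, the cost per packet grows with the number of entries, which illustrates why a very large monitoring data list can hurt communication performance.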
IDS/IPS monitoring data is provided by the security vendor that supplies the IDS/IPS. Because it aims at general applicability, the security vendor's monitoring data (hereinafter referred to as "general-purpose monitoring data") is exhaustive and enormous in number.
If a telecommunications carrier that deploys and operates an IDS/IPS uses the security vendor's general-purpose monitoring data as is, communication data is compared against an enormous number of pieces of general-purpose monitoring data, which degrades communication performance, for example by introducing communication delays.
Therefore, the telecommunications carrier selects only the monitoring data it needs (hereinafter referred to as "individual monitoring data") according to the conditions of its own communication system, and also sets the coping methods for its own communication system.
For a telecommunications carrier that deploys and operates an IDS/IPS, selecting individual monitoring data from the security vendor's enormous general-purpose monitoring data and setting individual coping methods requires a great deal of labor.
The present invention has been made in view of the above points, and its object is to reduce the workload of setting coping methods for monitoring data.
To solve the above problem, the labeling device includes: a learning unit that learns, based on a first monitoring data group in which each of a plurality of pieces of monitoring data, each indicating characteristics of specific communication data, is given a label indicating whether the monitoring data is necessary and how to cope with the specific communication data, a model that takes the monitoring data as input and outputs the label corresponding to the monitoring data; and a setting unit that sets, in a monitoring device that monitors communication data based on the monitoring data, the monitoring data in the first monitoring data group that is given a label indicating that it is necessary.
The workload of setting coping methods for monitoring data can be reduced.
Embodiments of the present invention will be described below with reference to the drawings. FIG. 2 is a diagram showing a hardware configuration example of the labeling device 10 according to the embodiment of the present invention. The labeling device 10 of FIG. 2 has a drive device 100, an auxiliary storage device 102, a memory device 103, a processor 104, an interface device 105, and the like, which are interconnected by a bus B.
A program that implements the processing in the labeling device 10 is provided by a recording medium 101 such as a CD-ROM. When the recording medium 101 storing the program is set in the drive device 100, the program is installed from the recording medium 101 to the auxiliary storage device 102 via the drive device 100. However, the program does not necessarily have to be installed from the recording medium 101 and may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program as well as necessary files and data.
The memory device 103 reads the program from the auxiliary storage device 102 and stores it when an instruction to start the program is received. The processor 104 is a CPU or a GPU (Graphics Processing Unit), or both a CPU and a GPU, and executes the functions of the labeling device 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.
FIG. 3 is a diagram showing a functional configuration example of the labeling device 10 according to the embodiment of the present invention. In FIG. 3, the labeling device 10 includes a learning unit 11, a setting unit 12, and a labeling model 13. Each of these units is realized by processing that one or more programs installed in the labeling device 10 cause the processor 104 to execute.
The labeling model 13 is a model that takes as input a set (hereinafter referred to as "general-purpose monitoring data group") of monitoring data provided by a security vendor (hereinafter referred to as "general-purpose monitoring data"), and outputs a classification result for the monitoring data together with a correct/incorrect determination result for that classification result. The classification result is output in the form of a label. A label is information indicating whether the general-purpose monitoring data to which it is attached is necessary and how to cope with it. The labeling model 13 assigns a label, as the classification result, to the input general-purpose monitoring data.
The learning unit 11 causes the labeling model 13 to learn the correspondence between each piece of general-purpose monitoring data in the general-purpose monitoring data group and the label for that data. In the present embodiment, initial learning and re-learning are performed as the learning of the labeling model 13. In the initial learning, the labels for the general-purpose monitoring data used as training data are assigned manually by the user.
The setting unit 12 determines whether monitoring data is necessary based on the label indicated by the classification result for that monitoring data, and sets the necessary monitoring data in the communication security monitoring device 20. The communication security monitoring device 20 is, for example, an IDS (Intrusion Detection System) or an IPS (Intrusion Prevention System). That is, the communication security monitoring device 20 monitors communication data based on the set monitoring data, detects (discovers) specific communication data corresponding to the monitoring data, and executes, on that specific communication data, processing according to the coping method indicated by the label assigned to the monitoring data.
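The selection performed by the setting unit 12 can be sketched as a simple filter. The snippet below is a hedged illustration: the label values ("unnecessary", "save", "notify", "block") are the ones given for the embodiment, while the function name and the `(monitoring_data, label)` tuple layout are assumptions introduced here.

```python
from typing import Hashable, List, Sequence, Tuple

# Label values named for the embodiment; "unnecessary" means the entry is dropped.
LABELS = ("unnecessary", "save", "notify", "block")


def select_individual_monitoring_data(
        labeled: Sequence[Tuple[Hashable, str]]) -> List[Tuple[Hashable, str]]:
    """Keep only entries whose label is not "unnecessary".

    Each kept pair carries the coping method (its label) that would be set
    in the monitoring device together with the monitoring data itself.
    """
    selected = []
    for data, label in labeled:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if label != "unnecessary":
            selected.append((data, label))
    return selected


labeled = [("sig-a", "unnecessary"), ("sig-b", "block"), ("sig-c", "notify")]
print(select_individual_monitoring_data(labeled))
# [('sig-b', 'block'), ('sig-c', 'notify')]
```

How the selected entries are actually written into an IDS/IPS depends on the product and is outside the scope of this sketch.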
FIG. 4 is a diagram showing a functional configuration example of the labeling model 13. As shown in FIG. 4, the labeling model 13 includes three models: a classification estimation unit 131, a classification estimation process observation unit 132, and an error determination unit 133. Each of these units may be the same as the functional unit of the same name described in Patent Document 2.
具体的には、分類推定部131は、入力された汎用監視データのラベルを推定し、当該ラベルを分類結果として出力する。分類推定部131は、例えば、SVM、ニューラルネットワーク、ベイジアンネットワーク、決定木などの人工知能関連の技術を用いて実現できる。
Specifically, the classification estimation unit 131 estimates the label of the input general-purpose monitoring data and outputs the label as the classification result. The classification estimator 131 can be implemented using artificial intelligence-related technology such as SVM, neural network, Bayesian network, and decision tree, for example.
分類推定過程観測部132は、分類推定部131が汎用監視データのラベルを推定する際の計算過程(推定過程)を観測して、当該推定過程のデータを取得し、当該データを特徴ベクトルへ変換し、当該特徴ベクトルを誤り判定部133へ出力する。
The classification estimation process observation unit 132 observes the calculation process (estimation process) when the classification estimation unit 131 estimates the label of the general-purpose monitoring data, acquires data in the estimation process, and converts the data into a feature vector. and outputs the feature vector to error determination section 133 .
例えば、分類推定部131がニューラルネットワークを用いてラベルを推定する場合、分類推定過程観測部132は、ニューラルネットワークの各中間層と出力層の各ノード(活性化関数)から出力される値を特徴ベクトルとして出力してもよい。例えば、中間層の各ノードの値が0.5,0.4,0.7であり、出力層の各ノードの値が0.2,0.7,0.1である場合、特徴ベクトルは[0.5 0.4 0.7 0.2 0.7 0.1]と構成することができる。
For example, when the classification estimation unit 131 estimates a label using a neural network, the classification estimation process observation unit 132 uses values output from each node (activation function) of each intermediate layer and output layer of the neural network as features. May be output as a vector. For example, if the values of each node in the hidden layer are 0.5, 0.4, 0.7 and the values of each node in the output layer are 0.2, 0.7, 0.1, the feature vector is It can be configured as [0.5 0.4 0.7 0.2 0.7 0.1].
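The neural-network case above can be illustrated with a minimal sketch. The function name `build_feature_vector` and the list-based representation of layer outputs are assumptions made for illustration; the embodiment only states that the intermediate-layer and output-layer node values are combined into one vector.

```python
def build_feature_vector(intermediate_layers, output_layer):
    """Concatenate node outputs of every intermediate layer and the output layer.

    intermediate_layers: list of lists, one inner list per intermediate layer.
    output_layer: list of output-node values.
    """
    vector = []
    for layer_values in intermediate_layers:
        vector.extend(layer_values)
    vector.extend(output_layer)
    return vector


# Values taken from the example in the text (one intermediate layer).
feature_vector = build_feature_vector([[0.5, 0.4, 0.7]], [0.2, 0.7, 0.1])
print(feature_vector)  # [0.5, 0.4, 0.7, 0.2, 0.7, 0.1]
```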
Alternatively, when the classification estimation unit 131 estimates labels using a decision tree, the classification estimation process observation unit 132 observes the route by which the classification is decided and constructs the feature vector from it. For example, when a label is estimated via the route node 1 -> node 3 -> node 6, the classification estimation process observation unit 132 may output [1 0 1 0 0 1 0 0 0], which represents that route, as the feature vector. In this example, the index of each vector element corresponds to a node number of the decision tree: an element is set to 1 if the corresponding node was passed through and to 0 if it was not.
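The decision-tree encoding can be sketched as follows. The function name `route_to_feature_vector` and the assumption that the tree has nine nodes numbered from 1 are illustrative; the nine-element vector in the text suggests that node count but the embodiment does not state it.

```python
def route_to_feature_vector(visited_nodes, num_nodes):
    """Return a 0/1 vector whose i-th element is 1 if node i was passed through.

    Node numbers are assumed to be 1-based, as in the example route.
    """
    vector = [0] * num_nodes
    for node in visited_nodes:
        vector[node - 1] = 1
    return vector


# Route node 1 -> node 3 -> node 6 in a tree assumed to have 9 nodes.
route_vector = route_to_feature_vector([1, 3, 6], num_nodes=9)
print(route_vector)  # [1, 0, 1, 0, 0, 1, 0, 0, 0]
```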
Other examples of feature vectors are as disclosed in Patent Document 2.
The error determination unit 133 receives the feature vector from the classification estimation process observation unit 132 and determines, based on the feature vector, whether the label estimated by the classification estimation unit 131 is "correct" or "wrong".
The method of configuring the error determination unit 133 is not limited to any specific method. For example, the error determination unit 133 can determine whether the label estimated by the classification estimation unit 131 is "correct" or "wrong" by checking whether a specific value in the feature vector (in particular, a value of the output layer of a neural network or the number of votes of a random forest) exceeds a threshold.
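A threshold-based determination of this kind might look like the sketch below. Using the largest output-layer value as the "specific value", the default threshold of 0.6, and the function name `judge_label` are all assumptions for illustration; the embodiment does not fix which value is compared or what the threshold is.

```python
def judge_label(output_layer_values, threshold=0.6):
    """Judge the estimated label "correct" when the top output value exceeds the threshold.

    The threshold value is a hypothetical choice, not taken from the source.
    """
    top_value = max(output_layer_values)
    return "correct" if top_value > threshold else "wrong"


confident = judge_label([0.2, 0.7, 0.1])    # top value 0.7 exceeds 0.6
ambiguous = judge_label([0.4, 0.35, 0.25])  # top value 0.4 does not exceed 0.6
```

With a random forest, the same comparison could be applied to the number of votes for the winning label instead of an output-layer value.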
The error determination unit 133 may also be configured with a model commonly used in the field of machine learning, for example an SVM or a neural network. When such a model is used, the error determination unit 133 can be implemented by tuning the model parameters through supervised learning.
FIG. 5 is a diagram for explaining the initial learning of the labeling model 13.
At the time of initial learning, no training data exists for training the labeling model 13. The learning unit 11 therefore accepts label input from the user for all or part of a set of general-purpose monitoring data provided by a security vendor during a predetermined period up to the time of initial learning (hereinafter referred to as "general-purpose monitoring data group X"), and generates, as training data, a labeled general-purpose monitoring data group X to which the input labels have been assigned (S101). When labels are assigned to only part of the general-purpose monitoring data, that part may be selected at random, for example.
FIG. 6 is a diagram showing a configuration example of labeled general-purpose monitoring data. Each row in FIG. 6 represents one piece of labeled general-purpose monitoring data, which consists of general-purpose monitoring data and a label. General-purpose monitoring data is data indicating the characteristics of specific (for example, malicious) communication data (threat data) and includes, for example, a protocol, a source address, a source port, a destination address, a destination port, and communication content data.
In the present embodiment, the value of a label is "unnecessary", "save", "notify", or "block".
"Unnecessary" indicates that the general-purpose monitoring data given this label is unnecessary for the telecommunications carrier.
"Save", "notify", and "block" are labels given to general-purpose monitoring data that the telecommunications carrier needs. In other words, each of these labels indicates both that the labeled general-purpose monitoring data is necessary for the carrier and how to deal with communication data matching that monitoring data when it is discovered.
"Save" indicates that the communication data is to be saved. "Notify" indicates that the telecommunications carrier is to be notified that the communication data was detected. "Block" indicates that the communication data is to be blocked.
For example, the telecommunications carrier decides whether each piece of general-purpose monitoring data in the general-purpose monitoring data group X is necessary, based on other information related to that data (such as news reports of cyberattacks or information on problems inside the carrier). The carrier assigns "unnecessary" to unnecessary general-purpose monitoring data and, for necessary general-purpose monitoring data, decides how to deal with matching communication data when it is discovered.
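The record layout of FIG. 6 and the four label values could be modelled as in the following sketch. The class and field names, the field types, and the sample values (which use documentation-only IP address ranges) are assumptions for illustration; the source lists only the kinds of fields and the label values.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The four label values used in the embodiment."""
    UNNECESSARY = "unnecessary"
    SAVE = "save"
    NOTIFY = "notify"
    BLOCK = "block"


@dataclass(frozen=True)
class MonitoringData:
    """Characteristics of specific communication data (field names are hypothetical)."""
    protocol: str
    source_address: str
    source_port: int
    destination_address: str
    destination_port: int
    content: str


@dataclass(frozen=True)
class LabeledMonitoringData:
    """One row of FIG. 6: monitoring data plus its label."""
    data: MonitoringData
    label: Label

    @property
    def is_necessary(self) -> bool:
        # Every label other than "unnecessary" marks the data as needed.
        return self.label is not Label.UNNECESSARY


# Hypothetical example row labeled by the user during initial learning.
sample = MonitoringData("tcp", "192.0.2.10", 49152, "198.51.100.5", 80, "GET /example")
row = LabeledMonitoringData(sample, Label.BLOCK)
```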
The setting unit 12 extracts, from the labeled general-purpose monitoring data group X, the general-purpose monitoring data given a label other than "unnecessary" (hereinafter referred to as "labeled individual monitoring data") and sets each piece of labeled individual monitoring data in the communication security monitoring device 20 (S102). At this time, the setting unit 12 identifies, based on the label given to each piece of labeled individual monitoring data, the coping method to be used when communication data matching that data is discovered, and sets the coping method in the communication security monitoring device 20.
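Step S102 can be sketched as a filter followed by a label-to-action lookup. The function names, the dictionary layout of a device setting, and the placeholder strings standing in for monitoring data are assumptions; how settings are actually written to an IDS or IPS is not described in the source.

```python
# Coping method for each label that marks monitoring data as necessary.
COPING_METHODS = {
    "save": "save the matching communication data",
    "notify": "notify the carrier of the detection",
    "block": "block the matching communication data",
}


def extract_individual_monitoring_data(labeled_group):
    """Keep only (data, label) pairs whose label is not "unnecessary"."""
    return [(data, label) for data, label in labeled_group if label != "unnecessary"]


def build_device_settings(labeled_group):
    """Pair each piece of labeled individual monitoring data with its coping method."""
    return [
        {"monitoring_data": data, "coping_method": COPING_METHODS[label]}
        for data, label in extract_individual_monitoring_data(labeled_group)
    ]


# Hypothetical labeled group X; the strings stand in for real monitoring data.
group_x = [
    ("signature-1", "block"),
    ("signature-2", "unnecessary"),
    ("signature-3", "save"),
]
settings = build_device_settings(group_x)
```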
Meanwhile, the learning unit 11 trains the labeling model 13 using the labeled general-purpose monitoring data group X as training data (S103). As a result, a trained labeling model 13a is generated.
Next, operation using the trained labeling model 13a and re-learning of the trained labeling model 13a will be described.
FIG. 7 is a diagram for explaining operation using the trained labeling model 13a and re-learning of the trained labeling model 13a. Re-learning is performed in parallel with operation using the trained labeling model 13a.
FIG. 7 shows an example of operation and of re-learning of the trained labeling model 13a when, after the trained labeling model 13a has been generated, a new set of general-purpose monitoring data (hereinafter referred to as "general-purpose monitoring data group Y") is provided by the security vendor. The general-purpose monitoring data group Y may either include the general-purpose monitoring data group X or consist only of new data.
First, each piece of general-purpose monitoring data included in the general-purpose monitoring data group Y is input to the trained labeling model 13a (S201). For each piece of general-purpose monitoring data, the trained labeling model 13a outputs a label for that data and a correct/wrong determination result for the label. The determination result is either "correct" or "wrong". Hereinafter, the general-purpose monitoring data group whose labels were determined to be "correct" is referred to as the "certain labeled general-purpose monitoring data group Y (automatic)", and the group whose labels were determined to be "wrong" is referred to as the "uncertain labeled general-purpose monitoring data group Y". The "(automatic)" in "certain labeled general-purpose monitoring data group Y (automatic)" is identification information used for convenience to distinguish this group from the certain labeled general-purpose monitoring data group Y (manual), described later, which is generated by the user's manual work.
For each piece of labeled general-purpose monitoring data included in the uncertain labeled general-purpose monitoring data group Y, the learning unit 11 accepts input of the correct label from the user and corrects (replaces) the assigned label with the label input by the user (S202). As a result, the uncertain labeled general-purpose monitoring data group Y becomes the certain labeled general-purpose monitoring data group Y (manual). That is, because the labels are corrected manually by the user, labeled general-purpose monitoring data with correct labels is generated.
The setting unit 12 extracts, from the certain labeled general-purpose monitoring data group Y (automatic) and the certain labeled general-purpose monitoring data group Y (manual), the general-purpose monitoring data given a label other than "unnecessary" (hereinafter referred to as "labeled individual monitoring data") and sets each piece of labeled individual monitoring data in the communication security monitoring device 20 (S203).
The learning unit 11 also retrains the labeling model 13 using the certain labeled general-purpose monitoring data group Y (automatic) and the certain labeled general-purpose monitoring data group Y (manual) as training data (S204). As a result, a relearned labeling model 13b is generated.
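Steps S201 to S204 can be sketched as a split, a manual correction, and a merge. The triple format `(data, label, verdict)`, the function names, and the `ask_user` callback standing in for the user's input are assumptions for illustration; the model call and the retraining itself are outside this sketch.

```python
def split_by_verdict(model_outputs):
    """Split (data, label, verdict) triples into certain (automatic) and uncertain groups."""
    certain_auto, uncertain = [], []
    for data, label, verdict in model_outputs:
        if verdict == "correct":
            certain_auto.append((data, label))
        else:
            uncertain.append((data, label))
    return certain_auto, uncertain


def correct_manually(uncertain, ask_user):
    """Replace each uncertain label with the label returned by ask_user (S202)."""
    return [(data, ask_user(data, label)) for data, label in uncertain]


# Hypothetical outputs of the trained labeling model for group Y (S201).
outputs = [
    ("signature-1", "block", "correct"),
    ("signature-2", "save", "wrong"),
    ("signature-3", "unnecessary", "correct"),
]
certain_auto, uncertain = split_by_verdict(outputs)

# Stand-in for the user: in this example the user always answers "notify".
certain_manual = correct_manually(uncertain, lambda data, label: "notify")

# S203: data to set in the monitoring device. S204: data used for re-learning.
all_certain = certain_auto + certain_manual
to_device = [(data, label) for data, label in all_certain if label != "unnecessary"]
retraining_data = all_certain
```

Only the uncertain group requires user attention, which is where the reduction in manual labeling effort comes from.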
When the general-purpose monitoring data group Y includes the general-purpose monitoring data group X, the trained labeling model 13a can be retrained using the certain labeled general-purpose monitoring data group Y based on them as training data. The more training data there is, the greater the learning effect and the higher the likelihood of obtaining correct results. Even when the general-purpose monitoring data group Y consists only of new data, new training data is obtained, and re-learning with this new training data can be expected to improve the performance of the labeling model 13.
Thereafter, when a new general-purpose monitoring data group is provided by the security vendor, operation is carried out using the relearned labeling model 13b by the same procedure as in FIG. 7, and re-learning is performed on the relearned labeling model 13b.
Next, the details of step S103 in FIG. 5 and step S204 in FIG. 7 will be described. FIG. 8 is a diagram for explaining the learning procedure of the labeling model 13. The (labeled) general-purpose monitoring data group Z in FIG. 8 is the (labeled) general-purpose monitoring data group X in the case of FIG. 5, and the (certain labeled) general-purpose monitoring data group Y (automatic or manual) in the case of FIG. 7.
First, the learning unit 11 uses the labeled general-purpose monitoring data group Z to cause the classification estimation unit 131 to learn the correspondence between general-purpose monitoring data and labels (S301).
Next, the learning unit 11 inputs the general-purpose monitoring data group Z to the trained classification estimation unit 131 (S302). The classification estimation unit 131 outputs a list of the labels estimated for each piece of general-purpose monitoring data in the general-purpose monitoring data group Z (hereinafter referred to as the "estimated label list") (S303). At this time, the classification estimation process observation unit 132 acquires, for each piece of general-purpose monitoring data, data on the label estimation process (S304) and outputs a feature vector for each such piece of data (S305).
Next, the learning unit 11 compares the list of correct labels assigned to the pieces of labeled general-purpose monitoring data in the labeled general-purpose monitoring data group Z (hereinafter referred to as the "correct label list") with the estimated label list, element by element (that is, label by label for the same general-purpose monitoring data), and generates a list indicating whether each label in the estimated label list is correct or wrong (hereinafter referred to as the "correct/incorrect list") (S306). The correct/incorrect list is a list of 1s and 0s, for example "1011...", in which 0 indicates a correct label and 1 indicates a wrong label.
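Step S306 amounts to an element-wise comparison of two lists. In the sketch below the function name and the sample labels are assumptions; the 0/1 convention (0 for a correct label, 1 for a wrong one) follows the text.

```python
def build_correctness_list(correct_labels, estimated_labels):
    """Return 0 where the estimated label matches the correct label and 1 where it does not."""
    if len(correct_labels) != len(estimated_labels):
        raise ValueError("both lists must cover the same monitoring data")
    return [0 if correct == estimated else 1
            for correct, estimated in zip(correct_labels, estimated_labels)]


# Hypothetical lists for four pieces of monitoring data.
correct_label_list = ["block", "save", "notify", "unnecessary"]
estimated_label_list = ["save", "save", "block", "notify"]
correctness_list = build_correctness_list(correct_label_list, estimated_label_list)
print(correctness_list)  # [1, 0, 1, 1]
```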
Next, the learning unit 11 causes the error determination unit 133 to learn the correspondence between the list of feature vectors and the correct/incorrect list (S307). As a result, the error determination unit 133 becomes trained. The learning of the error determination unit 133 is also described in detail in Patent Document 2.
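As a rough illustration of step S307, the sketch below fits a single confidence threshold to the feature vectors and the correct/incorrect list. This deliberately substitutes a one-parameter threshold rule for the SVM or neural network mentioned in the embodiment, so it is not the method of the source or of Patent Document 2. It also assumes that the output-layer values are the last `num_outputs` entries of each feature vector; all names and sample values are hypothetical.

```python
def confidence(feature_vector, num_outputs):
    """Largest output-layer value, assuming the outputs are the last entries."""
    return max(feature_vector[-num_outputs:])


def fit_threshold(feature_vectors, correctness_list, num_outputs):
    """Pick the threshold that best reproduces the correct/incorrect list.

    A label is predicted wrong (1) when its confidence does not exceed the
    threshold and correct (0) otherwise, matching the 0/1 convention of the list.
    """
    scores = [confidence(vector, num_outputs) for vector in feature_vectors]
    best_threshold, best_errors = None, None
    for candidate in sorted(set(scores)):
        predicted = [1 if score <= candidate else 0 for score in scores]
        errors = sum(p != t for p, t in zip(predicted, correctness_list))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def judge(feature_vector, threshold, num_outputs):
    """Return "correct" when the confidence exceeds the threshold, else "wrong"."""
    return "correct" if confidence(feature_vector, num_outputs) > threshold else "wrong"


# Hypothetical feature vectors (3 intermediate values + 3 output values each).
vectors = [
    [0.5, 0.4, 0.7, 0.2, 0.7, 0.1],
    [0.1, 0.9, 0.3, 0.4, 0.35, 0.25],
    [0.6, 0.2, 0.8, 0.05, 0.05, 0.9],
    [0.3, 0.3, 0.3, 0.34, 0.33, 0.33],
]
correctness = [0, 1, 0, 1]  # 0 = correct label, 1 = wrong label
threshold = fit_threshold(vectors, correctness, num_outputs=3)
```

A real implementation would more likely train a general-purpose classifier on the full feature vectors, as the embodiment suggests; the threshold rule is used here only to keep the example dependency-free.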
As described above, according to the present embodiment, the labeling model 13 makes it possible to label general-purpose monitoring data automatically. As a result, the workload of setting coping methods for monitoring data can be reduced. In addition, re-learning the labeling model 13 can improve classification accuracy.
In the present embodiment, the learning unit 11 is also an example of a re-learning unit.
Although an embodiment of the present invention has been described in detail above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.
10 Labeling device
11 Learning unit
12 Setting unit
13 Labeling model
13a Trained labeling model
13b Relearned labeling model
20 Communication security monitoring device
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 Processor
105 Interface device
131 Classification estimation unit
132 Classification estimation process observation unit
133 Error determination unit
B Bus
Claims (7)
- A labeling device comprising:
  a learning unit that, based on a first monitoring data group in which each of a plurality of pieces of monitoring data, each indicating characteristics of specific communication data, is given a label indicating whether the monitoring data is necessary and how to deal with the specific communication data, trains a model that receives the monitoring data as input and outputs the label corresponding to the monitoring data; and
  a setting unit that sets, from the first monitoring data group, the monitoring data given a label indicating that it is necessary in a monitoring device that monitors communication data based on the monitoring data.
- The labeling device according to claim 1, wherein the learning unit trains the model to receive the monitoring data as input and to output the label corresponding to the monitoring data together with a correct/incorrect determination result for the label, the labeling device further comprising:
  a re-learning unit that retrains the model based on first labeled monitoring data, in which, among the monitoring data included in a second monitoring data group, monitoring data for which the model output a first label that it determined to be correct is given that first label, and second labeled monitoring data, in which monitoring data for which the model output a label that it determined to be wrong is given a second label input by a user.
- The labeling device according to claim 2, wherein the setting unit sets the first labeled monitoring data and the second labeled monitoring data in the monitoring device.
- A labeling method in which a computer executes:
  a learning procedure of training, based on a first monitoring data group in which each of a plurality of pieces of monitoring data, each indicating characteristics of specific communication data, is given a label indicating whether the monitoring data is necessary and how to deal with the specific communication data, a model that receives the monitoring data as input and outputs the label corresponding to the monitoring data; and
  a setting procedure of setting, from the first monitoring data group, the monitoring data given a label indicating that it is necessary in a monitoring device that monitors communication data based on the monitoring data.
- The labeling method according to claim 4, wherein the learning procedure trains the model to receive the monitoring data as input and to output the label corresponding to the monitoring data together with a correct/incorrect determination result for the label, and wherein the computer further executes:
  a re-learning procedure of retraining the model based on first labeled monitoring data, in which, among the monitoring data included in a second monitoring data group, monitoring data for which the model output a first label that it determined to be correct is given that first label, and second labeled monitoring data, in which monitoring data for which the model output a label that it determined to be wrong is given a second label input by a user.
- The labeling method according to claim 5, wherein the setting procedure sets the first labeled monitoring data and the second labeled monitoring data in the monitoring device.
- A program that causes a computer to execute the labeling method according to any one of claims 4 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023514284A JP7544260B2 (en) | 2021-04-15 | 2021-04-15 | Labeling device, labeling method, and program |
PCT/JP2021/015632 WO2022219787A1 (en) | 2021-04-15 | 2021-04-15 | Labeling device, labeling method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/015632 WO2022219787A1 (en) | 2021-04-15 | 2021-04-15 | Labeling device, labeling method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022219787A1 true WO2022219787A1 (en) | 2022-10-20 |
Family
ID=83640261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/015632 WO2022219787A1 (en) | 2021-04-15 | 2021-04-15 | Labeling device, labeling method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7544260B2 (en) |
WO (1) | WO2022219787A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018077607A (en) * | 2016-11-08 | 2018-05-17 | 株式会社日立システムズ | Security rule evaluation device and security rule evaluation system |
WO2020031960A1 (en) * | 2018-08-06 | 2020-02-13 | 日本電信電話株式会社 | Error determination device, error determination method, and program |
JP2020149090A (en) * | 2019-03-11 | 2020-09-17 | 富士通株式会社 | Determination method, information processing device and determination program |
-
2021
- 2021-04-15 JP JP2023514284A patent/JP7544260B2/en active Active
- 2021-04-15 WO PCT/JP2021/015632 patent/WO2022219787A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JPWO2022219787A1 (en) | 2022-10-20 |
JP7544260B2 (en) | 2024-09-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21936983; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2023514284; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 21936983; Country of ref document: EP; Kind code of ref document: A1 |