WO2022064579A1

WO2022064579A1 - Classification device, classification method, and classification program

Info

Publication number: WO2022064579A1
Application number: PCT/JP2020/035873
Authority: WO
Inventors: 麿与山嵜
Original assignee: 日本電信電話株式会社
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2022-03-31

Abstract

A classification device (1) comprises a pseudo-teacher extraction unit (103), a learning unit (104), and a document classification unit (203). The pseudo-teacher extraction unit (103) creates pseudo-teacher data from threat documents that include descriptions relating to cyber threats, on the basis of a rule for creating pseudo-teacher data. The learning unit (104) learns a noise model and a classification model in parallel using the relationship between these models on the basis of the threat documents and the pseudo-teacher data created by the pseudo-teacher extraction unit (103). The document classification unit (203) classifies an input threat document to be classified, using the classification model generated from the learning by the learning unit (104).

Description

Classification device, classification method and classification program

The present invention relates to a classification device, a classification method, and a classification program.

In recent years, as a security measure method, research on technology that takes measures based on the behavior of an attacker has become active. In this technique, in order to accurately detect the behavior of an attacker, it is important to collect more patterns of the behavior of the attacker. Therefore, as a method of collecting various attacker behavior patterns, research on an extraction method for extracting attacker behavior from threat documents related to cyber threats has been actively conducted.

Various extraction methods have been proposed in research to extract the behavior of attackers represented by ATT & CK (registered trademark). For example, there are a rule-based extraction method using an ontology created by humans and an extraction rule, and an extraction method using supervised learning such as SVM (Support Vector Machine). When machine learning is performed on the behavior of an attacker, it is a multi-label task of assigning a plurality of labels to one data.

In the relationship extraction task in the field of natural language processing, a method called distinct supervision (learning with long-distance supervised learning) that generates labeled teacher data from unlabeled data using a knowledge base and extraction rules created manually. Exists. The distant supervision uses the facts that exist in the knowledge base that expresses the relationship between the actual conditions, and when there is a reference to two actual conditions in a certain sentence and the relationship between those actual conditions exists in the knowledge base. , It is a method of learning based on heuristics that assigns the relation label as pseudo teacher data.

By using distant supervision, although there are False Positive and False Negative data in the pseudo teacher data, it is possible to create large-scale teacher data without manual annotation. .. For distant supervision, a learning method using pseudo teacher data created by different heuristics is being studied not only in the relationship extraction task but also in series labeling and document classification.

Furthermore, in distinct supervision, a method for reducing noise existing in pseudo teacher data is proposed. In the relationship extraction task using deep learning, a method is taken to deal with False Positive by relaxing the hypothesis that assumes reference to the relationship for all sentences including two actual conditions, and multiple methods are used for one data. It is also applicable to multi-label tasks that give the label of.

In addition, noise reduction methods have been proposed in the field of image processing as well, and noise model networks for reducing noise by False Positive and False Negative in multi-label tasks have been proposed.

However, the conventional method of extracting the behavior of an attacker has the following problems. For example, in the rule-based extraction method, since a knowledge base created manually is used, the cost is high, and it may be difficult to secure a sufficient number of rules in terms of processing load and cost to the creator. be. Therefore, in the rule-based extraction method, the accuracy of behavior extraction may decrease.

On the other hand, the extraction method using supervised learning has a problem that the available teacher data is scarce. For example, it is conceivable to use a Web page hyperlinked from the ATT & CT knowledge base as teacher data. However, with this method, the amount of data obtained as teacher data is small. In addition, since the data obtained by this method is not the data constructed as teacher data, there is a possibility that a large amount of False Negative data to which the label that should be originally assigned is not attached may be included. Therefore, even with this method, the accuracy of behavior extraction may decrease.

In addition, in the conventional attacker behavior extraction method, when learning a model from teacher data including noise, the influence of noise is not considered in the learning model, so model parameters are updated with incorrect teacher data. This is done and may cause a decrease in accuracy. For these reasons, it is difficult to improve cyber security capabilities with conventional attacker behavior extraction methods.

According to the distinct supervision, it is possible to deal with the shortage of teacher data, but in the conventional technology, the use of the distinct supervision in the task of extracting the behavior of the attacker has not been examined. Therefore, with the conventional technology, it is difficult to deal with the shortage of teacher data, and it is difficult to improve the cyber security capability.

Furthermore, in the distinct supervision, measures are taken to deal with False Positives in order to reduce noise, but measures to deal with both False Positives and False Negatives are not taken into consideration. Further, as a method for dealing with False Positive and False Negative, a document classification method using a noise model layer has been proposed, but it does not target multi-label tasks. Therefore, even with these techniques, it is difficult to reduce noise in extracting the behavior of an attacker, and it is difficult to improve cyber security capability.

The present invention has been made in view of the above, and an object thereof is to improve cyber security capability.

In order to solve the above-mentioned problems and achieve the purpose, the pseudo-teacher creation department creates pseudo-teacher data from a threat document that has a description about cyber threats based on the pseudo-teacher data creation rules. The learning unit learns the noise model and the classification model in parallel using the mutual relationship based on the threat document and the pseudo-teacher data created by the pseudo-teacher creation unit. The document classification unit classifies the input classification target threat document using the classification model generated by the learning by the learning unit.

According to the present invention, the cyber security capability can be improved.

FIG. 1 is a block diagram of a classification device according to the first embodiment. FIG. 2 is a block diagram showing details of the pseudo-teacher extraction unit. FIG. 3 is a diagram showing an example of a rule for creating pseudo teacher data. FIG. 4 is a block diagram showing details of the learning unit. FIG. 5 is a flowchart of the learning process. FIG. 6 is a flowchart of a pseudo-teacher data generation process by the pseudo-teacher extraction unit. FIG. 7 is a flowchart of a classification model generation process by learning using pseudo teacher data in consideration of noise by the learning unit. FIG. 8 is a flowchart of the threat document classification process by the classification device. FIG. 9 is a diagram showing an example of a computer that executes a classification program.

Hereinafter, one embodiment of the classification device, the classification method, and the classification program disclosed in the present application will be described in detail based on the drawings. The following embodiments do not limit the classification device, classification method, and classification program disclosed in the present application.

[Structure of classification device]
The configuration of the classification device will be described with reference to FIG. FIG. 1 is a block diagram of a classification device. The classification device 1 is an information processing device such as a server. The classification device 1 is a device that classifies the behavior of an attacker with respect to an information processing system. As shown in FIG. 1, the classification device 1 has a model learning unit 10 and an inference unit 20.

The model learning unit 10 performs machine learning to create a classification model that classifies the behavior of an attacker. The model learning unit 10 has a document collection unit 101, a threat document extraction unit 102, a pseudo-teacher extraction unit 103, and a learning unit 104.

The document collection unit 101 collects a set of documents from WEB2 and the like. Then, the document collection unit 101 outputs the collected document set to the threat document extraction unit 102.

The threat document extraction unit 102 receives the input of the document set from the document collection unit 101. Next, the threat document extraction unit 102 extracts threat documents related to cyber threats from the document set. For example, the threat document extraction unit 102 classifies threat documents and other documents by using the structure, rules, statistical methods, and the like of the WEB site from which each document included in the document set is acquired. After that, the threat document extraction unit 102 outputs the extracted threat document to the pseudo-teacher extraction unit 103.

The pseudo-teacher extraction unit 103 receives the input of the threat document extracted from the document set from the threat document extraction unit 102. Next, the pseudo-teacher extraction unit 103 generates pseudo-teacher data from the threat document. After that, the pseudo-teacher extraction unit 103 outputs the threat document to the learning unit 104 together with the generated pseudo-teacher data.

Here, with reference to FIG. 2, the function of the pseudo-teacher extraction unit 103 will be described in detail. FIG. 2 is a block diagram showing details of the pseudo-teacher extraction unit. As shown in FIG. 2, the pseudo-teacher extraction unit 103 includes a sentence division unit 131, a token division unit 132, a rule conforming unit 133, a synthesis rule conforming unit 134, a convergence test unit 135, an external knowledge reference unit 136, and a pseudo-teacher generation unit. It has 137.

The sentence division unit 131 acquires the threat document input from the threat document extraction unit 102. Then, the sentence division unit 131 divides the threat document into sentence units. After that, the sentence division unit 131 outputs each sentence obtained by dividing the threat document to the token division unit 132 and the rule conforming unit 133.

The token division unit 132 receives the input of each sentence obtained by dividing the threat document from the sentence division unit 131. Then, the token dividing unit 132 divides the acquired sentence into token units. A token is a character string consisting of one or a plurality of characters in a character sequence in a threat document, and is a character string that is the smallest unit that makes sense. After that, the token dividing unit 132 outputs each token obtained by dividing each sentence included in the threat document to the rule conforming unit 133.

The rule conforming unit 133 receives the input of each sentence obtained by dividing the threat document from the sentence dividing unit 131. Further, the rule conforming unit 133 receives the input of each token obtained by dividing each sentence included in the threat document from the token dividing unit 132.

Here, the threat document that is the source of the sentence and token acquired by the rule conforming unit 133 is included in the document group by using the structure, rules, statistical methods, etc. of the WEB site from which each document is acquired, as described above. Specified from. Therefore, it is possible to create rules based on threat documents. Therefore, the rule conforming unit 133 has a rule for creating pseudo teacher data created on the premise of a threat document.

For example, the rule conforming unit 133 has external knowledge such as token string matching, regular expression matching, matching using a combination of syntactic structure and token string matching and regular expression matching, and external knowledge such as NVD (National Vulnerability Database) and tokens. It has column matching, regular expression matching, and matching using a combination with a syntactic structure as rules for creating teacher data. However, the rules for creating teacher data are not limited to these, and the rule conforming unit 133 may use other rules as long as the rules are created on the premise of a threat document.

FIG. 3 is a diagram showing an example of a rule for creating pseudo teacher data. Each rule in FIG. 3 is a rule that when the threat document matches the condition on the left side of the arrow, the description of the attacker's behavior on the right side of the arrow exists in the threat document. That is, when a certain threat document is used as input data, the behavior of the attacker described in the threat document that is the output day is specified by using a rule for creating pseudo teacher data that satisfies the condition of the threat document. Can be done. Therefore, by using the rules for creating pseudo-teacher data for threat documents, it is possible to generate pseudo-teacher data including input data and output data.

In FIG. 3, the rule whose type is a phrase is an example of a creation rule corresponding to a match of token columns. A rule whose type is a regular expression is an example of a creation rule that matches a regular expression. A rule whose type is CVE (Common Vulnerabilities and Exposures) or CPE (Common Platform Enumeration) is a creation rule that corresponds to a combination of external knowledge such as NVD and a token string match, a regular expression match, and a syntax structure. This is just one example. The rule whose type is logical operation is a rule that, when the conditions on the left side of the arrow are met, all the creation rules are given to the sentence as pseudo teacher data of the behavior of the attacker on the right side. ..

The rule conforming unit 133 outputs an observation information acquisition request used for determining whether or not the teacher data creation rule is conforming to the external knowledge reference unit 136. The observation information used for conformity judgment using the rules for creating teacher data includes, for example, clues suggesting the behavior of each attacker such as SQL (Structured Query Language) injection, and the programs and files used to execute each behavior. It includes names, API (Application Programming Interface) and function names, command names, and external information associated with CVE (Common Vulnerabilities and Exposures). After that, the rule conforming unit 133 acquires the observation information used for the conformity determination using the teacher data creation rule from the external knowledge reference unit 136.

The rule conforming unit 133 uses the observation information acquired from the external knowledge reference unit 136 as the teacher data creation rule to determine whether or not the teacher data creation rule conforms to the sentence level and token level of the threat document. Here, the determination of conformity of the teacher data creation rule for a certain threat document is whether or not the attacker's behavior is required by using the teacher data creation rule to be determined for the threat document. It is to judge.

After that, the rule conforming unit 133 outputs the teacher data creation rule determined to be conformable to the synthesis rule conforming unit 134 together with the sentence and token included in the threat document targeted for the determination. The rule conforming unit 133 determines whether or not the teacher data creation rule conforms to all the threat documents corresponding to the acquired sentences and tokens, and includes the teacher data creation rule based on the determination in the corresponding threat document. It is output to the synthesis rule conforming unit 134 together with the sentence and the token.

The synthesis rule conforming unit 134 receives input of the teacher data creation rule conforming to each threat document and the sentences and tokens included in the corresponding threat document from the rule conforming unit 133. Next, the synthesis rule conforming unit 134 generates a synthesis rule for each threat document by a logical operation of a teacher data creation rule that can be conformed by the rule conforming unit 133.

Then, the synthesis rule conforming unit 134 outputs an acquisition request for observation information used for determining whether or not the generated synthesis rule is conforming to the external knowledge reference unit 136. The observation information used for determining conformance using this synthesis rule includes clues suggesting the behavior of each attacker, programs and files used to execute each behavior, as well as the information acquired by the rule conforming unit 133. Names, API and function names, command names, and external information associated with CVE are included. After that, the synthesis rule conforming unit 134 acquires observation information used for determining conformability using the synthesis rule from the external knowledge reference unit 136.

The synthesis rule conforming unit 134 uses the observation information acquired from the external knowledge reference unit 136 for the synthesis rule to determine whether or not the synthesis rule is conformable at the sentence level and the token level of the threat document. For example, the rule whose type is a logical operation in FIG. 3 is an example of a synthesis rule that determines conformability by a logical operation as a result of a plurality of rule conformances. The behavior starting with x does not directly correspond to the behavior of the attacker, but is information used to give the behavior of the attacker according to the synthesis rule. The behavior starting from x is, for example, information that gives a match of SQL injection in a rule whose type is a phrase or a predetermined behavior in a rule whose type is CVE.

When it is determined that the synthesis rule can be conformed, the synthesis rule conforming unit 134 outputs the synthesis rule determined to be conformable to the convergence test unit 135 together with the sentence and token included in the threat document targeted for the determination. Further, the synthesis rule conforming unit 134 outputs the creation rule of the teacher data conforming to each threat document acquired from the rule conforming unit 133 to the convergence test unit 135.

After that, when the convergence test unit 135, which will be described later, receives a request for determining whether or not the new composition rule is applicable, the composition rule conforming unit 134 determines whether or not the new composition rule is applicable to the threat document in which the new composition rule is generated. Make a judgment. After that, the synthesis rule conforming unit 134 outputs the synthesis rule determined to be conformable among the composition rules including the new composition rule to the convergence test unit 135 together with the sentence and the token included in the threat document targeted for the determination. ..

The external knowledge reference unit 136 refers to an external knowledge group by being connected to an external server (not shown) or the like in which knowledge is accumulated. The external knowledge reference unit 136 receives a request for acquisition of observation information from the rule conforming unit 133 and the synthesis rule conforming unit 134. Then, the external knowledge reference unit 136 acquires the observation information used for the designated pseudo-teacher data creation rule and the synthesis rule from the external knowledge group. Then, the external knowledge reference unit 136 outputs the acquired observation information to the rule conforming unit 133 or the synthesis rule conforming unit 134, which is the transmission source of the acquisition request.

The convergence test unit 135 receives the input of the composition rule conforming to each threat document and the sentences and tokens included in the corresponding threat document from the composition rule conforming unit 134. Here, a threat document conforming to a certain synthetic rule may result in a new synthetic rule that may be conformed. For example, a synthetic rule generated by a logical operation between a synthetic rule and an original teacher data creation rule may become a new synthetic rule. However, if the attacker's behavior contained in the threat document is not added when the new synthesis rule is used, the number of teacher data will increase even if the teacher data is generated using the new synthesis rule. do not have. Therefore, the convergence test unit 135 determines whether or not a new attacker's behavior is added to the behavior set after the synthesis rule is applied to the behavior set that collects the attacker's behavior included in the threat document.

When a new attacker's behavior is added, the convergence test unit 135 determines that the threat document may be compatible with the synthesis rule to which the new attacker's behavior is added. Then, the convergence test unit 135 requests the synthesis rule conforming unit 134 to determine whether or not the synthesis rule conforms to the behavior of a new attacker.

On the other hand, if no new behavior is added, the convergence test unit 135 determines that the conformity test of the synthesis rule has converged. After that, the convergence test unit 135 outputs the teacher data creation rule and the synthesis rule suitable for each threat document to the pseudo-teacher generation unit 137 together with the sentences and tokens included in the corresponding threat document.

The pseudo-teacher generation unit 137 receives input of teacher data creation rules and synthesis rules suitable for each threat document, as well as sentences and tokens included in the corresponding threat document from the convergence test unit 135. Then, the pseudo-teacher generation unit 137 generates pseudo-teacher data for the sentences and tokens included in the threat document by using the teacher data creation rule and the synthesis rule suitable for the threat document. After that, the pseudo-teacher generation unit 137 outputs the generated pseudo-teacher data to the learning unit 104. Further, the pseudo-teacher extraction unit 103 outputs a threat document to the learning unit 104 in addition to the generated pseudo-teacher data.

The learning unit 104 receives the input of the threat document and the pseudo-teacher data from the pseudo-teacher extraction unit 103. Here, since pseudo teacher data is used, noise exists in the teacher data. Therefore, the learning unit 104 generates a classification model by performing learning in consideration of noise using pseudo teacher data. After that, the learning unit 104 outputs the generated classification model to the document classification unit 203 of the inference unit 20.

Here, the function of the learning unit 104 will be described in detail with reference to FIG. FIG. 4 is a block diagram showing details of the learning unit. As shown in FIG. 4, the learning unit 104 has a document expression layer 141, a label classification layer 142, a token expression layer 143, an encode layer 144, a certainty layer 145, a noise distribution layer 146, and an observation label classification layer 147.

The document expression layer 141 acquires the threat document output from the pseudo-teacher extraction unit 103. Then, the document expression layer 141 acquires the characteristics of each threat document as a whole. The characteristics of the entire threat document are information that expresses the meaning of the threat document. The document representation layer 141 generates one vector for one threat document. After that, the document expression layer 141 outputs the characteristics of each threat document as a whole to the label classification layer 142. The document expression layer 141 can be realized by using, for example, BERT (Bidirectional Encoder Representations from Transformers). However, the method of realizing the document expression layer 141 is not limited to BERT, and an implementation with another configuration can be used.

The token expression layer 143 acquires the threat document output from the pseudo-teacher extraction unit 103. Then, the token expression layer 143 extracts the token included in each threat document. Here, the token expression layer 143 may acquire the token included in each threat document created by the pseudo-teacher extraction unit 103.

Next, the token expression layer 143 acquires the characteristics of each token included in each threat document. The characteristics of a token are information that expresses the meaning of the token. Here, since the length of each token is different, the characteristics of the tokens are also different. Since the token expression layer 143 generates one vector for each token, for one threat document, the vector of the number of tokens contained therein is generated. After that, the token expression layer 143 outputs the characteristics of each token to the encoding layer 144. The token expression layer 143 can also be realized by using BERT, for example, like the document expression layer 141. However, the method of realizing the token expression layer 143 is not limited to BERT, and implementation by other configurations can be used.

By arranging the document expression layer 141 and the token expression layer 143 in this way, the meaning of the entire sentence is taken into consideration when classifying labels, and the elements that affect errors in individual tokens are taken into consideration when modeling noise. can do.

The label classification layer 142 receives input of the overall characteristics of the threat document from the document expression layer 141. Then, the label classification layer 142 performs machine learning using a neural network with the overall characteristics of the threat document as an input, and predicts a true label corresponding to the behavior of the attacker given to the threat document. The label is information indicating what kind of attacker's behavior is included in each threat document. Then, when K labels are assigned, the label classification layer 142 generates a label probability which is a vector representing each probability including each of the K labels.

The encode layer 144 is a layer that converts a variable length token sequence into a fixed length. The encode layer 144 receives an input of a variable length vector representing the characteristics of each token from the token expression layer 143. Then, the encode layer 144 converts the characteristics of each acquired token into a fixed length. Next, the encode layer 144 integrates information representing the characteristics of the fixed-length token to generate one vector. Further, the encoding layer 144 converts the vector into a 4K-dimensional vector, where K is the number of labels given to the generated vector. After that, the encode layer 144 outputs one fixed-length and 4K-dimensional vector collectively representing the characteristics of each token to the certainty layer 145. The encode layer 144 can be realized by using, for example, an RNN (Recurrent Neural Network). However, the method of realizing the encode layer 144 is not limited to the RNN, and it is also possible to use an implementation with another configuration.

The certainty layer 145 is a convolution layer representing the certainty of conversion from the true value (0 or 1) of each label to the value (0 or 1) of the observation label which is a label containing noise. The certainty is the certainty of transition from each value of the true label to each value of the observed label for k labels when the number of labels is k. There are four transition methods because the true label takes two values and the observation label also takes two values. The confidence layer 145 has 4K kernels of size 1 × 1 and bias parameters.

The certainty layer 145 receives an input of one fixed-length and 4K-dimensional vector that collectively represents the characteristics of each token from the encode layer 144. Then, the conviction layer 145 executes a convolution operation using the kernel and the bias parameter held in the acquired vector, and generates a 2 × 2 × k tensor representing the conviction. In this classification device 1, since pseudo teacher data is used when learning, an observation label containing noise is generated. Therefore, it is necessary to determine how the true label behaves with respect to the obtained observation label. Therefore, by using the certainty that the new label indicated by the certainty transitions to each observation label, it is possible to estimate the true label from the obtained observation label. The certainty layer 145 outputs a 2 × 2 × k tensor representing the generated certainty to the noise distribution layer 146.

The noise distribution layer 146 receives an input of a certainty degree represented by a 2 × 2 × k tensor from the certainty degree layer 145. Then, the noise distribution layer 146 normalizes the acquired certainty with the Softmax function to acquire the noise distribution represented by the tensor of 2 × 2 × k. The noise distribution is a probability distribution that represents the transition probability that each of the k true labels transitions to each of the k observation labels in a threat document. After that, the noise distribution layer 146 outputs the noise distribution represented by the 2 × 2 × k tensor to the observation label classification layer 147.

The observation label classification layer 147 receives the input of the noise distribution created by the noise distribution layer 146 based on the output from the token expression layer 143 representing the characteristics of each token. Further, the observation label classification layer 147 obtains a label probability, which is a vector representing each probability that each true label is included for each threat document, from the label classification layer 142. Next, the observation label classification layer 147 calculates the weighted sum of the output of the noise distribution layer 146 to the output of the label classification layer 142 to predict the observation label including noise to be given to each threat document. Then, when K labels are assigned, the observation label classification layer 147 generates an observation label probability, which is a vector representing each probability that each K observation label is included in each threat document.

The document expression layer 141 and the label classification layer 142 in the learning unit 104 correspond to a classification model for calculating the appearance probability of the attacker's behavior included in the threat document and classifying the threat document. Further, the token expression layer 143, the encode layer 144, the certainty layer 145, the noise distribution layer 146, and the observation label classification layer 147 in the learning unit 104 are noise models used to reduce noise when learning the classification model.

Here, the actual learning by the learning unit 104 is carried out in the following flow. Since pseudo-teacher data is used, the label that can be obtained from the threat document is the observation label. Therefore, in order to properly learn the noise model in the observation label classification layer 147, the learning unit 104 adjusts the parameters of the label classification layer 142 and performs the first half learning to improve the prediction accuracy of the true label, and then the noise. The classification model is completed by learning considering noise using the model. After the learning of the first half is completed, the learning unit 104 shifts to the learning of the second half.

In the first half of the learning, the learning unit 104 uses the document expression layer 141 and the label classification layer 142 without using the observation label classification layer 147 from the token expression layer 143, and treats the observation label as a true label and pseudo. Learning is performed based on typical teacher data. In this case, the learning unit 104 learns so as to minimize the error between the output of the label classification layer 142 and the output of the observation label. The learning unit 104 determines, for example, that the learning of the first half is completed when the prediction that the observation label is regarded as a true label is completed a predetermined number of times.

In the latter half of the learning, the learning unit 104 performs learning based on pseudo teacher data using all the layers from the document expression layer 141 to the observation label classification layer 147. In this case, the learning unit 104 learns so as to minimize the error between the output of the observation label classification layer 147 and the observation label given by the pseudo teacher data. As a result, the learning unit 104 can realize a classification model in which the influence of noise due to pseudo teacher data is suppressed by the document expression layer 141 and the label classification layer 142. The learning unit 104 determines, for example, that the latter half of the learning is completed when the prediction of the true label is completed a predetermined number of times. After the latter half of the learning is completed, the learning unit 104 outputs the generated classification model to the document classification unit 203 of the inference unit 20.

The inference unit 20 classifies the behavior of the attacker included in the threat document of the input document using the classification model generated by the model learning unit 10. The inference unit 20 has an input reception unit 201, a threat document extraction unit 202, a document classification unit 203, and an output unit 204.

The input reception unit 201 accepts the input of a document. Then, the input reception unit 201 outputs the received document to the threat document extraction unit 202.

The threat document extraction unit 202 receives the input of the document from the input reception unit 201. Next, the threat document extraction unit 202 extracts threat documents related to cyber threats from the acquired documents. The threat document extraction unit 202 classifies threat documents and other documents by using the structure, rules, statistical methods, etc. of the WEB site from which each document included in the document set is acquired. After that, the threat document extraction unit 202 outputs the extracted threat document to the document classification unit 203.

The document classification unit 203 receives the input of the threat document extracted from the input document from the threat document extraction unit 202. Further, the document classification unit 203 acquires the classification model generated by the learning unit 104 of the model learning unit 10. Then, the document classification unit 203 classifies threat documents using the acquired classification model. That is, the document classification unit 203 obtains the probability that a certain threat document includes the behavior of an attacker. After that, the document classification unit 203 outputs the classification result to the output unit 204.

The output unit 204 receives the input of the input classification result of the document from the document classification unit 203. Then, the output unit 204 outputs the classification result of the input document to a monitor or the like, and notifies the user of the classification result of the input document. Specifically, the output unit 204 outputs information such as what kind of attacker's behavior is included in the threat document in the input document and to what extent.

In this way, by classifying threat documents and determining what kind of attacker's behavior is included in the threat document and how much, for example, what kind of attacker's behavior occurs frequently is specified. It is possible. Also, if each threat document often includes both the attacker's first behavior and the attacker's second behavior, the attacker's second behavior when the attacker's first behavior is detected. It can be judged that there is a high probability that the behavior of will also occur, and it will be possible to take prompt measures.

[Classification process]
Next, with reference to FIG. 5, the entire flow of the learning process for generating the classification model by the model learning unit 10 of the classification device 1 will be described. FIG. 5 is a flowchart of the learning process.

The document collection unit 101 acquires a document set by collecting documents from WEB 2 or the like (step S1). Next, the document collection unit 101 outputs the acquired document set to the threat document extraction unit 102.

The threat document extraction unit 102 receives the input of the document set from the document collection unit 101. Next, the threat document extraction unit 102 extracts a threat document related to the cyber threat from the document set (step S2). After that, the threat document extraction unit 102 outputs the extracted threat document to the pseudo-teacher extraction unit 103.

The pseudo-teacher extraction unit 103 receives the input of the threat document extracted from the document set from the threat document extraction unit 102. Next, the pseudo-teacher extraction unit 103 generates pseudo-teacher data from the threat document (step S3). After that, the pseudo-teacher extraction unit 103 outputs the generated pseudo-teacher data to the learning unit 104.

The learning unit 104 receives the input of the threat document, the token included in each threat document, and the pseudo teacher data from the pseudo teacher extraction unit 103. Next, the learning unit 104 performs learning in consideration of noise using pseudo teacher data to generate a classification model (step S4). After that, the learning unit 104 outputs the generated classification model to the document classification unit 203 of the inference unit 20.

Next, with reference to FIG. 6, the flow of the pseudo-teacher data generation process by the pseudo-teacher extraction unit 103 will be described. FIG. 6 is a flowchart of a pseudo-teacher data generation process by the pseudo-teacher extraction unit. The series of processes shown in the flowchart of FIG. 6 corresponds to an example of the processes executed in step S3 of the flowchart shown in FIG.

The sentence division unit 131 acquires the threat document input from the threat document extraction unit 102 (step S101).

Next, the sentence division unit 131 divides the threat document into sentence units (step S102). After that, the sentence division unit 131 outputs each sentence obtained by dividing the threat document to the token division unit 132 and the rule conforming unit 133.

The token division unit 132 receives the input of each sentence obtained by dividing the threat document from the sentence division unit 131. Then, the token dividing unit 132 divides the acquired sentence into token units (step S103). After that, the token dividing unit 132 outputs each token obtained by dividing each sentence included in the threat document to the rule conforming unit 133.

The rule conforming unit 133 receives the input of each sentence obtained by dividing the threat document from the sentence dividing unit 131. Further, the rule conforming unit 133 receives the input of each token obtained by dividing each sentence included in the threat document from the token dividing unit 132. Then, the rule conforming unit 133 uses the information acquired through the external knowledge reference unit 136 for the teacher data creation rule held by itself, and whether or not the teacher data creation rule at the sentence level and the token level of the threat document conforms. Judgment is made. As a result, the rule conforming unit 133 identifies a rule for creating teacher data applicable to each threat document (step S104). After that, the rule conforming unit 133 outputs the teacher data creation rule determined to be conformable to the synthesis rule conforming unit 134 together with the sentence and token included in the threat document targeted for the determination.

The synthesis rule conforming unit 134 receives input of the teacher data creation rule conforming to each threat document and the sentences and tokens included in the corresponding threat document from the rule conforming unit 133. Next, the synthesis rule conforming unit 134 generates a synthesis rule for each threat document by a logical operation of a teacher data creation rule that can be conformed by the rule conforming unit 133. Then, the synthesis rule conforming unit 134 determines whether or not the synthesis rule is conformable at the sentence level and the token level of the threat document by using the information acquired through the external knowledge reference unit 136 for the synthesis rule (step S105). After that, the synthesis rule conforming unit 134 outputs the synthesis rule determined to be conformable to the convergence test unit 135 together with the sentence and the token included in the threat document targeted for the determination. Further, the synthesis rule conforming unit 134 outputs the creation rule of the teacher data conforming to each threat document acquired from the rule conforming unit 133 to the pseudo teacher generation unit 137.

The convergence test unit 135 receives the input of the composition rule conforming to each threat document and the sentences and tokens included in the corresponding threat document from the composition rule conforming unit 134. Next, the convergence test unit 135 synthesizes depending on whether or not a new attacker's behavior is added to the behavior set after the synthesis rule is applied to the behavior set that collects the attacker's behavior contained in each threat document. It is determined whether or not the conformity of the rules has converged (step S106).

When the conformity of the synthesis rule has not converged (step S106: negation), the convergence test unit 135 determines whether or not the conformity of the synthesis rule may be conformed, which may lead to the behavior of a new attacker. Ask to. The synthesis rule conforming unit 134 adds a new synthesis rule to the synthesis rule for the threat document in which the new synthesis rule has occurred (step S107). After that, the synthesis rule conforming unit 134 returns to step S105.

On the other hand, when the conformity of the synthesis rule is converged (step S106: affirmative), the convergence determination unit 135 sets the teacher data creation rule and the synthesis rule conforming to each threat document to the sentence included in the corresponding threat document and the synthesis rule. It is output to the pseudo-teacher generation unit 137 together with the token. The pseudo-teacher generation unit 137 receives input of a teacher data creation rule and a synthesis rule suitable for each threat document, and sentences and tokens included in the corresponding threat document from the convergence test unit 135. Then, the pseudo-teacher generation unit 137 generates pseudo-teacher data for the sentences and tokens included in the threat document by using the teacher data creation rule and the synthesis rule suitable for the threat document (step S108). .. After that, the pseudo-teacher generation unit 137 outputs the generated pseudo-teacher data to the learning unit 104.

Next, with reference to FIG. 7, the flow of the classification model generation process by learning using pseudo teacher data in consideration of noise by the learning unit 104 will be described. FIG. 7 is a flowchart of a classification model generation process by learning using pseudo teacher data in consideration of noise by the learning unit. The series of processes shown in the flowchart of FIG. 7 corresponds to an example of the processes executed in step S4 of the flowchart shown in FIG.

The document expression layer 141 and the token expression layer 143 acquire the threat document output from the pseudo-teacher extraction unit 103 (step S201).

The document expression layer 141 acquires the characteristics of the entire threat document (step S202). After that, the document expression layer 141 outputs the characteristics of each threat document as a whole to the label classification layer 142.

Here, the learning unit 104 determines whether or not the learning of the first half is completed based on whether or not the calculation of the label probability when the observed label is regarded as a true label has been performed a predetermined number of times (step S203). When the learning of the first half is not completed (step S203: denial), the learning unit 104 uses the document expression layer 141 and the label classification layer 142 without using the observation label classification layer 147 from the token expression layer 143. , The observation label is regarded as a true label, and learning is performed based on the pseudo teacher data (step S204). At this time, the label classification layer 142 receives input of the overall characteristics of the threat document from the document expression layer 141. Then, the label classification layer 142 performs machine learning on the overall characteristics of the threat document using a neural network, and predicts the observation label corresponding to the attacker's behavior given to the threat document as a true label. To calculate the probability label. After that, the learning unit 104 returns to step S201.

On the other hand, when the learning in the first half is completed (step S203: affirmative), the learning unit 104 shifts the learning in the second half. The token expression layer 143 extracts tokens contained in each threat document. Next, the token expression layer 143 acquires the characteristics of each token included in each threat document (step S205). After that, the token expression layer 143 outputs the characteristics of each acquired token to the encoding layer 144.

The encode layer 144 receives the input of the characteristics of each token from the token expression layer 143. Then, the encode layer 144 converts the characteristics of each acquired token into a fixed length. Next, the encode layer 144 integrates information representing the characteristics of the fixed-length token to generate one vector. Further, when the number of labels given to the generated vector is K, the encode layer 144 converts the vector into a 4K-dimensional vector (step S206). After that, the encode layer 144 outputs one fixed-length and 4K-dimensional vector collectively representing the characteristics of each token to the certainty layer 145.

The certainty layer 145 receives an input of one fixed-length and 4K-dimensional vector that collectively represents the characteristics of each token from the encode layer 144. Then, the conviction layer 145 executes a convolution operation using the kernel and the bias parameter held in the acquired vector, and calculates the conviction (step S207). After that, the certainty layer 145 outputs the calculated certainty to the noise distribution layer 146.

The noise distribution layer 146 receives input of certainty from the certainty layer 145. Then, the noise distribution layer 146 normalizes the acquired certainty with the Softmax function to acquire the noise distribution (step S208). After that, the noise distribution layer 146 outputs the noise distribution to the observation label classification layer 147.

The label classification layer 142 receives input of the overall characteristics of the threat document from the document expression layer 141. Then, the label classification layer 142 performs machine learning using the overall characteristics of the threat document, predicts the true label corresponding to the behavior of the attacker given to the threat document, and generates the label probability. The observation label classification layer 147 receives the input of the noise distribution layer 146. Further, the observation label classification layer 147 obtains a label probability, which is a vector representing each probability that each true label is included for each threat document, from the label classification layer 142. Next, the observation label classification layer 147 calculates the weighted sum of the output of the noise distribution layer 146 to the output of the label classification layer 142 to predict the observation label including noise to be given to each threat document. Then, the observation label classification layer 147 generates the observation label probability. The learning unit 104 learns a classification model using a noise model based on the probability label calculated by the label classification layer 142 and the observation label model calculated by the observation label classification layer 147 (step S209).

Next, the learning unit 104 determines whether or not the latter half of the learning is completed based on whether or not the number of times of learning of the classification model using the noise model has reached a predetermined number of times (step S210). If the subsequent learning is not completed (step S210: negation), the learning unit 104 returns to step S201.

On the other hand, when the latter half of the learning is completed (step S210: affirmative), the learning unit 104 ends the classification model generation process.

Next, with reference to FIG. 8, the flow of the threat document classification process by the inference unit 20 of the classification device 1 will be described. FIG. 8 is a flowchart of the threat document classification process by the classification device.

The input receiving unit 201 accepts the input of the document (step S11). Then, the input reception unit 201 outputs the received document to the threat document extraction unit 202.

The threat document extraction unit 202 receives the input of the document from the input reception unit 201. Next, the threat document extraction unit 202 extracts a threat document related to the cyber threat from the acquired document (step S12). After that, the threat document extraction unit 202 outputs the extracted threat document to the document classification unit 203.

The document classification unit 203 receives the input of the threat document extracted from the input document from the threat document extraction unit 202. Further, the document classification unit 203 acquires the classification model generated by the learning unit 104 of the model learning unit 10. Then, the document classification unit 203 classifies the threat document using the acquired classification model (step S13). That is, the document classification unit 203 obtains the probability that a certain threat document includes the behavior of an attacker. After that, the document classification unit 203 outputs the classification result to the output unit 204.

The output unit 204 receives the input of the input classification result of the document from the document classification unit 203. Then, the output unit 204 outputs the classification result of the input document to a monitor or the like, and notifies the user of the classification result (step S14).

[Effects of classification device and classification processing]
As described above, the classification device 1 generates pseudo-teacher data from the acquired threat document by using the rules for creating pseudo-teacher data that matches the acquired threat document. Then, the classification device 1 simultaneously learns the classification model and the noise model based on the acquired threat document and pseudo teacher data. After that, the classification device 1 classifies the threat documents included in the input document by using the learned classification model. That is, the classification device 1 can create pseudo teacher data for performing the behavior of an attacker from a threat document without human intervention, and can generate pseudo teacher data of a natural sentence including generated noise. It can be used to generate a classification model that takes noise into consideration.

In particular, the classification device 1 automatically generates pseudo teacher data based on an extraction rule based on heuristics when a threat document is assumed in order to learn a device that extracts the attacker's behavior from the threat document. .. By using the observation information on the system used when executing a certain behavior such as the program name and file name in this extraction rule, a rule for creating pseudo teacher data that has a strong connection with the behavior and has little notational fluctuation. Can be constructed. Therefore, it is possible to generate pseudo teacher data having a true label with high probability.

Further, the classification device 1 uses the characteristics of the entire document for the construction of the classification model and the characteristics of each token for the construction of the noise model, so that the noise model of the network for variable-length natural sentence input is used. Can be realized. As a result, the classification device 1 has a network structure for reducing the influence of False Negative and False Positive data when generating a classification model using pseudo teacher data of natural sentences including noise.

That is, in the classification device 1, when training a statistical learning model for classifying the behavior of an attacker, by using a creation rule for each label, low cost without manual annotation work for each document. It is possible to construct teacher data with. In addition, by using the classification device 1, noise modeling networks can be learned from teacher data including noise such as pseudo teacher data without using clean data without noise, and False Positive and False Negative data. It will be possible to realize the classification of threat documents with reduced influence of. Therefore, by extracting the attacker's behavior with high accuracy by the classification device 1, it becomes possible to analyze the complicated attacker's behavior using a large-scale threat document set that is difficult to analyze manually, and cyber. It is possible to improve the security capability.

[System configuration, etc.]
Further, each component of each of the illustrated devices is a functional concept, and does not necessarily have to be physically configured as shown in the figure. That is, the specific forms of distribution and integration of each device are not limited to those shown in the figure, and all or part of them may be functionally or physically dispersed or physically distributed in arbitrary units according to various loads and usage conditions. Can be integrated and configured. Further, each processing function performed by each device is realized by a CPU (Central Processing Unit) and a program that is analyzed and executed by the CPU, or hardware by wired logic. Can be realized as.

Further, among the processes described in the present embodiment, all or part of the processes described as being automatically performed can be manually performed, or the processes described as being manually performed can be performed. All or part of it can be done automatically by a known method. In addition, the processing procedure, control procedure, specific name, and information including various data and parameters shown in the above document and drawings can be arbitrarily changed unless otherwise specified.

[program]
As one embodiment, the classification device 1 can be implemented by installing a classification program that executes the above information processing as package software or online software on a desired computer. For example, by causing the information processing device to execute the above classification program, the information processing device can be made to function as the classification device 1. The information processing device referred to here includes a desktop type or notebook type personal computer. In addition, information processing devices include smartphones, mobile phones, mobile communication terminals such as PHS (Personal Handy-phone System), and slate terminals such as PDAs (Personal Digital Assistants). Is done.

Further, the classification device 1 can be implemented as a management server device in which the terminal device used by the user is a client and the service related to the above management process is provided to the client. For example, the management server device is implemented as a server device that receives a config input request as an input and provides a management service for inputting a config. In this case, the management server device may be implemented as a Web server, or may be implemented as a cloud that provides services related to the above management processing by outsourcing.

FIG. 9 is a diagram showing an example of a computer that executes a classification program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these parts is connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (BASIC Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, the display 1130.

The hard disk drive 1090 stores, for example, the OS 1091, the application program 1092, the program module 1093, and the program data 1094. That is, the classification program that defines each process of the classification device 1 having the same function as the classification device 1 is implemented as a program module 1093 in which a code that can be executed by a computer is described. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing the same processing as the functional configuration in the classification device 1 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

Further, the setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, a memory 1010 or a hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes the process of the above-described embodiment.

The program module 1093 and the program data 1094 are not limited to those stored in the hard disk drive 1090, but may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Then, the program module 1093 and the program data 1094 may be read from another computer by the CPU 1020 via the network interface 1070.

1 Sorting device 2 WEB
10 Model learning unit 20 Reasoning unit 101 Document collection unit 102 Threat document extraction unit 103 Pseudo-teacher extraction unit 104 Learning unit 131 Sentence division unit 132 Token division unit 133 Rule conformance unit 134 Synthesis rule conformance unit 135 Convergence judgment unit 136 External knowledge reference unit 137 Pseudo-teacher generation unit 141 Document expression layer 142 Label classification layer 143 Token expression layer 144 Encoding layer 145 Confidence layer 146 Noise distribution layer 147 Observation label classification layer 201 Input reception unit 202 Threat document extraction unit 203 Document classification unit 204 Output unit

Claims

Pseudo-teacher creation department that creates pseudo-teacher data from threat documents that have descriptions about cyber threats based on the rules for creating pseudo-teacher data.
Based on the threat document and the pseudo-teacher data created by the pseudo-teacher creation unit, a learning unit that learns a noise model and a classification model in parallel using mutual relationships, and a learning unit.
A classification device including a document classification unit that classifies input classification target threat documents using the classification model generated by learning by the learning unit.
The classification device according to claim 1, further comprising a threat document extraction unit for extracting a threat document having a description regarding a cyber threat.
The classification device according to claim 1 or 2, wherein the creation rule is a rule based on an extraction rule for extracting the threat document from a document set.
One of claims 1 to 3, wherein the pseudo-teacher creation unit creates the pseudo-teacher data based on a synthesis rule generated by combining the creation rule and a plurality of the creation rules. The classification device described in 1.
Any of claims 1 to 4, wherein the learning unit learns the noise model based on the token included in the threat document, and learns the classification model based on the entire threat document. The classification device described in one.
The learning unit performs learning in the first half of learning the classification model based on the threat document and the pseudo teacher data, and completes the learning of the threat document, the pseudo teacher data, and the first half. The classification device according to any one of claims 1 to 5, wherein the latter half of the learning is performed based on the model, in which the classification model and the noise model are learned in parallel using the mutual relationship. ..
Based on the rules for creating pseudo-teacher data, the process of creating pseudo-teacher data from a threat document that has a description of cyber threats, and the process of creating pseudo-teacher data.
A learning process in which a noise model and a classification model are learned in parallel using mutual relationships based on the threat document and the pseudo teacher data created in the creation process.
A classification method including a document classification step of classifying an input threat document to be classified by using the classification model generated by learning in the learning step.
Creation steps to create pseudo-teacher data from threat documents with descriptions of cyber threats based on the rules for creating pseudo-teacher data,
A learning step in which a noise model and a classification model are learned in parallel using mutual relationships based on the threat document and the pseudo teacher data created in the creation step.
A classification program characterized by causing a computer to execute a document classification step for classifying an input threat document to be classified by using the classification model generated by the training in the learning step.