CN111565192A - Credibility-based multi-model cooperative defense method for internal network security threats - Google Patents

Credibility-based multi-model cooperative defense method for internal network security threats

Info

Publication number
CN111565192A
CN111565192A (application number CN202010382950.0A)
Authority
CN
China
Prior art keywords
log
training
inconsistency
algorithm
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010382950.0A
Other languages
Chinese (zh)
Inventor
王志
陈炜嘉
付晏升
王雨奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202010382950.0A priority Critical patent/CN111565192A/en
Publication of CN111565192A publication Critical patent/CN111565192A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection

Abstract

A credibility-based multi-model cooperative defense method for intranet security threats. The method comprises: 1, extracting heterogeneous log template sets from massive logs with thirteen log parsing algorithms such as LogSig, partitioning by block_id to generate feature matrices, learning the feature matrices with three machine learning algorithms such as SVM, and building thirty-nine log detection models; 2, computing, with a statistical learning algorithm, the credibility of each detection model's prediction for the log under test; and 3, fusing the multi-model prediction results according to the computed credibility, thereby realizing cooperative defense across heterogeneous models. Unlike analysis based on a threshold and a single model, the method generates log detection models from thirteen log parsing algorithms and three machine learning algorithms to achieve multi-model cooperation, and uses statistical learning to improve the detection of abnormal logs.

Description

Credibility-based multi-model cooperative defense method for internal network security threats
Technical Field
The present invention belongs to the field of computer network security.
Background
With the continuous development of computer networks, network security has become both a focus of attention and a challenge. Intranet security threats are a central concern within network security: intranet attacks are frequent and highly aggressive. At the same time, devices generate ever larger volumes of logs that are difficult to analyze manually, and a single model degrades over time, so a comprehensive and accurate detection result cannot be obtained. A system is therefore needed that can analyze logs by machine and defend through multi-model cooperation in order to discover threats.
The system combines multiple log parsing algorithms, machine learning algorithms and statistical learning algorithms to analyze the intranet attack problem cooperatively, improving accuracy and stability.
Disclosure of Invention
The invention aims to solve the following problems: when an intranet generates massive logs, the logs are easily tampered with and cannot be used jointly, and models degrade over time, so that predictions cannot yield comprehensive and accurate results. It therefore provides an intelligent analysis method for intranet security threats based on statistical learning. The method replaces manual analysis with machine learning to analyze the logs and to merge the logs generated by different devices on the intranet; it supports multiple log parsing models and realizes multi-model cooperation; and it uses machine learning and statistical learning algorithms to improve the recognition of abnormal logs.
Technical scheme of the invention
A credibility-based multi-model cooperative defense method for intranet security threats involves the following basic concepts:
(1) log: text recording valuable application and system runtime information, whose content includes a timestamp and other fields;
(2) log stream: a log with a standard format, usable as input to a log parsing algorithm;
(3) log block: a series of logs generated by the same behavior;
(4) log template: the part of a log stream consisting only of its constant tokens; it can represent a group of log streams;
(5) machine learning: the study of how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving its performance;
(6) inconsistency metric function: evaluates, through a score, the similarity between the log stream under test and a set of known log streams; it takes a log template set and the log stream under test as input and outputs a numerical value, the inconsistency score; the higher the score, the more similar the log stream under test is to the group of log streams, and the lower the score, the more dissimilar it is;
(7) P-Value: a statistic measuring the significance of the log stream under test within the known log stream set, used to compare the credibility of the multi-model prediction results;
the method comprises the following specific steps:
1, calculating a multi-model inconsistency score, which comprises the following steps:
step 1.1, generating a log template set to obtain a characteristic value of a log;
1.1.1, the original logs are first preprocessed to generate a log stream set; the log stream set is then processed with a log parsing algorithm f to obtain a log template set T;
1.1.2, input of log parsing: original log stream set X, log parsing algorithm set F:
① log stream set X containing n log blocks x_j, j ∈ {1, 2, …, n}, X = {x_1, …, x_n};
② log parsing algorithm set F containing m log parsing algorithms f_k, k ∈ {1, 2, …, m}, F = {f_1, …, f_m}; the input of each algorithm is the log stream set, and the return value is a log template set T.
1.1.3, output of log parsing: m log template sets T_k, k ∈ {1, 2, …, m}, T = {T_1, …, T_m}, where each log template set T_k contains q log templates t_kb, b ∈ {1, 2, …, q}, T_k = {t_k1, …, t_kq}.
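The individual parsing algorithms (LogSig, Drain, IPLoM and the others) are described in the embodiment below and are not reproduced here. As a purely illustrative, hypothetical stand-in for what any log parsing algorithm f does, namely turning raw log lines into constant-only templates, a minimal Python sketch could look as follows (the regular expression and the function name are assumptions, not part of the invention):

```python
import re

# assumption: tokens that look like paths, numbers, hex values or HDFS block
# ids are treated as variable parts and masked with the wildcard <*>
VARIABLE_TOKEN = re.compile(r"(\d+(\.\d+)*|0x[0-9a-fA-F]+|blk_-?\d+|/[\w./-]+)")

def toy_log_parser(log_stream):
    """Hypothetical stand-in for a log parsing algorithm f: map each raw
    log line to a template made only of constant tokens, and return the
    resulting log template set T."""
    templates = set()
    for line in log_stream:
        tokens = line.strip().split()
        masked = ["<*>" if VARIABLE_TOKEN.fullmatch(t) else t for t in tokens]
        templates.add(" ".join(masked))
    return sorted(templates)

if __name__ == "__main__":
    logs = [
        "Received block blk_123 of size 67108864 from 10.0.0.1",
        "Received block blk_456 of size 67108864 from 10.0.0.2",
        "Deleting block blk_123 file /data/blk_123",
    ]
    for template in toy_log_parser(logs):
        print(template)
```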
Step 1.2, template preprocessing, namely partitioning the log according to different events;
1.2.1, for the log template set generated by each heterogeneous log parsing algorithm, first partition by the block_id contained in the log template content, then divide into a training set and a test set according to the split ratio and split method given by the user;
1.2.2, input of preprocessing: log template set T containing m log template sets T_k, k ∈ {1, 2, …, m}, T = {T_1, …, T_m}, where each log template set T_k contains q log templates t_kb, b ∈ {1, 2, …, q}, T_k = {t_k1, …, t_kq};
1.2.3, output of preprocessing: log block sets B_k partitioned by block_id, k ∈ {1, 2, …, m}, where each log block set contains r log blocks b_ki, i ∈ {1, 2, …, r}, B_k = {b_k1, …, b_kr}; each log block set is further divided into a training set C_k, k ∈ {1, 2, …, m}, where each training set contains h log blocks c_ki, i ∈ {1, 2, …, h}, C_k = {c_k1, …, c_kh}, and a test set D_k, k ∈ {1, 2, …, m}, where each test set contains l log blocks d_ki, i ∈ {1, 2, …, l}, D_k = {d_k1, …, d_kl}, with B_k = {C_k, D_k}.
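A minimal sketch of this preprocessing step, assuming each parsed log line carries an HDFS-style block id recoverable with a regular expression; the helper names and the 0.8 split ratio are illustrative assumptions, since the patent leaves the split ratio and split method to the user:

```python
import re
from collections import defaultdict

BLOCK_ID = re.compile(r"blk_-?\d+")  # assumption: HDFS-style block ids

def partition_by_block_id(parsed_lines):
    """Group parsed log lines into log blocks b_ki keyed by block_id."""
    blocks = defaultdict(list)
    for line in parsed_lines:
        match = BLOCK_ID.search(line)
        if match:
            blocks[match.group(0)].append(line)
    return dict(blocks)

def split_blocks(blocks, train_ratio=0.8):
    """Split one log block set B_k into a training set C_k and a test set
    D_k according to a user-chosen ratio (0.8 is only an example)."""
    block_ids = sorted(blocks)
    cut = int(len(block_ids) * train_ratio)
    train = {bid: blocks[bid] for bid in block_ids[:cut]}
    test = {bid: blocks[bid] for bid in block_ids[cut:]}
    return train, test
```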
1.3, extracting features to generate a feature matrix;
1.3.1, performing feature extraction on a training set and a test set in each heterogeneous log block set to generate a matrix for training a machine learning model;
1.3.2, input of feature extraction: training set C, test set D:
① training set C containing m training sets C_i, i ∈ {1, 2, …, m};
② test set D containing m test sets D_i, i ∈ {1, 2, …, m};
1.3.3, output of feature extraction: training matrix E, test matrix F:
① training matrix E consisting of m training matrices E_i, i ∈ {1, 2, …, m};
② test matrix F consisting of m test matrices F_i, i ∈ {1, 2, …, m};
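The patent does not fix the exact features, so the sketch below assumes the common event-count representation: one row per log block, one column per template of the parser's template set T_k, each cell counting how often that template occurs in the block. The match_fn argument, which decides whether a line instantiates a template, is parser-specific and left as an assumption:

```python
import numpy as np

def build_feature_matrix(blocks, templates, match_fn):
    """Build an event-count matrix for one parser: rows are log blocks,
    columns are templates from T_k.  Returns the row order (block ids)
    and the matrix, which can serve as training matrix E_k or test
    matrix F_k depending on the blocks passed in."""
    block_ids = sorted(blocks)
    matrix = np.zeros((len(block_ids), len(templates)), dtype=np.int64)
    for row, bid in enumerate(block_ids):
        for line in blocks[bid]:
            for col, template in enumerate(templates):
                if match_fn(line, template):
                    matrix[row, col] += 1
                    break  # count each line against one template only
    return block_ids, matrix
```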
Step 1.4, training set inconsistency measurement
1.4.1, for each heterogeneous training set C_i, i ∈ {1, 2, …, m}, an inconsistency score α is computed with a machine learning algorithm g according to the trained model;
1.4.2, input of the training set inconsistency measure: training set C, machine learning algorithm set (inconsistency metric functions) G:
① training set C containing m training sets C_i, i ∈ {1, 2, …, m}, where each training set contains h log blocks c_ki, i ∈ {1, 2, …, h}, C_k = {c_k1, …, c_kh};
② machine learning algorithm set G containing p machine learning algorithms g_k, k ∈ {1, 2, …, p}, G = {g_1, …, g_p}; the return value of each algorithm is the probability that a log block is normal or abnormal, i.e. the inconsistency score;
1.4.3, output of the training set inconsistency measure: a set of inconsistency scores;
1.4.4, algorithm flow:
[Algorithm flow presented as an image in the original publication.]
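As a hedged illustration of the training-set scoring in steps 1.4.1 to 1.4.4, the sketch below fits one machine learning model g on a training matrix and records an inconsistency score for every training log block; using the predicted probability of the "normal" class as that score is an assumption consistent with the return value described in 1.4.2 ②, and the decision tree could be swapped for any other algorithm in G:

```python
from sklearn.tree import DecisionTreeClassifier  # any algorithm g in G would do

def train_and_score(E_k, y_train, normal_label=0):
    """Fit one model g on training matrix E_k (labels y_train) and return
    the model together with the inconsistency score alpha of every
    training log block, taken here as the predicted probability of the
    'normal' class (normal_label=0 is an illustrative convention)."""
    model = DecisionTreeClassifier(random_state=0).fit(E_k, y_train)
    normal_col = list(model.classes_).index(normal_label)
    alpha_train = model.predict_proba(E_k)[:, normal_col]
    return model, alpha_train
```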
step 1.5, carrying out inconsistency measurement on the log template set to be measured by the plurality of models respectively;
1.5.1, for each heterogeneous test set d_i, an inconsistency score α is computed with the machine learning algorithm g according to the trained model; the inconsistency scores given by heterogeneous models are not comparable, so the quality of the models' prediction results cannot be compared directly from the inconsistency scores;
1.5.2, input of the test set inconsistency metric: test set D, machine learning algorithm set (inconsistency metric functions) G:
① test set D containing m test sets D_i, i ∈ {1, 2, …, m}, where each test set contains l log blocks d_ki, i ∈ {1, 2, …, l}, D_k = {d_k1, …, d_kl};
② machine learning algorithm set G containing p machine learning algorithms g_k, k ∈ {1, 2, …, p}, G = {g_1, …, g_p}; the return value of each algorithm is the probability that a log block is normal or abnormal, i.e. the inconsistency score;
1.5.3, output of the test set inconsistency metric: a set of inconsistency scores;
1.5.4, algorithm flow:
[Algorithm flow presented as an image in the original publication.]
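Correspondingly, a sketch of the test-set scoring: each held-out log block is scored with the model already trained on the matching training set, on the same assumed probability-of-normal scale; the resulting scores are kept per (parser, model) pair because, as noted in 1.5.1, scores from heterogeneous models are not directly comparable:

```python
def score_test_blocks(model, F_k, normal_label=0):
    """Inconsistency scores alpha for every log block in test matrix F_k,
    using the same probability-of-normal convention as the training side."""
    normal_col = list(model.classes_).index(normal_label)
    return model.predict_proba(F_k)[:, normal_col]
```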
2, calculating the credibility of each model's prediction result using a statistical learning algorithm;
the P-Value of each model's prediction result is calculated, and the credibility of the prediction result is finally obtained from the P-Value;
2.1, for the feature matrix of t_kb, the inconsistency measure is applied to the feature vectors x_j generated from the original log blocks in the training set and the test set, giving the corresponding set of inconsistency scores {α_kb_1, α_kb_2, …, α_kb_j};
2.2, the inconsistency score α_kb_i of a test-set log block y_j is placed into the training set's set of inconsistency scores; the P-Value is then the ratio of the number of log blocks whose inconsistency score is less than or equal to that of the test-set log block y_j to the total number of log blocks;
2.3, the larger the P-Value, the higher the significance of the log block under test y_j within this class;
2.4, input: the set of inconsistency scores of the training-set log blocks;
2.5, output: the P-Value of the log block under test y_j;
2.6, algorithm flow:
[Algorithm flow presented as an image in the original publication.]
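A minimal sketch of the P-Value computation in 2.1 to 2.3: the score of the block under test is added to the training-score set, and the P-Value is the fraction of scores less than or equal to it (ties counted as "less than or equal", following 2.2):

```python
import numpy as np

def p_value(alpha_train, alpha_test):
    """P-Value of one log block under test for one (parser, model) pair:
    the ratio of log blocks whose inconsistency score is <= the test
    block's score to the total number of blocks, after the test score has
    been placed into the training score set."""
    scores = np.append(np.asarray(alpha_train, dtype=float), alpha_test)
    return float(np.sum(scores <= alpha_test)) / len(scores)
```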
3, performing cooperative defense with multiple models, which jointly predict the malicious degree of a certain behavior;
based on the P-Value results, the multi-model prediction results are fused by simple voting against the maximum error probability given by the user, achieving cooperative defense.
3.1, for all the P-Value sets obtained in the previous step, find the P-Values smaller than the maximum error probability and the training models they correspond to; if the number of such training models is smaller than the configured model count s, the log block under test y_j is predicted to be a normal log block, otherwise it is predicted to be an abnormal log block;
3.2, input: the set of P-Values of the test-set log block y_j; an acceptable maximum error probability, provided by the user, indicating the maximum error probability the user can accept;
3.3, output: the prediction result;
3.4, algorithm flow:
[Algorithm flow presented as an image in the original publication.]
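The fusion step can be sketched as a simple vote over one block's P-Values across all (parser, classifier) pairs; the parameter names below mirror the text (maximum error probability, model count s), and the model names in the usage example are purely illustrative:

```python
def fuse_predictions(p_values, max_error_probability, s):
    """Multi-model cooperative decision for one log block (step 3.1):
    count the models whose P-Value is smaller than the user-given maximum
    error probability; if fewer than s models do so, predict 'normal',
    otherwise predict 'abnormal'."""
    flagged = [name for name, p in p_values.items() if p < max_error_probability]
    return ("normal" if len(flagged) < s else "abnormal"), flagged

# illustrative usage with hypothetical (parser, classifier) names and scores
decision, flagged = fuse_predictions(
    {"LogSig-DT": 0.02, "Drain-SVM": 0.40, "Spell-LR": 0.65},
    max_error_probability=0.05,
    s=2,
)
```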
the invention has the advantages and positive effects that:
the invention provides an intranet multi-model defense system based on credibility. The method utilizes machine learning analysis instead of manual analysis to realize analysis of the log, thereby improving the analysis efficiency; the method realizes the merging use of the logs generated by different equipment of the internal network, and improves the efficiency of detecting the abnormity; the system obtains the credibility of each model by utilizing statistical learning, and performs fusion of a plurality of models based on the credibility, thereby changing the traditional analysis mode based on a threshold value; the method supports various heterogeneous log analysis models, and realizes multi-model cooperative defense by using machine learning and statistical learning methods.
Drawings
Fig. 1 is a flow chart of an intranet security threat intelligent analysis method based on statistical learning.
FIG. 2 is a log template set obtained by using IPLoM algorithm for industrial control logs.
FIG. 3 is a log template set obtained by using a Drain algorithm for an industrial control log.
FIG. 4 is a log template set obtained by using a LogSig algorithm for an industrial control log.
FIG. 5 shows part of the logs grouped by block_id.
Fig. 6 is a partial matrix generated by extracting features from a log block set.
FIG. 7 is a P-Value set obtained by using a LogSig algorithm and a decision tree model for an industrial control log.
FIG. 8 is a determination of an anomaly log by a multi-model system.
FIG. 9 is a decision of normal logs for a multi-model system.
FIG. 10 shows the accuracy of the multi-model approach versus single models at a threshold of 0.7 and a ratio value of 13.
FIG. 11 shows the recall of the multi-model approach versus single models at a threshold of 0.7 and a ratio value of 13.
FIG. 12 shows the F1_measure of the multi-model approach versus single models at a threshold of 0.7 and a ratio value of 13.
FIG. 13 shows the specificity of the multi-model approach versus single models at a threshold of 0.7 and a ratio value of 13.
Detailed Description
The invention is described below using the detection of abnormal log blocks as a concrete example. Any log parsing algorithm that can produce a log template set from an original log stream set, and any machine learning algorithm, can be used in the method. The flow of the method is shown in FIG. 1. This embodiment uses six log parsing algorithms, AEL, Drain, IPLoM, LogSig, SHISO and Spell, and three machine learning algorithms, decision tree, support vector machine and logistic regression, described as follows:
AEL (Abstracting Execution Logs) is a log parsing algorithm. It parses logs in four steps. The first step is anonymization, in which heuristics identify the tokens in each log line that correspond to dynamically variable parts; the second step is tokenization, in which the anonymized log lines are divided into groups according to the number of words and parameters in each line; the third step compares the log lines within each group and abstracts them into corresponding execution events; the fourth step re-examines all execution events to identify events that should be merged, producing the final log templates.
IPLoM (Iterative Partitioning Log Mining) is a log parsing algorithm. It parses logs in four steps, taking all logs as input at once. The first step partitions the original logs into groups by length. The second step further partitions each length group: the words at every position are counted, the position with the fewest unique words is found, and the logs are split according to the word at that position, so that logs sharing the same unique word form a group. The third step operates on each group from the previous step, splitting the original logs into groups according to the correspondence between words. The fourth step processes each final group: where the words at the same position differ they are replaced with a wildcard, and identical words are written down directly, yielding the log template.
Drain is a log parsing algorithm. The algorithm is divided into three steps when the log is analyzed, and one log is input at a time. The purpose of the first step is to group original logs with the same length into a group; the second step is to classify the logs with the same token and the same length into a group, wherein the token in the method can be the first word of the log, the last word of the log and a wildcard; the third step is to gather the similar logs in the original logs with the same length and the same token together to obtain a log template which can represent the set.
LogSig is a log parsing algorithm. The algorithm is divided into three steps when the log is analyzed, and all the log is input at the beginning. The first step is that each log stream is cut into a plurality of word pairs to form word pair groups; the second step is that all logs are randomly divided into groups with fixed number, and then each log is transferred to a proper group; the third step is to obtain a suitable log template for each group obtained in the previous step.
SHISO is a method that mines log formats and retrieves log types and parameters in an online manner. It continuously refines the log formats in real time by building a structured tree from nodes generated out of the log messages. The refined formats can be searched for new log messages and used to extract log parameters. The method can quickly parse logs of unknown format and extract their parameters, makes it easy to understand the log types and parameters in a large volume of system logs, and allows a system log analysis tool to recognize and import unstructured log messages as soon as they arrive.
Spell is a structured streaming parser for event logs based on the LCS (longest common subsequence) method. It parses logs in three steps, taking one log at a time, and runs in a streaming manner with an initially empty LCSMap. The first step uses a set of delimiters to parse a new log entry e_i into a token sequence s_i; the second step compares s_i with the LCSseq of every LCSObject in the current LCSMap to decide whether s_i "matches" an existing LCSseq (in which case the line id of e_i is added to the corresponding LCSObject) or a new LCSObject must be created in the LCSMap; the third step computes the new LCS.
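As an aside, the LCS comparison at the heart of Spell can be sketched with standard dynamic programming; this only illustrates the matching criterion, not Spell's streaming LCSMap bookkeeping, and the "more than half the sequence length" threshold is the commonly cited choice rather than something fixed by this patent:

```python
def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence of two token sequences."""
    dp = [[0] * (len(seq_b) + 1) for _ in range(len(seq_a) + 1)]
    for i, a in enumerate(seq_a, 1):
        for j, b in enumerate(seq_b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if a == b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def matches(new_seq, lcsseq):
    """Decide whether a new token sequence 'matches' an existing LCSseq."""
    return lcs_length(new_seq, lcsseq) > len(new_seq) / 2
```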
Logistic Regression is a statistical model widely used for classification and a form of generalized linear regression analysis. Faced with a regression or classification problem, a cost function is established, the optimal model parameters are solved iteratively by an optimization method, and the quality of the resulting model is tested and verified. To determine the state of an instance, logistic regression estimates the probability p of every possible state (normal or abnormal). The probability p is computed by a logistic function built on the labeled training data. When a new instance arrives, the logistic function computes the probability p (0 < p < 1) of every possible state, and the state with the largest probability is output as the classification.
A Decision Tree (Decision Tree) is a Tree structure diagram that uses branches to illustrate the prediction state of each instance. The model can efficiently classify unknown data, and is a graphical method for intuitively applying probability analysis. Decision tree models are often used to solve classification and regression problems. A decision tree is a predictive model that is constructed in a top-down manner using training data and represents a mapping between object attributes and object values. Each tree node is created using the current "best" attribute, which is selected by the information gain of the attribute. Its branches represent objects that meet the node conditions.
A Support Vector Machine (Support Vector Machine) is a supervised classification learning method. Given a set of training instances, each labeled as belonging to one or the other of two classes, the SVM training algorithm creates a model that assigns the new instance to one of the two classes, making it a non-probabilistic binary linear classifier. In SVMs, the hyperplane is structured to separate instances of different classes in a high dimensional space. Finding the hyperplane is an optimization problem that maximizes the distance between the hyperplane and the nearest data points in the different classes, reducing the generalization error of the classifier.
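A hedged sketch of how the three classifiers named in this embodiment could be instantiated with scikit-learn; the hyper-parameters are illustrative defaults rather than values specified by the patent, and probability=True makes the SVM expose class probabilities like the other two models:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_classifiers():
    """The three inconsistency metric functions g used in this embodiment:
    decision tree, support vector machine and logistic regression."""
    return {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "svm": SVC(probability=True, random_state=0),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }

def fit_all(classifiers, E_k, y_train):
    """Fit every classifier on one parser's training matrix E_k."""
    return {name: clf.fit(E_k, y_train) for name, clf in classifiers.items()}
```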
Each log analysis algorithm can process the original log set to obtain a template set T capable of representing the log set. For the log x to be tested, an inconsistency score set can be calculated by utilizing an inconsistency measurement function (namely a machine learning algorithm) g according to the obtained log template set T. The inconsistency scores given by the heterogeneous models are not comparable, and the quality of the model prediction result cannot be directly compared according to the inconsistency scores, so that the logs to be tested need to be predicted based on statistical learning after the inconsistency scores are obtained by the heterogeneous models.
1. Get a set of log templates
This embodiment is tested on an HDFS log set, open-source data published by LogPai on GitHub. The data set is 1.5 GB in size and can be divided into 575,061 different log blocks according to different events; the log blocks are labeled by experts in the field, with 558,223 normal log blocks and 16,838 abnormal log blocks in total. The logs span from 2008/11/09 20:35:18 to 2008/11/11 11:16:28. By processing the log stream set with the six log parsing algorithms AEL, Drain, IPLoM, LogSig, SHISO and Spell, each log message is converted into a specific event template associated with key parameters for subsequent analysis. The IPLoM algorithm combines the log length, the position of word tokens and the mapping between word tokens to generate log event templates, as shown in FIG. 2. The Drain algorithm uses a directed acyclic graph that can be automatically generated and updated to generate log event templates, and is used mainly in online and distributed systems, as shown in FIG. 3. The LogSig algorithm first converts each log into word pairs consisting of two words and the position between them, then groups them, and finally obtains the log event templates, as shown in FIG. 4.
2. Preprocessing a set of log templates
In this embodiment, the log template set generated by each heterogeneous log parsing algorithm is first partitioned by the block_id contained in the log template content, as shown in FIG. 5; each log block contains several log templates. The set is then divided into a training set and a test set according to the split ratio and split method given by the user.
3. Feature extraction for training set and test set
In the present embodiment, feature extraction is performed on a training set and a test set in each heterogeneous log block set, and a matrix for training a machine learning model is generated, as shown in fig. 6.
4. Computing a training set inconsistency score
In this embodiment, the inconsistency scores of the training set are computed with three machine learning algorithms: decision tree, support vector machine and logistic regression. Each training set therefore yields as many inconsistency score sets as there are machine learning models used.
5. Computing a test set inconsistency score
In this embodiment, for each heterogeneous test set, the inconsistency score is calculated by using the model trained in the previous step and using a machine learning algorithm. Since the inconsistency scores given by different models are not comparable, the quality of the model prediction result cannot be directly compared according to the inconsistency scores.
6. Calculating P-Value
In this embodiment, one of the machine learning algorithms is used to score the feature matrix generated from one training set; the corresponding set of inconsistency scores is denoted S, and a trained machine learning model is obtained. The model is then used to score the feature vector x generated from a test-set log block, giving an inconsistency score α. The inconsistency score α of the feature vector x is placed into the inconsistency score set S of the training set; the P-Value is the ratio of the number of log blocks in S whose inconsistency score is less than or equal to α to the total number of log blocks. The same log block under test is scored with the different machine learning models, yielding the P-Value set corresponding to each log block. FIG. 7 shows an example of the P-Value set obtained after a test set is scored by a decision tree model trained on a training set generated by the LogSig algorithm.
7. Detecting log streams based on statistical learning
For all the P-Value sets obtained in the previous step, find the P-Values smaller than the maximum error probability and the training models they correspond to; if the number of such training models is smaller than the configured model count s, the log block under test y_i is predicted to be a normal log block, otherwise it is predicted to be an abnormal log block.
Fig. 8 shows a case where a log block is determined to be abnormal by the method, and it can be seen that, although the log block is abnormal, both the Spell-LR model and the Spell-SVM model determine that the log block records a normal behavior with a relatively high p-value. Similarly, fig. 9 shows a case where the method determines a log block as normal, and although the log block is normal, there are 6 models that erroneously determine it as an abnormal log block.
FIGS. 10 to 13 show the comparison of the multiple models used by the method with single models in terms of accuracy, recall, F1_measure, and specificity.
8. General algorithm flow
(1) Input: log stream set X, log blocks under test Y, log parsing algorithm set F, machine learning algorithm set G:
① log stream set X containing n logs x_j, j ∈ {1, 2, …, n}, X = {x_1, …, x_n};
② log blocks under test Y containing s original logs y_j, j ∈ {1, 2, …, s};
③ log parsing algorithm set F containing m log parsing algorithms f_k, k ∈ {1, 2, …, m}, F = {f_1, …, f_m}; the input of each algorithm is the log stream set, and the return value is a log template set;
④ log template set T containing m log template sets T_k, k ∈ {1, 2, …, m}, T = {T_1, …, T_m};
⑤ log template sets grouped by log block, containing m log block sets B_k partitioned by block_id, k ∈ {1, 2, …, m}, where each set contains r log blocks b_ki, i ∈ {1, 2, …, r}, B_k = {b_k1, …, b_kr};
⑥ machine learning algorithm set G containing p machine learning algorithms g_k, k ∈ {1, 2, …, p}, G = {g_1, …, g_p}; the return value of each algorithm is the probability that a log block is normal or abnormal, i.e. the inconsistency score; the input of each function is a feature vector v_i from the feature matrix generated from the log stream set, and the return value is two real numbers indicating the similarity of the feature vector to the log stream set;
⑦ acceptable maximum error probability, provided by the user, indicating the maximum error probability the user can accept;
⑧ model number ratio, provided by the user: if the number of training models whose P-Value is smaller than the maximum error probability is less than the configured model number ratio, the log block under test y_i is predicted to be a normal log block; otherwise it is predicted to be an abnormal log block.
(2) Output:
the prediction result, i.e. whether the log block under test y is normal or abnormal.
(3) The algorithm flow is as follows:
[Overall algorithm flow presented as images in the original publication.]

Claims (8)

1. an intelligent analysis method for intranet security threats is characterized by comprising the following steps:
1, calculating a multi-model inconsistency score, which comprises the following steps:
step 1.1, generating a log template set to obtain a characteristic value of a log;
step 1.2, template preprocessing, namely partitioning the log according to different events;
1.3, extracting features to generate a feature matrix;
1.4, calculating the inconsistency score of the training set;
step 1.5, carrying out inconsistency measurement on the log template set to be measured by the plurality of models respectively;
2, calculating the credibility of each model's prediction result using a statistical learning algorithm;
the P-Value of each model's prediction result is calculated, and the credibility of the prediction result is finally obtained from the P-Value;
3, performing cooperative defense with multiple models, which jointly predict the malicious degree of a certain behavior;
based on the P-Value results, the multi-model prediction results are fused by simple voting against the maximum error probability given by the user, achieving cooperative defense.
2. The intelligent analysis method for intranet security threats according to claim 1, wherein the step 1.1 comprises:
1.1.1, firstly preprocessing an original log to generate a log stream set, and processing the log stream set by using a log analysis algorithm f to obtain a log template set T;
1.1.2, input of log parsing: original log stream set X, log parsing algorithm set F:
① log stream set X containing n log blocks x_j, j ∈ {1, 2, …, n}, X = {x_1, …, x_n};
② log parsing algorithm set F containing m log parsing algorithms f_k, k ∈ {1, 2, …, m}, F = {f_1, …, f_m}; the input of each algorithm is the log stream set, and the return value is a log template set T;
1.1.3, output: m log template sets T_k, k ∈ {1, 2, …, m}, T = {T_1, …, T_m}, where each log template set T_k contains q log templates t_kb, b ∈ {1, 2, …, q}, T_k = {t_k1, …, t_kq}.
3. The intelligent analysis method for intranet security threats according to claim 2, wherein the step 1.2 comprises:
1.2.1, for the log template set generated by each heterogeneous log parsing algorithm, first partition by the block_id contained in the log template content, then divide into a training set and a test set according to the split ratio and split method given by the user;
1.2.2, input of preprocessing: log template set T containing m log template sets T_k, k ∈ {1, 2, …, m}, T = {T_1, …, T_m}, where each log template set T_k contains q log templates t_kb, b ∈ {1, 2, …, q}, T_k = {t_k1, …, t_kq};
1.2.3, output of preprocessing: log block sets B_k partitioned by block_id, k ∈ {1, 2, …, m}, where each log block set contains r log blocks b_ki, i ∈ {1, 2, …, r}, B_k = {b_k1, …, b_kr}; each log block set is further divided into a training set C_k, k ∈ {1, 2, …, m}, where each training set contains h log blocks c_ki, i ∈ {1, 2, …, h}, C_k = {c_k1, …, c_kh}, and a test set D_k, k ∈ {1, 2, …, m}, where each test set contains l log blocks d_ki, i ∈ {1, 2, …, l}, D_k = {d_k1, …, d_kl}, with B_k = {C_k, D_k}.
4. The intelligent analysis method for intranet security threats according to claim 3, wherein the step 1.3 comprises:
1.3.1, performing feature extraction on a training set and a test set in each heterogeneous log block set to generate a matrix for training a machine learning model;
1.3.2, input of feature extraction: training set C, testing set D:
① training set C containing m training sets C_i, i ∈ {1, 2, …, m};
② test set D containing m test sets D_i, i ∈ {1, 2, …, m};
1.3.3, output of feature extraction: training matrix E, test matrix F:
① training matrix E consisting of m training matrices E_i, i ∈ {1, 2, …, m};
② test matrix F consisting of m test matrices F_i, i ∈ {1, 2, …, m}.
5. The intelligent analysis method for intranet security threats according to claim 4, wherein the step 1.4 comprises:
1.4.1, for each heterogeneous training set C_i, i ∈ {1, 2, …, m}, an inconsistency score α is computed with a machine learning algorithm g according to the trained model;
1.4.2, input of the training set inconsistency measure: training set C, machine learning algorithm set, i.e. the inconsistency metric function set G:
① training set C containing m training sets C_i, i ∈ {1, 2, …, m}, where each training set contains h log blocks c_ki, i ∈ {1, 2, …, h}, C_k = {c_k1, …, c_kh};
② machine learning algorithm set G containing p machine learning algorithms g_k, k ∈ {1, 2, …, p}, G = {g_1, …, g_p}; the return value of each algorithm is the probability that a log block is normal or abnormal, i.e. the inconsistency score;
1.4.3, output of training set inconsistency measure: a set of inconsistency scores;
1.4.4, algorithm flow:
let c_ki ∈ C_k, C_k = {c_k1, …, c_kh}, C_k ∈ C, C = {C_1, …, C_m}; g_j ∈ G, G = {g_1, …, g_p}
[Algorithm flow presented as an image in the original publication.]
6. The intelligent analysis method for intranet security threats according to claim 5, wherein the step 1.5 comprises:
1.5.1, for each heterogeneous test set d_i, an inconsistency score α is computed with the machine learning algorithm g according to the trained model; the inconsistency scores given by heterogeneous models are not comparable, so the quality of the models' prediction results cannot be compared directly from the inconsistency scores;
1.5.2, input of the test set inconsistency metric: test set D, machine learning algorithm set, i.e. the inconsistency metric function set G:
① test set D containing m test sets D_i, i ∈ {1, 2, …, m}, where each test set contains l log blocks d_ki, i ∈ {1, 2, …, l}, D_k = {d_k1, …, d_kl};
② machine learning algorithm set G containing p machine learning algorithms g_k, k ∈ {1, 2, …, p}, G = {g_1, …, g_p}; the return value of each algorithm is the probability that a log block is normal or abnormal, i.e. the inconsistency score;
1.5.3, output of test set inconsistency metric: a set of inconsistency scores;
1.5.4, algorithm flow:
let d_ki ∈ D_k, D_k = {d_k1, …, d_kl}, D_k ∈ D, D = {D_1, …, D_m}; g_j ∈ G, G = {g_1, …, g_p}
[Algorithm flow presented as an image in the original publication.]
7. The intelligent analysis method for intranet security threats according to claim 6, wherein the credibility calculation of step 2 comprises:
2.1, for the feature matrix of t_kb, the inconsistency measure is applied to the feature vectors x_j generated from the original log blocks in the training set and the test set, giving the corresponding set of inconsistency scores {α_kb_1, α_kb_2, …, α_kb_j};
2.2, the inconsistency score α_kb_i of a test-set log block y_j is placed into the training set's set of inconsistency scores; the P-Value is then the ratio of the number of log blocks whose inconsistency score is less than or equal to that of the test-set log block y_j to the total number of log blocks;
2.3, the larger the P-Value, the higher the significance of the log block under test y_j within this class;
2.4, input: the set of inconsistency scores of the training-set log blocks;
2.5, output: the P-Value of the log block under test y_j;
2.6, algorithm flow:
[Algorithm flow presented as an image in the original publication.]
8. the intelligent analysis method for intranet security threats according to claim 7, wherein the step 3 comprises:
3.1, for all the P-Value sets obtained in the previous step, find the P-Values smaller than the maximum error probability and the training models they correspond to; if the number of such training models is smaller than the configured model count s, the log block under test y_i is predicted to be a normal log block, otherwise it is predicted to be an abnormal log block;
3.2, input: the set of P-Values of the test-set log block y_i; an acceptable maximum error probability, provided by the user, indicating the maximum error probability the user can accept;
3.3, output: the prediction result;
3.4, algorithm flow:
[Algorithm flow presented as images in the original publication.]
CN202010382950.0A 2020-05-08 2020-05-08 Credibility-based multi-model cooperative defense method for internal network security threats Pending CN111565192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010382950.0A CN111565192A (en) 2020-05-08 2020-05-08 Credibility-based multi-model cooperative defense method for internal network security threats

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010382950.0A CN111565192A (en) 2020-05-08 2020-05-08 Credibility-based multi-model cooperative defense method for internal network security threats

Publications (1)

Publication Number Publication Date
CN111565192A true CN111565192A (en) 2020-08-21

Family

ID=72073245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010382950.0A Pending CN111565192A (en) 2020-05-08 2020-05-08 Credibility-based multi-model cooperative defense method for internal network security threats

Country Status (1)

Country Link
CN (1) CN111565192A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732655A (en) * 2021-01-13 2021-04-30 北京六方云信息技术有限公司 Online analysis method and system for unformatted logs

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361010A (en) * 2014-10-11 2015-02-18 北京中搜网络技术股份有限公司 Automatic classification method for correcting news classification
CN106878314A (en) * 2017-02-28 2017-06-20 南开大学 Network malicious act detection method based on confidence level
WO2018159362A1 (en) * 2017-03-03 2018-09-07 日本電信電話株式会社 Log analysis apparatus, log analysis method, and log analysis program
CN110011990A (en) * 2019-03-22 2019-07-12 南开大学 Intranet security threatens intelligent analysis method
CN110213287A (en) * 2019-06-12 2019-09-06 北京理工大学 A kind of double mode invasion detecting device based on ensemble machine learning algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ola Spjuth et al.: "Aggregating Predictions on Multiple Non-disclosed Datasets Using Conformal Prediction", Elsevier *
Yitong Ren et al.: "System Log Detection Model Based on Conformal Prediction", Electronics *
Zhang Yongsheng et al.: "Multi-model collaborative detection method for Android malicious code based on credibility", Journal of Guangxi Normal University (Natural Science Edition) *
Gu Zhaojun et al.: "Intranet log detection model based on conformal prediction algorithm", Technical Research *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732655A (en) * 2021-01-13 2021-04-30 北京六方云信息技术有限公司 Online analysis method and system for unformatted logs
CN112732655B (en) * 2021-01-13 2024-02-06 北京六方云信息技术有限公司 Online analysis method and system for format-free log

Similar Documents

Publication Publication Date Title
CN111428054B (en) Construction and storage method of knowledge graph in network space security field
CN107992746B (en) Malicious behavior mining method and device
CN107391353B (en) Method for detecting abnormal behavior of complex software system based on log
CN111639497B (en) Abnormal behavior discovery method based on big data machine learning
CN111047173B (en) Community credibility evaluation method based on improved D-S evidence theory
CN110011990B (en) Intelligent analysis method for intranet security threats
CN111143838B (en) Database user abnormal behavior detection method
Maakoul et al. Towards evaluating the COVID’19 related fake news problem: case of morocco
Rahman et al. New biostatistics features for detecting web bot activity on web applications
CN114036531A (en) Multi-scale code measurement-based software security vulnerability detection method
CN117195250A (en) Data security management method and system
Wang et al. An improved clustering method for detection system of public security events based on genetic algorithm and semisupervised learning
CN110674288A (en) User portrait method applied to network security field
CN111565192A (en) Credibility-based multi-model cooperative defense method for internal network security threats
Singh et al. User behaviour based insider threat detection in critical infrastructures
Liang et al. Automatic security classification based on incremental learning and similarity comparison
Li et al. Glad: Content-aware dynamic graphs for log anomaly detection
CN116361788A (en) Binary software vulnerability prediction method based on machine learning
Chen et al. Unsupervised Anomaly Detection Based on System Logs.
CN114528908A (en) Network request data classification model training method, classification method and storage medium
Li et al. Learned bloom filter for multi-key membership testing
Zhao et al. TAElog: A Novel Transformer AutoEncoder-Based Log Anomaly Detection Method
Dong et al. Security Situation Assessment Algorithm for Industrial Control Network Nodes Based on Improved Text SimHash
Pokharel Information Extraction Using Named Entity Recognition from Log Messages
CN113516189B (en) Website malicious user prediction method based on two-stage random forest algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200821