CN110826617A

CN110826617A - Situation element classification method and training method and device of model thereof, and server

Info

Publication number: CN110826617A
Application number: CN201911057313.XA
Authority: CN
Inventors: 李欣; 孙海春; 段詠程
Original assignee: CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY
Current assignee: CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2020-02-21

Abstract

The invention provides a situation element classification method, a training method and a training device for its model, and a server. The training method comprises the following steps: after a current training data set is determined based on a preset training set, inputting the training data in the training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data; determining the classification accuracy of the initial classifier according to the pre-acquired classes of the training data and the classification results; optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm; and continuing to determine the current training data set based on the preset training set until the classification accuracy converges, thereby obtaining the situation element classification model. By optimizing the parameters of the situation element classification model, the invention improves the speed and accuracy of situation element classification.

Description

Situation element classification method and training method and device of model thereof, and server

Technical Field

The invention relates to the technical field of network security, in particular to a situation element classification method, a training method and device for its model, and a server.

Background

In the related technology, the original situation data adopted in the situation element classification method comprises a large amount of redundant data with low importance degree, so that the speed and the accuracy of situation element classification are reduced; and the classifier adopted in the method has large accuracy fluctuation and overlong operation time, so that the situation element classification accuracy is low.

Disclosure of Invention

In view of the above, the present invention provides a situation element classification method, a training method of a model thereof, an apparatus thereof, and a server, so as to improve the classification speed and accuracy of the situation elements.

In a first aspect, an embodiment of the present invention provides a method for training a situation element classification model, including: determining a current training data set based on a preset training set; the training data set includes a plurality of training data; inputting training data in a training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data; determining the classification accuracy of the initial classifier according to the class and classification result of the pre-acquired training data; optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm; and continuously determining the current training data set based on the preset training set until the classification accuracy rate is converged to obtain the situation element classification model.

With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the method further includes: acquiring an original training data set; the original training data set comprises a plurality of situation element data; preprocessing the situation element data; simplifying the preprocessed situation element data by adopting a preset simplification algorithm to obtain a training set; the number of situation element data in the training set is not larger than the number of situation element data in the original training data set.

With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the simplification algorithm includes a linear discriminant analysis algorithm, and the step of simplifying the preprocessed situation element data by adopting a preset simplification algorithm to obtain a training set includes: performing dimensionality reduction on the preprocessed situation element data by adopting the linear discriminant analysis algorithm to obtain the training set.

With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the initial classifier includes an XGBoost classifier; the preset parameters include the learning rate, maximum tree depth, minimum leaf weight, and minimum loss function reduction value.

With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the optimization algorithm includes a quantum genetic algorithm, and the step of optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm includes: taking the classification accuracy as a fitness parameter of the quantum genetic algorithm; and updating the learning rate, the maximum tree depth, the minimum leaf weight and the minimum loss function reduction value of the current XGBoost classifier through the quantum genetic algorithm to obtain the updated XGBoost classifier.

In a second aspect, an embodiment of the present invention further provides a situation element classification method, including: acquiring an original situation element data set; the original situation element data set comprises a plurality of original situation element data to be classified; preprocessing the original situation element data; simplifying the preprocessed original situation element data by adopting a preset simplification algorithm; classifying each simplified original situation element data through a situation element classification model obtained through pre-training to obtain a classification result; the situation element classification model is established through the training method of the situation element classification model.

In a third aspect, an embodiment of the present invention further provides a training device for a situation element classification model, including: the training data determining module is used for determining a current training data set based on a preset training set; the training data set includes a plurality of training data; the first classification module is used for inputting the training data in the training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data; the classification accuracy determining module is used for determining the classification accuracy of the initial classifier according to the pre-acquired classes of the training data and the classification results; the parameter optimization module is used for optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm, and for continuing to determine the current training data set based on the preset training set until the classification accuracy converges, to obtain the situation element classification model.

In a fourth aspect, an embodiment of the present invention further provides a situation element classification device, including: the original data acquisition module is used for acquiring an original situation element data set; the original situation element data set comprises a plurality of original situation element data to be classified; the preprocessing module is used for preprocessing the original situation element data; the simplification processing module is used for simplifying the preprocessed original situation element data by adopting a preset simplification algorithm; the second classification module is used for classifying the simplified original situation element data through a situation element classification model obtained through pre-training to obtain a classification result; the situation element classification model is established through the training method of the situation element classification model.

In a fifth aspect, an embodiment of the present invention further provides a server, which includes a processor and a memory, where the memory stores machine executable instructions that can be executed by the processor, and the processor executes the machine executable instructions to implement the above-mentioned training method for the situation element classification model or the situation element classification method.

In a sixth aspect, the embodiments of the present invention further provide a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the above-mentioned training method of the situation element classification model or the above-mentioned situation element classification method.

The embodiment of the invention has the following beneficial effects:

The embodiments of the invention provide a situation element classification method, a training method and device for its model, and a server. A current training data set is determined based on a preset training set, and the training data in the training data set are then input into a pre-established initial classifier to obtain a classification result corresponding to each training data; the classification accuracy of the initial classifier is determined according to the pre-acquired classes of the training data and the classification results; preset parameters of the initial classifier are optimized according to the classification accuracy and a preset optimization algorithm; and the current training data set continues to be determined based on the preset training set until the classification accuracy converges, to obtain the situation element classification model. The method improves the speed and accuracy of situation element classification by optimizing the parameters of the situation element classification model.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention as set forth above.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a signal flow diagram of a method for extracting situation elements according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for training a situation element classification model according to an embodiment of the present invention;

FIG. 3 is a flowchart of another training method for a situation element classification model according to an embodiment of the present invention;

FIG. 4 is a flowchart of a situation element classification method according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a local classification module based on an LDA-QGA-XGBoost situation element extraction method according to an embodiment of the present invention;

FIG. 6 is a flowchart of a method for extracting situation elements based on LDA-QGA-XGBoost according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a training apparatus for a situation element classification model according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a situation element classification apparatus according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Network situation awareness is the capability to dynamically and holistically understand network security risks in a network environment; based on security big data, it improves, from a global perspective, the ability to discover, identify, understand, analyze and respond to security threats.

In the related art, the situation element extraction method flow shown in fig. 1 realizes the perception of the network situation; the method is realized based on an equipment layer, a sensor layer and a situation element extraction layer.

The device layer consists of the network devices used in the current network, such as hosts, servers, firewalls, routers and intrusion detection systems (IDS). The sensor layer is mainly responsible for data acquisition from the device layer, that is, for the data collected by heterogeneous sensors deployed on the network devices, such as log-type data, vulnerability scanning records, Simple Network Management Protocol (SNMP) data and NetFlow data analysis records. The situation element extraction layer is mainly responsible for processing, analyzing and extracting all the situation element original data collected by the sensor layer.

As shown in fig. 1, the situation element extraction layer includes a global classification module and a local classification module. The global classification module provides classification rules for the local classification module, is responsible for integrating and counting the processing results of the local classification module, and provides global situation elements. The local classification module is responsible for preprocessing the initial situation element set, such as data normalization, format normalization and the like, and provides the local situation elements for the global classification module through the classification learning module. The situation elements of each local network area can be obtained in real time by using the extraction model, so that the situation elements of the whole network area can be obtained, and when a certain local network area has a problem, effective response measures can be timely found and taken through the model.

The current network topology is complex and the network devices in use differ, so a hierarchical network security situation element extraction model is adopted. First, heterogeneous sensors are deployed on the devices of each region to obtain situation element original data, including security logs, vulnerability information and the like. The situation element original data are then uploaded and preprocessed to form local situation elements, which are extracted by a classification learning module. Finally, the local analysis results are aggregated to form global situation elements. The main principle of the hierarchical model is to achieve dynamic, real-time extraction of network security situation elements for the whole network through local, region-by-region analysis.

In the process of classifying the situation elements, the attributes with different importance degrees have different effects on classification, so that the classification effect of the classifier is reduced by using data which is not subjected to feature reduction and noise reduction; a large number of parameters exist in the classifier, and the classification model trained based on different parameters also influences the effect of the classifier, so that the optimal model parameter is found to play an important role in the situation element extraction effect.

The traditional situation element extraction method only uses a classifier to extract situation element original data, does not reduce the situation element original data, and does not clear a large amount of redundant data with low importance degree in the situation element data, so that the speed and the accuracy of situation element extraction are greatly influenced; most classifiers used in the traditional situation element extraction method are not subjected to parameter optimization processing or only subjected to parameter processing by using a grid search algorithm, so that the classifier accuracy rate is large in fluctuation, and the operation time is too long.

Based on this, the embodiments of the invention provide a situation element classification method, a training method and device for its model, and a server, which can be applied to the classification of situation element data or other data.

For the convenience of understanding the embodiment, a detailed description is first given to a training method of a situation element classification model disclosed in the embodiment of the present invention.

The embodiment of the invention provides a method for training a situation element classification model which, as shown in the flowchart of FIG. 2, comprises the following steps:

Step S200, determining a current training data set based on a preset training set; the training data set includes a plurality of training data.

The training set may include a large amount of situation element data acquired by sensors deployed on the devices, such as log-type data, vulnerability scanning records, SNMP data and NetFlow data; the situation element data can be used as training data, and the categories to which these data belong have been determined in advance. The situation element data can be preprocessed, for example by normalization and format unification; to make the training data more effective, an optimization algorithm can be used to delete redundant data and noise data from the original situation element data, yielding optimized training data.

The classification accuracy of the initial classifier is calculated from the classification results of the training data in one training data set, so the number of training data in a training data set should not be too small; in general, all training data in the training set can be used repeatedly as the training data set during model training.

Step S202, inputting the training data in the training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data.

The classifier is a general name of a method for classifying samples in data mining; the initial classifier can be established based on algorithms such as decision trees, logistic regression, naive Bayes, neural networks and the like; the initial classifier is able to map the training data entered therein to one of the given classes, so that a classification of the training data can be obtained. The initial parameters of the initial classifier can be given according to the value range obtained by historical experience.

Step S204, determining the classification accuracy of the initial classifier according to the pre-acquired classes of the training data and the classification results.

After the classification result of each training data is obtained, it is checked against the known class of that training data: if the two are the same, the classification is correct; otherwise it is wrong. In general, the ratio of the number of correct classification results to the total number of classification results can be used as the classification accuracy; for example, if 89 of 100 classification results are correct, the classification accuracy is 0.89. The classification results of each given class may also be weighted as required to obtain the classification accuracy of the initial classifier.
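The accuracy calculation of step S204 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the class labels `attack` and `normal`, and the per-class weighting scheme (a weighted mean of per-class accuracy) are all assumptions chosen for the example.

```python
from collections import Counter

def classification_accuracy(true_classes, predicted_classes):
    """Fraction of samples whose predicted class equals the known class."""
    if len(true_classes) != len(predicted_classes):
        raise ValueError("inputs must have the same length")
    correct = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return correct / len(true_classes)

def weighted_accuracy(true_classes, predicted_classes, class_weights):
    """Weighted mean of per-class accuracy.

    class_weights is a hypothetical mapping from class label to a
    non-negative weight; the patent does not specify a weighting scheme.
    """
    totals = Counter(true_classes)
    hits = Counter(t for t, p in zip(true_classes, predicted_classes) if t == p)
    weight_sum = sum(class_weights[c] for c in totals)
    return sum(class_weights[c] * hits[c] / totals[c] for c in totals) / weight_sum

# Example from the text: 89 correct results out of 100.
truth = ["attack"] * 50 + ["normal"] * 50
preds = ["attack"] * 45 + ["normal"] * 5 + ["normal"] * 44 + ["attack"] * 6
print(classification_accuracy(truth, preds))  # 0.89
```

With equal class weights the weighted variant reduces to the mean of the per-class accuracies; unequal weights let a deployment emphasize, for instance, the attack classes.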

Step S206, optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm; and continuously determining the current training data set based on the preset training set until the classification accuracy rate is converged to obtain the situation element classification model.

The preset parameters may be all parameters of the initial classifier, or one or several parameters selected based on experience. The initial values of these parameters can be given by the optimization algorithm within a value range obtained from historical experience. During optimization, the classification accuracy serves as an input to the optimization algorithm; based on the classification accuracy and the current values of the preset parameters, the optimization algorithm produces updated values of the preset parameters, thereby optimizing them. The optimization algorithm may be a genetic algorithm or a quantum algorithm.

After determining a current training data set based on a preset training set, inputting training data in the training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data; determining the classification accuracy of the initial classifier according to the classification and classification results of the pre-acquired training data; further optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm; and continuously determining the current training data set based on the preset training set until the classification accuracy rate is converged to obtain the situation element classification model. The method improves the speed and the accuracy of the situation element classification by optimizing the parameters of the situation element classification model.

The embodiment of the invention also provides another training method of the situation element classification model, which is realized on the basis of the method in the embodiment; the embodiment mainly describes a specific process of generating a training set and a process of optimizing preset parameters of an initial classifier by adopting a preset optimization algorithm; as shown in fig. 3, the method comprises the steps of:

Step S300, acquiring an original training data set; the original training data set comprises a plurality of situation element data; the original training data set may include situation element data acquired by sensors deployed on the devices, without preprocessing.

Step S302, preprocessing the situation element data; specifically, the preprocessing may include continuous attribute discretization, normalization, and the like.

Step S304, performing dimensionality reduction on the preprocessed situation element data by adopting a linear discriminant analysis algorithm to obtain a training set.

Linear Discriminant Analysis (LDA) is a supervised dimensionality reduction method: the class information of the data is taken into account during dimensionality reduction, that is, each sample of the data set has a class label. The core idea of LDA is to project data with class information from a high-dimensional space into a low-dimensional space so that projected points of the same class lie closer to their cluster center while the centers of clusters of different classes lie farther apart, that is, to minimize the intra-class dispersion and maximize the inter-class dispersion.

In a complex network environment, the network situation element information has the characteristics of different data sources, large data volume, various data types and the like, the primary task of situation element extraction is to accurately extract abnormal information from complex heterogeneous data, and the essential task is to select a required element set from the situation element set. As a classification process in the situation element extraction, before the situation elements are classified, if the dimensionality of the situation element set can be reduced through reduction of attributes in the situation element set, noise and redundant attributes are deleted, and main situation elements are selected for classification, the classification efficiency of the situation elements can be greatly improved.

The LDA algorithm comprises the following specific steps:

(1) Calculate the mean vector $AVG_j$ of each class of samples and the mean vector $AVG$ of all samples.

Suppose the situation element information set is $SA = \{(A_1, B_1), (A_2, B_2), \ldots, (A_i, B_i)\}$, where $i$ is the number of samples, $A_i$ is the $i$-th sample feature, an $n$-dimensional vector, $n$ is the maximum dimension of the sample features, $B_i \in \{C_1, C_2, \ldots, C_m\}$, and $m$ is the maximum number of categories of the sample features. Define $AVG_j$ ($j = 1, 2, \ldots, m$) as the mean vector of the $j$-th class of samples, $SM_W$ as the intra-class dispersion matrix, and $SM_B$ as the inter-class dispersion matrix.

(2) Calculate the intra-class dispersion matrix $SM_W$ and the inter-class dispersion matrix $SM_B$ of the samples, using the following formula:

where $N_j$ ($j = 1, 2, \ldots, m$) is the number of samples in the $j$-th class, $X_j$ ($j = 1, 2, \ldots, m$) is the data set of the $j$-th class of samples, and $AVG$ is the mean vector of all samples.
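As a hedged reference for step (2), the textbook multi-class LDA definitions of the two dispersion matrices, written with the symbols defined above, are given below. They are assumed to correspond to the patent's formula rather than quoted from it:

$$SM_W = \sum_{j=1}^{m} \sum_{x \in X_j} (x - AVG_j)(x - AVG_j)^T, \qquad SM_B = \sum_{j=1}^{m} N_j \,(AVG_j - AVG)(AVG_j - AVG)^T$$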

(3) Maximize the Fisher criterion function to obtain the $m-1$ largest eigenvalues of the matrix $SM_W^{-1} SM_B$ and the corresponding $m-1$ eigenvectors, which give the projection matrix $W$.

In the embodiment of the invention, multi-class LDA is used to project high-dimensional spatial information into a low-dimensional space, which in this case is a hyperplane. According to the Fisher criterion, if there are $m$ classes of sample data, the projected space has at most $m-1$ dimensions: the projection matrix $W$ consists of at most $m-1$ projection vectors $w$, and the original samples are reduced to at most $m-1$ dimensions. To minimize the intra-class dispersion and maximize the inter-class dispersion, the Fisher criterion function is maximized:

When the intra-class dispersion matrix is nonsingular, it follows mathematically that $SM_W^{-1} SM_B W = \lambda W$, where the maximum value is the largest eigenvalue of the matrix $SM_W^{-1} SM_B$. Solving for the eigenvalues of $SM_W^{-1} SM_B$, each column vector of the matrix $W$ is an eigenvector corresponding to one of the $m-1$ largest eigenvalues of $SM_W^{-1} SM_B$.
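For orientation, one common form of the multi-class Fisher criterion is shown below. Several equivalent variants exist (determinant ratio, trace ratio), so this particular form is an assumption rather than the patent's own expression:

$$J(W) = \frac{\left| W^T \, SM_B \, W \right|}{\left| W^T \, SM_W \, W \right|}$$

Maximizing $J(W)$ leads to the generalized eigenvalue problem $SM_B \, w = \lambda \, SM_W \, w$, which becomes $SM_W^{-1} SM_B \, w = \lambda w$ when $SM_W$ is nonsingular.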

(4) Convert the features in the original data set into new sample features according to the projection matrix $W$ to obtain the reduced data set, namely the training set.
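Steps (1) to (4) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `lda_fit`, the synthetic three-class data, and the use of a pseudo-inverse (to tolerate a singular intra-class matrix) are choices made for the example.

```python
import numpy as np

def lda_fit(A, B):
    """Return the projection matrix W (n x (m-1)) for features A and labels B."""
    classes = np.unique(B)
    n = A.shape[1]
    avg = A.mean(axis=0)                      # step (1): overall mean vector AVG
    sm_w = np.zeros((n, n))                   # intra-class dispersion SM_W
    sm_b = np.zeros((n, n))                   # inter-class dispersion SM_B
    for c in classes:
        X_j = A[B == c]
        avg_j = X_j.mean(axis=0)              # step (1): class mean vector AVG_j
        centered = X_j - avg_j
        sm_w += centered.T @ centered         # step (2)
        diff = (avg_j - avg).reshape(-1, 1)
        sm_b += X_j.shape[0] * (diff @ diff.T)
    # Step (3): eigen-decomposition of SM_W^{-1} SM_B.
    # pinv is an assumption here; it also works when SM_W is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(sm_w) @ sm_b)
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    return eigvecs.real[:, order]

rng = np.random.default_rng(0)
# Synthetic stand-in for situation element data: 3 classes, 5 features.
A = np.vstack([rng.normal(loc=mu, scale=1.0, size=(40, 5)) for mu in (0.0, 3.0, 6.0)])
B = np.repeat([0, 1, 2], 40)
W = lda_fit(A, B)
training_set = A @ W                          # step (4): reduced data set
print(training_set.shape)                     # (120, 2)
```

With three classes the five original features are reduced to two, matching the $m-1$ bound stated above.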

Step S306, determining a current training data set based on the training set; the training data set includes a plurality of training data.

Step S308, inputting the training data in the training data set into a pre-established XGBoost classifier to obtain a classification result corresponding to each training data.

Step S310, determining the classification accuracy of the current XGBoost classifier according to the pre-acquired classes of the training data and the classification results.

The XGBoost algorithm is an ensemble algorithm built on the Gradient Boosting Decision Tree (GBDT); its principle is to combine multiple decision tree classifiers of lower classification accuracy into one classifier of higher accuracy. Compared with GBDT, XGBoost automatically uses CPU multithreading for computation, which saves running time. By applying a second-order Taylor expansion to the loss function, XGBoost overcomes the difficulty of solving for loss functions other than the squared loss, so that the objective function depends only on the first and second derivatives of the loss function. The objective function also accounts for the complexity of the tree, which reduces model variance and mitigates overfitting.

The optimized objective function of the classifier based on the XGBoost algorithm is:

where $Obj^{(t)}$ is the objective function at the $t$-th iteration, $n$ is the total number of samples, $l$ is the loss function, $\hat{y}_i^{(t-1)}$ is the predicted value of sample $i$ at the $(t-1)$-th iteration, $f_t(x_i)$ is the newly added model, and $\Omega(f_t)$ is the decision tree complexity function.
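For reference, the objective function of the XGBoost algorithm at iteration $t$ is conventionally written as follows, using the symbols explained above; $y_i$, the true label of sample $i$, is a symbol introduced here for the illustration:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$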

Applying a second-order Taylor expansion to the objective function, removing the constant term, and defining the sample set of leaf node $j$ of the tree model as $I_j = \{\, i \mid q(x_i) = j \,\}$, where $q(x_i) = j$ represents the structure part of the tree model generated in each iteration and $j$ is the index of the leaf node, gives:

where $g_i$ is the first derivative, $h_i$ is the second derivative, $T$ is the number of leaf nodes, and $w_j$ is the weight of leaf node $j$. Differentiating with respect to $w_j$ gives the optimal $w_j$ and the corresponding optimal value of the objective function, namely:

thereby obtaining the classification result of the training data.
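As a hedged reference for the derivation above, the standard XGBoost results are reproduced here. They assume the usual complexity function $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$; the regularization coefficients $\gamma$ and $\lambda$ do not appear in the text and are introduced for the illustration:

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \right] + \gamma T$$

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$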

Step S312, taking the classification accuracy as a fitness parameter of the quantum genetic algorithm.

The Quantum Genetic Algorithm (QGA) is an intelligent optimization algorithm that combines quantum computing with the genetic algorithm. To a certain extent it solves the problem that the traditional genetic algorithm easily falls into a local optimum; it can converge quickly to a global optimum even when the population is small, and it has stronger global search capability and faster convergence. The QGA replaces the binary encoding of chromosomes used in the traditional genetic algorithm with qubits and quantum superposition states, which widens the range of values a chromosome can take; it replaces the original chromosome crossover with quantum full-interference crossover; and it improves chromosome mutation with the quantum rotation gate, which ensures that the algorithm converges.

In quantum coding, each binary digit of a chromosome is called a qubit, and the qubit can be simultaneously in a superposition of two quantum states, the states of which are represented as:

where $\alpha$ and $\beta$ are a pair of complex probability amplitudes satisfying $|\alpha_i|^2 + |\beta_i|^2 = 1$ ($i = 1, 2, \ldots, m$), $m$ is the number of qubits, and $|\alpha_i|^2$ and $|\beta_i|^2$ represent the probabilities that the qubit is in state 0 and state 1, respectively.
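For reference, a single qubit in superposition and a chromosome of $m$ qubits are conventionally written as follows; this is the standard notation in the QGA literature and is assumed, not confirmed, to match the expression intended here:

$$|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \qquad \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_m \\ \beta_1 & \beta_2 & \cdots & \beta_m \end{bmatrix}$$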

The QGA uses a quantum rotation gate to update and mutate the chromosome; in essence, changing the quantum rotation angle moves the chromosome value toward that of a better chromosome. The mutation process is as follows:

where θ_i is the quantum rotation angle and [α_i β_i]^T is the i-th qubit of the chromosome.
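A minimal sketch of the rotation-gate update, assuming the conventional 2×2 rotation matrix [[cos θ, -sin θ], [sin θ, cos θ]] applied to the column vector [α β]^T and assuming real-valued amplitudes. The function name `rotate_qubit` and the angle 0.05π are hypothetical choices, not values taken from the source.

```python
import math


def rotate_qubit(alpha, beta, theta):
    """Apply the 2x2 rotation gate to one qubit (real amplitudes assumed).

    [alpha']   [cos(theta)  -sin(theta)] [alpha]
    [beta' ] = [sin(theta)   cos(theta)] [beta ]
    """
    new_alpha = math.cos(theta) * alpha - math.sin(theta) * beta
    new_beta = math.sin(theta) * alpha + math.cos(theta) * beta
    return new_alpha, new_beta


# Start from an equal superposition: P(0) = P(1) = 0.5.
alpha, beta = 1 / math.sqrt(2), 1 / math.sqrt(2)

# A small positive angle (assumed value) shifts probability toward state 1.
alpha, beta = rotate_qubit(alpha, beta, 0.05 * math.pi)
prob_one = beta ** 2
```

Because the gate is a pure rotation, the normalization |α|^2 + |β|^2 = 1 is preserved after every update.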

The classification accuracy can be used as the fitness parameter of the quantum genetic algorithm; once the value range of the qubits is determined, the fitness parameter eventually shows a convergence trend.
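As an illustration of using accuracy as the fitness value, the following sketch computes the fraction of training data whose predicted class matches the known class. The function name and the class labels ("dos", "normal", "probe") are hypothetical.

```python
def classification_accuracy(true_labels, predicted_labels):
    """Return the fraction of samples whose predicted class equals the known class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical known classes and classifier outputs for four training samples.
known = ["dos", "normal", "probe", "normal"]
predicted = ["dos", "normal", "normal", "normal"]

# This value would be handed to the optimizer as the fitness of one parameter set.
fitness = classification_accuracy(known, predicted)
```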

Step S314, updating the learning rate, the maximum tree depth, the minimum leaf weight and the minimum loss function reduction value of the current XGBoost classifier through the quantum genetic algorithm to obtain an updated XGBoost classifier; and continuing to perform the step of determining the current training data set based on the preset training set until the classification accuracy converges, to obtain the situation element classification model.

Specifically, the learning rate, the maximum tree depth, the minimum leaf weight, and the minimum loss function reduction value of the current XGBoost classifier may be encoded as the qubits of a chromosome in the quantum genetic algorithm, and the chromosome may be updated and mutated using the quantum rotation gate to obtain an updated learning rate, maximum tree depth, minimum leaf weight, and minimum loss function reduction value. The XGBoost classifier with the updated parameters is then trained on a training data set and its classification accuracy is calculated; when the classification accuracy converges (reaches its maximum), training stops and the situation element classification model is obtained.
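One possible way to map a qubit chromosome onto the four parameters is sketched below. The search ranges, the 8-bit resolution per parameter, and the helper names `measure` and `decode` are all assumptions made for illustration; the source does not specify the encoding.

```python
import random

# Hypothetical search ranges; the source does not state them.
PARAM_RANGES = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (3, 10),
    "min_child_weight": (1.0, 10.0),
    "gamma": (0.0, 1.0),
}
BITS_PER_PARAM = 8  # assumed resolution per parameter


def measure(chromosome, rng):
    """Collapse each qubit (alpha, beta) to a bit; 1 is drawn with probability beta**2."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in chromosome]


def decode(bits):
    """Map a measured bit string to the four XGBoost parameters."""
    params = {}
    for index, (name, (low, high)) in enumerate(PARAM_RANGES.items()):
        chunk = bits[index * BITS_PER_PARAM:(index + 1) * BITS_PER_PARAM]
        fraction = int("".join(map(str, chunk)), 2) / (2 ** BITS_PER_PARAM - 1)
        value = low + fraction * (high - low)
        # The tree depth must be an integer; the other parameters stay continuous.
        params[name] = int(round(value)) if name == "max_depth" else value
    return params


rng = random.Random(0)
amplitude = 2 ** -0.5  # equal superposition for every qubit
chromosome = [(amplitude, amplitude)] * (BITS_PER_PARAM * len(PARAM_RANGES))
params = decode(measure(chromosome, rng))
```

The resulting dictionary uses the XGBoost parameter names, so it could be passed to a classifier constructor in a full implementation.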

In the above method, the linear discriminant analysis algorithm is used to perform dimensionality reduction on the initial situation element data to obtain a simplified training set; during training of the XGBoost classifier, the parameters of the XGBoost classifier are optimized using the quantum genetic algorithm, finally yielding the situation element classification model. By simplifying the training data and optimizing the parameters of the situation element classification model, the method improves the speed and the accuracy of situation element classification.

The embodiment of the invention also provides a situation element classification method, which is realized on the basis of the training method of the situation element classification model in the embodiment; as shown in fig. 4, the method includes the steps of:

Step S400, acquiring an original situation element data set; the original situation element data set comprises a plurality of original situation element data to be classified; the original situation element data may be acquired by a sensor provided in the device.

Step S402, preprocessing the original situation element data; specifically, the preprocessing may include preprocessing processes such as continuous attribute discretization, normalization, and the like.
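The two preprocessing operations named above can be sketched as follows. The source does not state which normalization or discretization scheme is used, so min-max scaling and equal-width binning are assumptions, as are the function names and the sample values.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def equal_width_discretize(values, n_bins):
    """Replace each continuous value with the index of its equal-width bin."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


# Hypothetical continuous attribute, e.g. a connection duration in seconds.
durations = [0.0, 2.0, 5.0, 10.0]
normalized = min_max_normalize(durations)
bins = equal_width_discretize(durations, n_bins=5)
```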

Step S404, simplifying the preprocessed original situation element data by adopting a preset simplification algorithm; specifically, the simplification algorithm may be a linear discriminant analysis algorithm, in which case the simplification process performs dimensionality reduction on the original situation element data using the linear discriminant analysis algorithm.
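To make the dimensionality reduction concrete, the sketch below implements the two-class, two-feature special case of linear discriminant analysis (Fisher's discriminant), which projects each sample onto a single axis. Restricting the example to two classes and two features is an assumption made for brevity; the class names and sample points are hypothetical.

```python
def fisher_lda_direction(class_a, class_b):
    """Return w = Sw^-1 (mean_b - mean_a) for two classes of 2-D points."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    mean_a, mean_b = mean(class_a), mean(class_b)

    # Within-class scatter matrix Sw = [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, (mx, my) in ((class_a, mean_a), (class_b, mean_b)):
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")

    dx, dy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # Multiply the inverse of Sw by the difference of the class means.
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)


def project(points, w):
    """Reduce each 2-D point to one dimension by projecting it onto w."""
    return [x * w[0] + y * w[1] for x, y in points]


normal = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.0)]
attack = [(5.0, 6.0), (6.0, 5.0), (5.5, 5.5), (6.0, 6.0)]
w = fisher_lda_direction(normal, attack)
normal_1d, attack_1d = project(normal, w), project(attack, w)
```

After projection the two classes occupy disjoint intervals on the single remaining axis, which is the property the reduction step relies on.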

Step S406, classifying each simplified original situation element data through a situation element classification model obtained through pre-training to obtain a classification result; the situation element classification model is established through the training method of the situation element classification model.

The situation element classification method comprises the steps of firstly preprocessing and simplifying acquired original situation element data, classifying each simplified original situation element data by adopting a situation element classification model obtained through pre-training, and finally obtaining a classification result. The method improves the speed and the accuracy of the situation element classification.

Based on the above embodiments, the invention also provides a method for extracting situation elements based on LDA-QGA-XGBoost. To overcome the shortcomings of the traditional extraction model, the method adds an attribute reduction module and a parameter optimization module to the local classification module; the attribute reduction module reduces the attributes of the original situation element data and deletes redundant data of low importance; and the parameter optimization module adjusts the parameters of the classifier using a parameter optimization algorithm, thereby improving the classification precision. The local classification module based on this method is shown in fig. 5.

The LDA algorithm performs dimensionality reduction on the original situation element data, deleting redundant data and noise while retaining the key elements; the QGA has good global search capability and converges quickly when the population is small; and compared with traditional classification algorithms, the XGBoost algorithm classifies with higher accuracy and operating efficiency. Therefore, the embodiment of the invention applies the XGBoost algorithm to situation element extraction and provides a network security situation element extraction method based on LDA-QGA-XGBoost, which combines LDA, QGA and XGBoost and effectively improves the accuracy of situation element extraction. As shown in fig. 6, the method for extracting situation elements based on LDA-QGA-XGBoost provided in the embodiment of the present invention specifically includes the following steps:

(1) Preprocessing the original situation data, including continuous attribute discretization and normalization, to form an original network security situation element set comprising a training situation element set and a testing situation element set.

(2) Reducing the training situation element set and the testing situation element set through LDA, deleting redundant attributes, and determining an optimized situation element subset.

(3) Optimizing the XGBoost classifier parameters using the QGA; because the classifier has many parameters, the embodiment of the invention selects the four parameters that have the greatest influence on the classifier for optimization: learning_rate (learning rate), max_depth (maximum tree depth), min_child_weight (minimum leaf weight), and gamma (minimum loss function reduction value).

(4) Training the XGBoost classifier with the parameters obtained in step (3) and evaluating the fitness of each parameter set, wherein the fitness function selected in the embodiment of the invention is the accuracy of the model, to obtain the XGBoost classifier for extracting the situation elements; the training end condition may be convergence of the model accuracy.

(5) Testing the QGA-optimized XGBoost classifier by classifying the reduced test situation element set to obtain the final test result.
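The optimization loop in steps (3) and (4) can be sketched in simplified form as follows. This is not the patented implementation: the toy fitness function stands in for "train the XGBoost classifier and return its accuracy", quantum crossover is omitted, and the population size, rotation angle, patience-based convergence test, and all function names are assumptions.

```python
import math
import random


def measure(chromosome, rng):
    """Collapse each qubit (alpha, beta) to a bit; 1 is drawn with probability beta**2."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in chromosome]


def rotate_toward(chromosome, bits, best_bits, delta):
    """Rotate qubits whose measured bit differs from the best individual's bit."""
    updated = []
    for (alpha, beta), bit, best in zip(chromosome, bits, best_bits):
        if bit != best:
            # Pick the rotation direction that raises the probability of `best`.
            quadrant = 1.0 if alpha * beta >= 0 else -1.0
            theta = delta * quadrant * (1.0 if best == 1 else -1.0)
            alpha, beta = (math.cos(theta) * alpha - math.sin(theta) * beta,
                           math.sin(theta) * alpha + math.cos(theta) * beta)
        updated.append((alpha, beta))
    return updated


def quantum_genetic_search(fitness, n_bits, pop_size=6, max_generations=50,
                           patience=10, delta=0.05 * math.pi, seed=0):
    rng = random.Random(seed)
    amplitude = 1 / math.sqrt(2)  # equal superposition for every qubit
    population = [[(amplitude, amplitude)] * n_bits for _ in range(pop_size)]
    best_bits, best_fitness, stale = None, float("-inf"), 0
    for _ in range(max_generations):
        measured = [measure(chromosome, rng) for chromosome in population]
        improved = False
        for bits in measured:
            score = fitness(bits)
            if score > best_fitness:
                best_bits, best_fitness, improved = bits, score, True
        stale = 0 if improved else stale + 1
        if stale >= patience:  # fitness stopped improving: treat as converged
            break
        population = [rotate_toward(chromosome, bits, best_bits, delta)
                      for chromosome, bits in zip(population, measured)]
    return best_bits, best_fitness


# Hypothetical stand-in for the classifier accuracy: fraction of bits that
# match an arbitrary target pattern.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(bits):
    return sum(b == t for b, t in zip(bits, TARGET)) / len(TARGET)


best_bits, best_fitness = quantum_genetic_search(toy_fitness, n_bits=len(TARGET))
```

In a full implementation, `toy_fitness` would be replaced by a function that decodes the bit string into the four parameters, trains the classifier on the reduced training set, and returns its accuracy.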

The method carries out dimensionality reduction on data, removes redundant situation elements and noise, and solves the problem of large accuracy fluctuation of a classification model through parameter optimization of the classification model, so that the speed and the accuracy of classification of the situation elements are improved.

Corresponding to the above embodiment of the training method of the situation element classification model, an embodiment of the present invention further provides a training device for the situation element classification model. As shown in fig. 7, the device includes: a training data determining module 700, configured to determine current training data and the categories corresponding to the training data based on a preset training set; a first classification module 702, configured to input the training data into a pre-established initial classifier to obtain a classification result; an accuracy determining module 704, configured to determine the classification accuracy of the initial classifier according to the category and the classification result of the training data; and a parameter optimization module 706, configured to optimize preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm, and to continue inputting the next group of training data into the initial classifier for training until the classification accuracy is greater than or equal to a preset accuracy threshold, thereby obtaining the situation element classification model.

Further, the above apparatus further comprises: the original training data set acquisition module is used for acquiring an original training data set; the original training data set comprises a plurality of situation element data; the first preprocessing module is used for preprocessing the situation element data; the first simplification processing module is used for simplifying the preprocessed situation element data by adopting a preset simplification algorithm to obtain a training set; the number of situation element data in the training set is not larger than the number of situation element data in the original training data set.

Specifically, the simplification algorithm includes a linear discriminant analysis algorithm; the first simplification processing module is further configured to: perform dimensionality reduction on the preprocessed situation element data by adopting the linear discriminant analysis algorithm to obtain the training set.

Specifically, the initial classifier includes an XGBoost classifier; the preset parameters include the learning rate, the maximum tree depth, the minimum leaf weight, and the minimum loss function reduction value.

Specifically, the optimization algorithm includes a quantum genetic algorithm; the parameter optimization module is further configured to: take the classification accuracy as the fitness parameter of the quantum genetic algorithm; and update the learning rate, the maximum tree depth, the minimum leaf weight and the minimum loss function reduction value of the current XGBoost classifier through the quantum genetic algorithm to obtain the updated XGBoost classifier.

The implementation principle and technical effect of the training device for the situation element classification model provided by the embodiment of the invention are the same as those of the foregoing embodiment of the training method of the situation element classification model; for brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the method embodiment.

Corresponding to the above embodiment of the method for classifying situation elements, an embodiment of the present invention further provides a device for classifying situation elements, as shown in fig. 8, where the device includes:

an original data obtaining module 800, configured to obtain original situation element data to be classified; a second preprocessing module 802, configured to preprocess the original situation element data; the second simplification processing module 804 is configured to simplify the preprocessed original situation element data by using a preset simplification algorithm; the second classification module 806 is configured to perform classification processing on the simplified original situation element data through a situation element classification model obtained through pre-training to obtain a classification result; the situation element classification model is established through the training method of the situation element classification model.

The implementation principle and technical effect of the situation element classification device provided by the embodiment of the present invention are the same as those of the foregoing embodiment of the situation element classification method; for brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the method embodiment.

An embodiment of the present invention further provides a server, as shown in fig. 9, the server includes a processor 130 and a memory 131, the memory 131 stores machine executable instructions capable of being executed by the processor 130, and the processor 130 executes the machine executable instructions to implement the training method or the situation element classification method of the situation element classification model.

Further, the server shown in fig. 9 further includes a bus 132 and a communication interface 133, and the processor 130, the communication interface 133 and the memory 131 are connected through the bus 132.

The Memory 131 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 133 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 132 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 9, but this does not indicate only one bus or one type of bus.

The processor 130 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 130 or by instructions in the form of software. The processor 130 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory 131; the processor 130 reads the information in the memory 131 and completes the steps of the method of the foregoing embodiment in combination with its hardware.

The embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the training method or the situation element classification method for the situation element classification model, which can be specifically implemented by referring to the method embodiment and will not be described herein again.

The computer program product of the situation element classification method, the training method and device for the model thereof, and the server provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments. For specific implementations, reference may be made to the method embodiments, which are not described herein again.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A training method of a situation element classification model is characterized by comprising the following steps:

determining a current training data set based on a preset training set; the training data set includes a plurality of training data;

inputting the training data in the training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data;

determining the classification accuracy of the initial classifier according to the class of the pre-acquired training data and the classification result;

optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm;

and continuously determining the current training data set based on a preset training set until the classification accuracy rate is converged to obtain a situation element classification model.

2. The method of claim 1, further comprising:

acquiring an original training data set; the original training data set comprises a plurality of situation element data;

preprocessing the situation element data;

simplifying the preprocessed situation element data by adopting a preset simplification algorithm to obtain the training set; the number of situation element data in the training set is not larger than the number of situation element data in the original training data set.

3. The method of claim 2, wherein the simplification algorithm comprises a linear discriminant analysis algorithm;

the step of simplifying the preprocessed situation element data by adopting a preset simplification algorithm to obtain the training set comprises:

and performing dimensionality reduction on the preprocessed situation element data by adopting a linear discriminant analysis algorithm to obtain a training set.

4. The method of claim 1, wherein the initial classifier comprises an XGBoost classifier; the preset parameters comprise a learning rate, a maximum tree depth, a minimum leaf weight and a minimum loss function reduction value.

5. The method of claim 4, wherein the optimization algorithm comprises a quantum genetic algorithm;

the step of optimizing the preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm comprises:

taking the classification accuracy as a fitness parameter of the quantum genetic algorithm;

and updating the learning rate, the maximum tree depth, the minimum leaf weight and the minimum loss function reduction value of the current XGBoost classifier through the quantum genetic algorithm to obtain the updated XGBoost classifier.

6. A situation element classification method is characterized by comprising the following steps:

acquiring an original situation element data set; the original situation element data set comprises a plurality of original situation element data to be classified;

preprocessing the original situation element data;

simplifying the preprocessed original situation element data by adopting a preset simplification algorithm;

classifying each simplified original situation element data through a situation element classification model obtained through pre-training to obtain a classification result; the situational element classification model is created by the method of any one of claims 1-5.

7. A training device for a situation element classification model is characterized by comprising:

the training data determining module is used for determining a current training data set based on a preset training set; the training data set includes a plurality of training data;

the first classification module is used for inputting the training data in the training data set into a pre-established initial classifier to obtain a classification result corresponding to each training data;

the classification accuracy determining module is used for determining the classification accuracy of the initial classifier according to the class of the pre-acquired training data and the classification result;

the parameter optimization module is used for optimizing preset parameters of the initial classifier according to the classification accuracy and a preset optimization algorithm; and continuously determining the current training data set based on a preset training set until the classification accuracy rate is converged to obtain a situation element classification model.

8. A situation element classification device, comprising:

the original data acquisition module is used for acquiring an original situation element data set; the original situation element data set comprises a plurality of original situation element data to be classified;

the preprocessing module is used for preprocessing the original situation element data;

the simplification processing module is used for simplifying the preprocessed original situation element data by adopting a preset simplification algorithm;

the second classification module is used for classifying the simplified original situation element data through a situation element classification model obtained through pre-training to obtain a classification result; the situational element classification model is created by the method of any one of claims 1-5.

9. A server comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the method of training a situational element classification model according to any one of claims 1 to 5 or the method of claim 6.

10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to carry out the method of training a situational element classification model of any of claims 1 to 5 or the method of claim 6.