CN111835707B

CN111835707B - Malicious program identification method based on improved support vector machine

Info

Publication number: CN111835707B
Application number: CN202010459366.0A
Authority: CN
Inventors: 陈锦富; 殷上; 张祖法; 黄如兵; 杨健
Original assignee: Jiangsu University
Current assignee: Jiangsu University
Priority date: 2020-05-27
Filing date: 2020-05-27
Publication date: 2022-12-16
Anticipated expiration: 2040-05-27
Also published as: CN111835707A

Abstract

The invention provides a malicious program identification method based on an improved support vector machine, comprising the following steps: collecting network traffic data through NetFlow and normalizing the collected data packets; extracting features, which is required to identify malicious programs; reducing the dimensionality of the feature attributes to eliminate redundant features, followed by normalization; performing classification training with the OFSVM algorithm; and finally constructing a network traffic identification model with the NTMI identification algorithm, thereby identifying malicious programs in network traffic.

Description

Malicious program identification method based on improved support vector machine

Technical Field

The invention belongs to the field of malicious program detection in network traffic, and relates to a malicious program identification method based on an improved support vector machine.

Background

As the population grows, networks expand in scale day by day and network traffic carries a wide variety of complex data. Attackers exploit vulnerabilities in the network to launch attacks, which leaks important information, enables illegal access and, in severe cases, paralyzes enterprise systems, causing great trouble in people's lives.

Within the huge volume of network traffic, a malicious attacker can release phishing websites or worms to steal users' important information, and can exploit vulnerabilities to turn normal programs into malicious ones, so that the user's host is controlled or crashed by the attacker, causing heavy economic losses and disturbing social order.

Before malicious programs can be detected, network traffic must be classified and identified: harmful malicious programs are first separated out, and malicious programs that exploit buffer overflows are then examined further. With continuing technical development, classification and identification techniques have become diverse, and each existing method has advantages and disadvantages. Teufl et al. proposed a framework that simplifies the selection of the empirical model and of the features: by analyzing network traffic, it observes whether the data violate certain rules, extracts an optimal feature set, and builds a traffic classification model to classify and identify network traffic. Shrivastav et al. analyzed and implemented a semi-supervised network traffic classification method that trains on both labeled and unlabeled flows, with a data set containing both attack data and normal data; the labeled data were divided into clusters for classification and identification, the test results were compared with an SVM-based classifier, and experiments showed that the method achieved better accuracy. After analyzing a variety of network data, Yang et al. found that the parameters transmitted at the application layer, such as payload size and the information entropy of each packet, differ between protocols; they trained and classified with a decision tree algorithm based on the minimum partition distance, and experiments showed that intercepting only the first four or six data packets shortens processing time while achieving higher classification accuracy.
These techniques scan for malicious attack behavior that may appear in a network and then analyze the corresponding data, which leads to high latency, and their final classification and identification results differ considerably from the expected results. A malicious program identification method based on an improved support vector machine is therefore of great significance.

Disclosure of Invention

In view of the low detection accuracy for malicious programs in network traffic, the low classification accuracy and similar problems in the prior art, the invention provides a malicious program identification method based on an improved support vector machine to solve these problems.

The invention provides a malicious program identification method based on an improved support vector machine, which comprises the following steps:

step 1, collecting data from network traffic through NetFlow and normalizing the collected data packets;

step 2, extracting features, which is required to identify malicious programs;

step 3, reducing the dimensionality of the feature attributes to eliminate redundant features, followed by normalization;

step 4, performing classification training with the OFSVM algorithm;

step 5, finally, constructing a network traffic identification model with the NTMI identification algorithm, thereby identifying malicious programs in network traffic.

In a first aspect, the step 2 specifically includes:

the processed data set is examined for the correlation between the sample class and each feature attribute; the higher the correlation, the more the feature's weight increases. A threshold is then set: a feature attribute whose weight exceeds the threshold is retained, and otherwise it is not selected. Meanwhile, if multiple feature attributes are found for a certain data packet during extraction, the one with the highest frequency of occurrence is selected as the substitute. The specific feature selection process is as follows: a number of samples s are drawn from the data set D by stratified random sampling; for each s, the y samples r nearest to s are selected from the same class $D_a$, and y samples t are selected from a different class $D_b$; finally, the distances $D_{sr}$ (between s and r) and $D_{st}$ (between s and t) are calculated. If $D_{sr} > D_{st}$, the feature attribute is problematic and cannot be used for classification, so a smaller weight is assigned; conversely, if the feature attribute separates the classes easily, a larger weight is assigned.

In a second aspect, the step 3 specifically includes:

first, the extracted feature attributes are added to a set S. Building on previous methods, a Filter feature dimension-reduction method is proposed: the information gain of the feature attribute set S is evaluated with an information gain algorithm, and by evaluating the contribution of each feature attribute to the subsequent classification it is decided whether to update the evaluation value and whether to update S. The feature attributes are then ranked with a heuristic search strategy to obtain a feature attribute set S1, and this process is repeated until a specified number of iterations is reached. On this basis, a Wrapper method performs a secondary feature selection using heuristic sequential forward search to obtain a feature attribute set S2. Feature dimension reduction not only shortens processing time and reduces computational complexity but also improves classification performance.

In a third aspect, the OFSVM algorithm includes:

in parameter optimization, the optimal parameter combination is sought within a finite search, and grid-search parameter optimization is used to improve the SVM algorithm. Rather than using the distance between each sample and its class as the fuzzy factor, each sample point $s_i$ is given a corresponding fuzzy factor $e_i$ that represents the uncertainty of the sample's distribution, with $0 \le e_i \le 1$. Let $R^{+}$ and $R^{-}$ denote the mean points of the positive and negative samples; the normal vector of the hyperplane is defined from these mean points, and the corresponding hyperplane can be represented as $\lVert s - R \rVert_2 \cos\alpha = 0$, where α is the angle between $s - R$ and the normal vector. This gives the distance of each sample point to the hyperplane.

The maximum distance $d_1$ from a positive sample point to the hyperplane is obtained when $R = R^{+}$; similarly, when $R = R^{-}$, $d_2$ is the maximum distance from a negative sample point to the hyperplane. An adjustment factor is then introduced so that $0 < e_i \le 1$, and the fuzzy factor is computed from each sample's distance to the hyperplane together with $d$, where $d$ takes the value $d_1$ for positive samples and $d_2$ for negative samples. The validity of the constructed features is then introduced to eliminate the influence of redundant features on classification accuracy, and finally a classifier model is generated with the radial basis kernel function, whose suitability was verified experimentally.

In a fourth aspect, the NTMI identification algorithm specifically includes: performing data sampling and normalization on the collected network traffic data to obtain a data set of greater experimental value, which also makes feature extraction from the traffic data more convenient; extracting the features of the data packets in the network traffic with the ReliefF algorithm; because the extracted features still contain redundant attributes that greatly reduce traffic classification accuracy, reducing the dimensionality of the extracted feature set by computing and evaluating each feature with information gain, ranking the feature set, and performing a secondary feature selection that computes feature correlations using heuristic sequential forward search; then normalizing the resulting feature subset, converting all feature attributes into numerical values, placing them in a matrix and performing minimum Euclidean distance calculation; training with the OFSVM algorithm to obtain a classifier with good classification performance; and finally using the remaining network traffic test set as input so that the classifier separates normal programs from malicious programs, thereby identifying malicious programs in the network traffic.

The invention has the beneficial effects that:

1. The OFSVM algorithm improves the classification accuracy of network traffic: grid search expands the search range; a fuzzy factor designed from the distance between a sample and the classification hyperplane reduces the influence of the shape of the classification plane on classification accuracy; feature weights are measured by feature validity; and a radial basis kernel function reduces complexity, so that classification training performance is improved.

2. The NTMI identification algorithm performs feature extraction, feature dimension reduction and normalization on the collected data packets and uses the result as the input of the OFSVM classification algorithm, producing a classifier with better classification performance; a malicious program identification model for network traffic is thereby constructed and malicious program identification is completed.

3. The corresponding data traffic is collected effectively from network traffic for real-time monitoring; features are extracted from the data packets; redundant features are handled by feature dimension reduction, improving classification performance; normalization makes the features convenient to process and better suited as input; the OFSVM algorithm completes the classification training for malicious programs; and the NTMI algorithm identifies whether malicious programs are present in network traffic. Experimental results show that the method is effective to a certain degree at identifying malicious programs in network traffic and helps ensure network security.

Drawings

FIG. 1 is a flow diagram of feature dimension reduction;

FIG. 2 is a flow chart of the malicious program identification method based on the improved support vector machine of the invention;

FIG. 3 is a flow diagram of a malware identification model in network traffic;

FIG. 4 is a schematic diagram of feature attributes after feature extraction;

FIG. 5 is a diagram of feature attributes after feature dimensionality reduction;

FIG. 6 is a graph comparing accuracy on CAIDA for five methods;

FIG. 7 is a graph comparing the false alarm rates of the five methods on CAIDA.

Detailed Description

The invention is further described below with reference to the figures and specific steps.

The invention aims to provide a malicious program identification method based on an improved support vector machine for malicious programs that exploit vulnerabilities in network traffic, so as to identify such programs effectively. It proposes the NTMI identification algorithm, and experiments demonstrate the feasibility and effectiveness of the method.

As shown in fig. 2, the method for identifying malicious programs based on an improved support vector machine of the present invention includes:

step 201, collecting data from network traffic through NetFlow and normalizing the collected data packets;

step 202, extracting features, which is required to identify malicious programs;

step 203, reducing the dimensionality of the feature attributes to eliminate redundant features, followed by normalization;

step 204, performing classification training with the OFSVM algorithm;

step 205, finally, constructing a network traffic identification model with the NTMI identification algorithm, thereby identifying malicious programs in network traffic.

In the step 201, the specific steps are as follows:

(1) Data acquisition

Network traffic data are first collected with NetFlow. This tool can also analyze network traffic to help troubleshoot network faults, but it is inefficient at identifying malicious programs of the many vulnerability types written by attackers; in addition, the corresponding network devices must support NetFlow, and users must distinguish normal traffic from malicious traffic.

(2) Data normalization

Before the collected network traffic data packets are normalized, data sampling is performed to select a better data set. Data sampling selects a subset of the whole experimental data set for observation; because the subset retains the characteristics of the original set, it supports sound judgments about the entire network traffic data set. The main sampling methods are systematic sampling, random sampling and stratified sampling. Systematic sampling orders the original data samples and extracts a specified number of samples at fixed intervals from a random starting point; random sampling selects sample data at random from the whole sample set; stratified sampling first divides the whole sample set into strata according to a specified rule and then draws data at random within each stratum. Stratified sampling is used here to observe the quality of the entire data set.
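Stratified sampling as described above can be sketched as follows. This is a minimal illustration, assuming that the stratification rule is the traffic label (normal or malicious) and that each stratum is sampled at the same fraction; the function name `stratified_sample` and its parameters are hypothetical, not taken from the method.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Split records into strata by key(record) and draw the same fraction at random from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for members in strata.values():
        # keep at least one record so that small strata (e.g. rare attacks) are not lost
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# usage: 90 normal and 10 malicious flow records, stratified by label
records = [("normal", i) for i in range(90)] + [("malicious", i) for i in range(10)]
subset = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(sum(1 for r in subset if r[0] == "normal"), sum(1 for r in subset if r[0] == "malicious"))  # prints 9 1
```

Unlike plain random sampling, every stratum keeps its share of the subset, so a minority class of malicious flows cannot vanish from the sample by chance.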

For step 202, the main steps of extracting the features of the data packets in the network traffic are as follows:

(1) The processed data set is examined for the correlation between the sample class and each feature attribute; the higher the correlation, the more the weight increases. A threshold is then set: if the weight exceeds the threshold, the feature attribute is retained; otherwise it is not selected. Meanwhile, if multiple feature attributes are found for a certain data packet during extraction, the one with the highest frequency of occurrence is selected as the substitute.

(2) The specific feature selection process is as follows: samples s are drawn from the data set D by stratified random sampling; the y samples r nearest to s are selected in the same class $D_a$, and y samples t are selected in a different class $D_b$; finally, the distances $D_{sr}$ and $D_{st}$ between s and r and between s and t are calculated. If $D_{sr} > D_{st}$, the feature attribute is problematic and cannot be used for classification, so a smaller weight is assigned; otherwise the feature attribute separates the classes easily and a larger weight is assigned. The feature weights are calculated following the existing literature, where $D(x, r, t)$ is the corresponding Euclidean distance, $w(x)$ is the corresponding weight, $D_j$ is the j-th sample in the data set, and n is the number of samples over which weights are computed to extract features. This process is executed in a loop; the final computed weights are compared with the set weight threshold, features that meet the requirement are retained and the others are discarded, yielding the final extracted feature attribute set S. The finally extracted features are shown in FIG. 4.
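The weighting above follows the pattern of the ReliefF algorithm, which this description names later. The sketch below is a simplified ReliefF, assuming numeric features already scaled to [0, 1], Euclidean distance for finding neighbours, and per-feature absolute differences for the weight update; `n_iter` (number of sampled s) and `k` (the y neighbours) are hypothetical defaults, and the final threshold comparison is left to the caller.

```python
import random


def relieff_weights(X, y, n_iter=20, k=3, seed=0):
    """Simplified ReliefF feature weighting (features assumed scaled to [0, 1])."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * p

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    for _ in range(n_iter):
        i = rng.randrange(n)  # randomly chosen sample s
        s, cs = X[i], y[i]
        for c in classes:
            # the k nearest samples of class c, excluding s itself
            cand = sorted((j for j in range(n) if y[j] == c and j != i),
                          key=lambda j: dist(s, X[j]))[:k]
            if not cand:
                continue
            if c == cs:
                # near hits (r): a feature that differs here is penalized
                scale = -1.0 / (n_iter * len(cand))
            else:
                # near misses (t): a feature that differs here is rewarded, weighted by class prior
                scale = prior[c] / (1.0 - prior[cs]) / (n_iter * len(cand))
            for j in cand:
                for f in range(p):
                    w[f] += scale * abs(s[f] - X[j][f])
    return w


# usage: feature 0 separates the classes perfectly, feature 1 is noise
X = [[0, 0.1], [0, 0.9], [0, 0.5], [0, 0.3], [1, 0.2], [1, 0.8], [1, 0.4], [1, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w = relieff_weights(X, y)
```

A feature whose values differ little between s and its same-class neighbours r but strongly between s and the other-class samples t ends up with a large weight, which is the per-feature form of the $D_{sr}$ versus $D_{st}$ comparison in the text.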

For step 203, in order to eliminate the problem of redundant features, feature attribute dimension reduction is performed, and normalization processing is performed, which includes the following specific steps:

(1) First, the extracted feature attributes are added to a set S. Building on previous methods, a Filter feature dimension-reduction method is proposed: an information gain algorithm, $E_{IG} = \mathrm{evaluate}(F_{filter}, S)$, evaluates the information gain of the feature attribute set S, and by evaluating the contribution of each feature attribute to the subsequent classification it is decided whether to update $E_{IG}$ and whether to update S. The feature attributes are then ranked with a heuristic search strategy to obtain a feature attribute set S1, and this process is repeated until a specified number of iterations is reached. On this basis, a Wrapper method performs a secondary feature selection using heuristic sequential forward search to obtain a feature attribute set S2; the specific flow chart is shown in FIG. 3. Feature dimension reduction shortens processing time, reduces computational complexity and improves classification performance.
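The information-gain evaluation in the Filter stage can be sketched as follows, computing $IG(Y; F) = H(Y) - \sum_v P(F = v)\,H(Y \mid F = v)$ for each feature and ranking features by it. This assumes the features are already discrete (continuous traffic attributes would need binning first); the function names are hypothetical stand-ins for the $\mathrm{evaluate}(F_{filter}, S)$ step.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; F) = H(Y) - sum_v P(F=v) * H(Y | F=v) for one discrete feature column."""
    n = len(labels)
    cond = 0.0
    for v, cnt in Counter(column).items():
        subset = [lab for x, lab in zip(column, labels) if x == v]
        cond += cnt / n * entropy(subset)
    return entropy(labels) - cond


def rank_by_information_gain(X, labels):
    """Return (feature indices sorted by descending gain, list of gains)."""
    gains = [information_gain([row[f] for row in X], labels) for f in range(len(X[0]))]
    order = sorted(range(len(gains)), key=lambda f: gains[f], reverse=True)
    return order, gains


# usage: feature 0 determines the label, feature 1 is constant
order, gains = rank_by_information_gain([[0, 7], [0, 7], [1, 7], [1, 7]], [0, 0, 1, 1])
```

The ranked order plays the role of the candidate set S1 handed on to the secondary (Wrapper) selection.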

(2) In the Wrapper method, the secondary selection of feature attributes is performed by calculating the correlation of the flow feature attributes with a formula from the existing literature, where n is the number of initially selected feature attributes. The formula gives the correlation coefficient of the feature attributes in terms of $m_{ri}$, the mean value of the flow feature attribute for the i-th data packet, the corresponding variance, and $m_r$, the mean value of flow feature attribute r. The final feature attributes after feature dimensionality reduction are shown in FIG. 5.
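Because the correlation formula itself is not reproduced here, the following is only a sketch under an assumption: it uses the Pearson correlation coefficient as the correlation measure and performs a greedy sequential forward pass over the information-gain ranking, adding a feature only if it is not strongly correlated with any feature already selected. This swaps in a correlation-based redundancy check for a full classifier-in-the-loop Wrapper evaluation; `max_corr` is a hypothetical threshold.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column has no linear relationship with anything
    return cov / math.sqrt(va * vb)


def forward_select(X, ranked, max_corr=0.9):
    """Greedy forward search: walk the ranked features, skip ones redundant with the selection."""
    cols = {f: [row[f] for row in X] for f in ranked}
    selected = []
    for f in ranked:
        if all(abs(pearson(cols[f], cols[g])) < max_corr for g in selected):
            selected.append(f)
    return selected


# usage: feature 1 is exactly 2 * feature 0, so it is redundant and dropped
X = [[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8, 0]]
S2 = forward_select(X, ranked=[0, 1, 2])
```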

(3) Data normalization plays an important role in data mining. Different evaluation indexes have different units of measurement, which prevents direct data analysis; normalization removes this difference so that different data become comparable and operable. After processing, the data are converted into dimensionless pure values of the same order of magnitude, which facilitates subsequent processing and evaluation; normalization also speeds up convergence and improves classification accuracy. The specific normalization process uses the dispersion normalization method from the existing literature, also known as min-max normalization, which linearly transforms the acquired feature subset so that the target data lie between 0 and 1. The transfer function is:

$x^{*} = \dfrac{x - \min}{\max - \min}$

In this formula, min is the minimum value of the sample data and max is the maximum value. One drawback is that adding new data during the transformation can change max and min, which in turn changes the normalization criterion, so the data set must be kept unchanged before normalization is performed.
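One way to respect the caveat above is to record min and max once, on a fixed data set, and reuse them for any later data. The sketch below does this; the optional clamping of out-of-range values to [0, 1] is an assumption added for illustration, not part of the described method.

```python
def fit_min_max(column):
    """Record min and max once, from the fixed (training) data."""
    return min(column), max(column)


def min_max_transform(column, lo, hi, clip=True):
    """x* = (x - min) / (max - min) using frozen min/max values."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]  # constant feature: map everything to 0
    scaled = [(x - lo) / span for x in column]
    if clip:
        # new data outside the recorded range would leave [0, 1]; clamp it
        scaled = [min(1.0, max(0.0, v)) for v in scaled]
    return scaled


lo, hi = fit_min_max([2, 4, 6])
train_scaled = min_max_transform([2, 4, 6], lo, hi)
```

Freezing the parameters keeps the normalization criterion stable: a new value of 8 maps to 1.5 (or 1.0 with clamping) instead of silently rescaling every earlier record.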

For step 204, an OFSVM algorithm is then used for classification training, and the specific steps are as follows:

The existing SVM classification method has several shortcomings. With rapid economic development and the growing reach of the network, network traffic keeps increasing in scale; real network environments contain a lot of noise, and sample data contain many redundant features, so SVM classification accuracy is low. In addition, sample data must be labeled manually when training the classifier, which consumes a great deal of effort and makes human errors hard to avoid.

To solve these problems, the SVM algorithm is improved mainly from the perspective of parameter optimization. SVM parameter optimization uses a search strategy to find an approximately optimal solution within a finite search over the parameter space, and two important parameters must be considered: the kernel function parameter and the penalty parameter. The penalty parameter determines the generalization ability of the SVM hyperplane and mainly represents the tolerance for errors when the hyperplane is constructed; the kernel function parameter determines the range of influence of the kernel and thus also affects the generalization ability of the SVM.

(1) To find the optimal parameter combination within a finite search, the SVM algorithm is improved with grid-search parameter optimization. Grid search works as follows. First, the k-dimensional parameter space spanned by k parameters is divided into a grid whose nodes represent candidate parameter combinations. Next, each parameter $c_i$ is sampled at a specified step size, producing the candidate set $P(c_1) \times P(c_2) \times \cdots \times P(c_k)$ and a grid along each parameter direction. Finally, each grid node is evaluated with the designated evaluation method and the final approximately optimal solution is output. To reduce search time and grid density, the initial step size is set to t times the default step size q, i.e. $q \cdot t$; a traversal search is then performed, and once all sample data have been processed the optimal parameter combination is obtained. To represent the tolerance for errors in the sample data when the classification plane is constructed, a penalty parameter P is introduced and compared with a set overfitting threshold f. When P is smaller than f, the search space is reduced, the search step is set to half of the initial step and the search is repeated; the smaller step increases the grid density and so gives a more precise search. If P exceeds the overfitting threshold f, the search space is expanded and the search direction is adjusted before searching again, with the aim of optimizing the parameters while preventing overfitting. The procedure is repeated over the sample data until P falls within the threshold range, at which point execution stops and the optimal parameter combination is output.
The algorithm has a larger searchable space, its grid nodes are independent of one another, it is broadly applicable, and it helps achieve the minimum classification error.
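The coarse-to-fine idea can be sketched as below for the two SVM parameters (penalty C and RBF width γ), searched on a log2 scale. This is a simplification under stated assumptions: `score` stands in for the designated evaluation method (in practice something like cross-validated accuracy; here a toy function keeps the block self-contained), the step is halved after every round around the current best node, and `c_max` is a hypothetical stand-in for the overfitting threshold f that simply discards nodes with a too-large penalty. The expand-and-redirect branch of the described method is not shown.

```python
import itertools


def grid_search(score, center, step, rounds=3, half_width=2, c_max=None):
    """Coarse-to-fine grid search over (log2 C, log2 gamma); higher score is better."""
    best = tuple(center)
    best_score = score(*best)
    for _ in range(rounds):
        c0, g0 = best  # the grid for this round is centred on the current best node
        offsets = [i * step for i in range(-half_width, half_width + 1)]
        for dc, dg in itertools.product(offsets, offsets):
            log2_c, log2_g = c0 + dc, g0 + dg
            if c_max is not None and 2 ** log2_c > c_max:
                continue  # penalty treated as over-fitting (hypothetical rule)
            s = score(log2_c, log2_g)
            if s > best_score:
                best, best_score = (log2_c, log2_g), s
        step /= 2  # halve the step: a denser grid around the best node
    return best, best_score


# usage with a toy score that peaks at log2 C = 3, log2 gamma = -1
best, s = grid_search(lambda c, g: -((c - 3) ** 2 + (g + 1) ** 2), center=(0, 0), step=2)
```

Each node is evaluated independently of the others, which is what makes grid search easy to parallelize; the price is that the number of evaluations grows with the grid width in every dimension.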

(2) Next, to improve classification accuracy, a fuzzy factor is introduced. Some existing studies compute the distance between each sample and its class as the fuzzy factor; this does not yield the optimal classification hyperplane and weakens the role of the support vectors in determining it. This study instead calculates the fuzzy factor from the distance between a sample and the classification hyperplane, which reduces the influence of the shape of the classification plane on classification accuracy. On this basis, the corresponding classification hyperplane is constructed first and the distance from each sample point to it is calculated, so that the fuzzy factor can eliminate the effect of redundant noise on classification accuracy. For each sample point $s_i$ there is a corresponding fuzzy factor $e_i$, which represents the uncertainty of the sample's distribution, with $0 \le e_i \le 1$. Let $R^{+}$ and $R^{-}$ denote the mean points of the positive and negative samples; the normal vector is then defined from these mean points.

Following methods in the prior art, the corresponding hyperplane can be represented as $\lVert s - R \rVert_2 \cos\alpha = 0$, where α is the angle between $s - R$ and the normal vector. This gives the distance of each sample point to the hyperplane.

The maximum distance $d_1$ from a positive sample point to the hyperplane is obtained when $R = R^{+}$; similarly, when $R = R^{-}$, $d_2$ is the maximum distance from a negative sample point to the hyperplane. An adjustment factor is then introduced so that $0 < e_i \le 1$, and the fuzzy factor is computed from each sample's distance to the hyperplane together with $d$, where $d$ takes the value $d_1$ for positive samples and $d_2$ for negative samples. Different fuzzy factors thus eliminate the influence of redundant noise on classification accuracy; however, the differing influence of individual features on classification is not yet considered, so feature validity is introduced to eliminate the influence of weakly correlated features on classification accuracy.
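Since the exact normal-vector and fuzzy-factor equations are not reproduced here, the sketch below fills them in with explicit assumptions: the normal vector is taken as $R^{+} - R^{-}$, each sample's distance is measured to the hyperplane through its own class mean with that normal, and the fuzzy factor uses the common FSVM-style form $e_i = 1 - \mathrm{dist}_i / (d + \delta)$, where the adjustment factor δ > 0 keeps $e_i$ strictly positive. Labels are assumed to be +1 and -1. All of these choices are hypothetical.

```python
import math


def fuzzy_factors(X, y, delta=1e-3):
    """Hypothetical fuzzy factors e_i = 1 - dist_i / (d + delta) for labels +1 / -1."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == -1]

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    r_pos, r_neg = mean(pos), mean(neg)          # R+ and R-
    w = [a - b for a, b in zip(r_pos, r_neg)]    # assumed normal vector R+ - R-
    norm_w = math.sqrt(sum(v * v for v in w))

    def dist(x, r):
        # distance from x to the hyperplane through r with normal w
        return abs(sum(wi * (xi - ri) for wi, xi, ri in zip(w, x, r))) / norm_w

    d1 = max(dist(x, r_pos) for x in pos)  # largest positive-sample distance
    d2 = max(dist(x, r_neg) for x in neg)  # largest negative-sample distance
    factors = []
    for x, lab in zip(X, y):
        r, d = (r_pos, d1) if lab == 1 else (r_neg, d2)
        # delta > 0 keeps e_i strictly positive even for the farthest sample
        factors.append(1.0 - dist(x, r) / (d + delta))
    return factors


X = [[2, 0], [4, 0], [3, 0], [-2, 0], [-4, 0], [-3, 0]]
y = [1, 1, 1, -1, -1, -1]
f = fuzzy_factors(X, y)
```

Samples far from their class's reference plane (likely noise or outliers) receive factors near 0 and therefore contribute less to the fuzzy SVM's penalty term, while samples on the plane receive 1.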

(3) Following the feature-validity calculation proposed in the existing literature, each feature i of the sample data has a corresponding feature validity, which indicates how strongly that feature influences classification: the stronger the classification ability of feature i, the higher its validity.

The classification effect of each feature is judged by calculating its reinforcement learning ability within the feature set S. Assuming that the training sample set S contains |S| samples in total and each sample has p feature attributes, the feature validity is expressed by a formula over these |S| samples and p feature attributes.

When a feature i has a large reinforcement learning value, its validity is high, i.e. it contributes more to classification. Finally, considering the importance of the kernel function parameter to classification performance, this study further optimizes the SVM classification algorithm through the choice of an appropriate kernel function.

(4) The kernel function maps the original nonlinear sample data into a feature space, where a constructed optimal classification plane turns the nonlinear problem into a linearly separable one; this avoids the huge computational cost of working explicitly in a high-dimensional feature space. Let the input space be $Y \subseteq \mathbb{R}^n$ and the corresponding feature space be F. If there is a mapping $\gamma: Y \to F$ such that $K(y_i, y_j) = \gamma(y_i)^{T}\gamma(y_j)$ for any $y_i, y_j \in Y$, then K is a kernel function. A kernel function must satisfy Mercer's theorem, i.e. for any set of vectors in the input space the corresponding kernel matrix must be positive semidefinite. With a suitable kernel function, linear classification is achieved without increasing complexity, so the classification performance of the SVM depends heavily on the kernel function. This study uses the radial basis kernel function, which performs well locally and classifies the sample points in the data set efficiently. Because it is not limited by the number of samples or the feature dimension, it is widely applicable; moreover, it has few parameters, and since kernel complexity generally grows with the number of parameters, its complexity is low. Improving SVM classification in this way yields relatively small errors and greatly improves the ability to classify and identify malicious programs in network traffic.
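The radial basis kernel and the Mercer property can be illustrated with a short sketch, assuming the standard form $K(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$ with a single width parameter γ (the value 0.5 below is arbitrary). The Gram matrix it builds is symmetric with ones on the diagonal, and for the RBF kernel it is positive semidefinite.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X, gamma=0.5):
    """Kernel matrix K[i][j] = K(x_i, x_j) over all pairs of samples."""
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]


K = gram_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
```

Only γ has to be tuned alongside the penalty C, which is why the grid search in step (1) is two-dimensional when this kernel is used.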

For step 205, a network traffic identification model is finally constructed by using an NTMI identification algorithm, and identification of a malicious program in the network traffic is finally realized, which specifically comprises the following steps:

(1) The first task is to classify programs in network traffic accurately. To achieve this, network traffic is first collected with NetFlow technology, and the collection consists of three steps. Step one: obtain the list of network interface cards with a low-level network access tool, so that all traffic passing through a card can be monitored in real time. Step two: select a network card for detection and set the card obtained in step one to promiscuous mode. Step three: merge the data packets in the traffic, i.e. extract and merge the packets of the traffic passing through the network within a certain period, finally obtaining the collected network traffic data.
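Step three, merging packets into flows, can be sketched independently of the capture itself (which needs a platform packet-capture library such as libpcap and is not shown). The sketch assumes NetFlow-style flows keyed by the 5-tuple (source, destination, source port, destination port, protocol), with a flow closed when it has been idle longer than an inactive timeout; the 15-second value and the record layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_ts: float
    last_ts: float
    packets: int = 0
    octets: int = 0


def aggregate_flows(packets, inactive_timeout=15.0):
    """Group (ts, src, dst, sport, dport, proto, length) packets into 5-tuple flow records."""
    active = {}
    finished = []
    for ts, src, dst, sport, dport, proto, length in sorted(packets):
        key = (src, dst, sport, dport, proto)
        rec = active.get(key)
        if rec is not None and ts - rec.last_ts > inactive_timeout:
            finished.append((key, rec))  # idle too long: close the old flow
            rec = None
        if rec is None:
            rec = FlowRecord(first_ts=ts, last_ts=ts)
            active[key] = rec
        rec.last_ts = ts
        rec.packets += 1
        rec.octets += length
    finished.extend(active.items())  # flush flows still open at the end of the window
    return finished


pkts = [(0.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 100),
        (1.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 200),
        (2.0, "10.0.0.3", "10.0.0.2", 5555, 53, "udp", 60),
        (40.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 50)]
flows = aggregate_flows(pkts)
```

The packet at t = 40 s arrives after the TCP flow has been idle for 39 s, so it starts a second flow record with the same key.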

(2) Data sampling and normalization are performed on the collected network traffic data to obtain a data set of greater experimental value, which also makes feature extraction from the traffic data more convenient. The ReliefF algorithm then extracts features from the data packets in the network traffic. Because the extracted features still contain redundant attributes that greatly reduce traffic classification accuracy, the feature set is further reduced in dimension: each feature is computed and evaluated with information gain, the feature set is ranked, and a secondary feature selection is performed with the Wrapper method using heuristic sequential forward search and feature correlations, finally achieving dimension reduction.

(3) The resulting feature subset is normalized: all feature attributes are converted into numerical values and arranged in a matrix, and the minimum Euclidean distance is calculated. Training is then performed with the OFSVM algorithm to obtain a classifier with good classification performance. The remaining network traffic test set is used as input, and the classifier separates normal programs from malicious programs in the network traffic. This achieves the identification of malicious programs in the network traffic and completes the construction of the identification model.
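The normalization and distance computation can be sketched as follows. The patent does not name the normalization method. Min-max scaling to [0, 1], fitted on the training matrix and reused on the test matrix, is assumed here. The nearest-sample lookup is one possible reading of "minimum Euclidean distance calculation". All function names are hypothetical.

```python
import math


def fit_min_max(rows):
    """Per-column minimum and maximum of the training matrix."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(query, rows):
    """Index of the row with the minimum Euclidean distance to `query`."""
    return min(range(len(rows)), key=lambda i: euclidean(query, rows[i]))


# Usage: fit on training data, then apply the same scaling to test data.
train = [[0, 100], [10, 300], [5, 200]]
lo, hi = fit_min_max(train)
train_n = apply_min_max(train, lo, hi)   # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
test_n = apply_min_max([[8, 280]], lo, hi)
closest = nearest_index(test_n[0], train_n)
```

Reusing the training minima and maxima on the test set avoids leaking test-set statistics into the model.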

The NTMI identification method of the present invention is compared with four existing methods, as shown in fig. 6 and fig. 7. Because the public data set is large, 10% of it is selected for training and 10% for testing, so that the test set contains about 40,000 samples. As the figures show, the NTMI algorithm maintains good accuracy. Moreover, as the number of data packets increases on this larger-scale public network traffic data set, the false alarm rate of the NTMI algorithm is lower than that of the other four algorithms and stabilizes at about 6%. This demonstrates that the present invention is feasible.
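The patent does not define its accuracy and false alarm rate metrics. The sketch below assumes the common definitions: accuracy = (TP + TN) / N, and false alarm rate = FP / (FP + TN), the fraction of normal traffic wrongly flagged as malicious. The function name and label strings are hypothetical.

```python
def detection_metrics(y_true, y_pred, positive="malicious"):
    """Return (accuracy, false_alarm_rate) for a binary detector."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # normal sample flagged as malicious
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / len(y_true)
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, false_alarm_rate


# Usage: 4 normal and 2 malicious samples; one false alarm and one miss.
y_true = ["normal"] * 4 + ["malicious"] * 2
y_pred = ["normal", "normal", "normal", "malicious", "malicious", "normal"]
acc, far = detection_metrics(y_true, y_pred)  # acc = 4/6, far = 1/4
```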

In the description of the present specification, reference to the description of "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A malicious program identification method based on an improved support vector machine is characterized by comprising the following steps:

step 4, carrying out classification training by adopting an OFSVM algorithm;

the OFSVM algorithm of step 4 includes:

in the parameter optimization, the optimal parameter combination is found within a finite search, and the SVM algorithm is improved by grid-search parameter optimization; meanwhile, for each sample point $s_i$, the distance between the sample and its class is calculated as a fuzzy factor, so that each sample has a corresponding fuzzy factor $e_i$ representing the uncertainty of the sample distribution, where $0 \le e_i \le 1$; $R^{+}$ and $R^{-}$ denote the mean points of the positive and negative samples, and the normal vector can be used

the maximum distance $d_1$ from the positive sample points to the hyperplane can then be obtained when $R = R^{+}$; similarly, when $R = R^{-}$, $d_2$ is the maximum distance from the negative sample points to the hyperplane; an adjustment factor is then used

to ensure $0 < e_i \le 1$, and the fuzzy factor is then

where $d$ takes the value $d_1$ for positive samples and $d_2$ for negative samples; the constructed features are validated to eliminate the influence of redundant features on classification accuracy, and the classifier model is finally generated using a radial basis function kernel selected through experimental verification;
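The formulas for the normal vector, the adjustment factor and the fuzzy factor are not reproduced in this text, so the following is only a hedged sketch of one consistent reading, in the style of class-center fuzzy SVMs. It assumes:

- the normal vector is $w = R^{+} - R^{-}$;
- each sample's distance is measured to the hyperplane with normal $w$ passing through its own class mean;
- the fuzzy factor is $e_i = 1 - d_i / (d + \delta)$, where a small adjustment factor $\delta > 0$ keeps $e_i > 0$.

The function name `fuzzy_factors` and the value of `delta` are hypothetical.

```python
import math


def mean_point(points):
    n = len(points)
    return [sum(c) / n for c in zip(*points)]


def fuzzy_factors(X, y, delta=1e-3):
    """Return one fuzzy factor e_i in (0, 1] per sample; y uses +1 / -1 labels.

    Assumed form (not confirmed by the source): e_i = 1 - d_i / (d + delta),
    with d = d1 for positive and d = d2 for negative samples.
    """
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    r_pos, r_neg = mean_point(pos), mean_point(neg)
    w = [a - b for a, b in zip(r_pos, r_neg)]  # assumed normal vector R+ - R-
    w_norm = math.sqrt(sum(v * v for v in w))  # must be non-zero (distinct class means)

    def dist(x, r):
        # Distance from x to the hyperplane with normal w passing through r.
        return abs(sum(wi * (xi - ri) for wi, xi, ri in zip(w, x, r))) / w_norm

    d_i = [dist(x, r_pos if t == 1 else r_neg) for x, t in zip(X, y)]
    d1 = max(d for d, t in zip(d_i, y) if t == 1)   # max distance, positive samples
    d2 = max(d for d, t in zip(d_i, y) if t == -1)  # max distance, negative samples
    return [1 - d / ((d1 if t == 1 else d2) + delta) for d, t in zip(d_i, y)]


# Usage: samples at the class means get 1.0, outlying samples get small weights.
X = [[0, 0], [2, 0], [4, 0], [10, 0], [12, 0], [14, 0]]
y = [1, 1, 1, -1, -1, -1]
e = fuzzy_factors(X, y)
```

Such factors could be passed as per-sample weights to an SVM with an RBF kernel (for example, scikit-learn's `SVC(kernel="rbf").fit(X, y, sample_weight=e)`), with `C` and `gamma` tuned by grid search (for example, `GridSearchCV`). This is an illustration, not the OFSVM implementation itself.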

step 5, constructing a network traffic identification model by using the NTMI identification algorithm, and finally identifying malicious programs in the network traffic;

the NTMI recognition algorithm of step 5 specifically includes:

the acquired network traffic data are sampled and normalized to obtain a data set that is more useful for the experiments and from which features can be extracted more easily; features are then extracted from the data packets in the network traffic using the ReliefF algorithm; each feature is evaluated with the information gain technique and the feature set is ranked; secondary feature selection is performed using heuristic sequential forward search and computing the correlation of the features, finally achieving feature dimension reduction; the resulting feature subset is then normalized, all feature attributes are converted into numerical values and arranged in a matrix, and the minimum Euclidean distance is calculated; training is performed with the OFSVM algorithm to obtain a classifier with good classification performance; the remaining network traffic test set is used as input, the classifier separates normal programs from malicious programs in the network traffic, and malicious programs in the network traffic are finally identified.

2. The method according to claim 1, wherein the step 2 specifically comprises:

the correlation between the sample class and each feature attribute of the processed data set is compared, and the higher the correlation, the larger the weight value assigned to the attribute; a threshold is set, and if the weight value exceeds the threshold the feature attribute is kept, otherwise it is not selected; meanwhile, if several feature attributes are found for a given data packet during extraction, the one that occurs most frequently is used in their place; the specific feature selection process is as follows: several samples s are randomly selected from a data set D by stratified sampling; the y samples r closest to s are selected from the same class Da, and y samples t are selected from a different class Db; the distances Dsr between s and r and Dst between s and t are then calculated; if Dsr > Dst, the feature attribute is unfavorable for classification and is given a smaller weight; conversely, if the feature attribute makes the classes easy to separate, it is given a larger weight.
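The weighting procedure above can be sketched as a simplified ReliefF. Assumptions:

- features are numeric and scaled to [0, 1];
- samples are drawn by simple random sampling rather than the stratified sampling described;
- the y "misses" are the nearest samples of the other classes, pooled, without the per-class prior weighting of full ReliefF.

Each feature weight is increased by its difference to the misses and decreased by its difference to the hits. Features whose weight exceeds the threshold are then kept. The function names are hypothetical.

```python
import math
import random


def relieff_weights(X, y, m=None, k=1, seed=0):
    """Simplified ReliefF: return one weight per feature (higher = more useful)."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    m = m or n  # number of sampled instances s
    weights = [0.0] * n_features

    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    for i in rng.sample(range(n), m):
        s = X[i]
        others = [j for j in range(n) if j != i]
        # k nearest samples r of the same class (hits), k nearest t of other classes (misses)
        hits = sorted((j for j in others if y[j] == y[i]), key=lambda j: dist(s, X[j]))[:k]
        misses = sorted((j for j in others if y[j] != y[i]), key=lambda j: dist(s, X[j]))[:k]
        for f in range(n_features):
            d_hit = sum(abs(s[f] - X[j][f]) for j in hits) / (m * k)
            d_miss = sum(abs(s[f] - X[j][f]) for j in misses) / (m * k)
            weights[f] += d_miss - d_hit  # far from misses, close to hits -> larger weight
    return weights


def select_features(weights, threshold):
    """Keep the indices of features whose weight exceeds the threshold."""
    return [f for f, w in enumerate(weights) if w > threshold]


# Usage: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relieff_weights(X, y)
kept = select_features(weights, threshold=0.0)
```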

3. The method according to claim 1, wherein the step 3 specifically comprises:

first, the extracted feature attributes are added to a set S, on which a filter-based feature dimension reduction is performed: the information gain of the feature attribute set S is evaluated with an information gain algorithm, and the contribution of each feature attribute to subsequent classification is assessed to decide whether to update the values and the feature attribute set S; the feature attributes are then ranked using a heuristic search strategy to obtain a feature attribute set S1, and this process is repeated until a specified number of iterations is reached; on this basis, secondary feature selection is performed with a wrapper method using heuristic sequential forward search to obtain a feature attribute set S2; feature dimension reduction not only shortens computation time and reduces computational complexity but also improves the classification performance.
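The wrapper stage (sequential forward search from S1 to S2) can be sketched as a greedy loop. The scoring function is supplied by the caller. In practice it would typically be the cross-validated accuracy of the classifier on the candidate subset. The toy scorer in the usage example is purely hypothetical, as are the function names. The search stops when no remaining feature improves the score.

```python
def sequential_forward_selection(features, evaluate, max_features=None):
    """Greedy wrapper search: add the feature that most improves evaluate(subset).

    `features` may be pre-ranked (e.g. by information gain); on ties the
    earlier-listed feature is preferred. Returns (selected, best_score).
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    limit = max_features or len(remaining)
    while remaining and len(selected) < limit:
        best_f, best_new = None, best_score
        for f in remaining:
            score = evaluate(selected + [f])
            if score > best_new:  # strict: only a real improvement counts
                best_f, best_new = f, score
        if best_f is None:
            break  # no remaining feature improves the subset
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = best_new
    return selected, best_score


# Usage with a hypothetical scorer: features 0 and 2 help, each feature costs 0.01.
usefulness = {0: 0.5, 2: 0.3}


def toy_score(subset):
    return sum(usefulness.get(f, 0) for f in subset) - 0.01 * len(subset)


s2, score = sequential_forward_selection([0, 1, 2, 3], toy_score)
```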