CN114172748A - Encrypted malicious traffic detection method - Google Patents

Publication number
CN114172748A
Authority
CN
China
Prior art keywords
feature
features
model
feature subset
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210124869.1A
Other languages
Chinese (zh)
Other versions
CN114172748B (en)
Inventor
霍跃华
赵法起
李晓宇
裴超
曹洪治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Application filed by China University of Mining and Technology Beijing CUMTB
Priority to CN202210124869.1A
Publication of CN114172748A
Application granted
Publication of CN114172748B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/142: Network analysis or design using statistical or mathematical methods
    • H04L41/16: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L63/20: Network architectures or network communication protocols for network security for managing network security; network security policies in general

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Algebra (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an encrypted malicious traffic detection method. The method processes traffic packets with the Wireshark tool; filters out packets with invalid IP checksums, preprocesses the sample set, and marks malicious/benign labels; performs preliminary feature extraction on the preprocessed traffic packets; constructs 3 feature subsets from the preliminarily extracted features and standardizes and encodes them; performs feature dimensionality reduction on each feature subset using a machine learning method or principal component analysis; establishes a random forest, an XGBoost, and a Gaussian naive Bayes classifier model for the 3 feature subsets respectively; combines the 3 classifier models according to a Stacking strategy to form a DMMFC detection model; fuses the 3 feature subsets through flow fingerprints to form a sample set, divides the sample set into a training set and a test set, and trains the model; and tests the model, evaluating the DMMFC model with accuracy, F1 score, and false alarm rate. Because the method combines multi-feature fusion with the Stacking strategy, it has a strong capability to detect encrypted malicious traffic.

Description

Encrypted malicious traffic detection method
Technical Field
The invention belongs to the field of encrypted malicious traffic detection in data identification, and particularly relates to a double-layer multi-model fusion (DMMFC) method for detecting encrypted malicious traffic based on a Stacking strategy.
Background
In recent years, every industry has been moving toward digitalization, network attack methods have diversified, and security incidents such as data leakage and ransomware occur frequently. To protect users accessing the Internet, many websites have adopted transport encryption protocols. According to a Sophos News report, the proportion of web pages loaded by Chrome with encryption enabled rose from 40% in 2014 to 98% in 2021. Unfortunately, while legitimate traffic is encrypted, malicious traffic also uses the TLS encryption protocol to mask attacks. In 2020, 23% of detected malware that communicated with remote systems over the Internet used Transport Layer Security (TLS); today, this ratio is close to 46%.
Decryption-based detection has the drawbacks of high hardware overhead, long processing time, and high cost, and it runs counter to the original purpose of protecting users' online privacy. As the computing power of computers has improved significantly in recent years, many researchers have started to identify malicious traffic in networks without decryption by using information entropy, machine learning, or deep learning. The general idea of machine-learning-based malicious traffic detection is as follows: malicious and normal traffic are obtained, features are extracted according to specific rules, a feature matrix and a training set are constructed, a machine learning model is established, the training set is input into the model for training, and after training the model is used to detect malicious traffic. Deep-learning-based methods are similar, differing in that the trained network is built from interconnected neurons; common networks include deep neural networks, one-dimensional convolutional neural networks, and long short-term memory (LSTM) networks. Researchers have also obtained good results by converting the features into images and using a two-dimensional convolutional neural network.
Conventional methods based on deep packet inspection analyze the underlying content of data packets, which on the one hand violates users' online privacy and on the other hand yields a high false alarm rate that burdens network security practitioners. Today, machine-learning-based malicious traffic detection has become the mainstream research approach, but the detection of TLS encrypted traffic has the following problems: (1) TLS encrypted traffic has many types of features, and a single machine learning model is not suited to handling multiple heterogeneous features; (2) the detection of TLS encrypted malicious traffic has a low recall rate and a high false alarm rate.
Disclosure of Invention
In view of the defects and shortcomings of the prior art, the invention provides a DMMFC encrypted malicious traffic detection method based on a Stacking strategy. The method considers flow features, connection features, DNS response features, HTTP background features, and TLS handshake features during detection, synthesizes traffic behavior, and combines the Stacking strategy to solve the above problems.
The technical route of the invention is to extract flow features, connection features, DNS response features, HTTP background features, and TLS handshake features in order to detect encrypted malicious traffic within mixed traffic without decryption. The technical idea is as follows: complete pcap data packets are obtained and preprocessed, malicious/benign traffic is labeled, and features are extracted and divided into 3 feature subsets according to feature category; the feature subsets are standardized, encoded, and reduced in dimensionality; a classifier model is designed for each of the 3 feature subsets; the 3 dimension-reduced feature subsets are fused according to flow fingerprints to construct a sample set, which is shuffled and divided into a training set and a test set; the 3 classifier models and 1 logistic regression model are combined according to a Stacking strategy to form a DMMFC detection model; the training set is input into the DMMFC model for training, and the test set is used to check the DMMFC detection model; the performance of the model is checked according to accuracy, F1 score, and false alarm rate.
According to the above technical idea, the technical scheme for achieving the purpose of the invention comprises the following steps:
first, original traffic packets are obtained:
(1) malicious traffic generated by 7 kinds of malware during attacks is collected with the Wireshark tool, and the 7 kinds of malicious traffic packets are merged into one malicious traffic packet;
(2) benign traffic under normal conditions is collected with the Wireshark tool to obtain a benign traffic packet;
further, data preprocessing is carried out: invalid IP checksums in the traffic packets are filtered out to obtain malicious and benign traffic packets usable for data analysis;
further, the traffic packets are analyzed with the Zeek tool and features are extracted to obtain flow features, connection features, DNS response features, HTTP background features, and TLS handshake features, and malicious/benign labels are marked; the TLS handshake features comprise a Client Hello part, a Server Hello part, and a certificate verification part;
further, according to feature type, the extracted flow features and connection features are combined into a flow feature subset; the extracted DNS response features, HTTP background features, and the Client Hello and Server Hello parts of the TLS handshake features form a protocol feature subset; and the certificate verification part of the TLS handshake features forms a certificate feature subset;
further, the 3 feature subsets are standardized and encoded, and feature dimensionality reduction is carried out using a feature importance evaluation method and principal component analysis:
(1) after the flow feature subset is standardized, a 101-dimensional standard flow feature subset is obtained; the feature importance of each dimension is determined and sorted, and the features whose importance reaches the set feature importance threshold are taken as the new flow feature subset X1;
(2) after the protocol feature subset is one-hot encoded, a 117-dimensional sparse protocol feature subset is obtained; the cumulative maximum feature contribution rate is set to ε ≥ 90%, the feature dimensionality is reduced to 4 by principal component analysis, and the 4-dimensional features are fused with the TLS version number features to obtain 7-dimensional features, forming the new protocol feature subset X2;
(3) after the certificate feature subset is one-hot encoded, a 2874-dimensional sparse certificate feature subset is obtained; the cumulative maximum feature contribution rate is set to ε ≥ 90%, and feature dimensionality reduction by principal component analysis yields 120-dimensional features, forming the new certificate feature subset X3;
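The way a cumulative contribution rate fixes the number of principal components can be sketched as follows. This is a minimal illustration only: the function name `components_for_contribution` and the eigenvalues are hypothetical, and the patent does not disclose the actual variance spectrum of its feature subsets.

```python
def components_for_contribution(eigenvalues, epsilon=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches epsilon."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= epsilon:
            return k
    return len(ordered)

# Hypothetical eigenvalues of a covariance matrix (not from the patent).
eigenvalues = [5.0, 2.5, 1.0, 0.8, 0.4, 0.2, 0.1]
k = components_for_contribution(eigenvalues)  # cumulative ratios: 0.50, 0.75, 0.85, 0.93
```

With these made-up values the 90% threshold is first crossed at the fourth component, so 4 components would be kept.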
further, a classifier model is established for each of the dimension-reduced feature subsets X1, X2, and X3, the model parameters are adjusted, and the models are trained:
(1) for the dimension-reduced standard flow feature subset X1, a random forest classifier model is established;
(2) for the dimension-reduced sparse protocol feature subset X2, an XGBoost classifier model is established;
(3) for the dimension-reduced sparse certificate feature subset X3, a Gaussian naive Bayes classifier model is established;
(4) during parameter adjustment, the parameters of the random forest classifier and Gaussian naive Bayes classifier models are adjusted by the controlled variable method, and the parameters of the XGBoost classifier model are adjusted by grid search;
(5) the 3 models are trained;
further, the 3 dimension-reduced feature subsets are fused through flow fingerprints and, together with their label values Y, form a sample set, which is divided into a training set and a test set at a ratio of 7:3;
further, the 3 classifier models and 1 logistic regression model are combined into a DMMFC detection model through a Stacking strategy:
(1) the first-layer network of the DMMFC detection model consists of a random forest classifier, an XGBoost classifier, and a Gaussian naive Bayes classifier;
(2) the second-layer network of the DMMFC detection model consists of 1 logistic regression model;
(3) the two layers of networks are combined according to the Stacking strategy to form the DMMFC detection model;
further, the performance of the encrypted malicious traffic detection model is evaluated using accuracy, F1 score, and false alarm rate.
Compared with the prior art, the invention has the beneficial effects that:
1. TLS encrypted malicious traffic can be accurately detected in a real network environment with a low false alarm rate and few misdetected samples, which reduces the burden on network traffic analysts and helps users respond in time;
2. because different machine learning models suit different types of data, the extracted features are divided into 3 feature subsets and a suitable classifier model is established for each, which improves classification accuracy;
3. the first-layer network of the DMMFC detection model is trained per feature dimension range, so that each designed machine learning model is trained effectively and encrypted malicious traffic is detected accurately; the second-layer network uses a single logistic regression model, which prevents overfitting, reduces the overall complexity of the detection model, and improves detection efficiency.
Drawings
To describe the technical solution of the present invention more clearly, the drawings used in the present invention are briefly described below; the drawings only illustrate embodiments of the present invention and are not to be construed as limiting it.
Fig. 1 is a flowchart of the DMMFC encrypted malicious traffic detection method based on a Stacking strategy according to an embodiment of the present invention;
Fig. 2 is a design diagram of the DMMFC encrypted malicious traffic detection method based on a Stacking strategy according to an embodiment of the present invention;
Fig. 3 is a flowchart of encrypted traffic processing according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the DMMFC model based on a Stacking strategy according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention; they are intended to explain the invention further so that those skilled in the art can understand it clearly and thoroughly, and not to limit it.
As shown in Figs. 1 to 4, the encrypted malicious traffic detection method according to an embodiment of the present invention is designed as follows: the original traffic packets are divided into benign and malicious traffic packets; feature extraction, feature subset construction, feature encoding, and dimensionality reduction are performed; a classifier model is established for each dimension-reduced feature subset; the 3 classifier models and 1 logistic regression model are combined through a Stacking strategy to form a DMMFC detection model; and training and prediction are performed using a training set and a test set.
As shown in Fig. 2, the encrypted malicious traffic detection method provided in an embodiment of the present invention includes the following steps:
Step 1, obtaining original pcap traffic packets:
(1) traffic generated by 7 types of malware running in a network communication environment is collected with the Wireshark tool and merged into an original pcap traffic packet of malicious traffic; this ensures the diversity of encrypted malicious traffic and prevents the model from overfitting to a single category of encrypted malicious traffic;
(2) benign traffic under normal conditions is collected with the Wireshark tool to obtain an original pcap traffic packet of benign traffic;
Step 2, data preprocessing:
invalid IP checksums in the traffic packets are filtered out and the captured traffic is preprocessed, yielding a malicious traffic packet containing 653633 flows (of which 35552 are TLS encrypted) and a benign traffic packet containing 314733 flows (of which 51703 are TLS encrypted);
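The patent performs this filtering with Wireshark and does not describe the check itself. As a hedged illustration of what an "invalid IP checksum" means, the sketch below verifies the standard IPv4 header checksum (a 16-bit one's-complement sum over the header). The function name `ipv4_header_checksum_ok` is an assumption, and the sample header is a commonly used textbook example rather than data from the patent.

```python
import struct

def ipv4_header_checksum_ok(header: bytes) -> bool:
    """Return True if the one's-complement sum over the IPv4 header,
    including its checksum field, folds to 0xFFFF."""
    if len(header) < 20 or len(header) % 2:
        return False
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

# 20-byte example header whose checksum field is 0xB861.
good = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
# Same header with the checksum field corrupted to 0xB862.
bad = bytes.fromhex("45000073000040004011b862c0a80001c0a800c7")

valid_headers = [h for h in (good, bad) if ipv4_header_checksum_ok(h)]
```

Only `good` survives the filter here; a packet whose header fails this check would be dropped before feature extraction.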
Step 3, feature extraction: the features of the preprocessed traffic are extracted and converted to obtain the feature vectors of the flows, as shown in Fig. 3:
the features of each flow in the two preprocessed traffic packets are extracted with the Zeek tool. The extracted features comprise: flow features, namely the arrival characteristics and arrival process of bidirectional flows; connection features, namely the transmission and connection establishment processes of the TCP and UDP protocols; DNS response features and HTTP background features, including the dialogs between initiators and responders; and TLS handshake features, including the Client Hello, Server Hello, and certificate verification parts. To facilitate statistics and analysis, the extracted features are stored in 5 separate log files; each flow is given a unique flow fingerprint that associates all behaviors of that flow, and each flow is labeled, with malicious flows represented by -1 and benign flows by 1;
Step 4, feature conversion: the features extracted in step 3 are grouped into 3 subsets by category:
(1) the flow feature subset consists of the flow features and connection features; its data are numerical and it has 101 dimensions in total;
(2) the protocol feature subset consists of the DNS response features, the HTTP background features, and the Client Hello and Server Hello parts of the TLS handshake features; its data are text and it has 21 dimensions;
(3) the certificate feature subset consists of the cipher suite, the certificate issuer, and the certificate subject in the TLS handshake features; its data are text and it has 3 dimensions.
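Text-type features such as these grow in width once they are one-hot encoded, which is why 21 text fields can become a 117-dimensional sparse subset. The sketch below shows the mechanism in plain Python; the field names and values are hypothetical examples, not the patent's actual Zeek fields.

```python
def fit_one_hot(rows, columns):
    """Collect the sorted distinct values of each text column."""
    return {c: sorted({r[c] for r in rows}) for c in columns}

def one_hot(row, vocab):
    """Encode one record as a 0/1 vector, one slot per known value."""
    vec = []
    for col, values in vocab.items():
        vec.extend(1 if row[col] == v else 0 for v in values)
    return vec

# Hypothetical records: 2 text fields with 2 and 3 distinct values.
rows = [
    {"tls_version": "TLSv12", "cipher": "TLS_AES_128_GCM_SHA256"},
    {"tls_version": "TLSv13", "cipher": "TLS_AES_256_GCM_SHA384"},
    {"tls_version": "TLSv12", "cipher": "TLS_CHACHA20_POLY1305_SHA256"},
]
vocab = fit_one_hot(rows, ["tls_version", "cipher"])
encoded = [one_hot(r, vocab) for r in rows]  # each vector has 2 + 3 = 5 slots
```

Each encoded vector contains exactly one 1 per original field, so the result is sparse, which motivates the subsequent dimensionality reduction.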
Step 5, standardizing, encoding, and reducing the dimensionality of the feature subsets:
because different machine learning models suit different types of data, the 3 feature subsets from step 4 are standardized and encoded separately so that each model performs well; in addition, to reduce the complexity of the detection model and improve its detection efficiency, feature dimensionality reduction is performed on the 3 standardized and encoded feature subsets;
(1) after the flow feature subset is normalized and standardized, a 101-dimensional standard flow feature subset is obtained, which together with its label values forms a sample set T1; T1 is input into a random forest classifier model for training; after training, the importance of each feature dimension is evaluated with a random forest feature importance evaluator; the feature importance threshold is set to 0.01, and the features whose importance is greater than or equal to 0.01 are taken as the new flow feature subset X1, which together with the label values forms a new sample set;
(2) after the protocol feature subset is one-hot encoded, a 117-dimensional sparse protocol feature subset is obtained, which together with its label values forms a sample set T2; feature dimensionality reduction is performed on it by principal component analysis with the cumulative maximum feature contribution rate set to ε ≥ 90%, yielding 4-dimensional features; these are fused with the encoded 3-dimensional TLS version number, which serves as an identifier of encrypted traffic, to obtain the 7-dimensional protocol feature subset X2, which together with its label values forms a new sample set;
(3) after the certificate feature subset is one-hot encoded, a 2874-dimensional sparse certificate feature subset is obtained, which together with its label values forms a sample set T3; feature dimensionality reduction is performed on it by principal component analysis with the cumulative maximum feature contribution rate set to ε ≥ 90%, yielding the 120-dimensional feature subset X3, which together with the label values forms a new sample set.
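The standardization and importance-threshold selection applied to the flow feature subset can be sketched as follows. This is an illustration under assumptions: z-score scaling is assumed as the standardization, the function names are hypothetical, and the importance values are made up in place of the output of a trained random forest.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Z-score each feature column; constant columns map to zeros."""
    scaled = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return scaled

def select_by_importance(importances, threshold=0.01):
    """Indices of features whose importance is at least the threshold."""
    return [i for i, w in enumerate(importances) if w >= threshold]

# Hypothetical feature columns and importances (not from the patent).
columns = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
scaled = standardize(columns)

importances = [0.40, 0.005, 0.30, 0.009, 0.286]
kept = select_by_importance(importances)  # indices 0, 2 and 4 pass 0.01
```

In the embodiment the same thresholding reduces 101 flow feature dimensions to the subset X1.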
Step 6, establishing classifier models for the 3 dimension-reduced feature subsets respectively, adjusting model parameters, and training the models:
(1) for the dimension-reduced flow feature subset X1, which is high-dimensional and has imbalanced samples, a Bagging-based random forest classifier model is established, which trains on sample sets in parallel following the random sampling principle; using the controlled variable method, the number of trees in the forest is set to 110 and the maximum depth of each tree to 20;
(2) for the dimension-reduced protocol feature subset X2, which has moderate dimensionality, a Boosting-based XGBoost classifier model is established, which trains on its sample set serially so that misclassified samples receive more attention; using grid search, the maximum depth of each tree is set to 20, the column sampling ratio to 0.8, and the number of iterations to 100;
(3) for the dimension-reduced certificate feature subset X3, a Gaussian naive Bayes classifier model is established on the assumption that the sample set follows a Gaussian distribution; decisions are made on this basis [decision formula given as an image in the original; not recoverable], and the prior probabilities are computed by the maximum likelihood method;
(4) the 3 models are trained;
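A Gaussian naive Bayes classifier of the kind named in item (3) can be sketched in a few lines. The sketch assumes the usual maximum-posterior decision rule, which may differ in notation from the formula in the original patent image; the class name, the variance floor, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    def fit(self, X, y, var_floor=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            prior = len(rows) / len(y)  # maximum-likelihood class prior
            means = [sum(c) / len(rows) for c in zip(*rows)]
            variances = [
                sum((v - m) ** 2 for v in c) / len(rows) + var_floor
                for c, m in zip(zip(*rows), means)
            ]
            self.params[label] = (prior, means, variances)
        return self

    def _log_posterior(self, row, label):
        """Log prior plus per-feature Gaussian log likelihoods."""
        prior, means, variances = self.params[label]
        score = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_posterior(r, c)) for r in X]

# Toy data: label 1 = benign, -1 = malicious, as in the description.
X_toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
y_toy = [1, 1, 1, -1, -1, -1]
model = GaussianNaiveBayes().fit(X_toy, y_toy)
predicted = model.predict([[0.1, 0.1], [2.0, 2.0]])
```

The prior of each class is simply its relative frequency in the training data, which is the maximum likelihood estimate mentioned in the description.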
Step 7, feature fusion:
the 3 dimension-reduced feature subsets X1, X2, and X3 are fused through flow fingerprints; the fused features have 155 dimensions and, together with their label values Y, form a sample set T, which is divided into a training set and a test set at a ratio of 7:3;
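Fusion by flow fingerprint amounts to joining the three per-flow vectors on a shared key and concatenating them (28 + 7 + 120 = 155 dimensions), followed by a shuffled 7:3 split. The sketch below illustrates this with placeholder data; the function names, the `flow0`-style fingerprints, and the fixed random seed are assumptions.

```python
import random

def fuse_by_fingerprint(x1, x2, x3, labels):
    """Concatenate per-flow vectors that share a flow fingerprint."""
    shared = sorted(set(x1) & set(x2) & set(x3) & set(labels))
    features = [x1[uid] + x2[uid] + x3[uid] for uid in shared]
    return shared, features, [labels[uid] for uid in shared]

def shuffle_split(features, targets, train_ratio=0.7, seed=0):
    """Shuffle sample indices, then cut them into train and test parts."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    cut = round(len(order) * train_ratio)
    def pick(idx):
        return [features[i] for i in idx], [targets[i] for i in idx]
    return pick(order[:cut]), pick(order[cut:])

# Hypothetical placeholder data: 10 flows with 28-, 7- and 120-dim subsets.
uids = [f"flow{i}" for i in range(10)]
x1 = {u: [0.0] * 28 for u in uids}
x2 = {u: [0.0] * 7 for u in uids}
x3 = {u: [0.0] * 120 for u in uids}
labels = {u: (-1 if i % 2 else 1) for i, u in enumerate(uids)}

_, X, Y = fuse_by_fingerprint(x1, x2, x3, labels)
(train_X, train_Y), (test_X, test_Y) = shuffle_split(X, Y)
```

Flows missing from any of the three subsets are dropped by the set intersection, which is one possible policy; the patent does not say how such flows are handled.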
Step 8, combining the 3 models according to a Stacking strategy; as shown in Fig. 4, the structure and training mechanism of the DMMFC detection model are as follows:
(1) the first-layer network of the DMMFC detection model consists of a random forest classifier, an XGBoost classifier, and a Gaussian naive Bayes classifier; to make full use of each machine learning model, the first-layer network is trained per feature dimension range: the random forest classifier is trained on dimensions 1-28 of the training sample set, the XGBoost classifier on dimensions 29-35, and the Gaussian naive Bayes classifier on dimensions 36-155, each with five-fold cross-validation; the training result of each first-layer classifier is used as one feature dimension to reconstruct the features while the label value of each sample is kept unchanged, and the reconstructed features and labels form a new sample set that is input into the second-layer network;
(2) to prevent overfitting, the second-layer network of the DMMFC detection model consists of 1 logistic regression model, which fits the new sample set produced by training the first-layer network;
(3) the two layers of networks are combined according to the Stacking strategy to form the DMMFC detection model;
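The two-layer training mechanism can be sketched in plain Python as follows. This is a structural illustration under stated assumptions, not the patent's implementation: a nearest-centroid learner stands in for all three first-layer classifiers (random forest, XGBoost, Gaussian naive Bayes), the logistic regression is a small gradient-descent version, folds are contiguous (so data are assumed to be shuffled beforehand), and the toy data are made up. Only the column ranges and the five-fold scheme come from the description.

```python
import math

def k_fold_indices(n, k=5):
    """Yield (train, holdout) index lists for k contiguous folds."""
    bounds = [n * i // k for i in range(k + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        yield list(range(lo)) + list(range(hi, n)), list(range(lo, hi))

def out_of_fold(fit, X, y, cols, k=5):
    """Out-of-fold predictions of one base learner on its column slice."""
    view = [row[cols] for row in X]
    meta = [None] * len(X)
    for train, holdout in k_fold_indices(len(X), k):
        predict = fit([view[i] for i in train], [y[i] for i in train])
        for i in holdout:
            meta[i] = predict(view[i])
    return meta

def fit_centroid(X, y):
    """Hypothetical stand-in base learner: nearest class centroid."""
    centroids = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
    def predict(row):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
        return min(centroids, key=dist)
    return predict

def fit_logistic(M, y, lr=0.5, epochs=200):
    """Second layer: logistic regression trained by plain SGD."""
    w, b = [0.0] * len(M[0]), 0.0
    for _ in range(epochs):
        for row, t in zip(M, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - (1.0 if t == 1 else 0.0)
            w = [wi - lr * err * xi for wi, xi in zip(w, row)]
            b -= lr * err
    return lambda row: 1 if b + sum(wi * xi for wi, xi in zip(w, row)) >= 0 else -1

# Dimensions 1-28, 29-35 and 36-155 of the description, as 0-based slices.
SLICES = [slice(0, 28), slice(28, 35), slice(35, 155)]
y = [1 if i % 2 == 0 else -1 for i in range(20)]   # toy labels
X = [[float(t)] * 155 for t in y]                  # toy, trivially separable

columns = [out_of_fold(fit_centroid, X, y, s) for s in SLICES]
meta_features = [list(row) for row in zip(*columns)]  # one column per base learner
meta_model = fit_logistic(meta_features, y)
predictions = [meta_model(row) for row in meta_features]
```

Because each first-layer prediction for a sample comes from a model that never saw that sample, the second layer is fitted on unbiased first-layer outputs, which is the point of the five-fold scheme.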
Step 9, training the model and checking its performance:
(1) the training set is input into the DMMFC detection model to train the model;
(2) the test set is input into the trained DMMFC detection model for testing, and the model judges each sample to produce a final prediction result; a prediction result of 1 means the test sample is predicted to be benign traffic, and a prediction result of -1 means it is predicted to be malicious traffic;
(3) accuracy, F1 score, and false alarm rate are used to evaluate the performance of the DMMFC detection model; Table 1 shows the performance of each model proposed in the embodiment of the present invention.
Table 1: Performance of each model
Model                              Accuracy (%)   F1 score (%)   False alarm rate (%)
Random forest classifier           99.80          99.68          0.11
XGBoost classifier                 97.12          95.00          1.4
Gaussian naive Bayes classifier    14.2           24.82          0.12
DMMFC                              99.90          99.91          0.05
In summary, the DMMFC detection model outperforms each single classifier model in the encrypted malicious traffic detection method. First, among the falsely reported samples, the DMMFC detection model falsely reports 1 TLS encrypted malicious sample, whereas each of the other models falsely reports more than 50, which shows that the proposed method has a low false alarm rate and a high detection rate for TLS encrypted malicious traffic and reduces the workload of threat response personnel; second, the overall accuracy reaches 99.90%, which shows that the model can detect encrypted malicious traffic within mixed traffic and also detects unencrypted malicious samples well; third, the F1 score reaches 99.91%, which reflects a high harmonic mean of precision and recall and indicates that the model has both high precision and high recall.
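The three evaluation metrics can be computed from a confusion matrix as sketched below. The patent does not state which class is treated as positive or give the exact false alarm rate formula; the sketch assumes malicious (-1) is the positive class and false alarm rate = FP / (FP + TN). The function name and the sample predictions are hypothetical.

```python
def evaluate(y_true, y_pred, positive=-1):
    """Accuracy, F1 score and false alarm rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": f1,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical predictions: 4 malicious (-1) and 6 benign (1) samples.
y_true = [-1, -1, -1, -1, 1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, -1, 1, 1, 1, 1, 1, 1, -1]
scores = evaluate(y_true, y_pred)  # TP=3, FN=1, FP=1, TN=5
```

Under these assumptions a benign flow flagged as malicious counts as a false alarm, which matches the analyst-workload concern raised in the summary.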

Claims (7)

1. An encrypted malicious traffic detection method, characterized by comprising the following steps:
step one, capturing pcap traffic packets by using a Wireshark tool, and constructing an original encrypted traffic data set;
step two, filtering invalid IP checksums in the original data set, and marking malicious/benign labels;
step three, analyzing the pcap traffic packets by using a Zeek tool, and extracting flow features, connection features, DNS response features, HTTP background features, and TLS handshake features;
step four, constructing a flow feature subset, a protocol feature subset, and a certificate feature subset from the extracted flow features, connection features, DNS response features, HTTP background features, and TLS handshake features, and carrying out standardization and encoding;
step five, performing feature dimensionality reduction on the flow feature subset, the protocol feature subset, and the certificate feature subset by using machine learning and principal component analysis methods to obtain a standard flow feature subset, a sparse protocol feature subset, and a sparse certificate feature subset;
step six, establishing 3 classifier models for the standard flow feature subset, the sparse protocol feature subset, and the sparse certificate feature subset respectively, training the 3 classifier models, and adjusting the parameters of each of the 3 classifier models;
step seven, fusing the standard flow feature subset, the sparse protocol feature subset, and the sparse certificate feature subset through the flow fingerprint to form a labeled sample set, and dividing the sample set into a training set and a test set;
step eight, combining 1 logistic regression model and the 3 classifier models of step six through a Stacking strategy to form a DMMFC detection model, and training the DMMFC detection model;
step nine, inputting the test set into the trained DMMFC model for prediction, and evaluating the performance of the DMMFC detection model with 3 evaluation indexes: accuracy, F1 value, and false alarm rate.
2. The encrypted malicious traffic detection method according to claim 1, wherein
the standardization and encoding of the feature subsets comprises:
the stream feature subset comprises stream features and connection log features, and a standard stream feature subset with 101 dimensions is obtained after standardization;
the protocol feature subset comprises DNS response, HTTP background and TLS handshake features, and is encoded by one-hot encoding, and a sparse protocol feature subset with 117 dimensions is obtained after encoding;
the certificate feature subset comprises TLS certificate features and the encryption algorithm selected in the TLS handshake process, and is encoded by one-hot encoding, and a sparse certificate feature subset with 2874 dimensions is obtained after encoding.
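A minimal sketch of the two preprocessing operations named in claim 2 is given below. The claim does not say which standardization is used, so z-score scaling is an assumption here, and the helper names `standardize` and `one_hot` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one numeric feature column (assumed form of 'standardization')."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def one_hot(column):
    """One-hot encode one categorical column.

    Categories are sorted so the output column order is reproducible.
    Returns the category list and one 0/1 row per input value.
    """
    categories = sorted(set(column))
    position = {c: i for i, c in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows
```

Each distinct categorical value becomes its own column, which is why the encoded protocol and certificate subsets grow to 117 and 2874 dimensions and are described as sparse.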
3. The encrypted malicious traffic detection method according to claim 2, wherein the dimension reduction and fusion of the standard stream feature subset, the sparse protocol feature subset and the sparse certificate feature subset comprises:
for the standard stream feature subset, evaluating the feature of each dimension in the subset with a random forest feature importance evaluator, and selecting the features whose feature importance is greater than or equal to 0.01, to obtain 28-dimensional features;
for the sparse protocol feature subset, setting the cumulative maximum feature importance contribution rate ε to be greater than or equal to 90%, performing feature dimension reduction by principal component analysis to obtain 4-dimensional features, and adding the label feature of TLS encrypted traffic, so that the feature set has 7 dimensions;
for the sparse certificate feature subset, setting the cumulative maximum feature importance contribution rate ε to be greater than or equal to 90%, and performing feature dimension reduction by principal component analysis, to obtain 120-dimensional features;
and fusing the 3 feature subsets subjected to dimension reduction according to the stream fingerprint, wherein the fused sample set has 155-dimensional features.
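The two selection rules of claim 3 and the resulting dimension count can be illustrated as follows. This sketch assumes the importance scores and the per-component contribution rates have already been computed by a random forest and by principal component analysis; the function and constant names are hypothetical.

```python
def select_by_importance(importances, threshold=0.01):
    """Indices of the features whose importance is >= threshold."""
    return [i for i, w in enumerate(importances) if w >= threshold]


def components_for_contribution(contribution_rates, epsilon=0.90):
    """Smallest number of leading components whose cumulative rate reaches epsilon."""
    total = 0.0
    # PCA normally reports rates in descending order; sorting is a safeguard.
    for k, rate in enumerate(sorted(contribution_rates, reverse=True), start=1):
        total += rate
        if total >= epsilon:
            return k
    return len(contribution_rates)


# Dimension bookkeeping taken from claims 3 and 4.
STREAM_DIMS = 28          # random forest importance >= 0.01
PROTOCOL_DIMS = 4 + 3     # 4 principal components + 3-dimensional TLS label feature
CERTIFICATE_DIMS = 120    # principal components of the certificate subset
FUSED_DIMS = STREAM_DIMS + PROTOCOL_DIMS + CERTIFICATE_DIMS  # 155
```

The sum 28 + 7 + 120 = 155 matches the fused sample set stated in the claim.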
4. The label feature of TLS encrypted traffic according to claim 3, wherein
based on the extracted TLS handshake features, the TLS version number feature is extracted from each TLS encrypted flow, 3-dimensional features are obtained after encoding, and these features are fused with the dimension-reduced sparse protocol feature subset.
5. The encrypted malicious traffic detection method according to claim 1, wherein the classifier models respectively established for the standard stream feature subset, the sparse protocol feature subset and the sparse certificate feature subset, and their parameter adjustment, comprise:
establishing a random forest classifier model for the standard stream feature subset; establishing an XGBoost classifier model for the sparse protocol feature subset; establishing a Gaussian naive Bayes classifier model for the sparse certificate feature subset;
and the parameter adjustment comprises adjusting the parameters of the random forest classifier model and the Gaussian naive Bayes classifier model by a control variable method, and adjusting the parameters of the XGBoost classifier model by a grid search method.
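The difference between the two tuning strategies in claim 5 can be sketched generically. The claim gives no parameter names, ranges or scoring function, so everything below is illustrative: `score` stands for any callable that returns a validation score for a parameter dictionary, and the parameter names in the usage note are assumptions.

```python
from itertools import product


def grid_search(score, grid):
    """Score every combination of candidate values and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def control_variable_search(score, grid, start):
    """Tune one parameter at a time while all other parameters stay fixed."""
    params = dict(start)
    for name, candidates in grid.items():
        # Vary only `name`; the remaining entries of `params` are held constant.
        params[name] = max(candidates, key=lambda v: score({**params, name: v}))
    return params, score(params)
```

With a grid such as `{"max_depth": [3, 6, 9], "n_estimators": [100, 200, 300]}`, grid search evaluates all 9 combinations, while the control variable method evaluates 3 + 3 candidates. The second is cheaper but can miss interactions between parameters.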
6. The encrypted malicious traffic detection method according to claim 1, wherein in the DMMFC detection model:
the first-layer network consists of a random forest classifier, an XGBoost classifier and a Gaussian naive Bayes classifier; the second-layer network consists of 1 logistic regression model.
7. The encrypted malicious traffic detection method according to claim 1, wherein the DMMFC detection model is trained in a manner of:
the first-layer network of the DMMFC detection model is trained by feature dimension, namely the random forest classifier is trained on the 1st to 28th dimension features of the training sample set, the XGBoost classifier on the 29th to 35th dimension features, and the Gaussian naive Bayes classifier on the 36th to 155th dimension features; the first-layer network is trained by five-fold cross validation, the training results of the first-layer network are input into the second-layer network as new features, and a logistic regression model is used for fitting.
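The training scheme of claim 7 can be sketched as below. This is a sketch under stated assumptions rather than the patented implementation: folds are contiguous and unshuffled for brevity, each entry of `fits` is a hypothetical callable `fit(X_train, y_train)` that returns a `predict(row)` function, and the resulting rows would then be fitted by the second-layer logistic regression model.

```python
def column_slices():
    """0-based, end-exclusive ranges for dimensions 1-28, 29-35 and 36-155."""
    return {"random_forest": (0, 28), "xgboost": (28, 35), "gaussian_nb": (35, 155)}


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit, X, y, k=5):
    """Train on k-1 folds and predict the held-out fold, for every fold."""
    oof = [None] * len(X)
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(X[i])
    return oof


def meta_features(fits, X, y, k=5):
    """Second-layer input: one out-of-fold column per first-layer classifier."""
    columns = []
    for name, (lo, hi) in column_slices().items():
        view = [row[lo:hi] for row in X]  # each classifier sees only its own dimensions
        columns.append(out_of_fold_predictions(fits[name], view, y, k))
    return [list(row) for row in zip(*columns)]
```

Because every prediction passed to the second layer comes from a model that never saw that sample during fitting, the logistic regression model is fitted on outputs that are not inflated by first-layer overfitting.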
CN202210124869.1A 2022-02-10 2022-02-10 Encrypted malicious traffic detection method Active CN114172748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210124869.1A CN114172748B (en) 2022-02-10 2022-02-10 Encrypted malicious traffic detection method

Publications (2)

Publication Number Publication Date
CN114172748A (en) 2022-03-11
CN114172748B (en) 2022-04-15

Family

ID=80489613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210124869.1A Active CN114172748B (en) 2022-02-10 2022-02-10 Encrypted malicious traffic detection method

Country Status (1)

Country Link
CN (1) CN114172748B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577942A (en) * 2017-08-22 2018-01-12 中国民航大学 A kind of composite character screening technique for Android malware detection
US20180152467A1 (en) * 2016-11-30 2018-05-31 Cisco Technology, Inc. Leveraging synthetic traffic data samples for flow classifier training
CN110113349A (en) * 2019-05-15 2019-08-09 北京工业大学 A kind of malice encryption traffic characteristics analysis method
CN111277578A (en) * 2020-01-14 2020-06-12 西安电子科技大学 Encrypted flow analysis feature extraction method, system, storage medium and security device
CN113259313A (en) * 2021-03-30 2021-08-13 浙江工业大学 Malicious HTTPS flow intelligent analysis method based on online training algorithm
CN113660210A (en) * 2021-07-20 2021-11-16 北京天融信网络安全技术有限公司 Malicious TLS encrypted traffic detection model training method, detection method and terminal
CN113965390A (en) * 2021-10-26 2022-01-21 杭州安恒信息技术股份有限公司 Malicious encrypted traffic detection method, system and related device

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553591A (en) * 2022-03-21 2022-05-27 北京华云安信息技术有限公司 Training method of random forest model, abnormal flow detection method and device
CN114553591B (en) * 2022-03-21 2024-02-02 北京华云安信息技术有限公司 Training method of random forest model, abnormal flow detection method and device
CN114785563A (en) * 2022-03-28 2022-07-22 中国矿业大学(北京) Encrypted malicious flow detection method for soft voting strategy
CN114629718A (en) * 2022-04-07 2022-06-14 浙江工业大学 Hidden malicious behavior detection method based on multi-model fusion
CN114938290A (en) * 2022-04-22 2022-08-23 北京天际友盟信息技术有限公司 Information detection method, device and equipment
CN114553605A (en) * 2022-04-26 2022-05-27 中国矿业大学(北京) Encrypted malicious flow detection method for voting strategy
CN115001763A (en) * 2022-05-20 2022-09-02 北京天融信网络安全技术有限公司 Phishing website attack detection method and device, electronic equipment and storage medium
CN115001763B (en) * 2022-05-20 2024-03-19 北京天融信网络安全技术有限公司 Phishing website attack detection method and device, electronic equipment and storage medium
CN115174170B (en) * 2022-06-23 2023-05-09 东北电力大学 VPN encryption flow identification method based on ensemble learning
CN115174170A (en) * 2022-06-23 2022-10-11 东北电力大学 VPN encrypted flow identification method based on ensemble learning
CN115314268A (en) * 2022-07-27 2022-11-08 天津市国瑞数码安全系统股份有限公司 Malicious encrypted traffic detection method and system based on traffic fingerprints and behaviors
CN115314268B (en) * 2022-07-27 2023-12-12 天津市国瑞数码安全系统股份有限公司 Malicious encryption traffic detection method and system based on traffic fingerprint and behavior
CN115051874B (en) * 2022-08-01 2022-12-09 杭州默安科技有限公司 Multi-feature CS malicious encrypted traffic detection method and system
CN115051874A (en) * 2022-08-01 2022-09-13 杭州默安科技有限公司 Multi-feature CS malicious encrypted traffic detection method and system
CN115632875A (en) * 2022-11-29 2023-01-20 湖北省楚天云有限公司 Malicious flow detection method and system based on multi-feature fusion and real-time analysis
CN116055201A (en) * 2023-01-16 2023-05-02 中国矿业大学(北京) Multi-view encryption malicious traffic detection method based on collaborative training
CN116055201B (en) * 2023-01-16 2023-09-01 中国矿业大学(北京) Multi-view encryption malicious traffic detection method based on collaborative training
CN116346452A (en) * 2023-03-17 2023-06-27 中国电子产业工程有限公司 Multi-feature fusion malicious encryption traffic identification method and device based on stacking
CN116346452B (en) * 2023-03-17 2023-12-01 中国电子产业工程有限公司 Multi-feature fusion malicious encryption traffic identification method and device based on stacking

Also Published As

Publication number Publication date
CN114172748B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN114172748B (en) Encrypted malicious traffic detection method
Kim et al. AI-IDS: Application of deep learning to real-time Web intrusion detection
Salo et al. Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection
CN112398779B (en) Network traffic data analysis method and system
CN112738039B (en) Malicious encrypted flow detection method, system and equipment based on flow behavior
CN110381079B (en) Method for detecting network log abnormity by combining GRU and SVDD
Elnakib et al. EIDM: Deep learning model for IoT intrusion detection systems
CN114785563B (en) Encryption malicious traffic detection method of soft voting strategy
CN113904881B (en) Intrusion detection rule false alarm processing method and device
CN111695597A (en) Credit fraud group recognition method and system based on improved isolated forest algorithm
CN113194064B (en) Webshell detection method and device based on graph convolution neural network
Yu et al. Detecting malicious web requests using an enhanced textcnn
Gonaygunta Machine learning algorithms for detection of cyber threats using logistic regression
CN112884121A (en) Traffic identification method based on generation of confrontation deep convolutional network
CN110519228B (en) Method and system for identifying malicious cloud robot in black-production scene
Alqarni et al. Improving intrusion detection for imbalanced network traffic using generative deep learning
Montes et al. Web application attacks detection using deep learning
Kasim Automatic detection of phishing pages with event-based request processing, deep-hybrid feature extraction and light gradient boosted machine model
US20230164180A1 (en) Phishing detection methods and systems
CN115987544A (en) Network security threat prediction method and system based on threat intelligence
Li et al. A Method for Network Intrusion Detection Based on GAN-CNN-BiLSTM
Nandakumar et al. A Novel Approach to User Agent String Parsing for Vulnerability Analysis Using Multi-Headed Attention
CN117807590B (en) Information security prediction and monitoring system and method based on artificial intelligence
CN116582301B (en) Industrial control network abnormal flow detection method, system and computer readable storage medium based on Laplacian pyramid
CN117614742B (en) Malicious traffic detection method with enhanced honey point perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant