CN116032588A - Abnormal encryption flow identification method based on feature selection - Google Patents
- Publication number
- CN116032588A (application CN202211662536.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- sample data
- abnormal
- traffic
- encrypted
- Legal status
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a method for identifying abnormal encrypted traffic based on feature selection. The method comprises: collecting relevant encrypted traffic sample data, including abnormal encrypted traffic samples, according to the purpose the system is configured to achieve; cleaning invalid data from the collected samples and labeling them; extracting features of different dimensions from the cleaned samples; evaluating and screening the extracted features; training a model for identifying abnormal encrypted network traffic on the screened feature subset to generate a trained model; and testing the trained model and evaluating its effectiveness until a satisfactory model is obtained. By optimizing the objective fitness function of the PSO (particle swarm optimization) algorithm, the invention improves feature selection for abnormal encrypted traffic identification, so that features can be screened from sample traffic quickly and accurately, the identification accuracy and generalization ability of the model are improved, and a more efficient way of identifying abnormal encrypted network traffic is provided.
Description
Technical Field
The invention relates to the field of network security, and in particular to a method for identifying abnormal encrypted traffic based on feature selection.
Background
Encrypted Internet traffic continues to grow. The "encisa encrypted traffic analysis report" indicates that about 87% of Windows users browsing with Chrome access websites over HTTPS, compared with about 78% of Linux users and up to about 93% of Mac users. From the perspective of end-user privacy this is good news: encryption safeguards the confidentiality and integrity of data and improves security and privacy overall. However, privacy protection also gives attackers an opening to exploit: encryption hides malicious traffic just as it hides any other information, and brings a series of worms, trojans and viruses onto the network.
Existing methods for identifying malicious encrypted traffic mainly include:
using TLS fingerprints (JA3/JA3S) to identify malicious traffic, a technique widely used in threat intelligence;
decrypting the encrypted traffic through an SSL proxy in order to identify malicious encrypted traffic.
As Internet traffic becomes encrypted, the effectiveness of traditional rule-based monitoring, detection and control drops significantly, leaving enterprises with blind spots in managing network information security. Tools such as application firewalls, intrusion detection and prevention systems (IDPS) and data loss prevention (DLP) tools rely heavily on identifying and analyzing unencrypted traffic.
As TLS 1.3 becomes more widespread, part of the handshake between client and server (including the server certificate) is also encrypted, which severely challenges the identification of malicious encrypted traffic by TLS fingerprinting.
Offloading encrypted traffic through an SSL proxy requires substantial changes to the existing network and high-performance proxy servers, so this approach is difficult to deploy widely.
Disclosure of Invention
To address these shortcomings of the background art, the feature-selection-based method of the invention screens features from sample traffic quickly and accurately, improves the identification accuracy and generalization ability of the model, and provides a more efficient way to identify abnormal encrypted network traffic.
The invention solves the technical problem with the following technical solution:
a method for identifying abnormal encrypted traffic based on feature selection, comprising the following steps:
Step 1, collect relevant encrypted traffic sample data, including abnormal encrypted traffic samples, according to the purpose the system is configured to achieve;
Step 2, clean invalid data from the collected encrypted traffic samples, label the samples, and perform similar operations;
Step 3, extract features of different dimensions from the cleaned encrypted traffic samples, for example session attributes such as total bytes, upstream and downstream bytes, total session duration, handshake (connection establishment) duration, average response time within the session, SSL certificate validity period and SSL certificate domain name;
Step 4, evaluate and screen each feature of the feature set with the RFE-PSO algorithm;
Step 5, train a model for identifying abnormal encrypted network traffic on the screened feature subset, generating a trained model;
Step 6, run a generalization test on the trained model and evaluate its real-world effectiveness; if the effect is poor, return to step 4, redo feature selection with the RFE-PSO algorithm, and iterate on the model until a satisfactory trained model is obtained.
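Step 3 lists session-level attributes without defining how they are computed. The following Python sketch shows one way to derive them from a parsed TLS session. The `Packet` and `Session` record types, the `handshake_done_ts` field, and the definition of "response time" as the gap from an upstream packet to the downstream packet that immediately follows it are all assumptions for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Packet:
    ts: float       # capture timestamp, seconds
    size: int       # bytes on the wire
    upstream: bool  # True for client -> server


@dataclass
class Session:
    packets: list             # Packet records for one TLS session
    handshake_done_ts: float  # time the handshake completed (hypothetical field)
    cert_not_before: datetime
    cert_not_after: datetime
    cert_domain: str


def extract_features(s: Session) -> dict:
    """Compute step-3 style session attributes for one session (illustrative)."""
    pkts = sorted(s.packets, key=lambda p: p.ts)
    start, end = pkts[0].ts, pkts[-1].ts
    up = sum(p.size for p in pkts if p.upstream)
    down = sum(p.size for p in pkts if not p.upstream)
    # Assumed definition of "response time": gap from an upstream packet
    # to the downstream packet that immediately follows it.
    gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])
            if a.upstream and not b.upstream]
    return {
        "total_bytes": up + down,
        "up_bytes": up,
        "down_bytes": down,
        "session_duration": end - start,
        "handshake_duration": s.handshake_done_ts - start,
        "mean_response_time": mean(gaps) if gaps else 0.0,
        "cert_validity_days": (s.cert_not_after - s.cert_not_before).days,
        "cert_domain": s.cert_domain,
    }
```

The certificate domain is returned as a string; a numeric classifier would need it encoded (for example hashed or one-hot encoded) before training.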
Preferably, step 4 specifically comprises the following:
the system treats feature selection as an optimization problem: the weight coefficient d of each feature takes the value 0 or 1, where 0 means the feature is discarded and 1 means it is selected. The feature evaluation module evaluates each candidate feature subset with the following objective fitness function;
the objective fitness function, formulated for the PSO algorithm, is as follows:
where d_i is the feature-selection coefficient vector of the i-th particle; α is the weight coefficient of detection accuracy; P is the accuracy of abnormal encrypted traffic detection; M is the feature dimension of the sample data before selection; and m is the dimension of the feature subset selected by d_i.
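The fitness formula itself is not reproduced in this text. As a sketch only, the code below uses a hypothetical stand-in built from the same quantities, f(d_i) = α(1 - P) + (1 - α)·m/M, which decreases when accuracy P rises or when fewer features m are kept. This weighted-sum form is a common choice for wrapper feature selection and is an assumption here, not the patent's formula.

```python
def fitness(d: list, accuracy: float, alpha: float = 0.9) -> float:
    """Hypothetical objective for a 0/1 feature mask d (lower is better).

    alpha weights the error term (1 - P); (1 - alpha) weights the
    fraction of features kept, m / M. NOT the patent's formula.
    """
    M = len(d)  # feature dimension before selection
    m = sum(d)  # dimension of the selected subset
    if m == 0:
        return float("inf")  # an empty subset cannot be used for detection
    return alpha * (1.0 - accuracy) + (1.0 - alpha) * m / M


print(fitness([1, 0, 1, 0], 0.9) < fitness([1, 1, 1, 1], 0.9))  # True
```

At equal accuracy, the two-feature mask scores lower (better) than the four-feature mask, which is the preference for compact subsets that the method relies on.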
Preferably, the key steps of the algorithm implementation are as follows:
Step 1, initialize the swarm size, i.e. the number M of particles;
Step 2, configure the swarm parameters, such as the number of iterations N, the inertia weight w, and each particle's velocity v_i and position x_i;
Step 3, compute the fitness value of each particle;
Step 4, update the velocity v_i and position x_i of the i-th particle d_i;
Step 5, update the local optimal solution P_i and the global optimal solution P_g: whenever P_i is better than P_g, set P_g equal to P_i;
Step 6, if the current iteration count is less than N, return to step 3;
Step 7, output the feature vector d_i corresponding to the global optimal solution.
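The seven steps above follow the structure of a binary particle swarm optimizer. The sketch below is one such optimizer, assuming the common sigmoid transfer rule, in which a velocity is mapped to the probability that a feature bit is 1. The acceleration coefficients `c1` and `c2`, the velocity clamp `v_max`, and all default values are assumptions; `evaluate` is a caller-supplied function that, in this setting, would train and test a classifier on the masked features and return a fitness value where lower is better.

```python
import math
import random
from typing import Callable, List


def binary_pso(n_features: int,
               evaluate: Callable[[List[int]], float],
               n_particles: int = 20,   # swarm size M (step 1)
               n_iter: int = 50,        # iteration count N (step 2)
               w: float = 0.7,          # inertia weight w (step 2)
               c1: float = 1.5,         # pull toward P_i (assumed value)
               c2: float = 1.5,         # pull toward P_g (assumed value)
               v_max: float = 4.0,      # velocity clamp (assumption)
               seed: int = 0) -> List[int]:
    """Binary PSO feature selection; evaluate(mask) is minimized."""
    rng = random.Random(seed)
    # Steps 1-2: random 0/1 positions x_i and small random velocities v_i
    x = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    v = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    # Step 3: initial fitness; every particle starts as its own best P_i
    p_best = [row[:] for row in x]
    p_fit = [evaluate(row) for row in x]
    g = min(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g][:], p_fit[g]

    for _ in range(n_iter):  # step 6: repeat N times
        for i in range(n_particles):
            for j in range(n_features):
                # Step 4: velocity update, clamped, then sigmoid -> P(bit = 1)
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][j]
                       + c1 * r1 * (p_best[i][j] - x[i][j])
                       + c2 * r2 * (g_best[j] - x[i][j]))
                v[i][j] = max(-v_max, min(v_max, vel))
                prob_one = 1.0 / (1.0 + math.exp(-v[i][j]))
                x[i][j] = 1 if rng.random() < prob_one else 0
            f = evaluate(x[i])  # step 3 for the new position
            # Step 5: update P_i, and P_g whenever the new P_i beats it
            if f < p_fit[i]:
                p_best[i], p_fit[i] = x[i][:], f
                if f < g_fit:
                    g_best, g_fit = x[i][:], f
    return g_best  # step 7: feature mask of the global optimum
```

In practice `evaluate` dominates the cost, because each call trains a model; the swarm size M and iteration count N therefore trade search quality against training time.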
Compared with the prior art, the technical solution has the following beneficial effects:
1. By optimizing the objective fitness function of the PSO algorithm, the feature-selection-based method improves feature selection for abnormal encrypted traffic identification and can screen features from sample traffic quickly and accurately.
2. The feature-selection-based method improves the identification accuracy and generalization ability of the model and provides a more efficient way to identify abnormal encrypted network traffic.
Drawings
To illustrate the embodiments of the invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a general flow chart of a calibration software system;
FIG. 2 is a flow chart of feature selection based on the PSO algorithm.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIG. 1, the overall implementation flow of the system described above comprises the following steps:
Step 1, collect relevant encrypted traffic sample data, including abnormal encrypted traffic samples, according to the purpose the system is configured to achieve.
Step 2, clean invalid data from the collected encrypted traffic samples, label the samples, and perform similar operations.
Step 3, extract features of different dimensions from the cleaned encrypted traffic samples, for example session attributes such as total bytes, upstream and downstream bytes, total session duration, handshake (connection establishment) duration, average response time within the session, SSL certificate validity period and SSL certificate domain name.
Step 4, evaluate and screen each feature of the feature set with the RFE-PSO algorithm.
Step 5, train a model for identifying abnormal encrypted network traffic on the screened feature subset, generating a trained model.
Step 6, run a generalization test on the trained model and evaluate its real-world effectiveness; if the effect is poor, return to step 4, redo feature selection with the RFE-PSO algorithm, and iterate on the model until a satisfactory trained model is obtained.
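Steps 4 to 6 form a feedback loop: if the generalization test is unsatisfactory, feature selection is repeated. The sketch below shows that loop in Python. The nearest-centroid classifier, the `target_acc` threshold standing in for "satisfactory", the `max_rounds` cap, and the `select_features(train, round_no)` callback are all hypothetical; the patent does not name the classifier or the acceptance criterion.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = abnormal, 0 = normal)


def project(x: List[float], mask: List[int]) -> List[float]:
    """Keep only the features whose mask bit is 1."""
    return [v for v, keep in zip(x, mask) if keep]


def train_centroids(data: List[Sample], mask: List[int]) -> Dict[int, List[float]]:
    """Step 5 stand-in: a nearest-centroid classifier (not the patent's model)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        px = project(x, mask)
        acc = sums.setdefault(y, [0.0] * len(px))
        for k, v in enumerate(px):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model: Dict[int, List[float]], x: List[float], mask: List[int]) -> int:
    """Return the label whose centroid is closest in squared distance."""
    px = project(x, mask)
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(px, model[y])))


def accuracy(model: Dict[int, List[float]], data: List[Sample], mask: List[int]) -> float:
    return sum(predict(model, x, mask) == y for x, y in data) / len(data)


def build_model(train: List[Sample], test: List[Sample],
                select_features: Callable[[List[Sample], int], List[int]],
                target_acc: float = 0.9, max_rounds: int = 5):
    """Steps 4-6: select features, train, test; reselect if accuracy is poor."""
    for round_no in range(max_rounds):
        mask = select_features(train, round_no)  # step 4 (e.g. an RFE-PSO search)
        model = train_centroids(train, mask)     # step 5
        acc = accuracy(model, test, mask)        # step 6: generalization test
        if acc >= target_acc:
            break
    return model, mask, acc  # the accepted attempt, or the last one tried
```

Keeping the feature selector behind a callback means the loop itself does not change when the selection algorithm does.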
The core of the system is step 4, in which the RFE-PSO algorithm evaluates and screens features: the goal is to reduce the feature dimension and speed up detection while preserving the accuracy of identifying abnormal encrypted network traffic.
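The text names RFE-PSO but does not say how recursive feature elimination (RFE) and PSO are combined. For reference, the sketch below shows only the RFE part in its usual form: repeatedly drop the lowest-scoring remaining feature until `n_keep` remain. One plausible combination, which is an assumption, is to use RFE to prune or rank features before the PSO search. `importance` is a hypothetical callback that would typically refit a model on the active features and return its importance scores.

```python
from typing import Callable, List


def rfe(n_features: int,
        importance: Callable[[List[int]], List[float]],
        n_keep: int) -> List[int]:
    """Recursive feature elimination returning a 0/1 mask.

    importance(active) must return one score per index in `active`
    (hypothetical callback, e.g. refit a model and read its importances).
    """
    active = list(range(n_features))
    while len(active) > n_keep:
        scores = importance(active)  # re-score the surviving features each round
        worst = min(range(len(active)), key=lambda k: scores[k])
        del active[worst]            # eliminate the least important feature
    return [1 if j in active else 0 for j in range(n_features)]
```

Re-scoring after every elimination is what distinguishes RFE from a single ranking pass, since importances can shift once correlated features are removed.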
The system treats feature selection as an optimization problem: the weight coefficient d of each feature takes the value 0 or 1, where 0 means the feature is discarded and 1 means it is selected. The feature evaluation module evaluates each candidate feature subset with the following objective fitness function.
The objective fitness function, formulated for the PSO algorithm, is as follows:
where d_i is the feature-selection coefficient vector of the i-th particle; α is the weight coefficient of detection accuracy; P is the accuracy of abnormal encrypted traffic detection; M is the feature dimension of the sample data before selection; and m is the dimension of the feature subset selected by d_i.
Identifying abnormal encrypted network traffic is essentially a classification problem. At equal accuracy P, a smaller m means a more compact selected feature subset and therefore better feature performance.
For example, suppose anomaly detection on encrypted traffic originally used a random forest for feature selection and reached accuracy P on the training-set samples. When the PSO algorithm is used instead, then among subsets whose accuracy exceeds P, the smaller the value of f(d_i), the better the selected feature subset and the more compact it is.
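The example above can be made concrete: a candidate subset is kept only if its accuracy exceeds the random-forest baseline P, and among the survivors the one with the smallest f(d_i) is chosen. The sketch below encodes that rule. Because the patent's formula is not reproduced in this text, it uses a hypothetical weighted form α(1 - P) + (1 - α)·m/M, and the candidate accuracies in the usage lines are invented for illustration.

```python
def pick_subset(candidates, baseline_acc, alpha=0.9):
    """candidates: list of (mask, accuracy) pairs. Keep only masks whose
    accuracy exceeds the baseline, then return the one with lowest fitness."""
    def fitness(mask, acc):  # hypothetical form, not the patent's formula
        return alpha * (1 - acc) + (1 - alpha) * sum(mask) / len(mask)

    eligible = [(m, a) for m, a in candidates if a > baseline_acc]
    if not eligible:
        return None  # no subset beats the baseline
    return min(eligible, key=lambda c: fitness(*c))[0]


baseline = 0.90  # accuracy of the original random-forest selection (invented)
candidates = [([1] * 8, 0.95),
              ([1, 1, 0, 0, 1, 0, 0, 0], 0.94),
              ([1, 0, 0, 0, 0, 0, 0, 0], 0.85)]
print(pick_subset(candidates, baseline))  # [1, 1, 0, 0, 1, 0, 0, 0]
```

The single-feature subset is rejected for falling below the baseline, and the three-feature subset beats the full set because its small accuracy loss is outweighed by the reduction in m.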
Compared with traditional feature selection methods, this formula introduces a weight coefficient on detection accuracy that bounds the range of the feature subset, which effectively alleviates the poor model generalization caused by a small training set.
Referring to FIG. 2, the key steps of the RFE-PSO algorithm are described in detail below:
Step 1, initialize the swarm size, i.e. the number M of particles.
Step 2, configure the swarm parameters, such as the number of iterations N, the inertia weight w, and each particle's velocity v_i and position x_i.
Step 3, compute the fitness value of each particle.
Step 4, update the velocity v_i and position x_i of the i-th particle d_i.
Step 5, update the local optimal solution P_i and the global optimal solution P_g: whenever P_i is better than P_g, set P_g equal to P_i.
Step 6, if the current iteration count is less than N, return to step 3.
Step 7, output the feature vector d_i corresponding to the global optimal solution.
By optimizing the objective fitness function of the PSO algorithm, the invention improves feature selection for identifying abnormal encrypted traffic: features can be screened from sample traffic quickly and accurately, the identification accuracy and generalization ability of the model are improved, and a more efficient way of identifying abnormal encrypted network traffic is provided.
In the technical solution provided by the invention, the objective fitness function of the PSO algorithm uses detection accuracy as the threshold coefficient. The target detection error rate can equally be used as the threshold coefficient (P still denoting the accuracy of abnormal encrypted traffic detection); the value of the objective fitness function is then computed from the formula, and a larger value is better. The optimized feature subset can be screened out in this way as well.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments can be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit of the invention.
Claims (3)
1. A method for identifying abnormal encrypted traffic based on feature selection, characterized by comprising the following steps:
step 1, collecting relevant encrypted traffic sample data, including abnormal encrypted traffic samples, according to the purpose the system is configured to achieve;
step 2, cleaning invalid data from the collected encrypted traffic samples and labeling the samples;
step 3, extracting features of different dimensions from the cleaned encrypted traffic samples;
step 4, evaluating and screening each feature of the feature set with the RFE-PSO algorithm;
step 5, training a model for identifying abnormal encrypted network traffic on the screened feature subset, generating a trained model;
step 6, running a generalization test on the trained model and evaluating its real-world effectiveness; if the effect is poor, returning to step 4, redoing feature selection with the RFE-PSO algorithm, and iterating on the model until a satisfactory trained model is obtained.
2. The method for identifying abnormal encrypted traffic based on feature selection according to claim 1, wherein step 4 specifically comprises:
the system treats feature selection as an optimization problem: the weight coefficient d of each feature takes the value 0 or 1, where 0 means the feature is discarded and 1 means it is selected; the feature evaluation module evaluates each candidate feature subset with the following objective fitness function;
the objective fitness function, formulated for the PSO algorithm, is as follows:
where d_i is the feature-selection coefficient vector of the i-th particle; α is the weight coefficient of detection accuracy; P is the accuracy of abnormal encrypted traffic detection; M is the feature dimension of the sample data before selection; and m is the dimension of the feature subset selected by d_i.
3. The method for identifying abnormal encrypted traffic based on feature selection according to claim 1, wherein the key steps of the algorithm implementation comprise:
step 1, initializing the swarm size, i.e. the number M of particles;
step 2, configuring the swarm parameters, such as the number of iterations N, the inertia weight w, and each particle's velocity v_i and position x_i;
step 3, computing the fitness value of each particle;
step 4, updating the velocity v_i and position x_i of the i-th particle d_i;
step 5, updating the local optimal solution P_i and the global optimal solution P_g: whenever P_i is better than P_g, setting P_g equal to P_i;
step 6, if the current iteration count is less than N, returning to step 3;
step 7, outputting the feature vector d_i corresponding to the global optimal solution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211662536.0A CN116032588A (en) | 2022-12-23 | 2022-12-23 | Abnormal encryption flow identification method based on feature selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211662536.0A CN116032588A (en) | 2022-12-23 | 2022-12-23 | Abnormal encryption flow identification method based on feature selection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116032588A (en) | 2023-04-28 |
Family ID: 86075306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211662536.0A Pending CN116032588A (en) | 2022-12-23 | 2022-12-23 | Abnormal encryption flow identification method based on feature selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116032588A (en) |
- 2022-12-23: CN202211662536.0A filed in CN; CN116032588A active, status pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117251691A (en) * | 2023-08-04 | 2023-12-19 | 华能信息技术有限公司 | Suspicious sample analysis processing method and system |
CN117034124A (en) * | 2023-10-07 | 2023-11-10 | 中孚信息股份有限公司 | Malicious traffic classification method, system, equipment and medium based on small sample learning |
CN117034124B (en) * | 2023-10-07 | 2024-02-23 | 中孚信息股份有限公司 | Malicious traffic classification method, system, equipment and medium based on small sample learning |
CN117952624A (en) * | 2024-02-02 | 2024-04-30 | 南京弘竹泰信息技术有限公司 | Call center data analysis method and system based on artificial intelligence |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |