CN113381996A - C & C communication attack detection method based on machine learning - Google Patents
C & C communication attack detection method based on machine learning
- Publication number
- CN113381996A
- Authority
- CN
- China
- Prior art keywords
- flow
- machine learning
- packet
- detection method
- communication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Signal Processing (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a C & C communication attack detection method based on machine learning, which comprises the following steps: acquiring continuous downlink traffic packets, filtering them so that the packet-length distribution is normal, and aggregating the packets into sessions according to specified conditions; extracting session traffic features using random cluster sampling and the Apriori algorithm; and calculating the similarity of the aggregated traffic context data by combining the edit distance with the Longest Common Subsequence (LCS) for sequence-similarity detection. The invention can detect communication from previously undiscovered malware without relying on a feature library; when a large number of attack traffic samples are examined, the detection has low time complexity and a short detection time.
Description
Technical Field
The invention relates to the technical field of communication security, in particular to a C & C communication attack detection method based on machine learning.
Background
At present, C & C communication detection takes three main approaches: statistical feature detection based on traffic packets, feature code (signature) detection based on the traffic payload, and detection based on supervised machine learning trained on known malware.
Existing methods have several shortcomings in detecting C & C communication attacks. First, they are weak at detecting unpublished or undiscovered malware. Second, their detection performance depends heavily on a feature library and is therefore not comprehensive. Finally, because the network scenarios of normal users are increasingly diverse, the traffic attributes of normal users can easily resemble those of malicious traffic; for example, when judged by packet size and inter-arrival time, the communication of some existing chat software shows features similar to those of malware. Existing methods are therefore limited in both detection accuracy and detection effect.
Each of the three approaches also has its own weakness. For statistical feature detection based on traffic packets, malware communication varies with network congestion and the number of normal network application scenarios keeps growing, so the statistical features of normal and malicious traffic easily become similar and the false alarm rate is high. Feature code detection based on the traffic payload works well on known malware, but once the malware mutates, the feature code changes and detection fails. Detection based on supervised machine learning learns from the traffic features of known malware, so its effect depends heavily on the coverage of the training set and the soundness of the learning method.
Disclosure of Invention
The invention aims to provide a C & C communication attack detection method based on machine learning.
The technical scheme adopted by the invention is as follows:
The C & C communication attack detection method based on machine learning comprises the following steps:
step 1, traffic packet filtering: acquiring continuous downlink traffic packets and filtering them so that the packet-length distribution is normal;
step 2, traffic session aggregation: aggregating the traffic packets into sessions according to specified conditions;
step 3, extracting session traffic features using random cluster sampling and the Apriori algorithm;
step 4, calculating the similarity of the aggregated traffic context data by combining the edit distance with the Longest Common Subsequence (LCS) for sequence-similarity detection;
step 5, judging whether abnormal C & C communication exists according to whether the context similarity of the session's downlink traffic exceeds a set value.
Further, as a preferred embodiment, in step 1 a filtering threshold is set according to the normal distribution of the traffic packet lengths, and part of the irrelevant traffic is filtered out.
Further, as a preferred embodiment, in step 1 the critical packet length for small traffic packets is calculated from a set packet filtering rate, and the final filtering packet length is determined by combining the normal-distribution estimate with the threshold setting.
Further, as a preferred embodiment, in step 2, session aggregation is performed according to the source address, source port, destination address or destination port.
Further, as a preferred embodiment, when the amount of data to be processed in step 3 is too large, a reservoir sampling algorithm is used for probability sampling.
Further, as a preferred embodiment, in step 4, the edit distance is first calculated for each sequence pair, pairs with larger distance values are screened out according to the result, and the LCS is then calculated for the remaining pairs.
With this technical scheme, the network traffic is filtered, sampled and aggregated by flow, and context similarity detection is then performed on the session traffic data to determine whether malware communication exists. The invention has the following advantages:
1. Communication from previously undiscovered malware can be detected without relying on a feature library.
2. The method differs from existing supervised machine learning methods for malware, which mainly learn from the traffic features of known malware and whose detection effect therefore depends heavily on the coverage of the training set and the soundness of the learning method.
3. For C & C communication detection, the detection algorithm based on downlink payload similarity has higher accuracy and recall than traffic packet detection algorithms and payload feature code detection, and has certain advantages in detection time.
Drawings
The invention is described in further detail below with reference to the accompanying drawing and the detailed description.
FIG. 1 is a schematic flow chart of the C & C communication attack detection method based on machine learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
As shown in FIG. 1, the invention discloses a C & C communication attack detection method based on machine learning, which comprises the following steps:
Step 1, traffic packet filtering: acquiring continuous downlink traffic packets. Traffic volumes in present-day networks keep growing, while the downlink packets of malware are mostly small; the traffic packets are therefore filtered to avoid wasting resources on analysing and detecting irrelevant traffic, so that the packet-length distribution is normal.
Further, as a preferred embodiment, in step 1 a filtering threshold is set according to the normal distribution of the traffic packet lengths, and part of the irrelevant traffic is filtered out. Specifically, the critical packet length for small traffic packets is calculated from a set packet filtering rate, and the final filtering packet length is determined by combining the normal-distribution estimate with the threshold setting.
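As an illustration of the threshold calculation described above, the following minimal Python sketch fits a normal distribution to observed packet lengths and derives a length cutoff from a chosen filtering rate. The function names, the default `filter_rate`, the `hard_cap` value and the rule of taking the smaller of the two values are assumptions made for illustration; the source does not give the exact formula.

```python
from statistics import NormalDist, fmean, pstdev


def filter_length_cutoff(lengths, filter_rate=0.2, hard_cap=1500):
    """Return the largest packet length (in bytes) that is kept.

    lengths     -- observed downlink packet lengths
    filter_rate -- assumed fraction of (long) packets to discard, 0 < rate < 1
    hard_cap    -- assumed fixed upper threshold set by the operator
    """
    mu = fmean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        # All packets have the same length; no distribution to estimate.
        return min(mu, hard_cap)
    # Under the normal estimate, a fraction `filter_rate` of packets
    # is expected to be longer than this quantile.
    estimated = NormalDist(mu, sigma).inv_cdf(1.0 - filter_rate)
    # Combine the estimate with the fixed threshold (illustrative rule).
    return min(estimated, hard_cap)


def filter_packets(packets, cutoff):
    """Keep only packets whose length does not exceed the cutoff."""
    return [p for p in packets if len(p) <= cutoff]
```

With `filter_rate=0.5` the estimated cutoff equals the mean packet length; smaller rates move the cutoff toward the upper tail, so fewer packets are discarded.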
Step 2, traffic session aggregation: aggregating the traffic packets into sessions according to specified conditions;
Step 3, extracting session traffic features using random cluster sampling and the Apriori algorithm;
Step 4, calculating the similarity of the aggregated traffic context data by combining the edit distance with the Longest Common Subsequence (LCS) for sequence-similarity detection;
Step 5, judging whether abnormal C & C communication exists according to whether the context similarity of the session's downlink traffic exceeds a set value.
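Step 3 names the Apriori algorithm but the source does not say which items are mined. The sketch below is a minimal, generic Apriori implementation in Python, assuming each session has already been turned into a set of discrete feature labels (the labels such as `"port:443"` are hypothetical). It returns every itemset that appears in at least `min_support` sessions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct item on its own.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        current = candidates
        k += 1
    return frequent


# Hypothetical session feature sets, for illustration only.
sessions = [
    {"port:443", "len:small", "proto:tcp"},
    {"port:443", "len:small", "proto:tcp"},
    {"port:80", "len:small", "proto:tcp"},
    {"port:443", "len:large", "proto:udp"},
]
frequent_sets = apriori(sessions, min_support=2)
```

Feature combinations that recur across sessions, such as small packets over TCP in this toy data, are the kind of pattern the frequent itemsets expose.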
Further, as a preferred embodiment, in step 2, session aggregation is performed according to a source address, a source port, a destination address or a destination port.
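The aggregation in step 2 can be sketched as grouping packets by a key built from the address and port fields. In this Python sketch the `Packet` type, its field names and the `key_fields` parameter are hypothetical; the source only states which attributes may be used for aggregation.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record used for illustration."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def aggregate_sessions(
    packets, key_fields=("src_ip", "src_port", "dst_ip", "dst_port")
):
    """Group payloads by the chosen fields, preserving arrival order."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = tuple(getattr(pkt, field) for field in key_fields)
        sessions[key].append(pkt.payload)
    return dict(sessions)
```

Passing a shorter `key_fields` tuple, for example `("dst_ip",)`, aggregates on fewer attributes, which matches the "or" wording of the embodiment.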
Further, as a preferred embodiment, sampling in step 3 means extracting from the population, by a given sampling algorithm, a sample that represents it, so that the characteristics of the population can be predicted from the characteristics of the sample. Because the method detects content similarity in the payloads of continuous downlink traffic, and because continuity may occur within related traffic during an actual attack, a random cluster sampling algorithm is adopted; if the amount of data to be processed is too large, a reservoir sampling algorithm can be used for probability sampling.
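The two sampling strategies mentioned above can be sketched in Python as follows. Treating each aggregated session as one cluster is an assumption made here, as are the function names and the optional `rng` parameter; the reservoir routine is the classic single-pass "Algorithm R".

```python
import random


def cluster_sample(sessions, n_clusters, rng=None):
    """Randomly pick whole sessions (clusters), keeping each picked
    session's packets together and in their original order."""
    rng = rng if rng is not None else random.Random()
    keys = list(sessions)
    chosen = rng.sample(keys, min(n_clusters, len(keys)))
    return {key: sessions[key] for key in chosen}


def reservoir_sample(stream, k, rng=None):
    """Draw k items uniformly at random from a stream of unknown length
    in a single pass, holding only k items in memory."""
    rng = rng if rng is not None else random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Cluster sampling preserves the continuity of packets inside a session, while reservoir sampling bounds memory when the number of sessions or packets is too large to hold at once.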
Further, as a preferred embodiment, in step 4, the edit distance is first calculated for each sequence pair, pairs with larger distance values are screened out according to the result, and the LCS is then calculated for the remaining pairs.
Specifically, similarity detection of downlink traffic packet sequences mainly combines two algorithms: calculating the Longest Common Subsequence (LCS) and calculating the edit distance between two sequences. The similarity of two sequences is obtained from the length of their longest common subsequence, which is generally computed with a dynamic programming algorithm. The edit distance, also called the Levenshtein distance, is the minimum number of edits required to convert one character string into another, where an edit replaces one character with another, inserts a character, or deletes a character.
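The two measures defined above are commonly written as the following standard dynamic programming recurrences, where $a_1 \dots a_m$ and $b_1 \dots b_n$ denote the two sequences. The symbols $L$, $D$, $m$ and $n$ are notation introduced here for illustration and do not appear in the source.

```latex
L(i,j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
L(i-1,\,j-1) + 1 & \text{if } a_i = b_j,\\
\max\{\,L(i-1,\,j),\; L(i,\,j-1)\,\} & \text{if } a_i \neq b_j,
\end{cases}
\qquad
D(i,0) = i,\quad D(0,j) = j,
\qquad
D(i,j) = \min\{\,D(i-1,\,j) + 1,\; D(i,\,j-1) + 1,\; D(i-1,\,j-1) + [a_i \neq b_j]\,\}
```

The LCS length is $L(m,n)$ and the edit distance is $D(m,n)$; $[a_i \neq b_j]$ is 1 when the two symbols differ and 0 otherwise. For example, converting "kitten" into "sitting" requires three edits.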
Because the edit distance is inexpensive to compute, it is used first to remove unrelated sequence pairs; the LCS computed on the remaining pairs then gives a more accurate similarity, so the detection result is more reliable.
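Steps 4 and 5 can be sketched together in Python as below. The normalisation by the longer sequence length, the `max_norm_distance` prefilter value, the averaging over consecutive payload pairs and the `threshold` default are all assumptions made for illustration; the source only states that pairs with larger edit distance are removed before the LCS is computed and that the similarity is compared with a set value.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def pair_similarity(a, b, max_norm_distance=0.5):
    """Edit-distance prefilter followed by an LCS-based score in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty payloads carry no evidence
    if edit_distance(a, b) / longest > max_norm_distance:
        return 0.0  # pair screened out as unrelated
    return lcs_length(a, b) / longest


def session_is_suspicious(payloads, threshold=0.8):
    """Flag a session whose consecutive downlink payloads are, on average,
    more similar than the set value."""
    pairs = list(zip(payloads, payloads[1:]))
    if not pairs:
        return False
    mean = sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)
    return mean > threshold
```

A session carrying near-identical commands, such as `b"GET /cmd?id=1"` followed by `b"GET /cmd?id=2"`, scores high under this sketch, while unrelated payloads are dropped by the prefilter before any LCS work is done.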
With this technical scheme, the network traffic is filtered, sampled and aggregated by flow, and context similarity detection is then performed on the session traffic data to determine whether malware communication exists. The invention has the following advantages:
1. Communication from previously undiscovered malware can be detected without relying on a feature library.
2. The method differs from existing supervised machine learning methods for malware, which mainly learn from the traffic features of known malware and whose detection effect therefore depends heavily on the coverage of the training set and the soundness of the learning method.
3. For C & C communication detection, the detection algorithm based on downlink payload similarity has higher accuracy and recall than traffic packet detection algorithms and payload feature code detection, and has certain advantages in detection time.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Claims (6)
1. A C & C communication attack detection method based on machine learning, characterized by comprising the following steps:
step 1, traffic packet filtering: acquiring continuous downlink traffic packets and filtering them so that the packet-length distribution is normal;
step 2, traffic session aggregation: aggregating the traffic packets into sessions according to specified conditions;
step 3, extracting session traffic features using random cluster sampling and the Apriori algorithm;
step 4, calculating the similarity of the aggregated traffic context data by combining the edit distance with the longest common subsequence for sequence-similarity detection;
step 5, judging whether abnormal C & C communication exists according to whether the context similarity of the session's downlink traffic exceeds a set value.
2. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 1, a filtering threshold is set according to the normal distribution of the traffic packet lengths, and part of the irrelevant traffic is filtered out.
3. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 1, the critical packet length for small traffic packets is calculated from a set packet filtering rate, and the final filtering packet length is determined by combining the normal-distribution estimate with the threshold setting.
4. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 2, session aggregation is performed according to the source address, source port, destination address or destination port.
5. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 3, when the amount of data to be processed is too large, a reservoir sampling algorithm is used for probability sampling.
6. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 4, the edit distance is first calculated for each sequence pair, pairs with larger distance values are screened out according to the result, and the LCS is then calculated for the remaining pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110637965.1A CN113381996B (en) | 2021-06-08 | 2021-06-08 | C & C communication attack detection method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110637965.1A CN113381996B (en) | 2021-06-08 | 2021-06-08 | C & C communication attack detection method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113381996A (en) | 2021-09-10 |
CN113381996B (en) | 2023-04-28 |
Family
ID=77576530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110637965.1A Active CN113381996B (en) | 2021-06-08 | 2021-06-08 | C & C communication attack detection method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113381996B (en) |
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030236995A1 (en) * | 2002-06-21 | 2003-12-25 | Fretwell Lyman Jefferson | Method and apparatus for facilitating detection of network intrusion |
US20100138919A1 (en) * | 2006-11-03 | 2010-06-03 | Tao Peng | System and process for detecting anomalous network traffic |
US20130174256A1 (en) * | 2011-12-29 | 2013-07-04 | Architecture Technology Corporation | Network defense system and framework for detecting and geolocating botnet cyber attacks |
CN103297433A (en) * | 2013-05-29 | 2013-09-11 | 中国科学院计算技术研究所 | HTTP botnet detection method and system based on net data stream |
US9870465B1 (en) * | 2013-12-04 | 2018-01-16 | Plentyoffish Media Ulc | Apparatus, method and article to facilitate automatic detection and removal of fraudulent user information in a network environment |
CN103746982A (en) * | 2013-12-30 | 2014-04-23 | 中国科学院计算技术研究所 | Automatic generation method and system for HTTP (Hyper Text Transport Protocol) network feature code |
CN104683346A (en) * | 2015-03-06 | 2015-06-03 | 西安电子科技大学 | P2P botnet detection device and method based on flow analysis |
CN106034056A (en) * | 2015-03-18 | 2016-10-19 | 北京启明星辰信息安全技术有限公司 | Service safety analysis method and system thereof |
US20160381054A1 (en) * | 2015-06-26 | 2016-12-29 | Board Of Regents, The University Of Texas System | System and device for preventing attacks in real-time networked environments |
CN106101121A (en) * | 2016-06-30 | 2016-11-09 | 中国人民解放军防空兵学院 | A kind of all-network flow abnormity abstracting method |
US20180131717A1 (en) * | 2016-11-10 | 2018-05-10 | Electronics And Telecommunications Research Institute | Apparatus and method for detecting distributed reflection denial of service attack |
CN107665191A (en) * | 2017-10-19 | 2018-02-06 | 中国人民解放军陆军工程大学 | A kind of proprietary protocol message format estimating method based on expanded prefix tree |
CN107733937A (en) * | 2017-12-01 | 2018-02-23 | 广东奥飞数据科技股份有限公司 | A kind of Abnormal network traffic detection method |
CN108965248A (en) * | 2018-06-04 | 2018-12-07 | 上海交通大学 | A kind of P2P Botnet detection system and method based on flow analysis |
Non-Patent Citations (4)
Title |
---|
YUNMING WANG: "Measure of invulnerability for command and control network based on network invulnerability entropy", 《2016 2ND INTERNATIONAL CONFERENCE ON CONTROL SCIENCE AND SYSTEMS ENGINEERING (ICCSSE)》 * |
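牛伟纳; 张小松; 孙恩博; 杨国武; 赵凌园: "Two-stage P2P botnet detection method based on flow similarity" * |
苏欣; 张大方; 罗章琪; 曾彬; 黎文伟: "Botnet detection method based on clustering of Command and Control communication channel traffic attributes" * |
陈兴蜀 et al.: "Research on an attack scenario association rule mining method based on alert attribute clustering", 《工程科学与技术》 * |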
Also Published As
Publication number | Publication date |
---|---|
CN113381996B (en) | 2023-04-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||