CN113381996A - C & C communication attack detection method based on machine learning - Google Patents

C & C communication attack detection method based on machine learning

Info

Publication number
CN113381996A
CN113381996A CN202110637965.1A CN202110637965A CN113381996A CN 113381996 A CN113381996 A CN 113381996A CN 202110637965 A CN202110637965 A CN 202110637965A CN 113381996 A CN113381996 A CN 113381996A
Authority
CN
China
Prior art keywords
flow
machine learning
packet
detection method
communication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110637965.1A
Other languages
Chinese (zh)
Other versions
CN113381996B (en)
Inventor
黄丽荣
陈耿生
蔡悦贞
戴宏鹏
黄嘉诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Fufu Information Technology Co Ltd
Original Assignee
China Telecom Fufu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Fufu Information Technology Co Ltd
Priority to CN202110637965.1A
Publication of CN113381996A
Application granted
Publication of CN113381996B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a C & C communication attack detection method based on machine learning, which comprises the following steps: acquiring continuous downlink traffic packets and filtering them so that the distribution of packet lengths is normal; aggregating the traffic packets into sessions according to a specified condition; extracting session traffic features using random cluster sampling and the Apriori algorithm; and calculating the similarity of the aggregated traffic context data by combining the edit distance with the longest common subsequence (LCS) to detect sequence similarity. The invention can detect the communication of undiscovered malware without relying on a feature library; when a large number of attack traffic samples are examined, the detection time complexity is low and the detection time is short.

Description

C & C communication attack detection method based on machine learning
Technical Field
The invention relates to the technical field of communication security, in particular to a C & C communication attack detection method based on machine learning.
Background
At present, C & C communication detection takes three main forms: statistical feature detection based on traffic packets, feature-code (signature) detection based on the traffic payload, and detection based on supervised machine learning over existing malware.
Existing methods have several shortcomings in detecting C & C communication attacks. First, they are weak at detecting unpublished or undiscovered malware. Second, their detection effect depends heavily on a feature library and is therefore not comprehensive. Finally, because the network usage scenarios of normal users are increasingly diverse, the attributes of normal user traffic can easily resemble those of malicious traffic; for example, judged by packet size and arrival interval, the communication of some existing chat software has features similar to those of malware. Existing methods are therefore limited in both detection precision and detection effect for C & C communication.
Each of the three approaches has its own drawback. Statistical feature detection based on traffic packets has a high false alarm rate: malware communication varies with network congestion and normal network application scenarios keep multiplying, so the statistical features of normal and malicious traffic easily become similar. Feature-code detection on the traffic payload works well for known malware, but if the malware mutates, its feature code changes and detection fails. Detection based on supervised machine learning over existing malware learns mainly from the traffic features of known malware, so its effect depends heavily on the coverage of the training set and the soundness of the learning method.
Disclosure of Invention
The invention aims to provide a C & C communication attack detection method based on machine learning.
The technical scheme adopted by the invention is as follows:
The C & C communication attack detection method based on machine learning comprises the following steps:
step 1, traffic packet filtering: acquiring continuous downlink traffic packets and filtering them so that the distribution of packet lengths is normal;
step 2, traffic session aggregation: aggregating the traffic packets into sessions according to a specified condition;
step 3, extracting session traffic features using random cluster sampling and the Apriori algorithm;
step 4, calculating the similarity of the aggregated traffic context data by combining the edit distance with the longest common subsequence (LCS) to detect sequence similarity;
step 5, judging whether abnormal C & C communication exists according to whether the context similarity of the session's downlink traffic exceeds a set value.
Further, as a preferred embodiment, in step 1 a filtering threshold is set according to the normal distribution of traffic packet lengths, and part of the irrelevant traffic is filtered out.
Further, as a preferred embodiment, in step 1 the packet length critical value for small traffic packets is calculated by setting a packet filtering rate, and the final filtering packet length is determined by combining the normal distribution estimate with the threshold setting.
Further, as a preferred embodiment, in step 2, session aggregation is performed according to the source address, source port, destination address or destination port.
Further, as a preferred embodiment, in step 3, when the amount of data to be processed is too large, a reservoir sampling algorithm is used for probability sampling.
Further, as a preferred embodiment, in step 4, the edit distance is first calculated for each sequence pair, sequence pairs with large distance values are screened out according to the result, and LCS is then calculated for the remaining sequence pairs.
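Step 3 names the Apriori algorithm without giving details. As an illustration only, the following minimal sketch mines frequent itemsets from per-session feature sets. The feature tokens ("small_pkt", "port_443", "periodic") and the `min_support` value are hypothetical and do not come from the patent; how sessions are turned into feature sets is likewise an assumption.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    transactions: iterable of item collections (here, one per session).
    min_support:  minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical session feature sets.
sessions = [
    {"small_pkt", "port_443", "periodic"},
    {"small_pkt", "port_443"},
    {"small_pkt", "periodic"},
    {"port_80"},
]
frequent = apriori(sessions, min_support=2)
```

In this toy input, the pair `{"small_pkt", "port_443"}` is frequent, while the triple is pruned because `{"port_443", "periodic"}` appears in only one session.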
With the above technical scheme, network traffic is filtered, sampled and aggregated according to the procedure, context similarity detection is then performed on the session traffic data, and the presence of malware communication is thereby detected. The invention has the following advantages: 1. Communication of undiscovered malware can be detected without relying on a feature library. 2. The method differs from existing supervised machine learning methods for malware, whose detection is based mainly on supervised learning of the traffic features of existing malware and whose effect depends heavily on the coverage of the training set and the soundness of the learning method. 3. For C & C communication detection, the detection algorithm based on downlink payload similarity achieves higher accuracy and recall than traffic packet detection algorithms and payload feature-code detection, and has a certain advantage in detection time.
Drawings
The invention is described in further detail below with reference to the accompanying drawing and the detailed description.
Fig. 1 is a schematic flow chart of the C & C communication attack detection method based on machine learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
As shown in Fig. 1, the invention discloses a C & C communication attack detection method based on machine learning, which comprises the following steps:
step 1, traffic packet filtering: acquiring continuous downlink traffic packets. Traffic volume in existing network environments keeps growing, while malware downlink traffic packets are mostly small; the traffic packets are therefore filtered so that the distribution of packet lengths is normal, which avoids wasting resources on meaningless analysis and detection of irrelevant traffic.
Further, as a preferred embodiment, in step 1 a filtering threshold is set according to the normal distribution of traffic packet lengths, and part of the irrelevant traffic is filtered out. Specifically, the packet length critical value for small traffic packets is calculated by setting a packet filtering rate, and the final filtering packet length is determined by combining the normal distribution estimate with the threshold setting.
step 2, traffic session aggregation: aggregating the traffic packets into sessions according to a specified condition;
step 3, extracting session traffic features using random cluster sampling and the Apriori algorithm;
step 4, calculating the similarity of the aggregated traffic context data by combining the edit distance with the longest common subsequence (LCS) to detect sequence similarity;
step 5, judging whether abnormal C & C communication exists according to whether the context similarity of the session's downlink traffic exceeds a set value.
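The threshold selection in step 1 is described only in outline. One possible reading is sketched below, under stated assumptions: a rate-based critical length is taken from the sorted packet lengths, a second candidate is taken from a normal-distribution estimate (mean plus `k` standard deviations), and the stricter of the two is used. The parameters `filter_rate` and `k`, the "take the smaller value" combination rule, and the `"length"` packet field are all hypothetical choices, not details given in the patent.

```python
import statistics


def filter_length_threshold(lengths, filter_rate=0.8, k=1.0):
    """Return a packet-length cutoff; longer packets are treated as irrelevant.

    filter_rate: assumed fraction of packets to discard (0 <= rate < 1).
    k:           assumed multiplier on the standard deviation.
    """
    if not lengths:
        raise ValueError("no packet lengths supplied")
    if not 0.0 <= filter_rate < 1.0:
        raise ValueError("filter_rate must be in [0, 1)")

    ordered = sorted(lengths)
    # Rate-based critical value: keep the shortest (1 - filter_rate) share.
    keep = max(1, round(len(ordered) * (1.0 - filter_rate)))
    rate_cutoff = ordered[keep - 1]

    # Normal-distribution estimate from the sample mean and deviation.
    mean = statistics.fmean(lengths)
    std = statistics.pstdev(lengths)
    normal_cutoff = mean + k * std

    # Assumed combination rule: use the stricter (smaller) cutoff.
    return min(rate_cutoff, normal_cutoff)


def filter_packets(packets, cutoff):
    """Keep only packets whose length does not exceed the cutoff."""
    return [p for p in packets if p["length"] <= cutoff]


# Example with made-up packet lengths in bytes.
lengths = [60, 64, 70, 80, 1500, 1500, 1400, 1200, 90, 100]
cutoff = filter_length_threshold(lengths, filter_rate=0.5)
small = filter_packets([{"length": n} for n in lengths], cutoff)
```

With these made-up lengths and a filtering rate of 0.5, the rate-based value is the tighter of the two candidates, so the five shortest packets are kept.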
Further, as a preferred embodiment, in step 2, session aggregation is performed according to a source address, a source port, a destination address or a destination port.
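A minimal sketch of this aggregation, assuming each packet is a dictionary with hypothetical fields `src_ip`, `src_port`, `dst_ip`, `dst_port` and `ts` (timestamp). The patent does not specify a packet representation, and the key fields are left configurable because the text allows aggregating on any of the four attributes.

```python
from collections import defaultdict


def aggregate_sessions(packets,
                       key_fields=("src_ip", "src_port", "dst_ip", "dst_port")):
    """Group packets into sessions keyed by the chosen address/port fields."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = tuple(pkt[field] for field in key_fields)
        sessions[key].append(pkt)
    # Keep each session in arrival order so payload context is preserved.
    for pkts in sessions.values():
        pkts.sort(key=lambda p: p["ts"])
    return dict(sessions)


# Made-up downlink packets for illustration.
packets = [
    {"src_ip": "10.0.0.5", "src_port": 443, "dst_ip": "192.168.1.2",
     "dst_port": 50000, "ts": 2.0, "payload": b"b"},
    {"src_ip": "10.0.0.5", "src_port": 443, "dst_ip": "192.168.1.2",
     "dst_port": 50000, "ts": 1.0, "payload": b"a"},
    {"src_ip": "10.0.0.9", "src_port": 80, "dst_ip": "192.168.1.2",
     "dst_port": 50001, "ts": 1.5, "payload": b"c"},
]
sessions = aggregate_sessions(packets)
by_source = aggregate_sessions(packets, key_fields=("src_ip",))
```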
Further, as a preferred embodiment, the sampling in step 3 extracts from the population, by a given sampling algorithm, a sample that represents the population, and the characteristics of the population are inferred from the characteristics of the extracted sample. Because the method detects content similarity in the payloads of continuous downlink traffic, and because traffic of the same kind may occur continuously during an actual attack, a random cluster sampling algorithm is adopted; if the amount of data to be processed is too large, a reservoir sampling algorithm can be used for probability sampling.
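Both sampling strategies can be sketched briefly. The code below is an illustration under assumptions: clusters are taken to be the aggregated sessions (a dictionary from session key to packets), and reservoir sampling is implemented as the classic single-pass "Algorithm R". The function names and the optional `rng` parameter are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng=None):
    """Random cluster sampling: pick whole clusters and keep all their members."""
    rng = rng or random.Random()
    chosen = rng.sample(list(clusters), min(n_clusters, len(clusters)))
    return {key: clusters[key] for key in chosen}


def reservoir_sample(stream, k, rng=None):
    """Uniformly sample k items from a stream of unknown length in one pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: sample 2 of 3 sessions, and 10 of 1000 streamed items.
picked = cluster_sample({"s1": [1, 2], "s2": [3], "s3": [4, 5]}, 2,
                        random.Random(1))
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Cluster sampling keeps consecutive packets of a chosen session together, which matters when the later similarity step compares neighbouring payloads; reservoir sampling needs only O(k) memory, which suits the large-volume case.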
Further, as a preferred embodiment, in step 4, the edit distance is first calculated for each sequence pair, sequence pairs with large distance values are screened out according to the result, and LCS is then calculated for the remaining sequence pairs.
Specifically, similarity detection on downlink traffic packet sequences combines two algorithms: computing the longest common subsequence (LCS) of two sequences and computing the edit distance between them. The similarity of two sequences is obtained from the length of their longest common subsequence, which is generally computed with a dynamic programming algorithm. The edit distance, also called the Levenshtein distance, is the minimum number of edits required to convert one character string into another, where an edit is the substitution of one character for another, or the insertion or deletion of a character.
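For reference, the two quantities can be written as standard dynamic programming recurrences over sequences x = x_1 … x_m and y = y_1 … y_n. The symbols c, d, m and n are notation introduced here for illustration and do not appear in the patent.

```latex
% c(i,j): length of the LCS of x_1..x_i and y_1..y_j
\[
c(i,j) =
\begin{cases}
0, & i = 0 \text{ or } j = 0,\\
c(i-1,j-1) + 1, & i,j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c(i-1,j),\; c(i,j-1)\bigr), & i,j > 0 \text{ and } x_i \neq y_j.
\end{cases}
\]

% d(i,j): Levenshtein distance between x_1..x_i and y_1..y_j
\[
d(i,j) =
\begin{cases}
\max(i,j), & \min(i,j) = 0,\\
\min\bigl(d(i-1,j) + 1,\; d(i,j-1) + 1,\; d(i-1,j-1) + [x_i \neq y_j]\bigr), & \text{otherwise},
\end{cases}
\]
% where [x_i \neq y_j] is 1 if the characters differ and 0 otherwise.
```

The LCS length of the two full sequences is c(m,n) and their edit distance is d(m,n).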
Because the edit distance has low computational time complexity, it is used first to remove irrelevant sequence pairs; because the LCS gives a more accurate similarity, it is then applied to the remaining pairs, so the detection result is more reliable.
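The two-stage comparison and the final decision of step 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: normalising both measures by the longer sequence length, averaging the LCS scores over surviving pairs, and the values `max_norm_dist=0.5` and `threshold=0.8` are all assumptions made for the example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def context_similarity(payloads, max_norm_dist=0.5):
    """Mean LCS similarity over pairs that pass the edit-distance filter."""
    scores = []
    for a, b in combinations(payloads, 2):
        longest = max(len(a), len(b))
        if longest == 0:
            continue
        # Stage 1: drop pairs whose normalised edit distance is too large.
        if edit_distance(a, b) / longest > max_norm_dist:
            continue
        # Stage 2: score the surviving pairs with the LCS length.
        scores.append(lcs_length(a, b) / longest)
    return sum(scores) / len(scores) if scores else 0.0


def is_suspicious(payloads, threshold=0.8):
    """Step 5: flag the session if context similarity exceeds the set value."""
    return context_similarity(payloads) > threshold


# Made-up downlink payloads: near-identical commands versus unrelated content.
beacon = [b"cmd=ping;id=01", b"cmd=ping;id=02", b"cmd=ping;id=03"]
benign = [b"GET /index.html", b"\x00\x01\x02\x03", b"hello world!!"]
```

In this toy data the three beacon payloads differ by one byte each, so every pair survives the first stage and scores 13/14 in the second; the unrelated payloads are all discarded by the edit-distance filter and the session is not flagged.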
With the above technical scheme, network traffic is filtered, sampled and aggregated according to the procedure, context similarity detection is then performed on the session traffic data, and the presence of malware communication is thereby detected. The invention has the following advantages: 1. Communication of undiscovered malware can be detected without relying on a feature library. 2. The method differs from existing supervised machine learning methods for malware, whose detection is based mainly on supervised learning of the traffic features of existing malware and whose effect depends heavily on the coverage of the training set and the soundness of the learning method. 3. For C & C communication detection, the detection algorithm based on downlink payload similarity achieves higher accuracy and recall than traffic packet detection algorithms and payload feature-code detection, and has a certain advantage in detection time.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Claims (6)

1. A C & C communication attack detection method based on machine learning, characterized by comprising the following steps:
step 1, traffic packet filtering: acquiring continuous downlink traffic packets and filtering them so that the distribution of packet lengths is normal;
step 2, traffic session aggregation: aggregating the traffic packets into sessions according to a specified condition;
step 3, extracting session traffic features using random cluster sampling and the Apriori algorithm;
step 4, calculating the similarity of the aggregated traffic context data by combining the edit distance with the longest common subsequence to detect sequence similarity;
step 5, judging whether abnormal C & C communication exists according to whether the context similarity of the session's downlink traffic exceeds a set value.
2. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 1, a filtering threshold is set according to the normal distribution of traffic packet lengths, and part of the irrelevant traffic is filtered out.
3. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 1, the packet length critical value for small traffic packets is calculated by setting a packet filtering rate, and the final filtering packet length is determined by combining the normal distribution estimate with the threshold setting.
4. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 2, session aggregation is performed according to the source address, source port, destination address or destination port.
5. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 3, when the amount of data to be processed is too large, a reservoir sampling algorithm is used for probability sampling.
6. The machine learning-based C & C communication attack detection method according to claim 1, characterized in that: in step 4, the edit distance is first calculated for each sequence pair, sequence pairs with large distance values are screened out according to the result, and LCS is then calculated for the remaining sequence pairs.
CN202110637965.1A 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning Active CN113381996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110637965.1A CN113381996B (en) 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110637965.1A CN113381996B (en) 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning

Publications (2)

Publication Number Publication Date
CN113381996A CN113381996A (en) 2021-09-10
CN113381996B CN113381996B (en) 2023-04-28

Family

ID=77576530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110637965.1A Active CN113381996B (en) 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning

Country Status (1)

Country Link
CN (1) CN113381996B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030236995A1 (en) * 2002-06-21 2003-12-25 Fretwell Lyman Jefferson Method and apparatus for facilitating detection of network intrusion
US20100138919A1 (en) * 2006-11-03 2010-06-03 Tao Peng System and process for detecting anomalous network traffic
US20130174256A1 (en) * 2011-12-29 2013-07-04 Architecture Technology Corporation Network defense system and framework for detecting and geolocating botnet cyber attacks
CN103297433A (en) * 2013-05-29 2013-09-11 中国科学院计算技术研究所 HTTP botnet detection method and system based on net data stream
CN103746982A (en) * 2013-12-30 2014-04-23 中国科学院计算技术研究所 Automatic generation method and system for HTTP (Hyper Text Transport Protocol) network feature code
CN104683346A (en) * 2015-03-06 2015-06-03 西安电子科技大学 P2P botnet detection device and method based on flow analysis
CN106034056A (en) * 2015-03-18 2016-10-19 北京启明星辰信息安全技术有限公司 Service safety analysis method and system thereof
CN106101121A (en) * 2016-06-30 2016-11-09 中国人民解放军防空兵学院 A kind of all-network flow abnormity abstracting method
US20160381054A1 (en) * 2015-06-26 2016-12-29 Board Of Regents, The University Of Texas System System and device for preventing attacks in real-time networked environments
US9870465B1 (en) * 2013-12-04 2018-01-16 Plentyoffish Media Ulc Apparatus, method and article to facilitate automatic detection and removal of fraudulent user information in a network environment
CN107665191A (en) * 2017-10-19 2018-02-06 中国人民解放军陆军工程大学 A kind of proprietary protocol message format estimating method based on expanded prefix tree
CN107733937A (en) * 2017-12-01 2018-02-23 广东奥飞数据科技股份有限公司 A kind of Abnormal network traffic detection method
US20180131717A1 (en) * 2016-11-10 2018-05-10 Electronics And Telecommunications Research Institute Apparatus and method for detecting distributed reflection denial of service attack
CN108965248A (en) * 2018-06-04 2018-12-07 上海交通大学 A kind of P2P Botnet detection system and method based on flow analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUNMING WANG: "Measure of invulnerability for command and control network based on network invulnerability entropy", 《2016 2ND INTERNATIONAL CONFERENCE ON CONTROL SCIENCE AND SYSTEMS ENGINEERING (ICCSSE)》 *
牛伟纳;张小松;孙恩博;杨国武;赵凌园;: "基于流相似性的两阶段P2P僵尸网络检测方法" *
苏欣;张大方;罗章琪;曾彬;黎文伟;: "基于Command and Control通信信道流量属性聚类的僵尸网络检测方法" *
陈兴蜀等: "基于告警属性聚类的攻击场景关联规则挖掘方法研究", 《工程科学与技术》 *

Also Published As

Publication number Publication date
CN113381996B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN111935170B (en) Network abnormal flow detection method, device and equipment
CN109714322B (en) Method and system for detecting network abnormal flow
CN110519290B (en) Abnormal flow detection method and device and electronic equipment
CN111355697B (en) Detection method, device, equipment and storage medium for botnet domain name family
CN109818970B (en) Data processing method and device
CN110611640A (en) DNS protocol hidden channel detection method based on random forest
EP3905084A1 (en) Method and device for detecting malware
CN111835763B (en) DNS tunnel traffic detection method and device and electronic equipment
CN110798426A (en) Method and system for detecting flood DoS attack behavior and related components
CN112434298B (en) Network threat detection system based on self-encoder integration
CN109660517B (en) Abnormal behavior detection method, device and equipment
CN112528277A (en) Hybrid intrusion detection method based on recurrent neural network
CN111654482B (en) Abnormal flow detection method, device, equipment and medium
CN112437062B (en) ICMP tunnel detection method, device, storage medium and electronic equipment
CN112565229A (en) Hidden channel detection method and device
CN111523588A (en) Method for classifying APT attack malicious software traffic based on improved LSTM
CN109120733B (en) Detection method for communication by using DNS (Domain name System)
CN113378161A (en) Security detection method, device, equipment and storage medium
CN113037748A (en) C and C channel hybrid detection method and system
CN112953948A (en) Real-time network transverse worm attack flow detection method and device
CN113381996B (en) C & C communication attack detection method based on machine learning
CN112235242A (en) C & C channel detection method and system
CN113872980B (en) Identification method and device of industrial control equipment information, storage medium and equipment
CN111371727A (en) Detection method for NTP protocol covert communication
CN114362972B (en) Botnet hybrid detection method and system based on flow abstract and graph sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant