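WO2021082339A1 - Security detection method and device integrating machine learning and rule matching - Google Patents
Security detection method and device integrating machine learning and rule matching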
- Publication number
- WO2021082339A1 (PCT/CN2020/079972)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- traffic
- machine learning
- data
- network traffic
- learning model
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/16—Implementing security features at a particular protocol layer
- H04L63/166—Implementing security features at a particular protocol layer at the transport layer
Definitions
- the present disclosure relates to the technical field of machine learning and information security, and more particularly to a security detection method and device that integrates machine learning and rule matching.
- IDS Intrusion Detection System
- Snort is an IDS with inline intrusion prevention capability that supports medium- and high-speed networks. It includes a module for obtaining network packets, a module for decoding and classifying network packets, and a module for detecting malicious packets based on a rule set. Snort uses rule sets to check whether network packets contain malicious traffic, and triggers an alert when the payload of a packet matches one of the rules. Snort's single-threaded architecture is shown in Figure 1.
- An IDS must inspect network traffic at ever higher rates, on the order of 10 Gbps, to detect malicious traffic. If an IDS cannot perform packet inspection at the required rate, it will allow undetected malicious packets to enter the computer network.
- the purpose of the present disclosure is to provide a security detection method and device that integrates machine learning and rule matching.
- the method and device can detect both known malicious traffic and unknown malicious traffic, and minimize the false positive rate and false negative rate of the intrusion detection system, so as to ensure the security of the computer network.
- the purpose of the present disclosure is achieved by a security detection method that combines machine learning and rule matching, and the method includes:
- the recognition process includes: extracting features of the preprocessed network traffic, and then, based on the extracted features, using the trained machine learning model to identify malicious traffic;
- the malicious traffic detected by the rule-based matching method is merged with the malicious traffic identified by the trained machine learning model.
- a security detection device that integrates machine learning and rule matching, and the device includes:
- a memory where the memory stores instructions, which when executed by the processor, cause the processor to:
- the recognition process includes: extracting features of the preprocessed network traffic, and then, based on the extracted features, using the trained machine learning model to identify malicious traffic;
- the malicious traffic detected by the rule-based matching method is merged with the malicious traffic identified by the trained machine learning model.
- the above-mentioned technical solution provided by the present disclosure uses a method based on rule matching to detect known malicious traffic and, at the same time, uses a machine learning method to detect unknown malicious traffic, thereby reducing the false positive rate and false negative rate of the intrusion detection system and improving the accuracy of malicious traffic detection.
- FIG. 1 is a schematic diagram of a single-threaded architecture of Snort provided by the background technology of the disclosure
- FIG. 2 is a flowchart of a security detection method that combines machine learning and rule matching according to an embodiment of the present disclosure
- FIG. 3 is an architecture diagram of a security detection system that integrates machine learning and rule matching according to an embodiment of the present disclosure
- FIG. 4 is a schematic diagram of a software load distributor based on a multi-core CPU according to an embodiment of the present disclosure
- FIG. 5 is a structural diagram of a security detection system according to an embodiment of the present disclosure.
- FIG. 6 is a block diagram of a security detection device that integrates machine learning and rule matching according to an embodiment of the present disclosure.
- the embodiments of the present disclosure provide a security detection method and device that integrates machine learning and rule matching.
- the method and device use both a rule-based matching method and a machine learning method to detect known and unknown malicious traffic, thereby reducing the rate of false positives and false negatives of the intrusion detection system and improving the accuracy of malicious traffic detection.
- the GPU parallel computing technology can be used according to the embodiments of the present disclosure, so that the system can meet the requirements of high throughput.
- FIG. 2 shows a security detection method 20 that combines machine learning and rule matching according to an embodiment of the present disclosure.
- the method may include the following steps: at step S200, a machine learning model is established; at step S202, the established machine learning model is trained using labeled legitimate traffic and malicious traffic; at step S204, network traffic is collected; at step S206, the collected network traffic is preprocessed; at step S208, a rule-based matching method is used to detect malicious traffic from the preprocessed network traffic; at step S210, feature extraction is performed on the preprocessed network traffic (step S210 1), and then, based on the extracted features, the trained machine learning model is used to identify malicious traffic (step S210 2); and at step S212, the malicious traffic detected by the rule-based matching method is merged with the malicious traffic identified by the trained machine learning model.
- the method 20 may also optionally include: at step S203, verifying the trained machine learning model using a verification data set; at step S205, sampling the collected network traffic according to a specified sampling rule; and at step S213, visualizing the result of the fusion.
- the step S206 of the method 20 further includes preprocessing the sampled data stream.
- FIG. 3 shows an exemplary security detection system 30 that combines machine learning and rule matching according to an embodiment of the present disclosure. The steps of the method 20 shown in FIG. 2 will be described in more detail below in conjunction with the safety detection system 30 shown in FIG. 3.
- the security detection system 30 shown in FIG. 3 mainly includes an offline part 310 and an online part 320. Steps S200 and S202 and optional step S203 of method 20 may be executed in the offline part 310 shown in FIG. 3. That is to say, in the offline part, a machine learning model 312 is established, labeled legitimate traffic and malicious traffic are used as the training data set 314 to train the established machine learning model 312, and, optionally, the verification data set 316 is used to verify the trained machine learning model.
- Steps S204-S212 and optional steps S205 and S213 of method 20 may be performed in the online part 320 shown in FIG. 3.
- network traffic is collected and preprocessed.
- two parts are processed in parallel or sequentially: the first part detects malicious traffic from the preprocessed network traffic using a rule-based matching method; the second part uses a machine learning model to identify malicious traffic from the preprocessed network traffic.
- the identification process may include feature extraction from the preprocessed network traffic, and then, based on the extracted features, the machine learning model trained in the offline part is used to identify the malicious traffic.
- the results of these two parts of processing are combined to achieve the interception of malicious traffic.
- a machine learning model 312 is first established.
- Machine learning models that can be selected include support vector machines, decision trees, fuzzy logic, naive Bayes, and neural networks. Then, the labeled legitimate traffic and malicious traffic are used as the training set 314. Time-based features, network-layer-based features, and time-to-live (TTL)-based features are extracted from the training set. Then, the established machine learning model 312 is trained based on these extracted features.
- the established machine learning model can be trained by referring to the traditional model training method.
- the verification data set 316 can be used to verify the trained machine learning model.
- the validated model can be used for online processing.
- the offline part can perform high-speed parallel operations on the GPU, thereby effectively increasing the operating speed of the system and meeting high-throughput requirements.
- the online portion 320 of the exemplary system 30 may include:
- the network traffic collection module 321 is used to collect network traffic (step S204 shown in FIG. 2);
- the traffic sampling module 322 is configured to sample the collected network traffic according to a specified sampling rule (optional step S205 in FIG. 2); the traffic sampling module 322 is optional;
- the data preprocessing module 323 is configured to preprocess the collected or, if sampling is performed, the sampled network traffic (step S206 in FIG. 2);
- the rule matching module 324 is configured to use a rule-based matching method to detect malicious traffic from the preprocessed result (step S208 in FIG. 2);
- the feature extraction module 325 is used to perform feature extraction on the result of the preprocessing;
- the traffic classification module 326 is configured to use the machine learning model trained in the offline part to classify network traffic based on the features extracted by the feature extraction module 325 (step S210 in FIG. 2), so as to identify malicious traffic;
- the result fusion module 327 is used to merge the malicious traffic detected by the rule matching module 324 and the malicious traffic identified by the feature extraction module 325 and the traffic classification module 326 (step S212 in FIG. 2); and
- the result display module 328 is used to display the result of the fusion through visualization technology (optional step S213 in FIG. 2).
- the rule matching module 324, the feature extraction module 325, and the traffic classification module 326 can run in parallel on the GPU, thereby increasing the computing speed and meeting the requirements for processing high-throughput network traffic.
- the premise of network intrusion detection is to effectively collect network traffic. Online real-time intrusion detection systems often need to process input traffic up to 10-100Gbps. Therefore, high-speed packet capture technology is a prerequisite for subsequent flow identification.
- a high-performance Data Plane Development Kit (DPDK) is adopted in the embodiments of the present disclosure.
- the network traffic collection module designed with DPDK can be based on zero-copy technology and use direct memory access (DMA) structure to directly copy data packets from the cache queue of the network card to the user space, thereby bypassing the processing part of the intermediate kernel space and saving a lot of money
- DMA direct memory access
- the traditional DPDK runs on the CPU in a serial mode, which makes it difficult to meet the demand for capturing high-throughput data.
- the traditional data capture tool is modified so that the network traffic collection module runs on the GPU, so as to improve the efficiency of network traffic collection.
- mainstream network cards support dividing their ring buffers into multiple hardware queues (typically, a single network card supports up to 16 queues). This feature can be used in multi-core CPU scenarios for packet processing.
- with the scheduling method based on a hash function, the massive input data packets can be distributed to multiple network card queues, which achieves load balancing at data-flow granularity.
- the embodiment according to the present disclosure mainly adopts a method based on a hash function to map the input data stream to a dedicated queue.
- the four-tuple (source IP address, source port number, destination IP address, and destination port number) of the data packet in the data stream is mainly used as input to obtain the hash value.
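As a rough illustration of the four-tuple hashing described above, the following Python sketch maps a packet to one of several queues. The function name `queue_for_packet`, the queue count of 16, and the use of CRC32 are assumptions made for this example; the disclosure does not name a specific hash function.

```python
import ipaddress
import zlib

NUM_QUEUES = 16  # assumption: mirrors the "up to 16 queues" figure mentioned in the text


def queue_for_packet(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                     num_queues: int = NUM_QUEUES) -> int:
    """Map a packet's four-tuple to a queue index (hypothetical helper)."""
    key = (
        ipaddress.ip_address(src_ip).packed
        + src_port.to_bytes(2, "big")
        + ipaddress.ip_address(dst_ip).packed
        + dst_port.to_bytes(2, "big")
    )
    # CRC32 is only a stand-in hash for illustration, not the patent's choice.
    return zlib.crc32(key) % num_queues


# Every packet of the same flow lands in the same queue.
q1 = queue_for_packet("10.0.0.1", 40000, "192.168.1.5", 443)
q2 = queue_for_packet("10.0.0.1", 40000, "192.168.1.5", 443)
```

Because the key is built in source-then-destination order, the two directions of one connection may hash to different queues; a deployment that needs both directions together would have to canonicalize the tuple first.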
- FIG. 4 shows a schematic diagram of a software load distributor 40 based on a multi-core CPU according to an embodiment of the present disclosure.
- K CPUs are used as load distributors, and the remaining M CPUs are used as workers.
- the load distributor is specifically used to retrieve data packets from the network card queue, and then distribute them to idle workers.
- the workers are mainly responsible for the subsequent data packet preprocessing process.
- the security detection system 30 is provided with a flow sampling module.
- Sampling refers to the process of extracting some representative data from a large amount of data according to a certain sampling rule. Set up different sampling functions according to different needs, in order to reduce the consumption of memory and CPU of the measurement equipment in the high-speed network.
- Packet sampling can be easily implemented under the premise of using very little CPU power and memory. However, packet sampling cannot accurately infer the statistical characteristics of the flow. Adaptive packet sampling technology can adjust the sampling rate, thereby reducing memory consumption or increasing the accuracy of statistics. The emergence of stream sampling overcomes the limitations of packet sampling, which can improve accuracy, but requires more memory and CPU. In order to solve these problems, especially to reduce memory and bandwidth consumption, a flexible sampling algorithm is adopted in the present disclosure.
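The disclosure does not spell out its flexible sampling algorithm, so the sketch below only shows the simplest form of packet sampling, systematic 1-in-N selection, which matches the "one packet out of every two" setting used in the worked example later in the description. The helper name `sample_every_nth` is an assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_every_nth(packets: Iterable[T], n: int) -> Iterator[T]:
    """Keep the first packet of every group of n packets (systematic sampling)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # islice with a step of n yields items 0, n, 2n, ... without buffering the stream.
    return islice(packets, 0, None, n)


# Stand-in "packets": the integers 0..9, sampled one in two.
sampled = list(sample_every_nth(range(10), 2))
```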
- Data preprocessing is to perform some processing on the data before the formal inspection. Use the corresponding plug-in to check the original data packet in the data stream, and find the "behavior" of the original data, such as port scanning, IP fragmentation, etc. Data packets in the data stream can be passed to the traditional rule matching module and feature extraction and traffic classification module after preprocessing.
- Data preprocessing mainly includes: packet reorganization, protocol decoding and anomaly detection, etc.
- Packet reorganization is mainly divided into fragment reorganization and stream reorganization.
- Fragmentation reassembly means that the data link layer uses MTU (Maximum Transmission Unit) to limit the size of data packets that can be transmitted. When the size of the sent IP datagram exceeds the MTU, the IP layer needs to fragment the data.
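To make the MTU constraint concrete, the sketch below computes how an IPv4 datagram would be split into fragments. It relies on standard IPv4 behavior (fragment offsets are expressed in 8-byte units) and assumes a 20-byte header without options; the function name `fragment_layout` is hypothetical.

```python
from typing import List, Tuple


def fragment_layout(total_length: int, mtu: int, header_len: int = 20) -> List[Tuple[int, int]]:
    """Return (fragment offset in 8-byte units, payload bytes) for each fragment."""
    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_data = (mtu - header_len) // 8 * 8
    layout = []
    offset = 0
    while offset < payload:
        chunk = min(max_data, payload - offset)
        layout.append((offset // 8, chunk))
        offset += chunk
    return layout


# A 4000-byte datagram sent over a link with a 1500-byte MTU.
layout = fragment_layout(4000, 1500)
```

Fragment reassembly in the preprocessing stage has to undo exactly this split, ordering fragments by offset before the payload can be inspected as a whole.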
- Stream reassembly means that TCP divides the data stream into message segments of appropriate length, where the maximum message segment size (MSS) is usually limited by the Ethernet MTU. Because TCP uses IP to deliver its message segments, IP does not provide the functions of duplication elimination and ensuring the correct order, so flow reassembly is mainly used to deal with packet out-of-sequence and packet duplication.
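The following minimal sketch illustrates the two problems stream reassembly addresses, out-of-order arrival and duplication. It is a simplification under stated assumptions: segments are given as `(sequence number, payload)` pairs, sequence-number wrap-around is ignored, and reassembly stops at the first gap. The name `reassemble_stream` is hypothetical.

```python
from typing import Iterable, Tuple


def reassemble_stream(segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the contiguous byte stream that starts at the lowest sequence number."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    if not ordered:
        return b""
    out = bytearray()
    next_seq = ordered[0][0]
    for seq, payload in ordered:
        end = seq + len(payload)
        if end <= next_seq:
            continue  # pure duplicate or retransmission of data already seen
        if seq > next_seq:
            break  # gap: a segment is still missing, stop here
        # Append only the bytes not yet covered (handles partial overlap).
        out += payload[next_seq - seq:]
        next_seq = end
    return bytes(out)


segments = [(105, b"world"), (100, b"hello"), (100, b"hello"), (103, b"lowor")]
stream = reassemble_stream(segments)
```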
- MTU Maximum Transmission Unit
- Protocol decoding is the process of decoding the protocol of the data packet into a unified format so that the traditional rule matching module can perform rule matching.
- URLs have many expression formats, such as ASCII and Unicode. Different expression formats bring great inconvenience to malicious traffic monitoring. Attack messages can often be detected in one format.
- through protocol decoding, messages in various formats are converted into detectable standard formats in advance to facilitate subsequent detection.
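One narrow instance of this normalization is undoing percent-encoding in URLs so that a single rule can match every encoded variant. The sketch below uses Python's standard `urllib.parse.unquote`; the helper name `normalize_url_path` and the cap of three decoding rounds are assumptions for illustration, not part of the disclosure.

```python
from urllib.parse import unquote


def normalize_url_path(raw: str, max_rounds: int = 3) -> str:
    """Repeatedly percent-decode a URL path until it stops changing."""
    decoded = raw
    for _ in range(max_rounds):
        nxt = unquote(decoded)
        if nxt == decoded:
            break  # nothing left to decode
        decoded = nxt
    return decoded


# Single and double encoding of "../" collapse to the same canonical form.
single = normalize_url_path("/%2e%2e/%2E%2E/etc/passwd")
double = normalize_url_path("/%252e%252e/%252e%252e/etc/passwd")
```

Bounding the number of rounds keeps a crafted, deeply nested encoding from consuming unbounded work.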
- Anomaly detection includes port scanning and so on.
- Port access can be legal or illegal, but there is no fixed rule to determine whether a certain port access is legal. If it is detected by rule matching, it may have a higher rate of false positives and false negatives. Therefore, the data preprocessing module uses the state detection method to perform statistical analysis on the port access and destination host within a certain period of time, and sends out alarms for the port access beyond the normal conditions.
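A minimal sketch of such stateful, window-based analysis is shown below: it counts how many distinct (host, port) targets each source touches within a sliding time window and raises a flag when a threshold is exceeded. The class name, the default window, and the default threshold are assumptions; the disclosure does not give concrete values. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class PortScanDetector:
    """Flags sources that contact too many distinct targets in a time window."""

    def __init__(self, window_seconds: float = 10.0, max_distinct_targets: int = 20):
        self.window = window_seconds
        self.max_distinct = max_distinct_targets
        # source address -> deque of (timestamp, destination host, destination port)
        self._events = defaultdict(deque)

    def observe(self, timestamp: float, src: str, dst_host: str, dst_port: int) -> bool:
        """Record one access and return True if the source now looks like a scanner."""
        events = self._events[src]
        events.append((timestamp, dst_host, dst_port))
        # Drop accesses that have fallen out of the sliding window.
        while events and timestamp - events[0][0] > self.window:
            events.popleft()
        distinct = {(host, port) for _, host, port in events}
        return len(distinct) > self.max_distinct


detector = PortScanDetector(window_seconds=10.0, max_distinct_targets=3)
alerts = [detector.observe(float(i), "10.0.0.9", "192.168.1.1", 20 + i) for i in range(5)]
later = detector.observe(100.0, "10.0.0.9", "192.168.1.1", 80)
```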
- the output of the preprocessing is the data after packet reassembly and protocol decoding.
- certain illegal traffic such as DoS attack traffic
- using rule matching to detect DoS attacks may result in a high false positive rate and a high false negative rate. Therefore, the present disclosure introduces an anomaly detection method in the preprocessing stage to clear out such illegal traffic in advance, and if there is illegal access to a port, an alarm can be generated.
- the traditional rule matching module uses the rule set and matching algorithm of an existing intrusion detection system, such as Snort or Hyperscan, to detect malicious traffic.
- the traditional rule matching algorithm is generally the Aho-Corasick algorithm and the matching algorithm based on regular expressions.
- the traditional rule matching algorithm will mark the traffic as malicious traffic and trigger an alarm, so that the malicious traffic that has been set in the rule set can be found.
- the traditional rule matching algorithm runs on the CPU, which makes it unable to meet the high-throughput and real-time requirements of intrusion detection systems.
- the present disclosure improves the traditional rule matching algorithm so that it can be operated in parallel on the GPU, thereby effectively improving the efficiency of the safety detection system.
- the Parallel Failureless Aho-Corasick (PFAC) algorithm is used to realize the detection of malicious traffic.
- the PFAC algorithm effectively utilizes the parallelism of the AC algorithm.
- the PFAC algorithm creates a separate thread for each byte of the input data stream to identify any pattern starting at the thread's starting position. The number of threads created is equal to the length of the input data stream.
- Each thread of PFAC is responsible only for identifying patterns that start at the thread's starting position. Whenever a thread cannot find any pattern at its starting position, it terminates without performing failure transitions through a backtracking state machine.
- Each final state of PFAC represents a unique pattern, so the uniqueness of each final state in PFAC can be maintained without handling multiple outputs.
- the payload of the network traffic is matched against multiple rules in the intrusion detection rule set simultaneously and in parallel. If a match occurs, the traffic is marked as malicious and an alarm is triggered, so that the malicious traffic defined in the rule set can be found.
- This algorithm is effectively suitable for GPU parallel computing and improves the detection efficiency of the traditional rule matching module.
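The PFAC idea described above can be sketched on the CPU as follows: a pattern trie without failure links is built once, and one logical "thread" per input byte walks the trie from its own starting position until no transition exists. In this Python sketch the threads are simulated by a plain loop rather than GPU threads, and the names `build_trie`, `pfac_thread`, and `pfac_match` are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def build_trie(patterns: Sequence[bytes]) -> Tuple[List[Dict[int, int]], Dict[int, int]]:
    """Build a failureless trie: per-state transition dicts plus final state -> pattern id."""
    trie: List[Dict[int, int]] = [{}]
    final: Dict[int, int] = {}
    for pattern_id, pattern in enumerate(patterns):
        state = 0
        for byte in pattern:
            nxt = trie[state].get(byte)
            if nxt is None:
                trie.append({})
                nxt = len(trie) - 1
                trie[state][byte] = nxt
            state = nxt
        final[state] = pattern_id  # each final state stands for exactly one pattern
    return trie, final


def pfac_thread(trie, final, data: bytes, start: int) -> List[Tuple[int, int]]:
    """Work of one PFAC thread: report patterns that begin exactly at `start`."""
    hits = []
    state = 0
    for pos in range(start, len(data)):
        state = trie[state].get(data[pos])
        if state is None:
            break  # no valid transition: terminate, no failure transition is taken
        if state in final:
            hits.append((start, final[state]))
    return hits


def pfac_match(patterns: Sequence[bytes], data: bytes) -> List[Tuple[int, int]]:
    """Run one logical thread per input byte (sequentially here, in parallel on a GPU)."""
    trie, final = build_trie(patterns)
    hits = []
    for start in range(len(data)):
        hits.extend(pfac_thread(trie, final, data, start))
    return hits


matches = pfac_match([b"ab", b"abc", b"bc"], b"xabcx")
```

Each result is a `(start position, pattern id)` pair. Since the threads share no state, the loop in `pfac_match` is the part that maps onto independent GPU threads.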
- the characteristics of network data traffic are diverse.
- the feature extraction module 325 first needs to extract relevant features that need to be counted.
- the features that need to be extracted include: source port, source address, destination port, destination address, ICMP type, protocol identifier, original data length and original data, etc.
- the traffic classification module 326 uses the machine learning model trained in the offline part to classify the network traffic as legitimate traffic or malicious traffic, thereby effectively identifying unknown malicious traffic that cannot be detected using the rule matching method.
- the characteristic data information corresponding to a specific data stream must be updated in real time when a data packet arrives.
- the massive traffic of up to 100Gbps may contain tens of thousands of active data streams and millions of data packets per second, which makes it extremely challenging to quickly retrieve the target feature data under such a large number of update requests.
- a hash table is implemented in the GPU to maintain and track the index of the feature data corresponding to each active data stream.
- the hash value unique to each GPU data unit is used to determine a specific data stream.
- An atomic lock is used on each mutually exclusive hash entry, so that only one thread is allowed to update its hash entry at a time.
- when a data transmission ends, the corresponding data stream becomes inactive, which triggers the operation of deleting the feature data corresponding to that data stream from the hash table.
- the time of the last data packet arriving is recorded in the hash table.
- a threshold-based method is used to determine inactive data streams.
- when the time interval exceeds the threshold, the feature data of the corresponding data stream is considered to be inactive.
- A timed task is set to output the feature data of inactive data streams for deep analysis (that is, classification using the machine learning model trained in the offline part), or to write the feature data of inactive data streams directly to an output file (that is, to save the stream statistics extracted by the feature extraction and traffic analysis module) for offline analysis.
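The flow-tracking scheme above can be approximated on the CPU as in the sketch below. Python's `threading.Lock` stands in for the GPU atomic lock on each hash entry, and a dictionary stands in for the GPU hash table; both substitutions, along with the names `FlowStats` and `FlowTable`, are assumptions made for illustration only.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class FlowStats:
    packets: int = 0
    byte_count: int = 0
    last_seen: float = 0.0  # arrival time of the most recent packet
    lock: threading.Lock = field(default_factory=threading.Lock)


class FlowTable:
    """Per-flow feature data with threshold-based inactivity eviction."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self._flows: Dict[Hashable, FlowStats] = {}
        self._table_lock = threading.Lock()

    def update(self, flow_key: Hashable, length: int, now: float) -> None:
        with self._table_lock:
            stats = self._flows.setdefault(flow_key, FlowStats())
        # Only one thread at a time may update a given entry.
        with stats.lock:
            stats.packets += 1
            stats.byte_count += length
            stats.last_seen = now

    def evict_inactive(self, now: float) -> Dict[Hashable, FlowStats]:
        """Timed task: remove and return flows idle for longer than the threshold."""
        expired = {}
        with self._table_lock:
            for key in list(self._flows):
                if now - self._flows[key].last_seen > self.idle_timeout:
                    expired[key] = self._flows.pop(key)
        return expired


table = FlowTable(idle_timeout=5.0)
table.update("flow-a", 100, now=0.0)
table.update("flow-a", 50, now=1.0)
table.update("flow-b", 10, now=4.0)
expired = table.evict_inactive(now=7.0)
```

The evicted entries are what would be handed to the classifier or written to an output file for offline analysis.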
- the fusion module can merge the malicious traffic detected by the rule matching module 324 and the malicious traffic identified by the feature extraction module 325 and the traffic classification module 326, so that malicious traffic can be intercepted and legitimate traffic can pass smoothly.
- the result display module can save the characteristics of the intercepted malicious traffic in the database and display the result of the fusion through visualization technology, so as to show in real time whether the system has been maliciously attacked, so that corresponding actions can be taken and the characteristics of the malicious traffic can be analyzed subsequently.
- Identify known malicious traffic: known malicious traffic is detected by rule matching against the rule set of the traditional intrusion detection system. If traffic matches the rule set, an alert is triggered. This method is highly efficient and has a low false alarm rate.
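The disclosure does not state the exact fusion policy, so the sketch below assumes the simplest one: a flow is treated as malicious if either detector flags it, and the result records which detector raised it. The function name `fuse_detections` and the flow identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set


def fuse_detections(rule_hits: Iterable[str], ml_hits: Iterable[str]) -> Dict[str, Set[str]]:
    """Union the two detectors' results, keeping track of each flow's source(s)."""
    fused: Dict[str, Set[str]] = {}
    for flow_id in rule_hits:
        fused.setdefault(flow_id, set()).add("rule")
    for flow_id in ml_hits:
        fused.setdefault(flow_id, set()).add("ml")
    return fused


# Assumed policy: any flow present in the fused result is intercepted.
fused = fuse_detections({"flow-1", "flow-2"}, {"flow-2", "flow-3"})
```

Keeping the source labels makes it possible to distinguish known attacks (rule hits) from suspected unknown ones (classifier-only hits) when the result is displayed or stored.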
- the security detection system uses the Snort open source intrusion detection system as the main framework, and Snort is initialized first. Then we use the network traffic capture tool DPDK to collect network traffic. In order to meet the demand for high-throughput real-time performance, DPDK is transplanted to GPU to capture data packets in parallel, thereby effectively improving the efficiency of network traffic collection. Then, a sampling function is set for the safety detection system 30. Since the design of this system needs to meet the requirements of real-time and high throughput, in this example, a sampling function is set so that one data packet is captured every two data packets. Subsequently, the sampled data is preprocessed for packet reassembly, protocol decoding and port detection.
- Snort's rule matching algorithm is transplanted to the GPU for parallel computing, thereby improving the efficiency of rule matching.
- Another thread performs feature extraction on the source port, source address, destination port, destination address, ICMP type, protocol identifier, original data length, and original data of the data packet. Then use the trained machine learning model to classify the traffic to identify unknown malicious traffic.
- the results obtained by the two threads are merged, so that the malicious traffic can be effectively intercepted and the legitimate traffic can pass the detection system smoothly.
- the result of the fusion can be displayed on a visual interface, and/or information related to malicious traffic can be stored in a database for subsequent analysis and processing.
- FIG. 6 shows a block diagram of a security detection device 60 that integrates machine learning and rule matching according to an embodiment of the present disclosure.
- the safety detection device 60 may include a processor 62 and a memory 64.
- the memory 64 stores instructions, which can be executed by the processor 62.
- when the instructions are executed by the processor 62, the processor 62 is caused to: establish a machine learning model; train the established machine learning model using labeled legitimate traffic and malicious traffic; collect network traffic; preprocess the collected network traffic; use a rule-based matching method to detect malicious traffic from the preprocessed network traffic; perform feature extraction on the preprocessed network traffic, and then use the trained machine learning model to identify malicious traffic based on the extracted features; and merge the malicious traffic detected by the rule-based matching method with the malicious traffic identified by the trained machine learning model.
- when the instructions are executed by the processor 62, the processor 62 is also caused to execute any steps of the method shown in FIG. 2.
- the foregoing embodiments can be implemented by software, or can be implemented by means of software plus a necessary general hardware platform.
- the technical solutions of the above-mentioned embodiments can be embodied in the form of a software product, and the software product can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.).
- the non-volatile storage medium includes a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Claims (21)
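- A security detection method (20) integrating machine learning and rule matching, the method comprising: establishing (200) a machine learning model; training (S202) the machine learning model using labeled legitimate traffic and malicious traffic; collecting (S204) network traffic; preprocessing (S206) the collected network traffic; detecting (S208) malicious traffic from the preprocessed network traffic using a rule-matching-based method; identifying (S210) malicious traffic from the preprocessed network traffic using the trained machine learning model, wherein the identifying comprises: performing feature extraction (S210 1) on the preprocessed network traffic, and identifying (S210 2) malicious traffic using the trained machine learning model based on the extracted features; and fusing (S212) the malicious traffic detected using the rule-matching-based method with the malicious traffic identified using the trained machine learning model.
- The security detection method (20) according to claim 1, wherein training the machine learning model using labeled legitimate traffic and malicious traffic (S202) comprises: extracting time-based features, network-layer-based features, and TTL-based features from the labeled legitimate traffic and malicious traffic (S202 1); and training the machine learning model based on the extracted features (S202 2); and wherein the security detection method (20) further comprises verifying the trained machine learning model using a verification data set (S203).
- The security detection method (20) according to claim 1, wherein the security detection method (20) further comprises: sampling the collected network traffic according to a specified sampling rule (S205), and preprocessing the collected network traffic (S206) further comprises: preprocessing the sampled network traffic; and wherein the security detection method (20) further comprises: displaying the result of the fusion through visualization technology (S214).
- The security detection method (20) according to claim 1 or 3, wherein the collection of the network traffic is performed on a GPU, and wherein, based on zero-copy technology and using a direct memory access structure, data packets in the network traffic are copied directly from the buffer queue of the network card to user space.
- The security detection method (20) according to claim 1 or 3, wherein preprocessing the collected network traffic (S206) comprises performing packet reassembly, protocol decoding, and/or anomaly detection on data packets in the collected network traffic; wherein the packet reassembly is divided into stream reassembly and fragment reassembly, the protocol decoding decodes the protocol of a data packet into a unified format, and the anomaly detection includes at least port scanning; and wherein, when a data packet passes the anomaly detection, the result of the preprocessing is the data after packet reassembly and protocol decoding; otherwise, an alarm is generated.
- The security detection method (20) according to claim 1 or 3, wherein detecting malicious traffic from the preprocessed network traffic using the rule-matching-based method (S208) comprises: detecting malicious traffic using the PFAC algorithm; wherein the PFAC algorithm creates a separate thread for each byte of the input data stream to identify any pattern starting at the thread's starting position, the number of threads created being equal to the length of the input data stream; wherein each thread of the PFAC algorithm is responsible only for identifying patterns starting at the thread's starting position, and whenever a thread cannot find any pattern at its starting position, it terminates without performing failure transitions through a backtracking state machine; each final state of the PFAC algorithm represents a unique pattern, so that the uniqueness of each final state in PFAC can be maintained without handling multiple outputs; wherein, through the PFAC algorithm, the payload of a data stream is matched against multiple rules in the intrusion detection rule set simultaneously and in parallel, and if a match occurs, the data stream is marked as malicious traffic and an alert is triggered.
- The security detection method (20) according to claim 1 or 3, wherein performing feature extraction on the preprocessed network traffic (S210 1) comprises extracting the following features: source port, source address, destination port, destination address, ICMP type, protocol identifier, original data length, and original data.
- The security detection method (20) according to claim 8, wherein performing feature extraction on the preprocessed network traffic (S210 1) comprises: implementing a hash table in a GPU, the hash table being used to maintain and track the index of the feature data of each active flow in the network traffic, with the hash value unique to each data unit being used to determine a specific data stream; wherein an atomic lock is used on each mutually exclusive hash entry, so that only one thread is allowed to update its hash entry at a time; when a data transmission ends, its corresponding data stream becomes inactive, which triggers the operation of deleting the feature data corresponding to that data stream from the hash table; for each data stream in the network traffic, the arrival time of the last data packet is recorded in the hash table, wherein a threshold-based method is used to determine an inactive data stream, the threshold-based method comprising determining that the feature data of the corresponding data stream is inactive when the time interval exceeds a threshold; wherein a timed task is set to output the feature data of inactive data streams, and wherein classification is performed using the trained machine learning model based on the feature data.
- The security detection method (20) according to claim 1, wherein the steps of establishing and training the machine learning model are performed offline, and the steps of collecting, preprocessing, detecting, identifying, and fusing are performed online.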
- 一种将机器学习和规则匹配相融合的安全检测设备(60),包括:处理器(62);以及存储器(64),所述存储器存储指令,所述指令当由所述处理器(62)执行时,使所述处理器(62):建立机器学习模型;利用带有标签的合法流量和恶意流量来训练所述机器学习模型;采集网络流量;对采集到的网络流量进行预处理;采用基于规则匹配的方法从预处理后的网络流量中检测恶意流量;采用所述训练后的机器学习模型来从预处理的网络流量中识别恶意流量,其中,所述识别过程包括:对预处理后的网络流量进行特征提取,并且基于提取到的特征,利用训练后的机器学习模型来识别恶意流量;以及对采用所述基于规则匹配的方法检测到的恶意流量和利用所述训练后的机器学习模型识别出的恶意流量进行融合。
- 根据权利要求11所述的安全检测设备(60),其中,所述指令当由所述处理器(62)执行时,还使所述处理器(62):从带有标签的合法流量和恶意流量中提取基于时间的特征、基于网络层的特征和基 于TTL的特征;基于所提取的特征来训练所述机器学习模型;以及使用验证数据集来验证训练后的机器学习模型。
- 根据权利要求11所述的安全检测设备(60),其中,所述指令当由所述处理器(62)执行时,还使所述处理器(62):按照指定采样规则,对采集到的网络流量进行采样,并且对采样得到的网络流量进行预处理;并且其中,所述指令当由所述处理器(62)执行时,还使所述处理器(62)通过可视化技术来显示融合的结果。
- 根据权利要求11或13所述的安全检测设备(60),其中,对所述网络流量的采集是在GPU上执行的,并且其中,基于零拷贝技术,利用直接内存存取结构,将网络流量中的数据包从网卡的缓存队列直接复制到用户空间。
- 根据权利要求11或13所述的安全检测设备(60),其中,所述指令当由所述处理器(62)执行时,还使所述处理器(62):对所述采集到的网络流量中的数据包进行数据包重组、协议解码和/或异常检测;其中,所述数据包重组分为流重组和分片重组,所述协议解码是将数据包的协议解码成统一的格式,所述异常检测至少包括端口扫描;并且其中,当数据包通过异常检测时,预处理的结果是经过数据包重组与协议解码的数据;否则,产生报警。
- 根据权利要求11或13所述的安全检测设备(60),其中,所述指令当由所述处理器(62)执行时,还使所述处理器(62):使用PFAC算法来检测恶意流量;其中,所述PFAC算法为输入数据流的每个字节创建一个单独的线程,以标识从线程起始位置开始的任何模式,所创建的线程数等于输入数据流的长度;其中,所述PFAC算法的每个线程仅负责识别从线程起始位置开始的模式,每当线程找不到位于线程起始位置的任何模式时,终止而不以回溯状态机进行故障转换;所述PFAC算法的每个最终状态代表一种独特的模式,从而能够在不处理多个输出的情况下保持PFAC中每个最终状态的唯一性;其中,通过PFAC算法,将数据流的有效负载同时并行地与入侵检测的规则集中的多条规则进行匹配验证,如果发生匹配,则将数据流标示为恶意流量并触发警报。
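- The security detection device (60) according to claim 11 or 13, wherein the instructions, when executed by the processor (62), further cause the processor (62) to: extract the following features from the preprocessed network traffic: source port, source address, destination port, destination address, ICMP type, protocol identifier, raw data length, and raw data.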
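- The security detection device (60) according to claim 18, wherein the instructions, when executed by the processor (62), further cause the processor (62) to: implement a hash table in the GPU, the hash table being used to maintain and track an index of the feature data of every active flow in the network traffic, with the hash value specific to each data unit being used to determine a particular data flow; wherein an atomic lock is used on each mutually exclusive hash entry, so that at any moment only one thread is allowed to update that hash entry; when the transmission of a piece of feature data ends, its corresponding data flow becomes inactive, which triggers an operation to delete the feature data corresponding to that data flow from the hash table; for each data flow in the network traffic, the time of the last-arriving packet is recorded in the hash table, wherein a threshold-based method is used to determine an inactive data flow, the threshold-based method comprising determining that the feature data of the corresponding data flow is inactive when the time interval exceeds a threshold; wherein a scheduled task is set up to output the feature data of inactive data flows, and wherein classification is performed with the trained machine learning model based on the feature data.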
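- The security detection device (60) according to claim 11, wherein the instructions, when executed by the processor (62), further cause the processor (62) to: perform the operations of building and training the machine learning model offline, and perform the operations of collecting, preprocessing, detecting, identifying, and fusing online.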
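- A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-10.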
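The PFAC matching behaviour recited in the claims above (one thread per input byte, no failure transitions, one pattern per final state) can be illustrated with a small CPU-side sketch. This is only an illustration under stated assumptions, not the applicant's implementation: the claims describe GPU threads, whereas the loop below simulates each thread sequentially in Python, and the names `build_pfac_trie` and `pfac_match` are hypothetical. Patterns are assumed to be distinct byte strings.

```python
from typing import Dict, List, Tuple


def build_pfac_trie(patterns: List[bytes]) -> Tuple[List[Dict[int, int]], Dict[int, int]]:
    """Build a failureless trie (hypothetical helper).

    transitions[state] maps an input byte to the next state; state 0 is the root.
    finals maps a final state to the index of the single pattern it represents.
    """
    transitions: List[Dict[int, int]] = [{}]
    finals: Dict[int, int] = {}
    for index, pattern in enumerate(patterns):
        state = 0
        for byte in pattern:
            nxt = transitions[state].get(byte)
            if nxt is None:
                transitions.append({})
                nxt = len(transitions) - 1
                transitions[state][byte] = nxt
            state = nxt
        finals[state] = index  # one final state per distinct pattern
    return transitions, finals


def pfac_match(data: bytes, transitions: List[Dict[int, int]],
               finals: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (start_offset, pattern_index) for every pattern occurrence.

    Each iteration of the outer loop plays the role of one PFAC thread:
    it only looks for patterns that begin at its own starting offset.
    """
    matches: List[Tuple[int, int]] = []
    for start in range(len(data)):  # one logical thread per input byte
        state = 0
        for pos in range(start, len(data)):
            nxt = transitions[state].get(data[pos])
            if nxt is None:
                break  # no failure transition: the thread simply terminates
            state = nxt
            if state in finals:
                matches.append((start, finals[state]))
    return matches


rules = [b"he", b"she", b"hers"]
trie, final_states = build_pfac_trie(rules)
hits = pfac_match(b"ushers", trie, final_states)
is_malicious = bool(hits)  # a match would mark the flow and raise an alert
```

Because every logical thread starts from the root and stops at the first missing transition, no thread depends on another, which is what allows the per-byte work to be spread across GPU threads.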
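The flow hash table with a threshold-based inactivity check, as recited in the claims above, can be sketched as follows. This is a minimal CPU-side sketch under stated assumptions: the claims place the table in GPU memory with an atomic lock per hash entry, while here a small pool of `threading.Lock` objects stands in for those locks. The class `FlowTable`, the record fields, the 5-tuple key layout, and the 30-second threshold are all hypothetical choices made for illustration.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical flow key: (src address, src port, dst address, dst port, protocol id)
FlowKey = Tuple[str, int, str, int, int]


@dataclass
class FlowRecord:
    packets: int = 0
    byte_count: int = 0
    last_seen: float = 0.0  # arrival time of the most recent packet


class FlowTable:
    """Tracks feature data of active flows and evicts idle ones."""

    def __init__(self, idle_timeout: float, num_locks: int = 64) -> None:
        self.idle_timeout = idle_timeout
        self._flows: Dict[FlowKey, FlowRecord] = {}
        # Striped locks: a CPU stand-in for per-entry atomic locks on a GPU.
        self._locks = [threading.Lock() for _ in range(num_locks)]

    def _lock_for(self, key: FlowKey) -> threading.Lock:
        return self._locks[hash(key) % len(self._locks)]

    def __len__(self) -> int:
        return len(self._flows)

    def update(self, key: FlowKey, length: int, now: float) -> None:
        """Record one packet; only one thread at a time updates a given entry."""
        with self._lock_for(key):
            record = self._flows.setdefault(key, FlowRecord())
            record.packets += 1
            record.byte_count += length
            record.last_seen = now

    def flush_inactive(self, now: float) -> List[Tuple[FlowKey, FlowRecord]]:
        """Remove and return flows idle for longer than the threshold.

        Intended to be called from a periodic task; the returned feature
        data would then be handed to the trained classifier.
        """
        expired: List[Tuple[FlowKey, FlowRecord]] = []
        for key in list(self._flows):
            with self._lock_for(key):
                record = self._flows.get(key)
                if record is not None and now - record.last_seen > self.idle_timeout:
                    expired.append((key, self._flows.pop(key)))
        return expired


table = FlowTable(idle_timeout=30.0)
flow_a: FlowKey = ("10.0.0.1", 1234, "10.0.0.2", 80, 6)
flow_b: FlowKey = ("10.0.0.3", 5555, "10.0.0.2", 53, 17)
table.update(flow_a, 100, now=0.0)
table.update(flow_a, 50, now=10.0)
table.update(flow_b, 60, now=35.0)
expired_flows = table.flush_inactive(now=45.0)
```

In this run `flow_a` has been idle for 35 seconds at the time of the flush and is evicted, while `flow_b`, idle for 10 seconds, stays in the table.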
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/761,861 US20220368703A1 (en) | 2019-10-28 | 2020-03-18 | Method and device for detecting security based on machine learning in combination with rule matching |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911031332.5 | 2019-10-28 | ||
CN201911031332.5A CN110753064B (zh) | 2019-10-28 | 2019-10-28 | Security detection system fusing machine learning and rule matching |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021082339A1 (zh) | 2021-05-06 |
Family
ID=69280495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/079972 WO2021082339A1 (zh) | Method and device for detecting security based on machine learning in combination with rule matching | 2019-10-28 | 2020-03-18 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220368703A1 (zh) |
CN (1) | CN110753064B (zh) |
WO (1) | WO2021082339A1 (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
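CN113709129A (zh) * | 2021-08-20 | 2021-11-26 | 绿盟科技集团股份有限公司 | Whitelist generation method, device and system based on traffic learning |
CN114553513A (zh) * | 2022-02-15 | 2022-05-27 | 北京华圣龙源科技有限公司 | Communication detection method, apparatus and device |
CN114979828A (zh) * | 2022-05-18 | 2022-08-30 | 成都安讯智服科技有限公司 | Modbus-based traffic control method and system for Internet of Things communication modules |
CN115208682A (zh) * | 2022-07-26 | 2022-10-18 | 上海欣诺通信技术股份有限公司 | Snort-based high-performance network attack signature detection method and device |
CN115776449A (zh) * | 2022-11-08 | 2023-03-10 | 中车工业研究院有限公司 | Train Ethernet communication status monitoring method and system |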
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
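CN110753064B (zh) * | 2019-10-28 | 2021-05-07 | 中国科学技术大学 | Security detection system fusing machine learning and rule matching |
CN111885059B (zh) * | 2020-07-23 | 2021-08-31 | 清华大学 | Method for detecting and locating industrial network traffic anomalies |
CN112532642B (zh) * | 2020-12-07 | 2022-05-20 | 河北工业大学 | Network intrusion detection method for industrial control systems based on an improved Suricata engine |
CN114697068A (zh) * | 2020-12-31 | 2022-07-01 | 华为技术有限公司 | Malicious traffic identification method and related device |
CN112769840B (zh) * | 2021-01-15 | 2023-04-07 | 杭州安恒信息技术股份有限公司 | Network attack behavior identification method based on the reinforcement learning Dyna framework |
US20220232023A1 (en) * | 2021-01-21 | 2022-07-21 | Noname Gate Ltd. | Techniques for securing computing interfaces |
CN113132349A (zh) * | 2021-03-12 | 2021-07-16 | 中国科学院信息工程研究所 | Agentless virtual traffic intrusion detection method and device for cloud platforms |
CN112671618B (zh) * | 2021-03-15 | 2021-06-15 | 北京安帝科技有限公司 | Deep packet inspection method and device |
CN112965970A (zh) * | 2021-03-22 | 2021-06-15 | 湖南大学 | Hash-algorithm-based parallel abnormal traffic detection method and system |
CN112953971B (zh) * | 2021-04-01 | 2023-05-16 | 长扬科技(北京)股份有限公司 | Network security traffic intrusion detection method and system |
CN115225301B (zh) * | 2021-04-21 | 2023-11-21 | 上海交通大学 | Hybrid intrusion detection method and system based on D-S evidence theory |
CN113098895A (zh) * | 2021-04-26 | 2021-07-09 | 成都中恒星电科技有限公司 | DPDK-based network traffic isolation system |
CN113381980B (zh) * | 2021-05-13 | 2022-11-22 | 优刻得(上海)数据科技有限公司 | Information security defense method and system, electronic device, and storage medium |
CN113472791B (zh) * | 2021-06-30 | 2023-07-14 | 深信服科技股份有限公司 | Attack detection method and device, electronic device, and readable storage medium |
CN113556354B (zh) * | 2021-07-29 | 2022-03-01 | 国家工业信息安全发展研究中心 | Industrial Internet security threat detection method and system based on traffic analysis |
CN113761522A (zh) * | 2021-09-02 | 2021-12-07 | 恒安嘉新(北京)科技股份公司 | Webshell traffic detection method, apparatus, device and storage medium |
CN113691562B (zh) * | 2021-09-15 | 2024-04-23 | 神州网云(北京)信息技术有限公司 | Rule engine implementation method for accurately identifying malicious network communication |
CN114189368B (zh) * | 2021-11-30 | 2023-02-14 | 华中科技大学 | Real-time traffic detection system and method compatible with multiple inference engines |
CN114499991B (zh) * | 2021-12-30 | 2023-04-18 | 浙江大学 | Malicious traffic detection and behavior analysis method in a mimic WAF |
CN114584371A (zh) * | 2022-03-04 | 2022-06-03 | 桀安信息安全技术(上海)有限公司 | Method, system and device for encrypted traffic behavior detection |
CN114866279B (zh) * | 2022-03-24 | 2023-07-25 | 中国科学院信息工程研究所 | Vulnerability attack traffic detection method and system based on HTTP request payloads |
CN115022100B (zh) * | 2022-08-10 | 2022-11-01 | 东南大学 | Internet of Things intrusion detection method based on traffic profiling and machine learning |
CN115296919B (zh) * | 2022-08-15 | 2023-04-25 | 江西师范大学 | Method and system for computing special traffic packets at an edge gateway |
CN115563570B (zh) * | 2022-12-05 | 2023-04-14 | 上海飞旗网络技术股份有限公司 | Resource anomaly detection method, apparatus and device |
CN115695046B (zh) * | 2022-12-28 | 2023-03-31 | 广东工业大学 | Network intrusion detection method based on enhanced ensemble learning |
CN116346452B (zh) * | 2023-03-17 | 2023-12-01 | 中国电子产业工程有限公司 | Stacking-based multi-feature fusion method and device for identifying malicious encrypted traffic |
CN116821907B (zh) * | 2023-06-29 | 2024-02-02 | 哈尔滨工业大学 | Few-shot learning intrusion detection method based on Drop-MAML |
CN116738415A (zh) * | 2023-08-10 | 2023-09-12 | 北京中超伟业信息安全技术股份有限公司 | Intrusion detection method and device based on particle-swarm-optimized weighted naive Bayes |
CN117220911B (zh) * | 2023-08-11 | 2024-03-29 | 释空(上海)品牌策划有限公司 | Industrial control security audit system based on deep protocol analysis |
CN117061249B (zh) * | 2023-10-12 | 2024-04-26 | 明阳时创(北京)科技有限公司 | Network-traffic-based intrusion monitoring method and system |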
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105208037A (zh) * | 2015-10-10 | 2015-12-30 | 中国人民解放军信息工程大学 | DoS/DDoS attack detection and filtering method based on lightweight intrusion detection |
WO2016043739A1 (en) * | 2014-09-17 | 2016-03-24 | Resurgo, Llc | Heterogeneous sensors for network defense |
US20160294859A1 (en) * | 2015-03-30 | 2016-10-06 | Electronics And Telecommunications Research Institute | Apparatus and method for detecting malicious domain cluster |
CN110213287A (zh) * | 2019-06-12 | 2019-09-06 | 北京理工大学 | Dual-mode intrusion detection device based on ensemble machine learning algorithms |
CN110224990A (zh) * | 2019-07-17 | 2019-09-10 | 浙江大学 | Intrusion detection system based on a software-defined security architecture |
CN110753064A (zh) * | 2019-10-28 | 2020-02-04 | 中国科学技术大学 | Security detection system fusing machine learning and rule matching |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102904770A (zh) * | 2012-08-02 | 2013-01-30 | 北京邮电大学 | High-bandwidth VoIP detection system |
CN103685268A (zh) * | 2013-12-10 | 2014-03-26 | 华东理工大学 | Network intrusion detection method based on GPU and SVM |
US9699205B2 (en) * | 2015-08-31 | 2017-07-04 | Splunk Inc. | Network security system |
CN108123939A (zh) * | 2017-12-14 | 2018-06-05 | 华中师范大学 | Real-time malicious behavior detection method and device |
CN108616498A (zh) * | 2018-02-24 | 2018-10-02 | 国家计算机网络与信息安全管理中心 | Web access anomaly detection method and device |
CN110311829B (zh) * | 2019-05-24 | 2021-03-16 | 西安电子科技大学 | Network traffic classification method based on machine learning acceleration |
- 2019-10-28: CN application CN201911031332.5A, patent CN110753064B (zh), status: Active
- 2020-03-18: US application US17/761,861, patent US20220368703A1 (en), status: Pending
- 2020-03-18: WO application PCT/CN2020/079972, patent WO2021082339A1 (zh), status: Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016043739A1 (en) * | 2014-09-17 | 2016-03-24 | Resurgo, Llc | Heterogeneous sensors for network defense |
US20160294859A1 (en) * | 2015-03-30 | 2016-10-06 | Electronics And Telecommunications Research Institute | Apparatus and method for detecting malicious domain cluster |
CN105208037A (zh) * | 2015-10-10 | 2015-12-30 | 中国人民解放军信息工程大学 | DoS/DDoS attack detection and filtering method based on lightweight intrusion detection |
CN110213287A (zh) * | 2019-06-12 | 2019-09-06 | 北京理工大学 | Dual-mode intrusion detection device based on ensemble machine learning algorithms |
CN110224990A (zh) * | 2019-07-17 | 2019-09-10 | 浙江大学 | Intrusion detection system based on a software-defined security architecture |
CN110753064A (zh) * | 2019-10-28 | 2020-02-04 | 中国科学技术大学 | Security detection system fusing machine learning and rule matching |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
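CN113709129A (zh) * | 2021-08-20 | 2021-11-26 | 绿盟科技集团股份有限公司 | Whitelist generation method, device and system based on traffic learning |
CN114553513A (zh) * | 2022-02-15 | 2022-05-27 | 北京华圣龙源科技有限公司 | Communication detection method, apparatus and device |
CN114979828A (zh) * | 2022-05-18 | 2022-08-30 | 成都安讯智服科技有限公司 | Modbus-based traffic control method and system for Internet of Things communication modules |
CN114979828B (zh) * | 2022-05-18 | 2023-03-10 | 成都安讯智服科技有限公司 | Modbus-based traffic control method and system for Internet of Things communication modules |
CN115208682A (zh) * | 2022-07-26 | 2022-10-18 | 上海欣诺通信技术股份有限公司 | Snort-based high-performance network attack signature detection method and device |
CN115208682B (zh) * | 2022-07-26 | 2023-12-12 | 上海欣诺通信技术股份有限公司 | Snort-based high-performance network attack signature detection method and device |
CN115776449A (zh) * | 2022-11-08 | 2023-03-10 | 中车工业研究院有限公司 | Train Ethernet communication status monitoring method and system |
CN115776449B (zh) * | 2022-11-08 | 2023-10-03 | 中车工业研究院有限公司 | Train Ethernet communication status monitoring method and system |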
Also Published As
Publication number | Publication date |
---|---|
CN110753064B (zh) | 2021-05-07 |
US20220368703A1 (en) | 2022-11-17 |
CN110753064A (zh) | 2020-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021082339A1 (zh) | Method and device for detecting security based on machine learning in combination with rule matching | |
Dainotti et al. | Issues and future directions in traffic classification | |
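TWI477106B (zh) | System and method for integrating line-rate application recognition in a switch ASIC | |
CN107733851A (zh) | DNS tunnel Trojan detection method based on communication behavior analysis | |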
Liu et al. | The detection method of low-rate DoS attack based on multi-feature fusion | |
JP2006279930A (ja) | Unauthorized access detection method and device, and unauthorized access blocking method and device | |
US20210303984A1 (en) | Machine-learning based approach for classification of encrypted network traffic | |
WO2023207548A1 (zh) | Traffic detection method, apparatus, device and storage medium | |
Yan et al. | Identifying wechat red packets and fund transfers via analyzing encrypted network traffic | |
US20170155668A1 (en) | Identifying malicious communication channels in network traffic by generating data based on adaptive sampling | |
Wang et al. | Characterizing application behaviors for classifying p2p traffic | |
US20240064107A1 (en) | System for classifying encrypted traffic based on data packet | |
Iqbal et al. | A classification framework to detect DoS attacks | |
CN114091602A (zh) | Machine-learning-based SSR traffic identification system and method | |
Liu et al. | A survey on encrypted traffic identification | |
Bayazit et al. | Neural network based Android malware detection with different IP coding methods | |
Liang et al. | FECC: DNS Tunnel Detection model based on CNN and Clustering | |
US9398040B2 (en) | Intrusion detection system false positive detection apparatus and method | |
Karimov et al. | Problems of increasing efficiency of NIDS by using implementing methods packet classifications on FPGA | |
CN101984635B (zh) | P2P protocol traffic identification method and system | |
Long et al. | Deep encrypted traffic detection: An anomaly detection framework for encryption traffic based on parallel automatic feature extraction | |
Qiu et al. | Traffic Analytics Development Kits (TADK): Enable Real-Time AI Inference in Networking Apps | |
CN112104628B (zh) | Real-time malicious traffic detection method with adaptive feature rule matching | |
Parvat et al. | Performance improvement of deep packet inspection for Intrusion Detection | |
Zhu et al. | A research review on SDN-based DDOS attack detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20881070 Country of ref document: EP Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20881070 Country of ref document: EP Kind code of ref document: A1 |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20881070 Country of ref document: EP Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/10/2022) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20881070 Country of ref document: EP Kind code of ref document: A1 |