WO2020135233A1 - Botnet detection method, system, and storage medium - Google Patents
Botnet detection method, system, and storage medium
- Publication number
- WO2020135233A1 (application PCT/CN2019/126754)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- domain name
- access
- terminal
- botnet
- node
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/065—Generation of reports related to network devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/145—Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/144—Detection or countermeasures against botnets
Definitions
- the present disclosure relates to the technical field of network security, and in particular, to a botnet detection method, system, and storage medium.
- A botnet is a network in which an attacker uses one or more propagation methods to infect a large number of hosts with a bot program (BOT program), forming a one-to-many command and control (C&C) network between the controller and the infected hosts.
- Attackers spread bot programs through various channels to infect a large number of hosts on the Internet, and the infected hosts receive commands from the attackers through a control channel to form a botnet.
- This attack method gives the attacker a hidden, flexible, and efficient one-to-many command and control mechanism, which enables information theft, distributed denial-of-service attacks, and spam sending. Botnets have become a major threat to current network security.
- The mainstream method for detecting botnets is still network traffic analysis, using clustering or association detection on network data traffic.
- This method mostly uses the characteristic information contained in the data flow information on multiple switches, gateway devices and other network devices widely distributed in the monitored network, such as source IP address, destination IP address, packet size, port number, protocol
- Such information can be used for cluster analysis to find abnormal network traffic behavior that belongs to a botnet, or for correlation analysis based on known botnet behavior to identify abnormal traffic and thereby detect the botnet.
- However, these methods require many kinds of data and huge data streams, and they must parse data packets to extract multiple features. The processing load is therefore too large, which results in low botnet detection efficiency.
- the main purpose of the present disclosure is to provide a botnet detection method, system and storage medium, aiming to improve the detection efficiency of the botnet.
- a botnet detection method provided by the present disclosure is applied to a botnet detection system.
- The method includes: acquiring original network traffic data in a monitored network, and preprocessing the original network traffic data to obtain preprocessed network traffic data; constructing a terminal access relationship graph based on the preprocessed network traffic data; mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and screening the candidate node combinations based on a preset screening rule to obtain a detection result of botnet nodes.
- An embodiment of the present disclosure also proposes a botnet detection system, including: a data preprocessing module for acquiring original network traffic data in a monitored network and preprocessing the original network traffic data to obtain preprocessed network traffic data; a relationship graph construction module for constructing a terminal access relationship graph based on the preprocessed network traffic data; and a node judgment module for mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations, and for screening the candidate node combinations based on a preset screening rule to obtain a detection result of botnet nodes.
- An embodiment of the present disclosure also proposes a network access system.
- The network access system includes: a botnet detection system, and a malicious domain name detection system, a monitoring system, and a DNS server, all communicatively connected to the botnet detection system.
- The botnet detection system is also connected to several access terminals, wherein: the botnet detection system is the botnet detection system described above; the DNS server is used to provide domain name access services to the access terminals; the malicious domain name detection system is used to receive the suspicious domain names in the detection result sent by the botnet detection system and to evaluate them in order to optimize the detection result; and the monitoring system is used to receive the detection result reported by the botnet detection system, display the detection result, and/or, based on the detection result, perform related management and control operations on terminals suspected of being infected with the bot virus.
- An embodiment of the present disclosure also proposes a botnet detection system, including: a memory, a processor, and a botnet detection program stored on the memory and executable on the processor, wherein the botnet detection program, when executed by the processor, implements the steps of the botnet detection method described above.
- An embodiment of the present disclosure also proposes a computer-readable storage medium that stores a botnet detection program, wherein the botnet detection program, when executed by a processor, implements the steps of the botnet detection method described above.
- FIG. 1 is an architecture diagram of a network access system involved in an embodiment of the present disclosure
- FIG. 2 is a schematic flowchart of a first embodiment of a botnet detection method of the present disclosure
- FIG. 3(a) is an example of a processing flow of a module for building a relationship graph of a botnet detection system according to an embodiment of the present disclosure
- FIG. 3(b) is an example of a processing flow of a relationship graph construction module of a botnet detection system according to an embodiment of the present disclosure
- FIG. 4 is a diagram illustrating an example of a processing flow of a node judgment module of a botnet detection system according to an embodiment of the present disclosure
- FIG. 5 is a schematic diagram of a process of building a pattern tree according to an embodiment of the present disclosure
- FIG. 6 is a schematic flowchart of a second embodiment of the botnet detection method of the present disclosure.
- FIG. 7 is a schematic flowchart of a third embodiment of the botnet detection method of the present disclosure.
- FIG. 8 is a schematic diagram of a system architecture involved in an operating environment according to an embodiment of the present disclosure.
- The main solution of the embodiments of the present disclosure is: acquiring the original network traffic data in the monitored network and preprocessing it to obtain preprocessed network traffic data; constructing a terminal access relationship graph based on the preprocessed network traffic data; mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and screening the candidate node combinations based on preset screening rules to obtain a detection result of botnet nodes.
- The domain name query or access information in a terminal's network behavior is used to analyze and compare terminal behavior patterns, and the existence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually have the same or similar behavior patterns. This can effectively improve detection efficiency. Because the solution does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
- the mainstream method for detecting botnets is to use the feature information contained in the data flow information on multiple switches, gateway devices, and other network devices widely distributed in the monitored network, such as source IP address, destination IP address, and data.
- these methods require a variety of data and information, huge data streams, and need to parse data packets to extract multiple features, and the processing load is too large, resulting in low detection efficiency of botnets.
- The present disclosure provides a solution that uses fewer types of data, extracts fewer features from data traffic, has lower computational overhead, and effectively improves detection efficiency.
- FIG. 1 is an architecture diagram of a network access system involved in an embodiment of the present disclosure, where FIG. 1 shows a schematic diagram of an application deployment of a botnet detection system in an embodiment of the present disclosure.
- an embodiment of the present disclosure proposes a network access system.
- The network access system includes: a botnet detection system, and a malicious domain name detection system, a monitoring system, and a DNS (Domain Name System) server, all communicatively connected to the botnet detection system. The botnet detection system is also connected to a number of access terminals. The botnet detection system is the botnet detection system proposed by the embodiments of the present disclosure.
- the botnet detection system is deployed in the network architecture through a bypass connection, and the network traffic is acquired by mirroring for detection and analysis.
- the botnet detection system performs the detection by acquiring the terminal identification information in the network traffic data and the domain name information queried or accessed by the terminal.
- The system needs to extract from the network traffic a terminal identifier that identifies the terminal (e.g., the terminal IP address, the terminal MAC address, or user identity information associated with the terminal) and the domain name information accessed or queried by the terminal.
- The system may acquire this network traffic data from various sources, including but not limited to the terminal's domain name query requests and the HTTP connection requests with which the terminal accesses domain names.
- The botnet detection system may examine the traffic logs within a period of time, analyze terminal behavior patterns, and finally group together all terminals with the same behavior pattern.
- Here, having the same behavior pattern means that multiple terminals exhibit exactly the same access or query behavior toward multiple domain names within a set time interval or threshold.
- the botnet detection system may send the suspicious domain name in the detection result to the malicious domain name detection system to further evaluate the suspicious domain name.
- the malicious domain name detection system can be a malicious domain name blacklist filtering system, or a threat intelligence matching detection system, or an arbitrary domain name detection system, and so on.
- the botnet detection system can further sort the threat level of the detection results and optimize the detection results.
- For example, if the malicious domain name detection system determines that a domain name accessed by a group of terminals is a botnet domain name, it can be determined that the group of terminals with the same behavior pattern is suspected of being controlled by the same bot virus, which directly yields the topological relationship of the botnet.
- the botnet detection system can determine whether to perform this step according to the actual situation, and further detect the domain name information.
- the botnet detection system may also report the detection result to the monitoring system, and the monitoring system displays the detection result.
- The monitoring system can also perform related control operations on terminals suspected of being infected with the bot virus according to the detection results, for example, limiting the network speed of such terminals.
- the deployment scheme of the above botnet detection system is only a preferred example, and in specific applications, the deployment of the botnet detection system may be performed according to actual networking conditions.
- In addition to being deployed in a network for real-time traffic detection, the botnet detection system of the present disclosure also supports parsing traffic log files and outputting the results of network behavior analysis and detection.
- a DNS server is used to provide domain name access services to access terminals
- the malicious domain name detection system is configured to receive the suspicious domain name in the detection result sent by the botnet detection system, and evaluate the suspicious domain name to optimize the detection result;
- the monitoring system is configured to receive the detection result reported by the botnet detection system, display the detection result, and/or, based on the detection result, perform related management and control operations on the terminal suspected of being infected with the bot virus.
- The botnet detection system of the embodiments of the present disclosure analyzes and compares terminal behavior patterns using the domain name query or access information in terminal network behavior, and detects the existence of a botnet based on the characteristic that terminals controlled by a botnet usually have the same or similar behavior patterns.
- The disclosed solution uses fewer data types, extracts fewer features from data traffic, and has less computational overhead, which can effectively improve detection efficiency. Because the solution does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
- the first embodiment of the present disclosure proposes a botnet detection method.
- the method is applied to a botnet detection system.
- the method includes:
- Step S101 Obtain original network traffic data in the monitored network, and preprocess the original network traffic data to obtain preprocessed network traffic data;
- the botnet detection system can obtain network traffic data in a variety of ways, including but not limited to, for example, the domain name query request of the terminal, and the HTTP connection request for the terminal to access the domain name.
- The step of obtaining the original network traffic data in the monitored network may include: the botnet detection system obtains real-time traffic data, or traffic log files within a preset time range, in the monitored network as the original network traffic data. The real-time traffic data or the traffic log file includes at least the terminal's domain name query requests and/or the terminal's HTTP connection requests for accessing domain names.
- Preprocessing the original network traffic data includes: extracting the valid fields in the network traffic data, and performing data cleaning to remove redundant and duplicate information.
- A data preprocessing module can be provided to perform this step.
- the following scheme may be adopted:
- Time interval grouping means setting a time window T in advance and grouping the traffic logs that contain domain name information with T as the period. All log data whose timestamps fall within T form one set of data to be processed, which is then passed to the subsequent terminal identifier aggregation.
- The reason for time interval grouping is that, under normal circumstances, controlled terminals infected with the same bot virus show a certain degree of consistency in their behavior patterns, so their traffic data has certain periodic characteristics. A reasonable time interval setting makes locating botnet behavior more accurate.
- valid fields are extracted from the network traffic data grouped in the time interval, and the valid fields include at least three key fields: time stamp, terminal identification, and access domain name;
- Valid field extraction in this embodiment consists of selecting the three key fields from the log data: timestamp, terminal identifier, and accessed domain name.
- These valid fields are the minimum set of fields required for detection; none can be omitted.
- the extracted field contents include but are not limited to the above fields.
- The network traffic data containing the above valid fields is cleaned: redundant data and whitelisted entries are filtered out, yielding a log sequence with the data structure <timestamp, terminal identifier, accessed domain name>.
- The data cleaning process may include the following two steps: redundant data filtering and whitelist filtering.
- Redundant data filtering means that if multiple logs with the same <terminal identifier, accessed domain name> information are found within the same time interval during log processing, the duplicate records are removed and only one log is retained for that interval.
- Whitelist filtering means analyzing the accessed domain name in the log information. If the domain name is a trusted domain name, the log is removed and is not passed to the subsequent detection process.
- After the above data cleaning process, the data preprocessing module outputs a log sequence with the structure <timestamp, terminal identifier, accessed domain name>.
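The preprocessing described above (time-window grouping, redundant data filtering, and whitelist filtering) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `LogRecord` and `preprocess`, the use of numeric timestamps in seconds, and the representation of the whitelist as a set of domain strings are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical record holding the three valid fields."""
    timestamp: float   # seconds; the unit is an assumption
    terminal_id: str   # e.g. a terminal IP address
    domain: str        # queried or accessed domain name


def preprocess(records: Iterable[LogRecord], window: float,
               whitelist: frozenset = frozenset()) -> dict:
    """Group records into time windows of length `window`, drop records for
    whitelisted domains, and keep a single record per (terminal, domain)
    pair inside each window."""
    grouped: dict = {}
    seen: set = set()
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.domain in whitelist:
            continue  # whitelist filtering: trusted domain, skip detection
        window_index = int(rec.timestamp // window)
        key = (window_index, rec.terminal_id, rec.domain)
        if key in seen:
            continue  # redundant data filtering within the same window
        seen.add(key)
        grouped.setdefault(window_index, []).append(rec)
    return grouped
```

The result maps each window index to the cleaned log sequence for that window, so that later stages can process one time interval at a time.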
- Step S102 Construct a terminal access relationship graph based on the preprocessed network traffic data
- the construction of the terminal access relationship graph may be performed by the relationship graph construction module.
- the construction of the terminal access relationship graph may include three parts: terminal identification information aggregation, relationship graph construction, and relationship graph update operation.
- the step of constructing the terminal access relationship graph based on the preprocessed network traffic data may include:
- The terminal identification information within the same time interval is aggregated with the domain name as the center, forming an aggregation structure of <time identifier, domain name, set of terminal identifiers that accessed the domain name>;
- The aggregation of terminal identification information refers to performing an aggregation operation on the data within one time interval.
- The operation uses the accessed domain name in the log information as the key and aggregates the identifiers of the terminals that queried or accessed that domain name.
- The aggregation result is <time identifier, domain name, terminal identifier set>, where the time identifier marks the time interval in which these access operations occurred.
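As a rough illustration of this domain-centered aggregation, the sketch below groups the terminals of one time interval by accessed domain name. The function name `aggregate_by_domain` and the tuple layout of the input are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_by_domain(time_id, records):
    """Aggregate one time interval of (timestamp, terminal_id, domain)
    tuples into (time_id, domain, terminal_id_set) rows, keyed by domain."""
    terminals = defaultdict(set)
    for _timestamp, terminal_id, domain in records:
        terminals[domain].add(terminal_id)
    # Sort by domain name only to make the output order deterministic.
    return [(time_id, domain, frozenset(ids))
            for domain, ids in sorted(terminals.items())]
```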
- The data obtained from the original network traffic data after data preprocessing is a tuple of <timestamp, terminal IP address identifier, accessed domain name>;
- The relationship graph uses <time identifier#domain name information> and <terminal identification information> as nodes, and includes the adjacency relationships between the terminal identifiers and the accessed domain names.
- The bipartite graph includes two types of nodes: node set U represents the set of terminal identifiers, and node set V represents the set of accessed domain names recorded in the time interval.
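One simple way to hold such a bipartite graph in memory is a pair of adjacency maps, one per node type. The sketch below is only an illustration: the function name `build_bipartite_graph` and the choice of plain dictionaries are assumptions, and the domain-side node label follows the `time identifier#domain name` convention described above.

```python
from collections import defaultdict


def build_bipartite_graph(aggregated):
    """Build adjacency maps from (time_id, domain, terminal_ids) rows.

    Returns (domain_to_terminals, terminal_to_domains); the degree of a
    node is the size of its adjacency set."""
    domain_to_terminals = {}
    terminal_to_domains = defaultdict(set)
    for time_id, domain, terminal_ids in aggregated:
        node = f"{time_id}#{domain}"          # node in set V
        domain_to_terminals[node] = set(terminal_ids)
        for terminal_id in terminal_ids:      # nodes in set U
            terminal_to_domains[terminal_id].add(node)
    return domain_to_terminals, dict(terminal_to_domains)
```

Keeping both directions makes degree lookups cheap for either node type, which the later degree-based sorting relies on.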
- the update rules are as follows:
- A domain name that is newly added in the aggregated data of the T2 time interval is added directly to the relationship graph. For example, for <T2, Domain3, <IP-2, IP-3, IP-5>> in Table 2 above, comparison with the historical relationship graph shows that Domain3 is a newly added domain name, so the data is added directly to the relationship graph.
- If the domain name of an existing node in the historical relationship graph is the same as a domain name in the aggregated data of the T2 time interval (domain name Domain), the set of adjacent nodes of Domain in the historical relationship graph (set A1) is compared with the set of adjacent nodes of Domain within the T2 time interval (set A2):
- If A1 is a subset of A2, the Domain node and its adjacency relationships are removed from the original relationship graph, and the adjacency relationship data corresponding to A2 is added to the graph. For example, the IP address list in <T1, Domain1, <IP-1, IP-2, IP-3>> in the historical relationship graph is a subset of the IP address list in <T2, Domain1, <IP-1, IP-2, IP-3, IP-4>> in the T2 time interval relationship graph, so the node and its adjacency relationships in the historical relationship graph are deleted and replaced with the node T2#Domain1 and the adjacency relationships to its IP address list;
- If the set A2 is a subset of A1, the nodes and adjacency relationships in the historical relationship graph are retained, and the domain name node and set A2 do not need to be added to the relationship graph;
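The update rules above can be sketched as a single function over a dictionary that maps (time identifier, domain name) to a terminal set. This is a hedged illustration: the function name `update_graph` and the dictionary layout are assumptions. The text above does not state what happens when neither set contains the other, so the sketch assumes the new node is simply added alongside the historical one, and it treats equal sets as the "A2 is a subset of A1" case.

```python
def update_graph(graph, time_id, domain, new_terminals):
    """Apply the update rules in place.

    graph maps (time_id, domain) -> set of terminal identifiers."""
    new_terminals = set(new_terminals)
    old_keys = [key for key in graph if key[1] == domain]
    if not old_keys:
        graph[(time_id, domain)] = new_terminals  # newly added domain name
        return
    for key in old_keys:
        if new_terminals <= graph[key]:
            return  # A2 is a subset of A1: keep the historical data
    for key in old_keys:
        if graph[key] <= new_terminals:
            del graph[key]  # A1 is a subset of A2: drop the old node
    # Either a replacement for deleted history, or (assumption) an extra
    # node when neither set contains the other.
    graph[(time_id, domain)] = new_terminals
```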
- Step S103 Mining a list of terminal identifiers that access multiple same domain names from the terminal access relationship graph to obtain a candidate node combination, and filtering the candidate node combination based on preset screening rules to obtain a botnet node detection result .
- this step can be performed by the node judgment module.
- The step of mining, from the terminal access relationship graph, a list of terminal identifiers that access multiple identical domain names to obtain a candidate node combination may include:
- the terminal identification node in the linked list is used as the tree node, and the access domain name information is added to the attribute information of the tree node to construct the access mode tree.
- the candidate node combination is screened to obtain the detection result of the botnet node.
- The implementation is as follows: determine whether any candidate node in the candidate node combination is contained in another; if so, delete the contained candidate nodes to obtain the candidate node combination after redundancy screening; traverse the candidate node combination to obtain, for each candidate node, the number of elements in its terminal identifier set and the number of elements in its accessed domain name set; and retain the candidate nodes whose terminal identifier set has more elements than the preset terminal identifier count threshold and whose accessed domain name set has more elements than the preset accessed domain name count threshold, thereby obtaining the botnet nodes.
- the linked list is first constructed and sorted based on the adjacency relationship:
- Constructing the linked lists means building, based on the terminal access relationship graph, one linked list from each domain name node and its adjacent nodes, where the <time identifier#accessed domain name> node is the head node of the linked list and the subsequent nodes are terminal identifier nodes.
- Within each linked list, the terminal identifier nodes are sorted in descending order of their degree in the entire relationship graph;
- The different linked lists are then arranged in sequence, sorted in descending order of the degree of their accessed domain name node in the entire relationship graph.
- The linked list for the node <T2#Domain1> is established according to the terminal access relationship graph obtained above.
- T2#Domain1 is the head node, and its sub-nodes are <IP-1, IP-2, IP-3, IP-4>. From the terminal access relationship graph, the IP nodes sorted by degree are <IP-3, IP-2, IP-1, IP-4>.
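The degree-based ordering of the linked lists can be sketched as follows, using Python lists in place of linked lists. The function name `build_sorted_lists` and the input layout (a mapping from head node label to terminal set) are assumptions, and breaking ties between equal degrees by identifier is also an assumption, since the text above does not specify tie-breaking.

```python
def build_sorted_lists(domain_to_terminals):
    """Return [(head, ordered_terminals), ...] with terminals and lists
    both sorted in descending order of node degree."""
    degree = {}
    for terminals in domain_to_terminals.values():
        for terminal_id in terminals:
            degree[terminal_id] = degree.get(terminal_id, 0) + 1
    lists = []
    for head, terminals in domain_to_terminals.items():
        # Terminal degree = number of domain nodes the terminal touches.
        ordered = sorted(terminals, key=lambda t: (-degree[t], t))
        lists.append((head, ordered))
    # Domain node degree = number of terminals adjacent to it.
    lists.sort(key=lambda item: (-len(item[1]), item[0]))
    return lists
```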
- the access mode tree uses the terminal identification node as a tree node, and the access domain name information is added to the attribute information of the tree node.
- the construction process of the access pattern tree includes the following processes:
- each linked list in FIG. 4 uses a domain name node as a head node, and all terminal identification nodes are sorted in descending order of degree.
- the terminal identification node set is the node to be processed.
- In the first cycle, the current node is the root node, and the node to be processed is the first node, that is, the one with the highest degree after the terminal identifier set is sorted in descending order;
- In each subsequent cycle, the current node is the terminal identifier node that was processed in the previous cycle,
- and the node to be processed is the terminal identifier node whose degree comes next after the current node in the descending order of the terminal identifier set.
- Candidate node combinations are then extracted: in the access pattern tree constructed above, each path starting from the root node is a candidate node combination.
- The set of nodes on the path represents a set of terminal identifiers, and the <time identifier#domain name> list of the end node of the path represents the set of domain names that these terminals access in common, forming the structure <source IP set, time identifier#domain name list>, which is a candidate botnet node combination.
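The tree construction and path extraction can be sketched as a prefix tree over the degree-sorted terminal lists. This is only an illustration under stated assumptions: the names `TreeNode`, `build_access_pattern_tree`, and `extract_candidates` are hypothetical, and the sketch assumes that the head node label of a list is recorded on every tree node along the inserted path, so that the label list at any node holds the domains shared by all terminals on the path from the root to that node.

```python
class TreeNode:
    """Tree node for one terminal identifier (None for the root)."""

    def __init__(self, terminal_id=None):
        self.terminal_id = terminal_id
        self.domains = []    # "time_id#domain" labels stored as attributes
        self.children = {}   # terminal_id -> TreeNode


def build_access_pattern_tree(sorted_lists):
    """Insert each (head, ordered_terminals) list as a path from the root."""
    root = TreeNode()
    for head, terminals in sorted_lists:
        current = root
        for terminal_id in terminals:
            if terminal_id not in current.children:
                current.children[terminal_id] = TreeNode(terminal_id)
            current = current.children[terminal_id]
            current.domains.append(head)
    return root


def extract_candidates(root):
    """Return (terminal_set, domain_set) for every path from the root."""
    candidates = []
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        for child in node.children.values():
            child_path = path + [child.terminal_id]
            candidates.append(
                (frozenset(child_path), frozenset(child.domains)))
            stack.append((child, child_path))
    return candidates
```

Because the lists are sorted by degree before insertion, terminals that appear in many lists tend to sit near the root, so lists with common terminals share a prefix and their domain labels accumulate on the same nodes.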
- the extracted candidate botnet node combinations are screened to select the results that meet the threshold.
- thresholds are set for the number of elements of the terminal identification set and the number of visited domain names, respectively.
- In this way, the sets of terminal identifiers that access multiple identical domain names can be obtained.
- The format is <terminal identifier set, time identifier#domain name set>, where the terminal identifier set denotes the machines infected by the botnet.
- Hosts that exhibit the same access behavior multiple times and access domain names with suspicious risks are behaving abnormally and are considered to be machines infected by a botnet.
- A candidate is retained when the number of elements in its terminal identifier set is greater than the first preset threshold Th1 and the number of elements in its accessed domain name set is greater than the second preset threshold Th2.
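The screening step can be sketched as a redundancy filter followed by the two threshold checks. The function name `screen_candidates` is hypothetical, and the sketch assumes that one candidate "contains" another when both its terminal set and its domain set are supersets of the other's; the text above does not define containment more precisely.

```python
def screen_candidates(candidates, th1, th2):
    """Filter (terminal_set, domain_set) candidates given as frozensets.

    Drops candidates contained in another candidate, then keeps those with
    more than th1 terminals and more than th2 domains."""
    unique = list(dict.fromkeys(candidates))  # drop exact duplicates
    kept = []
    for terminals, domains in unique:
        contained = any(
            (terminals, domains) != (other_t, other_d)
            and terminals <= other_t and domains <= other_d
            for other_t, other_d in unique
        )
        if contained:
            continue  # redundancy screening
        if len(terminals) > th1 and len(domains) > th2:
            kept.append((terminals, domains))
    return kept
```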
- The original network traffic data in the monitored network is obtained and preprocessed to obtain preprocessed network traffic data; a terminal access relationship graph is constructed based on the preprocessed network traffic data; lists of terminal identifiers that access multiple identical domain names are mined from the terminal access relationship graph to obtain candidate node combinations; and the candidate node combinations are screened based on preset screening rules to obtain a detection result of botnet nodes. In this way, terminal behavior patterns are analyzed and compared using the domain name query or access information in terminal network behavior, and the existence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually have the same or similar behavior patterns.
- The disclosed solution uses fewer data types, extracts fewer features from data traffic, and has less computational overhead, which can effectively improve detection efficiency. Because the solution does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
- A second embodiment of the present disclosure proposes a botnet detection method. Based on the first embodiment shown in FIG. 2 above, the method further includes: Step S104, sending the suspicious domain names in the detection result to a malicious domain name detection system, which evaluates the suspicious domain names to optimize the detection result.
- this embodiment further includes a scheme for the malicious domain name detection system to evaluate the suspicious domain name to optimize the detection result.
- the botnet detection system may send the suspicious domain name in the detection result to the malicious domain name detection system to further evaluate the suspicious domain name.
- the malicious domain name detection system here can be implemented in multiple ways, such as a malicious domain name blacklist filtering system, or a threat intelligence matching detection system, or an arbitrary domain name detection system, and so on.
- once the malicious domain name detection system has evaluated the suspicious domain names, the botnet detection system can further rank the detection results by threat level and optimize them. For example, if the malicious domain name detection system determines that a domain name accessed by a group of terminals is a botnet domain name, it can be determined that this group of terminals with the same behavior pattern is suspected of being controlled by the same bot virus, which directly yields the topological relationship of the botnet. In actual deployment, the botnet detection system can decide, according to the actual situation, whether to perform this step and examine the domain name information further.
- the original network traffic data in the monitored network is obtained and preprocessed to obtain preprocessed network traffic data; a terminal access relationship graph is constructed based on the preprocessed network traffic data; lists of terminal identifiers that access multiple identical domain names are mined from the terminal access relationship graph to obtain candidate node combinations, and the candidate node combinations are screened based on preset screening rules to obtain a botnet node detection result. In this way, the behavior patterns of terminals are analyzed and compared through the domain name query or access information in their network behavior, and the existence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually exhibit the same or similar behavior patterns. In addition, after obtaining the detection result, the botnet detection system can send the suspicious domain names in the detection result to the malicious domain name detection system for further evaluation, so that the botnet detection system can rank the detection results by threat level and optimize them.
- compared with some existing approaches, the disclosed solution uses fewer types of data, extracts fewer features from the traffic, and incurs less computational overhead, which can effectively improve detection efficiency. Because it does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
- a third embodiment of the present disclosure proposes a botnet detection method. Based on the second embodiment shown in FIG. 2 above, the method further includes:
- Step S105: report the detection result to the monitoring system; the monitoring system displays the detection result, and/or performs related management and control operations on terminals suspected of being infected with the bot virus according to the detection result.
- this embodiment further includes: a scheme in which the monitoring system performs related operations on the detection result.
- the botnet detection system may report the detection result to the monitoring system, and the monitoring system displays the detection result.
- the monitoring system may also perform related management and control operations on the terminal suspected of being infected with the zombie virus according to the detection result, for example, performing network speed limit processing on the terminal suspected of being infected with the zombie virus.
- the deployment of the botnet detection system described above is only a preferred example; in practice, the botnet detection system can be deployed according to the actual networking conditions.
- in addition to being deployed in a network for real-time traffic detection, the botnet detection system of the present disclosure can also be used to parse traffic log files and output network behavior analysis and detection results.
- an embodiment of the present disclosure also provides a botnet detection system, including: a data preprocessing module, a relationship graph construction module, and a node judgment module, where:
- Data preprocessing module: this module receives the original traffic and processes the original data, extracts the valid fields from the traffic data, and performs data cleaning to remove redundant and duplicate information;
- Relationship graph construction module: this module receives the log data processed by the data preprocessing module and performs terminal identification information aggregation, relationship graph construction, and relationship graph update operations.
- the main operations of this module include aggregating, with the domain name as the key, the terminal identification information within the same time interval to form an aggregate structure of the form <time identification, domain name, list of terminal identifications accessing the domain name>, and constructing and updating the adjacency graph between domain name nodes and the lists of terminal identifications that access those domain names.
- Node judgment module: this module mines, from the relationship graph output by the relationship graph construction module, lists of terminal identifiers that access multiple identical domain names, and after filtering obtains the infected botnet nodes.
- the main operations of this module include: first mining the access relationship graph for lists of terminal identifiers that access multiple identical domain names, forming a structure of the form <terminal identification set, access domain name set>, which is the extracted candidate node combination; then setting corresponding filtering rules to filter the candidate node combinations, the output of the filtering being the detected botnet nodes.
- a data pre-processing module is used to obtain the original network traffic data in the monitored network, and pre-process the original network traffic data to obtain the pre-processed network traffic data;
- the relationship graph construction module is used to construct a terminal access relationship graph based on the preprocessed network traffic data;
- the node judgment module mines, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations, and screens the candidate node combinations based on preset screening rules to obtain the detection result of botnet nodes.
- the data pre-processing module is further used to obtain real-time traffic data or traffic log files within a preset time range in the monitored network as original network traffic data, the real-time traffic data or traffic log files at least include The domain name query request of the terminal and/or the HTTP connection request of the terminal to access the domain name.
- the data preprocessing module is further used to group the original network traffic data by time interval; extract valid fields from the grouped network traffic data, the valid fields including at least three key fields: timestamp, terminal identification, and access domain name; and clean the network traffic data containing the valid fields, filtering out redundant data and whitelisted entries, to obtain a log sequence with the data structure <timestamp, terminal identification, access domain name>.
- the relationship graph construction module is further used to extract, from the preprocessed network traffic data, the terminal identification and the domain name information accessed or queried by the terminal in the terminal's network behavior within the corresponding time interval; based on the extracted terminal identification and domain name information, aggregate the terminal identification information within the same time interval with the domain name as the key, forming an aggregate structure <time identification, domain name, set of terminal identifications accessing the domain name>; and, based on the aggregate structure, construct and update the adjacency graph between domain name nodes and the lists of terminal identifications accessing those domain names, to obtain the terminal access relationship graph.
- the node judgment module is further used to construct linked lists based on the adjacency relationships in the terminal access relationship graph and sort them; construct an access pattern tree based on the sorted linked lists; and extract the set of nodes on each path of the access pattern tree as a terminal identification set, with the <time identification#domain name> list of the path's end node as the set of domain names jointly accessed by those terminal identifications, thereby obtaining lists of terminal identifications that access multiple identical domain names as the candidate node combinations.
- the botnet detection system further includes a processing module configured to send the suspicious domain names in the detection result to a malicious domain name detection system, which evaluates the suspicious domain names to optimize the detection result; and to report the detection result to the monitoring system, where the monitoring system displays the detection result, and/or performs related management and control operations on terminals suspected of being infected with the bot virus according to the detection result.
- compared with some existing approaches, the disclosed solution uses fewer types of data, extracts fewer features from the traffic, and incurs less computational overhead, which can effectively improve detection efficiency. Because it does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
- an embodiment of the present disclosure also proposes a botnet detection system, including: a memory, a processor, and a botnet detection program stored in the memory and executable on the processor; when executed by the processor, the botnet detection program implements the steps of the botnet detection method described above.
- the system in this embodiment may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
- the communication bus 1002 is used to implement connection communication between these components.
- the user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
- the memory 1005 may be a high-speed RAM memory, or may be a non-volatile memory, such as a disk memory.
- the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
- the structure shown in FIG. 8 does not constitute a limitation on the platform, which may include more or fewer components than those illustrated, combine certain components, or arrange the components differently.
- the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a botnet detection program.
- the network interface 1004 is mainly used to connect to a network server and perform data communication with the network server;
- the user interface 1003 is mainly used to connect to a client and perform data communication with the client;
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and perform the following operations: obtain the original network traffic data in the monitored network and preprocess it to obtain preprocessed network traffic data; construct a terminal access relationship graph based on the preprocessed network traffic data; mine, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and screen the candidate node combinations based on preset filtering rules to obtain the detection result of botnet nodes.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and also perform the following operations: obtain real-time traffic data or traffic log files within a preset time range in the monitored network as the original network traffic data, where the real-time traffic data or traffic log files include at least the terminal's domain name query requests and/or the terminal's HTTP connection requests for accessing domain names.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and further perform the following operations: group the original network traffic data by time interval; extract valid fields from the grouped network traffic data, the valid fields including at least three key fields: timestamp, terminal identification, and access domain name; and clean the network traffic data containing the valid fields, filtering out redundant data and whitelisted entries, to obtain a log sequence with the data structure <timestamp, terminal identification, access domain name>.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and also perform the following operations: extract, from the preprocessed network traffic data, the terminal identification and the domain name information accessed or queried by the terminal in the terminal's network behavior within the corresponding time interval; based on the extracted terminal identification and domain name information, aggregate the terminal identification information within the same time interval with the domain name as the key, forming an aggregate structure <time identification, domain name, set of terminal identifications accessing the domain name>; and, based on the aggregate structure, construct and update the adjacency graph between domain name nodes and the lists of terminal identifications accessing those domain names, to obtain the terminal access relationship graph.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and also perform the following operations: construct and sort linked lists based on the adjacency relationships in the terminal access relationship graph; construct an access pattern tree based on the sorted linked lists; and extract the set of nodes on each path of the access pattern tree as a terminal identification set, with the <time identification#domain name> list of the path's end node as the set of domain names jointly accessed by those terminal identifications, thereby obtaining lists of terminal identifications that access multiple identical domain names as the candidate node combinations.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and further perform the following operations: based on the terminal access relationship graph, construct each domain name node and its adjacent nodes as a linked list, where the <time identification#access domain name> node is the head node of the linked list and the subsequent nodes are terminal identification nodes, arranged in descending order of their degree in the entire relationship graph; and order the linked lists themselves in descending order of the degree of their access domain name nodes in the entire relationship graph.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and also perform the following operations: based on the sorted linked lists, use the terminal identification nodes in the linked lists as tree nodes and add the access domain name information to the attribute information of the tree nodes, thereby constructing the access pattern tree.
- the processor 1001 may be used to call the botnet detection program stored in the memory 1005 and also perform the following operations: determine whether any candidate nodes in the candidate node combinations are in a containment relationship; if so, delete the contained candidate nodes to obtain the redundancy-filtered candidate node combinations; based on the redundancy-filtered candidate node combinations, obtain the number of elements of the terminal identification set and of the access domain name set of each candidate node; and retain the candidate nodes whose terminal identification set has more elements than the preset terminal identification count threshold and whose access domain name set has more elements than the preset access domain name count threshold, thereby obtaining the botnet nodes.
- the processor 1001 may be used to call a botnet detection program stored in the memory 1005, and further perform the following operations: send the suspicious domain name in the detection result to a malicious domain name detection system, which is detected by the malicious domain name The system evaluates the suspicious domain name to optimize the detection result.
- the processor 1001 may be used to call a botnet detection program stored in the memory 1005, and further perform the following operations: report the detection result to the monitoring system, and the monitoring system displays the detection result, and/or Based on the detection result, the monitoring system performs related management and control operations on the terminal suspected of being infected with a zombie virus.
- an embodiment of the present disclosure also provides a computer-readable storage medium on which a botnet detection program is stored; when executed by a processor, the botnet detection program implements the steps of the botnet detection method described above.
- in the botnet detection method, system, and storage medium of the embodiments of the present disclosure, the original network traffic data in the monitored network is obtained and preprocessed to obtain preprocessed network traffic data; a terminal access relationship graph is constructed based on the preprocessed network traffic data; lists of terminal identifiers that access multiple identical domain names are mined from the terminal access relationship graph to obtain candidate node combinations, and the candidate node combinations are screened based on preset filtering rules to obtain the detection result of botnet nodes. In this way, the behavior patterns of terminals are analyzed and compared through the domain name query or access information in their network behavior, and the existence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually exhibit the same or similar behavior patterns.
- compared with some existing approaches, the disclosed solution uses fewer types of data, extracts fewer features from the traffic, and incurs less computational overhead, which can effectively improve detection efficiency. Because it does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Algebra (AREA)
- Environmental & Geological Engineering (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
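The present disclosure discloses a botnet detection method, system, and storage medium. The method includes: obtaining original network traffic data from a monitored network and preprocessing it to obtain preprocessed network traffic data; constructing a terminal access relationship graph based on the preprocessed network traffic data; mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and screening the candidate node combinations based on preset screening rules to obtain a detection result of botnet nodes.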
Description
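The present disclosure claims priority to Chinese patent application CN201811602973.7, entitled "Botnet Detection Method, System and Storage Medium" and filed on December 26, 2018, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of network security, and in particular to a botnet detection method, system, and storage medium.
A botnet is a network in which one or more propagation methods are used to infect a large number of hosts with a bot program (BOT program) virus, thereby forming a one-to-many command and control (C&C) network between the controller and the infected hosts. The attacker spreads bot programs through various channels to infect large numbers of hosts on the Internet, and the infected hosts receive the attacker's instructions through a control channel, forming a botnet. This form of attack gives the attacker a concealed, flexible, and efficient one-to-many command and control mechanism, enabling attacks such as information theft, distributed denial-of-service attacks, and spam delivery, and it has become a major threat in the field of network security.
At present, the mainstream approach to detecting botnets is still network traffic analysis, implemented through clustering or correlation detection on network data traffic. Such methods mostly rely on feature information contained in the data flow information of the many switches, gateway devices, and other network devices widely distributed in the monitored network, such as source IP address, destination IP address, packet size, port number, and protocol. They either perform cluster analysis to discover abnormal behavior in network traffic belonging to a botnet, or perform correlation analysis based on known botnet behavior to identify abnormal traffic and thereby detect the botnet. However, these methods require many kinds of data, involve huge data flows, and must parse packets to extract multiple features; the resulting processing load is excessive, so botnet detection efficiency is low.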
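Summary
The main purpose of the present disclosure is to provide a botnet detection method, system, and storage medium, aiming to improve the efficiency of botnet detection.
To achieve the above purpose, the present disclosure provides a botnet detection method applied to a botnet detection system. The method includes: obtaining original network traffic data from a monitored network and preprocessing the original network traffic data to obtain preprocessed network traffic data; constructing a terminal access relationship graph based on the preprocessed network traffic data; mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and screening the candidate node combinations based on preset screening rules to obtain a detection result of botnet nodes.
An embodiment of the present disclosure also proposes a botnet detection system, including: a data preprocessing module, used to obtain original network traffic data from a monitored network and preprocess the original network traffic data to obtain preprocessed network traffic data; a relationship graph construction module, used to construct a terminal access relationship graph based on the preprocessed network traffic data; and a node judgment module, which mines, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations, and screens the candidate node combinations based on preset screening rules to obtain a detection result of botnet nodes.
An embodiment of the present disclosure also proposes a network access system, including: a botnet detection system, and a malicious domain name detection system, a monitoring system, and a DNS server, each communicatively connected to the botnet detection system; the botnet detection system is also connected to a number of access terminals. The botnet detection system is the botnet detection system described above. The DNS server is used to provide domain name access services to the access terminals. The malicious domain name detection system is used to receive the suspicious domain names in the detection result sent by the botnet detection system and evaluate the suspicious domain names to optimize the detection result. The monitoring system is used to receive the detection result reported by the botnet detection system and display it, and/or perform related management and control operations on terminals suspected of being infected with the bot virus according to the detection result.
An embodiment of the present disclosure also proposes a botnet detection system, including: a memory, a processor, and a botnet detection program stored in the memory and executable on the processor; when executed by the processor, the botnet detection program implements the steps of the botnet detection method described above.
An embodiment of the present disclosure also proposes a computer-readable storage medium on which a botnet detection program is stored; when executed by a processor, the botnet detection program implements the steps of the botnet detection method described above.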
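FIG. 1 is an architecture diagram of the network access system involved in an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a first embodiment of the botnet detection method of the present disclosure;
FIG. 3(a) is an example diagram of a processing flow of the relationship graph construction module of the botnet detection system involved in an embodiment of the present disclosure;
FIG. 3(b) is an example diagram of a processing flow of the relationship graph construction module of the botnet detection system involved in an embodiment of the present disclosure;
FIG. 4 is an example diagram of the processing flow of the node judgment module of the botnet detection system involved in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the process of constructing the pattern tree involved in an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a second embodiment of the botnet detection method of the present disclosure;
FIG. 7 is a schematic flowchart of a third embodiment of the botnet detection method of the present disclosure;
FIG. 8 is a schematic diagram of the system architecture involved in the operating environment of an embodiment of the present disclosure.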
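The realization of the purpose, the functional characteristics, and the advantages of the present disclosure will be further described with reference to the embodiments and the accompanying drawings.
To make the technical solution of the present disclosure clearer, it is described in further detail below with reference to the drawings.
It should be understood that the embodiments described here are only used to explain the present disclosure and are not intended to limit it. Obviously, the embodiments described below are only some of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure. Where there is no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another arbitrarily. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The main solution of the embodiments of the present disclosure is as follows: original network traffic data is obtained from the monitored network and preprocessed to obtain preprocessed network traffic data; a terminal access relationship graph is constructed based on the preprocessed network traffic data; lists of terminal identifiers that access multiple identical domain names are mined from the terminal access relationship graph to obtain candidate node combinations; and the candidate node combinations are screened based on preset screening rules to obtain the detection result of botnet nodes. In this way, the behavior patterns of terminals are analyzed and compared through the domain name query or access information in their network behavior, and the existence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually exhibit the same or similar behavior patterns. This can effectively improve detection efficiency, and because the solution does not rely on known botnet behavior characteristics, it is better suited to detecting unknown botnet threats.
In some cases, the mainstream approach to detecting botnets relies on feature information contained in the data flow information of the many switches, gateway devices, and other network devices widely distributed in the monitored network, such as source IP address, destination IP address, packet size, port number, and protocol. Such methods either perform cluster analysis to discover abnormal behavior in network traffic belonging to a botnet, or perform correlation analysis based on known botnet behavior to identify abnormal traffic and thereby detect the botnet. However, these methods require many kinds of data, involve huge data flows, and must parse packets to extract multiple features; the resulting processing load is excessive, so botnet detection efficiency is low.
The present disclosure provides a solution that uses fewer types of data, extracts fewer features from the data traffic, and incurs less computational overhead, effectively improving detection efficiency.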
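In one embodiment, referring to FIG. 1, FIG. 1 is an architecture diagram of the network access system involved in an embodiment of the present disclosure; it shows a schematic application deployment of the botnet detection system in an embodiment of the present disclosure.
As shown in FIG. 1, an embodiment of the present disclosure proposes a network access system, including: a botnet detection system, and a malicious domain name detection system, a monitoring system, and a DNS (Domain Name System) server, each communicatively connected to the botnet detection system; the botnet detection system is also connected to a number of access terminals. The botnet detection system is the botnet detection system proposed in the embodiments of the present disclosure.
In the application deployment shown in FIG. 1, the botnet detection system is deployed in the network architecture through a bypass connection and obtains mirrored network traffic for detection and analysis.
In the embodiments of the present disclosure, the botnet detection system performs detection using the terminal identification information in the network traffic data and the domain name information queried or accessed by the terminals. The system needs to extract from the network traffic a terminal identifier that identifies the terminal (for example, the terminal IP address, the terminal MAC address, or the identity of the host user associated with the terminal) and the domain name information accessed or queried by the terminal. The network traffic data can be obtained in multiple forms, including but not limited to the terminal's domain name query requests and the terminal's HTTP connection requests for accessing domain names.
In implementing the embodiments of the present disclosure, the botnet detection system can be deployed in multiple ways. It can follow the bypass deployment in FIG. 1, or the physical connections or traffic routing rules can be changed so that access traffic passes through the botnet detection system before reaching the terminal or server, and so on.
In implementing the embodiments of the present disclosure, the botnet detection system can examine the traffic logs within a period of time, analyze terminal behavior patterns, and finally group together all terminals with the same behavior pattern. Grouping by the same behavior pattern means that, within a set time interval or threshold, multiple terminals behave identically in accessing or querying multiple domain names.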
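In one embodiment, in implementing the embodiments of the present disclosure, the botnet detection system may send the suspicious domain names in the detection result to the malicious domain name detection system for further evaluation. Optionally, the malicious domain name detection system can be implemented in multiple ways, for example as a malicious domain name blacklist filtering system, a threat intelligence matching detection system, or an arbitrary domain name detection system. With the malicious domain name detection system, the botnet detection system can further rank the detection results by threat level and optimize them. For example, if the malicious domain name detection system determines that a domain name accessed by a group of terminals is a botnet domain name, it can be determined that this group of terminals with the same behavior pattern is suspected of being controlled by the same bot virus, which directly yields the topological relationship of the botnet. In actual deployment, the botnet detection system can decide, according to the actual situation, whether to perform this step and examine the domain name information further.
Further, in implementing the embodiments of the present disclosure, the botnet detection system may also report the detection result to the monitoring system, which displays it. As an extension, the monitoring system may also perform related management and control operations on terminals suspected of being infected with the bot virus according to the detection result, for example, rate-limiting the network access of such terminals.
It should be noted that the deployment of the botnet detection system described above is only a preferred example; in a specific application, the botnet detection system can be deployed according to the actual networking conditions.
In one embodiment, in addition to being deployed in a network for real-time traffic detection, the botnet detection system of the present disclosure can also be used to parse traffic log files and output network behavior analysis and detection results.
In the system architecture shown in FIG. 1 above, the DNS server is used to provide domain name access services to the access terminals;
the malicious domain name detection system is used to receive the suspicious domain names in the detection result sent by the botnet detection system and evaluate the suspicious domain names to optimize the detection result;
the monitoring system is used to receive the detection result reported by the botnet detection system and display it, and/or perform related management and control operations on terminals suspected of being infected with the bot virus according to the detection result.
The botnet detection system proposed by the embodiments of the present disclosure is described in detail in the following embodiments.
The botnet detection system of the embodiments of the present disclosure analyzes and compares the behavior patterns of terminals through the domain name query or access information in their network behavior, and detects the existence of a botnet based on the characteristic that terminals controlled by a botnet usually exhibit the same or similar behavior patterns. Compared with some existing approaches, the disclosed solution uses fewer types of data, extracts fewer features from the traffic, and incurs less computational overhead, which can effectively improve detection efficiency. Because it does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.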
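In one embodiment, as shown in FIG. 2, a first embodiment of the present disclosure proposes a botnet detection method applied to a botnet detection system. The method includes:
Step S101: obtain the original network traffic data from the monitored network and preprocess the original network traffic data to obtain preprocessed network traffic data;
The botnet detection system can obtain network traffic data in multiple forms, including but not limited to the terminal's domain name query requests and the terminal's HTTP connection requests for accessing domain names.
In one embodiment, as one implementation, the step of obtaining the original network traffic data from the monitored network may include: the botnet detection system obtains real-time traffic data or traffic log files within a preset time range in the monitored network as the original network traffic data, where the real-time traffic data or traffic log files include at least the terminal's domain name query requests and/or the terminal's HTTP connection requests for accessing domain names.
After the original network traffic data has been obtained from the monitored network, it is preprocessed.
Preprocessing the original network traffic data includes extracting the valid fields from the network traffic data and performing data cleaning to remove redundant and duplicate information. A data preprocessing module can be provided to perform this.
In one embodiment, the original network traffic data can be preprocessed as follows:
First, the original network traffic data is grouped by time interval;
Time interval grouping means that a time window T is set in advance and the traffic logs containing domain name information are grouped with T as the period; all log data whose timestamps fall within T form one group of data to be processed in the subsequent terminal identifier aggregation.
The reason for grouping by time interval is that controlled hosts infected with the same bot virus usually show a certain degree of consistency in their behavior patterns, so their traffic data has a certain periodicity; a reasonably chosen time interval makes it possible to locate botnet behavior more accurately.
Then, valid fields are extracted from the grouped network traffic data; the valid fields include at least three key fields: timestamp, terminal identifier, and access domain name;
In this embodiment, valid field extraction means selecting and extracting the three key fields of timestamp, terminal identifier, and access domain name from the log data.
These valid fields are the minimum set of fields required for detection, and none of them can be omitted. In practical applications, the extracted fields include but are not limited to the above.
Finally, the network traffic data containing the valid fields is cleaned, filtering out redundant data and whitelisted entries, to obtain a log sequence with the data structure <timestamp, terminal identifier, access domain name>.
In one embodiment, the data cleaning process may include the following two steps: redundant data filtering and whitelist filtering.
Redundant data filtering means that, within the same time interval, if multiple log entries with identical <terminal identifier, access domain name> information are detected during log processing, the duplicate records are removed and only one log entry is kept for that time interval.
Whitelist filtering means that the access domain name in the log information can be analyzed; if the domain name is a trusted domain name, the log entry is removed and the subsequent detection process is not performed on it.
After the above data cleaning, the data preprocessing module outputs a log sequence with the structure <timestamp, terminal identifier, access domain name>.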
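The preprocessing of step S101 (time-window grouping, redundant data filtering, and whitelist filtering) can be illustrated with a minimal Python sketch. The names `LogRecord`, `preprocess`, `window_seconds`, and `whitelist` are hypothetical and chosen only for illustration; the disclosure does not prescribe a concrete data layout or window length.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type holding the three key fields named in the text.
LogRecord = namedtuple("LogRecord", ["timestamp", "terminal_id", "domain"])

def preprocess(records, window_seconds, whitelist):
    """Group records by time window T, drop whitelisted domains,
    and keep one record per (window, terminal, domain)."""
    groups = defaultdict(list)
    seen = set()
    for rec in records:
        if rec.domain in whitelist:          # whitelist filtering
            continue
        window = int(rec.timestamp // window_seconds)  # index of the time window
        key = (window, rec.terminal_id, rec.domain)
        if key in seen:                      # redundant data filtering
            continue
        seen.add(key)
        groups[window].append(rec)
    return dict(groups)
```

In this sketch a record repeated inside the same window is dropped, while the same terminal and domain pair in a later window is kept, which is what allows behavior to be compared across intervals.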
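Step S102: construct a terminal access relationship graph based on the preprocessed network traffic data;
The terminal access relationship graph can be constructed by the relationship graph construction module. Its construction can include three parts: terminal identifier aggregation, relationship graph construction, and relationship graph update.
In one embodiment, the step of constructing the terminal access relationship graph based on the preprocessed network traffic data may include:
extracting, from the preprocessed network traffic data, the terminal identifier and the domain name information accessed or queried by the terminal in the terminal's network behavior within the corresponding time interval;
based on the extracted terminal identifier and domain name information, aggregating the terminal identifier information within the same time interval with the domain name as the key, forming an aggregate structure <time identifier, domain name, set of terminal identifiers accessing the domain name>;
based on the aggregate structure, constructing and updating the adjacency graph between domain name nodes and the lists of terminal identifiers accessing those domain names, to obtain the terminal access relationship graph.
Terminal identifier aggregation performs an aggregation over the data of one time interval. The operation uses the access domain name in the log information as the key and aggregates the identifiers of the terminals that issued query requests for, or accessed, that domain name. The aggregation result is <time identifier, domain name, terminal identifier set>, where the time identifier marks the time interval in which these accesses occurred.
The following takes the case where the terminal identifier is an IP address as a detailed example:
After data preprocessing (time slice extraction, valid field extraction, and data filtering), the original network traffic data becomes tuples of the form <timestamp, terminal IP address identifier, access domain name>;
The preprocessed tuples are divided by time interval, and the log records in time interval T1 are aggregated to obtain tuples of the form <time identifier, access domain name, terminal IP address identifier set>, as shown in Table 1 below;
Time identifier | Access domain name | Terminal IP address identifiers |
T1 | Domain1 | IP-1,IP-2,IP-3 |
T1 | Domain2 | IP-1,IP-2,IP-3,IP-4 |
Table 1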
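As a rough illustration of the aggregation that yields Table 1, the following Python sketch groups terminal identifiers by (time identifier, domain name). The function name `aggregate_by_domain` and the tuple layout are assumptions made for this example only.

```python
from collections import defaultdict

def aggregate_by_domain(records):
    """records: iterable of (time_id, terminal_id, domain) tuples.
    Returns {(time_id, domain): set of terminal ids}."""
    aggregated = defaultdict(set)
    for time_id, terminal_id, domain in records:
        aggregated[(time_id, domain)].add(terminal_id)
    return dict(aggregated)

# Records corresponding to the T1 interval of Table 1.
t1_records = [
    ("T1", "IP-1", "Domain1"), ("T1", "IP-2", "Domain1"), ("T1", "IP-3", "Domain1"),
    ("T1", "IP-1", "Domain2"), ("T1", "IP-2", "Domain2"),
    ("T1", "IP-3", "Domain2"), ("T1", "IP-4", "Domain2"),
]
table1 = aggregate_by_domain(t1_records)
```

Using a set for the terminal identifiers makes the later subset comparisons between intervals straightforward.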
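Next, a relationship graph is constructed from the aggregated data of the time interval described above. The nodes of the relationship graph are <time identifier#domain name information> and <terminal identifier information>, and the graph contains the adjacency relationships between terminal identifiers and the domain names they access.
The construction is described in detail using a bipartite graph as an example. The bipartite graph contains two types of nodes: node set U represents the set of terminal identifiers, and node set V represents the set of access domain names recorded in the time interval. In addition, E={<ui,vi>} can be defined in the bipartite graph to represent the adjacency relationship between a terminal identifier ui and the access domain name vi it corresponds to. Taking the aggregated data of time interval T1 above as an example, the access relationship graph for T1 can be constructed as shown in FIG. 3(a).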
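A minimal sketch of the bipartite construction, assuming the aggregated data is held as a dictionary keyed by (time identifier, domain name). The function name `build_bipartite` is hypothetical; U, V, and E follow the notation of the text.

```python
def build_bipartite(aggregated):
    """aggregated: {(time_id, domain): set of terminal ids}.
    Returns (U, V, E): terminal nodes, '<time id>#<domain>' nodes, and edges."""
    U, V, E = set(), set(), set()
    for (time_id, domain), terminals in aggregated.items():
        v = f"{time_id}#{domain}"      # domain node labelled with its time interval
        V.add(v)
        for u in terminals:
            U.add(u)
            E.add((u, v))              # adjacency between terminal u and domain v
    return U, V, E

# Aggregated data of time interval T1 (Table 1).
U, V, E = build_bipartite({
    ("T1", "Domain1"): {"IP-1", "IP-2", "IP-3"},
    ("T1", "Domain2"): {"IP-1", "IP-2", "IP-3", "IP-4"},
})
```

For the T1 data this produces four terminal nodes, two domain nodes, and seven edges.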
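Then the relationship graph is updated: the existing historical relationship graph is loaded.
The relationship graph information of the time interval obtained above is merged into the historical relationship graph.
To explain the update operation and its rules in more detail, an example follows:
Suppose a historical relationship graph already exists and its content is the aggregated data of time interval T1 described above. Suppose the data to be merged now corresponds to time interval T2, with the aggregated data shown in Table 2 below:
Time identifier | Access domain name | Terminal IP address identifiers |
T2 | Domain1 | IP-1,IP-2,IP-3,IP-4 |
T2 | Domain2 | IP-3,IP-5 |
T2 | Domain3 | IP-2,IP-3,IP-5 |
Table 2
The update rules are as follows:
The aggregated data of time interval T2 in Table 2 is compared with the data structure in the existing relationship graph (FIG. 3(a)).
If the domain names of all existing nodes in the historical relationship graph differ from a domain name in the aggregated data of T2, the aggregated data for that new domain name is added directly to the relationship graph. For example, for <T2,Domain3,<IP-2,IP-3,IP-5>> in Table 2, Domain3 is a new domain name compared with the historical relationship graph, so this data is added directly to the graph.
If the domain name of an existing node in the historical relationship graph is the same as a domain name in the aggregated data of T2 (denote it Domain), the set of nodes adjacent to Domain in the historical relationship graph (set A1) is compared with the set of nodes adjacent to Domain in time interval T2 (set A2):
If A1 is a subset of A2, the Domain node and its adjacency relationships are removed from the original graph, and the adjacency data for A2 is written into the graph. For example, the IP address list in <T1,Domain1,<IP-1,IP-2,IP-3>> in the historical relationship graph is a subset of the IP address list in <T2,Domain1,<IP-1,IP-2,IP-3,IP-4>> in the T2 relationship graph, so the node and its relationships are deleted from the historical graph and replaced by node T2#Domain1 and its adjacency to its IP address list;
Conversely, if A2 is a subset of A1, the node and adjacency data in the historical relationship graph are kept, and the domain name node and set A2 do not need to be written into the graph;
If neither A1 nor A2 is a subset of the other, the domain name node and A2 are added to the relationship graph. For example, the historical relationship graph contains <T1,Domain2,<IP-1,IP-2,IP-3,IP-4>> and time interval T2 adds <T2,Domain2,<IP-3,IP-5>>; although the access domain name is the same in both, there is no containment between A1 and A2, so a new node <T2#Domain2> is added to the relationship graph together with its adjacency relationships.
After these updates, the original relationship graph of FIG. 3(a) becomes the graph shown in FIG. 3(b).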
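The update rules above can be sketched in Python as follows. The graph is assumed to be a dictionary from a `'<time id>#<domain>'` label to a set of terminal identifiers; the function name `update_graph` is hypothetical. The text only describes comparing against a single historical node per domain name, so applying the rules to every historical node that shares the domain name is an assumption of this sketch.

```python
def update_graph(history, time_id, new_agg):
    """history: {'<time id>#<domain>': set of terminal ids}, modified in place.
    new_agg: {domain: set of terminal ids} for the interval time_id."""
    for domain, a2 in new_agg.items():
        a2 = set(a2)
        same_domain = [k for k in history if k.split("#", 1)[1] == domain]
        # A1 is a subset of A2: the historical node is superseded and removed.
        for key in same_domain:
            if history[key] <= a2:
                del history[key]
        # A2 is a subset of a remaining A1: keep the history, add nothing.
        if any(a2 <= history[key] for key in same_domain if key in history):
            continue
        # New domain name, or no containment in either direction: add the node.
        history[f"{time_id}#{domain}"] = a2
    return history

graph = {
    "T1#Domain1": {"IP-1", "IP-2", "IP-3"},
    "T1#Domain2": {"IP-1", "IP-2", "IP-3", "IP-4"},
}
update_graph(graph, "T2", {
    "Domain1": {"IP-1", "IP-2", "IP-3", "IP-4"},
    "Domain2": {"IP-3", "IP-5"},
    "Domain3": {"IP-2", "IP-3", "IP-5"},
})
```

With the data of Tables 1 and 2, the resulting graph holds the four nodes T2#Domain1, T1#Domain2, T2#Domain2, and T2#Domain3, which are the four head nodes that appear in Table 3.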
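Step S103: mine, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations, and screen the candidate node combinations based on preset screening rules to obtain the detection result of botnet nodes.
In implementation, this step can be performed by the node judgment module.
Referring to FIG. 4, first, combinations of source IPs that access multiple identical domain names are mined from the finally updated access relationship graph as the candidate node combinations.
As one implementation, the step of mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations may include:
constructing linked lists based on the adjacency relationships in the terminal access relationship graph and sorting them;
constructing an access pattern tree based on the sorted linked lists;
In one embodiment, based on the sorted linked lists, the terminal identifier nodes in the linked lists are used as tree nodes and the access domain name information is added to the attribute information of the tree nodes, thereby constructing the access pattern tree.
Then, the set of nodes on each path of the access pattern tree is extracted as a terminal identifier set, with the <time identifier#domain name> list of the path's end node as the set of domain names jointly accessed by those terminal identifiers, yielding lists of terminal identifiers that access multiple identical domain names as the candidate node combinations.
Finally, the candidate node combinations are screened based on preset screening rules to obtain the detection result of botnet nodes.
This is implemented as follows: determine whether any candidate nodes in the candidate node combinations are in a containment relationship; if so, delete the contained candidate nodes to obtain the redundancy-filtered candidate node combinations; based on the redundancy-filtered candidate node combinations, obtain the number of elements of the terminal identifier set and of the access domain name set of each candidate node; and retain the candidate nodes whose terminal identifier set has more elements than the preset terminal identifier count threshold and whose access domain name set has more elements than the preset access domain name count threshold, thereby obtaining the botnet nodes.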
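The screening of botnet nodes is described in detail below with an example:
In one embodiment, linked lists are first constructed from the adjacency relationships and sorted:
Linked list construction means that, based on the terminal access relationship graph, each domain name node and its adjacent nodes are built into one linked list. The <time identifier#access domain name> node is the head node of the list, and the subsequent nodes are terminal identifier nodes, arranged in descending order of their degree in the entire relationship graph;
In one embodiment, the linked lists themselves are also ordered, in descending order of the degree of their access domain name nodes in the entire relationship graph.
Taking the above example where the terminal identifier is an IP address, the linked list for node <T2#Domain1> is built from the terminal access relationship graph obtained above. T2#Domain1 is the head node, and its child nodes are <IP-1,IP-2,IP-3,IP-4>; from the terminal access relationship graph, the IP nodes ordered by degree are <IP-3,IP-2,IP-1,IP-4>.
The other linked lists are sorted in the same way, and the lists are then ordered by the degree of their domain name nodes in descending order as <T2#Domain1,T1#Domain2,T2#Domain2,T2#Domain3>. The resulting sorted linked lists are shown in Table 3 below:
Time identifier#domain name | Source IPs |
T2#Domain1 | IP-3,IP-2,IP-1,IP-4 |
T1#Domain2 | IP-3,IP-2,IP-1,IP-4 |
T2#Domain2 | IP-3,IP-5 |
T2#Domain3 | IP-3,IP-2,IP-5 |
Table 3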
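The degree-based ordering can be sketched as below. The function name `build_sorted_lists` is hypothetical, and breaking degree ties by identifier is an assumption of this sketch, since the text does not say how ties such as IP-1 and IP-4 are resolved.

```python
from collections import Counter

def build_sorted_lists(graph):
    """graph: {'<time id>#<domain>': set of terminal ids}.
    Returns a list of (head, [terminal ids]) ordered as described in the text."""
    # Degree of a terminal node = number of domain nodes it is adjacent to.
    terminal_degree = Counter(t for terminals in graph.values() for t in terminals)
    lists = {
        head: sorted(terminals, key=lambda t: (-terminal_degree[t], t))
        for head, terminals in graph.items()
    }
    # Degree of a domain node = number of terminals adjacent to it.
    heads = sorted(lists, key=lambda h: -len(lists[h]))
    return [(h, lists[h]) for h in heads]

sorted_lists = build_sorted_lists({
    "T2#Domain1": {"IP-1", "IP-2", "IP-3", "IP-4"},
    "T1#Domain2": {"IP-1", "IP-2", "IP-3", "IP-4"},
    "T2#Domain2": {"IP-3", "IP-5"},
    "T2#Domain3": {"IP-2", "IP-3", "IP-5"},
})
```

The terminal order inside each list matches the rows of Table 3. Note that a strict descending sort on domain degree places T2#Domain3 (three terminals) ahead of T2#Domain2 (two terminals), whereas Table 3 prints those two rows in the opposite order.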
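Second, the access pattern tree is constructed:
The access pattern tree uses terminal identifier nodes as tree nodes, and the access domain name information is added to the attribute information of the tree nodes. The construction process, shown in FIG. 5, is as follows:
(1) Initialize the pattern tree. Initialize a tree containing only a root node; the root node is by default the root node.
(2) Read in the linked list to be processed. In FIG. 4 the linked lists have already been sorted in descending order of the degree of their domain name nodes; the lists are read in one at a time following that order.
(3) Set the current node. If the linked list has just been read in, the root node is set as the current node. Otherwise, the node processed in the previous step is set as the current node.
(4) Process the linked list node. Check whether the node to be processed in the current linked list has already been added to the access pattern tree as a child of the current node. If none of the current node's children is the node to be processed, add the node to be processed to the tree as a child of the current node and add the domain name identifier of the current linked list to that child's attribute information; if the current node's children already include the node to be processed, add the domain name identifier of the current linked list to the attribute information of that existing child.
It should be noted that each linked list in FIG. 4 has a domain name node as its head node, and all terminal identifier nodes are sorted in descending order of degree. The terminal identifier nodes are the nodes to be processed.
If the linked list has just been read in, the current node is the root node and the node to be processed is the first terminal identifier node, that is, the one with the highest degree after sorting;
If the linked list has not just been read in, the current node is the terminal identifier node processed in the previous iteration, and the node to be processed is the terminal identifier node that follows the current node in the descending order of degree.
(5) Check whether the linked list is finished. Determine whether the node just processed is the last node of the linked list; if so, go to step 6; if not, go to step 3.
(6) Check whether the access pattern tree is finished. Determine whether the linked list just processed is the last linked list; if so, construction of the pattern tree is complete. If not, go to step 2 and read in the next linked list.
Taking the data in FIG. 4 as an example, the process of constructing the pattern tree is shown in FIG. 5.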
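Then the candidate node combinations are extracted: in the access pattern tree constructed above, every path starting from the root node is a candidate node combination. The set of nodes on the path represents the terminal identifier set, and the <time identifier#domain name> list of the path's end node represents the set of domain names jointly accessed by those terminal identifiers, forming the structure <source IP set, time identifier#domain name list>, which is a candidate botnet node combination.
Taking the above example where the terminal identifier is an IP address, and referring to FIG. 4, the following candidate nodes can be extracted:
<(IP-2,IP-3),(T2#Domain1,T1#Domain2,T2#Domain3)>;
<(IP-5,IP-3),(T2#Domain2)>;
<(IP-1,IP-2,IP-3),(T2#Domain1,T1#Domain2)>;
<(IP-5,IP-2,IP-3),(T2#Domain3)>;
<(IP-4,IP-1,IP-2,IP-3),(T2#Domain1,T1#Domain2)>;
<(IP-4,IP-1,IP-3,IP-2),(T3#Domain3)>.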
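The tree construction and path extraction described above resemble a prefix tree in which every node accumulates the domain labels of the linked lists that pass through it. The following Python sketch is one possible reading; the names `Node`, `build_pattern_tree`, `extract_candidates`, and the `min_terminals` parameter (which skips single-terminal paths) are assumptions made for illustration and are not taken from the disclosure.

```python
class Node:
    def __init__(self, terminal=None):
        self.terminal = terminal   # terminal identifier (None for the root)
        self.domains = []          # '<time id>#<domain>' labels reaching this node
        self.children = {}         # terminal identifier -> child Node

def build_pattern_tree(sorted_lists):
    """sorted_lists: list of (head label, [terminal ids in sorted order])."""
    root = Node()
    for head, terminals in sorted_lists:
        current = root                        # each list starts again at the root
        for t in terminals:
            child = current.children.get(t)
            if child is None:                 # not yet a child: add it
                child = current.children[t] = Node(t)
            child.domains.append(head)        # record the domain label on the node
            current = child
    return root

def extract_candidates(root, min_terminals=2):
    """Return (terminal set, domain label set) for every root-to-node path."""
    candidates, stack = [], [(root, ())]
    while stack:
        node, path = stack.pop()
        for child in node.children.values():
            child_path = path + (child.terminal,)
            if len(child_path) >= min_terminals:
                candidates.append((frozenset(child_path), frozenset(child.domains)))
            stack.append((child, child_path))
    return candidates

tree = build_pattern_tree([
    ("T2#Domain1", ["IP-3", "IP-2", "IP-1", "IP-4"]),
    ("T1#Domain2", ["IP-3", "IP-2", "IP-1", "IP-4"]),
    ("T2#Domain2", ["IP-3", "IP-5"]),
    ("T2#Domain3", ["IP-3", "IP-2", "IP-5"]),
])
candidates = extract_candidates(tree)
```

On the Table 3 data this sketch yields five candidates, which correspond to the first five candidate nodes listed above.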
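Finally, the botnet nodes are screened:
a. Redundancy filtering
Among all candidate node combination structures, if any containment relationships exist, all contained structures are removed.
In the preceding example, candidate node <(IP-1,IP-2,IP-3),(T2#Domain1,T1#Domain2)> is contained in <(IP-4,IP-1,IP-2,IP-3),(T2#Domain1,T1#Domain2)>, so the contained candidate node <(IP-1,IP-2,IP-3),(T2#Domain1,T1#Domain2)> is removed.
b. Threshold filtering
To improve accuracy, the extracted candidate botnet node combinations are further screened, and only the results that satisfy the thresholds are selected.
For example, a terminal identifier count threshold Th1 and an access domain name count threshold Th2 are set for the number of elements of the terminal identifier set and the number of access domain names respectively: among all candidate node combinations, only the structures whose terminal identifier set has more elements than the first preset threshold Th1 and whose number of access domain names is greater than the second preset threshold Th2 are retained.
After the above steps, the terminal identifiers that access multiple identical domain names are obtained together with those domain names, in the format <terminal identifier set, time identifier#domain name set>, where the terminal identifier set is the set of machines infected by the botnet.
Here, hosts that repeatedly exhibit the same access behavior and access domain names carrying suspicious risk (the terminal identifier set has more elements than the first preset threshold Th1, and the access domain name set has more elements than the second preset threshold Th2) are behaving abnormally and are considered to be machines infected by a botnet. For a given result, the larger the terminal identifier set and the larger the access domain name set, the higher the probability that these machines are infected by a botnet.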
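The two screening stages can be sketched as follows. The function name `filter_candidates` is hypothetical, and treating a candidate as "contained" when both its terminal set and its domain set are subsets of another candidate's is an assumption of this sketch; the text only illustrates containment with one example in which the domain sets are equal. The threshold values Th1 = 1 and Th2 = 1 are illustrative only.

```python
def filter_candidates(candidates, th1, th2):
    """candidates: list of (terminal frozenset, domain-label frozenset).
    Removes contained candidates, then applies thresholds Th1 and Th2."""
    def contained(a, b):
        # a is contained in b: both of its sets are subsets of b's sets.
        return a != b and a[0] <= b[0] and a[1] <= b[1]

    kept = [c for c in candidates
            if not any(contained(c, other) for other in candidates)]
    return [c for c in kept if len(c[0]) > th1 and len(c[1]) > th2]

fs = frozenset
example = [
    (fs({"IP-2", "IP-3"}), fs({"T2#Domain1", "T1#Domain2", "T2#Domain3"})),
    (fs({"IP-5", "IP-3"}), fs({"T2#Domain2"})),
    (fs({"IP-1", "IP-2", "IP-3"}), fs({"T2#Domain1", "T1#Domain2"})),
    (fs({"IP-5", "IP-2", "IP-3"}), fs({"T2#Domain3"})),
    (fs({"IP-4", "IP-1", "IP-2", "IP-3"}), fs({"T2#Domain1", "T1#Domain2"})),
]
result = filter_candidates(example, th1=1, th2=1)
```

With these illustrative thresholds, the three-terminal candidate is dropped as redundant and the two single-domain candidates fail the Th2 check, leaving the (IP-2, IP-3) group and the four-terminal group.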
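With the above solution, this embodiment obtains the original network traffic data from the monitored network and preprocesses it to obtain preprocessed network traffic data; constructs a terminal access relationship graph based on the preprocessed network traffic data; mines, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and screens the candidate node combinations based on preset screening rules to obtain the detection result of botnet nodes. In this way, the behavior patterns of terminals are analyzed and compared through the domain name query or access information in their network behavior, and the existence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually exhibit the same or similar behavior patterns. Compared with some existing approaches, the disclosed solution uses fewer types of data, extracts fewer features from the traffic, and incurs less computational overhead, which can effectively improve detection efficiency. Because it does not rely on known botnet behavior characteristics, it is also better suited to detecting unknown botnet threats.
As shown in FIG. 6, a second embodiment of the present disclosure proposes a botnet detection method. Based on the first embodiment shown in FIG. 2 above, the method further includes: Step S104, sending the suspicious domain names in the detection result to a malicious domain name detection system, which evaluates the suspicious domain names to optimize the detection result.
Compared with the above embodiment, this embodiment further includes a scheme in which the malicious domain name detection system evaluates the suspicious domain names to optimize the detection result.
In one embodiment, after obtaining the detection result, the botnet detection system may send the suspicious domain names in the detection result to the malicious domain name detection system for further evaluation.
In one embodiment, the malicious domain name detection system can be implemented in multiple ways, for example as a malicious domain name blacklist filtering system, a threat intelligence matching detection system, or an arbitrary domain name detection system.
Once the malicious domain name detection system has evaluated the suspicious domain names, the botnet detection system can further rank the detection results by threat level and optimize them. For example, if the malicious domain name detection system determines that a domain name accessed by a group of terminals is a botnet domain name, it can be determined that this group of terminals with the same behavior pattern is suspected of being controlled by the same bot virus, which directly yields the topological relationship of the botnet. In actual deployment, the botnet detection system can decide, according to the actual situation, whether to perform this step and examine the domain name information further.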
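One simple way to picture the threat-level ranking is to score each detection result by how many of its domain names the malicious domain name detection system flags. In the sketch below, `rank_by_threat` and the `is_malicious` callback are hypothetical stand-ins; the disclosure does not specify a scoring formula or an interface to the malicious domain name detection system, and a plain blacklist set is used here only as a placeholder.

```python
def rank_by_threat(results, is_malicious):
    """results: list of (terminal set, set of '<time id>#<domain>' labels).
    is_malicious: callable taking a domain name and returning True or False.
    Returns the results sorted by number of flagged domains, highest first."""
    def score(result):
        _, labels = result
        domains = {label.split("#", 1)[1] for label in labels}  # strip time id
        return sum(1 for d in domains if is_malicious(d))
    return sorted(results, key=score, reverse=True)

blacklist = {"bad.example"}   # placeholder for a blacklist filtering system
ranked = rank_by_threat(
    [
        (frozenset({"IP-1", "IP-2"}), frozenset({"T1#good.example"})),
        (frozenset({"IP-3", "IP-4"}), frozenset({"T1#bad.example", "T2#ok.example"})),
    ],
    lambda domain: domain in blacklist,
)
```

Here the group formed by IP-3 and IP-4 is ranked first because one of the domain names it accesses is flagged.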
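With the above solution, this embodiment acquires the original network traffic data in the monitored network and preprocesses it to obtain preprocessed network traffic data; builds a terminal access relationship graph from the preprocessed network traffic data; mines from the terminal access relationship graph the lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and filters the candidate node combinations according to preset filtering rules to obtain the detection result for botnet nodes. In this way, the behavior patterns of terminals are analyzed and compared using the domain name query or access information in their network behavior, and the presence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually share the same or similar behavior patterns. In addition, after obtaining the detection result, the botnet detection system may send the suspicious domain names in the detection result to the malicious domain name detection system for further evaluation, so that the botnet detection system can rank the detection results by threat level and optimize them.
Compared with some existing approaches, the solution of the present disclosure uses fewer kinds of data, extracts fewer features from the data traffic, and incurs lower computational overhead, so it can effectively improve detection efficiency. Moreover, the solution does not rely on known botnet behavior signatures, so it is better suited to detecting unknown botnet threats.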
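As shown in FIG. 7, a third embodiment of the present disclosure proposes a botnet detection method. Based on the second embodiment shown in FIG. 6 above, the method further includes:
Step S105, reporting the detection result to a monitoring system, which displays the detection result, and/or performs related management and control operations on terminals suspected of being infected with a bot virus according to the detection result.
Compared with the above embodiment, this embodiment further includes a scheme in which the monitoring system acts on the detection result.
In one embodiment, during implementation of the embodiments of the present disclosure, the botnet detection system may report the detection result to the monitoring system, which displays it.
In one embodiment, the monitoring system may also perform related management and control operations on terminals suspected of being infected with a bot virus according to the detection result, for example, limiting the network speed of those terminals.
It should be noted that the deployment scheme of the botnet detection system described above is only a preferred example. In practice, the botnet detection system can be deployed according to the actual networking situation.
In one embodiment, in addition to being deployed in the network for real-time traffic detection, the botnet detection system of the present disclosure can also be applied to parsing traffic log files and outputting network behavior analysis and detection results.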
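In addition, an embodiment of the present disclosure further proposes a botnet detection system, including a data preprocessing module, a relationship graph construction module, and a node determination module, where:
Data preprocessing module: this module receives the original traffic and processes the raw data, extracts the valid fields from the traffic data, and cleans the data to remove redundant and duplicate information;
Relationship graph construction module: this module receives the log data processed by the data preprocessing module and performs terminal identifier aggregation, relationship graph construction, and relationship graph updating. Its main operations are to aggregate, centered on the domain name, the terminal identifier information within the same time interval into an aggregate structure of the form <time identifier, domain name, list of terminal identifiers accessing the domain name>, and to build and update the adjacency graph between domain name nodes and the lists of terminal identifiers accessing those domain names.
Node determination module: this module mines, from the relationship graph output by the relationship graph construction module, the lists of terminal identifiers that access multiple identical domain names, and obtains the infected botnet nodes after filtering. Its main operations are as follows. It first mines from the access relationship graph the lists of terminal identifiers that access multiple identical domain names, forming structures of the form <terminal identifier set, accessed domain name set>; these output structures are the extracted candidate node combinations. It then applies the corresponding filtering rules to the candidate node combinations, and the filtered output is the set of detected botnet nodes.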
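In one embodiment, the data preprocessing module is configured to acquire the original network traffic data in the monitored network and preprocess it to obtain preprocessed network traffic data; the relationship graph construction module is configured to build a terminal access relationship graph based on the preprocessed network traffic data; and the node determination module is configured to mine from the terminal access relationship graph the lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations, and to filter the candidate node combinations according to preset filtering rules to obtain the detection result for botnet nodes.
In one embodiment, the data preprocessing module is further configured to acquire real-time traffic data or traffic log files within a preset time range in the monitored network as the original network traffic data, where the real-time traffic data or traffic log files include at least the domain name query requests of terminals and/or the HTTP connection requests with which terminals access domain names.
In one embodiment, the data preprocessing module is further configured to group the original network traffic data by time interval; extract valid fields from the grouped network traffic data, the valid fields including at least three key fields: timestamp, terminal identifier, and accessed domain name; and clean the network traffic data containing the valid fields, filtering out redundant data and whitelisted entries, to obtain a log sequence whose data structure is <timestamp, terminal identifier, accessed domain name>.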
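The preprocessing step described above might look like the following Python sketch. It is a sketch under stated assumptions, not the module's actual code: the input field names (`timestamp`, `src_ip`, `domain`), the one-hour interval, the domain name normalization, and the rule that a repeated (interval, terminal, domain name) triple counts as redundant are all hypothetical choices for the example.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: int      # seconds since an arbitrary epoch
    terminal_id: str    # for example, the source IP address
    domain: str         # queried or accessed domain name


def preprocess(raw_records, whitelist, interval_seconds=3600):
    """Extract the three key fields, drop whitelisted and redundant entries."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        ts = rec.get("timestamp")
        terminal = rec.get("src_ip")
        domain = rec.get("domain")
        if ts is None or not terminal or not domain:
            continue  # a key field is missing, so the record is unusable
        domain = domain.lower().rstrip(".")  # normalize case and trailing dot
        if domain in whitelist:
            continue  # known-benign domain name
        key = (ts // interval_seconds, terminal, domain)
        if key in seen:
            continue  # same terminal and domain name already seen in this interval
        seen.add(key)
        cleaned.append(LogRecord(ts, terminal, domain))
    return cleaned


raw = [
    {"timestamp": 10, "src_ip": "IP-1", "domain": "Evil.example."},
    {"timestamp": 20, "src_ip": "IP-1", "domain": "evil.example"},
    {"timestamp": 30, "src_ip": "IP-2", "domain": "www.example.com"},
    {"timestamp": 4000, "src_ip": "IP-1", "domain": "evil.example"},
    {"src_ip": "IP-3", "domain": "evil.example"},
]
log_sequence = preprocess(raw, whitelist={"www.example.com"})
```

In this example the second record is dropped as a duplicate within the first interval, the third is whitelisted, and the last lacks a timestamp, leaving two entries in the log sequence.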
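In one embodiment, the relationship graph construction module is further configured to extract, from the preprocessed network traffic data, the terminal identifiers that identify terminals in the network behavior within the corresponding time interval and the domain name information that the terminals access or query; aggregate, centered on the domain name and based on the extracted terminal identifiers and domain name information, the terminal identifier information within the same time interval into an aggregate structure of the form <time identifier, domain name, set of terminal identifiers accessing the domain name>; and, based on the aggregate structure, build and update the adjacency graph between domain name nodes and the lists of terminal identifiers accessing those domain names, to obtain the terminal access relationship graph.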
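A minimal Python sketch of this aggregation is shown below. It assumes that the log sequence is a list of (timestamp, terminal identifier, domain name) tuples, that time identifiers take the form T1, T2, ... derived from fixed-length intervals, and that the graph is stored as a dictionary keyed by "<time identifier>#<domain name>"; the function name `aggregate` and these representations are illustrative assumptions.

```python
from collections import defaultdict


def aggregate(log_sequence, interval_seconds=3600):
    """Group terminals by (time interval, domain name).

    Returns a dict mapping "<time identifier>#<domain name>" to the set of
    terminal identifiers that accessed that domain name in that interval.
    This dict acts as the adjacency structure between domain name nodes
    and terminal identifier nodes.
    """
    graph = defaultdict(set)
    for timestamp, terminal, domain in log_sequence:
        time_id = f"T{timestamp // interval_seconds + 1}"  # T1, T2, ...
        graph[f"{time_id}#{domain}"].add(terminal)
    return dict(graph)


logs = [
    (10, "IP-1", "Domain1"),
    (20, "IP-2", "Domain1"),
    (4000, "IP-1", "Domain1"),
]
access_graph = aggregate(logs)
```

Updating the graph with a new batch of logs then amounts to adding terminals to the sets of existing keys or creating new keys, which is why a dictionary of sets is a convenient choice here.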
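In one embodiment, the node determination module is further configured to build linked lists from the adjacency relationships in the terminal access relationship graph and sort them; build an access pattern tree from the sorted linked lists; and extract the set of nodes on each path of the access pattern tree as a terminal identifier set, together with the <time identifier#domain name> list of the node at the end of the path as the set of domain names accessed in common by the corresponding terminal identifiers, thereby obtaining the lists of terminal identifiers that access multiple identical domain names, which serve as the candidate node combinations.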
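The mining procedure described above can be sketched in Python as follows. This is one possible reading of the text rather than the authoritative algorithm: the names `build_sorted_lists`, `build_tree`, and `extract_candidates` are hypothetical, ties in degree are broken alphabetically, each tree node records the labels of every linked list that passes through it, and only paths with at least two terminals are reported.

```python
from collections import Counter


def build_sorted_lists(graph):
    """One list per "<time identifier>#<domain name>" label, sorted by degree."""
    # Degree of a terminal = number of labels it is adjacent to.
    degree = Counter(t for terminals in graph.values() for t in terminals)
    lists = [
        (label, sorted(terminals, key=lambda t: (-degree[t], t)))
        for label, terminals in graph.items()
    ]
    # Degree of a label = number of terminals; higher degree first.
    lists.sort(key=lambda item: (-len(item[1]), item[0]))
    return lists


class Node:
    def __init__(self):
        self.children = {}  # terminal identifier -> child Node
        self.labels = []    # labels shared by all terminals on the path so far


def build_tree(lists):
    """Insert each sorted terminal list as a path; tag every node on it."""
    root = Node()
    for label, terminals in lists:
        node = root
        for terminal in terminals:
            node = node.children.setdefault(terminal, Node())
            node.labels.append(label)
    return root


def extract_candidates(root, min_terminals=2):
    """Collect <terminal set, label set> for each path prefix in the tree."""
    found = []

    def walk(node, path):
        for terminal, child in node.children.items():
            new_path = path + [terminal]
            if len(new_path) >= min_terminals:
                found.append((frozenset(new_path), frozenset(child.labels)))
            walk(child, new_path)

    walk(root, [])
    return found


graph = {
    "T2#Domain1": {"IP-1", "IP-2", "IP-3", "IP-4"},
    "T1#Domain2": {"IP-1", "IP-2", "IP-3", "IP-4"},
    "T2#Domain3": {"IP-2", "IP-3", "IP-5"},
}
candidates = extract_candidates(build_tree(build_sorted_lists(graph)))
```

Sorting terminals by descending degree makes lists that share frequently seen terminals share a common prefix in the tree, which is what allows a single traversal to surface groups of terminals with the same access pattern. The sample graph is a hypothetical input chosen for illustration; it is not the graph of FIG. 4.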
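The node determination module is further configured to determine whether the candidate node combinations contain candidate nodes that are in a containment relationship; if so, delete the contained candidate nodes to obtain the candidate node combinations after redundancy filtering; based on the candidate node combinations after redundancy filtering, obtain the number of elements in the terminal identifier set and the number of elements in the accessed domain name set of each candidate node; and retain the candidate nodes whose terminal identifier set has more elements than the preset terminal identifier count threshold and whose accessed domain name set has more elements than the preset accessed domain name count threshold, to obtain the botnet nodes.
In one embodiment, the botnet detection system further includes a processing module configured to send the suspicious domain names in the detection result to the malicious domain name detection system, which evaluates the suspicious domain names in order to optimize the detection result; and to report the detection result to the monitoring system, which displays the detection result, and/or performs related management and control operations on terminals suspected of being infected with a bot virus according to the detection result.
For the detailed process and principle by which this embodiment implements botnet detection, refer to the embodiments above; they are not repeated here.
Compared with some existing approaches, the solution of the present disclosure uses fewer kinds of data, extracts fewer features from the data traffic, and incurs lower computational overhead, so it can effectively improve detection efficiency. Moreover, the solution does not rely on known botnet behavior signatures, so it is better suited to detecting unknown botnet threats.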
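In addition, an embodiment of the present disclosure further proposes a botnet detection system, including a memory, a processor, and a botnet detection program stored in the memory and executable on the processor, where the botnet detection program, when executed by the processor, implements the steps of the botnet detection method described above.
In one embodiment, as shown in FIG. 8, the system of this embodiment may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to connect these components and let them communicate. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the system structure shown in FIG. 8 does not limit the system, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 8, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a botnet detection program.
In the system shown in FIG. 8, the network interface 1004 is mainly used to connect to a network server and exchange data with it; the user interface 1003 is mainly used to connect to a client and exchange data with it; and the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and perform the following operations: acquiring the original network traffic data in the monitored network and preprocessing it to obtain preprocessed network traffic data; building a terminal access relationship graph based on the preprocessed network traffic data; mining from the terminal access relationship graph the lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and filtering the candidate node combinations according to preset filtering rules to obtain the detection result for botnet nodes.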
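In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: acquiring real-time traffic data or traffic log files within a preset time range in the monitored network as the original network traffic data, where the real-time traffic data or traffic log files include at least the domain name query requests of terminals and/or the HTTP connection requests with which terminals access domain names.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: grouping the original network traffic data by time interval; extracting valid fields from the grouped network traffic data, the valid fields including at least three key fields: timestamp, terminal identifier, and accessed domain name; and cleaning the network traffic data containing the valid fields, filtering out redundant data and whitelisted entries, to obtain a log sequence whose data structure is <timestamp, terminal identifier, accessed domain name>.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: extracting, from the preprocessed network traffic data, the terminal identifiers that identify terminals in the network behavior within the corresponding time interval and the domain name information that the terminals access or query; aggregating, centered on the domain name and based on the extracted terminal identifiers and domain name information, the terminal identifier information within the same time interval into an aggregate structure of the form <time identifier, domain name, set of terminal identifiers accessing the domain name>; and, based on the aggregate structure, building and updating the adjacency graph between domain name nodes and the lists of terminal identifiers accessing those domain names, to obtain the terminal access relationship graph.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: building linked lists from the adjacency relationships in the terminal access relationship graph and sorting them; building an access pattern tree from the sorted linked lists; and extracting the set of nodes on each path of the access pattern tree as a terminal identifier set, together with the <time identifier#domain name> list of the node at the end of the path as the set of domain names accessed in common by the corresponding terminal identifiers, thereby obtaining the lists of terminal identifiers that access multiple identical domain names, which serve as the candidate node combinations.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: based on the terminal access relationship graph, building each domain name node and its adjacent nodes into a linked list, where the <time identifier#accessed domain name> node is the head node of the linked list, the subsequent nodes are terminal identifier nodes, and the terminal identifier nodes are arranged in descending order of their degree in the whole relationship graph; and ordering the different linked lists relative to one another, in descending order of the degree of their accessed domain name nodes in the whole relationship graph.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: based on the sorted linked lists, taking the terminal identifier nodes in the linked lists as tree nodes and adding the accessed domain name information to the attribute information of the tree nodes, to build the access pattern tree.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: determining whether the candidate node combinations contain candidate nodes that are in a containment relationship; if so, deleting the contained candidate nodes to obtain the candidate node combinations after redundancy filtering; based on the candidate node combinations after redundancy filtering, obtaining the number of elements in the terminal identifier set and the number of elements in the accessed domain name set of each candidate node; and retaining the candidate nodes whose terminal identifier set has more elements than the preset terminal identifier count threshold and whose accessed domain name set has more elements than the preset accessed domain name count threshold, to obtain the botnet nodes.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: sending the suspicious domain names in the detection result to the malicious domain name detection system, which evaluates the suspicious domain names in order to optimize the detection result.
In one embodiment, the processor 1001 may be configured to call the botnet detection program stored in the memory 1005 and further perform the following operations: reporting the detection result to the monitoring system, which displays the detection result, and/or performs related management and control operations on terminals suspected of being infected with a bot virus according to the detection result.
For the detailed process and principle by which this embodiment implements botnet detection, refer to the embodiments above; they are not repeated here.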
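In addition, an embodiment of the present disclosure further proposes a computer-readable storage medium on which a botnet detection program is stored, where the botnet detection program, when executed by a processor, implements the steps of the botnet detection method described above.
For the detailed process and principle by which this embodiment implements botnet detection, refer to the embodiments above; they are not repeated here.
Compared with some existing approaches, the botnet detection method, system, and storage medium proposed in the embodiments of the present disclosure acquire the original network traffic data in the monitored network and preprocess it to obtain preprocessed network traffic data; build a terminal access relationship graph from the preprocessed network traffic data; mine from the terminal access relationship graph the lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and filter the candidate node combinations according to preset filtering rules to obtain the detection result for botnet nodes. In this way, the behavior patterns of terminals are analyzed and compared using the domain name query or access information in their network behavior, and the presence of a botnet is detected based on the characteristic that terminals controlled by a botnet usually share the same or similar behavior patterns. The solution of the present disclosure uses fewer kinds of data, extracts fewer features from the data traffic, and incurs lower computational overhead, so it can effectively improve detection efficiency. Moreover, the solution does not rely on known botnet behavior signatures, so it is better suited to detecting unknown botnet threats.
The above are only preferred embodiments of the present disclosure and do not thereby limit its patent scope. Any equivalent structural or procedural transformation made using the contents of the description and drawings of the present disclosure, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present disclosure.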
Claims (14)
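- A botnet detection method, wherein the method is applied to a botnet detection system and comprises: acquiring original network traffic data in a monitored network, and preprocessing the original network traffic data to obtain preprocessed network traffic data; building a terminal access relationship graph based on the preprocessed network traffic data; mining, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations; and filtering the candidate node combinations based on preset filtering rules to obtain a detection result for botnet nodes.
- The method according to claim 1, wherein the step of acquiring the original network traffic data in the monitored network comprises: acquiring real-time traffic data or traffic log files within a preset time range in the monitored network as the original network traffic data, the real-time traffic data or traffic log files including at least domain name query requests of terminals and/or HTTP connection requests with which terminals access domain names.
- The method according to claim 2, wherein the step of preprocessing the original network traffic data to obtain the preprocessed network traffic data comprises: grouping the original network traffic data by time interval; extracting valid fields from the grouped network traffic data, the valid fields including at least three key fields: timestamp, terminal identifier, and accessed domain name; and cleaning the network traffic data containing the valid fields, filtering out redundant data and whitelisted entries, to obtain a log sequence whose data structure is <timestamp, terminal identifier, accessed domain name>.
- The method according to claim 3, wherein the step of building the terminal access relationship graph based on the preprocessed network traffic data comprises: extracting, from the preprocessed network traffic data, the terminal identifiers that identify terminals in the network behavior within the corresponding time interval and the domain name information that the terminals access or query; aggregating, centered on the domain name and based on the extracted terminal identifiers and domain name information, the terminal identifier information within the same time interval into an aggregate structure of the form <time identifier, domain name, set of terminal identifiers accessing the domain name>; and, based on the aggregate structure, building and updating an adjacency graph between domain name nodes and the lists of terminal identifiers accessing those domain names, to obtain the terminal access relationship graph.
- The method according to claim 4, wherein the step of mining, from the terminal access relationship graph, the lists of terminal identifiers that access multiple identical domain names to obtain the candidate node combinations comprises: building linked lists from the adjacency relationships in the terminal access relationship graph and sorting them; building an access pattern tree from the sorted linked lists; and extracting the set of nodes on each path of the access pattern tree as a terminal identifier set, together with the <time identifier#domain name> list of the node at the end of the path as the set of domain names accessed in common by the corresponding terminal identifiers, thereby obtaining the lists of terminal identifiers that access multiple identical domain names, which serve as the candidate node combinations.
- The method according to claim 5, wherein the step of building linked lists from the adjacency relationships in the terminal access relationship graph and sorting them comprises: based on the terminal access relationship graph, building each domain name node and its adjacent nodes into a linked list, wherein the <time identifier#accessed domain name> node is the head node of the linked list, the subsequent nodes are terminal identifier nodes, and the terminal identifier nodes are arranged in descending order of their degree in the whole relationship graph; and ordering the different linked lists relative to one another, in descending order of the degree of their accessed domain name nodes in the whole relationship graph.
- The method according to claim 6, wherein the step of building the access pattern tree from the sorted linked lists comprises: based on the sorted linked lists, taking the terminal identifier nodes in the linked lists as tree nodes and adding the accessed domain name information to the attribute information of the tree nodes, to build the access pattern tree.
- The method according to claim 7, wherein the step of filtering the candidate node combinations based on the preset filtering rules to obtain the detection result for botnet nodes comprises: determining whether the candidate node combinations contain candidate nodes that are in a containment relationship; if so, deleting the contained candidate nodes to obtain the candidate node combinations after redundancy filtering; based on the candidate node combinations after redundancy filtering, obtaining the number of elements in the terminal identifier set and the number of elements in the accessed domain name set of each candidate node; and retaining the candidate nodes whose terminal identifier set has more elements than a preset terminal identifier count threshold and whose accessed domain name set has more elements than a preset accessed domain name count threshold, to obtain the botnet nodes.
- The method according to any one of claims 1-8, wherein the method further comprises: sending the suspicious domain names in the detection result to a malicious domain name detection system, which evaluates the suspicious domain names in order to optimize the detection result.
- The method according to any one of claims 1-8, wherein the method further comprises: reporting the detection result to a monitoring system, which displays the detection result, and/or performs related management and control operations on terminals suspected of being infected with a bot virus according to the detection result.
- A botnet detection system, comprising: a data preprocessing module configured to acquire original network traffic data in a monitored network and preprocess the original network traffic data to obtain preprocessed network traffic data; a relationship graph construction module configured to build a terminal access relationship graph based on the preprocessed network traffic data; and a node determination module configured to mine, from the terminal access relationship graph, lists of terminal identifiers that access multiple identical domain names to obtain candidate node combinations, and to filter the candidate node combinations based on preset filtering rules to obtain a detection result for botnet nodes.
- A network access system, comprising: a botnet detection system, and a malicious domain name detection system, a monitoring system, and a DNS server that are each communicatively connected to the botnet detection system, the botnet detection system also being connected to a number of access terminals; wherein the botnet detection system is the botnet detection system according to claim 11; the DNS server is configured to provide domain name access services to the access terminals; the malicious domain name detection system is configured to receive the suspicious domain names in the detection result sent by the botnet detection system and evaluate the suspicious domain names in order to optimize the detection result; and the monitoring system is configured to receive the detection result reported by the botnet detection system, display the detection result, and/or perform related management and control operations on terminals suspected of being infected with a bot virus according to the detection result.
- A botnet detection system, comprising: a memory, a processor, and a botnet detection program stored in the memory and executable on the processor, wherein the botnet detection program, when executed by the processor, implements the steps of the botnet detection method according to any one of claims 1-10.
- A computer-readable storage medium, wherein a botnet detection program is stored on the computer-readable storage medium, and the botnet detection program, when executed by a processor, implements the steps of the botnet detection method according to any one of claims 1-10.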
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19901939.9A EP3905622A4 (en) | 2018-12-26 | 2019-12-19 | ZOMBIE NETWORK DETECTION METHOD AND SYSTEM, AND INFORMATION HOLDER |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811602973.7A CN111371735B (zh) | 2018-12-26 | 2018-12-26 | Botnet detection method, system, and storage medium |
CN201811602973.7 | 2018-12-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020135233A1 true WO2020135233A1 (zh) | 2020-07-02 |
Family
ID=71128478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/126754 WO2020135233A1 (zh) | Botnet detection method, system, and storage medium | 2018-12-26 | 2019-12-19 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3905622A4 (zh) |
CN (1) | CN111371735B (zh) |
WO (1) | WO2020135233A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112988915A (zh) * | 2021-01-27 | 2021-06-18 | 厦门市健康医疗大数据中心(厦门市医药研究所) | 数据展示方法和装置 |
CN113242159A (zh) * | 2021-05-24 | 2021-08-10 | 中国工商银行股份有限公司 | 应用访问关系确定方法及装置 |
CN113676374A (zh) * | 2021-08-13 | 2021-11-19 | 杭州安恒信息技术股份有限公司 | 目标网站线索检测方法、装置、计算机设备和介质 |
CN114205095A (zh) * | 2020-08-27 | 2022-03-18 | 极客信安(北京)科技有限公司 | 一种加密恶意流量的检测方法和装置 |
CN114327506A (zh) * | 2021-12-16 | 2022-04-12 | 安天科技集团股份有限公司 | 一种终端设备应用软件的识别方法、装置及电子设备 |
US20220124101A1 (en) * | 2019-03-07 | 2022-04-21 | Lookout, Inc. | Domain name and url visual verification for increased security |
CN115134095A (zh) * | 2021-03-10 | 2022-09-30 | 中国电信股份有限公司 | 僵尸网络控制端检测方法及装置、存储介质、电子设备 |
CN115277077A (zh) * | 2022-06-22 | 2022-11-01 | 中国电力科学研究院有限公司 | 一种确定处于通信频繁模式的受控设备的方法及系统 |
CN116132167A (zh) * | 2023-02-13 | 2023-05-16 | 中国民航大学 | 一种面向物联网的多协议僵尸网络检测方法 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113645191B (zh) * | 2021-07-13 | 2023-02-28 | 北京华云安信息技术有限公司 | 可疑主机的确定方法、装置、设备和计算机可读存储介质 |
CN114244580A (zh) * | 2021-11-29 | 2022-03-25 | 北京华清信安科技有限公司 | 用于互联网僵尸网络的图形分析识别方法 |
CN116233064B (zh) * | 2021-12-06 | 2024-09-13 | 中移(苏州)软件技术有限公司 | 一种信息确定方法、装置、设备及计算机可读存储介质 |
CN114401122B (zh) * | 2021-12-28 | 2024-04-05 | 中国电信股份有限公司 | 一种域名检测方法、装置、电子设备及存储介质 |
CN115118491B (zh) * | 2022-06-24 | 2024-02-09 | 北京天融信网络安全技术有限公司 | 僵尸网络检测的方法、装置、电子设备及可读存储介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103491074A (zh) * | 2013-09-09 | 2014-01-01 | 中国科学院计算机网络信息中心 | 僵尸网络检测方法及装置 |
US20150281259A1 (en) * | 2012-07-05 | 2015-10-01 | Tenable Network Security, Inc. | System and method for strategic anti-malware monitoring |
CN106060067A (zh) * | 2016-06-29 | 2016-10-26 | 上海交通大学 | 基于Passive DNS迭代聚类的恶意域名检测方法 |
CN107249049A (zh) * | 2017-07-21 | 2017-10-13 | 北京亚鸿世纪科技发展有限公司 | 一种对网络采集的域名数据进行筛选的方法及设备 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8555388B1 (en) * | 2011-05-24 | 2013-10-08 | Palo Alto Networks, Inc. | Heuristic botnet detection |
CN102685145A (zh) * | 2012-05-28 | 2012-09-19 | 西安交通大学 | 一种基于dns数据包的僵尸网络域名发现方法 |
CN103532969A (zh) * | 2013-10-23 | 2014-01-22 | 国家电网公司 | 一种僵尸网络检测方法、装置及处理器 |
CN106375345B (zh) * | 2016-10-28 | 2019-07-16 | 中国科学院信息工程研究所 | 一种基于周期性检测的恶意软件域名检测方法及系统 |
- 2018-12-26: CN, application CN201811602973.7A, patent CN111371735B (zh), status Active
- 2019-12-19: EP, application EP19901939.9A, patent EP3905622A4 (en), status Pending
- 2019-12-19: WO, application PCT/CN2019/126754, patent WO2020135233A1 (zh), status unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP3905622A4 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220124101A1 (en) * | 2019-03-07 | 2022-04-21 | Lookout, Inc. | Domain name and url visual verification for increased security |
US11968217B2 (en) * | 2019-03-07 | 2024-04-23 | Lookout, Inc. | Domain name and URL visual verification for increased security |
CN114205095B (zh) * | 2020-08-27 | 2023-08-18 | 极客信安(北京)科技有限公司 | 一种加密恶意流量的检测方法和装置 |
CN114205095A (zh) * | 2020-08-27 | 2022-03-18 | 极客信安(北京)科技有限公司 | 一种加密恶意流量的检测方法和装置 |
CN112988915A (zh) * | 2021-01-27 | 2021-06-18 | 厦门市健康医疗大数据中心(厦门市医药研究所) | 数据展示方法和装置 |
CN115134095A (zh) * | 2021-03-10 | 2022-09-30 | 中国电信股份有限公司 | 僵尸网络控制端检测方法及装置、存储介质、电子设备 |
CN113242159B (zh) * | 2021-05-24 | 2022-12-09 | 中国工商银行股份有限公司 | 应用访问关系确定方法及装置 |
CN113242159A (zh) * | 2021-05-24 | 2021-08-10 | 中国工商银行股份有限公司 | 应用访问关系确定方法及装置 |
CN113676374A (zh) * | 2021-08-13 | 2021-11-19 | 杭州安恒信息技术股份有限公司 | 目标网站线索检测方法、装置、计算机设备和介质 |
CN113676374B (zh) * | 2021-08-13 | 2024-03-22 | 杭州安恒信息技术股份有限公司 | 目标网站线索检测方法、装置、计算机设备和介质 |
CN114327506A (zh) * | 2021-12-16 | 2022-04-12 | 安天科技集团股份有限公司 | 一种终端设备应用软件的识别方法、装置及电子设备 |
CN115277077A (zh) * | 2022-06-22 | 2022-11-01 | 中国电力科学研究院有限公司 | 一种确定处于通信频繁模式的受控设备的方法及系统 |
CN116132167A (zh) * | 2023-02-13 | 2023-05-16 | 中国民航大学 | 一种面向物联网的多协议僵尸网络检测方法 |
CN116132167B (zh) * | 2023-02-13 | 2024-04-26 | 中国民航大学 | 一种面向物联网的多协议僵尸网络检测方法 |
Also Published As
Publication number | Publication date |
---|---|
EP3905622A1 (en) | 2021-11-03 |
CN111371735A (zh) | 2020-07-03 |
EP3905622A4 (en) | 2022-09-07 |
CN111371735B (zh) | 2022-06-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19901939 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2019901939 Country of ref document: EP Effective date: 20210726 |