WO2020133986A1 - Method, apparatus, device, and storage medium for detecting a botnet domain name family - Google Patents

Method, apparatus, device, and storage medium for detecting a botnet domain name family

Info

Publication number
WO2020133986A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain name
domain
botnet
family
graph
Prior art date
Application number
PCT/CN2019/093331
Other languages
English (en)
French (fr)
Inventor
闫凡
赵振洋
古亮
Original Assignee
深信服科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深信服科技股份有限公司
Priority to SG11202106429VA
Priority to EP19903904.1A
Publication of WO2020133986A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/30 Types of network names
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/144 Detection or countermeasures against botnets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/146 Tracing the source of attacks

Definitions

  • This application relates to the field of information security technology, and in particular to a method, apparatus, device, and computer-readable storage medium for detecting a botnet domain name family.
  • DDoS: distributed denial of service (attack)
  • C&C: command and control
  • Mainstream botnet domain name family detection schemes are mainly based on either grammatical features or virus traffic.
  • Detection based on grammatical features assumes that domain names belonging to the same botnet family have similar grammatical features. Domain names of the same botnet family are often generated by the same domain generation algorithm (DGA), which produces a large number of random domain names with similar grammatical characteristics in order to evade blacklist detection. By extracting features such as the proportion of consonants in the domain name, the longest meaningful length, and the n-gram distribution (the distribution of sequences of n consecutive words in a text or language), domain names generated by different DGAs can be distinguished, and domain names belonging to the same botnet family can be discovered. Recently, recurrent neural networks (RNNs) have also been used to detect DGA domain names.
  • RNN: Recurrent Neural Network
  • Detection based on virus traffic clusters and detects C&C domain names by family, using malicious file family information. By analyzing the domain names accessed by malicious files belonging to the same virus family, one can judge whether two domain names belong to the same botnet family. Although this method does not rely solely on the grammatical characteristics of the domain name, it is limited by the number of available virus samples. In addition, viruses may interfere with this method by accessing legitimate domain names, which leads to false positives. Some studies have shown that a large number of C&C domain names become active weeks or even months before the corresponding virus samples are obtained, so methods based on virus traffic lag behind and cannot discover and eliminate threats promptly.
  • The purpose of this application is to provide a botnet domain name family detection method, apparatus, device, and computer-readable storage medium that solve the problems of existing botnet domain name family detection: a single detection dimension, excessive dependence on virus sample collection, and poor real-time performance.
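  • This application provides a method for detecting a botnet domain name family, including:
  • obtaining suspicious domain names;
  • constructing a domain name spatio-temporal association graph based on the correlations among the suspicious domain names in different dimensions, where in the graph each suspicious domain name serves as a node, two domain names with at least one correlation form an edge, and the correlation between the two domain names serves as the attribute value of the edge;
  • determining, according to graph-computation indicators of how closely each node is connected, the closely connected domain names in the domain name spatio-temporal association graph, and taking the set of those domain names as a botnet domain name family.
  • Determining the closely connected domain names in the domain name spatio-temporal association graph according to these indicators includes:
  • determining the closely connected domain names according to the sparseness and the average clustering degree.
  • Determining the closely connected domain names in the domain name spatio-temporal association graph according to these indicators includes:
  • removing the abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph, and determining the closely connected domain names in the graph.
  • Removing the abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph includes:
  • decomposing the graph, according to connectivity, into subgraphs composed of multiple connected branches, and removing the abnormal subgraphs that do not satisfy a preset first connectivity index;
  • using a community discovery algorithm to divide the remaining subgraphs into communities, and removing the abnormal communities that do not satisfy a preset second connectivity index;
  • using a webpage ranking algorithm to measure the importance of each node in the remaining communities, and removing potential abnormal nodes with low importance.
  • The correlations in different dimensions include any combination of the following features: similarity in the grammatical characteristics of the domain names, similarity in the association between the domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in domain name access traffic.
  • After the set of corresponding domain names is taken as a botnet domain name family, the method further includes:
  • analyzing the family characteristics of the botnet domain name family, and performing security detection in application scenarios according to the analyzed family characteristics.
  • The method further includes:
  • determining whether a new domain name belongs to the botnet domain name family by judging the correlation between the new domain name and the known domain names, so as to monitor the variation and expansion of the botnet domain name family.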
  • This application also provides an apparatus for detecting a botnet domain name family, including:
  • a domain name acquisition module, used to obtain suspicious domain names;
  • an association graph construction module, used to construct a domain name spatio-temporal association graph based on the correlations among the suspicious domain names in different dimensions, where in the graph each suspicious domain name serves as a node, two domain names with at least one correlation form an edge, and the correlation between the two domain names serves as the attribute value of the edge;
  • a detection module, used to determine, according to graph-computation indicators of how closely each node is connected, the closely connected domain names in the domain name spatio-temporal association graph, and to take the set of those domain names as the botnet domain name family.
  • The detection module is configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to the sparseness and the average clustering degree.
  • The detection module is configured to remove abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph and determine the closely connected domain names in the graph.
  • The detection module is configured to: decompose the domain name spatio-temporal association graph, according to connectivity, into subgraphs composed of multiple connected branches, and remove the abnormal subgraphs that do not satisfy the preset first connectivity index; use a community discovery algorithm to divide the remaining subgraphs into communities, and remove the abnormal communities that do not satisfy the preset second connectivity index; and use a webpage ranking algorithm to measure the importance of each node in the remaining communities and remove potential abnormal nodes with low importance.
  • The correlations in different dimensions include any combination of the following features: similarity in the grammatical characteristics of the domain names, similarity in the association between the domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in domain name access traffic.
  • The apparatus also includes:
  • a security detection module, used to analyze the family characteristics of the botnet domain name family after the set of corresponding domain names is taken as the botnet domain name family, and to perform security detection in application scenarios according to the analyzed family characteristics.
  • The apparatus also includes:
  • a monitoring module, used to determine, after the set of corresponding domain names is taken as the botnet domain name family, whether a new domain name belongs to the family by judging the correlation between the new domain name and the known domain names, so as to monitor the variation and expansion of the botnet domain name family.
  • This application also provides a device for detecting a botnet domain name family, including:
  • a memory, used to store a computer program;
  • a processor, configured to implement the steps of any of the above methods for detecting a botnet domain name family when executing the computer program.
  • This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above methods for detecting a botnet domain name family.
  • The detection method provided by this application obtains suspicious domain names; constructs a domain name spatio-temporal association graph based on the correlations among the suspicious domain names in different dimensions, where each suspicious domain name serves as a node, two domain names with at least one correlation form an edge, and the correlation between the two domain names serves as the attribute value of the edge; and, according to graph-computation indicators of how closely each node is connected, determines the closely connected domain names in the graph and takes the set of those domain names as a botnet domain name family.
  • Because this application expresses the correlations among domain names in different dimensions in the form of an association graph, it has stronger detection capability.
  • Because detection does not depend on virus traffic, this application can detect a botnet domain name family more quickly, allow it to be handled in time to reduce the resulting losses, and apply more widely.
  • This application also provides an apparatus, a device, and a computer-readable storage medium for detecting a botnet domain name family, which have the same technical advantages.
  • FIG. 1 is a flowchart of a specific implementation manner of a method for detecting a botnet domain name family provided by this application;
  • FIG. 2 is a flowchart of the process for removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph in this application;
  • FIG. 3 is a structural block diagram of a detection device for a botnet domain name family provided by an embodiment of the present application.
  • A flowchart of a specific implementation of the method for detecting a botnet domain name family provided by this application is shown in FIG. 1. The method includes:
  • Step S101: Obtain suspicious domain names.
  • A suspicious domain name is a domain name that remains after obviously normal, legitimate domain names are excluded and for which at least one abnormal behavior has been detected.
  • Examples of abnormal behavior are grammatical characteristics that closely resemble those of DGA domain names, or activity that is always concentrated in the early morning. Obviously normal domain names can be excluded through whitelisting, for example by adding Alexa-ranked domain names to the whitelist on the assumption that these highly ranked domain names are unlikely to become C&C domain names for botnets.
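  • Step S101 can be illustrated with a minimal sketch. The behavior tags (`dga_like_name`, `active_early_morning`), the function name, and the sample domains are assumptions made for illustration; the application does not prescribe a data format.

```python
def select_suspicious(observations, whitelist):
    """Return domains that are not whitelisted and show at least one abnormal behavior.

    observations: dict mapping domain -> set of abnormal-behavior tags (hypothetical).
    whitelist: set of domains treated as obviously legitimate (e.g. top-ranked sites).
    """
    return sorted(
        domain
        for domain, anomalies in observations.items()
        if domain not in whitelist and anomalies
    )


observations = {
    "example.com": set(),                                # no abnormal behavior
    "popular-site.example": {"active_early_morning"},    # abnormal but whitelisted
    "xkqzvbnt.example": {"dga_like_name"},
    "qpwoeiru.example": {"dga_like_name", "active_early_morning"},
}
whitelist = {"popular-site.example"}

suspicious = select_suspicious(observations, whitelist)
print(suspicious)
```

  • Only the two non-whitelisted domains with at least one detected anomaly are passed on to graph construction.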
  • Step S102: Construct a domain name spatio-temporal association graph based on the correlations among the suspicious domain names in different dimensions. In the graph, each suspicious domain name serves as a node, two domain names with at least one correlation form an edge, and the correlation between the two domain names serves as the attribute value of the edge.
  • The suspicious domain names can be analyzed to obtain the correlations among them in different dimensions.
  • The correlation between domain names may be expressed as a triple, such as (domain name 1, domain name 2, relevance index).
  • The relevance index in the triple contains the associated dimension and the similarity measure of the two domain names in that dimension.
  • For each dimension, a targeted metric can be set to measure the similarity of the two domain names in that dimension.
  • Two domain names can be similar in multiple feature dimensions, in which case the relevance index part of the triple can be represented by an array.
  • The correlations in different dimensions include any combination of the following features: similarity in the grammatical characteristics of the domain names, similarity in the association between the domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in domain name access traffic.
  • The dimensions are not limited to the types above. As many dimensions as possible can be chosen so that the relevance analysis covers all aspects of domain name behavior, and when a new dimension of relevance is found, the feature analysis of that dimension can be added, so the approach is highly extensible.
  • A graph database can be used to construct and store the spatio-temporal association graph from the triples, and subsequent expansion of the graph over time can also be handled conveniently in the graph database.
  • The domain name spatio-temporal association graph depicts the associations between domain names in time and space.
  • The association in space is the association in the various dimensions.
  • The association in time refers to how the relationships between domain names change over time. Domain names that were not originally associated may become associated in certain dimensions as time advances and certain security events occur.
  • In the graph, each suspicious domain name serves as a node, two domain names with at least one association form an edge, and the association between the two domain names serves as the attribute value of the edge.
  • An edge can have multiple attributes, each corresponding to the correlation in one dimension; attributes can also include the time when the domain name was detected as suspicious, whether the domain name successfully resolves to an IP address, and so on. For example, if domain name 1 and domain name 2 are related, they are connected by an edge in the graph. In other words, only domain names that are correlated in at least one dimension are connected by an edge. The correlations between the domain names, such as (association dimension A, similarity in dimension A; association dimension B, similarity in dimension B; association dimension C, similarity in dimension C; ...), serve as the attributes of the edge.
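  • The triple-to-graph construction described in step S102 might look like the following in-memory sketch. The application suggests a graph database; a plain dictionary is used here only for illustration, and the dimension names (`lexical`, `resolved_ip`, `traffic`, `virus`) and similarity values are assumed.

```python
def build_association_graph(triples):
    """Build an undirected graph where graph[u][v] is a dict {dimension: similarity}.

    triples: iterable of (domain_1, domain_2, relevance), where relevance maps
    each correlated dimension to a similarity score. An empty relevance means the
    two domains are not correlated in any dimension, so no edge is created.
    """
    graph = {}
    for d1, d2, relevance in triples:
        graph.setdefault(d1, {})
        graph.setdefault(d2, {})
        if relevance:
            attrs = graph[d1].setdefault(d2, {})
            attrs.update(relevance)      # merge dimensions seen in later triples
            graph[d2][d1] = attrs        # both directions share one attribute dict
    return graph


triples = [
    ("a.example", "b.example", {"lexical": 0.9, "resolved_ip": 1.0}),
    ("b.example", "c.example", {"traffic": 0.7}),
    ("a.example", "b.example", {"virus": 0.6}),   # new dimension found later in time
    ("c.example", "d.example", {}),               # no correlation: no edge
]
graph = build_association_graph(triples)
print(graph["a.example"]["b.example"])
```

  • Merging attributes on an existing edge mirrors the temporal aspect of the graph: a correlation discovered later is added to the same edge rather than creating a new one.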
  • Step S103: According to graph-computation indicators of how closely each node is connected, determine the closely connected domain names in the domain name spatio-temporal association graph, and take the set of those domain names as the botnet domain name family.
  • Sparseness and average clustering degree are two suitable indicators for measuring the connectivity of subgraphs.
  • Sparseness is defined as the ratio of the number of edges in the graph to the number of edges in the fully connected graph formed by the same nodes;
  • the average clustering degree is defined as the average, over all nodes, of the ratio of the number of triangles around a node to the number of potential triangles around it.
  • The sparseness index measures the compactness of the graph accurately when the number of nodes is small, and the average clustering degree does so when the number of nodes is large. Combining the two therefore measures the compactness, or connectivity, of the graph more accurately.
  • This application may determine the closely connected domain names in the domain name spatio-temporal association graph according to the sparseness and the average clustering degree.
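  • The two indicators defined above can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dict of neighbor sets; the function names and the sample graph are illustrative.

```python
from itertools import combinations


def sparseness(adj):
    """Edges present divided by edges in the complete graph on the same nodes."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean over nodes of (triangles around node) / (potential triangles around node)."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two neighbors: no potential triangle, contributes 0
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += triangles / (k * (k - 1) / 2)
    return total / len(adj)


# Triangle a-b-c with one pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(sparseness(adj), average_clustering(adj))
```

  • For this graph the sparseness is 4/6 and the average clustering degree is (1 + 1 + 1/3 + 0)/4 = 7/12.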
  • The detection method provided by this application obtains suspicious domain names; constructs a domain name spatio-temporal association graph based on the correlations among the suspicious domain names in different dimensions, where each suspicious domain name serves as a node, two domain names with at least one correlation form an edge, and the correlation between the two domain names serves as the attribute value of the edge; and, according to graph-computation indicators of how closely each node is connected, determines the closely connected domain names in the graph and takes the set of those domain names as a botnet domain name family.
  • Because this application expresses the correlations among domain names in different dimensions in the form of an association graph, it has stronger detection capability.
  • Because detection does not depend on virus traffic, this application can detect a botnet domain name family more quickly, allow it to be handled in time to reduce the resulting losses, and apply more widely.
  • This application can comprehensively consider the similarity between domain names in various dimensions, construct a spatio-temporal association graph of the domain names, exclude abnormal subgraphs from the graph based on graph computation, and exclude abnormal nodes from the normal subgraphs, so that only the most closely connected domain names remain to constitute a botnet family.
  • Taking the botnet family, or closely connected subgraph, as a unit, the family can then be analyzed further along dimensions such as virus traffic and domain name grammatical characteristics to eliminate possible false positives.
  • The process of determining the closely connected domain names in the domain name spatio-temporal association graph may specifically include:
  • removing the abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph, and determining the closely connected domain names in the graph.
  • As shown in FIG. 2, the process of removing the abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph may specifically include:
  • Step S201: Decompose the domain name spatio-temporal association graph, according to connectivity, into subgraphs composed of multiple connected branches, and remove the abnormal subgraphs that do not satisfy the preset first connectivity index.
  • The preset first connectivity index may be that the number of nodes in the subgraph is greater than a preset threshold and that the connectivity is strong.
  • If a subgraph has too few nodes, too little information has been collected about the botnet domain names of that family to analyze the correlations among them accurately, so such subgraphs are set aside until enough information has been collected. In this way, only subgraphs with a sufficient number of nodes and strong connectivity are analyzed further, and the remaining subgraphs are discarded.
  • Connectivity in this step is not limited to one specific metric; multiple graph compactness metrics may be used together.
  • Because the initial spatio-temporal association graph contains all the botnet families to be analyzed, connected branch discovery must handle mass storage of the graph structure and places relatively large demands on memory and computing power, so distributed processing is needed.
  • When calculating the connectivity of the subgraphs composed of connected branches, one or more metrics can be selected as required to exclude subgraphs that are not sufficiently connected.
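  • A single-machine sketch of step S201 follows; the application calls for distributed processing at scale, which this sketch does not attempt. The thresholds `node_threshold` and `min_sparseness`, and the choice of sparseness as the first connectivity index, are assumptions for illustration.

```python
def connected_components(adj):
    """Split an undirected graph (dict of neighbor sets) into connected branches."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


def sparseness_of(adj, nodes):
    """Edges inside `nodes` divided by edges of the complete graph on `nodes`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def keep_connected_subgraphs(adj, node_threshold=2, min_sparseness=0.5):
    """Keep branches with more than node_threshold nodes and enough internal edges."""
    return [
        c for c in connected_components(adj)
        if len(c) > node_threshold and sparseness_of(adj, c) >= min_sparseness
    ]


def from_edges(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


adj = from_edges([
    ("a", "b"), ("b", "c"), ("a", "c"),               # triangle: kept
    ("x", "y"),                                       # too few nodes: removed
    ("p", "q"), ("q", "r"), ("r", "s"), ("s", "t"),   # long chain, sparseness 0.4: removed
])
kept = keep_connected_subgraphs(adj)
print(kept)
```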
  • Step S202: Use a community discovery algorithm to divide the subgraphs that remain after the abnormal subgraphs are removed into communities, and remove the abnormal communities that do not satisfy the preset second connectivity index.
  • A community is a group structure with close internal connections and sparse external connections.
  • A community discovery algorithm can find such group structures by measuring the associations between nodes.
  • In this step, a community discovery algorithm divides each subgraph formed by a strongly connected branch into several communities with even stronger connectivity, and communities with insufficient connectivity are discarded.
  • The subgraphs obtained through community discovery are more closely connected, so their nodes are more likely to belong to the same botnet family and false positives are less likely.
  • The second connectivity index should be stricter than the first connectivity index, because a subgraph formed by a community is more closely connected than the connected-branch subgraph that contains it.
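  • The application does not name a specific community discovery algorithm. As one possible illustration, the sketch below uses a simple deterministic, weighted label propagation (an assumed choice, not the method of the application), followed by a sparseness filter standing in for the second connectivity index. The edge weights and the `min_sparseness` threshold are hypothetical.

```python
def label_propagation(weights, max_rounds=20):
    """Weighted label propagation; weights[u][v] is the similarity on edge u-v."""
    labels = {node: node for node in weights}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(weights):
            scores = {}
            for nbr, w in weights[node].items():
                scores[labels[nbr]] = scores.get(labels[nbr], 0.0) + w
            if not scores:
                continue  # isolated node keeps its own label
            best = max(scores.values())
            # Ties are broken by the smallest label so the result is deterministic.
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node], changed = new_label, True
        if not changed:
            break
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return sorted(groups.values(), key=sorted)


def internal_sparseness(weights, nodes):
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u in nodes for v in weights[u] if v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def from_edges(edges):
    weights = {}
    for u, v, w in edges:
        weights.setdefault(u, {})[v] = w
        weights.setdefault(v, {})[u] = w
    return weights


# Two tightly linked triangles joined by one weak edge (c-d).
weights = from_edges([
    ("a", "b", 0.9), ("a", "c", 0.9), ("b", "c", 0.9),
    ("c", "d", 0.1),
    ("d", "e", 0.9), ("d", "f", 0.9), ("e", "f", 0.9),
])
communities = label_propagation(weights)
kept = [c for c in communities if internal_sparseness(weights, c) >= 0.8]
print(kept)
```

  • The single connected branch is split at the weak edge into two communities, each of which is fully connected internally and therefore passes the stricter second filter.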
  • Step S203: Use a webpage ranking algorithm to measure the importance of each node in the communities that remain after the abnormal communities are removed, and remove the potential abnormal nodes with low importance.
  • The webpage ranking algorithm was first used to measure the importance of a particular webpage relative to other webpages in a search engine index.
  • In graph computation, the webpage ranking algorithm measures a node's importance in the whole graph based on its degree of association with other nodes. For communities that are sufficiently connected, the algorithm is used to measure the importance of each node in the community and eliminate potential outliers with lower importance, which further improves the compactness and connectivity of the community.
  • The webpage ranking algorithm takes a graph as input and outputs a measure of the importance of each node in the graph.
  • The webpage ranking values of all nodes sum to 1. Intuitively, the more nodes a node is associated with, the more important it is in the graph. Sorting the nodes by webpage ranking value reveals the nodes that are less important and less related to the other nodes. Deleting these nodes from the graph ensures that the remaining nodes are strongly connected to one another.
  • Outlier detection based on webpage ranking values can use a variety of statistical methods. By analyzing the distribution of the nodes' webpage ranking values in the graph, the nodes whose values are clearly lower than the average level can be found and then deleted from the graph.
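  • Step S203 can be sketched with a power-iteration PageRank over an undirected community. The damping factor 0.85 is the conventional default; the cutoff of half the mean rank (`ratio=0.5`) is an assumed stand-in for "clearly lower than the average level", not a value taken from the application.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph (dict of neighbor sets)."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly so the total stays 1.
        dangling = sum(rank[u] for u in adj if not adj[u])
        rank = {
            node: (1 - damping) / n
            + damping * (sum(rank[nbr] / len(adj[nbr]) for nbr in adj[node]) + dangling / n)
            for node in adj
        }
    return rank


def remove_low_rank_nodes(adj, ratio=0.5):
    """Drop nodes whose rank is below ratio * mean rank; return (pruned graph, outliers)."""
    rank = pagerank(adj)
    mean = sum(rank.values()) / len(rank)
    outliers = {node for node, r in rank.items() if r < ratio * mean}
    pruned = {node: nbrs - outliers for node, nbrs in adj.items() if node not in outliers}
    return pruned, outliers


# A 5-node clique plus one loosely attached node "x" linked only to "a".
clique = "abcde"
adj = {u: {v for v in clique if v != u} for u in clique}
adj["a"].add("x")
adj["x"] = {"a"}

ranks = pagerank(adj)
pruned, outliers = remove_low_rank_nodes(adj)
print(sorted(outliers))
```

  • The loosely attached node receives a rank well under half the average and is removed, leaving the fully connected clique.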
  • After the set of corresponding domain names is taken as a botnet domain name family, the method provided by this application may further include analyzing the family characteristics of the botnet domain name family and performing security detection in application scenarios according to the analyzed family characteristics.
  • The main characteristics of each family are determined mainly by the attribute values of the edges in the community formed by the family's domain names.
  • The edge attribute values include one or more of: similarity in the grammatical characteristics of the domain names, similarity in the association between the domain names and viruses, similarity in the IP addresses the domain names resolve to, similarity in domain name access traffic, and so on.
  • The resulting malicious domain names can be used in a variety of security detection scenarios, including but not limited to adding them to a malicious domain name database and scoring domain name reputation.
  • Domain names clustered into the same botnet family usually include both known C&C domain names and newly discovered domain names. By evaluating the relevance of the newly discovered domain names to the known C&C domain names, each newly discovered domain name can be given a malicious confidence that determines its reputation.
  • This application may further include: determining whether a new domain name belongs to the botnet domain name family by judging the correlation between the new domain name and the known domain names, so as to monitor the variation and expansion of the botnet domain name family.
  • Attackers will use new domain names to communicate. Because the correlation between a new domain name and the known domain names can be evaluated in multiple dimensions and represented in the graph data structure, it is easy to determine from the spatio-temporal association graph of a botnet family whether the new domain name belongs to a previously detected family, and to observe the evolution and expansion of the family. In addition, because the graph-computation-based detection method works in real time, a newly added C&C domain name of a botnet family can be detected as soon as it appears.
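  • One way to picture the monitoring step is a rule that assigns a new domain name to the known family with which it shares the most sufficiently strong correlations. The rule itself, the thresholds `min_links` and `min_similarity`, and the family names are all hypothetical; the application only states that the correlation between the new and known domain names is judged.

```python
def assign_to_family(new_domain_links, families, min_links=2, min_similarity=0.5):
    """Return the family a new domain most plausibly joins, or None.

    new_domain_links: dict known_domain -> {dimension: similarity} for the new domain.
    families: dict family_name -> set of known member domains.
    A link counts if its strongest dimension reaches min_similarity.
    """
    best_family, best_count = None, 0
    for name in sorted(families):
        members = families[name]
        count = sum(
            1
            for known, attrs in new_domain_links.items()
            if known in members and attrs and max(attrs.values()) >= min_similarity
        )
        if count >= min_links and count > best_count:
            best_family, best_count = name, count
    return best_family


families = {
    "family_1": {"a.example", "b.example", "c.example"},
    "family_2": {"d.example", "e.example"},
}
links = {
    "a.example": {"lexical": 0.8},
    "b.example": {"resolved_ip": 1.0},
    "d.example": {"traffic": 0.2},   # too weak to count
}
result = assign_to_family(links, families)
print(result)
```

  • A new domain with only weak or isolated links is left unassigned and can be re-evaluated as more correlations are observed over time.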
  • This application analyzes the similarity between domain names in various dimensions and expresses the similarity in the different dimensions in a unified graph data structure, forming a spatio-temporal association graph of domain names.
  • Graph computation algorithms are then executed on the spatio-temporal association graph. After connected branch discovery, community discovery, and outlier detection based on webpage ranking, several subgraphs with strong internal cohesion and weak external connections are found. Each subgraph corresponds to a botnet domain name family.
  • This application does not rely on the characteristics of a single dimension to cluster and detect C&C domain names belonging to the same botnet family. Instead, it comprehensively considers the similarity between domain names in various dimensions, so it analyzes domain names more comprehensively and is more extensible: features of newly added dimensions can easily be incorporated into the detection method proposed in this application.
  • The detection method proposed in this application has strong real-time performance, and it is not necessary to obtain a malicious file sample corresponding to a C&C domain name in order to analyze the botnet family to which it belongs.
  • This application can analyze the different properties of different types of C&C domain names by family and apply the family analysis results to other security detection scenarios. It can also track the evolution and expansion of a botnet domain name family and discover newly added C&C domain names in the family.
  • The apparatus for detecting a botnet domain name family provided by the embodiment of this application is described below.
  • The apparatus described below and the detection method described above may be referred to in correspondence with each other.
  • FIG. 3 is a structural block diagram of an apparatus for detecting a botnet domain name family according to an embodiment of this application.
  • The apparatus may include:
  • a domain name acquisition module 100, used to obtain suspicious domain names;
  • an association graph construction module 200, used to construct a domain name spatio-temporal association graph based on the correlations among the suspicious domain names in different dimensions, where in the graph each suspicious domain name serves as a node, two domain names with at least one correlation form an edge, and the correlation between the two domain names serves as the attribute value of the edge;
  • a detection module 300, used to determine, according to graph-computation indicators of how closely each node is connected, the closely connected domain names in the domain name spatio-temporal association graph, and to take the set of those domain names as the botnet domain name family.
  • The detection module 300 is specifically configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to the sparseness and the average clustering degree.
  • The detection module 300 is specifically configured to remove the abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph and determine the closely connected domain names in the graph.
  • The correlations in different dimensions include any combination of the following features: similarity in the grammatical characteristics of the domain names, similarity in the association between the domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in domain name access traffic.
  • The apparatus may further include:
  • a security detection module, used to analyze the family characteristics of the botnet domain name family after the set of corresponding domain names is taken as the botnet domain name family, and to perform security detection in application scenarios according to the analyzed family characteristics.
  • The apparatus may further include:
  • a monitoring module, used to determine, after the set of corresponding domain names is taken as the botnet domain name family, whether a new domain name belongs to the family by judging the correlation between the new domain name and the known domain names, so as to monitor the variation and expansion of the botnet domain name family.
  • the detection device of the botnet domain name family in this embodiment is used to implement the foregoing detection method of the botnet domain name family. Therefore, the specific implementation of the detection device of the botnet domain name family can be seen in the implementation of the detection method of the botnet domain name family in the foregoing
  • the domain name acquisition module 100, the association graph construction module 200, and the detection module 300 are respectively used to implement steps S101, S102, and S103 in the above detection method of the botnet domain name family. Therefore, for specific implementations, refer to the corresponding The description of the embodiments of each part will not be repeated here.
  • In another aspect, this application also provides a detection device for a botnet domain name family, including:
  • a memory, configured to store a computer program;
  • a processor, configured to implement the steps of any of the foregoing methods for detecting a botnet domain name family when executing the computer program.
  • In yet another aspect, this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the foregoing methods for detecting a botnet domain name family.
  • In summary, this application analyzes the similarity between domain names in each dimension and expresses the similarities of the different dimensions in a unified graph data structure, forming a spatio-temporal association graph of domain names.
  • Graph computation algorithms are then executed on the spatio-temporal association graph. Connected component discovery, community detection, and PageRank-based outlier detection yield several subgraphs with strong internal cohesion and weak external connections, and each subgraph corresponds to a botnet domain name family.
  • This application does not rely on features of a single dimension alone to cluster and detect C&C domain names belonging to the same botnet family; it comprehensively considers the similarity relationships between domain names across all dimensions. It can therefore analyze the domain names of a botnet family more thoroughly and is more extensible: features of newly added dimensions can easily be incorporated into the detection method proposed in this application.
  • The detection method proposed in this application has strong real-time performance: it is not necessary to first obtain the malicious file sample corresponding to a C&C domain name in order to analyze the botnet family to which it belongs.
  • A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

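A method, apparatus, device, and computer-readable storage medium for detecting a botnet domain name family. Suspicious domain names are obtained; a domain name spatio-temporal association graph is constructed based on the associations between the suspicious domain names in different dimensions, where each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge; the closely connected domain names in the graph are determined according to graph-computation indicators of node compactness, and the set of corresponding domain names is taken as a botnet domain name family. This application represents the associations between domain names across different dimensions uniformly in the form of an association graph and therefore has stronger detection capability. It can also detect botnet domain name families more quickly and has wider applicability.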

Description

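Method, apparatus, device, and storage medium for detecting a botnet domain name family
This application claims priority to Chinese patent application No. 201811584694.2, entitled "Method, apparatus, device, and storage medium for detecting a botnet domain name family", filed with the China Patent Office on December 24, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of information security technology, and in particular to a method, apparatus, device, and computer-readable storage medium for detecting a botnet domain name family.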
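Background
Botnets pose a serious threat to network security. Criminals use botnets to launch distributed denial-of-service (DDoS) attacks, carry out malicious mining, steal information, send spam, and so on, seriously endangering the interests of states, enterprises, organizations, and individuals. Identifying botnet communication quickly and accurately and blocking it in time is therefore of great significance. A large number of botnets use the DNS protocol to send command and control (C&C) information and communicate with bot hosts.
Mainstream schemes for detecting botnet domain name families fall into two types: detection based on lexical features and detection based on virus traffic.
Detection based on lexical features assumes that domain names belonging to the same botnet family are lexically similar. Domain names of the same botnet family are often generated by the same domain generation algorithm (DGA), which evades blacklist detection by generating a large number of random domain names with similar lexical features. By extracting features such as the proportion of consonant letters in a domain name, the longest meaningful length, and n-grams (the distribution of sequences of n consecutive words in text or language), domain names generated by different DGAs can be told apart, revealing the domain names that belong to the same botnet family. Recently, recurrent neural networks (RNNs) have also been used for DGA domain name detection: the RNN learns the features of the character sequence that makes up a domain name and discovers DGA domain names belonging to the same botnet family. However, not all botnet families use only DGAs to generate C&C domain names. Once the C&C domain names of a family lack obvious lexical similarity, this class of detection algorithms cannot perform well.
Detection based on virus traffic clusters and detects C&C domain names by family using the family information of malicious files. By analyzing which domain names are accessed by malicious files belonging to the same virus family, it can be determined whether two domain names belong to the same botnet family. Although this method does not simply rely on the lexical features of domain names, it is constrained by the number of available virus samples. In addition, viruses can interfere with this method by accessing some legitimate domain names, which produces a certain number of false positives. Some studies show that a large number of C&C domain names become active weeks or even months before the corresponding virus samples are obtained, so virus-traffic-based methods lag behind and cannot discover and eliminate threats at the earliest opportunity.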
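Summary
The purpose of this application is to provide a method, apparatus, device, and computer-readable storage medium for detecting a botnet domain name family, so as to solve the problems of existing botnet domain name family detection: a single detection dimension, excessive dependence on virus sample collection, and poor real-time performance.
To solve the above technical problems, this application provides a method for detecting a botnet domain name family, including:
obtaining suspicious domain names;
constructing a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions, where, in the domain name spatio-temporal association graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge;
determining the closely connected domain names in the domain name spatio-temporal association graph according to graph-computation indicators of node compactness, and taking the set of corresponding domain names as a botnet domain name family.
Optionally, determining the closely connected domain names in the domain name spatio-temporal association graph according to the graph-computation indicators of node compactness includes:
determining the closely connected domain names in the domain name spatio-temporal association graph according to sparsity and average clustering degree.
Optionally, determining the closely connected domain names in the domain name spatio-temporal association graph according to the graph-computation indicators of node compactness includes:
removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph to determine the closely connected domain names in the graph.
Optionally, removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph includes:
decomposing the domain name spatio-temporal association graph by connectivity into subgraphs formed by connected components, and removing abnormal subgraphs that do not satisfy a preset first connectivity indicator;
partitioning the subgraphs that remain after the abnormal subgraphs are removed using a community detection algorithm to obtain communities formed by connected components, and removing abnormal communities that do not satisfy a preset second connectivity indicator;
measuring, with the PageRank algorithm, the importance of each node within the communities that remain after the abnormal communities are removed, and removing potential outliers of low importance.
Optionally, the associations in different dimensions include any combination of the following features:
similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names.
Optionally, after taking the set of corresponding domain names as a botnet domain name family, the method further includes:
analyzing the family characteristics of the botnet domain name family, and performing security detection in application scenarios according to the family characteristics obtained.
Optionally, after taking the set of corresponding domain names as a botnet domain name family, the method further includes:
determining whether a new domain name belongs to the botnet domain name family by judging the association between the new domain name and known domain names, so as to monitor variants and expansion of the botnet domain name family.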
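This application also provides an apparatus for detecting a botnet domain name family, including:
a domain name acquisition module, configured to obtain suspicious domain names;
an association graph construction module, configured to construct a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions, where, in the domain name spatio-temporal association graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge;
a detection module, configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to graph-computation indicators of node compactness, and to take the set of corresponding domain names as a botnet domain name family.
Optionally, the detection module is configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to sparsity and average clustering degree.
Optionally, the detection module is configured to remove abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph to determine the closely connected domain names in the graph.
Optionally, the detection module is configured to: decompose the domain name spatio-temporal association graph by connectivity into subgraphs formed by connected components, and remove abnormal subgraphs that do not satisfy a preset first connectivity indicator; partition the subgraphs that remain after the abnormal subgraphs are removed using a community detection algorithm to obtain communities formed by connected components, and remove abnormal communities that do not satisfy a preset second connectivity indicator; and measure, with the PageRank algorithm, the importance of each node within the communities that remain after the abnormal communities are removed, and remove potential outliers of low importance.
Optionally, the associations in different dimensions include any combination of the following features: similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names.
Optionally, the apparatus further includes:
a security detection module, configured to analyze the family characteristics of the botnet domain name family after the set of corresponding domain names is taken as the botnet domain name family, and to perform security detection in application scenarios according to the family characteristics obtained.
Optionally, the apparatus further includes:
a monitoring module, configured to determine, after the set of corresponding domain names is taken as the botnet domain name family, whether a new domain name belongs to the botnet domain name family by judging the association between the new domain name and known domain names, so as to monitor variants and expansion of the botnet domain name family.
This application also provides a device for detecting a botnet domain name family, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for detecting a botnet domain name family when executing the computer program.
This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for detecting a botnet domain name family.
In the method for detecting a botnet domain name family provided by this application, suspicious domain names are obtained; a domain name spatio-temporal association graph is constructed based on the associations between the suspicious domain names in different dimensions, where each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge; the closely connected domain names in the graph are determined according to graph-computation indicators of node compactness, and the set of corresponding domain names is taken as a botnet domain name family. Compared with family detection methods based on lexical features, which consider associations at only one level, this application represents the associations between domain names across different dimensions uniformly in the form of an association graph and therefore has stronger detection capability. In addition, the detection in this application does not depend on virus traffic, so botnet domain name families can be detected more quickly and handled in time, reducing the resulting losses, and the method has wider applicability. This application also provides an apparatus, a device, and a computer-readable storage medium for detecting a botnet domain name family that have the above technical advantages.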
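Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a specific embodiment of the method for detecting a botnet domain name family provided by this application;
FIG. 2 is a flowchart of the process of removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph in this application;
FIG. 3 is a structural block diagram of the apparatus for detecting a botnet domain name family provided by an embodiment of this application.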
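Detailed Description
To help those skilled in the art better understand the solution of this application, the application is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
A flowchart of a specific embodiment of the method for detecting a botnet domain name family provided by this application is shown in FIG. 1. The method includes:
Step S101: Obtain suspicious domain names.
A suspicious domain name is a domain name that remains after clearly normal, legitimate domain names have been excluded and that has been detected to exhibit at least one abnormal behavior, for example lexical features that closely resemble a DGA domain name, or activity that is always concentrated in the early hours of the morning. Clearly normal domain names can be excluded with a whitelist, for example by adding domain names with high Alexa rankings to the whitelist, on the assumption that such highly ranked domain names are unlikely to be botnet C&C domain names.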
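The selection rule in step S101 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DomainObservation` record, the anomaly labels, and the `whitelist_top_n` cutoff are hypothetical names and values that do not appear in the application, which only says that highly ranked domains are whitelisted and that a suspicious domain shows at least one abnormal behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DomainObservation:
    name: str
    # Hypothetical field: position in a popularity list, None if unranked.
    popularity_rank: Optional[int] = None
    # Hypothetical labels for detected abnormal behaviors, e.g. "dga_like".
    anomalies: Set[str] = field(default_factory=set)


def select_suspicious(observations: List[DomainObservation],
                      whitelist_top_n: int = 10_000) -> List[str]:
    """Return names that are not whitelisted and show at least one anomaly."""
    suspicious = []
    for obs in observations:
        whitelisted = (obs.popularity_rank is not None
                       and obs.popularity_rank <= whitelist_top_n)
        if not whitelisted and obs.anomalies:
            suspicious.append(obs.name)
    return suspicious
```

A highly ranked domain is dropped even if it shows an anomaly, and an unranked domain with no anomaly is not treated as suspicious.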
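Step S102: Construct a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions. In the graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge.
After the suspicious domain names are obtained in step S101, they can be analyzed to obtain the associations between them in different dimensions. As a specific implementation, the association between domain names can be expressed as a triple, such as (domain name 1, domain name 2, association indicator). The association indicator in the triple contains the dimension of the association and a measure of the similarity of the two domain names in that dimension. A dedicated metric can be set for each association dimension to measure how similar two domain names are in that dimension. Two domain names may be similar in several feature dimensions; in that case, the association indicator part of the triple can be represented as an array.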
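As an illustration of the triple form, the sketch below merges per-dimension triples for the same pair of domain names into one triple whose association indicator is an array. The function name `merge_triples`, the dimension labels, and the choice to treat a pair as unordered are assumptions made for this example, not details given in the application.

```python
from collections import defaultdict


def merge_triples(triples):
    """Merge (domain_a, domain_b, (dimension, similarity)) triples per pair.

    Returns a list of (domain_a, domain_b, [(dimension, similarity), ...]),
    treating the pair as unordered.
    """
    merged = defaultdict(dict)
    for a, b, (dimension, similarity) in triples:
        key = tuple(sorted((a, b)))           # (a, b) and (b, a) are the same pair
        merged[key][dimension] = similarity   # one entry per association dimension
    return [(a, b, sorted(dims.items())) for (a, b), dims in merged.items()]
```

With this layout, adding a new association dimension only adds another `(dimension, similarity)` entry to the array.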
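The associations in different dimensions include any combination of the following features: similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names. The dimensions are of course not limited to these. As many and as comprehensive dimensions as possible can be selected to analyze the associations of domain names, covering every aspect of domain name behavior; if an association in a new dimension is discovered, feature analysis of that dimension can be added to the analysis, so the method is highly extensible.
In this embodiment, a graph database can be used to construct and store the spatio-temporal association graph from the triples, and later extensions of the graph over time can also be carried out conveniently on the graph database. The domain name spatio-temporal association graph describes the associations between domain names in time and space. The spatial association is the association in each dimension. The temporal association refers to how the relationship between domain names changes over time: two domain names that were originally unrelated may become associated in some dimension as time passes and certain security events occur.
In the constructed domain name spatio-temporal association graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge. An edge can have multiple attributes, each corresponding to the association in one dimension; the attributes may include the time at which a domain name was detected as suspicious, whether the domain name resolves successfully to an IP address, and so on. For example, if domain name 1 and domain name 2 are associated, an edge connects domain name 1 and domain name 2 in the graph. In other words, only domain names that are associated in at least one dimension are connected by an edge. The associations between the domain names, for example (association dimension A, similarity in dimension A; association dimension B, similarity in dimension B; association dimension C, similarity in dimension C; ...), are used as the attributes of the edge.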
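The graph structure described above can be sketched in memory as follows. The application suggests a graph database for construction and storage; the plain-dictionary representation, the function name `build_association_graph`, and the dimension labels used here are assumptions made only to keep the example self-contained.

```python
def build_association_graph(domains, triples):
    """Build (nodes, edges) from suspicious domains and association triples.

    triples: iterable of (domain_a, domain_b, [(dimension, similarity), ...]).
    edges maps frozenset({a, b}) to a dict of per-dimension similarities.
    """
    nodes = set(domains)
    edges = {}
    for a, b, metrics in triples:
        if a == b or not metrics:
            continue  # an edge needs two distinct domains and at least one association
        nodes.update((a, b))
        edges.setdefault(frozenset((a, b)), {}).update(dict(metrics))
    return nodes, edges
```

Domains without any association stay in the node set but have no incident edge, which matches the rule that only domains associated in at least one dimension are connected.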
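Step S103: Determine the closely connected domain names in the domain name spatio-temporal association graph according to graph-computation indicators of node compactness, and take the set of corresponding domain names as a botnet domain name family.
Connectivity, that is, how compactly the nodes are connected, can be measured in several ways, and different metrics perform better in different scenarios. Sparsity and average clustering degree are two suitable indicators of subgraph connectivity. Sparsity is defined as the ratio of the number of edges in the graph to the number of edges in the complete graph formed by the nodes of the graph. Average clustering degree is defined as the average, over the nodes, of the ratio of the number of triangles around a node to the number of potential triangles. Sparsity measures the compactness of a graph accurately when the number of nodes is small, whereas average clustering degree measures it accurately when the number of nodes is large. Used together, sparsity and average clustering degree therefore measure the compactness or connectivity of a graph more accurately. As a specific implementation, this application can determine the closely connected domain names in the domain name spatio-temporal association graph according to sparsity and average clustering degree.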
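The two indicators defined above can be written out directly. The sketch below assumes an undirected graph stored as a dictionary that maps each node to the set of its neighbors; this representation, the function names, and the convention that a node with fewer than two neighbors contributes 0 to the average are assumptions of the example.

```python
from itertools import combinations


def sparsity(adj):
    """Edges present divided by edges in the complete graph on the same nodes."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is counted twice
    return m / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean over nodes of (triangles around the node) / (potential triangles)."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # no potential triangle; the node contributes 0
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += triangles / (k * (k - 1) / 2)
    return total / len(adj)
```

For a triangle a, b, c with one extra node d attached to c, the sparsity is 4/6 and the average clustering degree is (1 + 1 + 1/3 + 0)/4 = 7/12.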
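In the method for detecting a botnet domain name family provided by this application, suspicious domain names are obtained; a domain name spatio-temporal association graph is constructed based on the associations between the suspicious domain names in different dimensions, where each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge; the closely connected domain names in the graph are determined according to graph-computation indicators of node compactness, and the set of corresponding domain names is taken as a botnet domain name family. Compared with family detection methods based on lexical features, which consider associations at only one level, this application represents the associations between domain names across different dimensions uniformly in the form of an association graph and therefore has stronger detection capability. In addition, the detection in this application does not depend on virus traffic, so botnet domain name families can be detected more quickly and handled in time, reducing the resulting losses, and the method has wider applicability.
This application can comprehensively consider the similarity between domain names in each dimension, construct the spatio-temporal association graph between domain names, exclude abnormal subgraphs from the graph by graph computation, and exclude abnormal nodes from the normal subgraphs, keeping only the botnet families formed by the most closely connected domain names. Finally, taking each botnet family or compact subgraph as a unit, the botnet family is analyzed further in dimensions such as virus traffic and lexical features of the domain names to rule out possible false positives.
In the embodiments of this application, the process of determining the closely connected domain names in the domain name spatio-temporal association graph according to the graph-computation indicators of node compactness may specifically be: removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph to determine the closely connected domain names in the graph. Referring to FIG. 2, the process of removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph may specifically include:
Step S201: Decompose the domain name spatio-temporal association graph by connectivity into subgraphs formed by connected components, and remove abnormal subgraphs that do not satisfy a preset first connectivity indicator.
The preset first connectivity indicator may be that the number of nodes in the subgraph is greater than a preset threshold and the connectivity is strong. When a subgraph has too few nodes, too little information has been collected about the botnet domain names of that family, so it is difficult to analyze the associations between the family's domain names accurately. Such subgraphs are therefore discarded and analyzed further once enough information has been collected. In this way, subgraphs with enough nodes and strong connectivity are analyzed further, and the remaining subgraphs are discarded.
It should be pointed out that connectivity in this step is not limited to one specific metric; several graph compactness metrics may also be used together.
In this step, because the initial spatio-temporal association graph contains all the botnet families to be analyzed, connected component discovery has to process a massive amount of data stored in graph form, which places heavy demands on memory and computing power, so distributed processing is required. When computing the connectivity of the subgraphs formed by connected components, one or more metrics can be chosen as needed to exclude subgraphs that are not connected closely enough.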
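Step S201 can be sketched on a single machine as a breadth-first search for connected components followed by a filter. The application calls for distributed processing on large graphs, so this is only a small-scale illustration; the thresholds `min_nodes` and `min_density` are hypothetical stand-ins for the preset first connectivity indicator, and the graph is assumed to be a dictionary mapping each node to the set of its neighbors.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as sets of nodes."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


def keep_compact_components(adj, min_nodes=3, min_density=0.5):
    """Drop components that are too small or too sparsely connected."""
    kept = []
    for comp in connected_components(adj):
        n = len(comp)
        if n < min_nodes:
            continue  # too little information collected about this family
        m = sum(len(adj[v] & comp) for v in comp) / 2
        if m / (n * (n - 1) / 2) >= min_density:
            kept.append(comp)
    return kept
```

Other compactness metrics, such as the average clustering degree, could replace or complement the density check in the same place.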
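Step S202: Partition the subgraphs that remain after the abnormal subgraphs are removed using a community detection algorithm to obtain communities formed by connected components, and remove abnormal communities that do not satisfy a preset second connectivity indicator.
A community is a group structure that is densely connected internally and sparsely connected to the outside. A community detection algorithm can discover such group structures by measuring the associations between nodes. The community detection algorithm splits a subgraph formed by a component with relatively strong connectivity into several communities with even stronger connectivity, and communities whose connectivity is not strong enough are discarded. Subgraphs obtained by community detection are more compact, so their domain names are more likely to belong to the same botnet family and false positives are less likely.
It should be pointed out that in this application the second connectivity indicator should be stricter than the first connectivity indicator, because the subgraph formed by a community is more compact than the subgraph formed by the connected component that contains the community.
When selecting a community detection algorithm in this step, both the accuracy and the time complexity of the algorithm need to be considered. Algorithms that partition communities very well often require a large amount of computing resources and computing time, whereas heuristic algorithms can detect communities quickly but produce partitions whose quality needs further examination. In actual deployment, the two must therefore be traded off according to actual requirements.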
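The application does not name a particular community detection algorithm or quality score. One common way to compare candidate partitions, whichever algorithm produced them, is Newman's modularity; using it here is an assumption of this sketch, not part of the described method. The graph is again assumed to be an undirected dictionary mapping each node to the set of its neighbors.

```python
def modularity(adj, communities):
    """Newman modularity of a partition of an undirected, unweighted graph.

    communities: list of disjoint node sets that together cover the graph.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # total number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        internal = sum(len(adj[v] & community) for v in community) / 2
        degree_sum = sum(len(adj[v]) for v in community)
        # Fraction of edges inside the community minus the expected fraction.
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14, while keeping everything in one community gives 0, so the split is the better partition under this score.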
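Step S203: Measure, with the PageRank algorithm, the importance of each node within the communities that remain after the abnormal communities are removed, and remove potential outliers of low importance.
The PageRank algorithm was originally used to measure the importance of a particular web page relative to the other pages in a search engine's index. In a graph, the PageRank algorithm can measure the importance of a node in the whole graph according to how strongly the node is associated with other nodes. For communities with sufficiently strong connectivity, the PageRank algorithm is used to measure the importance of each node within its community, and potential outliers of low importance are excluded, further improving the compactness and connectivity of the community.
The PageRank algorithm takes a graph as input and outputs a measure of the importance of each node in the graph; the PageRank values of all nodes sum to 1. Intuitively, the more other nodes a node is associated with, the more important it is in the graph. Sorting the nodes by PageRank value reveals the nodes that have low importance in the graph and weak association with the other nodes. Deleting these nodes from the graph ensures that the remaining nodes are all strongly connected to one another. Outlier detection based on PageRank values can use a variety of statistical methods: by analyzing the distribution of the nodes' PageRank values, nodes whose PageRank values are clearly below average can be found and then deleted from the graph.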
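The PageRank step can be illustrated with a plain power iteration on an undirected community graph. The damping factor, the iteration count, and the rule "below half of the mean value" are assumptions chosen for this sketch; the application only says that nodes clearly below the average level can be removed and leaves the statistical method open.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph {node: set_of_neighbors}."""
    n = len(adj)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly so the total stays 1.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new_rank = {}
        for v in adj:
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def low_rank_outliers(rank, ratio=0.5):
    """Nodes whose PageRank value is below ratio * mean (hypothetical rule)."""
    mean = sum(rank.values()) / len(rank)
    return {v for v, r in rank.items() if r < ratio * mean}
```

In a five-node clique with one extra node attached to a single clique member, the attached node receives the lowest value and is the only one flagged by this rule.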
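The domain names that belong to the same community have passed through the several rounds of filtering above. The remaining nodes are strongly associated with one another and highly cohesive, so they are considered to belong to the same botnet family. This completes the clustering of the nodes of the same botnet family.
On the basis of any of the above embodiments, the method for detecting a botnet domain name family provided by this application further includes, after taking the set of corresponding domain names as a botnet domain name family:
analyzing the family characteristics of the botnet domain name family, and performing security detection in application scenarios according to the family characteristics obtained.
The main characteristics of each family are determined mainly by the attribute values of the edges in the community formed by the family's domain names. The edge attribute values include one or more of the following: similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names. Analyzing the edge attribute values reveals why the family's domain names were clustered together, and hence what the main characteristics of the domain names in the botnet family are.
The malicious domain names produced can be deployed in a variety of security detection scenarios, including but not limited to adding them to a malicious domain name library and scoring domain name reputation. The domain names clustered into the same botnet family usually include both known C&C domain names and newly discovered domain names. By evaluating how strongly a newly discovered domain name is associated with the known C&C domain names, a confidence that the newly discovered domain name is malicious can be given, which determines the domain name's reputation.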
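A reputation score of the kind mentioned above could be derived from the edge attributes. The formula below, the mean over known C&C domains of the strongest per-dimension similarity on the connecting edge, is purely an assumption for illustration; the application does not specify how the confidence is computed. Edges are assumed to be stored as a dictionary from `frozenset({a, b})` to per-dimension similarities.

```python
def malicious_confidence(domain, family, edges, known_cc):
    """Hypothetical confidence in [0, 1] that `domain` is malicious.

    family: set of domain names clustered into one botnet family.
    edges: {frozenset({a, b}): {dimension: similarity}}.
    known_cc: set of domain names already known to be C&C domains.
    """
    known = [d for d in family if d in known_cc and d != domain]
    if not known:
        return 0.0
    total = 0.0
    for other in known:
        attrs = edges.get(frozenset((domain, other)), {})
        total += max(attrs.values(), default=0.0)  # no edge counts as 0
    return total / len(known)
```

A newly discovered domain that is strongly associated with most known C&C domains of its family scores close to 1, and one with few or weak associations scores close to 0.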
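Further, this application may also include: determining whether a new domain name belongs to the botnet domain name family by judging the association between the new domain name and known domain names, so as to monitor variants and expansion of the botnet domain name family.
As the collection of domain name information becomes more complete, more domain name information and domain name association information about a given botnet family will be obtained, and the family's domain name spatio-temporal association graph will become more complete. Many domain name families that were previously not analyzed further, because the graph had too few nodes or was not connected closely enough, will no longer be filtered out as information accumulates, producing more detection results. Subgraphs originally thought to belong to two botnet families may also be found to belong to the same botnet family as information collection improves.
To evade detection, attackers use new domain names for communication. Because the association between a new domain name and known domain names can be evaluated in multiple dimensions and represented uniformly in a graph data structure, it is easy to judge from the spatio-temporal association graph of a botnet family whether a new domain name belongs to a previously detected botnet family, and thus to observe the evolution and expansion of that family. Moreover, because the graph-computation-based method for detecting botnet domain name families works in real time, newly added C&C domain names of a botnet family can be detected at the earliest opportunity.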
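Monitoring family expansion can be sketched as assigning a new domain name to the detected family with which it has the most associations. The rule used here (count the associated members and require at least `min_links` of them) is a hypothetical simplification; the application only states that membership is judged from the associations between the new domain name and known domain names.

```python
def assign_family(new_domain_edges, families, min_links=2):
    """Return the name of the family the new domain most likely belongs to.

    new_domain_edges: {known_domain: {dimension: similarity}} for the new domain.
    families: {family_name: set_of_member_domains}.
    Returns None when no family has at least `min_links` associated members.
    """
    best_name, best_links = None, 0
    for name, members in sorted(families.items()):
        # Count family members that share at least one association with the new domain.
        links = sum(1 for d in members if new_domain_edges.get(d))
        if links > best_links:
            best_name, best_links = name, links
    return best_name if best_links >= min_links else None
```

A domain that links to only one member of any family stays unassigned until more association information has been collected.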
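This application analyzes the similarity between domain names in each dimension and expresses the similarities of the different dimensions in a unified graph data structure, forming a spatio-temporal association graph of domain names. Graph computation algorithms are executed on the spatio-temporal association graph; connected component discovery, community detection, and PageRank-based outlier detection yield several subgraphs with strong internal cohesion and weak external connections, and each subgraph corresponds to a botnet domain name family. This application does not rely on features of a single dimension alone to cluster and detect C&C domain names belonging to the same botnet family; it comprehensively considers the similarity relationships between domain names across all dimensions. It can therefore analyze the domain names of a botnet family more thoroughly and is more extensible: features of newly added dimensions can easily be incorporated into the detection method proposed in this application. The detection method proposed in this application has strong real-time performance: it is not necessary to first obtain the malicious file sample corresponding to a C&C domain name in order to analyze the botnet family to which it belongs.
Further, based on the families, this application can analyze the different properties of different types of C&C domain names and apply the family analysis results to other security detection scenarios. It can also track the evolution and expansion of a botnet domain name family and discover newly added C&C domain names in the family at the earliest opportunity.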
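The apparatus for detecting a botnet domain name family provided by the embodiments of this application is described below. The apparatus described below and the method for detecting a botnet domain name family described above may be read with reference to each other.
FIG. 3 is a structural block diagram of the apparatus for detecting a botnet domain name family provided by an embodiment of this application. Referring to FIG. 3, the apparatus may include:
a domain name acquisition module 100, configured to obtain suspicious domain names;
an association graph construction module 200, configured to construct a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions, where, in the domain name spatio-temporal association graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge;
a detection module 300, configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to graph-computation indicators of node compactness, and to take the set of corresponding domain names as a botnet domain name family.
The detection module 300 is specifically configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to sparsity and average clustering degree.
As a specific implementation, the detection module 300 is specifically configured to remove abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph to determine the closely connected domain names in the graph.
The associations in different dimensions include any combination of the following features:
similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names.
On the basis of any of the above embodiments, the apparatus for detecting a botnet domain name family provided by this application may further include:
a security detection module, configured to analyze the family characteristics of the botnet domain name family after the set of corresponding domain names is taken as the botnet domain name family, and to perform security detection in application scenarios according to the family characteristics obtained.
In addition, on the basis of any of the above embodiments, the apparatus for detecting a botnet domain name family provided by this application may further include:
a monitoring module, configured to determine, after the set of corresponding domain names is taken as the botnet domain name family, whether a new domain name belongs to the botnet domain name family by judging the association between the new domain name and known domain names, so as to monitor variants and expansion of the botnet domain name family.
The apparatus of this embodiment is used to implement the foregoing method for detecting a botnet domain name family, so its specific implementation can be found in the embodiments of the method above. For example, the domain name acquisition module 100, the association graph construction module 200, and the detection module 300 respectively implement steps S101, S102, and S103 of the method; their specific implementations can be found in the descriptions of the corresponding embodiments and are not repeated here.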
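In another aspect, this application also provides a device for detecting a botnet domain name family, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for detecting a botnet domain name family when executing the computer program.
In yet another aspect, this application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for detecting a botnet domain name family.
In summary, this application analyzes the similarity between domain names in each dimension and expresses the similarities of the different dimensions in a unified graph data structure, forming a spatio-temporal association graph of domain names. Graph computation algorithms are executed on the spatio-temporal association graph; connected component discovery, community detection, and PageRank-based outlier detection yield several subgraphs with strong internal cohesion and weak external connections, and each subgraph corresponds to a botnet domain name family. This application does not rely on features of a single dimension alone to cluster and detect C&C domain names belonging to the same botnet family; it comprehensively considers the similarity relationships between domain names across all dimensions. It can therefore analyze the domain names of a botnet family more thoroughly and is more extensible: features of newly added dimensions can easily be incorporated into the detection method proposed in this application. The detection method proposed in this application has strong real-time performance: it is not necessary to first obtain the malicious file sample corresponding to a C&C domain name in order to analyze the botnet family to which it belongs.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of this application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method, apparatus, device, and computer-readable storage medium for detecting a botnet domain name family provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application; the description of the above embodiments is only intended to help understand the method of this application and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of this application.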

Claims (20)

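  1. A method for detecting a botnet domain name family, characterized by comprising:
    obtaining suspicious domain names;
    constructing a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions, wherein, in the domain name spatio-temporal association graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge;
    determining the closely connected domain names in the domain name spatio-temporal association graph according to graph-computation indicators of node compactness, and taking the set of corresponding domain names as a botnet domain name family.
  2. The method for detecting a botnet domain name family according to claim 1, characterized in that determining the closely connected domain names in the domain name spatio-temporal association graph according to the graph-computation indicators of node compactness comprises:
    determining the closely connected domain names in the domain name spatio-temporal association graph according to sparsity and average clustering degree.
  3. The method for detecting a botnet domain name family according to claim 1, characterized in that determining the closely connected domain names in the domain name spatio-temporal association graph according to the graph-computation indicators of node compactness comprises:
    removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph to determine the closely connected domain names in the domain name spatio-temporal association graph.
  4. The method for detecting a botnet domain name family according to claim 3, characterized in that removing abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph comprises:
    decomposing the domain name spatio-temporal association graph by connectivity into subgraphs formed by connected components, and removing abnormal subgraphs that do not satisfy a preset first connectivity indicator;
    partitioning the subgraphs that remain after the abnormal subgraphs are removed using a community detection algorithm to obtain communities formed by connected components, and removing abnormal communities that do not satisfy a preset second connectivity indicator;
    measuring, with the PageRank algorithm, the importance of each node within the communities that remain after the abnormal communities are removed, and removing potential outliers of low importance.
  5. The method for detecting a botnet domain name family according to claim 1, characterized in that the associations in different dimensions comprise any combination of the following features:
    similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names.
  6. The method for detecting a botnet domain name family according to any one of claims 1 to 5, characterized in that, after taking the set of corresponding domain names as a botnet domain name family, the method further comprises:
    analyzing the family characteristics of the botnet domain name family, and performing security detection in application scenarios according to the family characteristics obtained.
  7. The method for detecting a botnet domain name family according to claim 6, characterized in that, after taking the set of corresponding domain names as a botnet domain name family, the method further comprises:
    determining whether a new domain name belongs to the botnet domain name family by judging the association between the new domain name and known domain names, so as to monitor variants and expansion of the botnet domain name family.
  8. The method for detecting a botnet domain name family according to claim 1, characterized in that constructing a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions comprises:
    expressing the associations between the suspicious domain names in different dimensions as triples, wherein the association indicator in each triple contains the dimension of the association and a measure of the similarity of the two domain names in that dimension;
    constructing the domain name spatio-temporal association graph from the triples using a graph database.
  9. The method for detecting a botnet domain name family according to claim 4, characterized in that the second connectivity indicator is stricter than the first connectivity indicator.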
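  10. An apparatus for detecting a botnet domain name family, characterized by comprising:
    a domain name acquisition module, configured to obtain suspicious domain names;
    an association graph construction module, configured to construct a domain name spatio-temporal association graph based on the associations between the suspicious domain names in different dimensions, wherein, in the domain name spatio-temporal association graph, each suspicious domain name is a node, two domain names that have at least one association form an edge, and the association between the two domain names is the attribute value of the edge;
    a detection module, configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to graph-computation indicators of node compactness, and to take the set of corresponding domain names as a botnet domain name family.
  11. The apparatus for detecting a botnet domain name family according to claim 10, characterized in that the detection module is configured to determine the closely connected domain names in the domain name spatio-temporal association graph according to sparsity and average clustering degree.
  12. The apparatus for detecting a botnet domain name family according to claim 10, characterized in that the detection module is configured to remove abnormal subgraphs and abnormal nodes from the domain name spatio-temporal association graph to determine the closely connected domain names in the domain name spatio-temporal association graph.
  13. The apparatus for detecting a botnet domain name family according to claim 12, characterized in that the detection module is configured to: decompose the domain name spatio-temporal association graph by connectivity into subgraphs formed by connected components, and remove abnormal subgraphs that do not satisfy a preset first connectivity indicator; partition the subgraphs that remain after the abnormal subgraphs are removed using a community detection algorithm to obtain communities formed by connected components, and remove abnormal communities that do not satisfy a preset second connectivity indicator; and measure, with the PageRank algorithm, the importance of each node within the communities that remain after the abnormal communities are removed, and remove potential outliers of low importance.
  14. The apparatus for detecting a botnet domain name family according to claim 10, characterized in that the associations in different dimensions comprise any combination of the following features: similarity in the lexical features of the domain names, similarity in the association between domain names and viruses, similarity in the IP addresses the domain names resolve to, and similarity in the access traffic of the domain names.
  15. The apparatus for detecting a botnet domain name family according to any one of claims 10 to 14, characterized by further comprising:
    a security detection module, configured to analyze the family characteristics of the botnet domain name family after the set of corresponding domain names is taken as the botnet domain name family, and to perform security detection in application scenarios according to the family characteristics obtained.
  16. The apparatus for detecting a botnet domain name family according to claim 15, characterized by further comprising:
    a monitoring module, configured to determine, after the set of corresponding domain names is taken as the botnet domain name family, whether a new domain name belongs to the botnet domain name family by judging the association between the new domain name and known domain names, so as to monitor variants and expansion of the botnet domain name family.
  17. The apparatus for detecting a botnet domain name family according to claim 10, characterized in that the association graph construction module is specifically configured to express the associations between the suspicious domain names in different dimensions as triples, wherein the association indicator in each triple contains the dimension of the association and a measure of the similarity of the two domain names in that dimension, and to construct the domain name spatio-temporal association graph from the triples using a graph database.
  18. The apparatus for detecting a botnet domain name family according to claim 13, characterized in that the second connectivity indicator is stricter than the first connectivity indicator.
  19. A device for detecting a botnet domain name family, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the method for detecting a botnet domain name family according to any one of claims 1 to 9 when executing the computer program.
  20. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method for detecting a botnet domain name family according to any one of claims 1 to 9.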
PCT/CN2019/093331 2018-12-24 2019-06-27 Method, apparatus, device, and storage medium for detecting a botnet domain name family WO2020133986A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG11202106429VA SG11202106429VA (en) 2018-12-24 2019-06-27 Botnet domain name family detecting method, apparatus, device, and storage medium
EP19903904.1A EP3905624B1 (en) 2018-12-24 2019-06-27 Botnet domain name family detecting method, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811584694.2 2018-12-24
CN201811584694.2A CN111355697B (zh) 2018-12-24 2018-12-24 Method, apparatus, device, and storage medium for detecting a botnet domain name family

Publications (1)

Publication Number Publication Date
WO2020133986A1 true WO2020133986A1 (zh) 2020-07-02

Family

ID=71128992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093331 WO2020133986A1 (zh) 2018-12-24 2019-06-27 Method, apparatus, device, and storage medium for detecting a botnet domain name family

Country Status (4)

Country Link
EP (1) EP3905624B1 (zh)
CN (1) CN111355697B (zh)
SG (1) SG11202106429VA (zh)
WO (1) WO2020133986A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112565283A (zh) * 2020-12-15 2021-03-26 厦门服云信息科技有限公司 一种apt攻击检测方法、终端设备及存储介质
CN114389974A (zh) * 2022-03-23 2022-04-22 中国人民解放军国防科技大学 查找分布式训练系统中异常流量节点的方法、装置及介质
CN114615003A (zh) * 2020-12-07 2022-06-10 中国移动通信有限公司研究院 命令和控制c&c域名的验证方法、装置及电子设备
CN115118491A (zh) * 2022-06-24 2022-09-27 北京天融信网络安全技术有限公司 僵尸网络检测的方法、装置、电子设备及可读存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112910888A (zh) * 2021-01-29 2021-06-04 杭州迪普科技股份有限公司 非法域名注册团伙挖掘方法及装置
CN113364764B (zh) * 2021-06-02 2022-07-12 中国移动通信集团广东有限公司 基于大数据的信息安全防护方法及装置
CN113595994B (zh) * 2021-07-12 2023-03-21 深信服科技股份有限公司 一种异常邮件检测方法、装置、电子设备及存储介质
CN114039772B (zh) * 2021-11-08 2023-11-28 北京天融信网络安全技术有限公司 针对网络攻击的检测方法及电子设备
CN114051015B (zh) * 2021-11-22 2023-07-14 北京天融信网络安全技术有限公司 域名流量图的构建方法、装置、设备以及存储介质
CN114662110B (zh) * 2022-05-18 2022-09-02 杭州海康威视数字技术股份有限公司 一种网站检测方法、装置及电子设备
TWI816441B (zh) * 2022-06-20 2023-09-21 中華電信股份有限公司 域名偵測系統及方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102045215A (zh) * 2009-10-21 2011-05-04 成都市华为赛门铁克科技有限公司 僵尸网络检测方法及装置
US8763117B2 (en) * 2012-03-02 2014-06-24 Cox Communications, Inc. Systems and methods of DNS grey listing
CN107566376A (zh) * 2017-09-11 2018-01-09 中国信息安全测评中心 一种威胁情报生成方法、装置及系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101350822B (zh) * 2008-09-08 2011-06-15 南开大学 一种Internet恶意代码的发现和追踪方法
US8516585B2 (en) * 2010-10-01 2013-08-20 Alcatel Lucent System and method for detection of domain-flux botnets and the like
US9922190B2 (en) * 2012-01-25 2018-03-20 Damballa, Inc. Method and system for detecting DGA-based malware
CN105224533B (zh) * 2014-05-28 2019-09-03 北京搜狗科技发展有限公司 浏览器收藏夹整理方法和装置
CN104579773B (zh) * 2014-12-31 2016-08-24 北京奇虎科技有限公司 域名系统分析方法及装置
CN106060067B (zh) * 2016-06-29 2018-12-25 上海交通大学 基于Passive DNS迭代聚类的恶意域名检测方法
CN106897273B (zh) * 2017-04-12 2018-02-06 福州大学 一种基于知识图谱的网络安全动态预警方法
CN107666490B (zh) * 2017-10-18 2019-09-20 中国联合网络通信集团有限公司 一种可疑域名检测方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3905624A4

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615003A (zh) * 2020-12-07 2022-06-10 中国移动通信有限公司研究院 命令和控制c&c域名的验证方法、装置及电子设备
CN112565283A (zh) * 2020-12-15 2021-03-26 厦门服云信息科技有限公司 一种apt攻击检测方法、终端设备及存储介质
CN114389974A (zh) * 2022-03-23 2022-04-22 中国人民解放军国防科技大学 查找分布式训练系统中异常流量节点的方法、装置及介质
CN115118491A (zh) * 2022-06-24 2022-09-27 北京天融信网络安全技术有限公司 僵尸网络检测的方法、装置、电子设备及可读存储介质
CN115118491B (zh) * 2022-06-24 2024-02-09 北京天融信网络安全技术有限公司 僵尸网络检测的方法、装置、电子设备及可读存储介质

Also Published As

Publication number Publication date
CN111355697A (zh) 2020-06-30
EP3905624A4 (en) 2022-08-17
SG11202106429VA (en) 2021-07-29
EP3905624B1 (en) 2024-09-11
EP3905624A1 (en) 2021-11-03
CN111355697B (zh) 2022-02-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19903904; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2019903904; Country of ref document: EP; Effective date: 20210726)