CN113315739A - Malicious domain name detection method and system - Google Patents
- Publication number
- CN113315739A (application number CN202010119771.8A)
- Authority
- CN
- China
- Prior art keywords
- domain name
- cluster
- data
- malicious
- new core
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
Abstract
The embodiments of the application provide a method and a system for detecting malicious domain names. New core domain names are grouped and clustered so that individual domain names are merged into domain name clusters, and features are extracted at the cluster level to detect malicious features, which effectively improves the accuracy of domain name detection. The method in the embodiments of the application comprises the following steps: performing grouping clustering on the obtained new core domain names to obtain a plurality of domain name clusters; separately computing the data features of each domain name cluster; performing malicious feature matching of security scenes on the data features of each domain name cluster; and, according to the matching result, classifying each domain name cluster as a malicious domain name data cluster corresponding to a security scene and/or an unknown-threat domain name data cluster.
Description
Technical Field
The present application relates to the field of data security technologies, and in particular, to a method and a system for detecting a malicious domain name.
Background
New core domain names are core domain names in DNS data that are "new", which generally means newly registered or newly discovered. They form a broad category of security events, and detecting them is one type of unknown-anomaly detection.
Currently, there are two main methods in the industry for detecting malicious domain names along the dimension of new core domain names:
The first determines whether a domain name is new by comparing it against client traffic data. A domain name found to be new is added directly to a time-limited blacklist and client access to it is refused; once the time limit expires, the domain name is released from the blacklist.
The second records the initial information of a domain name when it is first used or resolved by a client. After a period of time, the latest information is obtained by other information discovery methods, compared with the initial information, and checked. If the domain name is found to be malicious, it is added to a blacklist and client access is denied.
The above schemes have several problems:
1. Adding domain names directly to a blacklist is too heavy-handed and prone to false positives, which disrupts the client's normal services;
2. Both schemes rely on the time dimension: the handling of a new domain name changes only after a period of time has passed. However, different types of security events take different amounts of time to change, so relying on time easily leads to missed detections;
3. The unit of judgment is a single domain name. A large number of behavioral features are lost when only one domain name is used for judgment, which reduces the detection accuracy.
Disclosure of Invention
The embodiment of the application provides a method and a system for detecting a malicious domain name, which are used for performing grouping clustering on a new core domain name, clustering a single domain name into a domain name cluster, and performing feature extraction in a cluster form to detect the malicious features, so that the detection accuracy of the domain name is effectively improved.
A first aspect of an embodiment of the present application provides a method for detecting a malicious domain name, including:
performing grouping clustering on the obtained new core domain names to obtain a plurality of domain name clusters;
respectively counting the data characteristics of each domain name cluster;
performing malicious feature matching of a security scene on the data features of each domain name cluster;
and according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threatening domain name data cluster corresponding to the safety scene.
Preferably, after separately counting the data characteristics of each domain name cluster, the method further includes:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the performing malicious feature matching of a security scene on the data features of each domain name cluster comprises:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
Preferably, before the performing grouping clustering on the obtained new core domain name, the method further includes:
judging whether the new core domain name meets the domain name naming specification;
if yes, acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
Preferably, the method further comprises:
and if the proportion of the meaningful character strings is larger than the preset threshold value, triggering the step of performing grouping clustering on the acquired new core domain name.
Preferably, before the determining whether the new core domain name meets the domain name naming specification, the method further includes:
performing white list filtering on the acquired new core domain name to acquire non-white list data in the new core domain name;
inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
and triggering and judging whether the new core domain name meeting the detection model meets the domain name naming specification.
Preferably, the method further comprises:
and acquiring domain name data with access times less than the preset times from the domain name system by using a preset time window, and defining the domain name data as a new core domain name.
A second aspect of the embodiments of the present application provides a system for detecting a malicious domain name, where the system includes:
the group clustering module is used for performing group clustering on the obtained new core domain name to obtain a plurality of domain name clusters;
the statistic module is used for respectively counting the data characteristics of each domain name cluster;
the matching module is used for matching the data characteristics of each domain name cluster with the malicious characteristics of a security scene;
and the domain name dividing module is used for dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threat domain name data cluster corresponding to the safety scene according to the matching result.
Preferably, the domain name dividing module is further configured to:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the matching module is specifically configured to:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
Preferably, the system further comprises:
the grammar analysis module is used for judging whether the obtained new core domain name meets the domain name naming specification before grouping clustering is performed on the obtained new core domain name;
if yes, acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
Preferably, the grammar analysis module is further configured to:
and if the proportion of the meaningful character strings is larger than the preset threshold value, triggering the step of performing grouping clustering on the acquired new core domain name.
Preferably, the system further comprises:
the white list identification module is used for performing white list filtering on the acquired new core domain name, before judging whether the new core domain name meets the domain name naming specification, to acquire non-white list data in the new core domain name;
inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
and triggering and judging whether the new core domain name meeting the detection model meets the domain name naming specification.
Preferably, the system further comprises:
and the new core domain name mining module is used for acquiring domain name data with access times smaller than the preset times from the domain name system by using a preset time window and defining the domain name data as the new core domain name.
A third aspect of an embodiment of the present application provides a computer apparatus, including a processor, where the processor is configured to, when executing a computer program stored in a memory, implement the method for detecting a malicious domain name according to the first aspect of the embodiment of the present application.
A fourth aspect of the embodiments of the present application provides a readable computer storage medium, on which a computer program is stored, where the computer program is used, when being executed by a processor, to implement the method for detecting a malicious domain name according to the first aspect of the embodiments of the present application.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, the obtained new core domain name is subjected to grouping clustering to obtain a plurality of domain name clusters; respectively counting the data characteristics of each domain name cluster; performing malicious feature matching of a security scene on the data features of each domain name cluster; and according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threatening domain name data cluster corresponding to the safety scene. In the embodiment of the application, a single domain name is clustered into the domain name clusters, and the features are extracted in the form of the clusters so as to detect malicious features, so that the used detection dimensions are more, and the accuracy of domain name detection is effectively improved.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a method for detecting a malicious domain name in an embodiment of the present application;
fig. 2 is a schematic diagram of another embodiment of a method for detecting a malicious domain name in an embodiment of the present application;
fig. 3 is a schematic diagram of another embodiment of a method for detecting a malicious domain name in an embodiment of the present application;
fig. 4 is a schematic diagram of an embodiment of a malicious domain name detection system in an embodiment of the present application.
Detailed Description
The embodiment of the application provides a method and a system for detecting a malicious domain name, which are used for performing grouping clustering on a new core domain name, clustering a single domain name into a domain name cluster, and performing feature extraction in a cluster form to detect the malicious features, so that the detection accuracy of the domain name is effectively improved.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, the terms of art referred to in this application are described first and are used throughout the following description without further elaboration.
Newly discovered core domain name (Newly Seen Core Domains): a core domain name first discovered within a time window.
Anomaly detection model (Anomaly Detection Model): a model that discovers abnormal values or outliers in data by means of probability statistics.
Legal domain name (Legitimate Domains): the domain name of a normal service, i.e., a normal domain name that a client accesses according to its own needs.
Malicious domain name (Malicious Domains): an abnormal domain name. After a host is compromised, the hacker often uses the compromised host to access the hacker's malicious domain names for C&C communication and for illegal activities such as data theft.
Cluster (Cluster): a set of similar data.
Next, the method for detecting a malicious domain name in the present application is described. Referring to fig. 1, an embodiment of the method for detecting a malicious domain name in the present application includes:
101. performing grouping clustering on the obtained new core domain names to obtain a plurality of domain name clusters;
In the prior art, detection is usually performed on a single domain name, so few data features are available during domain name detection and the detection accuracy is low.
In the malicious domain name detection method of the present application, the obtained new core domain names are grouped and clustered to obtain a plurality of domain name clusters, so that the accuracy of domain name judgment is improved by means of the multiple data features of each domain name cluster.
Specifically, experimental analysis shows that malicious domain names of the same family are highly consistent in their characteristics and behavior and are clustered in time. Aggregating malicious domain names of the same type and judging them together therefore increases the number of detection dimensions and improves interpretability.
In practice, at least one of the host ID, the destination IP, the resolved IP (the resource record answering the request), the Qtype (the type of resource record requested), the Rcode (the response code of the reply message), and the time of occurrence may be used as the grouping basis to group and cluster the obtained new core domain names into a plurality of data clusters.
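The grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the record field names (`host_id`, `resolved_ip`, `domain`) and the function `group_domains` are hypothetical, and the patent does not specify a concrete clustering algorithm beyond grouping on the listed fields.

```python
from collections import defaultdict

def group_domains(records, keys=("host_id", "resolved_ip")):
    """Group DNS records into domain name clusters by the chosen fields.

    `keys` is the grouping basis; any subset of the fields named in the
    text (host ID, destination IP, resolved IP, Qtype, Rcode, time) could
    be used. The field names here are assumptions for illustration.
    """
    clusters = defaultdict(set)
    for rec in records:
        group_key = tuple(rec[k] for k in keys)
        clusters[group_key].add(rec["domain"])
    return dict(clusters)

# Hypothetical DNS log records.
records = [
    {"host_id": "h1", "resolved_ip": "203.0.113.5", "domain": "a1b2.example"},
    {"host_id": "h1", "resolved_ip": "203.0.113.5", "domain": "c3d4.example"},
    {"host_id": "h2", "resolved_ip": "198.51.100.7", "domain": "shop.example"},
]
clusters = group_domains(records)
```

Here the two domain names requested by host `h1` that resolve to the same address end up in one cluster, while the remaining domain name forms a cluster of its own.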
102. Respectively counting the data characteristics of each domain name cluster;
After the acquired new core domain names are divided into a plurality of data clusters, the data features of each data cluster are computed separately, so that each domain name cluster can be judged according to multiple data features.
Specifically, one or more of the following data features may be computed for each data cluster: the proportion of digit characters, the pornographic keyword hit rate, the domain name length entropy, the access time entropy, the similarity between domain names, and the proportion of each top-level domain type. Each domain name cluster is then judged according to these data features.
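A minimal sketch of three of these cluster-level statistics is shown below. It assumes that "length entropy" means the Shannon entropy of the distribution of domain name lengths within a cluster, which the text does not define precisely; the function names are hypothetical, and the cluster is assumed to be non-empty.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def cluster_features(domains):
    """Compute a few cluster-level features for a non-empty list of domain names."""
    names = [d.lower().rstrip(".") for d in domains]
    # Proportion of digit characters, ignoring the dots between labels.
    chars = "".join(n.replace(".", "") for n in names)
    digit_ratio = sum(ch.isdigit() for ch in chars) / len(chars)
    # Entropy of the domain name lengths (assumed definition).
    length_entropy = shannon_entropy([len(n) for n in names])
    # Share of each top-level domain within the cluster.
    tld_counts = Counter(n.rsplit(".", 1)[-1] for n in names)
    tld_ratio = {tld: c / len(names) for tld, c in tld_counts.items()}
    return {
        "digit_ratio": digit_ratio,
        "length_entropy": length_entropy,
        "tld_ratio": tld_ratio,
    }

features = cluster_features(["a1b2.com", "c3d4.com", "shop.net"])
```

A low length entropy combined with a high digit ratio is the kind of pattern that a cluster of algorithmically generated names might show, but which thresholds are meaningful would depend on the deployment.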
103. Performing malicious feature matching of a security scene on the data features of each domain name cluster;
Specifically, the malicious domain name judgment for each domain name cluster proceeds as follows: malicious feature matching of security scenes is performed according to the data features of each domain name cluster, where each security scene represents one type of security event, and the malicious features of a security scene are derived from the set of malicious domain names under that security scene.
The security scenes include, but are not limited to, hard coding, spam, pornographic websites, gambling, and domain parking.
104. And according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threatening domain name data cluster corresponding to the safety scene.
According to the matching result, each domain name cluster can be divided into a malicious domain name data cluster and/or an unknown threatening domain name data cluster corresponding to the security scene.
Specifically, when the matching rate between the data features of the current domain name cluster and the malicious features of a security scene is greater than a preset threshold (e.g., 80%), the current domain name cluster may be defined as a malicious domain name data cluster under that security scene; otherwise, it is defined as an unknown-threat domain name data cluster.
The preset threshold of the matching rate may be customized according to actual conditions, and is not limited specifically here.
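The threshold decision can be illustrated with the sketch below. The text does not say how the matching rate is computed, so this example assumes that each security scene is described by a list of (feature name, minimum value) rules and that the matching rate is the fraction of rules satisfied. The scene definitions, feature names, and function names are all hypothetical, and for simplicity only the best-matching scene is returned even though the text allows a cluster to match more than one scene.

```python
def match_rate(features, rules):
    """Fraction of a scene's rules that the cluster features satisfy."""
    hits = sum(1 for name, minimum in rules if features.get(name, 0.0) >= minimum)
    return hits / len(rules)

def classify_cluster(features, scenes, threshold=0.8):
    """Return the best-matching scene label, or 'unknown_threat'."""
    best_scene, best_rate = None, 0.0
    for scene, rules in scenes.items():
        rate = match_rate(features, rules)
        if rate > best_rate:
            best_scene, best_rate = scene, rate
    # The cluster is malicious for a scene only if the rate exceeds the threshold.
    return best_scene if best_rate > threshold else "unknown_threat"

# Hypothetical scene signatures; real ones would come from known malicious data.
SCENES = {
    "gambling": [("gambling_word_hit_rate", 0.5), ("digit_ratio", 0.2)],
    "dga": [("digit_ratio", 0.3), ("length_entropy", 1.0)],
}

label_a = classify_cluster({"gambling_word_hit_rate": 0.9, "digit_ratio": 0.25}, SCENES)
label_b = classify_cluster({"gambling_word_hit_rate": 0.1, "digit_ratio": 0.25}, SCENES)
```

The first cluster satisfies every rule of the hypothetical gambling scene and is labeled accordingly; the second satisfies only half of them, stays below the 80% threshold, and is treated as an unknown threat.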
In the embodiment of the application, the obtained new core domain name is subjected to grouping clustering to obtain a plurality of domain name clusters; respectively counting the data characteristics of each domain name cluster; performing malicious feature matching of a security scene on the data features of each domain name cluster; and according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threatening domain name data cluster corresponding to the safety scene. In the embodiment of the application, a single domain name is clustered into the domain name clusters, and the features are extracted in the form of the clusters so as to detect malicious features, so that more detection dimensions are used, and the accuracy of detection is effectively improved.
Based on the embodiment described in fig. 1, after step 102, in order to quickly perform malicious feature matching of a security scenario on the data features of the domain name cluster, the following steps may also be performed:
201. dividing each domain name cluster into a resolvable data cluster and an unresolvable data cluster;
Malicious feature matching under different security scenes has different preconditions; for example, matching hard-coded malicious features requires the corresponding domain name cluster to be resolvable. To perform malicious feature matching of security scenes quickly, the domain name clusters can therefore be divided into resolvable data clusters and unresolvable data clusters according to whether the Answer field in the DNS data is empty, which accelerates the matching of malicious features for the domain name clusters.
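The split on the Answer field might look like the following sketch. The record layout (an `answer` list per DNS record) and the rule that a cluster counts as resolvable when at least one of its records has a non-empty Answer are assumptions made for illustration; the text only states that the division depends on whether the Answer field is empty.

```python
def split_by_resolvability(clusters):
    """Partition clusters into resolvable and unresolvable ones.

    `clusters` maps a cluster key to a list of DNS records. A cluster is
    treated as resolvable if any record carries a non-empty Answer field
    (an assumed cluster-level rule).
    """
    resolvable, unresolvable = {}, {}
    for key, recs in clusters.items():
        has_answer = any(rec.get("answer") for rec in recs)
        (resolvable if has_answer else unresolvable)[key] = recs
    return resolvable, unresolvable

# Hypothetical clusters of DNS records.
sample_clusters = {
    "c1": [{"domain": "a.example", "answer": ["203.0.113.5"]}],
    "c2": [{"domain": "b.example", "answer": []}],
}
resolvable, unresolvable = split_by_resolvability(sample_clusters)
```

Each partition can then be matched only against the security scenes whose preconditions it meets.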
202. Performing malicious feature matching of security scenes on the resolvable data clusters and the unresolvable data clusters respectively.
After each domain name cluster is divided into resolvable data clusters and unresolvable data clusters, malicious feature matching of security scenes can be performed on each kind of cluster separately.
The specific matching process is as follows:
The resolvable data clusters are matched against the malicious features of known security scenes such as hard coding, spam, pornographic websites, gambling, and domain parking. A successfully matched data cluster is defined as a malicious domain name data cluster under the corresponding security scene, and a data cluster that matches no security scene features is defined as an unknown-threat data cluster.
Similarly, the unresolvable data clusters are matched against the malicious features of known security scenes such as DGA, pornographic websites, and domain parking. A successfully matched data cluster is defined as a malicious domain name data cluster under the corresponding security scene, and a data cluster that matches no security scene features is defined as an unknown-threat data cluster.
In the embodiment, each domain name cluster is divided into the resolvable data cluster and the unresolvable data cluster, so that the matching process of malicious characteristics of the domain name clusters is accelerated, the detection efficiency of malicious domain names is improved, and the real-time performance of malicious domain name detection is improved.
Based on the embodiment described in fig. 1 or fig. 2, in order to reduce the false alarm rate of malicious domain name detection and further improve the accuracy and detection efficiency of malicious domain name detection, before step 101, the following steps may also be performed:
301. acquiring domain name data with access times smaller than the preset times from a domain name system by using a preset time window, and defining the domain name data as a new core domain name;
It is readily understood that the new core domain names need to be acquired before grouping clustering is performed on them.
Because the business of the company or individual behind each client device tends to stay the same, the types of domain names it accesses should be stable. Core domain names accessed fewer than a preset number of times within a period of time are therefore outliers in the overall data and are highly suspicious. In this embodiment, the domain name data with access times smaller than the preset times may be obtained from the domain name system by using a preset time window, and defined as the new core domain name.
The preset time window may be customized by each user according to their own service requirements, for example 10:00-12:00 or 15:00-17:00 every day, and is not specifically limited here.
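Step 301 can be sketched as a simple count over a time window. The query log format (timestamp, domain) and the function name `find_new_core_domains` are assumptions; the window bounds and the count threshold correspond to the user-defined "preset time window" and "preset times" in the text.

```python
from collections import Counter
from datetime import datetime

def find_new_core_domains(queries, window_start, window_end, max_count):
    """Return domains queried fewer than `max_count` times inside the window.

    `queries` is an iterable of (timestamp, domain) pairs; the window is
    half-open: window_start <= timestamp < window_end.
    """
    counts = Counter(
        domain for ts, domain in queries if window_start <= ts < window_end
    )
    return {domain for domain, c in counts.items() if c < max_count}

# Hypothetical DNS query log for one day.
queries = [
    (datetime(2020, 2, 26, 10, 5), "intranet.example"),
    (datetime(2020, 2, 26, 10, 10), "intranet.example"),
    (datetime(2020, 2, 26, 10, 20), "intranet.example"),
    (datetime(2020, 2, 26, 11, 0), "x9k2.example"),
    (datetime(2020, 2, 26, 13, 0), "late.example"),
]
new_core = find_new_core_domains(
    queries,
    window_start=datetime(2020, 2, 26, 10, 0),
    window_end=datetime(2020, 2, 26, 12, 0),
    max_count=3,
)
```

Only the rarely accessed domain inside the 10:00-12:00 window is selected; the frequently accessed one and the one queried outside the window are not.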
302. Performing white list filtering on the acquired new core domain name to acquire non-white list data in the new core domain name;
To reduce the false alarm rate for malicious domain names and further improve the accuracy of malicious domain name detection, the newly found core domain names can be filtered using white list data accumulated by the user over a long period. The data not on the white list is kept as the non-white list data in the new core domain names, on which step 303 is then executed.
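A possible form of the white list filter is sketched below. It assumes that a domain name is covered by the white list if the name itself or any parent domain is listed, which is a design choice not stated in the text; `filter_whitelist` is a hypothetical helper.

```python
def filter_whitelist(domains, whitelist):
    """Return the domains that are not covered by the white list.

    A domain is considered whitelisted if it, or any of its parent
    domains, appears in `whitelist` (assumed matching rule).
    """
    allowed = {w.lower().rstrip(".") for w in whitelist}

    def is_whitelisted(domain):
        labels = domain.lower().rstrip(".").split(".")
        # Check the full name and every parent suffix, e.g. www.a.b -> a.b -> b.
        return any(".".join(labels[i:]) in allowed for i in range(len(labels)))

    return [d for d in domains if not is_whitelisted(d)]

non_whitelisted = filter_whitelist(
    ["www.example.com", "x9k2.test", "Example.com."],
    whitelist=["example.com"],
)
```

With suffix matching, very short white list entries (for example a bare top-level domain) would exclude large numbers of names, so the white list contents matter as much as the matching rule.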
303. Inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
After the non-white list data is obtained, it is further input into an abnormal domain name detection model for prediction, so as to obtain the new core domain names that satisfy the detection model.
The abnormal domain name detection model may be built by performing feature engineering on legal domain name data accumulated by a company over a long period, extracting features such as domain name length, DNS request type, access time period, and access time sequence. A One-Class SVM anomaly detection model is then trained, and the non-white list data is fed into the model for prediction to obtain the new core domain names that satisfy the abnormal domain name detection model.
It should be noted that the One-Class SVM anomaly detection model here may be replaced by other anomaly detection models, such as an isolation forest. The training process of these models is described in detail in the prior art and is not repeated here.
304. Judging whether the new core domain name meeting the detection model meets the domain name naming specification, if so, executing step 305, and if not, executing step 308;
Further, to improve the accuracy of malicious domain name detection once more, it may be determined whether a new core domain name that satisfies the abnormal domain name detection model meets the domain name naming specification; if so, step 305 is executed, and if not, step 308 is executed.
Specifically, whether the new core domain name meets the domain name naming specification may be determined according to a Chinese-English dictionary. For example, baidu.com is a new core domain name that meets the naming specification because "baidu" matches a pattern in the Chinese dictionary.
305. Acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
Domain names registered by hackers for malicious attacks often do not follow normal naming conventions, mainly because well-formed domain names are already occupied by legitimate services, which would cause the attacks to be blocked. Domain names composed of random characters are therefore more likely to be malicious domain names.
Accordingly, if the new core domain name meets the domain name naming specification, the second-level domain name in the new core domain name is obtained, longest meaningful character string matching is performed on the second-level domain name, and the proportion of meaningful character strings is computed.
306. If the occupation ratio of the meaningful character strings is not larger than a preset threshold value, defining the new core domain name as a malicious domain name;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
307. If the ratio of the meaningful character strings is larger than a preset threshold value, triggering a step of executing grouping clustering on the acquired new core domain name;
If the ratio of the meaningful character strings is greater than the preset threshold, the new core domain name may still be a legitimate domain name, so the step of performing group clustering on the obtained new core domain name, that is, the steps in the embodiment of fig. 1, is triggered; these steps are not described here again.
308. And if the new core domain name meeting the detection model does not meet the domain name naming specification, defining the new core domain name as a malicious domain name.
Corresponding to step 304, if the new core domain name that satisfies the abnormal domain name detection model does not meet the domain name naming specification, the new core domain name is directly defined as a malicious domain name.
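As an illustrative, non-limiting sketch, the decision flow of steps 304 to 308 might be expressed as follows. The embodiment does not fix how the "longest meaningful character string" is matched; the sketch assumes it is the longest substring of the second-level label that appears in a word set. The names `second_level_label`, `longest_meaningful_ratio`, `classify`, the `meets_naming_spec` callback, and the sample word set are all hypothetical, and the second-level label is extracted naively (multi-label suffixes such as co.uk are not handled).

```python
from typing import Callable, Set


def second_level_label(domain: str) -> str:
    """Return the label immediately left of the top-level domain (naive split)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def longest_meaningful_ratio(label: str, words: Set[str]) -> float:
    """Length of the longest dictionary word inside `label`, divided by len(label)."""
    if not label:
        return 0.0
    best = 0
    for i in range(len(label)):
        # Only consider substrings longer than the best match found so far.
        for j in range(i + best + 1, len(label) + 1):
            if label[i:j] in words:
                best = j - i
    return best / len(label)


def classify(
    domain: str,
    words: Set[str],
    threshold: float,
    meets_naming_spec: Callable[[str], bool],
) -> str:
    """Return "malicious" (steps 306/308) or "cluster" (step 307)."""
    if not meets_naming_spec(domain):
        return "malicious"  # step 308: naming specification not met
    ratio = longest_meaningful_ratio(second_level_label(domain), words)
    if ratio <= threshold:
        return "malicious"  # step 306: ratio not greater than the threshold
    return "cluster"  # step 307: hand over to group clustering


# Hypothetical word set and threshold, for illustration only.
WORDS = {"baidu", "book", "shop"}
result_known = classify("baidu.com", WORDS, 0.5, lambda d: True)
result_random = classify("xk3bookq9zt7.com", WORDS, 0.5, lambda d: True)
```

Under these assumptions, baidu.com has a ratio of 1.0 and proceeds to group clustering, while xk3bookq9zt7.com has a ratio of 4/12 and is defined as malicious. The threshold value and the dictionary contents are deployment choices that the embodiment leaves open.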
It should be noted that, in this embodiment, there is no strict order between steps 302-303 and steps 304-308: steps 302-303 may be executed before steps 304-308, or steps 304-308 may be executed before steps 302-303. Executing steps 302-303 first is the preferred embodiment, because performing the malicious domain name detection step on the new core domain name first can improve domain name detection efficiency.
In the embodiment of the application, the white list filtering and domain name grammar analysis steps are executed before group clustering is performed on the new core domain name, which further reduces the false alarm rate for malicious domain names and improves the accuracy and efficiency of malicious domain name detection.
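As an illustrative sketch of the pre-filtering performed before grammar analysis, the white list filtering and the abnormal domain name detection model might be chained as shown below. The embodiment does not specify the white list format or the model; the sketch assumes a set of white-listed domain names matched by suffix, and represents the model as an arbitrary callable. The names `is_whitelisted`, `prefilter`, and the stand-in model are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def is_whitelisted(domain: str, whitelist: Set[str]) -> bool:
    """True if the domain or any of its parent domains is in the white list."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def prefilter(
    domains: Iterable[str],
    whitelist: Set[str],
    model: Callable[[str], bool],
) -> List[str]:
    """Drop white-listed names, then keep only those the model flags as abnormal."""
    non_white = [d for d in domains if not is_whitelisted(d, whitelist)]
    return [d for d in non_white if model(d)]


# Stand-in for the abnormal domain name detection model (assumption only):
# flags any name longer than 10 characters.
def toy_model(domain: str) -> bool:
    return len(domain) > 10


candidates = prefilter(
    ["www.baidu.com", "qx7v2k9d.net", "a.io"], {"baidu.com"}, toy_model
)
```

Only the names returned by `prefilter` would then be checked against the domain name naming specification.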
The method for detecting a malicious domain name in the embodiment of the present application has been described in detail above; the system for detecting a malicious domain name in the embodiment of the present application is described in detail below. Referring to fig. 4, an embodiment of the system for detecting a malicious domain name in the embodiment of the present application includes:
the group clustering module 401 is configured to perform group clustering on the obtained new core domain name to obtain a plurality of domain name clusters;
a statistic module 402, configured to separately count data characteristics of each domain name cluster;
a matching module 403, configured to perform malicious feature matching on a security scene on the data features of each domain name cluster;
and a domain name dividing module 404, configured to divide each domain name cluster into a malicious domain name data cluster and/or an unknown threat domain name data cluster corresponding to the security scene according to the matching result.
Preferably, the domain name dividing module 404 is further configured to:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the matching module 403 is specifically configured to:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
Preferably, the system further comprises:
a grammar analysis module 405, configured to determine whether the new core domain name meets a domain name naming specification before group clustering is performed on the obtained new core domain name;
if yes, acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
Preferably, the grammar analysis module 405 is further configured to:
and if the ratio of the meaningful character strings is greater than the preset threshold value, triggering the step of performing group clustering on the acquired new core domain name.
Preferably, the system further comprises:
the white list identifying module 406 is configured to perform white list filtering on the obtained new core domain name to obtain non-white list data in the new core domain name before the determination of whether the new core domain name meets the domain name naming specification;
inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
and triggering and judging whether the new core domain name meeting the detection model meets the domain name naming specification.
Preferably, the system further comprises:
and the new core domain name mining module 407 is configured to obtain domain name data with access times smaller than the preset times from the domain name system by using a preset time window, and define the domain name data as a new core domain name.
It should be noted that the functions of the modules in this embodiment are similar to those described in the embodiments of fig. 1 to fig. 3, and are not described herein again.
In the embodiment of the application, the obtained new core domain name is subjected to group clustering by a group clustering module 401 to obtain a plurality of domain name clusters; respectively counting the data characteristics of each domain name cluster through a counting module 402; performing malicious feature matching of a security scene on the data features of each domain name cluster through a matching module 403; according to the matching result, the domain name clusters are divided into malicious domain name data clusters and/or unknown threat domain name data clusters corresponding to the security scene by the domain name dividing module 404. In the embodiment of the application, a single domain name is clustered into the domain name clusters, and the features are extracted in the form of the clusters so as to detect malicious features, so that more detection dimensions are used, and the accuracy of detection is effectively improved.
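The cooperation of modules 401 to 404 may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the grouping key, the cluster features (`size`, `mean_label_len`, `digit_ratio`), the scene rules, and the choice to label unmatched clusters as an unknown threat are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List

Features = Dict[str, float]
SceneRule = Callable[[Features], bool]


def group_cluster(
    domains: Iterable[str], key: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Module 401: group domain names that share the same key into one cluster."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for domain in domains:
        clusters[key(domain)].append(domain)
    return dict(clusters)


def cluster_features(members: List[str]) -> Features:
    """Module 402: compute data features over the whole cluster, not per domain."""
    labels = [m.split(".")[0] for m in members]
    total_chars = sum(len(label) for label in labels)
    digits = sum(ch.isdigit() for label in labels for ch in label)
    return {
        "size": float(len(members)),
        "mean_label_len": float(mean(len(label) for label in labels)),
        "digit_ratio": digits / max(1, total_chars),
    }


def divide(
    clusters: Dict[str, List[str]], scenes: Dict[str, SceneRule]
) -> Dict[str, List[str]]:
    """Modules 403 and 404: match each cluster against scene rules and label it."""
    result: Dict[str, List[str]] = {}
    for cluster_id, members in clusters.items():
        feats = cluster_features(members)
        matched = [name for name, rule in scenes.items() if rule(feats)]
        # Assumption: a cluster matching no scene is kept as an unknown threat.
        result[cluster_id] = matched or ["unknown_threat"]
    return result


# Hypothetical grouping key (top-level label) and a single hypothetical scene rule.
sample = ["a1b2c3d4.example", "z9y8x7w6.example", "news.sample"]
clusters = group_cluster(sample, key=lambda d: d.rsplit(".", 1)[-1])
labels = divide(clusters, {"dga_like": lambda f: f["digit_ratio"] >= 0.4})
```

Because the features are computed per cluster, a rule can use aggregate properties such as cluster size that are unavailable when each domain name is examined in isolation.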
The above describes the malicious domain name detection system in the embodiment of the present invention from the perspective of the modular functional entity, and the following describes the computer apparatus in the embodiment of the present invention from the perspective of hardware processing:
The computer device is used for implementing the functions of the malicious domain name detection system, and one embodiment of the computer device in the embodiment of the invention comprises:
a processor and a memory;
the memory is used for storing the computer program, and the processor is used for realizing the following steps when executing the computer program stored in the memory:
performing grouping clustering on the obtained new core domain names to obtain a plurality of domain name clusters;
respectively counting the data characteristics of each domain name cluster;
performing malicious feature matching of a security scene on the data features of each domain name cluster;
and according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threat domain name data cluster corresponding to the security scene.
In some embodiments of the present invention, the processor may be further configured to:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the performing malicious feature matching of a security scene on the data features of each domain name cluster comprises:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
In some embodiments of the present invention, the processor may be further configured to:
judging whether the new core domain name meets the domain name naming specification;
if yes, acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
In some embodiments of the present invention, the processor may be further configured to:
and if the proportion of the meaningful character strings is larger than the preset threshold value, triggering and executing the step of executing the grouping clustering of the acquired new core domain name.
In some embodiments of the present invention, the processor may be further configured to:
performing white list filtering on the acquired new core domain name to acquire non-white list data in the new core domain name;
inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
and triggering and judging whether the new core domain name meets the domain name naming specification.
In some embodiments of the present invention, the processor may be further configured to:
and acquiring domain name data with access times less than the preset times from the domain name system by using a preset time window, and defining the domain name data as a new core domain name.
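The acquisition of new core domain names may be sketched as a count of queries per domain name inside a time window. The log format (pairs of timestamp and domain name), the function name `mine_new_core_domains`, and the sample values are assumptions made for illustration; the embodiment specifies only a preset time window and a preset number of accesses.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def mine_new_core_domains(
    log: Iterable[Tuple[datetime, str]],
    window_end: datetime,
    window: timedelta,
    max_count: int,
) -> List[str]:
    """Return domain names queried fewer than `max_count` times in the window."""
    window_start = window_end - window
    counts = Counter(
        domain for ts, domain in log if window_start <= ts < window_end
    )
    return sorted(d for d, c in counts.items() if c < max_count)


# Hypothetical DNS query log.
end = datetime(2020, 2, 26, 12, 0, 0)
query_log = [
    (end - timedelta(minutes=5), "popular.example"),
    (end - timedelta(minutes=4), "popular.example"),
    (end - timedelta(minutes=3), "popular.example"),
    (end - timedelta(minutes=2), "rare.example"),
    (end - timedelta(hours=3), "old.example"),  # outside the one-hour window
]
new_core = mine_new_core_domains(query_log, end, timedelta(hours=1), max_count=2)
```

In this sample, only rare.example is returned: popular.example exceeds the count limit, and old.example falls outside the window and is therefore never counted.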
It is to be understood that, when the processor in the computer apparatus described above executes the computer program, the functions of each unit in the corresponding apparatus embodiments may also be implemented, and are not described herein again. Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the malicious domain name detection system. For example, the computer program may be divided into the units of the malicious domain name detection system described above, and each unit may implement the specific functions described above for the corresponding malicious domain name detection system.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing equipment. The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the processor, memory are merely examples of a computer apparatus and are not meant to be limiting, and that more or fewer components may be included, or certain components may be combined, or different components may be included, for example, the computer apparatus may also include input output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general purpose processor may be a microprocessor or any conventional processor or the like. The processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the terminal, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The present invention also provides a computer-readable storage medium for implementing the functions of the malicious domain name detection system, having a computer program stored thereon, which, when executed by a processor, the processor is operable to perform the steps of:
performing grouping clustering on the obtained new core domain names to obtain a plurality of domain name clusters;
respectively counting the data characteristics of each domain name cluster;
performing malicious feature matching of a security scene on the data features of each domain name cluster;
and according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threat domain name data cluster corresponding to the security scene.
In some embodiments of the invention, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the steps of:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the performing malicious feature matching of a security scene on the data features of each domain name cluster comprises:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
In some embodiments of the invention, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the steps of:
judging whether the new core domain name meets the domain name naming specification;
if yes, acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
In some embodiments of the invention, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the steps of:
and if the proportion of the meaningful character strings is larger than the preset threshold value, triggering and executing the step of executing the grouping clustering of the acquired new core domain name.
In some embodiments of the invention, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the steps of:
performing white list filtering on the acquired new core domain name to acquire non-white list data in the new core domain name;
inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
and triggering and judging whether the new core domain name meets the domain name naming specification.
In some embodiments of the invention, the computer program stored on the computer-readable storage medium, when executed by the processor, may be specifically configured to perform the steps of:
and acquiring domain name data with access times less than the preset times from the domain name system by using a preset time window, and defining the domain name data as a new core domain name.
It will be appreciated that the integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a corresponding computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and used by a processor to implement the steps of the above embodiments of the method. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. A method for detecting a malicious domain name, the method comprising:
performing grouping clustering on the obtained new core domain names to obtain a plurality of domain name clusters;
respectively counting the data characteristics of each domain name cluster;
performing malicious feature matching of a security scene on the data features of each domain name cluster;
and according to the matching result, dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threat domain name data cluster corresponding to the security scene.
2. The method of claim 1, wherein after separately counting the data characteristics of each domain name cluster, the method further comprises:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the performing malicious feature matching of a security scene on the data features of each domain name cluster comprises:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
3. The method according to claim 1, wherein before the performing group clustering on the obtained new core domain name, the method further comprises:
judging whether the new core domain name meets the domain name naming specification;
if yes, acquiring a second-level domain name in the new core domain name, performing matching of the longest meaningful character string on the second-level domain name, and counting the occupation ratio of the meaningful character string;
and if the occupation ratio of the meaningful character strings is not greater than a preset threshold value, defining the new core domain name as a malicious domain name.
4. The method of claim 3, further comprising:
and if the proportion of the meaningful character strings is larger than the preset threshold value, triggering and executing the step of executing the grouping clustering of the acquired new core domain name.
5. The method of claim 3, wherein prior to said determining whether the new core domain name complies with a domain name naming specification, the method further comprises:
performing white list filtering on the acquired new core domain name to acquire non-white list data in the new core domain name;
inputting the non-white list data into an abnormal domain name detection model for prediction so as to obtain a new core domain name meeting the detection model;
and triggering and judging whether the new core domain name meeting the detection model meets the domain name naming specification.
6. The method according to any one of claims 1 to 5, further comprising:
and acquiring domain name data with access times less than the preset times from the domain name system by using a preset time window, and defining the domain name data as a new core domain name.
7. A system for detecting malicious domain names, the system comprising:
the group clustering module is used for performing group clustering on the obtained new core domain name to obtain a plurality of domain name clusters;
the statistic module is used for respectively counting the data characteristics of each domain name cluster;
the matching module is used for matching the data characteristics of each domain name cluster with the malicious characteristics of a security scene;
and the domain name dividing module is used for dividing each domain name cluster into a malicious domain name data cluster and/or an unknown threat domain name data cluster corresponding to the security scene according to the matching result.
8. The system of claim 7, wherein the domain name dividing module is further configured to:
dividing each domain name cluster into an analyzable data cluster and a non-analyzable data cluster;
the matching module is specifically configured to:
and respectively executing malicious feature matching of a security scene on the analyzable data cluster and the non-analyzable data cluster.
9. A computer arrangement comprising a processor, characterized in that the processor, when executing a computer program stored on a memory, is adapted to implement the method of detection of a malicious domain name according to any of claims 1 to 6.
10. A readable computer storage medium on which a computer program is stored, the computer program, when being executed by a processor, being configured to implement the method for detecting a malicious domain name according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010119771.8A CN113315739A (en) | 2020-02-26 | 2020-02-26 | Malicious domain name detection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113315739A true CN113315739A (en) | 2021-08-27 |
Family
ID=77369983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010119771.8A Pending CN113315739A (en) | 2020-02-26 | 2020-02-26 | Malicious domain name detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113315739A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113746952A (en) * | 2021-09-14 | 2021-12-03 | 京东科技信息技术有限公司 | DGA domain name detection method, device, electronic equipment and computer storage medium |
WO2024139862A1 (en) * | 2022-12-28 | 2024-07-04 | 中国互联网络信息中心 | Clustering analysis-based domain name abuse detection method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120158626A1 (en) * | 2010-12-15 | 2012-06-21 | Microsoft Corporation | Detection and categorization of malicious urls |
US20160065611A1 (en) * | 2011-07-06 | 2016-03-03 | Nominum, Inc. | Analyzing dns requests for anomaly detection |
CN107566376A (en) * | 2017-09-11 | 2018-01-09 | 中国信息安全测评中心 | One kind threatens information generation method, apparatus and system |
CN108600200A (en) * | 2018-04-08 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Domain name detection method, device, computer equipment and storage medium |
CN108712403A (en) * | 2018-05-04 | 2018-10-26 | 哈尔滨工业大学(威海) | The illegal domain name method for digging of similitude is constructed based on domain name |
CN108734011A (en) * | 2017-04-17 | 2018-11-02 | 中国移动通信有限公司研究院 | software link detection method and device |
CN110535821A (en) * | 2019-05-17 | 2019-12-03 | 南京聚铭网络科技有限公司 | A kind of Host Detection method of falling based on DNS multiple features |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120158626A1 (en) * | 2010-12-15 | 2012-06-21 | Microsoft Corporation | Detection and categorization of malicious urls |
US20160065611A1 (en) * | 2011-07-06 | 2016-03-03 | Nominum, Inc. | Analyzing dns requests for anomaly detection |
US20180054457A1 (en) * | 2011-07-06 | 2018-02-22 | Nominum, Inc. | Analyzing DNS Requests for Anomaly Detection |
CN108734011A (en) * | 2017-04-17 | 2018-11-02 | 中国移动通信有限公司研究院 | software link detection method and device |
CN107566376A (en) * | 2017-09-11 | 2018-01-09 | 中国信息安全测评中心 | One kind threatens information generation method, apparatus and system |
CN108600200A (en) * | 2018-04-08 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Domain name detection method, device, computer equipment and storage medium |
CN108712403A (en) * | 2018-05-04 | 2018-10-26 | 哈尔滨工业大学(威海) | The illegal domain name method for digging of similitude is constructed based on domain name |
CN110535821A (en) * | 2019-05-17 | 2019-12-03 | 南京聚铭网络科技有限公司 | A kind of Host Detection method of falling based on DNS multiple features |
Non-Patent Citations (5)
Title |
---|
刘洪亮: "用机器智能解决网络安全问题:基于DNS的实践", 《瀚海数据说》 * |
张慧: "面向恶意网址检测的广谱特征选择与评估", 《现代电子技术》 * |
张洋: "基于多元属性特征的恶意域名检测", 《计算机应用》 * |
程光著: "《僵尸网络检测技术》", 1 October 2014, 东南大学出版社 * |
黄凯: "一种基于字符及解析特征的恶意域名检测方法", 《计算机仿真》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113746952A (en) * | 2021-09-14 | 2021-12-03 | 京东科技信息技术有限公司 | DGA domain name detection method, device, electronic equipment and computer storage medium |
CN113746952B (en) * | 2021-09-14 | 2024-04-16 | 京东科技信息技术有限公司 | DGA domain name detection method and device, electronic equipment and computer storage medium |
WO2024139862A1 (en) * | 2022-12-28 | 2024-07-04 | 中国互联网络信息中心 | Clustering analysis-based domain name abuse detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210827 |