CN111935097B - Method for detecting DGA domain name - Google Patents

Method for detecting DGA domain name

Info

Publication number
CN111935097B
Authority
CN
China
Prior art keywords
domain name
dga
detected
domain
family
Prior art date
Legal status
Active
Application number
CN202010684753.4A
Other languages
Chinese (zh)
Other versions
CN111935097A (en)
Inventor
徐钟豪
陈伟
谢忱
Current Assignee
Shanghai Douxiang Information Technology Co ltd
Original Assignee
Shanghai Douxiang Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Douxiang Information Technology Co ltd
Priority to CN202010684753.4A
Publication of CN111935097A
Application granted
Publication of CN111935097B
Status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Abstract

The invention relates to a method for detecting DGA domain names, comprising the following steps: establishing a DGA domain name detection model and a DGA family detection model; collecting DNS protocol data to be detected and extracting the domain names to be detected from it; extracting features from the domain names to be detected; normalizing the extracted features; feeding the normalized features into the DGA domain name detection model and the DGA family detection model to obtain, for each detected domain name, the probability that it is a DGA domain name and the probability that it belongs to a DGA family; performing further detection on each detected domain name, identifying the DGA domain names according to the DGA domain name probability and the number of features, and identifying the domain names belonging to DGA families according to the DGA family probability and the number of detected domain names in each DGA family; and displaying the domain names identified as DGA domain names and as members of DGA families. The detection process thus offers flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.

Description

Method for detecting DGA domain name
Technical Field
The invention relates to the technical field of internet security, in particular to a method for detecting a DGA domain name.
Background
A DGA (domain generation algorithm) is a technique for generating command-and-control (C&C) domain names from random characters in order to evade domain name blacklists. For example, the DGA used by CryptoLocker generates the domain xeogrhxquubt.com; if a process attempts to connect to it, the machine making the attempt may become infected with the CryptoLocker ransomware. Domain name blacklists are commonly used to detect and block connections to such domains, but they cannot keep up with DGA algorithms that are constantly updated.
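To make the idea concrete, the toy generator below is a minimal sketch of the general principle only; it is not CryptoLocker's algorithm or that of any real malware family. It assumes a date seed hashed with SHA-256 (the function name `toy_dga` and all its parameters are hypothetical), so that the malware and its operator can compute the same daily list of random-looking names, while a static blacklist cannot anticipate the next day's list.

```python
import datetime
import hashlib


def toy_dga(seed_date: datetime.date, count: int = 5, length: int = 12,
            tld: str = ".com") -> list:
    """Return `count` pseudo-random domain names derived from a date (toy example)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    domains = []
    for i in range(count):
        # Hash the date plus an index: the same day always yields the same names.
        digest = hashlib.sha256(f"{seed_date.isoformat()}-{i}".encode()).digest()
        label = "".join(alphabet[b % 26] for b in digest[:length])  # length <= 32
        domains.append(label + tld)
    return domains


for name in toy_dga(datetime.date(2020, 7, 16)):
    print(name)
```

Because the list changes with the seed, blocking yesterday's names does nothing against today's, which is the weakness of blacklists described above.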
At present, most security products detect DGA domain names by extracting features from the domain name itself, but in practice this approach suffers from a high false-alarm rate: many legitimate domain names, such as Chinese pinyin domain names and unusually long legitimate domain names, are easily misclassified as DGA domain names.
Therefore, there is a need for a DGA domain name detection method whose detection process offers flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.
Disclosure of Invention
The object of the invention is to provide a method for detecting DGA domain names whose detection process offers flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.
To solve the problems in the prior art, the invention provides a method for detecting DGA domain names, comprising the following steps:
establishing a DGA domain name detection model and a DGA family detection model;
collecting DNS protocol data to be detected, and extracting the domain names to be detected from the DNS protocol data;
extracting features from the domain names to be detected;
normalizing the extracted features;
feeding the normalized features into the DGA domain name detection model and the DGA family detection model to obtain, for each detected domain name, the probability that it is a DGA domain name and the probability that it belongs to a DGA family;
performing further detection on each detected domain name, and identifying the DGA domain names among the detected domain names according to the probability that each is a DGA domain name and the number of features;
performing further detection on each detected domain name, and identifying the domain names belonging to DGA families according to the probability that each belongs to a DGA family and the number of detected domain names in each DGA family;
displaying the domain names identified as DGA domain names and as members of DGA families.
Optionally, in the method for detecting DGA domain names, establishing the DGA domain name detection model and the DGA family detection model comprises the following steps:
generating training data comprising normal domain name data and DGA domain name data;
performing feature engineering on the training data to extract modeling features;
normalizing the modeling features;
training a machine learning algorithm on the normalized modeling features of the normal domain names and of the DGA domain names to obtain the DGA domain name detection model and the DGA family detection model.
Optionally, in the method for detecting DGA domain names, the training data is generated as follows:
collecting domain name data for training the DGA domain name detection model, the data comprising normal domain names and DGA domain names;
collecting DGA domain name data for training the DGA family detection model.
Optionally, in the method for detecting DGA domain names, performing feature engineering on the training data to extract modeling features comprises:
extracting 18 features of each domain name in the training data as modeling features, namely: domain name entropy, domain name length, ratio of entropy to length, frequency of consonants, frequency of digits, frequency of repeated letters, frequency of consecutive digits, frequency of consecutive consonants, whether the top-level domain is a private domain, mean number of occurrences of the subdomain unigrams in the sample, variance of the number of occurrences of the subdomain unigrams in the sample, mean number of occurrences of the subdomain bigrams in the sample, variance of the number of occurrences of the subdomain bigrams in the sample, mean number of occurrences of the subdomain trigrams in the sample, variance of the number of occurrences of the subdomain trigrams in the sample, n-gram transition probability, ratio of the number of occurrences of the top-level domain in the positive and negative samples, and ratio of the number of occurrences of the subdomain trigrams in the positive and negative samples.
Optionally, in the method for detecting DGA domain names, extracting features from the domain names to be detected comprises:
extracting the same features as the modeling features.
Optionally, in the method for detecting DGA domain names, both the extracted features and the modeling features are normalized as follows:
all extracted feature values and all modeling feature values are normalized to the range 0 to 1.
Optionally, in the method for detecting DGA domain names, after the probability that each detected domain name is a DGA domain name and the probability that it belongs to a DGA family have been obtained, and before further detection is performed, model pre-filtering is carried out, comprising:
removing the detected domain names whose DGA domain name probability and DGA family probability are less than 0.5.
Optionally, the method for detecting DGA domain names further comprises the following step:
determining suspicious hosts by examining the DNS protocol data of all hosts, as follows: counting the number of unanswered DNS queries sent by each host; if a host has sent more than 20 unanswered DNS queries, it is determined to be a suspicious host; otherwise it is a non-suspicious host.
Optionally, in the method for detecting DGA domain names, for both suspicious and non-suspicious hosts, identifying the DGA domain names among the detected domain names according to the probability that each is a DGA domain name and the number of features comprises the following steps:
calculating the mutation (abrupt-change) time, and filtering out hosts that have no mutation time;
performing hierarchical clustering on the unanswered DNS query data within the mutation time to generate clusters, and filtering out clusters containing fewer than 15 members;
calculating the upper and lower boundaries of each remaining cluster, acquiring the answered domain names within the mutation time, and extracting and displaying the answered domain names whose features fall within the upper and lower boundaries of the cluster, 15 features of the answered domain names being used;
post-filtering: among the unanswered domain names, retaining and displaying those whose DGA domain name probability is greater than 0.95.
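The text does not say how the mutation time is computed. One possible sketch, assuming it means a time window in which a host's unanswered queries spike, buckets the timestamps into fixed windows and flags windows whose count is far above the host's median; `mutation_windows`, the 60-second window, the factor of 5 and the minimum count of 15 are all hypothetical choices, not values from the text.

```python
from collections import Counter
from statistics import median


def mutation_windows(timestamps, window=60.0, factor=5.0, min_count=15):
    """Start times of windows in which a host's unanswered queries jump abruptly.

    Hypothetical rule: a window qualifies if it holds at least `min_count`
    unanswered queries and more than `factor` times the median per-window count
    over the host's observed time span (empty windows included)."""
    buckets = Counter(int(ts // window) for ts in timestamps)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    baseline = median(buckets.get(b, 0) for b in range(first, last + 1))
    return sorted(b * window for b, c in buckets.items()
                  if c >= min_count and c > factor * baseline)


# One query per minute, then a burst of 30 unanswered queries in the sixth minute.
print(mutation_windows([0, 60, 120, 180, 240] + [300 + i for i in range(30)]))
```

Hosts for which this returns an empty list would be the ones "without mutation time" that the first step filters out.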
Optionally, in the method for detecting DGA domain names, for both suspicious and non-suspicious hosts, identifying the domain names belonging to DGA families according to the probability that each detected domain name belongs to a DGA family and the number of detected domain names in each DGA family comprises the following steps:
according to the obtained DGA family probabilities, retaining only the detected domain names whose DGA family probability is greater than 0.95;
grouping the retained domain names by DGA family, counting the number of detected domain names in each DGA family, and displaying the domain names of a DGA family when the number of detected domain names in that family is greater than a threshold.
The method for detecting DGA domain names exploits the fact that a DGA typically generates a large number of domain names in a short time: the generation pattern of DGA domain names is identified statistically, and detection is then performed in combination with models trained by a machine learning algorithm. This overcomes the high false-alarm rate and related problems of prior-art detection based on the domain name alone, giving the method flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.
Drawings
FIG. 1 is a flowchart of the detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart for establishing the DGA domain name detection model and the DGA family detection model according to an embodiment of the present invention;
FIG. 3 is a flowchart for obtaining the domain names belonging to DGA domain names according to an embodiment of the present invention;
FIG. 4 is a flowchart for obtaining the domain names belonging to DGA families according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in more detail below with reference to the schematic drawings. The advantages and features of the invention will become more apparent from the following description. Note that the drawings are highly simplified and not to scale; they serve only to illustrate the embodiments conveniently and clearly.
Hereinafter, where a method described herein comprises a series of steps, the order in which the steps are presented is not necessarily the only order in which they may be performed; some of the described steps may be omitted, and other steps not described herein may be added to the method.
At present, most security products detect DGA domain names by extracting features from the domain name itself, but in practice this approach suffers from a high false-alarm rate: many legitimate domain names, such as Chinese pinyin domain names and unusually long legitimate domain names, are easily misclassified as DGA domain names.
Therefore, a method for detecting DGA domain names is provided. As shown in FIG. 1, which is a flowchart of the detection method provided in an embodiment of the present invention, the detection method comprises the following steps:
establishing a DGA domain name detection model and a DGA family detection model;
collecting DNS protocol data to be detected, and extracting the domain names to be detected from the DNS protocol data;
extracting features from the domain names to be detected;
normalizing the extracted features;
feeding the normalized features into the DGA domain name detection model and the DGA family detection model to obtain, for each detected domain name, the probability that it is a DGA domain name and the probability that it belongs to a DGA family;
performing further detection on each detected domain name, and identifying the DGA domain names among the detected domain names according to the probability that each is a DGA domain name and the number of features;
performing further detection on each detected domain name, and identifying the domain names belonging to DGA families according to the probability that each belongs to a DGA family and the number of detected domain names in each DGA family;
displaying the domain names identified as DGA domain names and as members of DGA families.
The invention exploits the fact that a DGA typically generates a large number of domain names in a short time: it identifies the generation pattern of DGA domain names statistically and then performs detection in combination with models trained by a machine learning algorithm. This overcomes the high false-alarm rate and related problems of prior-art detection based on the domain name alone, giving the method flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.
As shown in FIG. 2, which is a flowchart for establishing the DGA domain name detection model and the DGA family detection model according to an embodiment of the present invention, establishing the two models comprises the following steps:
generating training data comprising normal domain name data and DGA domain name data;
performing feature engineering on the training data to extract modeling features;
normalizing the modeling features;
training a machine learning algorithm on the normalized modeling features of the normal domain names and of the DGA domain names to obtain the DGA domain name detection model and the DGA family detection model;
saving the trained DGA domain name detection model and DGA family detection model for use in detection.
The machine learning algorithm includes, but is not limited to, random forest, support vector machine and logistic regression. During algorithm selection, the candidate algorithms can be evaluated by cross-validation and the best-performing one chosen.
Specifically, the training data is generated as follows:
collecting domain name data for training the DGA domain name detection model, the data comprising normal domain names and DGA domain names; and collecting DGA domain name data for training the DGA family detection model. In one embodiment, the DGA domain name data is produced by a self-developed DGA data generator, i.e. a tool that generates DGA domain names using DGA algorithms such as matsnu, zeus, pushdo and tiba, and the normal domain name data is taken from the global domain name ranking published by the Alexa website.
The DGA domain name detection model and the DGA family detection model are built in the same way; they differ only in the training data used.
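Since the two models differ only in their training data, one way to picture this is a single pool of labelled records feeding both. The sketch below is an assumption about data layout, not the patent's generator: `LabeledDomain` and `build_training_sets` are hypothetical names, and the input lists stand in for the Alexa ranking and the generated DGA names, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledDomain:
    domain: str
    is_dga: int              # label for the detection model: 1 = DGA, 0 = normal
    family: Optional[str]    # label for the family model; None for normal domains


def build_training_sets(normal_domains, dga_by_family):
    """Return (detection_set, family_set).

    normal_domains: iterable of benign names (e.g. taken from a ranking list).
    dga_by_family: {family_name: [generated DGA names]}."""
    detection_set = [LabeledDomain(d.lower(), 0, None) for d in normal_domains]
    family_set = []
    for family, domains in dga_by_family.items():
        for d in domains:
            item = LabeledDomain(d.lower(), 1, family)
            detection_set.append(item)  # DGA vs. normal
            family_set.append(item)     # which DGA family
    return detection_set, family_set
```

The detection model would then be trained on `is_dga` over `detection_set`, and the family model on `family` over `family_set`, with identical feature extraction for both.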
Further, feature engineering is performed on the training data and modeling features are extracted as follows:
extracting 18 features of each domain name in the training data as modeling features, namely: (1) domain name entropy, (2) domain name length, (3) ratio of entropy to length, (4) frequency of consonants, (5) frequency of digits, (6) frequency of repeated letters, (7) frequency of consecutive digits, (8) frequency of consecutive consonants, (9) whether the top-level domain is a private domain, (10) mean number of occurrences of the subdomain unigrams in the sample, (11) variance of the number of occurrences of the subdomain unigrams in the sample, (12) mean number of occurrences of the subdomain bigrams in the sample, (13) variance of the number of occurrences of the subdomain bigrams in the sample, (14) mean number of occurrences of the subdomain trigrams in the sample, (15) variance of the number of occurrences of the subdomain trigrams in the sample, (16) n-gram transition probability, (17) ratio of the number of occurrences of the top-level domain in the positive and negative samples, and (18) ratio of the number of occurrences of the subdomain trigrams in the positive and negative samples.
The 18 features fall into 5 classes: features (1) to (3) form the first class, features (4) to (8) the second, feature (9) the third, features (10) to (16) the fourth, and features (17) and (18) the fifth. In all features, "subdomain" refers to the remainder of the domain name after the top-level domain (TLD) is removed.
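A minimal sketch of features (1) to (9) follows, under several assumptions that the text leaves open: the TLD is taken as the last label only (so multi-label suffixes such as .com.cn are not handled), all lexical features are computed on the subdomain with dots removed, "repeated letters" means letters occurring more than once, "consecutive" means runs of two or more characters, and `PRIVATE_TLDS` is a hypothetical list because the text does not define a private domain. All function names are illustrative.

```python
import math
import re
from collections import Counter

VOWELS = set("aeiou")
# Hypothetical: the text does not say which TLDs count as "private".
PRIVATE_TLDS = {"local", "lan", "internal", "corp", "home"}


def split_domain(domain):
    """'a.b.example.com' -> ('a.b.example', 'com'), assuming a one-label TLD."""
    sub, _, tld = domain.lower().rstrip(".").rpartition(".")
    return sub, tld


def shannon_entropy(s):
    """Shannon entropy of the character distribution of s, in bits."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def run_fraction(s, pattern):
    """Fraction of characters of s covered by non-overlapping matches of pattern."""
    if not s:
        return 0.0
    return sum(len(m.group()) for m in re.finditer(pattern, s)) / len(s)


def lexical_features(domain):
    sub, tld = split_domain(domain)
    chars = sub.replace(".", "")  # subdomain without label separators
    n = len(chars)
    ent = shannon_entropy(chars)
    letters = [c for c in chars if c.isalpha()]
    consonants = [c for c in letters if c not in VOWELS]
    repeated = sum(v for v in Counter(letters).values() if v > 1)

    def ratio(x):
        return x / n if n else 0.0

    return {
        "entropy": ent,                                        # (1)
        "length": n,                                           # (2)
        "entropy_per_length": ratio(ent),                      # (3)
        "consonant_freq": ratio(len(consonants)),              # (4)
        "digit_freq": ratio(sum(c.isdigit() for c in chars)),  # (5)
        "repeated_letter_freq": ratio(repeated),               # (6)
        "consecutive_digit_freq": run_fraction(chars, r"[0-9]{2,}"),                  # (7)
        "consecutive_consonant_freq": run_fraction(chars, r"[b-df-hj-np-tv-z]{2,}"),  # (8)
        "private_tld": int(tld in PRIVATE_TLDS),               # (9)
    }


print(lexical_features("google.com"))
```

For "google.com" this yields a length of 6, a consonant frequency of 0.5 and an entropy of about 1.918 bits; a random string of the same length would typically score higher on entropy and consecutive consonants.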
In general, the features extracted from the domain names to be detected are the same as the modeling features.
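Features (10) to (18) depend on statistics of the training sample, which is one reason the detection-time features must match the modeling features: counts collected at training time are reused when scoring new domains. The sketch below rests on assumptions the text does not settle: "positive" means DGA; feature (16) is taken as the mean log-probability of character transitions under a first-order Markov chain fitted to normal domains with add-one smoothing; and the ratios in (17) and (18) use add-one smoothing. `CorpusStats` and its methods are hypothetical names.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def ngrams(s, n):
    """All contiguous character n-grams of s."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def split_sub_tld(domain):
    sub, _, tld = domain.lower().rstrip(".").rpartition(".")
    return sub, tld


class CorpusStats:
    """Training-sample statistics for features (10)-(18); positive = DGA (assumption)."""

    def __init__(self, normal_domains, dga_domains):
        self.grams = {n: Counter() for n in (1, 2, 3)}  # n-gram counts over all samples
        self.tri = {True: Counter(), False: Counter()}  # trigram counts by label
        self.tld = {True: Counter(), False: Counter()}  # TLD counts by label
        self.pairs, self.firsts = Counter(), Counter()  # transitions in normal names
        for is_dga, domains in ((False, normal_domains), (True, dga_domains)):
            for d in domains:
                sub, tld = split_sub_tld(d)
                for n in (1, 2, 3):
                    self.grams[n].update(ngrams(sub, n))
                self.tri[is_dga].update(ngrams(sub, 3))
                self.tld[is_dga][tld] += 1
                if not is_dga:
                    self.pairs.update(ngrams(sub, 2))
                    self.firsts.update(sub[:-1])  # characters that have a successor
        self.alphabet = max(len(self.grams[1]), 1)

    def transition_logprob(self, sub):
        """Mean log P(next char | current char), Laplace-smoothed."""
        steps = ngrams(sub, 2)
        if not steps:
            return 0.0
        return mean([math.log((self.pairs[b] + 1) / (self.firsts[b[0]] + self.alphabet))
                     for b in steps])

    def features(self, domain):
        sub, tld = split_sub_tld(domain)
        out = {}
        for n, name in ((1, "uni"), (2, "bi"), (3, "tri")):
            counts = [self.grams[n][g] for g in ngrams(sub, n)] or [0]
            out[f"{name}_mean"] = mean(counts)       # (10), (12), (14)
            out[f"{name}_var"] = pvariance(counts)   # (11), (13), (15)
        out["transition_logprob"] = self.transition_logprob(sub)  # (16)
        out["tld_ratio"] = (self.tld[True][tld] + 1) / (self.tld[False][tld] + 1)  # (17)
        tris = ngrams(sub, 3)
        pos = sum(self.tri[True][t] for t in tris)
        neg = sum(self.tri[False][t] for t in tris)
        out["trigram_ratio"] = (pos + 1) / (neg + 1)  # (18)
        return out
```

A `CorpusStats` instance would be built once from the training data and stored alongside the models, so that detection-time features are computed against exactly the same counts.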
Further, because the features have different dimensions and widely varying value ranges, they must be normalized. Both the extracted features and the modeling features are normalized by scaling all feature values to the range 0 to 1.
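Min-max scaling is one common way to map features into [0, 1]; the text does not give the formula, so this is an assumption. The sketch fits the per-feature minimum and maximum on the training data and reuses them at detection time, clipping values outside the training range (also an assumption); `fit_min_max` and `apply_min_max` are hypothetical names.

```python
def fit_min_max(rows, keys):
    """Learn per-feature (min, max) from training feature dicts."""
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in keys}


def apply_min_max(row, bounds):
    """Scale each feature into [0, 1]; values outside the training range are clipped."""
    out = {}
    for k, (lo, hi) in bounds.items():
        span = hi - lo
        x = (row[k] - lo) / span if span else 0.0  # constant feature -> 0
        out[k] = min(1.0, max(0.0, x))
    return out
```

Fitting only on training data keeps the scale fixed, so a detection-time domain is compared on the same footing the models were trained on.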
In the method for detecting DGA domain names, detection proceeds as follows:
First, DNS protocol data to be detected is collected in the real production environment, all DNS protocol traffic is extracted from it with a protocol analysis tool, and the domain names to be detected are extracted from that traffic. The protocol analysis tool may be, for example, Bro or Argus.
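Bro has since been renamed Zeek; its dns.log is a tab-separated file whose column names are given by a `#fields` header line, with `-` marking unset values. The sketch below reads such a log and returns (timestamp, host, domain, answered) tuples. Treating a query as unanswered when its `answers` field is empty is an assumption about what "no response" means, and the helper names are hypothetical.

```python
def read_zeek_dns_log(lines):
    """Yield one dict per record of a tab-separated Zeek/Bro dns.log.

    Column names come from the '#fields' header line; other '#' lines are skipped."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields is not None:
            yield dict(zip(fields, line.split("\t")))


def extract_queries(records):
    """Return (timestamp, host, domain, answered) tuples; '-' marks an unset field."""
    out = []
    for r in records:
        query = r.get("query", "-")
        if query in ("-", ""):
            continue
        # Assumption: a query with no answers counts as unanswered.
        answered = r.get("answers", "-") not in ("-", "")
        out.append((float(r["ts"]), r.get("id.orig_h"), query.lower().rstrip("."), answered))
    return out
```

Usage would be along the lines of `extract_queries(read_zeek_dns_log(open("dns.log")))`; parsing by the header rather than fixed column positions tolerates logs with extra or reordered fields.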
Second, features are extracted from the domain names to be detected and normalized in the same way as during model training, which is not repeated here.
Then, the normalized features are fed into the DGA domain name detection model and the DGA family detection model to obtain, for each detected domain name, the probability that it is a DGA domain name and the probability that it belongs to a DGA family.
Further, after these probabilities are obtained and before further detection is performed, model pre-filtering is required: the detected domain names whose DGA domain name probability and DGA family probability are less than 0.5 are removed, and only the suspicious domain names are kept for subsequent detection.
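The condition "DGA domain name probability and DGA family probability less than 0.5" admits more than one reading. The sketch assumes a domain is removed only when both its DGA probability and its highest family probability fall below 0.5; `pre_filter` and its input layout are hypothetical.

```python
def pre_filter(scores, threshold=0.5):
    """scores: {domain: (p_dga, {family: probability})}.

    Keep a domain unless p_dga and its best family probability are both below
    `threshold` (one reading of the pre-filtering rule)."""
    kept = {}
    for domain, (p_dga, family_probs) in scores.items():
        p_family = max(family_probs.values(), default=0.0)
        if p_dga >= threshold or p_family >= threshold:
            kept[domain] = (p_dga, family_probs)
    return kept
```

Under the stricter reading (drop if either probability is below 0.5), the `or` would become `and`.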
Preferably, the method for detecting DGA domain names further comprises the following step: determining suspicious hosts by examining the DNS protocol data of all hosts, as follows: counting the number of unanswered DNS queries sent by each host; if a host has sent more than 20 unanswered DNS queries, it is determined to be a suspicious host; otherwise it is a non-suspicious host.
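The host rule is a simple count, sketched below over (timestamp, host, domain, answered) tuples; that tuple layout and the name `classify_hosts` are assumptions for illustration, while the limit of 20 comes from the text ("more than 20", so exactly 20 stays non-suspicious).

```python
from collections import Counter


def classify_hosts(queries, limit=20):
    """queries: iterable of (ts, host, domain, answered).

    Returns (suspicious, non_suspicious) host sets: a host is suspicious when it
    sent more than `limit` unanswered DNS queries."""
    unanswered = Counter()
    hosts = set()
    for _ts, host, _domain, answered in queries:
        hosts.add(host)
        if not answered:
            unanswered[host] += 1
    suspicious = {h for h in hosts if unanswered[h] > limit}
    return suspicious, hosts - suspicious
```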
Finally, further detection is performed on each detected domain name. As shown in FIG. 3, which is a flowchart for obtaining the domain names belonging to DGA domain names provided in an embodiment of the present invention, preferably, for both suspicious and non-suspicious hosts, the DGA domain names among the detected domain names are identified according to the probability that each is a DGA domain name and the number of features, comprising the following steps:
calculating the mutation (abrupt-change) time, and filtering out hosts that have no mutation time;
performing hierarchical clustering on the unanswered DNS query data within the mutation time to generate clusters, and filtering out clusters containing fewer than 15 members;
calculating the upper and lower boundaries of each remaining cluster, acquiring the answered domain names within the mutation time, and extracting and displaying the answered domain names whose 15 features fall within the upper and lower boundaries of the cluster;
post-filtering: among the unanswered domain names, retaining and displaying those whose DGA domain name probability is greater than 0.95.
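A sketch of the clustering, boundary and post-filter steps, under stated assumptions: clustering is single-linkage hierarchical clustering on normalized feature vectors cut at a hypothetical distance of 0.3 (single linkage cut at a fixed height equals the connected components of the graph linking points within that distance, which the union-find below computes); cluster boundaries are the per-feature minimum and maximum; and the post-filter is applied to every unanswered domain passed in. Function names and the cut distance are illustrative, not from the text.

```python
import math


def single_linkage_clusters(vectors, cut=0.3):
    """Single-linkage agglomerative clustering cut at distance `cut`, computed as the
    connected components of the graph that links points at most `cut` apart."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) <= cut:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def detect_dga(unanswered, answered, p_dga, min_size=15, post=0.95, cut=0.3):
    """unanswered / answered: {domain: normalized feature vector} within the mutation time.
    p_dga: {domain: DGA probability}. Returns (answered domains inside some cluster's
    bounds, unanswered domains passing the post-filter)."""
    names = list(unanswered)
    vecs = [unanswered[n] for n in names]
    matched = set()
    for members in single_linkage_clusters(vecs, cut):
        if len(members) < min_size:
            continue  # drop clusters with fewer than min_size queries
        # Per-feature lower and upper bounds of the cluster.
        bounds = [(min(vecs[m][k] for m in members), max(vecs[m][k] for m in members))
                  for k in range(len(vecs[members[0]]))]
        matched |= {d for d, v in answered.items()
                    if all(lo <= x <= hi for x, (lo, hi) in zip(v, bounds))}
    flagged = {d for d in names if p_dga.get(d, 0.0) > post}
    return matched, flagged
```

The pairwise loop is quadratic in the number of unanswered queries, which is acceptable for a per-host, per-window sketch; a larger deployment would more likely use a library implementation of hierarchical clustering.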
Further detection is also performed on each detected domain name as shown in FIG. 4, which is a flowchart for obtaining the domain names belonging to DGA families according to an embodiment of the present invention. Preferably, for both suspicious and non-suspicious hosts, the domain names belonging to DGA families are identified according to the probability that each detected domain name belongs to a DGA family and the number of detected domain names in each DGA family, comprising the following steps:
according to the obtained DGA family probabilities, retaining only the detected domain names whose DGA family probability is greater than 0.95;
grouping the retained domain names by DGA family, counting the number of detected domain names in each DGA family, and displaying the domain names of a DGA family when the number of detected domain names in that family is greater than a threshold.
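The family step reduces to a filter, a group-by and a count. In the sketch below, each domain is assumed to carry only its top predicted family with that family's probability; the 0.95 cut-off is from the text, but the text leaves the reporting threshold unspecified, so `min_count=10` and the name `group_by_family` are hypothetical.

```python
from collections import defaultdict


def group_by_family(family_scores, prob=0.95, min_count=10):
    """family_scores: {domain: (top_family, probability)}.

    Keep domains with probability > prob, group them by family, and report only
    families whose number of domains exceeds min_count (hypothetical threshold)."""
    groups = defaultdict(list)
    for domain, (family, p) in family_scores.items():
        if p > prob:
            groups[family].append(domain)
    return {f: sorted(ds) for f, ds in groups.items() if len(ds) > min_count}
```

Requiring many confident hits for the same family before reporting it reflects the observation that a DGA emits many names in a short time, so a single stray high-probability domain is not enough on its own.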
The DGA domain names among the detected domain names are identified according to the probability that each is a DGA domain name and the number of features, and the domain names belonging to DGA families are identified according to the probability that each belongs to a DGA family and the number of detected domain names in each DGA family. The detection method therefore does not rely solely on the domain name and its features; it adds flexible processing such as evaluating counts within a specific time window, giving it flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.
In summary, the method for detecting DGA domain names provided by the invention exploits the fact that a DGA typically generates a large number of domain names in a short time: it identifies the generation pattern of DGA domain names statistically and then performs detection with models trained by a machine learning algorithm. This solves the high false-alarm rate of prior-art detection based on the domain name alone and gives the method flexible features, a low false-alarm rate, low maintenance cost and a high detection rate for new variants.
The above description is only a preferred embodiment of the present invention and does not limit the invention in any way. Any equivalent substitution or modification made by a person skilled in the art to the technical solutions and technical content disclosed herein, without departing from the scope of the technical solutions of the invention, still falls within the protection scope of the invention.

Claims (7)

1. A method for detecting DGA domain names, characterized by comprising the following steps:
establishing a DGA domain name detection model and a DGA family detection model;
collecting DNS protocol data to be detected, and extracting the domain names to be detected from the DNS protocol data;
extracting features from the domain names to be detected;
normalizing the extracted features;
feeding the normalized features into the DGA domain name detection model and the DGA family detection model to obtain, for each detected domain name, the probability that it is a DGA domain name and the probability that it belongs to a DGA family;
performing further detection on each detected domain name, and identifying the DGA domain names among the detected domain names according to the probability that each is a DGA domain name and the number of features; performing further detection on each detected domain name, and identifying the domain names belonging to DGA families according to the probability that each belongs to a DGA family and the number of detected domain names in each DGA family;
the method further comprising: determining suspicious hosts by examining the DNS protocol data of all hosts, as follows: counting the number of unanswered DNS queries sent by each host; if a host has sent more than 20 unanswered DNS queries, it is determined to be a suspicious host; otherwise it is a non-suspicious host;
for both suspicious and non-suspicious hosts, identifying the DGA domain names among the detected domain names according to the DGA domain name probability and the number of features comprises the following steps: calculating the mutation (abrupt-change) time, and filtering out hosts that have no mutation time; performing hierarchical clustering on the unanswered DNS query data within the mutation time to generate clusters, and filtering out clusters containing fewer than 15 members; calculating the upper and lower boundaries of each remaining cluster, acquiring the answered domain names within the mutation time, and extracting and displaying the answered domain names whose features fall within the upper and lower boundaries of the cluster, 15 features of the answered domain names being used; post-filtering: among the unanswered domain names, retaining and displaying those whose DGA domain name probability is greater than 0.95;
for both suspicious and non-suspicious hosts, identifying the domain names belonging to DGA families according to the probability that each detected domain name belongs to a DGA family and the number of detected domain names in each DGA family comprises the following steps: according to the obtained DGA family probabilities, retaining only the detected domain names whose DGA family probability is greater than 0.95; grouping the retained domain names by DGA family; counting the number of detected domain names in each DGA family; and, when the number of detected domain names in a DGA family is greater than a threshold, displaying the domain names of that DGA family.
2. The method of detecting a DGA domain name of claim 1, wherein establishing a DGA domain name detection model and a DGA family detection model comprises the steps of:
generating training data comprising normal domain names and DGA domain names;
performing feature engineering on the training data to extract modeling features;
normalizing the modeling features;
training, with a machine learning algorithm, on the normalized modeling features of the normal domain names and of the DGA domain names to obtain the DGA domain name detection model and the DGA family detection model.
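The claim does not name the machine learning algorithm. As a stand-in, the sketch below trains a binary logistic regression by batch gradient descent on features already normalized to [0, 1], labelling DGA domains 1 and normal domains 0. The function names, learning rate, and epoch count are hypothetical; the multi-class DGA family model would need a different classifier (for example, one-vs-rest over families).

```python
import math
from typing import List, Sequence


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent for binary logistic regression.

    X holds normalized feature rows; y is 1 for DGA and 0 for normal.
    Returns the weights with the bias stored as the last element.
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi      # sigmoid(z) - label
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that feature row x is a DGA domain."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```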
3. The method of detecting a DGA domain name of claim 2, wherein the training data is generated by:
collecting domain name data for training the DGA domain name detection model, the data comprising normal domain names and DGA domain names; and
collecting DGA domain name data for training the DGA family detection model.
4. The method of detecting a DGA domain name of claim 2, wherein the training data is subjected to feature engineering to extract modeling features in the following manner:
extracting 18 features of each domain name in the training data as modeling features, the 18 features being: domain name entropy; domain name length; ratio of entropy to length; frequency of consonants; frequency of digits; frequency of repeated letters; frequency of consecutive digits; frequency of consecutive consonants; whether the top-level domain is a private domain; mean of the number of times each subdomain unigram appears in the samples; variance of the number of times each subdomain unigram appears in the samples; mean of the number of times each subdomain bigram appears in the samples; variance of the number of times each subdomain bigram appears in the samples; mean of the number of times each subdomain trigram appears in the samples; variance of the number of times each subdomain trigram appears in the samples; n-gram transition probability; ratio of the number of occurrences of the top-level domain in positive versus negative samples; and ratio of the number of occurrences of subdomain trigrams in positive versus negative samples.
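The first eight features in claim 4 are purely lexical and can be computed from a single domain name; the remaining ten depend on corpus statistics and are omitted here. The sketch below is one plausible reading, not the patent's definition: it assumes the features are computed on the name with the top-level domain removed and dots joined, that "repeated letters" means the share of characters whose letter occurs more than once, and that "consecutive digits/consonants" means the share of characters lying in runs of two or more. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict

VOWELS = frozenset("aeiou")


def _is_consonant(ch: str) -> bool:
    return ch.isalpha() and ch not in VOWELS


def _chars_in_runs(s: str, pred: Callable[[str], bool]) -> int:
    """Count characters lying in runs of >= 2 consecutive chars that satisfy pred."""
    total = run = 0
    for ch in s + "\0":            # sentinel character flushes the final run
        if pred(ch):
            run += 1
        else:
            if run >= 2:
                total += run
            run = 0
    return total


def lexical_features(domain: str) -> Dict[str, float]:
    """First 8 of the 18 claimed features; the exact definitions are assumptions."""
    labels = domain.lower().rstrip(".").split(".")
    name = "".join(labels[:-1]) if len(labels) > 1 else labels[0]   # drop the TLD
    n = len(name)
    if n == 0:
        raise ValueError("empty domain label")
    counts = Counter(name)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "entropy": entropy,
        "length": float(n),
        "entropy_to_length": entropy / n,
        "consonant_freq": sum(_is_consonant(ch) for ch in name) / n,
        "digit_freq": sum(ch.isdigit() for ch in name) / n,
        # share of characters whose letter occurs more than once in the name
        "repeated_letter_freq": sum(c for ch, c in counts.items()
                                    if ch.isalpha() and c > 1) / n,
        "consecutive_digit_freq": _chars_in_runs(name, str.isdigit) / n,
        "consecutive_consonant_freq": _chars_in_runs(name, _is_consonant) / n,
    }
```

For "abc123.com" the sketch works on "abc123": six distinct characters give an entropy of log2(6), and the run "123" makes the consecutive-digit frequency 0.5.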
5. The method of detecting a DGA domain name of claim 4, wherein extracting the features from the domain name to be detected comprises:
extracting the same features from the domain name to be detected as the modeling features.
6. The method of detecting a DGA domain name according to claim 2, wherein the extracted features and the modeling features are normalized as follows:
normalizing all extracted feature values and all modeling feature values to the range 0 to 1.
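Claim 6 only requires that feature values end up between 0 and 1. A common way to achieve this is min-max scaling; the sketch below assumes the minima and maxima are learned from the training (modeling) features and reused for the features of domains to be detected, with values outside the training range clipped into [0, 1]. The function names and the treatment of constant columns are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def fit_minmax(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum of the training feature rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(row: Sequence[float], lo: Sequence[float],
                 hi: Sequence[float]) -> List[float]:
    """Scale one feature row into [0, 1] using the training ranges."""
    out = []
    for x, a, b in zip(row, lo, hi):
        v = 0.0 if b == a else (x - a) / (b - a)   # constant column maps to 0
        out.append(min(1.0, max(0.0, v)))          # clip values outside the training range
    return out
```

Reusing the training ranges keeps training and detection features on the same scale, which is what claim 5 implicitly relies on.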
7. The method of claim 1, wherein, after the probability that each detected domain name is a DGA domain name and the probability that it belongs to a DGA family have been obtained, and before detection of each detected domain name continues, model pre-filtering is performed, comprising:
removing the detected domain names whose DGA domain name probability and DGA family probability are less than 0.5.
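The wording of claim 7 can be read as dropping a domain when both probabilities fall below 0.5, or when either does. The sketch below assumes the "both" reading, so a domain survives if either model scores it at 0.5 or above; the function name and the input mapping are hypothetical.

```python
from typing import Dict, List, Tuple


def prefilter(scores: Dict[str, Tuple[float, float]], cut: float = 0.5) -> List[str]:
    """scores maps domain -> (P(DGA domain), P(DGA family)).

    Assumption: a domain is removed only when BOTH probabilities are below `cut`.
    """
    return [d for d, (p_dga, p_fam) in scores.items()
            if not (p_dga < cut and p_fam < cut)]
```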
CN202010684753.4A 2020-07-16 2020-07-16 Method for detecting DGA domain name Active CN111935097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010684753.4A CN111935097B (en) 2020-07-16 2020-07-16 Method for detecting DGA domain name

Publications (2)

Publication Number Publication Date
CN111935097A CN111935097A (en) 2020-11-13
CN111935097B true CN111935097B (en) 2022-07-19

Family

ID=73313191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010684753.4A Active CN111935097B (en) 2020-07-16 2020-07-16 Method for detecting DGA domain name

Country Status (1)

Country Link
CN (1) CN111935097B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158660B (en) * 2021-04-09 2023-03-21 深圳市联软科技股份有限公司 Sub-domain name discovery method and system applied to penetration test
CN113328994B (en) * 2021-04-30 2022-07-12 新华三信息安全技术有限公司 Malicious domain name processing method, device, equipment and machine readable storage medium
CN113746952B (en) * 2021-09-14 2024-04-16 京东科技信息技术有限公司 DGA domain name detection method and device, electronic equipment and computer storage medium
CN114844682B (en) * 2022-04-11 2023-05-26 广东工业大学 DGA domain name detection method and system
CN116743483B (en) * 2023-07-14 2024-04-16 上海斗象信息科技有限公司 Subdomain name generating method, subdomain name naming rule learning method and device

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105407103A (en) * 2015-12-19 2016-03-16 中国人民解放军信息工程大学 Network threat evaluation method based on multi-granularity anomaly detection
CN106713312A (en) * 2016-12-21 2017-05-24 深圳市深信服电子科技有限公司 Method and device for detecting illegal domain name
US9690938B1 (en) * 2015-08-05 2017-06-27 Invincea, Inc. Methods and apparatus for machine learning based malware detection
CN107645503A (en) * 2017-09-20 2018-01-30 杭州安恒信息技术有限公司 Rule-based method for detecting the DGA family of a malicious domain name
CN107742079A (en) * 2017-10-18 2018-02-27 杭州安恒信息技术有限公司 Malware recognition method and system
CN107786575A (en) * 2017-11-11 2018-03-09 北京信息科技大学 Adaptive malicious domain name detection method based on DNS traffic
CN108600200A (en) * 2018-04-08 2018-09-28 腾讯科技(深圳)有限公司 Domain name detection method, device, computer equipment and storage medium
CN110233849A (en) * 2019-06-20 2019-09-13 电子科技大学 Method and system for network security situation analysis
CN110263827A (en) * 2019-05-31 2019-09-20 中国工商银行股份有限公司 Abnormal transaction detection method and device based on transaction rule identification
CN110602100A (en) * 2019-09-16 2019-12-20 上海斗象信息科技有限公司 DNS tunnel flow detection method
CN110826059A (en) * 2019-09-19 2020-02-21 浙江工业大学 Method and device for defending against black-box attacks on a malware image-format detection model
CN111031026A (en) * 2019-12-09 2020-04-17 杭州安恒信息技术股份有限公司 DGA malicious software infected host detection method
CN111147459A (en) * 2019-12-12 2020-05-12 北京网思科平科技有限公司 C&C domain name detection method and device based on DNS request data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10198579B2 (en) * 2014-08-22 2019-02-05 Mcafee, Llc System and method to detect domain generation algorithm malware and systems infected by such malware
US10326736B2 (en) * 2016-11-02 2019-06-18 Cisco Technology, Inc. Feature-based classification of individual domain queries

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
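Detecting domain generation algorithms based on reinforcement learning; Cheng Hua et al.; 2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC); 2019-12-31; full text *
A general ensemble learning method for malicious domain name detection (一种通用的恶意域名检测集成学习方法); Liu Haojie et al.; Cyberspace Security (网络空间安全); 2019-09-25 (No. 09); full text *
Malicious domain name detection technology based on combined classifiers (基于组合分类器的恶意域名检测技术); Sheng Jiantao et al.; Telecommunications Science (电信科学); 2020-05-20 (No. 05); full text *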

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant