CN109302418B - Malicious domain name detection method and device based on deep learning - Google Patents

Info

Publication number
CN109302418B
Authority
CN
China
Prior art keywords
domain name
sample set
malicious
training sample
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811361303.0A
Other languages
Chinese (zh)
Other versions
CN109302418A (en
Inventor
黄小鹏
赵子渊
刘欣春
陈丽红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eastcompeace Technology Co Ltd
Original Assignee
Eastcompeace Technology Co Ltd
Application filed by Eastcompeace Technology Co Ltd
Priority to CN201811361303.0A
Publication of CN109302418A
Application granted
Publication of CN109302418B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/10 Mapping addresses of different types
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Abstract

An embodiment of the invention discloses a malicious domain name detection method and device based on deep learning. A first weak correlation feature is extracted for each domain name in a domain name training set, and a malicious domain name detection model is trained using the known detection results of those domain names together with their first weak correlation features. The trained model can therefore still determine whether a domain name is normal or malicious from its first weak correlation feature when no feature directly related to that result is available. This solves the technical problem that traditional malicious domain name detection generally relies on threat intelligence libraries or manual analysis, cannot analyze domain names automatically, and is therefore inefficient.

Description

Malicious domain name detection method and device based on deep learning
Technical Field
The invention relates to the technical field of domain name detection, in particular to a malicious domain name detection method and device based on deep learning.
Background
The DNS (Domain Name System) is an important piece of Internet infrastructure and is mainly responsible for converting between domain names and IP addresses. Because the DNS is open, attackers can use malicious domain names to carry out network attacks or to control compromised hosts (bots), so detecting malicious domain names has become an important network security measure.
Traditional malicious domain name detection is generally based on threat intelligence libraries or on the results of manual analysis. Domain names cannot be analyzed automatically, so efficiency is low.
Disclosure of Invention
The invention provides a malicious domain name detection method and device based on deep learning, and solves the technical problems that the traditional malicious domain name detection is generally based on the results of threat intelligence libraries or manual analysis, the domain name cannot be automatically analyzed, and the efficiency is low.
The invention provides a malicious domain name detection method based on deep learning, which comprises the following steps:
acquiring a domain name training sample set;
acquiring a first weak correlation characteristic of each domain name in the domain name training sample set;
performing malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set to generate a malicious domain name detection model;
and detecting whether the unknown domain name is a malicious domain name or not through the malicious domain name detection model.
Optionally, the obtaining the first weak correlation feature of each domain name in the domain name training sample set specifically includes:
obtaining an A record, an AAAA record, an MX record and an NS record of each domain name in the domain name training sample set through DNS query;
obtaining the domain name registration time, the domain name registrant, the domain name registration mailbox and the domain name registration mechanism of each domain name in the domain name training sample set through WHOIS query;
and acquiring domain name ranking information, the search engine listing number of the domain names, a WEB home page corresponding to the domain names, WEB HTTPS certificate information corresponding to the domain names, IP geographic positions corresponding to the domain names and domain name IP resolution history of each domain name in the domain name training sample set through a WEB request tool.
Optionally, after the obtaining of the first weak correlation feature of each domain name in the domain name training sample set, and before the performing of malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set to generate a malicious domain name detection model, the method further includes:
and normalizing the first weak correlation characteristic of each domain name in the domain name training sample set, and converting the first weak correlation characteristic of each domain name in the domain name training sample set into floating point numbers in a range of [0, 1).
Optionally, the domain name training sample set includes a positive sample set and a negative sample set;
the domain names in the positive sample set are normal domain names, and the domain names in the negative sample set are malicious domain names.
Optionally, the performing malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set specifically includes:
performing model training by using the domain name training sample set and the first weak correlation characteristics of each domain name in the domain name training sample set through at least one feedforward neural network model to generate at least one malicious domain name detection model;
when two or more malicious domain name detection models are generated, a domain name test sample set is obtained;
acquiring a second weak correlation characteristic of each domain name in the domain name test sample set;
respectively testing two or more generated malicious domain name detection models by utilizing the domain name test sample set and the second weak correlation characteristics of each domain name in the domain name test sample set;
respectively counting test results of the generated two or more malicious domain name detection models, wherein the test results comprise accuracy and recall rate;
determining an optimal malicious domain name detection model in two or more malicious domain name detection models according to the test result;
correspondingly, the detecting, by the malicious domain name detection model, whether the unknown domain name is a malicious domain name specifically includes:
and detecting whether the unknown domain name is a malicious domain name or not through the optimal malicious domain name detection model.
The invention provides a malicious domain name detection device based on deep learning, which comprises:
the first acquisition unit is used for acquiring a domain name training sample set;
the second acquisition unit is used for acquiring a first weak correlation characteristic of each domain name in the domain name training sample set;
the training unit is used for carrying out malicious domain name detection training based on deep learning by utilizing the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set to generate a malicious domain name detection model;
and the detection unit is used for detecting whether the unknown domain name is a malicious domain name or not through the malicious domain name detection model.
Optionally, the second obtaining unit includes:
the first acquisition subunit is used for acquiring an A record, an AAAA record, an MX record and an NS record of each domain name in the domain name training sample set through DNS query;
the second acquisition subunit is used for acquiring the domain name registration time, the domain name registrant, the domain name registration mailbox and the domain name registration mechanism of each domain name in the domain name training sample set through WHOIS query;
and the third acquiring subunit is configured to acquire, by using a WEB request tool, domain name ranking information, search engine entry number of the domain names, a WEB home page corresponding to the domain names, WEB HTTPS certificate information corresponding to the domain names, IP geographical positions corresponding to the domain names, and domain name IP resolution history of each domain name in the domain name training sample set.
Optionally, the device further comprises:
a preprocessing unit, configured to perform normalization processing on the first weak correlation feature of each domain name in the domain name training sample set, and convert the first weak correlation feature of each domain name in the domain name training sample set into a floating point number in a [0,1) range.
Optionally, the domain name training sample set includes a positive sample set and a negative sample set;
the domain names in the positive sample set are normal domain names, and the domain names in the negative sample set are malicious domain names.
Optionally, the training unit comprises:
the training subunit is used for performing model training by adopting the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set through at least one feedforward neural network model to generate at least one malicious domain name detection model;
the fourth acquisition subunit is used for acquiring a domain name test sample set when two or more malicious domain name detection models are generated;
a fifth obtaining subunit, configured to obtain a second weak correlation feature of each domain name in the domain name test sample set;
the testing subunit is used for respectively testing the two or more generated malicious domain name detection models by utilizing the domain name testing sample set and the second weak correlation characteristics of each domain name in the domain name testing sample set;
the statistical subunit is used for respectively counting the test results of the generated two or more malicious domain name detection models, and the test results comprise accuracy and recall rate;
the determining subunit is used for determining an optimal malicious domain name detection model in the two or more malicious domain name detection models according to the test result;
correspondingly, the detection unit is further configured to detect whether the unknown domain name is a malicious domain name through the optimal malicious domain name detection model.
According to the technical scheme, the invention has the following advantages:
according to the method, the first weak correlation feature of each domain name in the domain name training set is extracted, and the malicious domain name detection model is trained using the known domain name detection results and the first weak correlation features of the domain names. The trained model can therefore still determine whether a domain name is normal or malicious from its first weak correlation feature when no feature directly related to that result is available. This solves the technical problem that traditional malicious domain name detection generally relies on threat intelligence libraries or manual analysis, cannot analyze domain names automatically, and is therefore inefficient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an embodiment of a malicious domain name detection method based on deep learning according to the present invention;
fig. 2 is a schematic flowchart of another embodiment of a malicious domain name detection method based on deep learning according to the present invention;
fig. 3 is a schematic structural diagram of an embodiment of a malicious domain name detection apparatus based on deep learning according to the present invention;
fig. 4 is a schematic structural diagram of another embodiment of a malicious domain name detection apparatus based on deep learning according to the present invention.
Detailed Description
The embodiment of the invention provides a malicious domain name detection method and device based on deep learning, and solves the technical problems that the traditional malicious domain name detection is generally based on the result of threat intelligence library or manual analysis, the domain name cannot be automatically analyzed, and the efficiency is low.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of a malicious domain name detection method based on deep learning according to the present invention includes:
101. Acquiring a domain name training sample set;
it should be noted that, first, a domain name training sample set is required to be obtained, where the domain name training sample set includes various feature information of a domain name with known normal or malicious results, the malicious domain name is used as a negative sample in the training sample set, and the normal domain name is used as a positive sample in the training sample set.
102. Acquiring a first weak correlation characteristic of each domain name in a domain name training sample set;
it should be noted that a weakly correlated feature is one that does not by itself clearly determine the result, i.e., it has no direct causal link to whether the domain name is normal or malicious. Specifically, weakly correlated features include, but are not limited to: the domain name registration time, domain name registrant, domain name registration mailbox and registration mechanism; the IP address in the A record obtained by domain name resolution, the domain name resolution history, the certificate of the website corresponding to the domain name, the record corresponding to the domain name, the MX record of the domain name, the web home page of the domain name, the suffix of the domain name, the global ranking of the domain name, the number of search engine entries for the domain name, the geographic location of the IP corresponding to the domain name, and the like.
And after a domain name training sample set is obtained, the first weak correlation characteristic of each domain name in the domain name training sample set is collected.
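As an illustration only, the weakly correlated features listed above could be gathered into one record per domain name before any normalization. The class name, field names and label encoding below are assumptions made for this sketch and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WeakFeatures:
    """Raw weakly correlated features for one domain name (illustrative field names)."""
    domain: str
    # Assumed encoding: 1 = normal (positive sample), 0 = malicious (negative sample),
    # None = unknown domain name that still has to be detected.
    label: Optional[int] = None
    registration_time: Optional[str] = None
    registrant: Optional[str] = None
    registration_mailbox: Optional[str] = None
    a_record_ips: List[str] = field(default_factory=list)
    mx_records: List[str] = field(default_factory=list)
    suffix: str = ""
    global_rank: Optional[int] = None
    search_engine_entries: Optional[int] = None
    ip_geolocation: Optional[str] = None

    def __post_init__(self) -> None:
        # Derive the suffix from the domain name when it was not supplied.
        if not self.suffix and "." in self.domain:
            self.suffix = self.domain.rsplit(".", 1)[1]


sample = WeakFeatures("example.com", label=1, global_rank=1200)
```

Fields left as `None` or empty represent features that could not be collected for a given domain name.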
103. Carrying out malicious domain name detection training based on deep learning by utilizing the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set to generate a malicious domain name detection model;
it should be noted that the malicious domain name detection training based on deep learning is performed by using the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set, so as to generate a malicious domain name detection model, wherein the training model is a deep learning model.
104. Detecting whether the unknown domain name is a malicious domain name or not through a malicious domain name detection model;
it should be noted that, after the malicious domain name detection model is generated through training, the unknown domain name is input into the malicious domain name detection model as an input quantity, and a detection result of the unknown domain name is obtained.
According to the embodiment of the invention, the first weak correlation characteristic of each domain name in the domain name training set is extracted, and the known domain name detection result and the first weak correlation characteristic of the domain name are used for training the malicious domain name detection model, so that the trained malicious domain name detection model can still obtain the normal or malicious detection result of the domain name through the first weak correlation characteristic when the characteristic directly related to the normal or malicious detection result of the domain name is not obtained, and the technical problems that the traditional malicious domain name detection is generally based on the result of threat intelligence library or manual analysis, the domain name cannot be automatically analyzed, and the existing efficiency is low are solved.
The above is a description of an embodiment of the malicious domain name detection method based on deep learning provided by the present invention, and another embodiment of the malicious domain name detection method based on deep learning provided by the present invention will be described below.
Referring to fig. 2, another embodiment of a malicious domain name detection method based on deep learning according to the present invention includes:
201. Acquiring a domain name training sample set;
it should be noted that, first, a domain name training sample set is required to be obtained, where the domain name training sample set includes various feature information of a domain name with known normal or malicious results, the malicious domain name is used as a negative sample in the training sample set, and the normal domain name is used as a positive sample in the training sample set.
202. Obtaining an A record, an AAAA record, an MX record and an NS record of each domain name in a domain name training sample set through DNS query;
it should be noted that, in the first aspect, the A record, the AAAA record, the MX record, and the NS record of each domain name in the domain name training sample set can be obtained through DNS query.
203. Obtaining the domain name registration time, domain name registrant, domain name registration mailbox and domain name registration mechanism of each domain name in a domain name training sample set through WHOIS query;
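A minimal sketch of the DNS collection step, assuming the actual query is delegated to a caller-supplied `lookup(domain, record_type)` function (hypothetical; the patent does not name a resolver library). The `fake_lookup` table exists only so that the example runs without network access.

```python
from typing import Callable, Dict, List

RECORD_TYPES = ("A", "AAAA", "MX", "NS")


def collect_dns_records(
    domain: str, lookup: Callable[[str, str], List[str]]
) -> Dict[str, List[str]]:
    """Query the four record types named in the text and return them by type."""
    records: Dict[str, List[str]] = {}
    for record_type in RECORD_TYPES:
        try:
            records[record_type] = sorted(lookup(domain, record_type))
        except LookupError:
            # Assumption: a failed lookup is recorded as "no record of this type".
            records[record_type] = []
    return records


def fake_lookup(domain: str, record_type: str) -> List[str]:
    """Stand-in resolver with made-up answers (documentation-range IP address)."""
    table = {
        ("example.com", "A"): ["192.0.2.10"],
        ("example.com", "NS"): ["b.example.net", "a.example.net"],
    }
    return table[(domain, record_type)]  # KeyError is a LookupError


records = collect_dns_records("example.com", fake_lookup)
```

In a real deployment `lookup` would wrap whatever DNS client is available; injecting it keeps the feature-collection logic testable offline.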
it should be noted that, in the second aspect, the domain name registration time, the domain name registrant, the domain name registration mailbox, and the domain name registration mechanism of each domain name in the domain name training sample set may be obtained through WHOIS query.
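The WHOIS step can be sketched as parsing `Key: Value` lines from a WHOIS response. The key names below (`Creation Date`, `Registrant Name`, and so on), the timestamp format, and the mapping of `Registrar` to the registration mechanism are assumptions; real WHOIS servers use many different labels and formats.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Assumed WHOIS labels mapped to the four features named in the text.
WHOIS_KEYS = {
    "creation date": "registration_time",
    "registrant name": "registrant",
    "registrant email": "registration_mailbox",
    "registrar": "registration_mechanism",
}


def parse_whois(text: str) -> Dict[str, str]:
    """Extract the first occurrence of each known key from raw WHOIS text."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        name = WHOIS_KEYS.get(key.strip().lower())
        if sep and name and name not in fields:
            fields[name] = value.strip()
    return fields


def domain_age_days(registration_time: str, now: datetime) -> Optional[int]:
    """Turn the registration time into a numeric feature (age in whole days)."""
    try:
        created = datetime.strptime(registration_time, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # unknown timestamp format
    return (now - created.replace(tzinfo=timezone.utc)).days


SAMPLE = """Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 2018-11-15T00:00:00Z
Registrant Name: Example Person
Registrant Email: admin@example.com
"""
fields = parse_whois(SAMPLE)
age = domain_age_days(fields["registration_time"],
                      datetime(2018, 11, 25, tzinfo=timezone.utc))
```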
204. Through a WEB request tool, acquiring domain name ranking information, search engine listing number of domain names, WEB home pages corresponding to the domain names, WEB HTTPS certificate information corresponding to the domain names, IP geographic positions corresponding to the domain names and domain name IP resolution history of each domain name in a domain name training sample set;
in the third aspect, the domain name ranking information, the search engine listing number of the domain names, the WEB home page corresponding to the domain names, the WEB HTTPS certificate information corresponding to the domain names, the IP geographical positions corresponding to the domain names, and the domain name IP resolution history of each domain name in the domain name training sample set may also be obtained by using a WEB request tool.
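As one example of the web-side features, HTTPS certificate information can be reduced to a number such as the certificate validity period. The sketch below assumes the certificate has already been fetched and is available as a dictionary in the format returned by Python's `ssl.SSLSocket.getpeercert()`; choosing the validity period as the feature is an assumption, since the patent does not say which certificate fields are used.

```python
import ssl
from typing import Dict


def certificate_features(cert: Dict[str, object]) -> Dict[str, float]:
    """Compute the validity period (in days) from a getpeercert()-style dict."""
    not_before = ssl.cert_time_to_seconds(str(cert["notBefore"]))
    not_after = ssl.cert_time_to_seconds(str(cert["notAfter"]))
    return {"validity_days": (not_after - not_before) / 86400.0}


# Made-up certificate dates, for illustration only.
example_cert = {
    "notBefore": "Jan  1 00:00:00 2020 GMT",
    "notAfter": "Jan  1 00:00:00 2021 GMT",
}
cert_features = certificate_features(example_cert)
```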
205. Normalizing the first weak correlation characteristic of each domain name in the domain name training sample set, and converting the first weak correlation characteristic of each domain name in the domain name training sample set into a floating point number in a range of [0, 1);
it should be noted that after the first weak correlation feature of each domain name in the domain name training sample set is obtained, it needs to be normalized and converted into a floating point number in the range [0, 1). In practice this is done as follows:
for first weak correlation features whose values vary within a small range, such as the A record, AAAA record, MX record, NS record and domain name registrant of the domain name, normalization can be performed by linear scaling;
for first weak correlation features whose values vary widely, such as the domain name registration time, the domain name ranking information and the number of search engine entries for the domain name, normalization can be performed by Z-score scaling.
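The two scaling modes can be sketched as follows. Linear scaling is implemented as min-max scaling with the span widened slightly so that the maximum stays below 1. A Z score is unbounded, and the text does not say how it is mapped into [0, 1); the logistic squash used here is an assumption of this sketch, not the patent's stated method.

```python
import math
from typing import List, Sequence


def linear_scale(values: Sequence[float]) -> List[float]:
    """Min-max scaling into [0, 1); the span is widened slightly to exclude 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) * (1.0 + 1e-9) or 1.0  # all-equal input maps to 0.0
    return [(v - lo) / span for v in values]


def z_score_scale(values: Sequence[float]) -> List[float]:
    """Z-score standardization followed by an assumed logistic squash."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) or 1.0
    scaled = []
    for v in values:
        # Clamp extreme z values so the result stays strictly inside (0, 1).
        z = max(-30.0, min(30.0, (v - mean) / std))
        scaled.append(1.0 / (1.0 + math.exp(-z)))
    return scaled


record_counts = linear_scale([1, 2, 3])            # small variation
rank_values = z_score_scale([10, 1000, 100000])    # large variation
```

Both functions preserve the ordering of the input values, so a larger raw value always maps to a larger normalized value.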
206. Performing model training by adopting a domain name training sample set and a first weak correlation characteristic of each domain name in the domain name training sample set through at least one feedforward neural network model to generate at least one malicious domain name detection model;
it should be noted that, the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set are used to perform malicious domain name detection training based on deep learning, so as to generate at least one malicious domain name detection model, where the training model may be a KNN model, a convolutional neural network model, or the like.
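A minimal sketch of training one feedforward neural network on normalized features, written in plain Python so that it is self-contained. The architecture (one hidden layer of three sigmoid units), the learning rate, the epoch count and the toy two-feature data are all assumptions; the patent does not specify them. Labels follow the text: 1 for normal (positive) samples and 0 for malicious (negative) samples.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """One hidden layer with sigmoid activations, trained by per-sample gradient descent."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict_proba(self, x: Sequence[float]) -> float:
        """Estimated probability that the domain name is normal (label 1)."""
        return self._forward(x)[1]

    def loss(self, xs, ys) -> float:
        """Mean binary cross-entropy over a sample set."""
        total = 0.0
        for x, y in zip(xs, ys):
            p = self.predict_proba(x)
            total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
        return total / len(xs)

    def fit(self, xs, ys, epochs: int = 2000, lr: float = 0.5) -> None:
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                hidden, out = self._forward(x)
                d_out = out - y  # cross-entropy gradient at the output pre-activation
                for j, h in enumerate(hidden):
                    d_hidden = d_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * d_out * h
                    self.b1[j] -= lr * d_hidden
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                self.b2 -= lr * d_out


# Toy, made-up normalized features: two normal and two malicious samples.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
train_y = [1, 1, 0, 0]
net = FeedForwardNet(n_in=2, n_hidden=3)
loss_before = net.loss(train_x, train_y)
net.fit(train_x, train_y)
loss_after = net.loss(train_x, train_y)
```

Training several such networks with different hidden sizes or seeds would yield the "two or more" candidate models that the later steps compare.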
207. When two or more malicious domain name detection models are generated, a domain name test sample set is obtained;
it should be noted that when two or more malicious domain name detection models are generated, their efficiency and success rate need to be compared; a domain name test sample set is therefore obtained, and the two or more malicious domain name detection models are tested with it.
It is understood that the domain name test sample set may be a subset extracted from the domain name training sample set, or may be a separate batch of samples including positive samples and negative samples.
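For the first option, extracting the test set from the labeled samples, one possible sketch is a per-label random split so that the test set contains both positive and negative samples. Holding the extracted samples out of training, the 20% fraction and the `(domain, label)` tuple layout are assumptions of this example.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (domain name, label); 1 = normal, 0 = malicious


def split_samples(
    samples: Sequence[Sample], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split labeled samples into (train, test), drawing test samples per label."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label in (0, 1):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        # Take at least one sample of each label that is present.
        n_test = max(1, int(len(group) * test_fraction)) if group else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


labeled = [("domain%d.example" % i, i % 2) for i in range(10)]
train_set, test_set = split_samples(labeled)
```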
208. Acquiring a second weak correlation characteristic of each domain name in a domain name test sample set;
it should be noted that, similarly, after the domain name test sample set is obtained, the second weakly correlated feature of each domain name in the domain name test sample set is obtained.
209. Respectively testing the generated two or more malicious domain name detection models by utilizing the domain name test sample set and the second weak correlation characteristics of each domain name in the domain name test sample set;
it should be noted that, the generated two or more malicious domain name detection models are respectively tested by using the domain name test sample set and the second weak correlation characteristic of each domain name in the domain name test sample set, so as to obtain a test result.
210. Respectively counting test results of the generated two or more malicious domain name detection models, wherein the test results comprise accuracy and recall rate;
it should be noted that after the test results of each domain name for two or more malicious domain name detection models are obtained, the accuracy and the recall rate of the two or more malicious domain name detection models are respectively counted.
211. Determining an optimal malicious domain name detection model in two or more malicious domain name detection models according to the test result;
it should be noted that, among the two or more malicious domain name detection models, the one with the highest accuracy and recall rate is selected as the optimal malicious domain name detection model.
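The statistics and selection steps can be sketched as below. Recall is computed for the malicious class (label 0, the negative samples in the text). The text does not say how to choose when no single model is highest on both measures, so this sketch ranks models by the sum of accuracy and recall, which is an assumption; the model names and predictions are made up.

```python
from typing import Dict, Sequence, Tuple


def accuracy_and_recall(
    y_true: Sequence[int], y_pred: Sequence[int], malicious: int = 0
) -> Tuple[float, float]:
    """Return (accuracy, recall of the malicious class) for one model."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    actual_malicious = sum(t == malicious for t in y_true)
    caught = sum(t == malicious and p == malicious for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    recall = caught / actual_malicious if actual_malicious else 0.0
    return accuracy, recall


def select_best(results: Dict[str, Tuple[float, float]]) -> str:
    """Pick the model with the highest accuracy + recall (assumed tie-break rule)."""
    return max(results, key=lambda name: sum(results[name]))


y_true = [0, 0, 1, 1]
predictions = {
    "model_a": [0, 1, 1, 1],  # misses one malicious domain name
    "model_b": [0, 0, 1, 0],  # flags one normal domain name
}
results = {name: accuracy_and_recall(y_true, pred) for name, pred in predictions.items()}
best = select_best(results)
```

Here both models have the same accuracy, and the higher recall of `model_b` decides the selection.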
212. Detecting whether the unknown domain name is a malicious domain name or not through an optimal malicious domain name detection model;
it should be noted that, after the optimal malicious domain name detection model is determined, the unknown domain name is input into the optimal malicious domain name detection model as an input quantity, and a detection result of the unknown domain name is obtained.
The above is a description of another embodiment of the malicious domain name detection method based on deep learning provided by the present invention, and an embodiment of the malicious domain name detection device based on deep learning provided by the present invention will be described below.
Referring to fig. 3, an embodiment of a malicious domain name detection apparatus based on deep learning according to the present invention includes:
a first obtaining unit 301, configured to obtain a domain name training sample set;
a second obtaining unit 302, configured to obtain a first weak correlation feature of each domain name in a domain name training sample set;
the training unit 303 is configured to perform malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set, and generate a malicious domain name detection model;
a detecting unit 304, configured to detect whether the unknown domain name is a malicious domain name through a malicious domain name detection model.
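The four units can be pictured as a small class whose collaborators are supplied by the caller. This is only a structural sketch: the class and parameter names are hypothetical, and the "domain name length" feature and threshold model are toy stand-ins that are not taken from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
Model = Callable[[Features], bool]  # True means "malicious"


class DetectionDevice:
    def __init__(
        self,
        get_samples: Callable[[], Sequence[Tuple[str, int]]],   # first obtaining unit 301
        get_features: Callable[[str], Features],                # second obtaining unit 302
        train: Callable[[List[Features], List[int]], Model],    # training unit 303
    ) -> None:
        self.get_samples = get_samples
        self.get_features = get_features
        self.train = train
        self.model = None

    def build(self) -> None:
        samples = self.get_samples()
        features = [self.get_features(domain) for domain, _ in samples]
        labels = [label for _, label in samples]
        self.model = self.train(features, labels)

    def detect(self, domain: str) -> bool:                      # detecting unit 304
        if self.model is None:
            raise RuntimeError("call build() before detect()")
        return self.model(self.get_features(domain))


def toy_train(features: List[Features], labels: List[int]) -> Model:
    """Toy stand-in: threshold halfway between the class means of feature 0."""
    bad = [f[0] for f, y in zip(features, labels) if y == 0]
    good = [f[0] for f, y in zip(features, labels) if y == 1]
    bad_mean, good_mean = sum(bad) / len(bad), sum(good) / len(good)
    threshold = (bad_mean + good_mean) / 2
    return lambda f: (f[0] > threshold) == (bad_mean > threshold)


device = DetectionDevice(
    get_samples=lambda: [("example.com", 1), ("xkqzjvhwpfy3a9d.example", 0)],
    get_features=lambda domain: [len(domain) / 100.0],  # toy feature only
    train=toy_train,
)
device.build()
```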
The above is a description of an embodiment of a malicious domain name detection device based on deep learning provided by the present invention, and another embodiment of a malicious domain name detection device based on deep learning provided by the present invention will be described below.
Referring to fig. 4, another embodiment of a malicious domain name detection apparatus based on deep learning according to the present invention includes:
a first obtaining unit 401, configured to obtain a domain name training sample set;
a second obtaining unit 402, configured to obtain a first weak correlation feature of each domain name in a domain name training sample set;
the second acquisition unit 402 includes:
the first obtaining subunit 4021 is configured to obtain an A record, an AAAA record, an MX record, and an NS record of each domain name in the domain name training sample set through DNS query;
the second obtaining subunit 4022 is configured to obtain, through WHOIS query, domain name registration time, a domain name registrant, a domain name registration mailbox, and a domain name registration mechanism for each domain name in the domain name training sample set;
a third obtaining subunit 4023, configured to obtain, by using a WEB request tool, domain name ranking information, search engine entry number of domain names, a WEB home page corresponding to a domain name, WEB HTTPS certificate information corresponding to a domain name, a geographic location of an IP corresponding to a domain name, and a domain name IP resolution history of each domain name in a domain name training sample set;
a preprocessing unit 403, configured to perform normalization processing on the first weak correlation feature of each domain name in the domain name training sample set, and convert the first weak correlation feature of each domain name in the domain name training sample set into a floating point number in a range of [0, 1);
a training unit 404, configured to perform malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set, and generate a malicious domain name detection model;
the training unit 404 includes:
the training subunit 4041 is configured to perform model training by using a domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set through at least one feedforward neural network model, and generate at least one malicious domain name detection model;
a fourth obtaining subunit 4042, configured to obtain a domain name test sample set when two or more malicious domain name detection models are generated;
a fifth obtaining subunit 4043, configured to obtain a second weak correlation feature of each domain name in the domain name test sample set;
the testing subunit 4044 is configured to respectively test the two or more generated malicious domain name detection models by using the domain name testing sample set and the second weak correlation feature of each domain name in the domain name testing sample set;
a statistics subunit 4045, configured to separately count test results of the generated two or more malicious domain name detection models, where the test results include accuracy and recall rate;
the determining subunit 4046 is configured to determine, according to the test result, an optimal malicious domain name detection model in the two or more malicious domain name detection models;
the detecting unit 405 is configured to detect whether the unknown domain name is a malicious domain name through an optimal malicious domain name detection model.
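The text does not specify how the preprocessing unit 403 maps features into the [0, 1) range. The following Python sketch shows one possible approach, offered only as an illustration: numeric features are min-max scaled and text features are hashed. The function names, the feature keys (`registration_age_days`, `rank`, `registrant`), and the bounds are hypothetical and are not taken from the embodiments.

```python
import hashlib

# Largest float strictly below 1.0, used to keep results inside [0, 1).
_BELOW_ONE = 1.0 - 2.0 ** -53


def scale_numeric(value, lo, hi):
    """Min-max scale a number into [0, 1); out-of-range inputs are clipped."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(value, lo), hi)
    return min((clipped - lo) / (hi - lo), _BELOW_ONE)


def hash_text(text):
    """Map a string to [0, 1) using 53 bits of its SHA-256 digest."""
    digest = hashlib.sha256(text.strip().lower().encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2.0 ** 53


def normalize_features(raw):
    """Turn a raw feature record (hypothetical keys) into floats in [0, 1)."""
    return [
        scale_numeric(raw["registration_age_days"], 0, 3650),  # assumed 10-year cap
        scale_numeric(raw["rank"], 1, 1_000_000),  # assumed ranking range
        hash_text(raw["registrant"]),  # lossy encoding of a text feature
    ]


example = normalize_features(
    {"registration_age_days": 1825, "rank": 1, "registrant": "Example Org"}
)
print(example)
```

Hashing is only one way to encode text features such as the registrant. It keeps the output in range but discards any similarity between strings, so a learned or one-hot encoding may be preferable in practice.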
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
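As a rough illustration of the kind of model the training subunit 4041 could train, the Python sketch below implements a feedforward network with one hidden layer, trained by full-batch gradient descent on a cross-entropy loss. The class name, layer sizes, learning rate, toy data, and the label encoding (1 for malicious, 0 for normal) are assumptions made for this example; the embodiments do not prescribe them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyFeedforward:
    """One hidden layer with sigmoid activations and a single sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict_proba(self, x):
        """Estimated probability that the feature vector x is malicious."""
        h = self._hidden(x)
        return sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)

    def loss(self, samples):
        """Mean binary cross-entropy over (features, label) pairs."""
        eps = 1e-12
        total = 0.0
        for x, y in samples:
            p = self.predict_proba(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(samples)

    def train(self, samples, epochs=300, lr=0.5):
        n = len(samples)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, y in samples:
                h = self._hidden(x)
                p = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
                d_out = p - y  # gradient w.r.t. the output pre-activation
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)
                    g_b1[j] += d_hid
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_hid * xi
            # Apply the averaged gradients after the full pass.
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j] / n
                self.b1[j] -= lr * g_b1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i] / n
            self.b2 -= lr * g_b2 / n


# Toy data: two normalized features per domain, label 1 = malicious (assumed).
toy = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.8], 1)]
net = TinyFeedforward(n_inputs=2, n_hidden=3)
loss_before = net.loss(toy)
net.train(toy)
loss_after = net.loss(toy)
print(loss_before, loss_after)
```

Training several such networks with different hidden sizes or seeds would be one way to obtain the two or more candidate models that the testing subunit 4044 compares.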
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A malicious domain name detection method based on deep learning is characterized by comprising the following steps:
acquiring a domain name training sample set;
acquiring a first weak correlation characteristic of each domain name in the domain name training sample set;
performing malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set to generate a malicious domain name detection model;
detecting whether the unknown domain name is a malicious domain name or not through the malicious domain name detection model;
wherein performing the malicious domain name detection training based on deep learning by using the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set to generate the malicious domain name detection model specifically comprises:
performing model training by adopting the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set through at least one feedforward neural network model to generate at least two malicious domain name detection models;
when two or more malicious domain name detection models are generated, a domain name test sample set is obtained;
acquiring a second weak correlation characteristic of each domain name in the domain name test sample set;
respectively testing two or more generated malicious domain name detection models by utilizing the domain name test sample set and the second weak correlation characteristics of each domain name in the domain name test sample set;
respectively counting test results of the generated two or more malicious domain name detection models, wherein the test results comprise accuracy and recall rate;
determining an optimal malicious domain name detection model in two or more malicious domain name detection models according to the test result;
correspondingly, the detecting, by the malicious domain name detection model, whether the unknown domain name is a malicious domain name specifically includes:
and detecting whether the unknown domain name is a malicious domain name or not through the optimal malicious domain name detection model.
2. The malicious domain name detection method based on deep learning according to claim 1, wherein the obtaining of the first weakly-correlated feature of each domain name in the domain name training sample set specifically comprises:
obtaining an A record, an AAAA record, an MX record and an NS record of each domain name in the domain name training sample set through DNS query;
obtaining the domain name registration time, the domain name registrant, the domain name registration mailbox and the domain name registration mechanism of each domain name in the domain name training sample set through WHOIS query;
and acquiring domain name ranking information, the search engine listing number of the domain names, a WEB home page corresponding to the domain names, WEB HTTPS certificate information corresponding to the domain names, IP geographic positions corresponding to the domain names and domain name IP resolution history of each domain name in the domain name training sample set through a WEB request tool.
3. The method according to claim 2, wherein after the first weak correlation feature of each domain name in the domain name training sample set is obtained, the malicious domain name detection training based on deep learning is performed by using the domain name training sample set and the first weak correlation feature of each domain name in the domain name training sample set, and before the malicious domain name detection model is generated, the method further comprises:
and normalizing the first weak correlation characteristic of each domain name in the domain name training sample set, and converting the first weak correlation characteristic of each domain name in the domain name training sample set into floating point numbers in a range of [0, 1).
4. The malicious domain name detection method based on deep learning of claim 1, wherein the domain name training sample set comprises a positive sample set and a negative sample set;
the domain names in the positive sample set are normal domain names, and the domain names in the negative sample set are malicious domain names.
5. A malicious domain name detection device based on deep learning is characterized by comprising:
the first acquisition unit is used for acquiring a domain name training sample set;
the second acquisition unit is used for acquiring a first weak correlation characteristic of each domain name in the domain name training sample set;
the training unit is used for carrying out malicious domain name detection training based on deep learning by utilizing the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set to generate a malicious domain name detection model;
the detection unit is used for detecting whether the unknown domain name is a malicious domain name or not through the malicious domain name detection model;
the training unit includes:
the training subunit is used for performing model training by adopting the domain name training sample set and the first weak correlation characteristic of each domain name in the domain name training sample set through at least one feedforward neural network model to generate at least two malicious domain name detection models;
the fourth acquisition subunit is used for acquiring a domain name test sample set when two or more malicious domain name detection models are generated;
a fifth obtaining subunit, configured to obtain a second weak correlation feature of each domain name in the domain name test sample set;
the testing subunit is used for respectively testing the two or more generated malicious domain name detection models by utilizing the domain name testing sample set and the second weak correlation characteristics of each domain name in the domain name testing sample set;
the statistical subunit is used for respectively counting the test results of the generated two or more malicious domain name detection models, and the test results comprise accuracy and recall rate;
the determining subunit is used for determining an optimal malicious domain name detection model in the two or more malicious domain name detection models according to the test result;
correspondingly, the detection unit is further configured to detect whether the unknown domain name is a malicious domain name through the optimal malicious domain name detection model.
6. The apparatus according to claim 5, wherein the second obtaining unit comprises:
the first acquisition subunit is used for acquiring an A record, an AAAA record, an MX record and an NS record of each domain name in the domain name training sample set through DNS query;
the second acquisition subunit is used for acquiring the domain name registration time, the domain name registrant, the domain name registration mailbox and the domain name registration mechanism of each domain name in the domain name training sample set through WHOIS query;
and the third acquiring subunit is configured to acquire, by using a WEB request tool, domain name ranking information, search engine entry number of the domain names, a WEB home page corresponding to the domain names, WEB HTTPS certificate information corresponding to the domain names, IP geographical positions corresponding to the domain names, and domain name IP resolution history of each domain name in the domain name training sample set.
7. The apparatus for detecting malicious domain name based on deep learning according to claim 6, further comprising:
a preprocessing unit, configured to perform normalization processing on the first weak correlation feature of each domain name in the domain name training sample set, and convert the first weak correlation feature of each domain name in the domain name training sample set into a floating point number in a [0,1) range.
8. The deep learning based malicious domain name detection device according to claim 5, wherein the domain name training sample set comprises a positive sample set and a negative sample set;
the domain names in the positive sample set are normal domain names, and the domain names in the negative sample set are malicious domain names.
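The model selection steps recited in claims 1 and 5 (test each candidate model, count accuracy and recall, determine the optimal model) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the claims do not say how accuracy and recall are combined, so the sketch compares accuracy first and recall second; the function names, the stand-in threshold models, and the label encoding (1 for malicious) are hypothetical.

```python
def evaluate(predict, samples):
    """Return (accuracy, recall) of a predict function on labelled samples.

    Each sample is (features, label); label 1 means malicious (assumed encoding).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    correct = true_pos = actual_pos = 0
    for features, label in samples:
        predicted = predict(features)
        correct += predicted == label
        if label == 1:
            actual_pos += 1
            true_pos += predicted == 1
    accuracy = correct / len(samples)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


def select_best(models, samples):
    """Score every candidate model and return (best_name, scores)."""
    scores = {name: evaluate(predict, samples) for name, predict in models.items()}
    # Assumed ranking rule: compare accuracy first, then recall.
    best_name = max(scores, key=lambda name: scores[name])
    return best_name, scores


# Stand-in models: simple thresholds on a single normalized feature.
models = {
    "model_a": lambda features: int(features[0] >= 0.5),
    "model_b": lambda features: int(features[0] >= 0.85),
}
test_set = [([0.9], 1), ([0.8], 1), ([0.2], 0), ([0.4], 0)]
best_name, scores = select_best(models, test_set)
print(best_name, scores)
```

In a deployment where missed malicious domains are costlier than false alarms, ranking by recall first, or by a weighted combination, may be a better choice than the ordering assumed here.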
CN201811361303.0A 2018-11-15 2018-11-15 Malicious domain name detection method and device based on deep learning Active CN109302418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811361303.0A CN109302418B (en) 2018-11-15 2018-11-15 Malicious domain name detection method and device based on deep learning

Publications (2)

Publication Number Publication Date
CN109302418A CN109302418A (en) 2019-02-01
CN109302418B true CN109302418B (en) 2021-11-12

Family

ID=65144447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811361303.0A Active CN109302418B (en) 2018-11-15 2018-11-15 Malicious domain name detection method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN109302418B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008705A (en) * 2019-04-15 2019-07-12 北京微步在线科技有限公司 A kind of recognition methods of malice domain name, device and electronic equipment based on deep learning
CN110427540B (en) * 2019-07-30 2021-11-30 国家计算机网络与信息安全管理中心 Implementation method and system for determining IP address responsibility main body
CN110866611A (en) * 2019-10-14 2020-03-06 杭州安恒信息技术股份有限公司 Malicious domain name detection method based on SVM machine learning
CN111291078B (en) * 2020-01-17 2021-02-02 武汉思普崚技术有限公司 Domain name matching detection method and device
CN112995361A (en) * 2021-04-30 2021-06-18 鹏城实验室 Domain name knowledge graph construction method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072214A (en) * 2015-08-28 2015-11-18 携程计算机技术(上海)有限公司 C&C domain name identification method based on domain name feature
CN107786575A (en) * 2017-11-11 2018-03-09 北京信息科技大学 A kind of adaptive malice domain name detection method based on DNS flows
CN108200054A (en) * 2017-12-29 2018-06-22 北京奇安信科技有限公司 A kind of malice domain name detection method and device based on dns resolution
CN108600200A (en) * 2018-04-08 2018-09-28 腾讯科技(深圳)有限公司 Domain name detection method, device, computer equipment and storage medium
CN108737439A (en) * 2018-06-04 2018-11-02 上海交通大学 A kind of large-scale malicious domain name detecting system and method based on self feed back study

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2828752B1 (en) * 2012-03-22 2020-04-29 Triad National Security, LLC Path scanning for the detection of anomalous subgraphs and use of dns requests and host agents for anomaly/change detection and network situational awareness

Also Published As

Publication number Publication date
CN109302418A (en) 2019-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant