CN113765841A - Malicious domain name detection method and device - Google Patents


Info

Publication number
CN113765841A
Authority
CN
China
Prior art keywords
domain name
probability distribution
domain
joint probability
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010482760.6A
Other languages
Chinese (zh)
Inventor
马兆铭
渠凯
卞子琪
王铮
杨迪
任华
汪少敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202010482760.6A
Publication of CN113765841A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/144 Detection or countermeasures against botnets

Abstract

The disclosure relates to a method and a device for detecting malicious domain names, in the technical field of communications. The method comprises: sorting the domain names according to the time order of their network requests to generate a domain name sequence; clustering the domain names according to the time interval between the network requests of adjacent domain names in the domain name sequence to obtain domain name sets; taking the feature vector of each domain name as a variable, calculating the feature vector of each domain name by a maximum likelihood estimation method according to the joint probability distribution of each domain name set; and determining, by a machine learning model and according to the feature vector of each domain name, whether each domain name is a malicious domain name.

Description

Malicious domain name detection method and device
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a malicious domain name detection method, a malicious domain name detection apparatus, and a non-volatile computer-readable storage medium.
Background
In recent years, more and more illegal network activities have achieved their malicious purposes by abusing the domain name system. For example, phishers register new domain names that look very similar to well-known legitimate domain names and host phishing websites on them to deceive network users. Botnets use a DGA (Domain Generation Algorithm) to generate large numbers of domain names in batches for botnet command-and-control channel communication, so as to evade blocking and screening by authoritative security defense mechanisms.
In the related art, characteristics are manually matched for each Domain Name from data such as DNS (Domain Name System) traffic and web page information, and detection of a malicious Domain Name is performed based on these characteristics.
Disclosure of Invention
The inventors of the present disclosure found that the following problems exist in the above-described related art: the manually matched features lack good robustness, and an attacker can escape detection by adjusting the features, resulting in poor network security.
In view of this, the present disclosure provides a technical solution for detecting a malicious domain name, which can improve the robustness of the extracted features of the malicious domain name, thereby improving the network security.
According to some embodiments of the present disclosure, there is provided a method for detecting a malicious domain name, including: sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence; clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets; calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable; and judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the feature vector of each domain name.
In some embodiments, the calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable includes: determining the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set; determining an objective function according to the probability distribution of the domain name sequence; and solving the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.
In some embodiments, the determining the probability distribution of the domain name sequence according to the joint probability distribution of the domain name sets includes: in each domain name set of the domain name sequence, respectively combining all two domain names with the sorting distance smaller than a threshold value into a domain name pair; and calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.
In some embodiments, the determining the probability distribution of the domain name sequence according to the joint probability distribution of the domain name sets includes: calculating the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair; and calculating the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set.
In some embodiments, said determining an objective function according to the probability distribution of the domain name sequence comprises: determining the objective function according to the logarithm of the probability distribution of the domain name sequence.
In some embodiments, said calculating a joint probability distribution for each set of domain names based on the joint probability distribution for each pair of domain names comprises: determining the joint probability distribution of a domain name pair by taking, as the argument of the Sigmoid function, the product of the transpose of the feature vector of one domain name in the pair and the feature vector of the other domain name.
In some embodiments, the clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets includes: and dividing the adjacent domain names with the time interval smaller than the threshold value into a domain name set.
According to other embodiments of the present disclosure, there is provided a malicious domain name detection apparatus including: the generating unit is used for sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence; the clustering unit is used for clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets; a calculating unit, configured to calculate the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable; and the judging unit is used for judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the characteristic vector of each domain name.
In some embodiments, the calculation unit determines the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set, determines an objective function according to the probability distribution of the domain name sequence, and solves the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.
In some embodiments, the calculating unit respectively combines, in each domain name set of the domain name sequence, all two domain names whose sorting distance is smaller than a threshold value into a domain name pair, and calculates the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.
In some embodiments, the calculation unit calculates the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair, calculates the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set, and determines the objective function according to the logarithm of the probability distribution of the domain name sequence.
In some embodiments, the calculation unit determines the joint probability distribution of a domain name pair by taking, as the argument of the Sigmoid function, the product of the transpose of the feature vector of one domain name in the pair and the feature vector of the other domain name.
In some embodiments, the clustering unit partitions adjacent domain names having a time interval less than a threshold into a set of domain names.
According to still other embodiments of the present disclosure, there is provided a malicious domain name detection apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the method for detecting a malicious domain name in any of the above embodiments based on instructions stored in the memory device.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of detecting a malicious domain name in any of the above embodiments.
In the above embodiment, the potential time accompanying relationship of each domain name is mined from the network request, and the feature vector of each domain name as the basis for detection is extracted based on this accompanying relationship. Thus, the robustness of the extracted features of the malicious domain name is improved, and the network security is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 illustrates a flow diagram of some embodiments of a malicious domain name detection method of the present disclosure;
fig. 2 illustrates a schematic diagram of some embodiments of a malicious domain name detection method of the present disclosure;
fig. 3 shows a schematic diagram of further embodiments of the disclosed malicious domain name detection method;
fig. 4 illustrates a flow diagram of further embodiments of the disclosed malicious domain name detection method;
fig. 5 illustrates a block diagram of some embodiments of a malicious domain name detection apparatus of the present disclosure;
fig. 6 shows a block diagram of further embodiments of the malicious domain name detection apparatus of the present disclosure;
fig. 7 illustrates a block diagram of still further embodiments of the malicious domain name detection apparatus of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
As described above, in order to solve the technical problem that manually extracted malicious domain name features are poorly robust and thereby degrade network security, the inventors of the present disclosure have studied and found that the network requests of domain names are not independent of each other, but tend to occur close together in time and to co-occur. That is, domain names whose requests accompany each other in time are closely associated.
Therefore, the technical scheme for detecting the malicious domain name is provided based on the time accompanying relation between the network requests of the domain names, so that the characteristic robustness of the malicious domain name is improved, and the network security is improved. For example, the above technical solution can be realized by the following embodiments.
Fig. 1 illustrates a flow diagram of some embodiments of a malicious domain name detection method of the present disclosure.
As shown in fig. 1, the method includes: step 110, generating a domain name sequence; step 120, acquiring a domain name set; step 130, calculating a feature vector; and step 140, judging the malicious domain name.
In step 110, the domain names are sorted according to the time sequence of the network requests of the domain names, and a domain name sequence is generated. For example, the network request of each domain name can be obtained from DNS traffic, and the domain names are arranged into the domain name sequence S according to the time sequence of the network request.
In step 120, the domain names are clustered according to the time interval between the network requests of adjacent domain names in the domain name sequence, and the domain name sets are obtained. For example, adjacent domain names whose time interval is smaller than a threshold are placed in the same domain name set. That is, domain names with a time-accompanying relationship are placed in the same cluster.
In some embodiments, the domain names may be partitioned into different sets of domain names by the embodiment in fig. 2.
Fig. 2 illustrates a schematic diagram of some embodiments of the disclosed malicious domain name detection method.
As shown in fig. 2, the network request of each domain name $d_i$ and its request time $t_i$ can be obtained from DNS traffic. For example, a total of 10 domain names ($d_1$ to $d_{10}$) issue network requests over a period of time.
The time interval between the network requests of every two adjacent domain names is calculated, and domain names whose time interval is smaller than the threshold are placed in the same domain name set $S_j$. For example, the domain name sets with a time-accompanying relationship $S_1 = \{d_1, d_2\}$ and $S_2 = \{d_4, d_5, d_6, d_7, d_8, d_9\}$ can be produced. Domain names that have no time-accompanying relationship with any other domain name can also be grouped into a set of non-accompanied domain names, e.g. $\{d_3, d_{10}\}$.
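The sorting and gap-based clustering of steps 110 and 120 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `cluster_by_time_gap`, the parameter `max_gap`, and the timestamps are hypothetical, with the timestamps chosen only so that the result mimics the grouping described for fig. 2.

```python
from typing import List, Tuple


def cluster_by_time_gap(
    requests: List[Tuple[str, float]], max_gap: float
) -> List[List[str]]:
    """Sort (domain, request_time) records by time, then start a new
    cluster whenever the gap to the previous request is not below max_gap."""
    ordered = sorted(requests, key=lambda r: r[1])  # step 110: domain name sequence
    clusters: List[List[str]] = []
    prev_time = None
    for domain, t in ordered:
        if prev_time is not None and t - prev_time < max_gap:
            clusters[-1].append(domain)  # same time-accompanying set
        else:
            clusters.append([domain])    # gap too large: open a new set
        prev_time = t
    return clusters


# Hypothetical request times in seconds (not taken from the source).
requests = [
    ("d1", 0.0), ("d2", 1.0), ("d3", 20.0), ("d4", 40.0), ("d5", 41.0),
    ("d6", 42.5), ("d7", 43.0), ("d8", 44.0), ("d9", 45.5), ("d10", 70.0),
]
clusters = cluster_by_time_gap(requests, max_gap=5.0)
accompanied = [c for c in clusters if len(c) > 1]   # sets such as S1 and S2
isolated = [c[0] for c in clusters if len(c) == 1]  # domains with no companion
```

With these assumed timestamps, `accompanied` holds the two sets corresponding to $S_1$ and $S_2$, and `isolated` holds the domains that accompany no other domain.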
After the domain name set is partitioned, the malicious domain name can be detected through other steps in fig. 1.
In step 130, the feature vector of each domain name is calculated by a maximum likelihood estimation method according to the joint probability distribution of each domain name set, using the feature vector of each domain name as a variable.
In some embodiments, the probability distribution of the domain name sequences is determined from the joint probability distribution of the domain name sets. And determining an objective function according to the probability distribution of the domain name sequence. And solving the objective function by using a maximum likelihood estimation method to obtain the characteristic vector of each domain name.
For example, the domain name sequence is $S = \{S_1, S_2, \ldots, S_j, \ldots, S_J\}$, where each $S_j$ contains a plurality of domain names $d_i$ having a time-accompanying relationship. The objective function of the maximum likelihood estimation method can be set according to the following formula:
[Formula image BDA0002517830280000061, not reproduced in this text]
$P(S)$ is the probability distribution of the domain name sequence, and $P(S_j)$ is the joint probability distribution of each domain name set; $P(S_j)$ takes as variables the feature vectors $v_i$ of the domain names $d_i$ contained in $S_j$. The objective function is solved by the maximum likelihood estimation method to calculate each $v_i$ at which $P(S)$ attains its maximum value.
In some embodiments, in each domain name set of the domain name sequence, respectively combining all two domain names whose sorting distance is smaller than a threshold value into a domain name pair; and calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair. For example, domain name pairs may be combined by the embodiment in fig. 3.
Fig. 3 is a schematic diagram illustrating further embodiments of the malicious domain name detection method of the present disclosure.
As shown in fig. 3, for the domain name set $S_2 = \{d_4, d_5, d_6, d_7, d_8, d_9\}$, a sliding window can be set according to the threshold of the sorting distance. For example, the length of the sliding window is 2, i.e., the threshold of the sorting distance is 2, and each domain name $d_k$ in $S_2$ whose distance before or after $d_i$ is less than or equal to 2 is combined with $d_i$ into a domain name pair.
In some embodiments, each $d_i$ in the domain name set $S_2$ may be taken in turn as the current domain name (the diagonally hatched box in the figure); then, with the current domain name as the center, each other domain name within the sliding window before and after it (the dotted boxes in the figure) is combined with the current domain name into a domain name pair.
For example, when the current domain name is $d_4$, the sliding window contains the domain names $d_5$ and $d_6$, so the domain name pairs are $(d_4, d_5)$ and $(d_4, d_6)$. Next, the current domain name becomes $d_5$, the sliding window contains the domain names $d_4$, $d_6$, and $d_7$, and the domain name pairs are $(d_5, d_4)$, $(d_5, d_6)$, and $(d_5, d_7)$. By analogy, all domain name pairs in $S_2$ can be determined.
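The sliding-window pairing described for fig. 3 can be sketched as below. The function name `window_pairs` and the parameter `max_distance` are hypothetical names introduced for illustration; the pairing rule (every other domain name within a sorting distance of 2 before or after the current one) follows the example in the text.

```python
from typing import List, Tuple


def window_pairs(domain_set: List[str], max_distance: int) -> List[Tuple[str, str]]:
    """For each current domain, pair it with every other domain whose
    position in the set is at most max_distance before or after it."""
    pairs: List[Tuple[str, str]] = []
    for i, current in enumerate(domain_set):
        lo = max(0, i - max_distance)
        hi = min(len(domain_set), i + max_distance + 1)
        for k in range(lo, hi):
            if k != i:  # a domain is never paired with itself
                pairs.append((current, domain_set[k]))
    return pairs


s2 = ["d4", "d5", "d6", "d7", "d8", "d9"]
pairs = window_pairs(s2, max_distance=2)
```

Each pair appears once per direction, e.g. both `("d4", "d5")` and `("d5", "d4")`, which matches the enumeration given for $d_4$ and $d_5$ above.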
In some embodiments, the logarithm of the joint probability distribution for each set of domain names is calculated from a weighted sum of the logarithms of the joint probability distributions for each pair of domain names. And calculating the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set. And determining the target function according to the logarithm of the probability distribution of the domain name sequence. For example, the objective function may be set according to the following formula:
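[Formula image BDA0002517830280000071, not reproduced in this text]
$P(d_i, d_k)$ is the joint probability distribution of the domain name pair $d_i$, $d_k$, and $\propto$ is the symbol denoting direct proportionality.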
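Since the formula itself is available only as an image, the following is one form that is consistent with the verbal description, written out as an assumption rather than as the formula of the disclosure. It assumes unit weights in both weighted sums and a window half-width $c$ (the threshold of the sorting distance); neither choice is confirmed by the source.

```latex
\log P(S) \;\propto\; \sum_{j=1}^{J} \log P(S_j)
\;\propto\; \sum_{j=1}^{J} \; \sum_{d_i \in S_j} \;
\sum_{\substack{d_k \in S_j \\ 0 < |i - k| \le c}} \log P(d_i, d_k)
```

Here $i$ and $k$ denote the positions of $d_i$ and $d_k$ in the domain name sequence, so the innermost sum runs over exactly the domain name pairs produced by the sliding window.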
In some embodiments, the product of the transpose of the feature vector of one domain name in a domain name pair and the feature vector of the other domain name is used as the argument of the Sigmoid function (or another sigmoid-type function) to determine the joint probability distribution of the domain name pair. For example, $P(d_i, d_k)$ may be calculated with the following formula as the activation function:
$$P(d_i, d_k) = \sigma\left(v_i^{\top} v_k\right) = \frac{1}{1 + e^{-v_i^{\top} v_k}}$$
For example, the feature vector $v_i$ of the domain name $d_i$ in the space $\mathbb{R}^d$ at which the objective function is maximized can be calculated in an iterative manner.
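The iterative maximization can be illustrated with plain gradient ascent on the sum of log-Sigmoid pair probabilities. This is a sketch under stated assumptions: the source does not name an optimizer, so gradient ascent, the unit pair weights, and the hyperparameters `dim`, `lr`, `steps`, and `seed` are all hypothetical choices. The sketch also uses only positive (co-occurring) pairs, which eventually pulls all vectors into alignment; practical embedding methods usually add a contrast term such as negative sampling, which the source does not describe.

```python
import math
import random
from typing import Dict, List, Tuple

Vectors = Dict[str, List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_likelihood(vectors: Vectors, pairs: List[Tuple[str, str]]) -> float:
    """Sum of log P(d_i, d_k) with P = sigmoid(v_i . v_k), unit weights."""
    return sum(log_sigmoid(dot(vectors[i], vectors[k])) for i, k in pairs)


def train_embeddings(pairs: List[Tuple[str, str]], dim: int = 4, lr: float = 0.05,
                     steps: int = 200, seed: int = 0) -> Vectors:
    """Gradient ascent on the log-likelihood (optimizer is an assumption)."""
    rng = random.Random(seed)
    domains = sorted({d for pair in pairs for d in pair})
    vectors = {d: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for d in domains}
    for _ in range(steps):
        grads = {d: [0.0] * dim for d in domains}
        for i, k in pairs:
            # d/dx log sigmoid(x) = 1 - sigmoid(x)
            g = 1.0 - math.exp(log_sigmoid(dot(vectors[i], vectors[k])))
            for n in range(dim):
                grads[i][n] += g * vectors[k][n]
                grads[k][n] += g * vectors[i][n]
        for d in domains:  # apply all updates after the gradients are computed
            for n in range(dim):
                vectors[d][n] += lr * grads[d][n]
    return vectors


# Hypothetical pairs taken from a small window over d4..d7.
pairs = [("d4", "d5"), ("d4", "d6"), ("d5", "d6"), ("d5", "d7"), ("d6", "d7")]
initial = train_embeddings(pairs, steps=0)    # random starting vectors
trained = train_embeddings(pairs, steps=200)  # vectors after the iterations
```

Comparing `log_likelihood(trained, pairs)` with `log_likelihood(initial, pairs)` shows how far the iterations have raised the objective from its starting value.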
After the objective function is solved, the feature vector of each domain name can be obtained, and the malicious domain name can be detected through other steps in fig. 1.
In step 140, a machine learning model is used to determine whether each domain name is a malicious domain name according to the feature vector of each domain name. For example, a machine learning model may be trained as a classifier for detecting malicious domain names.
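The source does not specify which machine learning model is used. As one possible stand-in, the sketch below trains a small logistic regression classifier on labeled feature vectors; the model choice, the function names, and the toy two-dimensional vectors and labels are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def train_logistic(features: List[List[float]], labels: List[int],
                   lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Batch gradient ascent on the logistic log-likelihood."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "malicious"
            err = y - p
            for n in range(dim):
                grad_w[n] += err * x[n]
            grad_b += err
        w = [wi + lr * g for wi, g in zip(w, grad_w)]
        b += lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return 1 (malicious) if the decision score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy, hypothetical feature vectors: label 0 = benign, 1 = malicious.
features = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.3],
            [-1.0, -0.2], [-0.8, -0.1], [-1.2, 0.0]]
labels = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(features, labels)
predictions = [predict(w, b, x) for x in features]
```

In practice the inputs would be the learned feature vectors of labeled domain names, and any classifier that accepts fixed-length vectors could take the place of this one.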
Fig. 4 shows a flow diagram of further embodiments of the disclosed malicious domain name detection method.
As shown in fig. 4, in step 410, the network request and the request time of each domain name are obtained from the DNS traffic data.
In some embodiments, steps 420, 430 may be performed by a domain name sequence temporal relationship clustering module.
In step 420, the domain names are sorted by using the time sequence of the network requests of the domain names to generate a domain name sequence.
In step 430, clustering is performed on each domain name in the domain name sequence at time intervals, and domain names with time association relations are divided into the same cluster to form a time association domain name set.
In some embodiments, $D$ is the DNS traffic data captured at the DNS server. $D$ may be decomposed into sets of DNS requests for different end users, i.e., $D = \{U_n, D_n\}$, where $U_n$ is an end user and $D_n = \{(d_1, t_1), (d_2, t_2), \ldots, (d_i, t_i), \ldots, (d_I, t_I)\}$ is the set of that user's DNS requests for domain names distributed along the time axis; $(d_i, t_i)$ indicates that user $U_n$ requested domain name $d_i$ at time $t_i$.
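The decomposition of the traffic data into per-user request sets can be sketched as a simple grouping step. The record layout `(user, domain, time)`, the function name `split_by_user`, and the sample values are hypothetical; real DNS logs would need parsing into this shape first.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_user(
    records: List[Tuple[str, str, float]]
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (user, domain, time) records into one time-ordered
    list of (domain, time) requests per end user."""
    per_user: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for user, domain, t in records:
        per_user[user].append((domain, t))
    # Sort each user's requests along the time axis.
    return {u: sorted(reqs, key=lambda r: r[1]) for u, reqs in per_user.items()}


# Hypothetical records; user IDs and domains are placeholders.
records = [
    ("U1", "a.example", 3.0),
    ("U2", "b.example", 1.0),
    ("U1", "c.example", 1.5),
]
per_user = split_by_user(records)
```

Each value in `per_user` plays the role of one $D_n$ and can be passed on to the time-interval clustering.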
In some embodiments, the domain name sequence may be cluster partitioned based on the temporal clustering properties of the DNS requests, and the temporal attendant properties of the DNS requests for the domain names.
For example, if the time interval of DNS requests of adjacent 2 domain names is greater than a threshold, the 2 domain names are divided into different domain name sets; otherwise, the domain names are divided into the same domain name set. In this way, the domain name sequence may be divided into a plurality of domain name sets that are time-dependent, i.e., time-associated domain name sets extracted from the DNS traffic D.
In some embodiments, steps 440-460 may be performed by a spatiotemporal companion domain name feature learning module.
In step 440, the time companion domain name set is divided into pairs of companion domain names with time companion relationships using a sliding window approach.
In some embodiments, in order to reduce the complexity of the maximum likelihood probability calculation of the domain name set, the domain name set may be divided into a plurality of domain name pairs in a sliding window manner. For example, the window size may be set to 2, and the domain name set may be divided into a plurality of domain name pairs. The size of the sliding window of the domain name set including 5 domain names may be set to 2 (a window whose front and rear distances are 2 with the current domain name as the center) according to the manner in fig. 3, and the domain name pairs are divided.
In step 450, the joint probability of each domain name pair is expressed using an activation function, so as to determine the objective function for the maximum likelihood estimation method.
In step 460, the feature vector for the domain name is found by iteratively maximizing the joint probability.
In step 470, in combination with the labeled domain name sample data, a malicious domain name detection classifier (machine learning model) is trained for detecting more unknown malicious domain names.
In the above embodiment, the domain name sequence segmentation algorithm based on time intervals can effectively extract the domain name sets with an accompanying relationship from the DNS traffic. An unsupervised domain name vectorization algorithm maps each domain name to a feature vector in a low-dimensional space while preserving the accompanying relationships between the domain names. Based on the domain name sets with an accompanying relationship, the potential time-accompanying relationships of the domain names can be automatically mined from the original DNS traffic for detecting malicious domain names.
Therefore, potential domain name accompanying relationships can be automatically mined from DNS traffic and mapped into feature vectors, without relying on manual expert experience and without the complex work of manually designing features. The accompanying relationships are mined using only the chronological order of the network requests, without additional information. Moreover, malicious domain names that receive no normal response can also be handled; and because the domain names with an accompanying relationship are clustered, the method can be used to discover groups of malicious domain names.
Fig. 5 illustrates a block diagram of some embodiments of the malicious domain name detection apparatus of the present disclosure.
As shown in fig. 5, the detection apparatus 5 of a malicious domain name includes a generation unit 51, a clustering unit 52, a calculation unit 53, and a judgment unit 54.
The generation unit 51 sorts the domain names according to the time sequence of the network requests of the domain names, and generates a domain name sequence.
The clustering unit 52 clusters the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence, and obtains a set of the domain names.
In some embodiments, clustering unit 52 partitions adjacent domain names having a time interval less than a threshold into a set of domain names.
The calculation unit 53 calculates the feature vector of each domain name by using the maximum likelihood estimation method according to the joint probability distribution of each domain name set, using the feature vector of each domain name as a variable.
In some embodiments, the calculation unit 53 determines the probability distribution of the domain name sequences based on the joint probability distribution of the domain name sets; the calculating unit 53 determines an objective function according to the probability distribution of the domain name sequence; the calculation unit 53 solves the objective function by using a maximum likelihood estimation method, and obtains the feature vector of each domain name.
In some embodiments, the calculating unit 53 respectively combines, in each domain name set of the domain name sequence, all two domain names whose sorting distance is smaller than the threshold value into a domain name pair; the calculation unit 53 calculates the joint probability distribution of each domain name set based on the joint probability distribution of each domain name pair.
In some embodiments, the calculating unit 53 calculates the logarithm of the joint probability distribution of each domain name set according to a weighted sum of the logarithms of the joint probability distribution of each domain name pair; the calculating unit 53 calculates the logarithm of the probability distribution of the domain name sequence based on the weighted sum of the logarithms of the joint probability distribution of each domain name set; the calculation unit 53 determines an objective function from the logarithm of the probability distribution of the domain name sequence.
In some embodiments, the calculation unit 53 determines the joint probability distribution of a domain name pair as a function of Sigmoid function by taking the vector product of the transpose of the feature vector of one domain name and the feature vector of the other domain name in the domain name pair.
The determination unit 54 determines whether each domain name is a malicious domain name by using a machine learning model according to the feature vector of each domain name.
Fig. 6 shows a block diagram of further embodiments of the malicious domain name detection apparatus of the present disclosure.
As shown in fig. 6, the detection apparatus 6 for a malicious domain name according to this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute the method for detecting a malicious domain name in any one of the embodiments of the present disclosure based on instructions stored in the memory 61.
The memory 61 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, a database, and other programs.
Fig. 7 illustrates a block diagram of still further embodiments of the malicious domain name detection apparatus of the present disclosure.
As shown in fig. 7, the detection apparatus 7 for a malicious domain name according to this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute the malicious domain name detection method in any of the foregoing embodiments based on instructions stored in the memory 710.
The memory 710 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
The malicious domain name detection device 7 may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
So far, the detection method of a malicious domain name, the detection apparatus of a malicious domain name, and the nonvolatile computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (14)

1. A method for detecting a malicious domain name comprises the following steps:
sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence;
clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets;
calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable;
and judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the feature vector of each domain name.
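The first step of claim 1, building the domain name sequence, might look like the sketch below. It assumes each network request is available as a `(timestamp, domain)` tuple; that record layout, the function name `build_domain_sequence`, and the sample domains are hypothetical.

```python
def build_domain_sequence(requests):
    """Sort (timestamp, domain) records by request time.

    sorted() is stable, so records with equal timestamps keep their
    original relative order.
    """
    return sorted(requests, key=lambda rec: rec[0])


# Hypothetical request log: (seconds since start, requested domain).
log = [(12.0, "c.example"), (0.5, "a.example"), (0.9, "b.example")]
sequence = build_domain_sequence(log)
print([domain for _, domain in sequence])
```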
2. The detection method according to claim 1, wherein the calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable comprises:
determining the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set;
determining an objective function according to the probability distribution of the domain name sequence;
and solving the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.
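One possible way to carry out the maximization in claim 2 is plain batch gradient ascent on the pair log-likelihood, treating the feature vectors as the free variables. The sketch below makes several assumptions that are not taken from the claims: unit weights, the Sigmoid pair model of claim 5, a fixed learning rate, and the hypothetical names `fit_vectors`, `log_likelihood`, `dim`, `steps`, `lr` and `seed`.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_likelihood(vectors, pairs):
    """Sum of log sigmoid(u_a . u_b) over all domain name pairs (unit weights)."""
    return sum(log_sigmoid(dot(vectors[a], vectors[b])) for a, b in pairs)


def fit_vectors(pairs, dim=4, steps=50, lr=0.1, seed=0):
    """Increase the pair log-likelihood by batch gradient ascent."""
    rng = random.Random(seed)
    domains = sorted({d for pair in pairs for d in pair})
    vectors = {d: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for d in domains}
    for _ in range(steps):
        grads = {d: [0.0] * dim for d in domains}
        for a, b in pairs:
            # d/du_a log sigmoid(u_a . u_b) = (1 - sigmoid(u_a . u_b)) * u_b
            coeff = 1.0 - math.exp(log_sigmoid(dot(vectors[a], vectors[b])))
            for k in range(dim):
                grads[a][k] += coeff * vectors[b][k]
                grads[b][k] += coeff * vectors[a][k]
        for d in domains:
            vectors[d] = [x + lr * g for x, g in zip(vectors[d], grads[d])]
    return vectors
```

With only co-occurring pairs in the objective, the likelihood has no finite maximizer and the vectors keep growing as training continues, so a practical implementation would likely add regularization or negative examples. Whether and how the disclosure does so is not stated in this passage.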
3. The detection method according to claim 2, wherein the determining the probability distribution of the domain name sequences according to the joint probability distribution of the domain name sets comprises:
in each domain name set of the domain name sequence, combining every two domain names whose sorting distance is smaller than a threshold value into a domain name pair;
and calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.
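The pairing step of claim 3 can be sketched as a sliding window over one domain name set. The sketch assumes that the "sorting distance" of two domain names is the difference of their positions in the sequence; that reading, and the names `domain_pairs` and `max_distance`, are assumptions.

```python
def domain_pairs(domain_set, max_distance):
    """Pair every two domains in one set whose position gap is below max_distance.

    domain_set is a list of domain names in sequence order.
    """
    pairs = []
    for i, first in enumerate(domain_set):
        # j runs over positions with 1 <= j - i < max_distance.
        for j in range(i + 1, min(i + max_distance, len(domain_set))):
            pairs.append((first, domain_set[j]))
    return pairs


print(domain_pairs(["a", "b", "c", "d"], 3))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```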
4. The detection method according to claim 3, wherein the determining the probability distribution of the domain name sequences according to the joint probability distribution of the domain name sets comprises:
calculating the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair;
calculating the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set;
the determining the objective function according to the probability distribution of the domain name sequence includes:
and determining the objective function according to the logarithm of the probability distribution of the domain name sequence.
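The two weighted sums of claim 4 can be computed as in the short sketch below. The claim does not say how the weights are chosen, so they are passed in as plain lists; the function names and the sample numbers are hypothetical.

```python
import math


def set_log_prob(pair_probs, pair_weights):
    """Weighted sum of log pair probabilities for one domain name set."""
    return sum(w * math.log(p) for p, w in zip(pair_probs, pair_weights))


def sequence_log_prob(set_log_probs, set_weights):
    """Weighted sum of per-set log probabilities for the whole sequence."""
    return sum(w * lp for lp, w in zip(set_log_probs, set_weights))


# Two hypothetical sets, each with two pairs, all weights equal to 1.
per_set = [set_log_prob([0.9, 0.8], [1.0, 1.0]),
           set_log_prob([0.7, 0.6], [1.0, 1.0])]
objective = sequence_log_prob(per_set, [1.0, 1.0])
```

Working in the log domain turns the product of pair probabilities into a sum, which avoids numerical underflow when a set contains many pairs.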
5. The detection method according to claim 3, wherein the calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair comprises:
and determining the joint probability distribution of the domain name pair by taking the product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the argument of the Sigmoid function.
6. The detection method according to any one of claims 1 to 5, wherein the clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets comprises:
and dividing adjacent domain names whose time interval is smaller than a threshold value into one domain name set.
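The grouping rule of claim 6 amounts to cutting the time-ordered sequence wherever the gap between neighbouring requests reaches the threshold. A minimal sketch follows, assuming the sequence is a list of `(timestamp, domain)` tuples already sorted by time; the record layout and the names `split_by_gap` and `max_gap` are hypothetical.

```python
def split_by_gap(sequence, max_gap):
    """Group a time-sorted list of (timestamp, domain) records into sets.

    A new set starts whenever the gap to the previous request is not
    smaller than max_gap (same time unit as the timestamps).
    """
    sets = []
    prev_ts = None
    for ts, domain in sequence:
        if prev_ts is None or ts - prev_ts >= max_gap:
            sets.append([])
        sets[-1].append(domain)
        prev_ts = ts
    return sets


print(split_by_gap([(0, "a"), (1, "b"), (10, "c"), (11, "d"), (30, "e")], 5))
# [['a', 'b'], ['c', 'd'], ['e']]
```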
7. A malicious domain name detection apparatus, comprising:
the generating unit is used for sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence;
the clustering unit is used for clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets;
a calculating unit, configured to calculate the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable;
and the judging unit is used for judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the feature vector of each domain name.
8. The detection apparatus according to claim 7,
and the calculation unit determines the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set, determines an objective function according to the probability distribution of the domain name sequence, solves the objective function by using a maximum likelihood estimation method and acquires the feature vector of each domain name.
9. The detection apparatus according to claim 8,
the calculation unit combines, in each domain name set of the domain name sequence, every two domain names whose sorting distance is smaller than a threshold value into a domain name pair, and calculates the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.
10. The detection apparatus according to claim 9,
the calculation unit calculates the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair, calculates the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set, and determines the objective function according to the logarithm of the probability distribution of the domain name sequence.
11. The detection apparatus according to claim 9,
the calculation unit determines the joint probability distribution of the domain name pair by taking the product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the argument of the Sigmoid function.
12. The detection apparatus according to any one of claims 7-11, wherein the clustering unit partitions neighboring domain names having a time interval smaller than a threshold value into a set of domain names.
13. A malicious domain name detection apparatus, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of detecting a malicious domain name of any of claims 1-6 based on instructions stored in the memory.
14. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of detecting a malicious domain name according to any one of claims 1 to 6.
CN202010482760.6A 2020-06-01 2020-06-01 Malicious domain name detection method and device Pending CN113765841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010482760.6A CN113765841A (en) 2020-06-01 2020-06-01 Malicious domain name detection method and device

Publications (1)

Publication Number Publication Date
CN113765841A (en) 2021-12-07

Family

ID=78782397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010482760.6A Pending CN113765841A (en) 2020-06-01 2020-06-01 Malicious domain name detection method and device

Country Status (1)

Country Link
CN (1) CN113765841A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8631498B1 (en) * 2011-12-23 2014-01-14 Symantec Corporation Techniques for identifying potential malware domain names
CN102685145A (en) * 2012-05-28 2012-09-19 西安交通大学 Domain name server (DNS) data packet-based bot-net domain name discovery method
US10467536B1 (en) * 2014-12-12 2019-11-05 Go Daddy Operating Company, LLC Domain name generation and ranking
US20160294859A1 (en) * 2015-03-30 2016-10-06 Electronics And Telecommunications Research Institute Apparatus and method for detecting malicious domain cluster
CN110572359A (en) * 2019-08-01 2019-12-13 杭州安恒信息技术股份有限公司 Phishing webpage detection method based on machine learning
CN110557382A (en) * 2019-08-08 2019-12-10 中国科学院信息工程研究所 Malicious domain name detection method and system by utilizing domain name co-occurrence relation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
彭成维等: "一种基于域名请求伴随关系的恶意域名检测方法", 《计算机研究与发展》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114401122A (en) * 2021-12-28 2022-04-26 中国电信股份有限公司 Domain name detection method and device, electronic equipment and storage medium
CN114401122B (en) * 2021-12-28 2024-04-05 中国电信股份有限公司 Domain name detection method and device, electronic equipment and storage medium
CN114553486A (en) * 2022-01-20 2022-05-27 北京百度网讯科技有限公司 Illegal data processing method and device, electronic equipment and storage medium
CN114745355A (en) * 2022-01-25 2022-07-12 合肥讯飞数码科技有限公司 DNS detection method and device, electronic equipment and storage medium
CN116760645A (en) * 2023-08-22 2023-09-15 北京长亭科技有限公司 Malicious domain name detection method and device
CN116760645B (en) * 2023-08-22 2023-11-14 北京长亭科技有限公司 Malicious domain name detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211207)