CN113765841A - Malicious domain name detection method and device

Info

Publication number: CN113765841A
Application number: CN202010482760.6A
Authority: CN
Inventors: 马兆铭; 渠凯; 卞子琪; 王铮; 杨迪; 任华; 汪少敏
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2021-12-07

Abstract

The disclosure relates to a method and a device for detecting a malicious domain name, and relates to the technical field of communication. The method comprises the following steps: sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence; clustering each domain name according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain each domain name set; calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable; and judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the feature vector of each domain name.

Description

Malicious domain name detection method and device

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to a malicious domain name detection method, a malicious domain name detection apparatus, and a non-volatile computer-readable storage medium.

Background

In recent years, a growing number of illegal network activities have achieved their malicious purposes by abusing the domain name system. For example, phishers register new domain names that closely resemble well-known legitimate domain names and host phishing websites on them to trick network users. Botnets use a DGA (Domain Generation Algorithm) to generate large batches of domain names for botnet command and control channel communications, so as to evade containment and screening by authoritative security defense mechanisms.

In the related art, features are manually matched for each domain name from data such as DNS (Domain Name System) traffic and web page information, and malicious domain names are detected based on these features.

Disclosure of Invention

The inventors of the present disclosure found that the following problems exist in the above-described related art: the manually matched features lack good robustness, and an attacker can escape detection by adjusting the features, resulting in poor network security.

In view of this, the present disclosure provides a technical solution for detecting a malicious domain name, which can improve the robustness of the extracted features of the malicious domain name, thereby improving the network security.

According to some embodiments of the present disclosure, there is provided a method for detecting a malicious domain name, including: sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence; clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets; calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable; and judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the feature vector of each domain name.

In some embodiments, the calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable includes: determining the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set; determining an objective function according to the probability distribution of the domain name sequence; and solving the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.

In some embodiments, the determining the probability distribution of the domain name sequence according to the joint probability distribution of the domain name sets includes: in each domain name set of the domain name sequence, respectively combining all two domain names with the sorting distance smaller than a threshold value into a domain name pair; and calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.

In some embodiments, the determining the probability distribution of the domain name sequence according to the joint probability distribution of the domain name sets includes: calculating the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair; and calculating the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set.

In some embodiments, said determining an objective function according to the probability distribution of the domain name sequence comprises: determining the objective function according to the logarithm of the probability distribution of the domain name sequence.

In some embodiments, said calculating a joint probability distribution for each set of domain names based on the joint probability distribution for each pair of domain names comprises: determining the joint probability distribution of the domain name pair by taking the vector product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the variable of the Sigmoid function.

In some embodiments, the clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets includes: dividing the adjacent domain names with the time interval smaller than the threshold value into a domain name set.

According to other embodiments of the present disclosure, there is provided a malicious domain name detection apparatus including: the generating unit is used for sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence; the clustering unit is used for clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets; a calculating unit, configured to calculate the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable; and the judging unit is used for judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the characteristic vector of each domain name.

In some embodiments, the calculation unit determines the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set, determines an objective function according to the probability distribution of the domain name sequence, and solves the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.

In some embodiments, the calculating unit respectively combines, in each domain name set of the domain name sequence, all two domain names whose sorting distance is smaller than a threshold value into a domain name pair, and calculates the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.

In some embodiments, the calculation unit calculates the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair, calculates the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set, and determines the objective function according to the logarithm of the probability distribution of the domain name sequence.

In some embodiments, the calculation unit determines the joint probability distribution of the domain name pair by taking the vector product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the variable of the Sigmoid function.

In some embodiments, the clustering unit partitions adjacent domain names having a time interval less than a threshold into a set of domain names.

According to still other embodiments of the present disclosure, there is provided a malicious domain name detection apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the method for detecting a malicious domain name in any of the above embodiments based on instructions stored in the memory device.

According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of detecting a malicious domain name in any of the above embodiments.

In the above embodiment, the potential time accompanying relationship of each domain name is mined from the network request, and the feature vector of each domain name as the basis for detection is extracted based on this accompanying relationship. Thus, the robustness of the extracted features of the malicious domain name is improved, and the network security is improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:

fig. 1 illustrates a flow diagram of some embodiments of a malicious domain name detection method of the present disclosure;

fig. 2 illustrates a schematic diagram of some embodiments of a malicious domain name detection method of the present disclosure;

fig. 3 shows a schematic diagram of further embodiments of the disclosed malicious domain name detection method;

fig. 4 illustrates a flow diagram of further embodiments of the disclosed malicious domain name detection method;

fig. 5 illustrates a block diagram of some embodiments of a malicious domain name detection apparatus of the present disclosure;

fig. 6 shows a block diagram of further embodiments of the malicious domain name detection apparatus of the present disclosure;

fig. 7 illustrates a block diagram of still further embodiments of the malicious domain name detection apparatus of the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

As described above, in order to solve the technical problem that malicious domain name features extracted manually are poor in robustness and cause network security to be degraded, the inventors of the present disclosure have studied and found that network requests of domain names are not independent from each other, but have characteristics of similar time and co-occurrence. I.e., there is a close association between domain names that have time companions with each other.

Therefore, the technical scheme for detecting the malicious domain name is provided based on the time accompanying relation between the network requests of the domain names, so that the characteristic robustness of the malicious domain name is improved, and the network security is improved. For example, the above technical solution can be realized by the following embodiments.

Fig. 1 illustrates a flow diagram of some embodiments of a malicious domain name detection method of the present disclosure.

As shown in fig. 1, the method includes: step 110, generating a domain name sequence; step 120, acquiring a domain name set; step 130, calculating a feature vector; and step 140, judging the malicious domain name.

In step 110, the domain names are sorted according to the time sequence of the network requests of the domain names, and a domain name sequence is generated. For example, the network request of each domain name can be obtained from DNS traffic, and the domain names are arranged into the domain name sequence S according to the time sequence of the network request.

In step 120, each domain name is clustered according to the time interval of the network request of the adjacent domain name in the domain name sequence, and each domain name set is obtained. For example, adjacent domain names with time intervals smaller than a threshold are divided into a domain name set. I.e. domain names with time-adjoint relationships are divided into the same cluster.

In some embodiments, the domain names may be partitioned into different sets of domain names by the embodiment in fig. 2.

Fig. 2 illustrates a schematic diagram of some embodiments of the disclosed malicious domain name detection method.

As shown in fig. 2, the network request of each domain name d_i and its request time t_i can be obtained from DNS traffic. For example, a total of 10 domain names (d₁ to d₁₀) issue network requests over a period of time.

The time interval between the network requests of every two domain names is calculated, and the domain names whose time interval is smaller than the threshold value are divided into a domain name set S_j. For example, the domain name sets S₁ = {d₁, d₂} and S₂ = {d₄, d₅, d₆, d₇, d₈, d₉}, each having a time-accompanying relationship, can be produced. Domain names that do not have a time-accompanying relationship with any other domain name can also be classified into a set of domain names without time accompaniment, e.g. {d₃, d₁₀}.
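The partition described for fig. 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_gap`, the timestamps, and the 5-second threshold are all hypothetical, chosen only so that the result matches the example sets above.

```python
from typing import List, Tuple

def split_by_gap(requests: List[Tuple[str, float]], threshold: float):
    """Sort (domain, time) requests by time, then cut the sequence wherever
    the gap between adjacent requests is not smaller than `threshold`."""
    ordered = sorted(requests, key=lambda r: r[1])
    groups: List[List[str]] = []
    prev_time = None
    for domain, t in ordered:
        if prev_time is not None and t - prev_time < threshold:
            groups[-1].append(domain)   # same time-accompanied set
        else:
            groups.append([domain])     # start a new set
        prev_time = t
    accompanied = [g for g in groups if len(g) > 1]
    unaccompanied = [g[0] for g in groups if len(g) == 1]
    return accompanied, unaccompanied

# Hypothetical request times in seconds (not taken from the source).
requests = [("d1", 0.0), ("d2", 1.0), ("d3", 20.0), ("d4", 40.0),
            ("d5", 41.0), ("d6", 42.5), ("d7", 43.0), ("d8", 44.0),
            ("d9", 45.5), ("d10", 70.0)]
sets, lonely = split_by_gap(requests, threshold=5.0)
```

With these assumed timestamps, `sets` holds the two accompanied sets and `lonely` holds the domain names that have no time-accompanied neighbor.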

After the domain name set is partitioned, the malicious domain name can be detected through other steps in fig. 1.

In step 130, the feature vector of each domain name is calculated by a maximum likelihood estimation method according to the joint probability distribution of each domain name set, using the feature vector of each domain name as a variable.

In some embodiments, the probability distribution of the domain name sequences is determined from the joint probability distribution of the domain name sets. And determining an objective function according to the probability distribution of the domain name sequence. And solving the objective function by using a maximum likelihood estimation method to obtain the characteristic vector of each domain name.

For example, the domain name sequence is S = {S₁, S₂, …, S_j, …, S_J}, where each S_j comprises a plurality of domain names d_i having a time-accompanying relationship. The objective function of the maximum likelihood estimation method can be set according to the following formula:

P(S) is the probability distribution of the domain name sequence, and P(S_j) is the joint probability distribution of each domain name set; P(S_j) takes the feature vectors v_i of the domain names d_i contained in S_j as variables. Solving the objective function by the maximum likelihood estimation method yields each v_i at which P(S) attains its maximum value.
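The formula itself is not reproduced in this text. One form consistent with the definitions of P(S), P(S_j) and v_i given here, and with the later statement that log P(S) is a weighted sum of the log P(S_j), is shown below as an assumption rather than as the published expression:

```latex
\{\hat{v}_i\} = \arg\max_{\{v_i\}} P(S),
\qquad
P(S) \propto \prod_{j=1}^{J} P(S_j)
```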

In some embodiments, in each domain name set of the domain name sequence, respectively combining all two domain names whose sorting distance is smaller than a threshold value into a domain name pair; and calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair. For example, domain name pairs may be combined by the embodiment in fig. 3.

Fig. 3 is a schematic diagram illustrating further embodiments of the malicious domain name detection method of the present disclosure.

As shown in fig. 3, for the domain name set S₂ = {d₄, d₅, d₆, d₇, d₈, d₉}, a sliding window can be set according to the threshold value of the sorting distance. For example, the length of the sliding window is 2, i.e. the threshold value of the sorting distance is 2, and each domain name d_k in S₂ whose distance before or after d_i is less than or equal to 2 is combined with d_i into a domain name pair.

In some embodiments, each d_i in the domain name set S₂ may be taken in turn as the current domain name (the diagonally shaded box in the figure), and the current domain name is then combined into domain name pairs with each of the other domain names (the dot-shaded boxes in the figure) in the sliding window that extends before and after it, with the current domain name as the center.

For example, when the current domain name is d₄, the sliding window contains the domain names d₅ and d₆, so the domain name pairs are (d₄, d₅) and (d₄, d₆). Next in order, the current domain name becomes d₅, the sliding window contains the domain names d₄, d₆ and d₇, and the domain name pairs are (d₅, d₄), (d₅, d₆) and (d₅, d₇). By analogy, all domain name pairs in S₂ can be determined.
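The pairing walked through above can be sketched in a few lines. The function name `window_pairs` is hypothetical; the window of 2 positions on each side of the current domain name follows the fig. 3 example.

```python
from typing import List, Tuple

def window_pairs(domain_set: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair each domain name with every other name at most `window`
    positions before or after it in the ordered set."""
    pairs: List[Tuple[str, str]] = []
    for i, current in enumerate(domain_set):
        lo = max(0, i - window)
        hi = min(len(domain_set), i + window + 1)
        for k in range(lo, hi):
            if k != i:  # the current domain name is not paired with itself
                pairs.append((current, domain_set[k]))
    return pairs

s2 = ["d4", "d5", "d6", "d7", "d8", "d9"]
pairs = window_pairs(s2, window=2)
# The first pairs are (d4, d5), (d4, d6), then (d5, d4), (d5, d6), (d5, d7).
```

Note that this sketch emits ordered pairs, so (d₄, d₅) and (d₅, d₄) both appear, as in the example; whether duplicates should be merged is not stated in the source.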

In some embodiments, the logarithm of the joint probability distribution for each set of domain names is calculated from a weighted sum of the logarithms of the joint probability distributions for each pair of domain names. And calculating the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set. And determining the target function according to the logarithm of the probability distribution of the domain name sequence. For example, the objective function may be set according to the following formula:

P(d_i, d_k) is the joint probability distribution of the domain name pair d_i, d_k, and ∝ is the proportionality symbol.
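The objective formula is not reproduced in this text. Based on the description (a weighted sum of pair log-probabilities within each set, then a weighted sum over sets), one consistent form is sketched below. It is an assumption, and any explicit weights are absorbed into the proportionality symbol:

```latex
\log P(S) \;\propto\; \sum_{j=1}^{J} \log P(S_j)
\;\propto\; \sum_{j=1}^{J} \; \sum_{(d_i, d_k) \in S_j} \log P(d_i, d_k)
```

Here (d_i, d_k) ∈ S_j denotes the domain name pairs formed inside S_j by the sliding window.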

In some embodiments, the product of the transpose of the feature vector of one domain name in a domain name pair and the feature vector of the other domain name is used as the variable of the Sigmoid function (or another sigmoid-type function) to determine the joint probability distribution of the domain name pair. For example, P(d_i, d_k) may be calculated using the following formula as the activation function:
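The formula is not reproduced in this text. Taking the standard definition of the Sigmoid function and the argument described above (the transpose of one feature vector multiplied by the other), it would presumably read:

```latex
P(d_i, d_k) = \sigma\!\left(v_i^{\top} v_k\right)
= \frac{1}{1 + \exp\!\left(-v_i^{\top} v_k\right)}
```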

For example, the feature vector v_i of the domain name d_i in the space R^d at which the objective function is maximized can be calculated in an iterative manner.
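The source does not say which iterative method is used. The sketch below assumes plain stochastic gradient ascent on the sum of log Sigmoid(v_iᵀ v_k) over the domain name pairs; the function names, learning rate, dimension and epoch count are hypothetical. Because only observed pairs are used, with no contrasting term, the vectors tend to align with one another, so this illustrates the iteration rather than a production-quality embedding.

```python
import math
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(vectors: Dict[str, List[float]], pairs: List[Pair]) -> float:
    """Sum of log sigmoid(v_i . v_k) over all domain name pairs."""
    return sum(math.log(sigmoid(dot(vectors[a], vectors[b]))) for a, b in pairs)

def train_vectors(pairs: List[Pair], dim: int = 4, lr: float = 0.1,
                  epochs: int = 100, seed: int = 0) -> Dict[str, List[float]]:
    """Gradient ascent on the pair log-likelihood (assumed optimizer)."""
    rng = random.Random(seed)
    names = sorted({d for pair in pairs for d in pair})
    vectors = {d: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for d in names}
    for _ in range(epochs):
        for a, b in pairs:
            va, vb = vectors[a], vectors[b]
            # d/dx log(sigmoid(x)) = 1 - sigmoid(x)
            g = 1.0 - sigmoid(dot(va, vb))
            new_a = [x + lr * g * y for x, y in zip(va, vb)]
            new_b = [y + lr * g * x for x, y in zip(va, vb)]
            vectors[a], vectors[b] = new_a, new_b
    return vectors

toy_pairs = [("d4", "d5"), ("d4", "d6"), ("d5", "d6"), ("d5", "d7")]
initial = train_vectors(toy_pairs, epochs=0)   # random starting vectors
learned = train_vectors(toy_pairs)             # vectors after ascent
```

Comparing `log_likelihood(initial, toy_pairs)` with `log_likelihood(learned, toy_pairs)` shows whether the iteration raised the objective.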

After the objective function is solved, the feature vector of each domain name can be obtained, and the malicious domain name can be detected through other steps in fig. 1.

In step 140, a machine learning model is used to determine whether each domain name is a malicious domain name according to the feature vector of each domain name. For example, a machine learning model may be trained as a classifier for detecting malicious domain names.
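The source leaves the choice of machine learning model open. As one possible illustration, the sketch below trains a small logistic regression classifier on labeled feature vectors; the model choice, function names, toy vectors and labels are assumptions and are not taken from the patent.

```python
import math
from typing import List, Tuple

def predict_proba(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that feature vector x belongs to a malicious domain name."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Stochastic gradient ascent on the logistic log-likelihood."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = label - predict_proba(weights, bias, x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

# Hypothetical 2-D feature vectors; label 1 = malicious, 0 = benign.
X = [[0.9, 0.8], [1.0, 0.7], [-0.8, -0.9], [-0.7, -1.0]]
y = [1, 1, 0, 0]
weights, bias = fit_logistic(X, y)
flag_a = predict_proba(weights, bias, [0.8, 0.9]) > 0.5
flag_b = predict_proba(weights, bias, [-0.9, -0.8]) > 0.5
```

Any classifier that accepts fixed-length vectors could take the place of this one; the feature vectors would come from the maximum likelihood step rather than being hand-written as here.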

Fig. 4 shows a flow diagram of further embodiments of the disclosed malicious domain name detection method.

As shown in fig. 4, in step 410, the network request and the request time of each domain name are obtained from the DNS traffic data.

In some embodiments, steps 420 and 430 may be performed by a domain name sequence temporal relationship clustering module.

In step 420, the domain names are sorted by using the time sequence of the network requests of the domain names to generate a domain name sequence.

In step 430, clustering is performed on each domain name in the domain name sequence at time intervals, and domain names with time association relations are divided into the same cluster to form a time association domain name set.

In some embodiments, D is the DNS traffic data monitored at the DNS server. D may be decomposed into sets of DNS requests for different end users, i.e., D = {U_n, D_n}. U_n is an end user, and D_n = {(d₁, t₁), (d₂, t₂), …, (d_i, t_i), …, (d_I, t_I)} is the set in which the DNS requests for each domain name are distributed on the time axis; (d_i, t_i) represents that user U_n requested the domain name d_i at time t_i.
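The decomposition of D into per-user request sequences can be sketched as follows. The record layout `(user, domain, time)`, the function name `split_by_user`, and the sample values are hypothetical; real DNS logs would need parsing first.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (user U_n, domain d_i, request time t_i)

def split_by_user(records: List[Record]) -> Dict[str, List[Tuple[str, float]]]:
    """Group DNS requests by end user and order each group on the time axis."""
    per_user: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for user, domain, t in records:
        per_user[user].append((domain, t))
    return {u: sorted(reqs, key=lambda r: r[1]) for u, reqs in per_user.items()}

# Hypothetical records (not taken from the source).
records = [("U1", "a.example", 3.0), ("U2", "b.example", 1.0),
           ("U1", "c.example", 1.5), ("U2", "a.example", 2.0)]
D = split_by_user(records)
```

Each value of `D` then plays the role of one D_n, ready to be split into time-accompanied domain name sets.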

In some embodiments, the domain name sequence may be cluster partitioned based on the temporal clustering properties of the DNS requests, and the temporal attendant properties of the DNS requests for the domain names.

For example, if the time interval of DNS requests of adjacent 2 domain names is greater than a threshold, the 2 domain names are divided into different domain name sets; otherwise, the domain names are divided into the same domain name set. In this way, the domain name sequence may be divided into a plurality of domain name sets that are time-dependent, i.e., time-associated domain name sets extracted from the DNS traffic D.

In some embodiments, steps 440-460 may be performed by a spatiotemporal companion domain name feature learning module.

In step 440, the time companion domain name set is divided into pairs of companion domain names with time companion relationships using a sliding window approach.

In some embodiments, in order to reduce the complexity of the maximum likelihood probability calculation of the domain name set, the domain name set may be divided into a plurality of domain name pairs in a sliding window manner. For example, the window size may be set to 2, and the domain name set may be divided into a plurality of domain name pairs. The size of the sliding window of the domain name set including 5 domain names may be set to 2 (a window whose front and rear distances are 2 with the current domain name as the center) according to the manner in fig. 3, and the domain name pairs are divided.

In step 450, the domain name pair association probability is expressed using an activation function to determine an objective function for the maximum likelihood estimation method.

In step 460, the feature vector for the domain name is found by iteratively maximizing the joint probability.

In step 470, in combination with the labeled domain name sample data, a malicious domain name detection classifier (machine learning model) is trained for detecting more unknown malicious domain names.

In the above embodiment, the domain name sequence segmentation algorithm based on time intervals can effectively extract the domain name sets with the accompanying relationship from the DNS traffic. And mapping each domain name into a feature vector of a low-dimensional space by using an unsupervised domain name quantization algorithm, and keeping the adjoint relationship between the domain names. Based on the hash sum with the companion relationship, the potential time companion relationship of the domain name can be automatically mined from the original DNS traffic for detecting malicious domain names.

Therefore, potential domain name accompanying relations can be automatically mined from DNS traffic and mapped into feature vectors, manual expert experience is not needed, and complex work of manually designing features is omitted. The accompanying relationships are mined using the chronological order of the network requests without additional information. Moreover, this enables to deal with malicious domain names that do not have normal responses; the domain names with the accompanying relation are clustered, and the clustering method can be used for malicious domain name group discovery.

Fig. 5 illustrates a block diagram of some embodiments of the malicious domain name detection apparatus of the present disclosure.

As shown in fig. 5, the detection apparatus 5 of a malicious domain name includes a generation unit 51, a clustering unit 52, a calculation unit 53, and a judgment unit 54.

The generation unit 51 sorts the domain names according to the time sequence of the network requests of the domain names, and generates a domain name sequence.

The clustering unit 52 clusters the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence, and obtains a set of the domain names.

In some embodiments, clustering unit 52 partitions adjacent domain names having a time interval less than a threshold into a set of domain names.

The calculation unit 53 calculates the feature vector of each domain name by using the maximum likelihood estimation method according to the joint probability distribution of each domain name set, using the feature vector of each domain name as a variable.

In some embodiments, the calculation unit 53 determines the probability distribution of the domain name sequences based on the joint probability distribution of the domain name sets; the calculating unit 53 determines an objective function according to the probability distribution of the domain name sequence; the calculation unit 53 solves the objective function by using a maximum likelihood estimation method, and obtains the feature vector of each domain name.

In some embodiments, the calculating unit 53 respectively combines, in each domain name set of the domain name sequence, all two domain names whose sorting distance is smaller than the threshold value into a domain name pair; the calculation unit 53 calculates the joint probability distribution of each domain name set based on the joint probability distribution of each domain name pair.

In some embodiments, the calculating unit 53 calculates the logarithm of the joint probability distribution of each domain name set according to a weighted sum of the logarithms of the joint probability distribution of each domain name pair; the calculating unit 53 calculates the logarithm of the probability distribution of the domain name sequence based on the weighted sum of the logarithms of the joint probability distribution of each domain name set; the calculation unit 53 determines an objective function from the logarithm of the probability distribution of the domain name sequence.

In some embodiments, the calculation unit 53 determines the joint probability distribution of a domain name pair by taking the vector product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the variable of the Sigmoid function.

The judgment unit 54 determines whether each domain name is a malicious domain name by using a machine learning model according to the feature vector of each domain name.

Fig. 6 shows a block diagram of further embodiments of the malicious domain name detection apparatus of the present disclosure.

As shown in fig. 6, the detection apparatus 6 for a malicious domain name according to this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute the method for detecting a malicious domain name in any one of the embodiments of the present disclosure based on instructions stored in the memory 61.

The memory 61 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), a database, and other programs.

As shown in fig. 7, the detection apparatus 7 for a malicious domain name according to this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute the malicious domain name detection method in any of the foregoing embodiments based on instructions stored in the memory 710.

The memory 710 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs.

The malicious domain name detection device 7 may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card and a USB flash drive.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

So far, the detection method of a malicious domain name, the detection apparatus of a malicious domain name, and the nonvolatile computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A method for detecting a malicious domain name, comprising the following steps:

sequencing the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence;

clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets;

calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable;

and judging whether each domain name is a malicious domain name or not by utilizing a machine learning model according to the feature vector of each domain name.

2. The detection method according to claim 1, wherein the calculating the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable comprises:

determining the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set;

determining an objective function according to the probability distribution of the domain name sequence;

and solving the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.

3. The detection method according to claim 2, wherein the determining the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set comprises:

in each domain name set of the domain name sequence, combining every two domain names whose sorting distance is smaller than a threshold value into a domain name pair;

and calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.

4. The detection method according to claim 3, wherein the determining the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set comprises:

calculating the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair;

calculating the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set;

the determining the objective function according to the probability distribution of the domain name sequence includes:

and determining the objective function according to the logarithm of the probability distribution of the domain name sequence.

5. The detection method according to claim 3, wherein the calculating the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair comprises:

and determining the joint probability distribution of the domain name pair by taking the product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the argument of the Sigmoid function.

6. The detection method according to any one of claims 1 to 5, wherein the clustering the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets comprises:

and grouping adjacent domain names whose time interval is smaller than a threshold value into the same domain name set.

7. A malicious domain name detection apparatus, comprising:

a generating unit, configured to sequence the domain names according to the time sequence of the network requests of the domain names to generate a domain name sequence;

a clustering unit, configured to cluster the domain names according to the time interval of the network requests of the adjacent domain names in the domain name sequence to obtain domain name sets;

a calculating unit, configured to calculate the feature vector of each domain name by using a maximum likelihood estimation method according to the joint probability distribution of each domain name set by using the feature vector of each domain name as a variable;

and a judging unit, configured to judge whether each domain name is a malicious domain name by utilizing a machine learning model according to the feature vector of each domain name.

8. The detection apparatus according to claim 7,

wherein the calculating unit determines the probability distribution of the domain name sequence according to the joint probability distribution of each domain name set, determines an objective function according to the probability distribution of the domain name sequence, and solves the objective function by using a maximum likelihood estimation method to obtain the feature vector of each domain name.

9. The detection apparatus according to claim 8,

wherein the calculating unit combines, in each domain name set of the domain name sequence, every two domain names whose sorting distance is smaller than a threshold value into a domain name pair, and calculates the joint probability distribution of each domain name set according to the joint probability distribution of each domain name pair.

10. The detection apparatus according to claim 9,

wherein the calculating unit calculates the logarithm of the joint probability distribution of each domain name set according to the weighted sum of the logarithms of the joint probability distribution of each domain name pair, calculates the logarithm of the probability distribution of the domain name sequence according to the weighted sum of the logarithms of the joint probability distribution of each domain name set, and determines the objective function according to the logarithm of the probability distribution of the domain name sequence.

11. The detection apparatus according to claim 9,

wherein the calculating unit determines the joint probability distribution of the domain name pair by taking the product of the transpose of the feature vector of one domain name in the domain name pair and the feature vector of the other domain name as the argument of the Sigmoid function.

12. The detection apparatus according to any one of claims 7-11, wherein the clustering unit groups adjacent domain names whose time interval is smaller than a threshold value into the same domain name set.

13. A malicious domain name detection apparatus, comprising:

a memory; and

a processor coupled to the memory, the processor configured to perform the method of detecting a malicious domain name of any of claims 1-6 based on instructions stored in the memory.

14. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of detecting a malicious domain name according to any one of claims 1 to 6.
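
By way of illustration only, and without limiting the claims, the feature-vector calculation recited in claims 2 to 5 might be sketched as below: domain name pairs are formed within each set, each pair's probability is the Sigmoid of the product of one transposed feature vector with the other, and the log-likelihood is a weighted sum of pair log-probabilities that is increased by gradient ascent. Everything named here is an assumption made for the sketch: the function names, all weights equal to 1, the vector dimension, the learning rate, the fixed step count, and the example domain names. The sketch has no regularization or negative examples, so it runs a fixed number of steps instead of iterating to convergence, and the classification step is omitted.

```python
import math
import random
from typing import Dict, List, Tuple

Vectors = Dict[str, List[float]]


def make_pairs(domain_set: List[str], max_distance: int) -> List[Tuple[str, str]]:
    """Pair every two distinct domains whose sorting distance
    (difference of positions in the set) is smaller than max_distance."""
    pairs = []
    for i in range(len(domain_set)):
        for j in range(i + 1, min(i + max_distance, len(domain_set))):
            if domain_set[i] != domain_set[j]:
                pairs.append((domain_set[i], domain_set[j]))
    return pairs


def pair_probability(u: List[float], v: List[float]) -> float:
    """Sigmoid of u^T v."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))


def log_likelihood(sets: List[List[str]], vectors: Vectors, max_distance: int) -> float:
    """Sum of pair log-probabilities over all sets (all weights assumed to be 1)."""
    return sum(
        math.log(pair_probability(vectors[a], vectors[b]))
        for s in sets
        for a, b in make_pairs(s, max_distance)
    )


def init_vectors(sets: List[List[str]], dim: int = 4, seed: int = 0) -> Vectors:
    """Small random starting vectors, one per distinct domain (assumed initialization)."""
    rng = random.Random(seed)
    return {d: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in sets for d in s}


def fit(sets: List[List[str]], vectors: Vectors, max_distance: int = 3,
        steps: int = 50, lr: float = 0.1) -> Vectors:
    """Per-pair gradient ascent on the log-likelihood; returns new vectors."""
    vectors = dict(vectors)  # entries are replaced, never mutated in place
    for _ in range(steps):
        for s in sets:
            for a, b in make_pairs(s, max_distance):
                u, v = vectors[a], vectors[b]
                g = 1.0 - pair_probability(u, v)  # d/dx log(sigmoid(x)) at x = u^T v
                vectors[a] = [ui + lr * g * vi for ui, vi in zip(u, v)]
                vectors[b] = [vi + lr * g * ui for ui, vi in zip(u, v)]
    return vectors


sets = [["a.example", "b.example", "c.example"], ["d.example", "e.example"]]
start = init_vectors(sets)
fitted = fit(sets, start)
before = log_likelihood(sets, start, max_distance=3)
after = log_likelihood(sets, fitted, max_distance=3)
```

In this sketch the fitted vectors of domain names that co-occur in a set move toward one another, which raises the log-likelihood relative to the starting vectors; a vector obtained this way would then serve as the input to the machine learning model of claim 1.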