CN113572770B

CN113572770B - Method and device for detecting domain name generated by domain name generation algorithm

Info

Publication number: CN113572770B
Application number: CN202110841774.7A
Authority: CN
Inventors: 叶晓俊; 来耀; 平国楼
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2022-09-02
Anticipated expiration: 2041-07-26
Also published as: CN113572770A

Abstract

The application discloses a method and a device for detecting a domain name generated by a domain name generation algorithm (DGA). The method comprises the following steps: extracting top-level domain name features from the top-level domain name by using one-hot encoding and two fully connected layers; extracting secondary domain name features from the secondary domain name by using a long short-term memory (LSTM) network; splicing the top-level domain name features and the secondary domain name features to obtain domain name fusion features; and detecting and classifying the DGA domain name according to the domain name fusion features, and obtaining a detection result for the DGA family by applying the anomaly detection model of the class indicated by the classification result. The method of the embodiments of the application can detect DGA domain names of known families in real time and can also detect DGA domain names of unknown families, so that potential network attacks by new DGA families can be discovered in time.

Description

Method and device for detecting domain name generated by domain name generation algorithm

Technical Field

The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting a domain name generated by a domain name generation algorithm.

Background

At present, attacks on the Internet are gradually shifting toward large-scale network attacks. According to statistics from laboratories of foreign security companies, 15,314 large-scale network attacks were detected in 2018, covering 101 countries and regions worldwide. Most such large-scale network attacks evade detection by security software by obfuscating traffic with the Domain-Flux technique. This technique uses a Domain Generation Algorithm (DGA) to generate a series of domain names, through which hosts controlled by the attacker communicate. Such network attacks are typically detected by detecting DGA domain names in network traffic.

In the related art, methods for detecting DGA-generated domain names fall mainly into three types: methods based on reverse engineering, methods based on traffic analysis, and methods based on domain name analysis. The reverse-engineering-based method performs reverse analysis on collected DNS (Domain Name System) traffic samples to recover the domain name generation algorithm, and then detects DGA domain names by generating a blacklist. This approach requires a large amount of manual analysis and is therefore slow to respond to new malicious DGAs. The traffic-analysis-based method identifies DGA domain names by analyzing the behavior of DNS traffic. It can take the behavioral characteristics of domain name queries into account, but it needs to collect a large amount of DNS traffic information and cannot detect DGA domain names in real time. The domain-name-analysis-based method relies on the fact that normal domain names and the DGA domain names of the various families have different text characteristics, and detects DGA domain names by analyzing the domain name text directly. This method achieves real-time detection with high accuracy and is widely used for DGA domain name detection.

However, most domain-name-analysis-based DGA detection methods use machine learning to detect and classify DGA domain names of known families only. As a result, DGA domain names of unknown families are wrongly classified either as normal benign domain names or as DGA domain names of a known family, so that the domain name attack either is not detected in time or is detected but misjudged as an attack by a known DGA family.

Disclosure of Invention

The application provides a method and a device for detecting a domain name generated by a domain name generation algorithm, which are used to solve the following problem in the related art: because machine learning methods detect only DGA domain names of known families, DGA domain names of unknown families are wrongly classified as normal benign domain names or as DGA domain names of a known family, so that the DGA domain name attack either cannot be detected in time or is detected but misjudged as an attack by a known DGA family.

An embodiment of a first aspect of the present application provides a method for detecting a domain name generated by a domain name generation algorithm, including the following steps: extracting top-level domain name features from the top-level domain name by using one-hot encoding and two fully connected layers; extracting secondary domain name features from the secondary domain name by using a long short-term memory network; splicing the top-level domain name features and the secondary domain name features to obtain domain name fusion features; and detecting and classifying the DGA domain name according to the domain name fusion features, and obtaining a detection result for the domain name generation algorithm family by applying the anomaly detection model of the class indicated by the classification result.

Optionally, in an embodiment of the present application, the method further includes: collecting, for the correctly classified samples of each domain name generation algorithm family, the maximum value of the output vector at the last fully connected layer, and training a corresponding one-class support vector machine model; and inputting the output of each sample at the fully connected layer into the one-class support vector machine model of the corresponding class to perform anomaly detection, so as to identify new classes belonging to unknown domain name generation algorithm families.

Optionally, in an embodiment of the present application, the dimension of the one-hot encoding vector is equal to the number of top-level domain names, and in the one-hot encoding of each top-level domain name, the value at the index corresponding to that top-level domain name is 1 and the values at all other indices are 0.

Optionally, in an embodiment of the present application, extracting the secondary domain name features from the secondary domain name by using a long short-term memory network includes: padding the secondary domain names used for training to a preset length with null characters; and converting the character representation into a numerical representation in matrix form, which is input into the long short-term memory network for feature learning, so as to extract the secondary domain name features from the secondary domain name.

Optionally, in an embodiment of the present application, before extracting the secondary domain name features from the secondary domain name by using the long short-term memory network, the method further includes: setting the dimension of the vector output by the long short-term memory network to be the same as the dimension of the learned top-level domain name feature vector.

An embodiment of a second aspect of the present application provides an apparatus for detecting a domain name generated by a domain name generation algorithm, including: a first extraction module, configured to extract top-level domain name features from the top-level domain name by using one-hot encoding and two fully connected layers; a second extraction module, configured to extract secondary domain name features from the secondary domain name by using a long short-term memory network; a fusion module, configured to splice the top-level domain name features and the secondary domain name features to obtain domain name fusion features; and a first detection module, configured to detect and classify the DGA domain name according to the domain name fusion features, and to obtain a detection result for the domain name generation algorithm (DGA) family by applying the anomaly detection model of the class indicated by the classification result.

Optionally, in an embodiment of the present application, the apparatus further includes: a training module, configured to collect, for the correctly classified samples of each domain name generation algorithm family, the maximum value of the output vector at the last fully connected layer, and to train a corresponding one-class support vector machine model; and a second detection module, configured to input the output of each sample at the fully connected layer into the one-class support vector machine model of the corresponding class to perform anomaly detection, so as to identify new classes belonging to unknown domain name generation algorithm families.

Optionally, in an embodiment of the present application, the second extraction module includes: a padding unit, configured to pad the secondary domain names used for training to a preset length with null characters; and a learning unit, configured to convert the character representation into a numerical representation in matrix form, which is input into the long short-term memory network for feature learning, so as to extract the secondary domain name features from the secondary domain name.

An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program and a learning model stored on the memory and executable on the processor, the processor executing the program and the model to implement the method of detecting a domain name generated by a domain name generation algorithm as described in the above embodiments.

A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program and a learning model are stored, wherein the program and the learning model are executed by a processor for implementing a method for detecting a domain name generated by a domain name generation algorithm as described in the above embodiments.

In the embodiments of the present application, a classification detection model is established for DGA domain names of known families, and, on the basis of detecting domain names of known DGA families, an anomaly detection model is established for the DGA domain names of each family, thereby enabling detection of domain names of unknown DGA families. For detecting DGA domain names of known families, features are extracted from the top-level domain name by one-hot encoding and from the secondary domain name by a long short-term memory network, and a learning model detects and classifies the DGA domain name on the basis of the fused top-level and secondary domain name features. For detecting DGA domain names of unknown families, a one-class support vector machine is fitted, for each class, to the outputs that the correctly classified samples of that class produce in the classification detection model for known families; at detection time, the anomaly detection model of the class given by the initial classification result is then applied. In this way, unknown-class detection technology is fully used to identify DGA domain names of unknown families, and domain names on the DNS can be detected in real time, so that suspicious domain names can be intercepted and network security is ensured. The method and the device therefore solve the problem in the related art that, because machine learning methods detect only DGA domain names of known families, DGA domain names of unknown families are wrongly classified as normal benign domain names or as DGA domain names of a known family, so that the domain name attack either cannot be detected in time or is detected but misjudged as an attack by a known family.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of a method for detecting a domain name generated by a domain name generation algorithm according to an embodiment of the present application;

FIG. 2 is a flow diagram of a method for detecting a domain name generated by a domain name generation algorithm according to one embodiment of the present application;

FIG. 3 is a flow diagram during a training phase for unknown DGA family domain name detection according to one embodiment of the present application;

FIG. 4 is a flow diagram during a testing phase for unknown DGA family domain name detection according to one embodiment of the present application;

FIG. 5 is an exemplary diagram of an apparatus for detecting a domain name generated by a domain name generation algorithm according to an embodiment of the present application;

FIG. 6 is an exemplary diagram of an apparatus for detecting a domain name generated by a domain name generation algorithm in accordance with one embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Description of reference numerals:

10-apparatus for detecting a domain name generated by a domain name generation algorithm; 100-first extraction module, 200-second extraction module, 300-fusion module, 400-first detection module, 500-training module, 600-second detection module; 701-memory, 702-processor, and 703-communication interface.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.

The following describes a method and an apparatus for detecting a domain name generated by a domain name generation algorithm according to embodiments of the present application with reference to the drawings. The Background section noted that, in the related art, machine learning methods detect only DGA domain names of known families, so that DGA domain names of unknown families are wrongly classified as normal benign domain names or as DGA domain names of a known family, and the domain name attack either cannot be detected in time or is detected but misjudged as an attack by a known DGA family. To address this problem, the application provides a method for detecting a domain name generated by a domain name generation algorithm. In the method, a classification detection model is established for DGA domain names of known families, and, on the basis of detecting domain names of known DGA families, an anomaly detection model is established for the DGA domain names of each family, thereby enabling detection of domain names of unknown DGA families. For detecting DGA domain names of known families, features are extracted from the top-level domain name by one-hot encoding and from the secondary domain name by a long short-term memory network, and the DGA domain names are detected and classified on the basis of the fused features. For detecting DGA domain names of unknown families, a one-class support vector machine is fitted, for each class, to the outputs that the correctly classified samples of that class produce in the classification detection model for known families; at detection time, the anomaly detection model of the class given by the initial classification result is then applied. In this way, unknown-class detection technology is fully used to identify DGA domain names of unknown families, and domain names on a DNS (Domain Name System) server can be detected in real time, so that suspicious domain names can be intercepted immediately and network security is ensured. The problem described above is thereby solved.

Specifically, fig. 1 is a schematic flowchart of a method for detecting a domain name generated by a domain name generation algorithm according to an embodiment of the present application.

As shown in fig. 1, the method for detecting a domain name generated by a domain name generation algorithm includes the following steps:

In step S101, top-level domain name features are extracted from the top-level domain name by using one-hot encoding and two fully connected layers.

It can be understood that, in step S101, the top-level domain name features are extracted by one-hot encoding followed by two fully connected layers. The domain name to be detected comprises a top-level domain name and a secondary domain name. Taking the domain name to be detected as the object, the domain name is divided into levels by periods ("."): the substring after the last period is called the top-level domain name part, and the substring immediately to the left of the top-level domain name is called the secondary domain name part.
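The splitting rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function name `split_domain` is hypothetical, and lowercasing the input and ignoring a trailing root dot are assumptions not stated in the text. Multi-label suffixes such as "co.uk" are deliberately not treated specially, because the text defines the top-level domain name as the substring after the last period.

```python
from typing import Tuple


def split_domain(domain: str) -> Tuple[str, str]:
    """Return (secondary_domain, top_level_domain) for a dotted domain name.

    Rule taken from the text: the substring after the last period is the
    top-level domain, and the label immediately to its left is the secondary
    domain. Lowercasing and ignoring a trailing root dot are assumptions.
    """
    labels = domain.strip().lower().rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"expected at least two non-empty labels: {domain!r}")
    return labels[-2], labels[-1]


if __name__ == "__main__":
    for name in ["www.example.com", "kq3v9z7xw1.net"]:
        print(name, "->", split_domain(name))
```

Labels further to the left (such as "www") are discarded here, since the method as described uses only the top-level and secondary parts.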

Optionally, in an embodiment of the present application, the dimension of the one-hot encoding vector is equal to the number of top-level domain names, and in the one-hot encoding of each top-level domain name, the value at the index corresponding to that top-level domain name is 1 and the values at all other indices are 0.

Specifically, extracting the top-level domain name features by using one-hot encoding and two fully connected layers means that a one-hot encoding is first established for the top-level domain name. The dimension of the one-hot vector is equal to the number of existing top-level domain names; in the one-hot encoding of each top-level domain name, the value at the index corresponding to that top-level domain name is 1 and the values at all other indices are 0. The one-hot vector is then passed through two fully connected layers of the neural network (that is, multiplied by two different weight matrices), and the result is output as the extracted feature.
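As a rough illustration of this step, the sketch below builds a one-hot vector for a top-level domain name and multiplies it by two weight matrices using NumPy. The five-entry vocabulary `TLD_VOCAB`, the layer sizes, and the random weights are all assumptions chosen for demonstration; in the described method the weights would be learned by back propagation. Bias terms and activation functions are omitted because the text describes the two layers only as multiplication by two weight matrices.

```python
import numpy as np

# Hypothetical vocabulary; a real system would list every top-level domain it handles.
TLD_VOCAB = ["com", "net", "org", "info", "biz"]
TLD_INDEX = {tld: i for i, tld in enumerate(TLD_VOCAB)}


def one_hot_tld(tld: str) -> np.ndarray:
    """Vector of length len(TLD_VOCAB): 1 at the TLD's index, 0 elsewhere."""
    vec = np.zeros(len(TLD_VOCAB), dtype=np.float32)
    vec[TLD_INDEX[tld]] = 1.0  # raises KeyError for a TLD outside the vocabulary
    return vec


def tld_features(one_hot: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two fully connected layers, written as two successive matrix products."""
    return (one_hot @ w1) @ w2


rng = np.random.default_rng(0)
HIDDEN_DIM, FEATURE_DIM = 8, 4  # assumed layer sizes
w1 = rng.normal(size=(len(TLD_VOCAB), HIDDEN_DIM)).astype(np.float32)
w2 = rng.normal(size=(HIDDEN_DIM, FEATURE_DIM)).astype(np.float32)

features = tld_features(one_hot_tld("net"), w1, w2)
print(features.shape)
```

Because the input is one-hot, the first product simply selects one row of `w1`, so the first layer behaves like a learned lookup table with one row per top-level domain name.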

In step S102, secondary domain name features are extracted from the secondary domain name using a long-short term memory network.

It can be understood that, in step S102, the secondary domain name features are extracted by using the long short-term memory network.

Optionally, in an embodiment of the present application, extracting the secondary domain name features from the secondary domain name by using a long short-term memory network includes: padding the secondary domain names used for training to a preset length with null characters; and converting the character representation into a numerical representation in matrix form, which is input into the long short-term memory network for feature learning, so as to extract the secondary domain name features from the secondary domain name.

Specifically, extracting the secondary domain name features by using the long short-term memory network means that the secondary domain name is first padded with null characters to a fixed length, then converted by a character representation method into a numerical representation in matrix form, and input into the LSTM (Long Short-Term Memory) network to learn the secondary domain name features. The long short-term memory network is a widely used recurrent neural network, and its structure may follow the standard structure.
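The padding and character-to-matrix conversion might look like the following sketch. Several details are assumptions, since the text does not fix them: the alphabet (lowercase letters, digits, hyphen, underscore), right-side padding, truncation of over-long labels, a preset length of 63 (the DNS limit for a single label), and the use of one one-hot row per character as the "numerical representation in matrix form". The names `pad_label` and `encode_label` are hypothetical.

```python
import string

import numpy as np

PAD = "\0"  # null character used for padding, as in the text
# Assumed alphabet: padding symbol first, then characters expected in a label.
ALPHABET = PAD + string.ascii_lowercase + string.digits + "-_"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
MAX_LEN = 63  # assumed preset length (a DNS label holds at most 63 octets)


def pad_label(label: str, max_len: int = MAX_LEN) -> str:
    """Right-pad with null characters; longer labels are truncated (assumption)."""
    return label.lower()[:max_len].ljust(max_len, PAD)


def encode_label(label: str, max_len: int = MAX_LEN) -> np.ndarray:
    """Return a (max_len, len(ALPHABET)) matrix with one one-hot row per character."""
    padded = pad_label(label, max_len)
    matrix = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for position, ch in enumerate(padded):
        matrix[position, CHAR_INDEX[ch]] = 1.0  # KeyError for unsupported characters
    return matrix


encoded = encode_label("kq3v9z7xw1")
print(encoded.shape)
```

Each row of the resulting matrix would then be fed to the LSTM as one time step. A learned character embedding would be an equally plausible reading of "character representation method".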

In addition, in an embodiment of the present application, before extracting the secondary domain name features from the secondary domain name by using the long short-term memory network, the method further includes: setting the dimension of the vector output by the long short-term memory network to be the same as the dimension of the learned top-level domain name feature vector.

That is, the output vector of the long short-term memory network and the learned top-level domain name feature vector have the same dimension.
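To make the dimension constraint concrete, the sketch below runs a standard LSTM forward pass in NumPy and uses the final hidden state as the secondary domain name feature, with the hidden size set equal to an assumed top-level feature dimension of 4. Taking the last hidden state as the feature, the gate ordering inside the weight matrices, the random weights, and the random stand-in input are all assumptions made for illustration; a trained model would learn these weights by back propagation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_hidden(inputs: np.ndarray, w_x: np.ndarray,
                     w_h: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Run a standard LSTM over inputs (T x D) and return the final hidden state.

    Shapes: w_x is (D, 4H), w_h is (H, 4H), b is (4H,). The four gate blocks
    are ordered input, forget, candidate, output (an arbitrary convention).
    """
    hidden_dim = w_h.shape[0]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in inputs:
        z = x_t @ w_x + h @ w_h + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
        h = sigmoid(o) * np.tanh(c)                   # update the hidden state
    return h


TLD_FEATURE_DIM = 4           # assumed size of the top-level domain feature vector
HIDDEN_DIM = TLD_FEATURE_DIM  # LSTM output dimension set equal to it
SEQ_LEN, INPUT_DIM = 10, 39   # hypothetical: 10 time steps, 39-symbol alphabet

rng = np.random.default_rng(0)
w_x = rng.normal(scale=0.1, size=(INPUT_DIM, 4 * HIDDEN_DIM))
w_h = rng.normal(scale=0.1, size=(HIDDEN_DIM, 4 * HIDDEN_DIM))
b = np.zeros(4 * HIDDEN_DIM)

# Random 0/1 matrix standing in for an encoded secondary domain name.
encoded = rng.integers(0, 2, size=(SEQ_LEN, INPUT_DIM)).astype(float)
sld_feature = lstm_last_hidden(encoded, w_x, w_h, b)
print(sld_feature.shape)
```

With equal dimensions, the two feature vectors contribute the same number of components to the fused vector.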

In step S103, the top-level domain name feature and the secondary-level domain name feature are spliced to obtain a domain name fusion feature.

It can be understood that, in step S103, the top-level domain name features and the secondary domain name features are concatenated, that is, the extracted top-level domain name features and the secondary domain name features are vector-concatenated to form a merged domain name feature vector.

In step S104, the DGA domain name is detected and classified according to the domain name fusion features, and a detection result for the domain name generation algorithm family is obtained by applying the anomaly detection model of the class indicated by the classification result.

It can be understood that, in step S104, the spliced features are input into two fully connected layers for classification. Both DGA domain names of known families and DGA domain names of unknown families can thereby be detected, so the method can be used effectively for network attack detection. The method makes full use of the hierarchical structure of the domain name for feature extraction, and identifies DGA domain names of unknown families by adding an anomaly detection model, so that a high detection rate is achieved for DGA domain names of both known and unknown families and the security of network communication is ensured.

In short, the spliced domain name features are input into the two fully connected layers to obtain the final DGA domain name detection and identification result.
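The splicing and classification steps can be sketched as follows. The class list, the layer sizes, the ReLU between the two layers, the softmax at the output, and the random weights and inputs are assumptions for illustration only; the text specifies just that the concatenated vector passes through two fully connected layers.

```python
import numpy as np

CLASSES = ["benign", "family_a", "family_b"]  # hypothetical class labels


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the maximum for numerical stability
    return e / e.sum()


def classify(tld_feat, sld_feat, w1, b1, w2, b2):
    """Concatenate the two feature vectors and apply two fully connected layers."""
    fused = np.concatenate([tld_feat, sld_feat])  # domain name fusion feature
    hidden = np.maximum(0.0, fused @ w1 + b1)     # ReLU is an assumption
    logits = hidden @ w2 + b2                     # last-layer output before activation
    probs = softmax(logits)                       # last-layer output after activation
    return logits, probs, CLASSES[int(probs.argmax())]


rng = np.random.default_rng(0)
FEATURE_DIM, HIDDEN_DIM = 4, 8  # assumed sizes
tld_feat = rng.normal(size=FEATURE_DIM)  # stand-in for the top-level feature
sld_feat = rng.normal(size=FEATURE_DIM)  # stand-in for the LSTM feature
w1 = rng.normal(size=(2 * FEATURE_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
w2 = rng.normal(size=(HIDDEN_DIM, len(CLASSES)))
b2 = np.zeros(len(CLASSES))

logits, probs, label = classify(tld_feat, sld_feat, w1, b1, w2, b2)
print(label, probs.round(3))
```

The function returns both the pre-activation and post-activation outputs of the last layer because the later anomaly detection step draws on the maximum of the last-layer output vector.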

Optionally, in an embodiment of the present application, the method further includes: collecting, for the correctly classified samples of each DGA family, the maximum value of the output vector at the last fully connected layer, and training a corresponding one-class support vector machine model; and inputting the output of each sample at the fully connected layer into the one-class support vector machine model of the corresponding class to perform anomaly detection, so as to identify new classes belonging to unknown domain name generation algorithm families.

In an actual implementation, after step S104, the method according to the embodiment of the present application further includes: collecting, for the samples of each DGA family that are correctly classified in step S104, the maximum value of the output vector at the last fully connected layer to train a corresponding one-class support vector machine model, and inputting the output of each sample at the fully connected layer into the one-class support vector machine of the corresponding class to perform anomaly detection, so as to identify new classes belonging to unknown DGA families.

Specifically, steps S101 to S104 above constitute the DGA domain name detection and identification process. The weights of the model may be obtained by training on a large amount of labeled data through back propagation. The method further includes:

Step S105: for each DGA family, collect the maximum value of the output vector at the last fully connected layer for the samples correctly classified in step S104, and train the corresponding one-class support vector machine. That is, for each DGA family and for the normal domain name category, an anomaly detection model based on a one-class support vector machine is trained separately. The training samples of each model are the maximum values of the output vectors (both the vector before the activation function and the vector after the activation function) at the last fully connected layer for the samples of the corresponding category that are correctly identified by the DGA domain name identification method.

Step S106: input the output of each sample at the fully connected layer into the one-class support vector machine of the corresponding class to perform anomaly detection, and identify new classes belonging to unknown DGA families. First, the initial detection and identification result category of the domain name to be detected is obtained by the DGA domain name identification method of steps S101 to S104. Then the corresponding one-class support vector machine obtained in step S105 is used to perform anomaly detection. If the domain name to be detected is an abnormal sample, it is determined to be a DGA domain name of an unknown family; if it is a non-abnormal sample, the detection result obtained in step S104 is retained.
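Steps S105 and S106 can be sketched with scikit-learn's `OneClassSVM`. Everything numerical here is an assumption: the last-layer outputs are synthetic stand-ins rather than outputs of a trained network, the RBF kernel and `nu=0.05` are arbitrary choices, and the two-component feature (maximum before activation, maximum after a softmax activation) is one possible reading of "the maximum value of the output vector" before and after the activation function. The helper names and the `UNKNOWN` marker are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

UNKNOWN = -1  # hypothetical marker for "DGA domain name of an unknown family"


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def max_output_features(logits: np.ndarray) -> np.ndarray:
    """[maximum before activation, maximum after activation] for one sample."""
    return np.array([logits.max(), softmax(logits).max()])


def fit_detectors(logits_batch, labels, n_classes, nu=0.05):
    """Step S105: one one-class SVM per class, fitted on correctly classified samples."""
    predicted = logits_batch.argmax(axis=1)
    detectors = {}
    for cls in range(n_classes):
        correct = (labels == cls) & (predicted == cls)
        feats = np.array([max_output_features(z) for z in logits_batch[correct]])
        detectors[cls] = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(feats)
    return detectors


def detect(logits, detectors):
    """Step S106: keep the initial class unless its detector flags an anomaly."""
    cls = int(logits.argmax())
    verdict = detectors[cls].predict(max_output_features(logits).reshape(1, -1))[0]
    return cls if verdict == 1 else UNKNOWN


# Synthetic stand-in for last-layer outputs of a trained classifier (3 classes).
rng = np.random.default_rng(0)
N_CLASSES, PER_CLASS = 3, 200
labels = np.repeat(np.arange(N_CLASSES), PER_CLASS)
logits_batch = rng.normal(scale=0.3, size=(labels.size, N_CLASSES))
logits_batch[np.arange(labels.size), labels] += 6.0  # confident, correct outputs

detectors = fit_detectors(logits_batch, labels, N_CLASSES)
print(detect(np.array([6.0, 0.0, 0.0]), detectors))  # output resembling class 0
print(detect(np.array([1.0, 0.8, 0.9]), detectors))  # low-confidence output
```

The design follows the two-stage logic of the text: the classifier always proposes a known class first, and the one-class model for that class then decides whether the sample looks like the correctly classified training samples of that class or should be reported as belonging to an unknown family.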

In summary, the embodiments of the present application make full use of the hierarchical features of the domain name and use different network methods to extract the top-level domain name features and the secondary domain name features respectively, which increases the accuracy of DGA domain name detection. In addition, a one-class support vector machine anomaly detection model is added to the detection method, so that domain names of unknown DGA families can be detected while domain names of known DGA families are detected. This helps resist domain name attacks from the new DGA classes that continue to appear and protects the target host. The method is well suited to firewalls and intrusion detection products based on DNS traffic inspection.

The principles of the method of embodiments of the present application are described in detail below in one specific implementation.

FIG. 2 is a flowchart of detecting a known DGA family according to an embodiment of the present application. Both the training process and the detection process can be represented by this flowchart, which comprises two parts: feature extraction, and training and detection. In feature extraction, the domain name is divided into a top-level domain name and a secondary domain name, whose features are extracted by one-hot encoding and by a long short-term memory network respectively; the two feature vectors are then spliced into a single feature vector, which is identified and classified through the fully connected layers. In training, a large labeled domain name data set is used, in which each label is a numerical value representing the DGA family category to which the corresponding domain name belongs. The data set is used to fit the parameters of the module by methods such as neural network back propagation. In testing, a single domain name is input into the module with the parameters learned in the training stage, and the detection and identification result is obtained at the output of the module.

FIG. 3 is a flowchart of the training part added for unknown DGA family domain name detection. All samples in the training set that are correctly identified are found and input into the model of the method, and the maximum value of the output vector of the last layer of the model is extracted to fit the one-class support vector machine anomaly detection model of the corresponding class. In total, this process yields as many one-class support vector machines as there are DGA family classes. (When the domain name data to be detected contains normal domain names, the number of one-class support vector machines is the number of DGA families plus 1.)

FIG. 4 is a flowchart of the testing method of the unknown DGA domain name detection module. The domain name to be detected first passes through the known-DGA-family detection module to obtain an initial identification result. The one-class support vector machine of the corresponding class is found on the basis of that result, and the maximum value of the last-layer output vector produced by the domain name in the module is input into that one-class support vector machine to obtain its anomaly detection result. If the sample is judged to be abnormal, the domain name is regarded as a DGA domain name of an unknown family; if it is judged not to be abnormal, the identification result of the original identification module is retained.

According to the method for detecting a domain name generated by a domain name generation algorithm provided by the embodiments of the present application, a classification detection model is established for DGA domain names of known families, and, on the basis of detecting domain names of known DGA families, an anomaly detection model is established for the DGA domain names of each family, thereby enabling detection of domain names of unknown DGA families. For detecting DGA domain names of known families, features are extracted from the top-level domain name by one-hot encoding and from the secondary domain name by a long short-term memory network, and a learning model detects and classifies the DGA domain name on the basis of the fused top-level and secondary domain name features. For detecting DGA domain names of unknown families, a one-class support vector machine is fitted, for each class, to the outputs that the correctly classified samples of that class produce in the classification detection model for known families; at detection time, the anomaly detection model of the class given by the initial classification result is then applied. In this way, unknown-class detection technology is fully used to identify DGA domain names of unknown families, and domain names on the DNS server can be detected in real time, so that suspicious domain names can be intercepted and network security is ensured.

Next, an apparatus for detecting a domain name generated by a domain name generation algorithm according to an embodiment of the present application will be described with reference to the drawings.

Fig. 5 is a block diagram illustrating an apparatus for detecting a domain name generated by a domain name generation algorithm according to an embodiment of the present application.

As shown in fig. 5, the apparatus 10 for detecting a domain name generated by a domain name generation algorithm includes: a first extraction module 100, a second extraction module 200, a fusion module 300 and a first detection module 400.

The first extraction module 100 is configured to extract top-level domain name features from the top-level domain name by using one-hot encoding and two fully connected layers.

The second extraction module 200 is configured to extract secondary domain name features from the secondary domain name by using a long short-term memory network.

The fusion module 300 is configured to splice the top-level domain name features and the secondary domain name features to obtain domain name fusion features.

The first detection module 400 is configured to detect and classify DGA domain names according to the domain name fusion features, and to obtain a detection result for the domain name generation algorithm family from the classification result based on the anomaly detection model of the corresponding class.
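By way of non-limiting illustration, the data flow through modules 100 to 400 may be sketched as follows. The sketch makes several assumptions that are not taken from the present application: the top-level domain vocabulary and layer sizes are hypothetical, the weights are random rather than trained, the ReLU and softmax functions are assumed choices, and the output of the long short-term memory network is replaced by a placeholder vector of the same dimension as the top-level domain name features.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def one_hot(tld: str, vocabulary: List[str]) -> Vector:
    """1 at the index of `tld` in the vocabulary, 0 elsewhere."""
    return [1.0 if known == tld else 0.0 for known in vocabulary]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Untrained weights, used only to show the shapes involved."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer followed by ReLU (activation is an assumption)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax (assumed output normalisation)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
tlds = ["com", "net", "org", "info"]       # hypothetical TLD vocabulary
hidden, feature_dim, n_classes = 8, 4, 3   # illustrative sizes

# Module 100: one-hot code -> two fully connected layers.
w1, b1 = random_matrix(hidden, len(tlds), rng), [0.0] * hidden
w2, b2 = random_matrix(feature_dim, hidden, rng), [0.0] * feature_dim
tld_features = dense(dense(one_hot("net", tlds), w1, b1), w2, b2)

# Module 200: placeholder for the LSTM output, same dimension as tld_features.
sld_features = [rng.uniform(-1.0, 1.0) for _ in range(feature_dim)]

# Module 300: splice (concatenate) the two feature vectors.
fused = tld_features + sld_features

# Module 400: classification layer over the fused features.
w_out = random_matrix(n_classes, len(fused), rng)
logits = [sum(w * v for w, v in zip(row, fused)) for row in w_out]
probabilities = softmax(logits)
```

The fused vector has twice the dimension of either feature vector, and `probabilities` holds one value per known class; in a trained model the largest of these values would give the initial classification result.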

Optionally, in an embodiment of the present application, as shown in fig. 6, the apparatus 10 further includes: a training module 500 and a second detection module 600.

The training module 500 is configured to collect, during model training, the maximum value of the output vector at the last fully connected layer for the correctly classified samples of each domain name generation algorithm family, and to train a corresponding one-class support vector machine learning model.

The second detection module 600 is configured to input the output of each sample at the fully connected layer into the one-class support vector machine learning model of the corresponding class for anomaly detection, so as to identify new classes of unknown domain name generation algorithm families.
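By way of non-limiting illustration, the data collection performed by the training module 500 may be sketched as follows. The function name `collect_max_outputs`, the integer class indices and the sample values are hypothetical; the sketch only prepares the per-family training sets and does not itself fit a support vector machine.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One record per training sample: (true class index, last-layer output vector).
Sample = Tuple[int, Sequence[float]]


def collect_max_outputs(samples: List[Sample]) -> Dict[int, List[List[float]]]:
    """Group max(output) of correctly classified samples by family."""
    per_family: Dict[int, List[List[float]]] = defaultdict(list)
    for true_class, output in samples:
        top = max(output)
        predicted = list(output).index(top)
        if predicted == true_class:               # keep correctly classified only
            per_family[true_class].append([top])  # one feature per sample
    return dict(per_family)


samples = [
    (0, [0.90, 0.05, 0.05]),
    (0, [0.20, 0.70, 0.10]),   # misclassified as class 1, so it is skipped
    (1, [0.10, 0.85, 0.05]),
    (1, [0.05, 0.92, 0.03]),
]
training_sets = collect_max_outputs(samples)
```

Each entry of `training_sets` is a list of single-feature rows, which is the two-dimensional layout expected by common one-class support vector machine implementations such as `sklearn.svm.OneClassSVM`; the kernel and other hyperparameters are not specified here and would have to be chosen separately.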

Optionally, in an embodiment of the present application, the second extraction module includes: a completion unit and a learning unit.

The completion unit is configured to pad the secondary domain names used for training to a preset length with null characters.

The learning unit is configured to convert the character representation into a numerical matrix representation that is input into the long short-term memory network for feature learning, so that the secondary domain name features are extracted from the secondary domain name.
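By way of non-limiting illustration, the padding and character-to-matrix conversion may be sketched as follows. The character set `ALPHABET`, the right-padding and truncation policy, and the choice of one one-hot row per character are assumptions made for the sketch; the present application only states that null characters are used for padding and that the characters are converted into a numerical matrix.

```python
from typing import List

PAD = "\0"  # null character used for padding
ALPHABET = PAD + "abcdefghijklmnopqrstuvwxyz0123456789-_"  # assumed character set
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pad_label(label: str, length: int) -> str:
    """Right-pad (or truncate) a secondary domain name to `length` characters."""
    return label.lower()[:length].ljust(length, PAD)


def encode_label(label: str, length: int) -> List[List[int]]:
    """Return a length x len(ALPHABET) one-hot matrix for the padded label."""
    matrix = []
    for ch in pad_label(label, length):
        row = [0] * len(ALPHABET)
        row[CHAR_TO_INDEX[ch]] = 1  # raises KeyError for unsupported characters
        matrix.append(row)
    return matrix


matrix = encode_label("example", 10)
```

Every secondary domain name is thereby mapped to a matrix of the same shape, one row per character position, which can be fed to a recurrent network step by step; padded positions are marked by the column reserved for the null character.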

It should be noted that the foregoing explanation of the embodiment of the method for detecting a domain name generated by a domain name generation algorithm is also applicable to the apparatus for detecting a domain name generated by a domain name generation algorithm in this embodiment, and is not described herein again.

According to the apparatus for detecting a domain name generated by a domain name generation algorithm provided by the embodiment of the present application, a classification detection model is established for DGA domain names of known families and, on the basis of detecting domain names of known DGA families, an anomaly detection model is established for the DGA domain names of each family, so that domain names of unknown DGA families can also be detected. For the detection of known-family domain names, features are extracted from the top-level domain name by one-hot encoding and from the secondary domain name by a long short-term memory network, and the model learns to detect and classify DGA domain names on the basis of the fused top-level and secondary domain name features. For the detection of unknown-family domain names, an anomaly detection model is fitted by a one-class support vector machine to the outputs that the correctly classified samples of each class produce in the known-family detection and classification model. When domain names of unknown DGA families are to be detected, the anomaly detection model of the class given by the initial classification result is applied, so that DGA domain names of unknown families are identified by building on the existing class-wise detection. Domain names on a DNS server can thus be detected in real time, suspicious domain names can be intercepted, and network security can be safeguarded.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:

memory 701, processor 702, and a computer program and a learning model stored on memory 701 and executable on processor 702.

The programs and models, when executed by the processor 702, implement the method of detecting domain names generated by the domain name generation algorithm provided in the embodiments described above.

Further, the electronic device further includes:

a communication interface 703 for communication between the memory 701 and the processor 702.

A memory 701 for storing computer programs and learning models that may be run on the processor 702.

Memory 701 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.

If the memory 701, the processor 702 and the communication interface 703 are implemented independently, the communication interface 703, the memory 701 and the processor 702 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but that does not indicate only one bus or one type of bus.

Optionally, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may complete mutual communication through an internal interface.

The processor 702 may be a central processing unit (CPU), a deep learning architecture combining a CPU and a plurality of graphics processing units (GPUs), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.

The present embodiment also provides a computer-readable storage medium having stored thereon a computer program and a learning model, characterized in that the program and the model, when executed by a processor, implement the method of detecting a domain name generated by a domain name generation algorithm described above.

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art provided that they do not contradict one another.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or N executable instructions for implementing steps of a custom logic function or process. Alternate implementations are included within the scope of the preferred embodiment of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, such as Programmable Gate Arrays (PGA), Field Programmable Gate Arrays (FPGA), and the like.

It will be understood by those of ordinary skill in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program or a model instructing the associated hardware, where the program or model may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

Claims

1. A method for detecting a domain name generated by a domain name generation algorithm, comprising the steps of:

extracting top-level domain name features from a top-level domain name by using one-hot encoding and two fully connected layers;

extracting secondary domain name features from a secondary domain name by using a long short-term memory network;

splicing the top-level domain name features and the secondary domain name features to obtain domain name fusion features; and

detecting and classifying DGA domain names according to the domain name fusion features, obtaining a detection result for the domain name generation algorithm family from the classification result based on an anomaly detection model of the corresponding class, collecting, during model training, the maximum value of the output vector at the last fully connected layer for the correctly classified samples of each domain name generation algorithm family, training a corresponding one-class support vector machine learning model, inputting the output of each sample at the fully connected layer into the one-class support vector machine learning model of the corresponding class for anomaly detection, and identifying new classes of unknown domain name generation algorithm families.

2. The method of claim 1, wherein the dimension of the one-hot vector is equal to the number of top-level domain names, and in the one-hot encoding of each top-level domain name the value at the index corresponding to that top-level domain name is 1 and the values at all other indices are 0.

3. The method of claim 1, wherein extracting the secondary domain name features from the secondary domain name by using the long short-term memory network comprises:

padding the secondary domain names used for training to a preset length with null characters; and

converting the character representation into a numerical matrix representation that is input into the long short-term memory network for feature learning, and extracting the secondary domain name features from the secondary domain name.

4. The method of claim 3, further comprising, prior to extracting the secondary domain name features from the secondary domain name by using the long short-term memory network:

setting the vector dimension of the output of the long short-term memory network to be the same as the vector dimension of the learned top-level domain name features.

5. An apparatus for detecting a domain name generated by a domain name generation algorithm, comprising:

a first extraction module, configured to extract top-level domain name features from a top-level domain name by using one-hot encoding and two fully connected layers;

a second extraction module, configured to extract secondary domain name features from a secondary domain name by using a long short-term memory network;

a fusion module, configured to splice the top-level domain name features and the secondary domain name features to obtain domain name fusion features; and

a first detection module, configured to detect and classify DGA domain names according to the domain name fusion features, obtain a detection result for the domain name generation algorithm family from the classification result based on an anomaly detection model of the corresponding class, collect, during model training, the maximum value of the output vector at the last fully connected layer for the correctly classified samples of each domain name generation algorithm family, train a corresponding one-class support vector machine learning model, input the output of each sample at the fully connected layer into the one-class support vector machine learning model of the corresponding class for anomaly detection, and identify new classes of unknown domain name generation algorithm families.

6. The apparatus of claim 5, wherein the second extraction module comprises:

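a completion unit, configured to pad the secondary domain names used for training to a preset length with null characters; and

a learning unit, configured to convert the character representation into a numerical matrix representation that is input into the long short-term memory network for feature learning, and to extract the secondary domain name features from the secondary domain name.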

7. An electronic device, comprising: a memory, a processor, and a computer program and a learning model stored on the memory and executable on the processor, the processor executing the program and model to implement the method of detecting a domain name generated by a domain name generation algorithm according to any one of claims 1-4.

8. A computer-readable storage medium having stored thereon a computer program and a machine learning model, characterized in that the program and the model are executable by a processor for implementing a method for detecting a domain name generated by a domain name generation algorithm according to any of claims 1-4.