CN113810372B

CN113810372B - Low-throughput DNS hidden channel detection method and device

Info

Publication number: CN113810372B
Application number: CN202110901654.1A
Authority: CN
Inventors: 章坚武; 安彦军
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2022-10-04
Anticipated expiration: 2041-08-06
Also published as: CN113810372A

Abstract

The invention discloses a low-throughput DNS covert channel detection method and device. A data set of low-throughput DNS covert channel activity is captured, key features for detection are extracted from the data set, and the extracted features are converted into feature vectors for machine learning. The feature vectors of the data set are input into a constructed SPP-Net-LSTM detection model, which is then trained. The trained SPP-Net-LSTM detection model is used for real-time detection, and if an anomaly is detected, the corresponding domain is immediately made inaccessible. The invention improves the SPP-Net network to increase its sensitivity to minority samples and to effectively extract the spatial features of the target. Through the SPP-Net-LSTM network, the decision weight of the few covert channel samples is effectively increased, so that DNS covert channels under low throughput can be effectively detected and their related domains blocked. This prevents information from continuing to leak through low-throughput covert channels, which are otherwise hard to detect.

Description

Low-throughput DNS (Domain Name System) covert channel detection method and device

Technical Field

The application belongs to the technical field of attack detection, and particularly relates to a low-throughput DNS hidden channel detection method and device.

Background

DNS is an essential service for almost all applications: any communication from a local computer to the Internet (except communication based on static IP addresses) relies on DNS, and restricting DNS traffic may disconnect legitimate remote services. Enterprise firewalls are therefore typically configured to allow all packets on UDP port 53 (used by DNS), so DNS traffic usually passes through the firewall without deep inspection or state tracking. From an attacker's perspective, this makes the DNS protocol a covert channel for exfiltrating data. One way to exploit DNS is to register a domain name (e.g., fengrou2019.club); the attacker's malware on the victim host then encodes valuable private information (e.g., credit card numbers, login passwords or intellectual property) as an arbitrary string inside a DNS request. The resolver in the global domain name system forwards this request to the authoritative server of the fengrou2019.club domain (controlled by the attacker), which in turn sends a response back to the victim host. This gives the attacker a low-speed but covert two-way communication channel between the victim host and its command and control center. Because DNS traffic passes through firewalls, this channel, although slow, lets an attacker leak private data and maintain communication with malware by tunneling other protocols (e.g., SSH, FTP) to the command and control center. Modern malware and cyber attacks rely heavily on DNS services, which makes their activities reliable and difficult to trace.

As detection technology has developed, DNS covert channel detection has made great progress. In recent years, however, some DNS tunnels have reduced their activity frequency to improve their covertness: after sending a packet they enter a dormant state, wait for a certain time, and then send the next packet, thereby evading detection. Such low-throughput DNS covert channels are easily overlooked by existing detection methods, and serious consequences may follow if confidential information leaks through them.

Disclosure of Invention

The application aims to provide a low-throughput DNS hidden channel detection method and device so as to accurately detect the low-throughput DNS hidden channel and reduce the attack risk.

In order to achieve the purpose, the technical scheme of the application is as follows:

A low-throughput DNS covert channel detection method comprises the following steps:

capturing a data set of low throughput DNS hidden channel activity, extracting key features for detection in the data set, and converting the extracted key features into feature vectors for machine learning;

inputting the feature vectors corresponding to the data set into a constructed SPP-Net-LSTM detection model, and training to obtain the SPP-Net-LSTM detection model, wherein the SPP-Net-LSTM detection model comprises an improved SPP-Net network and a cost-sensitive LSTM network, an SPP pooling layer in the improved SPP-Net network comprises four pooling kernels of 1×1, 2×2, 3×3 and 4×4 arranged in parallel, and the output of the SPP pooling layer is directly connected to the cost-sensitive LSTM network;

and using the trained SPP-Net-LSTM detection model for real-time detection, and immediately setting the corresponding domain to be inaccessible if an anomaly is detected.

Further, the capturing a data set of low throughput DNS hidden channel activity further comprises:

performing an undersampling operation on the normal samples in the data set to reduce the number of normal samples.

Further, the key features for detection extracted from the data set include one or more of: domain name, resource record, TTL value, number of host names of a specific domain name, time difference between two requests under the same TLD, NXDOMAIN record, and recently added A records and NS records.

Further, in converting the extracted key features into feature vectors for machine learning, a resource record is marked as 0 if an infrequently used record type appears, and as 1 otherwise.

Further, in converting the extracted key features into feature vectors for machine learning, a TTL value is marked as 0 if it lies within [0,100], and as 1 otherwise.

Further, in converting the extracted key features into feature vectors for machine learning, for NXDOMAIN records, a piece of data in the data set is marked as 0 if an "NXDOMAIN" response occurs for it, and as 1 otherwise.

Further, in converting the extracted key features into feature vectors for machine learning, for recently added A records and NS records, a set of historical A records and NS records is first established; if the A record or NS record of a piece of data in the data set never appears in the set, the piece of data is marked as 0, and otherwise as 1.

Further, for the domain name, converting the extracted key features into feature vectors for machine learning comprises:

removing the separators in the domain names;

deleting the top level domain TLD of all domain names;

counting the characters in each second-level domain (SLD) name;

creating a vocabulary list for the counted characters;

assigning a unique integer label to each character in the vocabulary;

and replacing each character in the SLD name with a corresponding integer label so as to obtain the feature vector represented by the label.

Further, the loss function of the cost-sensitive LSTM network is:

where pos represents the set of minority samples, neg represents the set of majority samples, x_i represents one sample in the minority or majority set, y_i represents the true label of sample x_i, p_i is the predicted probability for sample x_i, and C represents the penalty coefficient.

The application also provides a low-throughput DNS hidden channel detection device, which comprises a processor and a memory, wherein the memory is used for storing a plurality of computer instructions, and the computer instructions are executed by the processor to realize the steps of the low-throughput DNS hidden channel detection method.

In the low-throughput DNS covert channel detection method and device, the SPP-Net network processes feature vectors derived from domain names of any length, and it is improved to increase its sensitivity to minority samples and to effectively extract the spatial features of the target; a cost-sensitive LSTM network is then combined with it to further increase the decision weight of minority samples and to effectively extract the temporal features of the target. Through the SPP-Net-LSTM network, the decision weight of the few covert channel samples is effectively increased. The method and device can therefore effectively detect DNS covert channels under low throughput and block access to the related domains, preventing information from continuing to leak through low-throughput covert channels that are hard to perceive.

Drawings

FIG. 1 is a flow chart of a low throughput DNS covert channel detection method of the present application;

FIG. 2 is a schematic diagram of a prior art SPP-Net network structure;

FIG. 3 is a schematic diagram of an improved SPP-Net network structure of the present application;

FIG. 4 is a schematic structural diagram of the SPP-Net-LSTM detection model of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the application.

The low-throughput DNS covert channel detection method of the application detects DNS covert channels under low throughput. An SPP-Net-LSTM detection model is constructed that fuses an improved SPP-Net network with a cost-sensitive LSTM network (CLSTM), in order to detect a small number of DNS covert channel flows within a large volume of DNS traffic. First, SPP-Net is flexible with respect to input size: the input matrices need not have uniform dimensions, yet it produces a fixed-size output. In addition, because SPP-Net has receptive fields of different sizes, it improves sensitivity to minority samples. The CLSTM network focuses attention on minority samples and increases their decision weight, thereby improving their detection rate. Fusing SPP-Net and CLSTM further improves the detection precision for minority samples.

In one embodiment, a low throughput DNS hidden channel detection method, as shown in fig. 1, includes:

S1, capturing a data set of low-throughput DNS covert channel activity, extracting key features for detection from the data set, and converting the extracted key features into feature vectors for machine learning.

Before the constructed SPP-Net-LSTM detection model is trained, the training data are first prepared.

A data set of low-throughput DNS covert channel activity is captured with Wireshark and randomly divided into a training set and a test set.

In a preferred embodiment, the capturing a data set of low throughput DNS hidden channel activity further includes:

performing an undersampling operation on the normal samples in the data set to reduce the number of normal samples.

Because normal samples greatly outnumber tunnel samples in the data set, an undersampling step is applied beforehand to remove part of the normal samples; since the imbalance is large, little useful information is lost.

Key features for detection are then extracted from the data set. The DNS data captured with Wireshark contain much redundant information irrelevant to detection, so the data are cleaned, redundant data are filtered out, and useful features are extracted. One or more of the following features may be selected:
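A minimal sketch of the undersampling step in Python (the language the embodiment uses for its scripts). The text does not say how many normal samples are discarded or how they are chosen; the sketch assumes simple random undersampling, and the names `undersample`, `keep_ratio` and `seed`, as well as the label convention (0 = normal, 1 = covert channel), are hypothetical.

```python
import random


def undersample(samples, labels, normal_label=0, keep_ratio=0.1, seed=0):
    """Randomly keep only a fraction of the normal (majority) samples.

    Assumption: random undersampling with a fixed keep_ratio; all
    covert-channel (minority) samples are always kept.
    """
    rng = random.Random(seed)
    normal_idx = [i for i, y in enumerate(labels) if y == normal_label]
    other_idx = [i for i, y in enumerate(labels) if y != normal_label]
    # Keep at least one normal sample (if any exist), never more than available.
    n_keep = min(len(normal_idx), max(1, int(len(normal_idx) * keep_ratio)))
    # Sorting preserves the original order of the surviving samples.
    kept = sorted(rng.sample(normal_idx, n_keep) + other_idx)
    return [samples[i] for i in kept], [labels[i] for i in kept]
```

With 1000 normal and 10 covert-channel samples and `keep_ratio=0.1`, the result contains 100 normal and all 10 covert-channel samples.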

Domain name: the domain name often carries the data encoded by a DNS covert channel, so differences between domain names are a key feature for identifying DNS covert channels.

Resource record: common DNS resource records are A, AAAA, CNAME, MX and the like. However, these records cannot carry enough information to meet the needs of a DNS covert channel, so DNS tunnels use other, less common resource records, such as TXT and NULL, to transmit information.

TTL value: TTL (time to live) records how long a DNS response for a specific domain may be cached; setting the TTL to 1-5 days brings the greatest benefit to a DNS server. However, tunneling tools typically set lower TTL values, abusing the round-robin DNS mechanism. This embodiment therefore suggests classifying TTL values in the range [0,100] as indicating malicious domains.

Number of host names of a specific domain name: to transport non-duplicate information, a DNS tunnel requests only unique host names under a specific domain, which results in more host names under the same top-level domain (TLD).

Time difference between two requests under the same TLD: the activity interval of a low-throughput covert channel is usually set by an algorithm and shows a certain regularity.

NXDOMAIN records: the domain names requested by a DNS covert channel are often generated on the fly, so no corresponding records exist on the DNS service, and the DNS server returns an "NXDOMAIN" response indicating that the requested name does not exist.

Recently added A records and NS records: the domain names used by a DNS covert channel are usually recently registered, so their A and NS records are also recently added.

It should be noted that the "NXDOMAIN" response, the A record and the NS record are common technical terms in the DNS field and are not described further here.

After the key features are extracted, they are converted into feature vectors for machine learning.

The features extracted from DNS packets cannot be recognized directly by a machine, so a preprocessing step converts the selected features into machine-recognizable feature vectors. The conversion for each key feature is as follows:

Domain name conversion: a domain name consists of a top-level domain (TLD), such as "com", and a second-level domain (SLD). The conversion proceeds as follows. First, the separator "-" is removed from the domain name, because it carries no information. Second, the TLD is deleted from every domain name, leaving the second-level domain (SLD) name. Third, the characters in each SLD name (uppercase and lowercase letters, digits and special characters) are counted, with each distinct character recorded only once. Fourth, a vocabulary is created from the counted characters. Fifth, each character in the vocabulary is assigned a unique integer label. Sixth, each character in the SLD name is replaced with its corresponding integer label to obtain the label-represented feature vector.
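The six domain name conversion steps can be sketched as follows. The helper names are hypothetical, and two details are assumptions not stated in the text: everything to the left of the last dot is treated as the SLD name, and labels start at 1 so that 0 can stand for characters never seen when the vocabulary was built.

```python
def strip_tld(domain: str) -> str:
    """Steps 1-2: remove the '-' separator, then delete the TLD."""
    name = domain.strip().rstrip(".")   # drop a trailing root dot, if any
    name = name.replace("-", "")        # step 1: '-' carries no information
    if "." not in name:
        return name                     # nothing to delete
    # Assumption: the last label is the TLD; the rest is the "SLD name".
    return name.rsplit(".", 1)[0]


def build_vocab(sld_names):
    """Steps 3-5: collect each distinct character once, assign integer labels."""
    chars = sorted({c for name in sld_names for c in name})  # case is preserved
    # Assumption: labels start at 1; 0 is left free for unseen characters.
    return {c: i for i, c in enumerate(chars, start=1)}


def encode(sld_name: str, vocab) -> list:
    """Step 6: replace every character with its integer label."""
    return [vocab.get(c, 0) for c in sld_name]
```

No padding is applied, so names of different lengths yield vectors of different lengths, which is what the SPP layer is meant to absorb.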

Resource record conversion: common DNS resource records are A, AAAA, CNAME, MX and the like. However, these records cannot carry enough information to meet the needs of a DNS covert channel, so DNS tunnels use less common resource records, such as TXT and NULL, to transmit information. If an infrequently used record type occurs, it is marked as 0; otherwise it is marked as 1.
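A sketch of the resource record flag. The set of "common" types below is an assumption built from the examples in the text (A, AAAA, CNAME, MX); a real deployment would likely need to tune it, and the function name is hypothetical.

```python
# Assumption: record types treated as "common", taken from the text's examples.
COMMON_RR_TYPES = {"A", "AAAA", "CNAME", "MX"}


def rr_type_flag(rr_type: str) -> int:
    """Return 0 for an infrequently used record type (e.g. TXT, NULL), else 1."""
    return 1 if rr_type.upper() in COMMON_RR_TYPES else 0
```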

TTL value conversion: TTL (time to live) records how long a DNS response for a specific domain may be cached, and setting the TTL to 1-5 days brings the greatest benefit to a DNS server. However, tunneling tools typically set lower TTL values, abusing the round-robin DNS mechanism, so it is recommended to classify TTL values in the range [0,100] as indicating malicious domains. Thus, a TTL is marked as 0 if its value lies within [0,100], and as 1 otherwise.

Host name count conversion for a specific domain name: to transfer non-duplicate information, a DNS tunnel requests only unique host names under a specific domain, which results in more host names under the same top-level domain (TLD). In this embodiment, a Python script counts the number of host names of a specific domain name.

Time difference between two requests under the same TLD: the activity interval of a low-throughput covert channel is usually set by an algorithm and shows a certain regularity. In this embodiment, a Python script computes the time interval between two requests under the same TLD.

NXDOMAIN record conversion: "NXDOMAIN" records are extracted. The domain names requested by a DNS covert channel are usually generated on the fly, so no corresponding records exist on the DNS service, and the DNS server returns an "NXDOMAIN" response indicating that the requested name does not exist. Therefore, if an "NXDOMAIN" response occurs for a piece of data in the data set, it is marked as 0; otherwise it is marked as 1.

Recently added A record and NS record conversion: the domain names used by a DNS covert channel are usually recently registered, so their A and NS records are also recently added. A set of historical A records and NS records is first established; if the A record or NS record of a piece of data in the data set never appears in this set, the piece of data is marked as 0; otherwise it is marked as 1.
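The TTL rule reduces to an interval test. DNS TTLs are expressed in seconds (RFC 1035), so 1-5 days corresponds to roughly 86400-432000 seconds; the function name and its default bounds are an illustrative sketch of the [0,100] rule.

```python
def ttl_flag(ttl: int, low: int = 0, high: int = 100) -> int:
    """Return 0 if the TTL (seconds) lies in the closed interval [low, high], else 1."""
    return 0 if low <= ttl <= high else 1
```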
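A sketch of the host name count. The text does not define how the "specific domain name" is derived from a query name; the sketch assumes it is the last two labels, a simplification that mishandles multi-label public suffixes such as co.uk. The function names are hypothetical.

```python
from collections import defaultdict


def base_domain(qname: str) -> str:
    """Assumption: the 'specific domain' is the last two labels of the name."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def hostnames_per_domain(qnames):
    """Count the distinct host names queried under each base domain."""
    hosts = defaultdict(set)
    for q in qnames:
        hosts[base_domain(q)].add(q.rstrip(".").lower())
    return {domain: len(names) for domain, names in hosts.items()}
```

A tunnel that encodes data in the leftmost label produces a count that grows with every new chunk of data, whereas ordinary domains see a small, stable set of host names.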
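A sketch of the interval feature. Records are grouped by a caller-supplied key (the TLD, as in the text, or another grouping), and the intervals between consecutive requests are returned; the function name and the (timestamp, key) input format are hypothetical.

```python
from collections import defaultdict


def request_intervals(records):
    """records: iterable of (timestamp_seconds, group_key) pairs.

    Returns {key: [t2 - t1, t3 - t2, ...]} over time-sorted requests per key.
    """
    times = defaultdict(list)
    for ts, key in records:
        times[key].append(ts)
    intervals = {}
    for key, ts_list in times.items():
        ts_list.sort()
        intervals[key] = [b - a for a, b in zip(ts_list, ts_list[1:])]
    return intervals
```

Near-constant intervals for one key are the kind of regularity the text attributes to an algorithmically scheduled low-throughput channel.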
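In the DNS wire format, NXDOMAIN is response code 3 ("Name Error" in RFC 1035), so the flag can be computed from the RCODE field of each response; the function name is hypothetical.

```python
NXDOMAIN_RCODE = 3  # RFC 1035 RCODE 3: the queried name does not exist


def nxdomain_flag(rcode: int) -> int:
    """Return 0 if the response is NXDOMAIN, else 1."""
    return 0 if rcode == NXDOMAIN_RCODE else 1
```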
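A sketch of the recently-added-record flag, keeping separate history sets for A and NS values. Treating a missing record (`None`) as "not new" is an assumption, as are the names.

```python
def new_record_flag(a_record, ns_record, history_a, history_ns):
    """Return 0 if the A or NS record never appears in its history set, else 1.

    Assumption: a value of None (record absent) is ignored.
    """
    for value, history in ((a_record, history_a), (ns_record, history_ns)):
        if value is not None and value not in history:
            return 0
    return 1
```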

S2, inputting the feature vectors corresponding to the data set into the constructed SPP-Net-LSTM detection model, and training to obtain the SPP-Net-LSTM detection model, wherein the SPP-Net-LSTM detection model comprises an improved SPP-Net network and a cost-sensitive LSTM network, the SPP pooling layer in the improved SPP-Net network comprises four pooling kernels of 1×1, 2×2, 3×3 and 4×4 arranged in parallel, and the output of the SPP pooling layer is directly connected to the cost-sensitive LSTM network.

In a neural network, the matrices input to the fully connected layer for classification must have uniform dimensions. In the data set of the application, domain names vary in length, so the dimensions of the input matrices are inconsistent. The traditional approach pads domain names to the same length, which loses part of the sequence and structure information of the domain names themselves. SPP-Net is flexible with respect to the size of the input matrix and produces a fixed-size output without requiring consistent input dimensions. In addition, SPP-Net has receptive fields of different sizes, which improves sensitivity to minority samples, so SPP-Net is used as the first-layer network to extract and recombine spatial features.

SPP-Net converts matrices of different dimensions to the same dimension by a max pooling operation. Assuming that the input matrix has dimension n × n and a pooling result of m × m is to be obtained, the size of the pooling window is:

$$\text{size} = \left\lceil \frac{n}{m} \right\rceil$$

The pooling stride is:

$$\text{stride} = \left\lfloor \frac{n}{m} \right\rfloor$$

The pooling result is c = max{C}, where C is the set of elements of the original matrix within a size × size window.
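As a worked illustration of a pooling window of size ceil(n/m) and stride floor(n/m), as in the original SPP-Net formulation (n = 13 and m = 3 are arbitrary values chosen for the example):

$$\text{size} = \left\lceil \tfrac{13}{3} \right\rceil = 5, \qquad \text{stride} = \left\lfloor \tfrac{13}{3} \right\rfloor = 4$$

The three windows along each axis therefore start at rows 0, 4 and 8 and cover rows 0-4, 4-8 and 8-12, producing a 3 × 3 output from a 13 × 13 input.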

The application improves the SPP-Net network. SPP-Net was originally used in image processing to handle training on images of different scales and to extract key information from them; its structure is shown in FIG. 2.

The application makes the following modifications with reference to the original SPP-Net:

First, the 3-way max pooling in the spatial pyramid pooling (SPP) layer of the original SPP-Net is changed to 4-way to increase the number of features: a 3×3 pooling branch is added in parallel to the original 1×1, 2×2 and 4×4 pooling, so that the SPP pooling layer comprises four parallel pooling kernels of 1×1, 2×2, 3×3 and 4×4, which adds more features.

Second, the serial 1×1, 3×3 and 1×1 convolution operations that follow the SPP operation in the SPP-Net network are removed. In the original SPP-Net, the features produced by the SPP operation are combined and followed by a fully connected layer for classification, so convolution operations are used to extract the key information of the combined features and improve detection precision. In the application, however, the SPP output is connected directly to the CLSTM model of the next layer, so the convolution operations are removed to retain more information. The improved SPP-Net network is shown in FIG. 3.

The CLSTM network focuses on minority samples to increase their decision weight. The principle is as follows:
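The four parallel pooling branches can be illustrated in plain Python on a single n × n matrix. This is only a sketch of the pooling arithmetic, assuming a window size of ceil(n/m) and a stride of floor(n/m) as in the original SPP-Net; it is not the trained network, and the function names are hypothetical. Whatever n is (n ≥ 4 here), the output has 1 + 4 + 9 + 16 = 30 values per input matrix.

```python
import math


def spp_max_pool(matrix, m):
    """Max-pool an n x n matrix (list of lists) to an m x m grid.

    Window size = ceil(n / m), stride = floor(n / m).
    """
    n = len(matrix)
    if n < m:
        raise ValueError(f"input size {n} is smaller than the {m} x {m} grid")
    size, stride = math.ceil(n / m), n // m
    pooled = []
    for i in range(m):
        row = []
        for j in range(m):
            r0, c0 = i * stride, j * stride
            window = [matrix[r][c]
                      for r in range(r0, r0 + size)
                      for c in range(c0, c0 + size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def spp_features(matrix, levels=(1, 2, 3, 4)):
    """Apply each pooling level and concatenate the flattened outputs.

    The branches are independent (parallel in the network); here they are
    simply evaluated one after another.
    """
    features = []
    for m in levels:
        features.extend(v for row in spp_max_pool(matrix, m) for v in row)
    return features
```

Because the output length depends only on the pooling levels, inputs of different sizes map to vectors of the same length, which is what lets the layer feed the CLSTM without padding.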

The goal of the LSTM is to minimize the loss function of the network. Let y_i denote the true label of the i-th sample and p_i its predicted probability; the loss function is defined as follows:

$$\text{loss} = -\sum_i y_i \log p_i$$

The method improves this loss function by introducing a cost-sensitive coefficient. In the data set, let the number of majority samples be N and the number of minority samples be M, and set a penalty coefficient C.

The cost-sensitive loss function of the LSTM is then:

where pos represents the set of minority samples and neg represents the set of majority samples, x_i represents one sample in the minority or majority set, y_i represents the true label of sample x_i, p_i is the predicted probability for sample x_i, and C represents the penalty coefficient. This cost-sensitive loss function increases the decision weight of minority samples, so that decisions are biased toward the minority class.
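The cost-sensitive loss formula itself is not reproduced in this text. As an illustration only, the sketch below assumes a common cost-sensitive form: binary cross-entropy in which every minority-sample term is multiplied by the penalty coefficient C, with C = N/M (majority count over minority count). Both the form and the choice of C are assumptions rather than the patent's stated formula, and the function names are hypothetical.

```python
import math


def penalty_coefficient(n_majority: int, m_minority: int) -> float:
    """Assumption: C = N / M, up-weighting the rare class by the imbalance ratio."""
    return n_majority / m_minority


def cost_sensitive_bce(y_true, p_pred, c, eps=1e-12):
    """Summed binary cross-entropy with minority terms weighted by c.

    y_true: 1 for minority (covert channel) samples, 0 for majority (normal).
    p_pred: predicted probability that each sample is a covert-channel sample.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        term = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += c * term if y == 1 else term
    return total
```

With C = 1 this reduces to ordinary cross-entropy; with C = N/M, misclassifying one covert-channel sample costs as much as misclassifying N/M normal ones.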

The improved SPP-Net and the CLSTM are combined into the fused SPP-Net-CLSTM model by connecting the improved SPP-Net network and the CLSTM in sequence, as shown in FIG. 4.

S3, applying the trained SPP-Net-LSTM detection model to real-time detection, and immediately setting the corresponding domain to be inaccessible if an anomaly is detected.

After the SPP-Net-LSTM detection model is trained, the trained model is used for real-time detection, and if the abnormality is detected, the corresponding domain is immediately set to be inaccessible so as to prevent information from being continuously leaked.

During real-time monitoring, Wireshark captures traffic into a pcap file, a Python script then parses the packets to extract DNS traffic information, and this information is stored in a CSV file. The required features are then selected, converted into feature vectors, and input into the trained SPP-Net-LSTM detection model.
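A sketch of the Python parsing step using only the standard library: it parses the header and first question of a raw DNS message (the UDP payload of a captured packet) with `struct` and renders rows as CSV text. Reading the pcap file itself (for example with a third-party library such as dpkt or scapy) is omitted, compressed question names are not handled, and the field names and functions are hypothetical.

```python
import csv
import io
import struct

# Type codes from RFC 1035 / RFC 3596 for the record types named in the text.
QTYPE_NAMES = {1: "A", 2: "NS", 5: "CNAME", 10: "NULL", 15: "MX", 16: "TXT", 28: "AAAA"}
FIELDS = ["timestamp", "qname", "qtype", "is_response", "rcode"]


def parse_dns_question(payload: bytes) -> dict:
    """Parse the 12-byte header and the first question of a DNS message."""
    _id, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", payload[:12])
    if qdcount < 1:
        raise ValueError("message has no question section")
    labels, offset = [], 12
    while True:
        length = payload[offset]
        if length == 0:              # root label terminates the name
            offset += 1
            break
        if length & 0xC0:            # compression pointer
            raise ValueError("compressed QNAME not handled in this sketch")
        labels.append(payload[offset + 1:offset + 1 + length].decode("ascii", "replace"))
        offset += 1 + length
    qtype, _qclass = struct.unpack("!HH", payload[offset:offset + 4])
    return {
        "qname": ".".join(labels),
        "qtype": QTYPE_NAMES.get(qtype, str(qtype)),
        "is_response": flags >> 15,  # QR bit
        "rcode": flags & 0x000F,     # 3 = NXDOMAIN
    }


def rows_to_csv(rows) -> str:
    """Render parsed rows as CSV text (write it to a file in practice)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```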

The trained model is also used to predict the test set data. The prediction results are recorded, and the model is evaluated against the test set labels by calculating its accuracy (ACC), precision, recall and F1-score, where:

$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Here, true positive (TP) denotes covert channel data correctly classified as covert channel; false positive (FP) denotes normal data misclassified as covert channel; true negative (TN) denotes normal data correctly classified as normal; and false negative (FN) denotes covert channel data misclassified as normal.
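The four metrics can be computed directly from the TP, FP, TN and FN counts defined above. The sketch assumes label 1 marks covert channel samples; the function name is hypothetical, and zero denominators are mapped to 0.0 by choice.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return ACC, Precision, Recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": acc, "Precision": precision, "Recall": recall, "F1": f1}
```

On an imbalanced test set, accuracy alone can look high even when most covert channel samples are missed, which is why recall and F1 are reported alongside it.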

In one embodiment, the present application further provides a low throughput DNS covert channel detection device, including a processor and a memory storing computer instructions, which when executed by the processor, implement the steps of the low throughput DNS covert channel detection method.

For specific limitations of the low-throughput DNS covert channel detection device, reference may be made to the limitations of the low-throughput DNS covert channel detection method above, which are not repeated here. The device may be implemented wholly or partly in software, hardware, or a combination thereof. It may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the corresponding operations.

The memory and the processor are electrically connected, directly or indirectly, to enable data transmission or interaction; for example, these components may be electrically connected to each other via one or more communication buses or signal lines. The memory stores a computer program executable on the processor, and the processor executes the computer program stored in the memory to implement the low-throughput DNS covert channel detection method in the embodiments of the present invention.

The memory may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. The memory stores programs, and the processor executes the programs after receiving an execution instruction.

The processor may be an integrated circuit chip with data processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims

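1. A low-throughput DNS covert channel detection method, wherein the low-throughput DNS covert channel detection method comprises:

capturing a data set of low-throughput DNS covert channel activity, extracting key features for detection from the data set, and converting the extracted key features into feature vectors for machine learning;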

inputting the feature vectors corresponding to the data set into a constructed SPP-Net-LSTM detection model, and training to obtain the SPP-Net-LSTM detection model, wherein the SPP-Net-LSTM detection model comprises an improved SPP-Net network and a cost-sensitive LSTM network, an SPP pooling layer in the improved SPP-Net network comprises four pooling kernels of 1×1, 2×2, 3×3 and 4×4 arranged in parallel, and the output of the SPP pooling layer is directly connected to the cost-sensitive LSTM network;

and applying the trained SPP-Net-LSTM detection model to real-time detection, and immediately setting the corresponding domain to be inaccessible if an anomaly is detected.

2. The low-throughput DNS covert channel detection method of claim 1, wherein said capturing a data set of low-throughput DNS covert channel activity further comprises:

performing an undersampling operation on the normal samples in the data set to reduce the number of normal samples.

3. The low-throughput DNS covert channel detection method of claim 1, wherein the key features for detection extracted from the data set include one or more of: domain name, resource record, TTL value, number of host names of a specific domain name, time difference between two requests under the same TLD, NXDOMAIN record, and recently added A records and NS records.

4. The low-throughput DNS covert channel detection method of claim 3, wherein in converting the extracted key features into feature vectors for machine learning, a resource record is marked as 0 if an infrequently used record type occurs, and as 1 otherwise.

5. The low-throughput DNS covert channel detection method of claim 3, wherein in converting the extracted key features into feature vectors for machine learning, a TTL value is marked as 0 if it lies within [0,100], and as 1 otherwise.

6. The low-throughput DNS covert channel detection method of claim 3, wherein in converting the extracted key features into feature vectors for machine learning, for NXDOMAIN records, a piece of data in the data set is marked as 0 if an "NXDOMAIN" response occurs for it, and as 1 otherwise.

7. The low-throughput DNS covert channel detection method of claim 3, wherein in converting the extracted key features into feature vectors for machine learning, for recently added A records and NS records, a set of historical A records and NS records is first created, and a piece of data in the data set is marked as 0 if its A record or NS record never appears in the set, and as 1 otherwise.

8. The low throughput DNS hidden channel detection method of claim 3 wherein said converting the extracted key features into machine-learned feature vectors for domain names comprises:

removing the separators in the domain names;

deleting the top level domain TLD of all domain names;

counting the characters in each second-level domain (SLD) name;

creating a vocabulary list for the counted characters;

assigning a unique integer label to each character in the vocabulary;

and replacing each character in the SLD name with its corresponding integer label to obtain the label-represented feature vector.

9. The low-throughput DNS covert channel detection method of claim 1, wherein a loss function of said cost-sensitive LSTM network is:

where pos represents the set of minority samples, neg represents the set of majority samples, x_i represents one sample in the minority or majority set, y_i represents the true label of sample x_i, p_i is the predicted probability for sample x_i, and C represents the penalty coefficient.

10. A low throughput DNS covert channel detection device comprising a processor and a memory storing computer instructions, wherein said computer instructions, when executed by the processor, implement the steps of the method of any one of claims 1 to 9.