CN110071829B - DNS tunnel detection method and device and computer readable storage medium - Google Patents


Info

Publication number
CN110071829B
Authority
CN
China
Prior art keywords
matrix
dns
dns tunnel
mahalanobis distance
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910295145.1A
Other languages
Chinese (zh)
Other versions
CN110071829A (en)
Inventor
郭豪
梁玉
洪春华
齐恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910295145.1A
Publication of CN110071829A
Application granted
Publication of CN110071829B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The invention discloses a DNS tunnel detection method and device. The method comprises: acquiring collected traffic data, the traffic data being generated by DNS query behavior initiated by a terminal toward a DNS server in a local area network; extracting a feature vector from the traffic data; calculating, by a DNS tunnel detection model and based on a normal traffic matrix provided by the model, the Mahalanobis distance between the feature vector and the normal traffic matrix, the DNS tunnel detection model being generated through model training on historical traffic data of the DNS server; and performing DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance. The DNS tunnel detection method and device effectively address the prior-art problem that DNS tunnels go undetected because preset rules are easily bypassed.

Description

DNS tunnel detection method and device and computer readable storage medium
Technical Field
The present invention relates to the field of network security technologies, and in particular, to a DNS tunnel detection method, device, and computer-readable storage medium.
Background
A DNS tunnel is a covert channel, established through the DNS query process, that is used to transmit data. Under the DNS protocol, if the domain name accessed by a terminal cannot be resolved by the DNS server in the local area network, the query is forwarded to a DNS server outside the local area network, and the query result is eventually returned. In other words, the DNS server in the local area network relays communication between terminals inside the local area network and the outside.
Because of these characteristics, DNS tunnels can be exploited maliciously, for example by using a tunneling tool to perform remote control or steal data over a DNS tunnel. Detecting DNS tunnels is therefore critical to identifying network security threats.
In the prior art, DNS tunnels are detected with preset rules, for example, that the length of the requested domain name exceeds a preset value, or that the request frequency of the domain name exceeds a preset value. With this detection approach, however, an attacker can bypass the rules by altering the features they check, such as domain name length and request frequency, so the accuracy of DNS tunnel detection is low.
The prior art therefore suffers from missed DNS tunnel detections because preset rules are easy to bypass.
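To make the weakness of rule-based detection concrete, the following Python sketch implements two prior-art-style rules (query name length and request rate) and shows that the same payload, split into shorter queries sent at a low rate, passes both. The thresholds, the zone `t.example.com` and the function name `rule_flags_tunnel` are hypothetical; the source gives no concrete rule values.

```python
# Hypothetical prior-art rules; the thresholds are illustrative, not taken from the source.
MAX_NAME_LENGTH = 52          # flag query names longer than this
MAX_REQUESTS_PER_MINUTE = 30  # flag terminals querying a domain faster than this


def rule_flags_tunnel(query_name: str, requests_per_minute: int) -> bool:
    """Return True if either fixed rule fires for this query."""
    return len(query_name) > MAX_NAME_LENGTH or requests_per_minute > MAX_REQUESTS_PER_MINUTE


ZONE = "t.example.com"
payload = "4a6b" * 12  # 48 characters of hex-encoded data to exfiltrate

# One long query carrying the whole payload trips the length rule...
long_query = f"{payload}.{ZONE}"

# ...but splitting the payload into 16-character chunks keeps every name short,
# and sending them slowly keeps the request rate under the threshold.
short_queries = [f"{payload[i:i + 16]}.{ZONE}" for i in range(0, len(payload), 16)]

if __name__ == "__main__":
    print(rule_flags_tunnel(long_query, requests_per_minute=5))                  # True
    print([rule_flags_tunnel(q, requests_per_minute=5) for q in short_queries])  # [False, False, False]
```

The rules inspect surface features that the attacker fully controls, which is the bypass described above.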
Disclosure of Invention
In order to solve the problems in the related art, the invention provides a DNS tunnel detection method, a DNS tunnel detection device and a computer-readable storage medium.
In a first aspect, a DNS tunnel detection method includes:
acquiring collected traffic data, the traffic data being generated by DNS query behavior initiated by a terminal toward a DNS server in a local area network;
extracting a feature vector from the traffic data;
calculating, by a DNS tunnel detection model and based on a normal traffic matrix provided by the model, the Mahalanobis distance between the feature vector and the normal traffic matrix, the DNS tunnel detection model being generated through model training on historical traffic data of the DNS server;
and performing DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance.
In a second aspect, a DNS tunnel detection apparatus includes:
an acquisition module, configured to acquire collected traffic data, the traffic data being generated according to the access behavior of a terminal toward a DNS server in a local area network;
an extraction module, configured to extract a feature vector from the traffic data;
a calculation module, configured to calculate, by a DNS tunnel detection model and based on a normal traffic matrix provided by the model, the Mahalanobis distance between the feature vector and the normal traffic matrix, the DNS tunnel detection model being generated through model training on historical traffic data of the DNS server;
and an anomaly identification module, configured to perform DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance.
In a third aspect, a DNS tunnel detection apparatus includes:
a processor; and
a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method described above.
In a fourth aspect, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method as described above.
The technical solutions provided by the embodiments of the present invention may have the following beneficial effects:
in the present invention, a DNS tunnel detection model trained on historical traffic data of the DNS server in the local area network computes the Mahalanobis distance between the feature vector of the traffic data and the normal traffic matrix, and DNS tunnel anomaly identification is performed based on that distance. On the one hand, because anomaly identification relies on a trained DNS tunnel detection model rather than preset rules, the prior-art weakness that attackers can easily bypass such rules is effectively overcome, and the accuracy of DNS tunnel detection is ensured. On the other hand, because the model is trained on historical traffic data of the DNS server in the local area network, it fits the actual traffic environment of that network; since it is always trained on traffic from the environment in which it is deployed, it can be applied to different detection environments and is therefore general-purpose.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic illustration of an implementation environment in accordance with the present invention;
FIG. 2 is a block diagram illustrating a server in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a DNS tunnel detection method in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram of steps in one embodiment before step S150 of the corresponding embodiment of FIG. 3;
FIG. 5 is a flow diagram of step S230 of the corresponding embodiment of FIG. 4 in one embodiment;
FIG. 6 is a flow diagram of step S237 of the corresponding embodiment of FIG. 5 in one embodiment;
FIG. 7 is a flow diagram of step S250 of the corresponding embodiment of FIG. 4 in one embodiment;
FIG. 8 is a schematic diagram of matrix compression;
FIG. 9 is a flow diagram of step S130 in one embodiment;
FIG. 10 is a flow chart of step S170 in one embodiment;
FIG. 11 is a flow chart of step S170 in another embodiment;
FIG. 12 is a schematic diagram of a probability distribution function for a normal distribution;
FIG. 13 is a flowchart of step S175 of the corresponding embodiment of FIG. 10 in one embodiment;
FIG. 14 is a block diagram illustrating a DNS tunnel detection apparatus according to an exemplary embodiment;
FIG. 15 is a block diagram illustrating a DNS tunnel detection apparatus according to another exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a schematic illustration of an implementation environment in accordance with the present invention. The implementation environment includes an internal DNS server 300, at least one terminal 100 and a detection server 400, all located in the local area network, as well as an external DNS server 500 corresponding to the domain name the terminal 100 is to access.
According to the DNS protocol, after a user enters the domain name of a website in the terminal 100, the internal DNS server 300 performs a DNS query on that domain name. If the internal DNS server 300 cannot resolve it, it sends a query request through the firewall of the local area network to DNS servers outside the local area network; after a series of redirections, the request reaches the external DNS server 500 responsible for the domain name, and the query result is returned to the internal DNS server 300 and then to the terminal 100.
In this process, a covert DNS tunnel between the terminal 100 and the external DNS server 500 is established via the internal DNS server 300. Under the DNS protocol, the firewall does not filter DNS query requests forwarded by the internal DNS server 300, so the terminal can use this tunnel to communicate with the outside through the firewall. Because of this, a DNS tunnel established via the internal DNS server is vulnerable to malicious exploitation, such as remote control, or even data theft, over the tunnel using a tunneling tool.
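For illustration only, the sketch below shows one way data can ride inside DNS query names, which is why subdomain-level features matter later in this description. The payload is base32-encoded (letters and digits are valid in DNS labels) and packed into labels of at most 63 characters under an attacker-controlled zone, with each name kept within 253 characters. The zone name and both function names are hypothetical, and real tunneling tools use their own encodings.

```python
import base64

MAX_LABEL = 63  # RFC 1035: a label is at most 63 octets
MAX_NAME = 253  # practical limit on a dotted domain name in text form


def encode_to_queries(data: bytes, zone: str) -> list[str]:
    """Pack data into query names under `zone` (illustrative tunnel encoding)."""
    # Base32 output is A-Z and 2-7; drop the '=' padding and lowercase it.
    text = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    queries, current = [], []
    for label in labels:
        candidate = ".".join(current + [label, zone])
        if current and len(candidate) > MAX_NAME:
            # This name is full: emit it and start a new one.
            queries.append(".".join(current + [zone]))
            current = []
        current.append(label)
    if current:
        queries.append(".".join(current + [zone]))
    return queries


def decode_from_queries(queries: list[str], zone: str) -> bytes:
    """What the authoritative server for `zone` would do to recover the data."""
    text = "".join(q[: -len(zone) - 1].replace(".", "") for q in queries).upper()
    text += "=" * (-len(text) % 8)  # restore base32 padding
    return base64.b32decode(text)
```

A 300-byte payload becomes three long, high-entropy query names, the kind of traffic that the subdomain length and entropy features are meant to characterize.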
Accordingly, in the present invention, a detection server 400 is deployed in the local area network. The detection server 400 performs DNS tunnel detection on the traffic data of the internal DNS server 300 according to the technical solution of the present invention, so as to detect abnormal traffic data transmitted through a DNS tunnel and raise a DNS tunnel alarm.
FIG. 2 is a block diagram illustrating a server according to an exemplary embodiment. The server 200 may serve as the detection server 400 in the embodiment of FIG. 1.
It should be noted that the server 200 is merely an example suited to the present invention and should not be regarded as limiting the scope of the invention in any way. Nor should the server 200 be construed as needing to rely on, or to include, one or more of the components of the exemplary server 200 shown in FIG. 2.
The hardware structure of the server 200 may vary greatly with configuration or performance. As shown in FIG. 2, the server 200 includes a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
The power supply 210 is used to provide operating voltage for each hardware device on the server 200.
The interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, and at least one USB interface 237, etc. for communicating with external devices.
The memory 250 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 251, applications 253, data 255 and so on, stored either transiently or persistently. The operating system 251 manages and controls the hardware devices and applications 253 on the server 200 so that the central processing unit 270 can compute and process the mass data 255; it may be Windows Server™, Mac OS X™, Unix™, Linux, FreeBSD™, FreeRTOS, or the like. An application 253 is a computer program that performs at least one specific task on top of the operating system 251 and may include at least one module (not shown in FIG. 2), each of which may contain a series of computer-readable instructions for the server 200. The data 255 may be, for example, collected historical traffic data of the internal DNS server.
The central processor 270 may include one or more processors and is arranged to communicate with the memory 250 via a bus for computing and processing the mass data 255 in the memory 250.
As described in detail above, the server 200 to which the present invention applies implements the DNS tunnel detection method by having the central processing unit 270 read a series of computer-readable instructions stored in the memory 250.
Furthermore, the present invention can be implemented by hardware circuitry or by a combination of hardware circuitry and software instructions, and thus, implementation of the present invention is not limited to any specific hardware circuitry, software, or combination of both.
FIG. 3 is a flow chart illustrating a DNS tunnel detection method according to an exemplary embodiment. The DNS tunnel detection method is executed by the detection server 400 in the implementation environment shown in FIG. 1. As shown in FIG. 3, the method may include the following steps:
Step S110, acquiring the collected traffic data, the traffic data being generated by DNS query behavior initiated by the terminal toward the DNS server in the local area network.
As described above, under the DNS protocol, when a user enters the domain name of a website on a terminal in the local area network, the terminal first sends a DNS query request to the internal DNS server of the local area network to obtain the IP address corresponding to that domain name. If the internal DNS server cannot resolve it, the query is performed through a DNS server outside the local area network, and the result is returned to the terminal via the internal DNS server.
In other words, the DNS tunnel is built on the terminal's requests to the internal DNS server of the local area network to perform DNS queries, and the terminal's act of sending a DNS query request to that internal DNS server is the DNS query behavior.
Over the established DNS tunnel, the terminal exchanges data with a DNS server outside the local area network (namely, the DNS server corresponding to the domain name the terminal accesses), and the data transmitted in this tunnel is the traffic data subject to DNS tunnel detection.
In a specific embodiment, the traffic of the DNS server in the local area network is collected in real time, and the collected traffic is then aggregated and filtered. Aggregation is keyed on the two ends of the DNS tunnel (i.e., the terminal and the domain name it accesses) and on the statistics time window. For example, from the collected traffic, the traffic data between terminal A and domain name B within a specified time period C is compiled. Accordingly, the traffic data acquired in step S110 is the traffic data, compiled over a specified time period, between a particular terminal and an accessed domain name over the DNS tunnel. The specified time period can of course be set as needed, for example half an hour, and is not specifically limited here.
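A minimal sketch of the aggregation step, assuming each collected DNS query has already been parsed into a record carrying a terminal identifier, the accessed domain name and a timestamp. The `DnsEvent` fields and the `aggregate` function are hypothetical; the 30-minute window matches the example in the text, and the filtering step, which the text does not specify, is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DnsEvent:
    """Hypothetical parsed DNS query; field names are illustrative."""
    terminal: str     # e.g. the terminal's IP address
    domain: str       # accessed domain name, e.g. "example.com"
    timestamp: float  # seconds since the epoch
    bytes_up: int     # bytes sent by the terminal
    bytes_down: int   # bytes returned to the terminal


WINDOW_SECONDS = 30 * 60  # "half an hour", the example period in the text


def aggregate(events):
    """Group events by (terminal, accessed domain, start of the time window)."""
    groups = defaultdict(list)
    for e in events:
        window_start = int(e.timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        groups[(e.terminal, e.domain, window_start)].append(e)
    return dict(groups)
```

Each resulting group corresponds to one piece of traffic data in the sense of step S110, from which one feature vector is later extracted.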
Step S130, extracting a feature vector from the traffic data.
The acquired traffic data includes, for each access to the domain name over the DNS tunnel within the specified time period, the access duration, the number of accesses in the period, the amount of data transmitted per access, domain name information of the accessed domain name (for example, its name, the number of domain levels and the subdomain at each level), the DNS records involved in the access, and so on.
To construct the feature vector of the traffic data, the DNS tunnel features to be extracted are preset based on the characteristics of DNS tunnels. A DNS tunnel feature is data describing communication behavior over a DNS tunnel.
The feature vector is a vector representation of the traffic data, specifically of the DNS tunnel features extracted from it: after the DNS tunnel features are extracted from the traffic data, they are mapped to numeric values to construct the feature vector of the traffic data.
In an embodiment, the DNS tunnel features extracted from the traffic data may be one, several, or all of the following:
total session access time, total session traffic, large-packet ratio, small-packet ratio, uplink-to-downlink ratio, total number of subdomains, mean access time, access time variance, mean access interval, access interval variance, average number of domain name levels, average subdomain length, average ratio of distinct characters in subdomains, average of the longest subdomain length at each level, average ratio of numeric characters in subdomains, subdomain entropy, A record ratio, AAAA record ratio, CNAME record ratio, NS record ratio, MX record ratio, TXT record ratio, PTR record ratio, and Other record ratio.
An A record specifies the IPv4 address corresponding to a host name (or domain name).
An NS record is a name server record, i.e., a record specifying which DNS server is used to resolve the domain name.
An MX record is a mail exchange record, used by an e-mail system when sending mail to locate the mail server from the domain part of the recipient's address.
A CNAME record is an alias (canonical name) record that maps one domain name to another; for example, when the same target server is reachable under two or more domain names, alias records point the additional names to it.
A TXT record holds descriptive text about a host or domain name.
PTR, short for pointer, maps an IP address to the corresponding domain name; a PTR record is a record used to resolve an IP address to a domain name.
An AAAA record specifies the IPv6 address (e.g., ff06:0:0:0:0:0:0:c3) corresponding to a host name (or domain name).
The Other record ratio covers domain name resolution record types not listed above, such as explicit URL records, implicit URL records and SRV records.
In other embodiments, DNS tunnel features beyond those listed above may also be extracted; this is not specifically limited here.
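As an illustration, the sketch below computes a handful of the listed features for one aggregated group: total number of distinct subdomains, average number of domain levels, average subdomain length, average ratio of distinct characters, average ratio of numeric characters, subdomain entropy, and the record-type ratios. The source does not spell out exact definitions, so each formula here (for example, entropy taken over the concatenated subdomain characters) is an assumption, as are the function and feature names.

```python
import math
from collections import Counter

RECORD_TYPES = ["A", "AAAA", "CNAME", "NS", "MX", "TXT", "PTR"]


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def mean(xs):
    return sum(xs) / len(xs)


def group_features(queries, base_domain):
    """queries: non-empty list of (query_name, record_type) pairs for one group.

    Every query_name is assumed to end with base_domain.
    """
    # Subdomain part: everything to the left of the accessed domain.
    subs_raw = [name[: -len(base_domain)].rstrip(".") for name, _ in queries]
    flat = [s.replace(".", "") for s in subs_raw]
    flat_nonempty = [s for s in flat if s] or [""]
    record_counts = Counter(t if t in RECORD_TYPES else "Other" for _, t in queries)
    features = {
        "subdomain_total": len({s for s in subs_raw if s}),
        "avg_levels": mean([name.count(".") + 1 for name, _ in queries]),
        "avg_subdomain_length": mean([len(s) for s in flat_nonempty]),
        "avg_distinct_char_ratio": mean([len(set(s)) / len(s) if s else 0.0 for s in flat_nonempty]),
        "avg_digit_ratio": mean([sum(ch.isdigit() for ch in s) / len(s) if s else 0.0 for s in flat_nonempty]),
        "subdomain_entropy": shannon_entropy("".join(flat)),
    }
    for t in RECORD_TYPES + ["Other"]:
        features[f"{t}_ratio"] = record_counts[t] / len(queries)
    return features
```

Mapping the returned dictionary to numbers in a fixed column order gives the feature vector of the traffic data.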
Step S150, calculating, by the DNS tunnel detection model and based on the normal traffic matrix it provides, the Mahalanobis distance between the feature vector and the normal traffic matrix, where the DNS tunnel detection model is generated through model training on historical traffic data of the DNS server.
The normal traffic matrix is constructed from the historical traffic data of the DNS server in the local area network, using the same feature extraction as in step S130.
In a specific embodiment, the historical traffic data used to construct the normal traffic matrix is likewise grouped by terminal, accessed domain name and access time; that is, the collected traffic data is divided into groups. For example, the groups in the historical traffic data include the traffic data of terminal B1 and domain name C1 during 12:00-13:00, the traffic data of terminal B1 and domain name C1 during 15:00-16:00, the traffic data of terminal B2 and domain name C2 during 9:00-10:00, and so on.
DNS tunnel features are therefore extracted from each group of the historical traffic data to construct the normal traffic matrix.
The Mahalanobis distance describes the degree of difference between the feature vector and the normal traffic matrix. In the technical solution of the present invention, the normal traffic matrix represents normal traffic data in DNS tunnels in the local area network, so the computed Mahalanobis distance reflects how much the traffic data differs from normal traffic data.
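The text does not spell out how a distance between a vector and a matrix is computed. A common reading, assumed here, treats the rows of the normal traffic matrix as samples, estimates their mean μ and covariance Σ, and uses the standard Mahalanobis distance D(x) = sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)). The NumPy sketch below follows that reading; the function names are hypothetical, and the pseudo-inverse is a design choice to cope with collinear features (the record-type ratios, for instance, always sum to 1).

```python
import numpy as np


def fit_normal_profile(normal_matrix):
    """Mean vector and (pseudo-)inverse covariance of the normal traffic matrix.

    Rows are samples (one per terminal/domain/window group); columns are features.
    """
    x = np.asarray(normal_matrix, dtype=float)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # pinv instead of inv: collinear features make the covariance singular.
    return mean, np.linalg.pinv(np.atleast_2d(cov))


def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of feature vector x from the normal profile."""
    d = np.asarray(x, dtype=float) - mean
    # Clamp tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(d @ cov_inv @ d, 0.0)))
```

Unlike a Euclidean distance, this scales each direction by how much normal traffic varies along it, so a deviation in a usually stable feature weighs more than the same deviation in a noisy one.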
The DNS tunnel detection model is generated by unsupervised training on historical traffic data of the DNS server in the local area network. That is, the historical traffic data used as training samples does not need to be labeled before training; training proceeds directly on the historical traffic data. The model training process is described in detail below.
Through model training, the DNS tunnel detection model builds the normal traffic matrix from the historical traffic data used in training, and it then computes the Mahalanobis distance between the feature vector of the traffic data and that normal traffic matrix.
Step S170, performing DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance.
DNS tunnel anomaly identification determines whether the traffic data transmitted through the DNS tunnel is normal traffic data for DNS queries or abnormal traffic data (also called DNS tunnel traffic) that uses the DNS tunnel to carry other data. If the traffic data is abnormal, the established DNS tunnel (between the terminal and the DNS server corresponding to the accessed domain name) is an abnormal DNS tunnel.
As described above, the computed Mahalanobis distance represents the degree of difference between the feature vector of the traffic data and the normal traffic matrix. A large Mahalanobis distance therefore indicates that the feature vector differs greatly from the normal traffic matrix, and the traffic data may be abnormal traffic data transmitted through a DNS tunnel; conversely, a small Mahalanobis distance indicates that the difference is small, and the traffic data is likely normal.
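The text leaves open how large a distance counts as "large". One simple option, assumed here purely for illustration, is to compute the Mahalanobis distances of the training rows and flag any new distance more than k standard deviations above their mean; `k = 3` and both function names are hypothetical choices, not values from the source.

```python
import numpy as np


def fit_threshold(train_distances, k: float = 3.0) -> float:
    """Hypothetical rule: mean + k * standard deviation of the training distances."""
    d = np.asarray(train_distances, dtype=float)
    return float(d.mean() + k * d.std())


def is_tunnel_anomaly(distance: float, threshold: float) -> bool:
    """Flag traffic whose distance from the normal profile exceeds the threshold."""
    return distance > threshold
```

Raising k trades fewer false alarms for more missed tunnels; the right value depends on the alert volume a given network can absorb.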
In the prior art, if the traffic data of the internal DNS server in the local area network is not monitored, malicious data may be transmitted through a DNS tunnel built on the DNS protocol, for example to deliver viruses or trojans, to perform remote control, or to steal data. With the technical solution of the present invention, such traffic data is identified as abnormal traffic on the DNS tunnel and an alarm is raised.
In the technical solution of the present invention, the DNS tunnel detection model is trained within the network environment of the local area network, i.e., on historical traffic data of the internal DNS server. The trained model therefore fits the traffic data in that network, which improves the precision of DNS tunnel anomaly identification, ensures the accuracy of DNS tunnel detection, and effectively solves the prior-art problem of low detection accuracy caused by easily bypassed preset rules.
The prior art also includes an approach in which a DNS tunnel detection model is trained in a supervised manner on labeled training samples and the trained model is then deployed to the network environment where DNS tunnel detection is to be performed.
For DNS tunnel detection with a model generated by supervised training, on the one hand, supervised training relies on collecting positive samples (i.e., DNS tunnel traffic) and negative samples. Because real positive samples are hard to obtain, DNS tunnel traffic is generally generated with DNS tunneling tools. Such tool-generated traffic is homogeneous, so the trained model has insufficient ability to detect DNS tunnel traffic not produced by those tools, and detection accuracy is low.
On the other hand, the negative samples used to train the detection model are limited and not representative: the model is first trained offline and only then deployed to an actual network environment for DNS tunnel detection, while normal traffic differs greatly across actual network environments. As a result, the trained model's detection accuracy varies widely across network environments, and its detection capability is insufficient in some of them.
In the technical solution of the present invention, the DNS tunnel detection model is trained in the network environment where it is deployed, i.e., on traffic data from that environment (the historical traffic data used for training comes from the same environment whose traffic is to be identified). After the normal traffic matrix is constructed through training, DNS tunnel detection is performed on the traffic data of the internal DNS server in the local area network. On the one hand, there is no need to specially collect and construct positive and negative samples; on the other hand, because the model is trained in the actual working network environment, it is guaranteed to fit that environment, which ensures the accuracy of DNS tunnel detection.
In a specific embodiment, to keep the DNS tunnel detection model adapted to the working network environment, the model is retrained in that environment at set intervals so that it tracks changes in the environment. Such changes are mainly reflected in changes in traffic within the network environment (i.e., the local area network where the DNS server is located); therefore, at set intervals the model is retrained on the latest historical traffic data of the DNS server in the local area network, so that the retrained model is adapted to the latest traffic in the local area network.
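One way to realize the periodic retraining, sketched here under assumptions: keep a bounded window of the most recent feature vectors and refit whenever the configured interval has elapsed. The class name, the `fit_fn` callback, the window size and the interval are all hypothetical; the source specifies only that retraining uses the latest historical traffic at set intervals. An injectable clock keeps the example testable.

```python
import time
from collections import deque


class PeriodicRetrainer:
    """Refit a model on the most recent samples once every `interval_s` seconds (hypothetical design)."""

    def __init__(self, fit_fn, interval_s, max_history, clock=time.monotonic):
        self.fit_fn = fit_fn                      # e.g. builds the normal profile from a list of rows
        self.interval_s = interval_s
        self.history = deque(maxlen=max_history)  # oldest samples drop out automatically
        self.clock = clock
        self.model = None
        self.last_fit = None

    def observe(self, feature_vector):
        """Record one new sample and retrain if the interval has elapsed."""
        self.history.append(feature_vector)
        now = self.clock()
        if self.last_fit is None or now - self.last_fit >= self.interval_s:
            self.model = self.fit_fn(list(self.history))
            self.last_fit = now
        return self.model
```

The bounded deque is what makes the model follow the latest traffic: old behavior ages out of the window instead of accumulating forever.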
In an embodiment, as shown in FIG. 4, before step S150 the method further includes:
Step S230, acquiring historical traffic data of the DNS server as training samples and constructing the normal traffic matrix.
As described above, the historical traffic data is grouped by terminal and accessed domain name, and each group of the historical traffic data is used as one training sample, yielding multiple training samples.
The historical traffic data is likewise obtained by collecting the traffic of the DNS server in the local area network.
As described above, the normal traffic matrix characterizes the normal traffic of the DNS server in the local area network; it is constructed from the training samples and is used to detect abnormal traffic data in DNS tunnels established via that DNS server.
Step S250, constructing a base model and training it according to the normal traffic matrix to obtain the DNS tunnel detection model.
The base model is built around the Mahalanobis distance to be computed on traffic data collected from the DNS server in the local area network.
Before the base model is trained, a loss function is constructed for model training and the model parameters of the base model are initialized. Iterative training is then performed with the historical traffic data as training data, adjusting the model parameters until the loss function converges, and the base model with the adjusted parameters is used as the DNS tunnel detection model.
In one embodiment, step S230 shown in fig. 5 includes:
Step S231, extracting the corresponding DNS tunnel features from each training sample based on the characteristics of the DNS tunnel.
The DNS tunnel features extracted from the training samples are the same as those extracted from the traffic data; for example, all of the DNS tunnel features listed above are extracted from both.
Step S233, generating a feature vector for each training sample from the DNS tunnel features extracted from it.
The feature vector of a training sample is the vector representation of its extracted DNS tunnel features.
Step S235, constructing a sample traffic matrix with the feature vector of each training sample as a row vector.
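A minimal Python sketch of steps S231 to S235, assuming DNS query records arrive as `(terminal, qname)` pairs. The DNS tunnel features actually used are listed elsewhere in this specification, so the three features below (query count, mean query-name length, maximum Shannon entropy of the leftmost label) and the helper `registered_domain`, which naively keeps the last two labels, are hypothetical stand-ins that only show how each group becomes one row vector of the sample traffic matrix.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def registered_domain(qname):
    """Hypothetical helper: keep the last two labels (ignores public-suffix rules)."""
    return ".".join(qname.rstrip(".").split(".")[-2:])


def build_sample_matrix(records):
    """Group (terminal, qname) records by (terminal, access domain) and
    return the group keys plus one feature row per group (one training sample)."""
    groups = defaultdict(list)
    for terminal, qname in records:
        groups[(terminal, registered_domain(qname))].append(qname)
    keys = sorted(groups)
    rows = []
    for key in keys:
        names = groups[key]
        rows.append([
            float(len(names)),                                      # hypothetical: query count
            sum(len(q) for q in names) / len(names),                # hypothetical: mean QNAME length
            max(shannon_entropy(q.split(".")[0]) for q in names),   # hypothetical: max leftmost-label entropy
        ])
    return keys, rows
```

Each returned row is one training sample's feature vector; stacking the rows in order gives the sample traffic matrix.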
Step S237, denoising the sample traffic matrix to obtain the normal traffic matrix.
Denoising removes the feature vectors of the positive samples (samples suspected of abnormal, tunnel-like traffic) from the sample traffic matrix and retains the feature vectors of the negative samples (normal traffic). The matrix left after the positive-sample feature vectors are removed is used as the normal traffic matrix. This keeps the positive samples from distorting the calculated Mahalanobis distance.
In one embodiment, as shown in fig. 6, step S237 includes:
Step S310, decomposing the sample traffic matrix with the Robust PCA algorithm into a low-rank matrix and a sparse matrix.
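Let the sample traffic matrix be $Y$. The Robust PCA model is:
$$\min_{L,S}\ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad Y = L + S$$
where $\|L\|_*$ is the nuclear norm of $L$ (the sum of its singular values), $\|S\|_1$ is the sum of the absolute values of the entries of $S$, and $\lambda > 0$ weights the two terms.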
The sample traffic matrix $Y$ is thus decomposed into a matrix $L$ and a matrix $S$: $L$ is a low-rank matrix that approximates the true values, and $S$ is a sparse error matrix. In a specific embodiment, $L$ and $S$ may be computed with the iterative thresholding algorithm, the accelerated proximal gradient algorithm, the augmented Lagrange multiplier algorithm, and the like.
Step S330, summing each row vector of the sparse matrix and taking the result as the error of the training sample corresponding to that row.
Step S350, constructing the normal traffic matrix from the row vectors whose error is smaller than the set error.
Through these steps, the feature vectors of the positive samples are removed from the sample traffic matrix, the feature vectors of the negative samples are retained, and the normal traffic matrix is constructed from them. Calculating the Mahalanobis distance against this normal traffic matrix keeps the result accurate.
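The denoising of steps S310 to S350 can be sketched in Python with NumPy. This sketch solves the Robust PCA model with the inexact augmented Lagrange multiplier method, one of the solvers named above, using the common default $\lambda = 1/\sqrt{\max(m, n)}$. That default, the stopping tolerance, and the use of absolute values when summing each row of the sparse matrix are assumptions, not details given in the source; the function names are hypothetical.

```python
import numpy as np


def rpca_ialm(Y, lam=None, tol=1e-7, max_iter=500):
    """Robust PCA via inexact ALM: returns (L, S) with Y ~= L + S,
    L low-rank and S sparse."""
    Y = np.asarray(Y, dtype=float)
    m, n = Y.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # assumed default weight
    norm_two = np.linalg.norm(Y, 2)          # largest singular value
    norm_inf = np.abs(Y).max() / lam
    Z = Y / max(norm_two, norm_inf)          # Lagrange multiplier initialisation
    mu, rho = 1.25 / norm_two, 1.5
    mu_bar = mu * 1e7
    S = np.zeros_like(Y)
    norm_Y = np.linalg.norm(Y, "fro")
    for _ in range(max_iter):
        # L-step: singular value thresholding
        U, sig, Vt = np.linalg.svd(Y - S + Z / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: element-wise soft thresholding
        T = Y - L + Z / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # dual update on the residual of the constraint Y = L + S
        R = Y - L - S
        Z = Z + mu * R
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(R, "fro") / norm_Y < tol:
            break
    return L, S


def denoise(Y, max_error):
    """Steps S310-S350: keep the rows whose sparse-part error is below max_error."""
    _, S = rpca_ialm(Y)
    errors = np.abs(S).sum(axis=1)           # assumption: sum of absolute values per row
    keep = errors < max_error
    return np.asarray(Y, dtype=float)[keep], keep
```

Rows with large sparse components, such as samples carrying tunnel-like spikes, are dropped; the remaining rows form the normal traffic matrix.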
In one embodiment, as shown in fig. 7, the step of model training in step S250 includes:
Step S251, performing singular value decomposition on the normal traffic matrix to obtain its decomposition matrices.
Singular value decomposition (SVD) decomposes the $m \times n$ normal traffic matrix $X$ into an $m \times m$ orthogonal matrix $U$, an $n \times n$ orthogonal matrix $V$ and an $m \times n$ diagonal matrix $S$:
$$X_{m\times n} = U_{m\times m}\, S_{m\times n}\, V_{n\times n}^{T} \qquad (1\text{-}2)$$
where $S = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ is the singular value matrix of $X$ and $\sigma_i > 0$ $(i = 1, 2, \ldots, r)$.
The decomposition therefore yields three matrices: the $m \times m$ orthogonal matrix $U$, whose columns are the left singular vectors of $X$; the $n \times n$ orthogonal matrix $V$, whose columns are the right singular vectors of $X$; and the $m \times n$ diagonal matrix $S$.
Step S253, compressing the decomposition matrices to obtain compressed matrices, which are used as the parameters of the basic model.
In the singular value matrix $S$ of the normal traffic matrix, the diagonal elements are the singular values of $X$ and all other elements are zero. The singular values are arranged in descending order and decrease rapidly; in many cases the largest 10%, or even 1%, of the singular values account for more than 99% of their total sum. $X$ can therefore be approximated by its largest $r$ singular values and the corresponding left and right singular vectors:
$$X_{m\times n} \approx U_{m\times r}\, S_{r\times r}\, V_{n\times r}^{T}$$
In a specific embodiment, $r$ is chosen according to a set proportion, namely the ratio of the sum of the singular values in the compressed matrix $S_{r\times r}$ to the sum of all singular values in the decomposed matrix $S_{m\times n}$, for example 0.95. From this proportion and $S_{m\times n}$, $r$ is obtained, and the compressed matrices of $U_{m\times m}$, $S_{m\times n}$ and $V_{n\times n}$ are computed as $U_{m\times r}$, $S_{r\times r}$ (denoted $S_r$ below) and $V_{n\times r}$ (denoted $V_r$ below), respectively. FIG. 8 is a schematic diagram of compressing $U_{m\times m}$, $S_{m\times n}$ and $V_{n\times n}$.
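A sketch of steps S251 and S253 in Python with NumPy, assuming the set proportion is applied to the cumulative sum of singular values as described above and $r$ is taken as the smallest value that reaches it. `np.linalg.svd` with `full_matrices=False` returns the thin decomposition, whose leading columns are identical to those of the full $U_{m\times m}$ and $V_{n\times n}$, so the compressed matrices are the same. The function name `compress_svd` is hypothetical.

```python
import numpy as np


def compress_svd(X, ratio=0.95):
    """SVD of X, then keep the smallest r whose singular values reach
    `ratio` of the total singular-value sum. Returns (U_r, S_r, V_r)."""
    U, s, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    share = np.cumsum(s) / s.sum()                          # cumulative share, non-decreasing
    r = min(int(np.searchsorted(share, ratio)) + 1, len(s))  # clip guards float round-off at ratio=1
    return U[:, :r], np.diag(s[:r]), Vt[:r].T              # U_{m x r}, S_{r x r}, V_{n x r}
```

`U_r @ S_r @ V_r.T` reproduces the rank-$r$ approximation of $X$ given above.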
Step S255, determining whether the parameters of the basic model make the loss function of the basic model converge.
If so, step S257 is executed: the converged basic model is taken as the DNS tunnel detection model.
If not, step S258 is executed: the parameters of the basic model are iteratively updated based on the normal traffic matrix until the updated parameters make the loss function of the basic model converge.
As described above, before model training a basic model is constructed from the collected traffic data, a loss function is constructed for it, and its model parameters are initialized; training then starts from this initialization.
During training, the model parameters of the basic model are adjusted until its loss function converges. In the technical solution of the present invention, the compressed matrices serve as the model parameters of the basic model, and they are adjusted based on the normal traffic matrix constructed from the historical traffic data; that is, the model parameters are in fact adjusted by updating the normal traffic matrix.
In an embodiment, before step S251, the method further includes:
normalizing the normal traffic matrix, so that the singular value decomposition is performed on the normalized normal traffic matrix.
As described above, the normal traffic matrix is constructed from the DNS tunnel features extracted from the negative samples. The extracted DNS tunnel features are treated as independent and identically distributed random variables. The mean and standard deviation of each DNS tunnel feature are calculated from the DNS tunnel features extracted from the training samples and from the traffic data under DNS tunnel detection.
Suppose the extracted DNS tunnel features are $A_1, A_2, A_3, \ldots, A_i$, where $i$ is the number of extracted DNS tunnel features, and the DNS tunnel features of $j$ negative samples form the normal traffic matrix
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1i} \\ x_{21} & x_{22} & \cdots & x_{2i} \\ \vdots & \vdots & \ddots & \vdots \\ x_{j1} & x_{j2} & \cdots & x_{ji} \end{pmatrix}$$
In the normal traffic matrix, $x_{k1}$ $(1 \le k \le j)$ is the numerical representation of $A_1$ extracted from the $k$-th negative sample; in general, $x_{kl}$ is the value of $A_l$ for the $k$-th negative sample.
Let the calculated means of $A_1, A_2, A_3, \ldots, A_i$ be $\mu_1, \mu_2, \mu_3, \ldots, \mu_i$ and their standard deviations be $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_i$. The normal traffic matrix is normalized with the mean and standard deviation of each DNS tunnel feature, giving the normalized normal traffic matrix
$$X' = \begin{pmatrix} \frac{x_{11}-\mu_1}{\sigma_1} & \frac{x_{12}-\mu_2}{\sigma_2} & \cdots & \frac{x_{1i}-\mu_i}{\sigma_i} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{x_{j1}-\mu_1}{\sigma_1} & \frac{x_{j2}-\mu_2}{\sigma_2} & \cdots & \frac{x_{ji}-\mu_i}{\sigma_i} \end{pmatrix}$$
Normalization converts the absolute magnitudes of the elements of the normal traffic matrix into relative values on a common scale, which simplifies the Mahalanobis distance calculation and improves its efficiency.
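A minimal NumPy sketch of this column-wise normalization. It assumes the population standard deviation (`ddof=0`) and maps a constant feature (standard deviation 0) to zeros instead of dividing by zero; neither choice is specified in the source, and the function name is hypothetical.

```python
import numpy as np


def zscore_columns(X, eps=1e-12):
    """Normalize each column (DNS tunnel feature) to zero mean and unit std.
    Returns (normalized matrix, per-feature means, per-feature stds)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                          # mu_1 ... mu_i
    sigma = X.std(axis=0)                        # sigma_1 ... sigma_i (population std)
    sigma = np.where(sigma < eps, 1.0, sigma)    # constant feature: avoid division by zero
    return (X - mu) / sigma, mu, sigma
```

A feature vector extracted from new traffic data would be normalized with the same `mu` and `sigma` before its Mahalanobis distance is computed.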
In one embodiment, as shown in fig. 9, step S130 includes:
Step S131, extracting a plurality of DNS tunnel features from the traffic data based on the characteristics of the DNS tunnel.
Step S133, generating the feature vector of the traffic data from the extracted DNS tunnel features.
The feature vector of the traffic data is constructed in the same way as that of a training sample, as described in detail above.
In one embodiment, step S150 includes:
based on the DNS tunnel detection model, calculating the Mahalanobis distance between the feature vector and the normal traffic matrix according to the following formula:
[Formula (1-4): $m$ as a function of $y$ and $S_r$; the formula image is not reproduced in this text]
where $y = V_r^{T}(x - \mu)$.
Here $m$ is the Mahalanobis distance between the feature vector and the normal traffic matrix; $x$ is the feature vector; $\mu$ is the mean vector formed by the means of the DNS tunnel features in the feature vector; $V_r$ consists of the first $r$ columns of matrix $V$; and $S_r$ is the diagonal matrix formed by the first $r$ diagonal elements of matrix $S$. $V$ and $S$ are the decomposition matrices obtained by singular value decomposition of the normal traffic matrix, and $V_r$ and $S_r$ are the matrices obtained by compressing $V$ and $S$.
During training of the DNS tunnel detection model, the matrices $V_r$ and $S_r$ that make its loss function converge are obtained from the constructed normal traffic matrix. With them, the DNS tunnel detection model can calculate the Mahalanobis distance between the feature vector of the traffic data and the normal traffic matrix through formula (1-4).
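Because the image of formula (1-4) is not reproduced here, the following NumPy sketch uses the standard Mahalanobis distance restricted to the first $r$ principal directions. With the centred training matrix decomposed as $U S V^T$, the sample covariance is $V S^2 V^T/(j-1)$, so $m = \sqrt{\sum_k y_k^2\,(j-1)/s_k^2}$ with $y = V_r^T(x-\mu)$. The $(j-1)$ scaling and the function names are assumptions; the patent's own formula may scale $m$ differently.

```python
import numpy as np


def fit_detector(X, ratio=0.95):
    """Return (mu, V_r, s_r, n_samples) from a normal traffic matrix X (rows = samples)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                           # mean vector of the DNS tunnel features
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    share = np.cumsum(s) / s.sum()                # cumulative singular-value share
    r = min(int(np.searchsorted(share, ratio)) + 1, len(s))
    return mu, Vt[:r].T, s[:r], X.shape[0]        # V_r has shape (n_features, r)


def mahalanobis(x, mu, V_r, s_r, n_samples):
    """Mahalanobis distance of x measured in the r-dimensional principal subspace."""
    y = V_r.T @ (np.asarray(x, dtype=float) - mu)  # y = V_r^T (x - mu)
    var = s_r ** 2 / (n_samples - 1)               # assumed: sample-covariance eigenvalues
    return float(np.sqrt(np.sum(y ** 2 / var)))
```

With `ratio=1.0` every component is kept and the result equals the ordinary Mahalanobis distance computed from `np.cov`; smaller ratios discard low-variance directions.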
In one embodiment, as shown in fig. 10, step S170 includes:
Step S171, mapping the Mahalanobis distance to a score.
Because the Mahalanobis distance is unbounded and a computed value can be arbitrarily large, it is mapped into a finite interval, for example [0, 100], so that the magnitude of the Mahalanobis distance is reflected by the mapped score.
The Mahalanobis distance is mapped to a score by the following formula:
[Score-mapping formula: score as a function of $m$ with parameters $A$, $k$ and $m_0$; the formula image is not reproduced in this text]
Here score is the score mapped from the Mahalanobis distance $m$; $A$ is set according to the chosen finite interval, for example $A = 100$ when Mahalanobis distances are mapped into [0, 100]; and $k$ and $m_0$ are determined from the scores of two given Mahalanobis distances. For example, if the Mahalanobis distance $m_1$ is mapped to 1 point and $m_2$ to 99 points, substituting $(m_1, 1)$ and $(m_2, 99)$ into the formula yields $k$ and $m_0$.
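The score-mapping formula image is not reproduced here. The parameters described ($A$ as the upper bound, and $k$ and $m_0$ fixed by two anchor points mapped to 1 and 99) are consistent with a logistic curve $\text{score} = A/(1+e^{-k(m-m_0)})$, so the sketch below assumes that form; it is an illustration, not the confirmed formula. Solving the two anchor equations gives $k = (a_1 - a_2)/(m_2 - m_1)$ and $m_0 = m_1 + a_1/k$, where $a_t = \ln(A/s_t - 1)$. Function names are hypothetical.

```python
import math


def calibrate(m1, m2, A=100.0, s1=1.0, s2=99.0):
    """Solve k and m0 so that score(m1) == s1 and score(m2) == s2
    under the assumed logistic form score = A / (1 + exp(-k (m - m0)))."""
    a1 = math.log(A / s1 - 1.0)
    a2 = math.log(A / s2 - 1.0)
    k = (a1 - a2) / (m2 - m1)
    m0 = m1 + a1 / k
    return k, m0


def score(m, k, m0, A=100.0):
    """Map an unbounded Mahalanobis distance m into (0, A)."""
    z = k * (m - m0)
    if z >= 0:
        return A / (1.0 + math.exp(-z))
    e = math.exp(z)               # stable branch: avoids overflow for very small m
    return A * e / (1.0 + e)
```

With the anchors 1 and 99, the midpoint $m_0$ is simply $(m_1 + m_2)/2$, where the score is $A/2$.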
Step S173, if the score exceeds the set score, identifying the traffic data as coming from an abnormal DNS tunnel.
That is, a set score is configured in advance for the mapped scores. If the score corresponding to a Mahalanobis distance exceeds the set score, the traffic data corresponding to that Mahalanobis distance is determined to come from an abnormal DNS tunnel; otherwise, it is regarded as normal traffic data.
Step S175, raising a DNS tunnel alarm for the access domain name corresponding to the traffic data, where the access domain name is the domain name queried by the DNS query behavior of the terminal.
The traffic data arises when a terminal in the local area network sends a DNS query request for an accessed domain name to the internal DNS server. For a terminal in the local area network, the DNS tunnel threat therefore comes from the domain name server hosting the accessed domain name. Accordingly, the DNS tunnel alarm tells the user which access domain names pose a security threat to the terminal; that is, the alarm is raised on the access domain name corresponding to the traffic data.
After the Mahalanobis distance is mapped to a score, DNS tunnel identification is performed on the corresponding traffic data against the set score. Mapping the unbounded Mahalanobis distance into a finite interval improves both the precision and the efficiency of DNS tunnel identification on the traffic data.
In an embodiment, as shown in fig. 11, after step S173, the method further includes:
Step S174, performing anomaly detection on the Mahalanobis distance of the traffic data according to a probability distribution function constructed for the Mahalanobis distance.
If the Mahalanobis distance is detected to be abnormal, step S175 is executed, and a DNS tunnel alarm is sent to the monitoring end of the local area network for the traffic data identified as coming from an abnormal DNS tunnel.
The Mahalanobis distances calculated for the traffic data approximately follow a normal distribution, so a range of normal Mahalanobis distances is set using the probability distribution function of that normal distribution: a calculated Mahalanobis distance inside the range is regarded as normal, and one outside it as abnormal. The range can be set from the mean and standard deviation of the normal distribution that the Mahalanobis distance follows.
In one embodiment, the range of normal Mahalanobis distances is set by the 3σ criterion. The distribution function of a normally distributed random variable is shown in fig. 12. Under the 3σ criterion, a value falls within $(\mu - 3\sigma, \mu + 3\sigma)$ with probability 0.9973, so a value outside this range is regarded as abnormal.
Correspondingly, a Mahalanobis distance within $(\mu_s - 3\sigma_s, \mu_s + 3\sigma_s)$ is regarded as a normal Mahalanobis distance, and one outside this range as an abnormal Mahalanobis distance, where $\mu_s$ is the mean and $\sigma_s$ the standard deviation of the normal distribution that the Mahalanobis distance follows.
Step S175 is executed for an abnormal Mahalanobis distance; no alarm is raised for a normal Mahalanobis distance.
By combining the score mapped from the Mahalanobis distance with the anomaly detection result for the Mahalanobis distance, a DNS tunnel alarm is raised only for traffic data whose score exceeds the set score and whose Mahalanobis distance is abnormal, which improves the precision of DNS tunnel detection.
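A sketch of the combined decision of steps S173 and S174 in plain Python. It assumes $\mu_s$ and $\sigma_s$ are estimated from a set of reference Mahalanobis distances (for example, those of the training samples) with the sample mean and sample standard deviation; the source does not say how they are estimated, and the function names are hypothetical.

```python
import statistics


def fit_normal_range(reference_distances):
    """3-sigma range (mu_s - 3 sigma_s, mu_s + 3 sigma_s) from reference distances."""
    mu_s = statistics.fmean(reference_distances)
    sigma_s = statistics.stdev(reference_distances)   # sample standard deviation
    return mu_s - 3 * sigma_s, mu_s + 3 * sigma_s


def is_abnormal(m, normal_range):
    """A Mahalanobis distance outside the open 3-sigma interval is abnormal."""
    low, high = normal_range
    return not (low < m < high)


def should_alarm(score, m, set_score, normal_range):
    """Alarm only if the score exceeds the set score AND the distance is abnormal."""
    return score > set_score and is_abnormal(m, normal_range)
```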
In one embodiment, as shown in fig. 13, step S175 includes:
Step S410, calculating the weight of each DNS tunnel feature in the feature vector of the traffic data.
For the feature vector $x = (x_1, x_2, x_3, \ldots, x_i)$ of the traffic data, where $i$ is the number of DNS tunnel features, the feature weight $c_k$ of the DNS tunnel feature at position $x_k$ is calculated according to the following formula:
[Feature-weight formula for $c_k$; the formula image is not reproduced in this text]
Step S430, determining, from the calculated weights, the DNS tunnel features whose weight exceeds the set weight.
Step S450, generating DNS tunnel alarm information for the access domain name corresponding to the traffic data according to the determined DNS tunnel features, and raising a DNS tunnel alarm on the access domain name with that information.
Calculating the feature weights and generating the DNS tunnel alarm message from the features whose weight exceeds the set weight lets the user see, from the alarm message, which DNS tunnel features made the Mahalanobis distance of the traffic data large, which makes the DNS tunnel alarm on the access domain name interpretable.
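Steps S430 and S450 can be sketched as follows. Since the image of the weight formula for $c_k$ is not reproduced here, the weights are taken as an input rather than computed; the function name and the alarm message format are hypothetical.

```python
def build_alarm_message(domain, feature_names, weights, set_weight):
    """Return an alarm message naming the DNS tunnel features whose weight
    exceeds set_weight (strongest first), or None if no feature does."""
    flagged = [(name, w) for name, w in zip(feature_names, weights) if w > set_weight]
    if not flagged:
        return None                                        # nothing to explain, no message
    flagged.sort(key=lambda item: item[1], reverse=True)   # strongest contributors first
    details = ", ".join(f"{name}={w:.2f}" for name, w in flagged)
    return f"DNS tunnel alarm for {domain}: {details}"
```

For example, weights `[0.1, 0.6, 0.3]` for features `A1`, `A2`, `A3` with a set weight of 0.25 name `A2` and `A3` in the message.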
In an embodiment, before step S450, the method further includes:
filtering the access domain names to be alarmed according to the configured false alarm filtering condition.
If the access domain name does not satisfy the false alarm filtering condition, step S450 is executed.
The false alarm filtering condition can be configured according to the actual network traffic of the internal DNS server in the local area network.
The configured false alarm filtering condition filters out access domain names that pose no security threat even though their traffic data comes from an abnormal DNS tunnel, which makes the DNS tunnel alarms more effective.
In one application scenario, data is transmitted between the terminal and the access domain name through a DNS tunnel, but the data poses no security threat to the terminal in the local area network, for example feedback data. The access domain name corresponding to such traffic data poses no threat to the terminal, so no DNS tunnel alarm is needed for it. In this scenario, the false alarm filtering condition can be configured to filter such access domain names out of those to be alarmed; for example, with a white list as the condition, no DNS tunnel alarm is raised for access domain names on the white list.
In another scenario, analysis of the actual network traffic of the internal DNS server shows that, in traffic data posing a security threat, the DNS tunnel features that make the Mahalanobis distance large are $A_1$, $A_2$ and $A_3$, while DNS tunnel features other than $A_1$, $A_2$ and $A_3$ do not indicate a security threat even when their weight exceeds the set weight. The false alarm filtering condition can then specify DNS tunnel features: an alarm is raised for the access domain name corresponding to the traffic data only when the weight of a specified DNS tunnel feature exceeds the set weight, and no alarm is raised when none of the features whose weight exceeds the set weight is a specified feature.
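The two false alarm filtering conditions described above, a domain white list and a set of specified DNS tunnel features, can be sketched as a single predicate. The name `suppress_alarm` and its parameters are hypothetical; `flagged_features` stands for the features whose weight exceeded the set weight.

```python
def suppress_alarm(domain, flagged_features, whitelist, specified_features=None):
    """Return True when the alarm for `domain` should be filtered out as a false alarm."""
    if domain in whitelist:              # condition 1: white-listed access domain name
        return True
    if specified_features is not None and not set(flagged_features) & set(specified_features):
        return True                      # condition 2: no specified feature exceeded the set weight
    return False
```

Step S450 would run only when `suppress_alarm(...)` returns `False`.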
The following is an embodiment of the apparatus of the present invention, which can be used to execute an embodiment of the DNS tunnel detection method executed by the detection server 400 of the present invention. For details that are not disclosed in the embodiments of the apparatus of the present invention, please refer to the embodiments of the DNS tunnel detection method of the present invention.
Fig. 14 is a block diagram illustrating a DNS tunnel detection apparatus 900 according to an exemplary embodiment, where the DNS tunnel detection apparatus 900 may be used in the detection server 400 in the implementation environment shown in fig. 1, and performs all or part of the steps of any of the method embodiments.
As shown in fig. 14, the DNS tunnel detection apparatus 900 includes but is not limited to: an acquisition module 110, an extraction module 130, a calculation module 150, and an anomaly identification module 170.
The acquisition module 110 is configured to acquire the collected traffic data, where the traffic data is generated by DNS query behavior initiated by the terminal to the DNS server in the local area network.
The extraction module 130 is configured to extract a feature vector from the traffic data.
The calculation module 150 is configured to calculate, through the DNS tunnel detection model, the Mahalanobis distance between the feature vector and the normal traffic matrix provided by that model, where the DNS tunnel detection model is generated by model training on historical traffic data of the DNS server.
The anomaly identification module 170 is configured to perform DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance.
The implementation process of the function and the effect of each module in the above device is specifically detailed in the implementation process of the corresponding step in the above DNS tunnel detection method, and is not described herein again.
It is understood that these modules may be implemented in hardware, software, or a combination of both. When implemented in hardware, these modules may be implemented as one or more hardware modules, such as one or more application specific integrated circuits. When implemented in software, the modules may be implemented as one or more computer programs executing on one or more processors, such as programs stored in memory 250 for execution by central processor 270 of FIG. 2.
In an embodiment, the DNS tunnel detection apparatus further includes, but is not limited to: a normal traffic matrix construction module and a DNS tunnel detection model generation module.
The normal traffic matrix construction module is configured to acquire historical traffic data of the DNS server as training samples and construct the normal traffic matrix.
The DNS tunnel detection model generation module is configured to construct a basic model and train it on the normal traffic matrix to obtain the DNS tunnel detection model.
In one embodiment, the normal traffic matrix construction module includes, but is not limited to: a first DNS tunnel feature extraction unit, a first feature vector generation unit, a sample traffic matrix construction unit and a denoising processing unit.
The first DNS tunnel feature extraction unit is configured to extract the corresponding DNS tunnel features from each training sample based on the characteristics of the DNS tunnel.
The first feature vector generation unit is configured to generate the feature vector of each training sample from the DNS tunnel features extracted from it.
The sample traffic matrix construction unit is configured to construct the sample traffic matrix with the feature vector of each training sample as a row vector.
The denoising processing unit is configured to denoise the sample traffic matrix to obtain the normal traffic matrix.
In one embodiment, the denoising processing unit includes, but is not limited to: a first matrix decomposition unit, an error calculation unit and a normal traffic matrix construction unit.
The first matrix decomposition unit is configured to decompose the sample traffic matrix with the Robust PCA algorithm into a low-rank matrix and a sparse matrix.
The error calculation unit is configured to sum each row vector of the sparse matrix and take the result as the error of the training sample corresponding to that row.
The normal traffic matrix construction unit is configured to construct the normal traffic matrix from the row vectors whose error is smaller than the set error.
In one embodiment, the DNS tunnel detection model generation module includes, but is not limited to: a singular value decomposition unit, a matrix compression unit, a DNS tunnel detection model obtaining unit and an iterative updating unit.
The singular value decomposition unit is configured to perform singular value decomposition on the normal traffic matrix to obtain its decomposition matrices.
The matrix compression unit is configured to compress the decomposition matrices to obtain compressed matrices, which are used as the parameters of the basic model.
The DNS tunnel detection model obtaining unit is configured to take the converged basic model as the DNS tunnel detection model if the parameters of the basic model make the loss function of the basic model converge.
The iterative updating unit is configured to iteratively update the parameters of the basic model based on the normal traffic matrix if the parameters of the basic model do not make the loss function of the basic model converge.
In an embodiment, the DNS tunnel detection model generation module further includes, but is not limited to: a normalization unit.
The normalization unit is configured to normalize the normal traffic matrix, so that the singular value decomposition is performed on the normalized normal traffic matrix.
In one embodiment, the extraction module includes, but is not limited to: a second DNS tunnel feature extraction unit and a second feature vector generation unit.
The second DNS tunnel feature extraction unit is configured to extract a plurality of DNS tunnel features from the traffic data based on the characteristics of the DNS tunnel.
The second feature vector generation unit is configured to generate the feature vector of the traffic data from the extracted DNS tunnel features.
In one embodiment, the calculation module includes, but is not limited to: a calculation unit.
The calculation unit is configured to calculate, based on the DNS tunnel detection model and according to formula (1-4) [formula image not reproduced in this text], the Mahalanobis distance between the feature vector and the normal traffic matrix.
Here $m$ is the Mahalanobis distance between the feature vector and the normal traffic matrix, and $y = V_r^{T}(x - \mu)$, where $x$ is the feature vector; $\mu$ is the mean vector formed by the means of the DNS tunnel features in the feature vector; $V_r$ consists of the first $r$ columns of matrix $V$; $S_r$ is the diagonal matrix formed by the first $r$ diagonal elements of the diagonal matrix $S$; $V$ and $S$ are the decomposition matrices obtained by singular value decomposition of the normal traffic matrix; and $V_r$ and $S_r$ are obtained by compressing $V$ and the diagonal matrix $S$.
In one embodiment, the anomaly identification module includes, but is not limited to: a mapping unit, an identification unit and an alarm unit.
The mapping unit is configured to map the Mahalanobis distance to a score.
The identification unit is configured to identify the traffic data as coming from an abnormal DNS tunnel if the score exceeds the set score.
The alarm unit is configured to raise a DNS tunnel alarm for the access domain name corresponding to the traffic data, where the access domain name is the domain name queried by the DNS query behavior of the terminal.
In one embodiment, the anomaly identification module further includes, but is not limited to: an anomaly detection unit.
The anomaly detection unit is configured to perform anomaly detection on the Mahalanobis distance of the traffic data according to the probability distribution function constructed for the Mahalanobis distance.
If the Mahalanobis distance is detected to be abnormal, processing proceeds to the alarm unit.
In one embodiment, the alarm unit includes, but is not limited to: a weight calculation unit, a feature determination unit and a DNS tunnel alarm message generation unit.
The weight calculation unit is configured to calculate the weight of each DNS tunnel feature in the feature vector of the traffic data.
The feature determination unit is configured to determine, from the calculated weights, the DNS tunnel features whose weight exceeds the set weight.
The DNS tunnel alarm message generation unit is configured to generate a DNS tunnel alarm message for the access domain name corresponding to the traffic data according to the determined DNS tunnel features, and to raise a DNS tunnel alarm on the access domain name through that message.
In one embodiment, the alarm unit further includes, but is not limited to: a filtering unit.
The filtering unit is configured to filter the access domain names to be alarmed according to the configured false alarm filtering condition.
If the access domain name does not satisfy the false alarm filtering condition, processing proceeds to the DNS tunnel alarm message generation unit.
The implementation process of the function and the effect of each module/unit in the above device is specifically detailed in the implementation process of the corresponding step in the above DNS tunnel detection method, and is not described herein again.
Optionally, the present invention further provides a DNS tunnel detection apparatus, where the DNS tunnel detection apparatus may be used in the detection server 400 in the implementation environment shown in fig. 1, and performs all or part of the steps of the DNS tunnel detection method shown in any one of the above method embodiments. As shown in fig. 15, the DNS tunnel detection apparatus 1000 includes, but is not limited to: a processor 1001 and a memory 1002.
Wherein the memory 1002 has stored thereon computer readable instructions which, when executed by the processor 1001, implement the method of any of the above method implementations.
For example, the processor 1001 reads the computer readable instructions stored in the memory 1002 via the communication line/bus 1003 connected to the memory, and executes them to implement the method in any of the above embodiments.
The specific manner in which the processor in this embodiment performs the operation has been described in detail in the embodiment related to the DNS tunnel detection method, and will not be elaborated here.
In an exemplary embodiment, a storage medium is also provided, which is a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the above method embodiments. The storage medium includes, for example and without limitation, a memory of instructions executable by a central processor of a server to perform the DNS tunnel detection method described above.
The specific manner in which the processor in this embodiment performs the operation has been described in detail in the embodiment related to the DNS tunnel detection method, and will not be elaborated here.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (13)

1. A DNS tunnel detection method, characterized by comprising the following steps:
acquiring collected traffic data, wherein the traffic data is generated by DNS query behavior initiated by a terminal to a DNS server in a local area network;
extracting a feature vector from the traffic data;
calculating, through a DNS tunnel detection model, the Mahalanobis distance between the feature vector and a normal traffic matrix provided by the DNS tunnel detection model. The DNS tunnel detection model is generated at a set time interval by model training that uses the latest historical traffic data of the DNS server as training samples. The normal traffic matrix provided by the model is constructed from that latest historical traffic data. The DNS tunnel detection model is obtained through the following process: acquiring the latest historical traffic data of the DNS server as training samples and constructing the normal traffic matrix; constructing a base model, and performing singular value decomposition on the normal traffic matrix to obtain the decomposition matrices corresponding to the normal traffic matrix; compressing the decomposition matrices to obtain compressed matrices that serve as the parameters of the base model; and, if these parameters make the loss function corresponding to the base model converge, taking the converged base model as the DNS tunnel detection model;
performing DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance, which comprises: combining the score mapped from the Mahalanobis distance with the anomaly detection result for the Mahalanobis distance, and raising a DNS tunnel alarm for traffic data whose score exceeds a set score and whose Mahalanobis distance is an abnormal Mahalanobis distance.
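The model-building process in claim 1 can be sketched as follows. This is a minimal sketch under several assumptions:

- NumPy is used.
- Each row of the normal traffic matrix is the feature vector of one training sample.
- A hypothetical energy-retention rule (`energy`) chooses how many singular values r to keep when compressing. The claim does not say how r is chosen.
- The loss function whose convergence the claim mentions is not specified, so it is not modeled.

```python
import numpy as np


def fit_detection_model(normal_matrix: np.ndarray, energy: float = 0.95):
    """Fit a compressed SVD model of normal DNS traffic (illustrative sketch).

    normal_matrix: one row per training sample, one column per DNS tunnel feature.
    energy: hypothetical rule for choosing r, the number of singular values kept.
    Returns (mu, v_r, s_r): feature means, first r columns of V, first r singular values.
    """
    mu = normal_matrix.mean(axis=0)          # mean of each DNS tunnel feature
    centered = normal_matrix - mu
    # Thin SVD of the centered matrix: centered = U @ diag(s) @ Vt
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Compression: keep the smallest r whose squared singular values
    # account for at least `energy` of the total.
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    v_r = vt[:r].T                           # first r columns of V
    s_r = s[:r]                              # first r diagonal elements of S
    return mu, v_r, s_r


# Retraining at the set time interval amounts to calling this again
# on the newest window of normal traffic.
rng = np.random.default_rng(0)
window = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1])
mu, v_r, s_r = fit_detection_model(window)
print(v_r.shape, s_r)
```

In this synthetic window two features carry almost all of the variance, so the energy rule keeps two components.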
2. The method of claim 1, wherein acquiring the latest historical traffic data of the DNS server as training samples and constructing the normal traffic matrix comprises:
extracting, based on the characteristics of DNS tunnels, the corresponding DNS tunnel features from each training sample;
generating a feature vector for each training sample from the DNS tunnel features extracted from it;
constructing a sample traffic matrix that takes the feature vector of each training sample as a row vector;
and denoising the sample traffic matrix to obtain the normal traffic matrix.
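Claim 2 does not name the DNS tunnel features. The sketch below makes two assumptions:

- Each training sample is the list of (query name, query type) pairs observed for one access domain in a time window.
- The features are hypothetical: query count, distinct subdomains, mean subdomain length, mean subdomain entropy and TXT-query ratio. Long, high-entropy subdomains and TXT queries are often cited as tunnel indicators.

Each sample's feature vector becomes one row of the sample traffic matrix.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical feature set; the patent does not list its DNS tunnel features.
FEATURE_NAMES = ("query_count", "unique_subdomains", "mean_subdomain_length",
                 "mean_subdomain_entropy", "txt_ratio")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in Counter(text).values())


def feature_vector(queries, base_domain: str) -> np.ndarray:
    """One feature vector from the (qname, qtype) pairs seen for base_domain."""
    # Keep only the subdomain part, where tunnels usually encode their payload.
    subs = [qname.removesuffix(base_domain).rstrip(".") for qname, _ in queries]
    n = max(len(queries), 1)                 # avoid division by zero on empty windows
    return np.array([
        len(queries),
        len(set(subs)),
        sum(len(s) for s in subs) / n,
        sum(shannon_entropy(s) for s in subs) / n,
        sum(1 for _, qtype in queries if qtype == "TXT") / n,
    ], dtype=float)


def build_sample_matrix(samples) -> np.ndarray:
    """Sample traffic matrix: one row (feature vector) per training sample."""
    return np.vstack([feature_vector(q, d) for q, d in samples])


samples = [
    ([("a.example.com", "A"), ("b.example.com", "TXT")], "example.com"),
    ([("aaaa.test.org", "A")], "test.org"),
]
print(build_sample_matrix(samples))
```

The same `feature_vector` function would be applied to live traffic at detection time, so training and detection vectors share one feature order.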
3. The method of claim 2, wherein denoising the sample traffic matrix to obtain the normal traffic matrix comprises:
performing matrix decomposition on the sample traffic matrix with the Robust PCA algorithm to obtain a low-rank matrix and a sparse matrix;
summing each row vector of the sparse matrix and taking the result as the error of the training sample corresponding to that row vector;
and constructing the normal traffic matrix from the row vectors whose corresponding error is smaller than a set error.
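Claim 3's denoising step can be sketched with the inexact augmented Lagrange multiplier (IALM) solver for Robust PCA (principal component pursuit). IALM is one standard way to compute the low-rank plus sparse split; the patent does not say which solver it uses. The sketch assumes:

- NumPy is used, with λ = 1/√max(m, n).
- Absolute values are summed per row of the sparse matrix. The claim only says "summation", and a signed sum could let corruptions cancel.
- `max_error` stands in for the set error.

```python
import numpy as np


def robust_pca(m_mat, tol=1e-7, max_iter=500):
    """Split m_mat into low_rank + sparse via IALM principal component pursuit."""
    lam = 1.0 / np.sqrt(max(m_mat.shape))
    norm_fro = np.linalg.norm(m_mat, "fro")
    norm_two = np.linalg.norm(m_mat, 2)      # largest singular value
    # Dual-variable initialization and penalty schedule of the IALM method.
    y = m_mat / max(norm_two, np.abs(m_mat).max() / lam)
    mu, rho = 1.25 / norm_two, 1.5
    mu_max = mu * 1e7
    low_rank = np.zeros_like(m_mat)
    sparse = np.zeros_like(m_mat)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding at 1/mu.
        u, sig, vt = np.linalg.svd(m_mat - sparse + y / mu, full_matrices=False)
        low_rank = (u * np.maximum(sig - 1.0 / mu, 0.0)) @ vt
        # Sparse update: elementwise soft thresholding at lam/mu.
        t = m_mat - low_rank + y / mu
        sparse = np.sign(t) * np.maximum(np.abs(t) - lam / mu, 0.0)
        residual = m_mat - low_rank - sparse
        y = y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") < tol * norm_fro:
            break
    return low_rank, sparse


def denoise(sample_matrix, max_error):
    """Keep rows (training samples) whose sparse-part error is below max_error."""
    _, sparse = robust_pca(sample_matrix)
    errors = np.abs(sparse).sum(axis=1)      # per-sample error (assumption: |.| summed)
    keep = errors < max_error
    return sample_matrix[keep], keep


rng = np.random.default_rng(1)
# Rank-1 synthetic "normal" traffic with three injected outlier entries.
clean = rng.uniform(1.0, 2.0, size=(50, 1)) @ np.arange(1.0, 6.0).reshape(1, 5)
noisy = clean.copy()
for row, col in [(3, 0), (17, 2), (40, 4)]:
    noisy[row, col] += 50.0
normal_matrix, keep = denoise(noisy, max_error=1.0)
print(normal_matrix.shape, np.flatnonzero(~keep))
```

The claim leaves open whether the normal traffic matrix keeps the original rows or their low-rank reconstructions. This sketch keeps the original rows.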
4. The method of claim 1, wherein the model training of the base model according to the normal traffic matrix to obtain the DNS tunnel detection model further comprises:
normalizing the normal traffic matrix, so that the singular value decomposition is performed on the normalized normal traffic matrix.
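Claim 4 does not specify the normalization. The sketch below assumes per-feature z-score scaling; min-max scaling would fit the claim equally well. The same mean and standard deviation must be reused when normalizing live feature vectors, otherwise their distances are not comparable with the training data.

```python
import numpy as np


def fit_normalizer(matrix: np.ndarray):
    """Per-feature mean and standard deviation of the normal traffic matrix."""
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0          # constant features: avoid division by zero
    return mean, std


def normalize(matrix: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Z-score each column; apply the same (mean, std) to training and live data."""
    return (matrix - mean) / std


train = np.array([[1.0, 10.0], [3.0, 10.0]])
mean, std = fit_normalizer(train)
print(normalize(train, mean, std))
```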
5. The method of any one of claims 1 to 4, wherein extracting a feature vector from the traffic data comprises:
extracting a plurality of DNS tunnel features from the traffic data based on the characteristics of DNS tunnels;
generating the feature vector of the traffic data from the plurality of DNS tunnel features.
6. The method according to any one of claims 1 to 4, wherein calculating, based on the normal traffic matrix provided by the DNS tunnel detection model, the Mahalanobis distance between the feature vector and the normal traffic matrix comprises:
calculating, based on the DNS tunnel detection model, the Mahalanobis distance between the feature vector and the normal traffic matrix according to the formula [formula image FDA0003279596390000021];
wherein $m$ is the Mahalanobis distance between the feature vector and the normal traffic matrix;
$y = V_r^{T}(x - \mu)$, where $x$ is the feature vector; $\mu$ is the mean vector formed by the mean values of the DNS tunnel features in the feature vector; $V_r$ consists of the first $r$ columns of the matrix $V$; $S_r$ is the diagonal matrix formed by the first $r$ diagonal elements of the diagonal matrix $S$; the matrices $V$ and $S$ are decomposition matrices obtained by performing singular value decomposition on the normal traffic matrix; and $V_r$ and $S_r$ are the matrices obtained by compressing $V$ and $S$.
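The formula image is not reproduced in this text. The sketch below assumes the usual PCA form of the Mahalanobis distance. If the centered normal traffic matrix ($n$ rows) has SVD $U S V^{T}$, its sample covariance is $V S^{2} V^{T}/(n-1)$. With $y = V_r^{T}(x - \mu)$, the distance restricted to the first $r$ components is $m = \sqrt{(n-1)\sum_i y_i^{2}/s_i^{2}}$. Whether the patent's formula includes the $(n-1)$ factor and the square root is an assumption.

```python
import numpy as np


def fit_svd(normal_matrix):
    """Mean vector and SVD factors (s, V) of the centered normal traffic matrix."""
    mu = normal_matrix.mean(axis=0)
    _, s, vt = np.linalg.svd(normal_matrix - mu, full_matrices=False)
    return mu, s, vt.T, normal_matrix.shape[0]


def mahalanobis(x, mu, s, v, n, r):
    """Mahalanobis distance of x using only the first r singular components."""
    v_r, s_r = v[:, :r], s[:r]        # compressed V_r and diagonal of S_r
    y = v_r.T @ (x - mu)              # y = V_r^T (x - mu)
    # Covariance eigenvalues are s_i^2 / (n - 1); scale each squared score by them.
    return float(np.sqrt(np.sum((n - 1) * y ** 2 / s_r ** 2)))


rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.2]])
normal = rng.normal(size=(100, 3)) @ mixing
x = np.array([1.0, 2.0, 3.0])
model = fit_svd(normal)
d_full = mahalanobis(x, *model, r=3)
# Cross-check against the textbook definition with the sample covariance.
diff = x - normal.mean(axis=0)
reference = np.sqrt(diff @ np.linalg.inv(np.cov(normal, rowvar=False)) @ diff)
print(d_full, reference)
```

With $r$ equal to the full rank the result matches the covariance-based definition. Smaller $r$ drops the low-variance directions, which is the compression step of claim 1.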
7. The method of any one of claims 1 to 4, wherein performing DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance comprises:
obtaining a score mapped from the Mahalanobis distance;
if the score exceeds a set score, identifying the traffic data as coming from an abnormal DNS tunnel;
and raising a DNS tunnel alarm for the access domain name corresponding to the traffic data, wherein the access domain name is the domain name that the terminal's DNS query behavior requested to resolve.
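Claim 7 leaves the distance-to-score mapping open. The sketch below uses a hypothetical exponential map onto [0, 100). It is monotone, so a larger distance always gives a larger score. `scale` and the set score of 90 are placeholders, not values from the patent.

```python
import math


def distance_to_score(m: float, scale: float = 3.0) -> float:
    """Hypothetical monotone map from a Mahalanobis distance to a score in [0, 100)."""
    return 100.0 * (1.0 - math.exp(-m / scale))


def exceeds_set_score(m: float, set_score: float = 90.0, scale: float = 3.0) -> bool:
    """True when the mapped score exceeds the set score (traffic flagged as a tunnel)."""
    return distance_to_score(m, scale) > set_score


for m in (0.5, 3.0, 10.0):
    print(m, round(distance_to_score(m), 1), exceeds_set_score(m))
```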
8. The method of claim 7, wherein, after the traffic data is identified as coming from an abnormal DNS tunnel because the score exceeds the set score, the method further comprises:
performing anomaly detection on the Mahalanobis distance corresponding to the traffic data according to a probability distribution function constructed for the Mahalanobis distance;
and, if the Mahalanobis distance is detected to be an abnormal Mahalanobis distance, executing the step of raising a DNS tunnel alarm for the access domain name corresponding to the traffic data.
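Claim 8 does not name the probability distribution function. The sketch below assumes a normal distribution fitted to the Mahalanobis distances of the training samples, plus a hypothetical tail probability `alpha`. A distance is abnormal when the upper-tail probability of observing it is below `alpha`. The final check mirrors claim 1: an alarm is raised only when both conditions hold.

```python
import math
import statistics


def fit_distance_distribution(train_distances):
    """Mean and sample standard deviation of the training-set distances."""
    return statistics.fmean(train_distances), statistics.stdev(train_distances)


def is_abnormal_distance(m, mean, std, alpha=0.01):
    """Upper-tail probability P(D >= m) under the fitted normal, compared with alpha."""
    tail = 0.5 * math.erfc((m - mean) / (std * math.sqrt(2.0)))
    return tail < alpha


def should_alarm(score_exceeded: bool, m, mean, std, alpha=0.01):
    """Alarm only if the score exceeded the set score AND the distance is abnormal."""
    return score_exceeded and is_abnormal_distance(m, mean, std, alpha)


mean, std = fit_distance_distribution([1.0, 2.0, 3.0, 4.0, 5.0])
print(should_alarm(True, 3.0, mean, std), should_alarm(True, 20.0, mean, std))
```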
9. The method of claim 7, wherein raising the DNS tunnel alarm for the access domain name corresponding to the traffic data comprises:
calculating the weight of each DNS tunnel feature in the feature vector of the traffic data;
determining, from the calculated weights, the DNS tunnel features whose weight exceeds a set weight;
and generating, from the determined DNS tunnel features, a DNS tunnel alarm message for the access domain name corresponding to the traffic data, and raising the DNS tunnel alarm for that access domain name through the message.
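Claim 9 does not define the feature weight. The sketch below makes three assumptions:

- Each feature's weight is its share of the squared standardized deviation from the training mean. This is a diagonal approximation that ignores feature correlations.
- `min_weight` stands in for the set weight.
- The message format is hypothetical.

```python
import numpy as np


def feature_weights(x, mean, std):
    """Share of each feature in the total squared standardized deviation."""
    z2 = ((np.asarray(x, dtype=float) - mean) / std) ** 2
    total = z2.sum()
    return z2 / total if total > 0 else np.zeros_like(z2)


def alarm_message(domain, x, mean, std, names, min_weight=0.3):
    """Alarm text listing, heaviest first, the features whose weight exceeds min_weight."""
    w = feature_weights(x, mean, std)
    dominant = [f"{names[i]}={w[i]:.2f}" for i in np.argsort(w)[::-1] if w[i] > min_weight]
    return f"DNS tunnel alarm for {domain}: " + ", ".join(dominant)


names = ("mean_subdomain_length", "mean_subdomain_entropy", "txt_ratio")
mean, std = np.array([8.0, 2.5, 0.05]), np.array([3.0, 0.5, 0.05])
print(alarm_message("x.example.net", [40.0, 4.5, 0.1], mean, std, names))
```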
10. The method of claim 9, wherein, before generating the DNS tunnel alarm message for the access domain name corresponding to the traffic data from the determined DNS tunnel features, the method further comprises:
filtering the access domain name to be alarmed according to configured false-alarm filtering conditions;
and, if the access domain name does not meet the false-alarm filtering conditions, executing the step of generating the DNS tunnel alarm message for the access domain name corresponding to the traffic data from the determined DNS tunnel features.
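The false-alarm filtering conditions are defined in configuration and are not listed in the claim. The sketch below assumes the conditions are a whitelist of domain suffixes, for example services that legitimately encode data in subdomains. The suffix shown is a placeholder.

```python
def matches_false_alarm_filter(domain: str, whitelist_suffixes) -> bool:
    """True if domain equals, or is a subdomain of, a whitelisted suffix."""
    domain = domain.lower().rstrip(".")
    for suffix in whitelist_suffixes:
        suffix = suffix.lower().rstrip(".")
        if domain == suffix or domain.endswith("." + suffix):
            return True
    return False


def domains_to_alarm(candidates, whitelist_suffixes):
    """Keep only candidate access domains that do not meet the filter condition."""
    return [d for d in candidates if not matches_false_alarm_filter(d, whitelist_suffixes)]


whitelist = {"av-lookup.example"}        # hypothetical configured condition
candidates = ["q1.av-lookup.example", "data.tunnel.example", "notav-lookup.example"]
print(domains_to_alarm(candidates, whitelist))
```

Matching on a label boundary (`"." + suffix`) keeps look-alike domains such as `notav-lookup.example` from slipping through the whitelist.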
11. An apparatus for DNS tunnel detection, the apparatus comprising:
an acquisition module, configured to acquire collected traffic data, the traffic data being generated by DNS query behavior initiated by a terminal to a DNS server in a local area network;
an extraction module, configured to extract a feature vector from the traffic data;
a calculation module, configured to calculate, through a DNS tunnel detection model, the Mahalanobis distance between the feature vector and a normal traffic matrix provided by the DNS tunnel detection model. The DNS tunnel detection model is generated at a set time interval by model training that uses the latest historical traffic data of the DNS server as training samples. The normal traffic matrix provided by the model is constructed from that latest historical traffic data. The DNS tunnel detection model is obtained through the following process: acquiring the latest historical traffic data of the DNS server as training samples and constructing the normal traffic matrix; constructing a base model, and performing singular value decomposition on the normal traffic matrix to obtain the decomposition matrices corresponding to the normal traffic matrix; compressing the decomposition matrices to obtain compressed matrices that serve as the parameters of the base model; and, if these parameters make the loss function corresponding to the base model converge, taking the converged base model as the DNS tunnel detection model;
an anomaly identification module, configured to perform DNS tunnel anomaly identification on the traffic data according to the Mahalanobis distance, which comprises: combining the score mapped from the Mahalanobis distance with the anomaly detection result for the Mahalanobis distance, and raising a DNS tunnel alarm for traffic data whose score exceeds a set score and whose Mahalanobis distance is an abnormal Mahalanobis distance.
12. An apparatus for DNS tunnel detection, the apparatus comprising:
a processor; and
a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of claims 1 to 10.
13. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 10.
CN201910295145.1A 2019-04-12 2019-04-12 DNS tunnel detection method and device and computer readable storage medium Active CN110071829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910295145.1A CN110071829B (en) 2019-04-12 2019-04-12 DNS tunnel detection method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910295145.1A CN110071829B (en) 2019-04-12 2019-04-12 DNS tunnel detection method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110071829A CN110071829A (en) 2019-07-30
CN110071829B true CN110071829B (en) 2022-03-04

Family

ID=67367688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910295145.1A Active CN110071829B (en) 2019-04-12 2019-04-12 DNS tunnel detection method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110071829B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111683096A (en) * 2020-06-10 2020-09-18 Beijing Topsec Network Security Technology Co., Ltd. Data processing method based on domain name service protocol and electronic equipment
CN112640392B (en) * 2020-11-20 2022-05-13 Huawei Technologies Co., Ltd. Trojan horse detection method, device and equipment
CN112953916B (en) * 2021-01-29 2023-01-03 Dingniu Information Security Technology (Jiangsu) Co., Ltd. Anomaly detection method and device
CN113660212B (en) * 2021-07-26 2022-11-29 Beijing Topsec Network Security Technology Co., Ltd. Method and device for detecting DNS tunnel traffic in real time
CN113839948B (en) * 2021-09-26 2023-10-24 New H3C Information Security Technology Co., Ltd. DNS tunnel traffic detection method and device, electronic equipment and storage medium
CN114422476B (en) * 2021-12-28 2023-09-22 Internet Domain Name System Beijing Engineering Research Center Co., Ltd. Method and device for preventing CNAME cache pollution
CN115348188B (en) * 2022-10-18 2023-03-24 Anhui Huayunan Technology Co., Ltd. DNS tunnel traffic detection method and device, storage medium and terminal

Citations (8)
Publication number Priority date Publication date Assignee Title
CN101702660A (en) * 2009-11-12 2010-05-05 Institute of Computing Technology, Chinese Academy of Sciences Abnormal domain name detection method and system
CN101841435A (en) * 2010-01-18 2010-09-22 Computer Network Information Center, Chinese Academy of Sciences Method, apparatus and system for detecting abnormality of DNS (domain name system) query flow
CN103001825A (en) * 2012-11-15 2013-03-27 Computer Network Information Center, Chinese Academy of Sciences Method and system for detecting DNS (domain name system) traffic abnormality
US9003518B2 (en) * 2010-09-01 2015-04-07 Raytheon Bbn Technologies Corp. Systems and methods for detecting covert DNS tunnels
EP2112800B1 (en) * 2008-04-25 2017-12-27 Deutsche Telekom AG Method and system for enhanced recognition of attacks to computer systems
CN108400972A (en) * 2018-01-30 2018-08-14 Beijing Lanyun Technology Co., Ltd. Anomaly detection method and device
CN109218124A (en) * 2017-07-06 2019-01-15 Yang Lianqun DNS tunnel transmission detection method and device
CN109474575A (en) * 2018-09-11 2019-03-15 Beijing Qi'anxin Technology Co., Ltd. DNS tunnel detection method and device

Family Cites Families (7)
Publication number Priority date Publication date Assignee Title
CN103326894B (en) * 2013-05-29 2016-12-28 Sangfor Network Technology (Shenzhen) Co., Ltd. Method and apparatus for DNS tunnel detection
US9794229B2 (en) * 2015-04-03 2017-10-17 Infoblox Inc. Behavior analysis based DNS tunneling detection and classification framework for network security
US9967227B2 (en) * 2015-11-11 2018-05-08 Fastly, Inc. Enhanced content route selection in content delivery networks
US10462159B2 (en) * 2016-06-22 2019-10-29 Ntt Innovation Institute, Inc. Botnet detection system and method
US10432651B2 (en) * 2017-08-17 2019-10-01 Zscaler, Inc. Systems and methods to detect and monitor DNS tunneling
CN107733851B (en) * 2017-08-23 2020-05-01 Liu Shengli DNS tunnel Trojan detection method based on communication behavior analysis
CN109309673A (en) * 2018-09-18 2019-02-05 Nanjing Fangheng Information Technology Co., Ltd. Neural-network-based DNS covert channel detection method

Non-Patent Citations (1)
Title
Research on malicious behavior detection based on DNS analysis; Bai Fan; Telecommunication Network Technology; 2017-08-15 (No. 8); full text *

Also Published As

Publication number Publication date
CN110071829A (en) 2019-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant