CN114567487A - DNS hidden tunnel detection method with multi-feature fusion - Google Patents

DNS hidden tunnel detection method with multi-feature fusion

Publication number
CN114567487A
Authority
CN
China
Prior art keywords
domain name
dns
sample
samples
hidden tunnel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210198998.5A
Other languages
Chinese (zh)
Other versions
CN114567487B (en)
Inventor
林飞
李鼎
易永波
古元
毛华阳
华仲峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Act Technology Development Co ltd
Original Assignee
Beijing Act Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Act Technology Development Co ltd
Priority to CN202210198998.5A
Publication of CN114567487A
Application granted
Publication of CN114567487B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/029 Firewall traversal, e.g. tunnelling or, creating pinholes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

Abstract

A DNS hidden tunnel detection method with multi-feature fusion relates to the technical field of information and comprises the following steps: 1) a black sample collector acquires DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel; 2) a black sample standardization module preprocesses the DNS hidden tunnel traffic packet data and extracts its features; 3) a white sample standardization module acquires normal DNS request samples; 4) a neural network model module is constructed; 5) a rapid pre-screening module is constructed using white samples. The rapid pre-screening module can simply distinguish normal request domain names from tunnel request domain names, and it efficiently and quickly eliminates the normal request domain names that make up most traffic in practice. For deep learning detection, the method combines general rule features with deep domain name text features for DNS hidden tunnel detection, which improves detection accuracy and reduces detection difficulty.

Description

Multi-feature fusion DNS hidden tunnel detection method
Technical Field
The invention relates to the technical field of information.
Background
With the continuous development of the internet, DNS has become an indispensable service, so general firewalls typically do not inspect or filter DNS traffic. Attackers can therefore use DNS as a hidden channel for remote control, file transmission, and other operations, which poses a serious threat to network security. Detecting and identifying whether a DNS hidden tunnel exists can effectively reduce user losses and safeguard the health and security of the network environment.
Existing patents already address DNS hidden tunnel detection. For example, patent [CN111786993A] manually designs and extracts features related to DNS requests, such as the request record type, the length of a single domain name label, and various character ratios, and then sets multiple thresholds to decide whether a DNS tunnel exists. That method designs rich features, but the decision depends entirely on preset rules, so it involves too much manual intervention and is prone to misjudgment. Patent [CN110149418A] uses a deep learning method; a deep neural network can often learn hidden features that cannot be designed by hand, and the detection effect is good. However, that method does not use features such as the request record type, which are also closely related to the detection result; it does not consider the high computational complexity of deep learning models; and it does not use data augmentation to expand the training data, although domain name data augmentation can reduce the cost of manually labeling data.
The present method extracts the domain name and the related request information of each request, uses the fused features as input, and uses a deep learning model as the detection model to judge whether the DNS traffic contains a hidden tunnel.
Known techniques used
N-Gram is an algorithm based on a statistical language model. Its basic idea is to slide a window of size N over the text byte by byte, forming a sequence of byte fragments of length N. Each byte fragment is called a gram. The occurrence frequency of every gram is counted and filtered against a preset threshold to form a list of key grams, which is the vector feature space of the text; each gram in the list is one feature dimension. The model assumes that the occurrence of the Nth word depends only on the preceding N-1 words and on no other word, and that the probability of a complete sentence is the product of the occurrence probabilities of its words. These probabilities can be obtained directly from a corpus by counting how often N words occur together. The binary Bi-Gram and the ternary Tri-Gram are the most commonly used.
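The sliding-window and key-gram steps described above can be sketched in a few lines of Python. This is an illustrative sketch only: it slides over characters rather than raw bytes, and the function names `char_ngrams` and `key_grams` and the `min_count` parameter are assumptions introduced here, not part of the patent.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Slide a window of size n over text and return every gram of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def key_grams(texts: list[str], n: int = 3, min_count: int = 2) -> list[str]:
    """Count grams over all texts and keep those seen at least min_count times.

    min_count plays the role of the preset frequency threshold (assumed name).
    """
    counts = Counter(g for t in texts for g in char_ngrams(t, n))
    return sorted(g for g, c in counts.items() if c >= min_count)
```

Each entry of the list returned by `key_grams` would correspond to one dimension of the text's feature space.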
A DNS tunnel, referred to herein as a DNS hidden tunnel, is a type of covert channel that establishes communication by encapsulating other protocols inside the DNS protocol for transmission. Because DNS is an essential network service, most firewalls and intrusion detection devices rarely filter DNS traffic. This makes DNS usable as a covert channel for operations such as remote control and file transfer, and a growing body of research shows that DNS hidden tunnels also often play an important role in botnet and APT attacks.
Many tools have implemented DNS hidden tunnels since the technique was first proposed: NSTX and OzymanDNS are among the earliest, iodine and dnscat2 are currently the most active, and DeNiSe, dns2tcp, and Heyoka are also available. The core principles of these tools are similar, but they differ in encoding, implementation details, and target application scenarios.
PCAP is a packet capture library that many programs, including Wireshark, use to capture data packets. The packets captured by PCAP are not raw network byte streams; they are reassembled into a new data format.
tcpdump filters and captures packets on a network interface from the command line; its strength lies in its flexible filter expressions. Run without any options, tcpdump captures on the first network interface by default and stops capturing only when the tcpdump process is terminated.
Dropout means that, during deep learning training, a neural network unit is removed from the network with a certain probability. The removal is temporary: with stochastic gradient descent, each mini-batch trains a different network because units are dropped at random.
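As a rough illustration of the dropout idea, the sketch below zeroes each unit of a layer's output with probability `p`. The rescaling of surviving units by 1/(1-p) ("inverted dropout") is a common convention assumed here; the patent text does not specify it, and the function name is hypothetical.

```python
import random


def dropout(activations: list[float], p: float, rng: random.Random) -> list[float]:
    """Zero each unit with probability p; scale survivors by 1/(1-p).

    The 1/(1-p) scaling is an assumption (inverted dropout), not from the patent.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]
```

Calling the function once per mini-batch with a fresh random draw is what makes each mini-batch train a different sub-network.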
Disclosure of Invention
In view of the defects of the prior art, the method for detecting the DNS hidden tunnel with the multi-feature fusion provided by the invention comprises the following steps:
1) The black sample collector obtains DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel
The black sample collector builds a DNS hidden tunnel using two servers and a DNS hidden tunnel implementation tool: one server serves as the DNS server, on which the server end of the DNS hidden tunnel implementation tool is deployed, and the other serves as the access end of the tool. The DNS server is deployed as the DNS server that resolves a specific domain name, and the specific domain name is set only in the test environment between the two servers, so the setup neither affects nor is affected by the external network environment. Data of arbitrary content and unrestricted size is edited as the transmission sample data. A tcpdump tool deployed on the DNS server collects the DNS traffic and stores it as PCAP packets, which serve as the DNS hidden tunnel traffic packets.
2) The black sample standardization module preprocesses the DNS hidden tunnel traffic packet data and extracts its features
Key fields in the PCAP traffic packet are extracted with the Wireshark tool, mainly the source IP, source port, destination IP, destination port, requested domain name, and request type. The domain name suffix is removed, and the remaining sub-domain name is split at the '.' character into several character strings, namely several sub-domain name fragments. To expand the data, characters in the sub-domain name are randomly replaced according to an expansion rule, which yields multiple groups of sub-domain name fragments. The expansion rule is: a character is replaced only by a character of the same type; the replacement positions and the number of replaced characters are chosen at random; at least 1 character and at most half the length of the character string is replaced; and the replaced sub-domain name has the same length as the original sub-domain name. The domain name length is extracted; the number of domain name labels, meaning the number of domain name fragments obtained by splitting at '.', is extracted; and the DNS request record type is extracted. One group consisting of the sub-domain name fragments, the domain name length, the number of domain name labels, and the DNS request record type forms one DNS hidden tunnel traffic packet data sample, called a black sample.
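The expansion rule in step 2 might be sketched as follows. This is a minimal sketch under stated assumptions: "same type" is read as lowercase letter, uppercase letter, or digit; other characters such as '-' are left untouched; a fragment of length 1 still gets one replacement; and each replaced character is forced to differ from the original. The names `POOLS` and `augment_fragment` are hypothetical.

```python
import random
import string

# Character classes treated as "types" (assumption: lowercase, uppercase, digit).
POOLS = (string.ascii_lowercase, string.ascii_uppercase, string.digits)


def augment_fragment(fragment: str, rng: random.Random) -> str:
    """Return a same-length copy with 1..len//2 characters swapped for same-type ones."""
    # Only letters and digits are eligible for replacement.
    eligible = [i for i, ch in enumerate(fragment) if any(ch in p for p in POOLS)]
    max_swaps = min(len(eligible), max(1, len(fragment) // 2))
    if max_swaps == 0:
        return fragment  # nothing replaceable, e.g. "--"
    k = rng.randint(1, max_swaps)  # number of replacements chosen at random
    chars = list(fragment)
    for i in rng.sample(eligible, k):  # replacement positions chosen at random
        pool = next(p for p in POOLS if chars[i] in p)
        chars[i] = rng.choice([c for c in pool if c != chars[i]])
    return "".join(chars)
```

Calling `augment_fragment` several times on each black-sample fragment yields the additional groups of sub-domain name fragments that the step describes.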
3) The white sample standardization module obtains normal DNS request samples
DNS traffic from daily work is collected and stored as PCAP packets, and key fields in the PCAP traffic packet are extracted with the Wireshark tool, mainly the source IP, source port, destination IP, destination port, requested domain name, and request type. The domain name suffix is removed, and the remaining sub-domain name is split at the '.' character into several character strings, namely several sub-domain name fragments. The domain name length, the number of domain name labels, and the DNS request record type are extracted. One group consisting of the sub-domain name fragments, the domain name length, the number of domain name labels, and the DNS request record type forms one normal DNS traffic packet data sample, called a white sample.
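The standardization shared by steps 2 and 3 could look roughly like the sketch below. It assumes the queried name and record type have already been exported from the PCAP file, that the caller supplies the domain name suffix to strip (a real system might consult a public suffix list instead), and that the length and label count are measured on the full queried name. `DnsSample` and `build_sample` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DnsSample:
    fragments: list[str]  # sub-domain name fragments after suffix removal
    name_length: int      # length of the full queried name (assumption)
    label_count: int      # number of '.'-separated labels in the full name (assumption)
    record_type: str      # DNS request record type, e.g. "A" or "TXT"


def build_sample(qname: str, record_type: str, suffix: str) -> DnsSample:
    """Strip the suffix, split the rest at '.', and collect the rule features."""
    name = qname.rstrip(".")      # drop the trailing root dot, if any
    suffix = suffix.strip(".")
    if name.lower() == suffix.lower():
        sub = ""
    elif suffix and name.lower().endswith("." + suffix.lower()):
        sub = name[: -(len(suffix) + 1)]
    else:
        sub = name                # suffix not present: keep the whole name
    fragments = sub.split(".") if sub else []
    label_count = name.count(".") + 1 if name else 0
    return DnsSample(fragments, len(name), label_count, record_type)
```

For example, `build_sample("aGVsbG8.d29ybGQ.t.example.com.", "TXT", "example.com")` would produce three fragments, a label count of 5, and the record type "TXT".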
4) Constructing the neural network model module
The domain name characters and features are numbered to build a vocabulary for input to the neural network model. For the domain name length feature, a domain name fragment length below 10 is coded as 1, 10 to 20 as 2, 20 to 30 as 3, 30 to 50 as 4, and above 50 as 5. For the domain name label number feature, fewer than 3 labels is coded as 6, 3 to 5 labels as 7, and 5 or more labels as 8. For the DNS record type feature, a TXT record is coded as 9 and any other record type as 10. For the domain name string feature, characters a through z correspond to codes 11 through 36, characters A through Z to codes 37 through 62, and characters 0 through 9 to codes 63 through 72. 70% of all samples are randomly taken as the training set, and the remaining samples are randomly divided in equal parts into a validation set and a test set; all samples comprise both the DNS hidden tunnel traffic packet data samples and the normal DNS traffic packet data samples. A padding operation is performed before a sample is input: the maximum length of the domain name fragment code is set to 64, any part of the input beyond 64 is truncated, and inputs shorter than 64 are padded with 0 at the tail. The word vector layer converts each numeric code into vector form. In the CNN (convolutional neural network) layer, the text features of the domain name are learned through one-dimensional convolutions with different kernel sizes; the results of the three convolutional layers are concatenated and max-pooled; a Dropout layer is added to reduce model complexity and prevent overfitting; and finally a fully connected classification layer is used. There are two classes: the DNS hidden tunnel class, called the black sample class, and the normal DNS request class, called the white sample class.
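The vocabulary and padding rules in step 4 can be illustrated with the sketch below. Several points are assumptions rather than statements from the patent: the three rule-feature codes are placed before the character codes, range boundaries are resolved as half-open intervals (so 5 labels maps to code 8 and a length of exactly 50 maps to code 4), lowercase letters take codes 11 to 36, and characters outside the vocabulary are skipped. All function names are hypothetical.

```python
import string

MAX_LEN = 64  # maximum code length from the text; shorter inputs are padded with 0

# a-z -> 11..36, A-Z -> 37..62, 0-9 -> 63..72 (case order is an assumption)
CHAR_CODES = {c: i for i, c in enumerate(
    string.ascii_lowercase + string.ascii_uppercase + string.digits, start=11)}


def length_code(n: int) -> int:
    """Bucket the length feature into codes 1..5 (boundary handling assumed)."""
    if n < 10:
        return 1
    if n < 20:
        return 2
    if n < 30:
        return 3
    if n <= 50:
        return 4
    return 5


def label_code(k: int) -> int:
    """Bucket the label-count feature into codes 6..8 (5 labels -> 8, assumed)."""
    if k < 3:
        return 6
    if k < 5:
        return 7
    return 8


def encode(fragments: list[str], name_length: int, label_count: int,
           record_type: str) -> list[int]:
    """Build the fixed-length integer sequence fed to the word vector layer."""
    codes = [length_code(name_length), label_code(label_count),
             9 if record_type.upper() == "TXT" else 10]
    codes += [CHAR_CODES[c] for frag in fragments for c in frag if c in CHAR_CODES]
    codes = codes[:MAX_LEN]                       # truncate beyond 64
    return codes + [0] * (MAX_LEN - len(codes))   # pad the tail with 0
```

The resulting 64-element sequence is the kind of input an embedding layer followed by parallel one-dimensional convolutions would consume.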
5) Constructing a rapid pre-screening module using white samples
Before its class is judged, newly acquired DNS traffic is treated as white sample class by default; its class is changed to black sample class only when the neural network model module judges it to be a black sample. The rapid pre-screening module quickly judges newly received white samples and eliminates the data with a low probability of being black samples, which speeds up classification of newly acquired DNS traffic and reduces the computation load of the neural network model module.
The steps of constructing the rapid pre-screening module comprise:
First, the white samples acquired by the white sample standardization module are taken as training samples. Each training sample is recorded as a sub-domain name sequence S of length m, where $w_i$ denotes the i-th word of S, and the occurrence probability of the whole sequence is expressed as:
$$P(S) = P(w_1, w_2, \ldots, w_m) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_m \mid w_1, \ldots, w_{m-1})$$
Second, according to the Markov assumption, the occurrence of a word depends only on the words immediately preceding it. With the gram order n set to 3, each word depends on the previous two words, and the conditional probability in the formula is simplified as:
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})$$
Third, using the Bayesian formula, each term is calculated as:
$$P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i)}{\mathrm{count}(w_{i-2}, w_{i-1})}$$
where count(…) represents the number of times these words occur consecutively together in the sample set. To avoid a zero denominator, smoothing is applied, and the existence probability of each sample is then calculated as:
$$P(S) \approx \prod_{i=3}^{m} \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i) + 1}{\mathrm{count}(w_{i-2}, w_{i-1}) + V}$$
V is the number of words in the vocabulary. For the whole sample set, the probabilities of all trigrams can be calculated and stored as a model for use in subsequent prediction.
Fourth, the existence probability p of each sample sequence in the training samples is calculated. Because sequences differ in length, they contain different numbers of trigrams, and the probabilities obtained from the product formula differ widely; a conversion is therefore applied. With t denoting the number of trigrams in the sequence, the existence probability of each sample sequence is expressed as:
[Formula not reproduced in the source text: the per-sequence existence probability p, normalized by the trigram count t]
Fifth, the segmentation threshold is determined: the median of the existence probabilities of all training samples is taken as the threshold. Samples whose existence probability is above the threshold are directly marked as white samples, and samples below the threshold enter the neural network model module for class judgment.
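The five construction steps above can be sketched as a small trigram model. This is a hedged sketch, not the patented implementation: the "words" are taken to be individual characters of the sub-domain string, add-one smoothing with vocabulary size V is used, and the length normalization in the fourth step is assumed to be the t-th root of the product (a geometric mean), because the original formula image is not available. The class and method names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


class TrigramPrescreen:
    """Character-trigram model trained on white samples only."""

    def __init__(self) -> None:
        self.tri: Counter = Counter()   # count(w_{i-2}, w_{i-1}, w_i)
        self.bi: Counter = Counter()    # count(w_{i-2}, w_{i-1}) as a trigram prefix
        self.vocab: set = set()
        self.threshold = 0.0

    def score(self, seq: str) -> float:
        """Length-normalized existence probability (t-th root is an assumption)."""
        t = len(seq) - 2                # number of trigrams in the sequence
        if t <= 0:
            return 0.0                  # too short to score; falls through to the model
        v = max(len(self.vocab), 1)
        log_p = sum(
            math.log((self.tri[seq[i:i + 3]] + 1) / (self.bi[seq[i:i + 2]] + v))
            for i in range(t))
        return math.exp(log_p / t)

    def fit(self, white_sequences: list[str]) -> "TrigramPrescreen":
        """Count trigrams, then set the threshold to the median training score.

        Requires at least one training sequence of length 3 or more.
        """
        for s in white_sequences:
            self.vocab.update(s)
            for i in range(len(s) - 2):
                self.tri[s[i:i + 3]] += 1
                self.bi[s[i:i + 2]] += 1
        self.threshold = median(self.score(s) for s in white_sequences if len(s) >= 3)
        return self

    def is_clearly_white(self, seq: str) -> bool:
        """True when the sample can skip the neural network model module."""
        return self.score(seq) > self.threshold
```

Using the median as the threshold means roughly half of the traffic that resembles the training data is cleared without invoking the neural network, which is where the speed-up described in the text would come from.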
Advantageous effects
The rapid pre-screening module can simply and efficiently distinguish normal request domain names from tunnel request domain names, so the normal request domain names that make up most traffic in practice are eliminated quickly and do not reach the subsequent deep learning model, which is complex and slow. For deep learning detection, the method combines general rule features with deep domain name text features for DNS hidden tunnel detection. Compared with prior DNS detection methods, features such as domain name complexity and information entropy do not need to be designed by hand; the deep network model learns the information in the domain name text automatically, which works better than manual design. In addition, network request features that are not part of the domain name text, such as the request record type, are also encoded as model inputs. For deep model training, the method uses domain name data augmentation, which effectively expands the training data, improves the model, and reduces the labor and material cost of collecting data manually. Finally, the method also uses a black-and-white list mechanism so that earlier detection results are reused, which improves detection efficiency and effect.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Example one
Referring to fig. 1, the implementation steps of the method for detecting a DNS hidden tunnel with multi-feature fusion provided by the present invention include:
S01: The black sample collector obtains DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel
The black sample collector builds a DNS hidden tunnel using two servers and a DNS hidden tunnel implementation tool: one server serves as the DNS server, on which the server end of the DNS hidden tunnel implementation tool is deployed, and the other serves as the access end of the tool. The DNS server is deployed as the DNS server that resolves a specific domain name, and the specific domain name is set only in the test environment between the two servers, so the setup neither affects nor is affected by the external network environment. Data of arbitrary content and unrestricted size is edited as the transmission sample data. A tcpdump tool deployed on the DNS server collects the DNS traffic and stores it as PCAP packets, which serve as the DNS hidden tunnel traffic packets.
S02: The black sample standardization module preprocesses the DNS hidden tunnel traffic packet data and extracts its features
Key fields in the PCAP traffic packet are extracted with the Wireshark tool, mainly the source IP, source port, destination IP, destination port, requested domain name, and request type. The domain name suffix is removed, and the remaining sub-domain name is split at the '.' character into several character strings, namely several sub-domain name fragments. To expand the data, characters in the sub-domain name are randomly replaced according to an expansion rule, which yields multiple groups of sub-domain name fragments. The expansion rule is: a character is replaced only by a character of the same type; the replacement positions and the number of replaced characters are chosen at random; at least 1 character and at most half the length of the character string is replaced; and the replaced sub-domain name has the same length as the original sub-domain name. The domain name length is extracted; the number of domain name labels, meaning the number of domain name fragments obtained by splitting at '.', is extracted; and the DNS request record type is extracted. One group consisting of the sub-domain name fragments, the domain name length, the number of domain name labels, and the DNS request record type forms one DNS hidden tunnel traffic packet data sample, called a black sample.
S03: The white sample standardization module obtains normal DNS request samples
DNS traffic from daily work is collected and stored as PCAP packets, and key fields in the PCAP traffic packet are extracted with the Wireshark tool, mainly the source IP, source port, destination IP, destination port, requested domain name, and request type. The domain name suffix is removed, and the remaining sub-domain name is split at the '.' character into several character strings, namely several sub-domain name fragments. The domain name length, the number of domain name labels, and the DNS request record type are extracted. One group consisting of the sub-domain name fragments, the domain name length, the number of domain name labels, and the DNS request record type forms one normal DNS traffic packet data sample, called a white sample.
S04: Constructing the neural network model module
The domain name characters and features are numbered to build a vocabulary for input to the neural network model. For the domain name length feature, a domain name fragment length below 10 is coded as 1, 10 to 20 as 2, 20 to 30 as 3, 30 to 50 as 4, and above 50 as 5. For the domain name label number feature, fewer than 3 labels is coded as 6, 3 to 5 labels as 7, and 5 or more labels as 8. For the DNS record type feature, a TXT record is coded as 9 and any other record type as 10. For the domain name string feature, characters a through z correspond to codes 11 through 36, characters A through Z to codes 37 through 62, and characters 0 through 9 to codes 63 through 72. 70% of all samples are randomly taken as the training set, and the remaining samples are randomly divided in equal parts into a validation set and a test set; all samples comprise both the DNS hidden tunnel traffic packet data samples and the normal DNS traffic packet data samples. A padding operation is performed before a sample is input: the maximum length of the domain name fragment code is set to 64, any part of the input beyond 64 is truncated, and inputs shorter than 64 are padded with 0 at the tail. The word vector layer converts each numeric code into vector form. In the CNN (convolutional neural network) layer, the text features of the domain name are learned through one-dimensional convolutions with different kernel sizes; the results of the three convolutional layers are concatenated and max-pooled; a Dropout layer is added to reduce model complexity and prevent overfitting; and finally a fully connected classification layer is used. There are two classes: the DNS hidden tunnel class, called the black sample class, and the normal DNS request class, called the white sample class.
S05: Constructing a rapid pre-screening module using white samples
Before its class is judged, newly acquired DNS traffic is treated as white sample class by default; its class is changed to black sample class only when the neural network model module judges it to be a black sample. The rapid pre-screening module quickly judges newly received white samples and eliminates the data with a low probability of being black samples, which speeds up classification of newly acquired DNS traffic and reduces the computation load of the neural network model module.
The steps of constructing the rapid pre-screening module comprise:
First, the white samples acquired by the white sample standardization module are taken as training samples. Each training sample is recorded as a sub-domain name sequence S of length m, where $w_i$ denotes the i-th word of S, and the occurrence probability of the whole sequence is expressed as:
$$P(S) = P(w_1, w_2, \ldots, w_m) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_m \mid w_1, \ldots, w_{m-1})$$
Second, according to the Markov assumption, the occurrence of a word depends only on the words immediately preceding it. With the gram order n set to 3, each word depends on the previous two words, and the conditional probability in the formula is simplified as:
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})$$
Third, using the Bayesian formula, each term is calculated as:
$$P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i)}{\mathrm{count}(w_{i-2}, w_{i-1})}$$
where count(…) represents the number of times these words occur consecutively together in the sample set. To avoid a zero denominator, smoothing is applied, and the existence probability of each sample is then calculated as:
$$P(S) \approx \prod_{i=3}^{m} \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i) + 1}{\mathrm{count}(w_{i-2}, w_{i-1}) + V}$$
V is the number of words in the vocabulary. For the whole sample set, the probabilities of all trigrams can be calculated and stored as a model for use in subsequent prediction.
Fourth, the existence probability p of each sample sequence in the training samples is calculated. Because sequences differ in length, they contain different numbers of trigrams, and the probabilities obtained from the product formula differ widely; a conversion is therefore applied. With t denoting the number of trigrams in the sequence, the existence probability of each sample sequence is expressed as:
[Formula not reproduced in the source text: the per-sequence existence probability p, normalized by the trigram count t]
Fifthly, determining a segmentation threshold, taking the median of the existence probability of all the training samples as a threshold, directly marking the training samples larger than the probability threshold as white samples, and enabling the white samples smaller than the probability threshold to enter a neural network model module for category judgment.
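The formulas in the steps above are images that are not available in this text. As a non-authoritative reading aid, the following LaTeX sketch shows the standard n-gram language model forms that match the surrounding wording. Several points are assumptions rather than the patent's own formulas: the symbols w_i, the reading of "ternary combinations" as conditioning each word on the two preceding words, the add-one form of the smoothing, and the geometric-mean form of the length conversion.

```latex
% Chain rule for a sub-domain name sequence S = (w_1, ..., w_m)
\[ P(S) = P(w_1, w_2, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1}) \]

% Markov approximation over ternary combinations (assumed trigram form)
\[ P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1}) \]
\[ P(S) \approx \prod_{i=3}^{m} P(w_i \mid w_{i-2}, w_{i-1}) \]

% Count-based estimate of each term
\[ P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i)}{\mathrm{count}(w_{i-2}, w_{i-1})} \]

% Smoothed estimate (assumed add-one smoothing; V is the word list size)
\[ P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i) + 1}{\mathrm{count}(w_{i-2}, w_{i-1}) + V} \]

% Length conversion over t ternary combinations (assumed geometric mean)
\[ p = P(S)^{1/t} \]
```

Under these assumptions, p is the geometric mean of the t smoothed trigram probabilities, which makes scores of sequences with different lengths comparable before the median threshold is applied.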
Example two
Newly acquired DNS network traffic classification
1) inputting newly acquired DNS network traffic into the white sample standardization module to obtain white samples;
2) inputting the white samples into the fast pre-screening module to filter out the white samples that have a low probability of being black samples;
3) inputting the remaining white samples, which have a high probability of being black samples, into the neural network model module, which performs the final classification of the input white samples.
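The two-stage flow above may be illustrated by the following minimal Python sketch, which is not the patented implementation. It assumes that the "words" are individual characters, that smoothing is add-one, and that the length conversion is a geometric mean over the triples; the names `TrigramPrescreen`, `classify`, and the `neural_model` callable are hypothetical.

```python
import math
from collections import Counter
from statistics import median


class TrigramPrescreen:
    """Character-trigram model trained on white samples only (illustrative)."""

    def __init__(self):
        self.tri = Counter()   # counts of three consecutive characters
        self.bi = Counter()    # counts of the two-character contexts
        self.vocab = set()
        self.threshold = None

    def fit(self, white_sequences):
        for s in white_sequences:
            self.vocab.update(s)
            for i in range(len(s) - 2):
                self.tri[s[i:i + 3]] += 1
                self.bi[s[i:i + 2]] += 1
        # Median existence probability of the training samples is the threshold.
        self.threshold = median(self.score(s) for s in white_sequences)
        return self

    def score(self, s):
        t = len(s) - 2  # number of triples in the sequence
        if t <= 0:
            # Assumption: sequences too short to form a triple get score 0
            # and are therefore always passed on to the neural network.
            return 0.0
        v = max(len(self.vocab), 1)
        log_p = sum(
            math.log((self.tri[s[i:i + 3]] + 1) / (self.bi[s[i:i + 2]] + v))
            for i in range(t)
        )
        # Assumed length conversion: geometric mean of the t probabilities.
        return math.exp(log_p / t)


def classify(sub_domain, prescreen, neural_model):
    """Mark high-probability samples white; send the rest to the model."""
    if prescreen.score(sub_domain) > prescreen.threshold:
        return "white"
    return neural_model(sub_domain)  # hypothetical callable: "white" or "black"
```

Because the threshold is the median of the training scores, roughly half of the white training samples fall at or below it; the sketch sends such samples to the neural network rather than marking them white directly.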

Claims (1)

1. A DNS hidden tunnel detection method with multi-feature fusion is characterized by comprising the following implementation steps:
1) obtaining DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel by a black sample collector
The black sample collector builds a DNS hidden tunnel using two servers and a DNS hidden tunnel implementation tool: one server serves as the DNS server, on which the server end of the DNS hidden tunnel implementation tool is deployed, and the other server serves as the access end, on which the access end of the DNS hidden tunnel implementation tool is deployed; the DNS server is configured as the DNS server that resolves a specific domain name, and the specific domain name is set only in the test environment between the two servers, so that it neither affects nor is affected by the external network environment; data of any content is edited as transmission sample data, with no limit on its size; the tcpdump tool is deployed on the DNS server to collect DNS traffic, which is stored in PCAP format and used as the DNS hidden tunnel traffic packet;
2) preprocessing DNS hidden tunnel traffic packet data by a black sample standardization module, and extracting DNS hidden tunnel traffic packet data features
Key fields in the PCAP traffic packet are extracted using the Wireshark tool, mainly the source IP, source port, destination IP, destination port, requested domain name and request type; the domain name suffix is removed, and the sub-domain name without the suffix is divided into several character strings, namely several sub-domain name fragments, taking the '.' character as the boundary; characters in the sub-domain names are randomly replaced according to an expansion rule to expand the data, so that multiple groups of sub-domain name fragments are obtained; the expansion rule is that a character is replaced only by a character of the same type, the replacement positions and the number of replaced characters are determined randomly, at least 1 character is replaced, at most half the length of the character string is replaced, and the replaced sub-domain name has the same length as the original sub-domain name; the domain name length is extracted; the number of domain name labels is extracted, where the number of domain name labels refers to the number of domain name fragments obtained by splitting on '.'; the DNS request record type is extracted; a group consisting of several sub-domain name fragments, the domain name length, the number of domain name labels and the DNS request record type is taken as one DNS hidden tunnel traffic packet data sample, which is called a black sample;
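As a non-limiting illustration, the expansion rule described above may be sketched in Python as follows. The function name `augment_subdomain` is hypothetical, and the sketch makes three assumptions not stated in the text: "same type" means lowercase letter, uppercase letter, or digit; other characters such as '.' and '-' are left unchanged; and a replaced character always differs from the original one.

```python
import random
import string

# Character classes treated as "types" in this sketch (an assumption).
POOLS = (string.ascii_lowercase, string.ascii_uppercase, string.digits)


def augment_subdomain(sub, rng):
    """Return a copy of `sub` with 1..len(sub)//2 same-type characters replaced."""
    # Only letters and digits are candidates for replacement.
    positions = [i for i, ch in enumerate(sub) if any(ch in p for p in POOLS)]
    if not positions:
        return sub
    # At least one character, at most half of the string length.
    max_k = max(1, len(sub) // 2)
    k = rng.randint(1, min(max_k, len(positions)))
    chars = list(sub)
    for i in rng.sample(positions, k):
        pool = next(p for p in POOLS if chars[i] in p)
        # Pick a different character of the same type; length is unchanged.
        chars[i] = rng.choice([c for c in pool if c != chars[i]])
    return "".join(chars)
```

Calling `augment_subdomain("ab12cd34.ef-GH", random.Random(0))` repeatedly yields multiple variants of one captured sub-domain name, each with the same length and the same label boundaries as the original.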
3) obtaining normal DNS request samples by a white sample standardization module
DNS traffic from daily work is collected and stored as a PCAP (packet capture) packet, and key fields in the PCAP traffic packet are extracted using the Wireshark tool, mainly the source IP, source port, destination IP, destination port, requested domain name and request type; the domain name suffix is removed, and the sub-domain name without the suffix is divided into several character strings, namely several sub-domain name fragments, taking the '.' character as the boundary; the domain name length is extracted; the domain name label number feature is extracted; the DNS request record type is extracted; a group consisting of several sub-domain name fragments, the domain name length, the number of domain name labels and the DNS request record type is taken as one normal DNS traffic packet data sample, which is called a white sample;
4) constructing a neural network model module
The domain name characters and features are numbered and a word list is established for neural network model input: for the domain name length feature, a domain name fragment length less than 10 is coded as 1, a length of 10-20 is coded as 2, a length of 20-30 is coded as 3, a length of 30-50 is coded as 4, and a length above 50 is coded as 5; for the domain name label number feature, fewer than 3 labels is coded as 6, 3 to 5 labels is coded as 7, and 5 or more labels is coded as 8; for the DNS record type feature, a TXT record is coded as 9 and any other record type is coded as 10; for the domain name string feature, characters a through z correspond to encodings 11 through 36, characters A through Z correspond to encodings 37 through 62, and characters 0-9 correspond to encodings 63-72; 70% of all samples are randomly taken as the training set, and the remaining samples are randomly divided in equal parts into a verification set and a test set, where all samples comprise DNS hidden tunnel traffic packet data samples and normal DNS traffic packet data samples; a padding operation is performed before a sample is input: the maximum length of a domain name fragment code is set to 64, the part of the input longer than 64 is truncated, and inputs shorter than 64 are padded with 0 at the tail; the word vector layer converts each numeric code into vector form; in the CNN convolutional neural network layer, the text features of the domain name are fully learned through one-dimensional convolutions with different convolution kernel sizes, the results of the three convolutional layers are concatenated and max-pooled, a Dropout layer is added to reduce model complexity and prevent overfitting, and finally a fully connected classification layer is used with two classes, the DNS hidden tunnel class and the normal DNS request class, where the DNS hidden tunnel class is called the black sample class and the normal DNS request class is called the white sample class;
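For illustration only, the numbering and padding scheme described above may be sketched in Python as follows. All function names are hypothetical. The sketch assumes that lowercase letters take codes 11 to 36 and uppercase letters 37 to 62, that a length of exactly 50 falls in code 4, that exactly 5 labels falls in code 8, and that characters outside the listed ranges are mapped to 0; the text does not settle these boundary cases.

```python
import string

MAX_LEN = 64  # maximum encoded length stated in the text

# Character codes: a-z -> 11..36, A-Z -> 37..62, 0-9 -> 63..72 (order assumed).
CHAR_IDS = {
    ch: 11 + i
    for i, ch in enumerate(
        string.ascii_lowercase + string.ascii_uppercase + string.digits
    )
}


def length_code(n):
    """Code 1..5 for the length feature (boundary handling is an assumption)."""
    if n < 10:
        return 1
    if n < 20:
        return 2
    if n < 30:
        return 3
    if n <= 50:
        return 4
    return 5


def label_count_code(n):
    """Code 6..8 for the label-count feature (5 labels assumed to map to 8)."""
    if n < 3:
        return 6
    if n < 5:
        return 7
    return 8


def record_type_code(rtype):
    """Code 9 for TXT records, 10 for every other record type."""
    return 9 if rtype.upper() == "TXT" else 10


def encode_fragment(text):
    """Map characters to codes, truncate to MAX_LEN, pad with 0 at the tail."""
    ids = [CHAR_IDS.get(ch, 0) for ch in text][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))
```

The resulting fixed-length integer lists are the kind of input an embedding (word vector) layer expects; how the three feature codes are combined with the character codes is not specified in the text and is left out of the sketch.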
5) constructing a fast pre-screening module using white samples
Before newly acquired DNS traffic is classified, it is assigned to the white sample class by default; its class is changed to the black sample class only if the neural network model module judges the white sample to be a black sample. The fast pre-screening module quickly judges newly received white samples and filters out those with a low probability of being black samples, which speeds up the classification of newly acquired DNS traffic and reduces the computation load of the neural network model module;
the steps of constructing the fast pre-screening module comprise:
First, taking the white samples acquired by the white sample standardization module as training samples, recording each training sample as a sub-domain name sequence S, and, for a sequence of length m, expressing the occurrence probability of the whole sequence as:
[Formula image not reproduced in the source text]
Second, according to the Markov assumption, the occurrence of a word depends only on the previous n words, where n takes a value of 3, so the conditional probability in the formula is simplified as:
[Two formula images not reproduced in the source text]
Third, using the Bayes formula, each term is calculated as:
[Formula image not reproduced in the source text]
where count(…) represents the number of times the words co-occur consecutively in the sample set; to avoid a zero denominator, smoothing is applied, and the existence probability calculation formula of each sample is expressed as:
[Formula image not reproduced in the source text]
where V is the number of words in the word list; for the whole sample set, the probabilities of all ternary combinations can be calculated and stored as a model for use in subsequent prediction;
Fourth, calculating the existence probability p of each sample sequence in the training samples; because sequences differ in length, they contain different numbers of triples, and the probabilities obtained from the product formula differ greatly, so one conversion is applied: with the number of ternary combinations in a sequence denoted t, the existence probability of each sample sequence is expressed as:
[Formula image not reproduced in the source text]
Fifth, determining a segmentation threshold: the median of the existence probabilities of all training samples is taken as the threshold; samples whose probability is larger than the threshold are directly marked as white samples, and white samples whose probability is smaller than the threshold enter the neural network model module for category judgment.
CN202210198998.5A 2022-03-03 2022-03-03 Multi-feature fusion type DNS hidden tunnel detection method Active CN114567487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210198998.5A CN114567487B (en) 2022-03-03 2022-03-03 Multi-feature fusion type DNS hidden tunnel detection method

Publications (2)

Publication Number Publication Date
CN114567487A true CN114567487A (en) 2022-05-31
CN114567487B CN114567487B (en) 2024-08-06

Family

ID=81716791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210198998.5A Active CN114567487B (en) 2022-03-03 2022-03-03 Multi-feature fusion type DNS hidden tunnel detection method

Country Status (1)

Country Link
CN (1) CN114567487B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120054860A1 (en) * 2010-09-01 2012-03-01 Raytheon Bbn Technologies Corp. Systems and methods for detecting covert dns tunnels
US20170318035A1 (en) * 2016-04-29 2017-11-02 International Business Machines Corporation Cognitive and contextual detection of malicious dns
US20180063162A1 (en) * 2016-08-25 2018-03-01 International Business Machines Corporation Dns tunneling prevention
CN110149418A (en) * 2018-12-12 2019-08-20 国网信息通信产业集团有限公司 A kind of hidden tunnel detection method of DNS based on deep learning
CN111835763A (en) * 2020-07-13 2020-10-27 北京邮电大学 DNS tunnel traffic detection method and device and electronic equipment
CN111953673A (en) * 2020-08-10 2020-11-17 深圳市联软科技股份有限公司 DNS hidden tunnel detection method and system
CN113347210A (en) * 2021-08-03 2021-09-03 北京观成科技有限公司 DNS tunnel detection method and device and electronic equipment

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115086080A (en) * 2022-08-03 2022-09-20 上海欣诺通信技术股份有限公司 DNS hidden tunnel detection method based on flow characteristics
CN115086080B (en) * 2022-08-03 2024-05-07 上海欣诺通信技术股份有限公司 DNS hidden tunnel detection method based on flow characteristics
CN115134168A (en) * 2022-08-29 2022-09-30 成都盛思睿信息技术有限公司 Method and system for detecting cloud platform hidden channel based on convolutional neural network
CN115348188A (en) * 2022-10-18 2022-11-15 安徽华云安科技有限公司 DNS tunnel traffic detection method and device, storage medium and terminal
CN115348188B (en) * 2022-10-18 2023-03-24 安徽华云安科技有限公司 DNS tunnel traffic detection method and device, storage medium and terminal
CN115643087A (en) * 2022-10-24 2023-01-24 天津大学 DNS tunnel detection method based on fusion of coding characteristics and statistical behavior characteristics
CN115643087B (en) * 2022-10-24 2024-04-30 天津大学 DNS tunnel detection method based on fusion of coding features and statistical behavior features
CN116614262A (en) * 2023-04-27 2023-08-18 华能信息技术有限公司 Hidden network channel detection method
CN116614262B (en) * 2023-04-27 2024-10-25 华能信息技术有限公司 Hidden network channel detection method
CN118041698A (en) * 2024-04-11 2024-05-14 深圳大学 DNS hidden tunnel detection method, device and storage medium
CN118041698B (en) * 2024-04-11 2024-06-18 深圳大学 DNS hidden tunnel detection method, device and storage medium

Also Published As

Publication number Publication date
CN114567487B (en) 2024-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant