CN114567487A - DNS hidden tunnel detection method with multi-feature fusion - Google Patents
DNS hidden tunnel detection method with multi-feature fusion
- Publication number
- CN114567487A, CN202210198998.5A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/029—Firewall traversal, e.g. tunnelling or, creating pinholes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
Abstract
A multi-feature fusion DNS hidden tunnel detection method, relating to the technical field of information. The method comprises the following steps: 1) a black sample collector acquires DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel; 2) a black sample standardization module preprocesses the DNS hidden tunnel traffic packet data and extracts its features; 3) a white sample standardization module acquires normal DNS request samples; 4) a neural network model module is constructed; 5) a fast pre-screening module is constructed using white samples. The fast pre-screening module makes a simple distinction between normal request domain names and tunnel request domain names, so that the normal request domain names, which account for most traffic in practice, are eliminated quickly and efficiently. For deep learning detection, the method combines general rule features with deep domain name text features for DNS hidden tunnel detection, thereby improving detection accuracy and reducing detection difficulty.
Description
Technical Field
The invention relates to the technical field of information.
Background
With the continuous development of the internet, DNS has become an indispensable service, so general firewalls typically do not inspect or filter DNS traffic. Attackers can therefore use DNS as a covert channel for remote control, file transfer and other operations, which poses a serious threat to network security. Detecting and identifying DNS hidden tunnels effectively reduces user losses and safeguards the health and security of the network environment.
Existing patents already address DNS hidden tunnel detection. For example, patent [CN111786993A] manually designs and extracts features of DNS requests, such as the request record type, the length of a single domain name label, and the ratios of various character types, and then sets multiple thresholds to decide whether a DNS tunnel exists. That method designs rich features, but the decision depends entirely on hand-set rules, so it involves too much manual intervention and is prone to misjudgment. Patent [CN110149418A] uses deep learning for detection. A deep neural network can often learn hidden features that cannot be designed by hand, and the detection effect is good, but that method does not use features such as the request record type, which are also closely related to the detection result. In addition, it does not consider the high computational complexity of a deep learning model. Finally, it does not use data augmentation to enlarge the training data, although domain name data augmentation can reduce the cost of manually labeling data.
The present method extracts the domain name and the related request information of each request, uses the fused features as input, and uses a deep learning model as the detection model to judge whether DNS traffic contains a hidden tunnel.
Known techniques used
N-Gram is an algorithm based on a statistical language model. The basic idea is to slide a window of size N over the text, byte by byte, forming a sequence of byte fragments of length N. Each byte fragment is called a gram. The occurrence frequency of every gram is counted and filtered against a preset threshold to form a list of key grams, which is the vector feature space of the text; each gram in the list is one dimension of the feature vector. The model assumes that the occurrence of the N-th word depends only on the preceding N-1 words and on no other word, and that the probability of a complete sentence is the product of the occurrence probabilities of its words. These probabilities can be obtained directly from the corpus by counting how many times N words occur together. The binary Bi-Gram and the ternary Tri-Gram are the most commonly used.
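The sliding-window and frequency-filter idea described above can be illustrated with a minimal Python sketch. The function names `ngrams` and `key_grams` and the `min_count` parameter are hypothetical names chosen for illustration, and the sketch works on characters of a string rather than raw bytes.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Slide a window of size n over the text and return every fragment (gram)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def key_grams(texts: list[str], n: int, min_count: int) -> dict[str, int]:
    """Count all grams across the texts and keep those at or above the threshold."""
    counts = Counter(g for t in texts for g in ngrams(t, n))
    return {g: c for g, c in counts.items() if c >= min_count}


# Tri-grams of a short label: ['goo', 'oog', 'ogl', 'gle']
print(ngrams("google", 3))
```

Each key of the dictionary returned by `key_grams` would correspond to one dimension of the feature vector mentioned in the text.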
The DNS tunnel is a kind of covert channel and is referred to herein as a DNS hidden tunnel. It establishes communication by encapsulating other protocols inside the DNS protocol for transmission. Because DNS is an essential service in the network world, most firewalls and intrusion detection devices rarely filter DNS traffic. This makes it possible to use DNS as a covert channel for operations such as remote control and file transfer, and a growing body of research shows that DNS hidden tunnels also often play an important role in botnet and APT attacks.
Since DNS hidden tunneling was first proposed, it has been implemented by many tools. NSTX and OzymanDNS are relatively early, iodine and dnscat2 are relatively active at present, and Denise, dns2tcp and Heyoka are also available. The core principles of the different tools are similar, but they differ in encoding, implementation details and target application scenarios.
PCAP is a packet capture library that many programs use as their packet capture tool; Wireshark also uses the PCAP library to capture packets. The packets captured by PCAP are not raw network byte streams but are reassembled into a new data format.
tcpdump captures and filters the packets of an interface from the command line; its rich functionality lies in its flexible filter expressions. Run without any options, tcpdump captures on the first network interface by default and stops capturing only when the tcpdump process is terminated.
Dropout means that, during deep learning training, neural network units are removed from the network with a certain probability. Note that the removal is temporary: with stochastic gradient descent, each mini-batch therefore trains a different network because of the random discarding.
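As a rough illustration of dropout, the following sketch zeroes each activation with probability `drop_prob` during training and leaves the input untouched at inference time. It uses the common "inverted" scaling (dividing kept values by the keep probability), which is an assumption and is not stated in the text; the function name and parameters are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero units with probability drop_prob (requires 0 <= drop_prob < 1)."""
    if not training or drop_prob == 0.0:
        # At inference time every unit is kept unchanged.
        return list(activations)
    keep = 1.0 - drop_prob
    # Assumption: inverted dropout, kept units are scaled by 1/keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
print(dropout([1.0, 1.0, 1.0, 1.0], 0.5, rng))
```

Because a fresh random mask is drawn on every call, each mini-batch effectively passes through a different sub-network.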
Disclosure of Invention
In view of the defects of the prior art, the method for detecting the DNS hidden tunnel with the multi-feature fusion provided by the invention comprises the following steps:
1) A black sample collector obtains DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel
The black sample collector builds a DNS hidden tunnel using two servers and a DNS hidden tunnel implementation tool: one server acts as the DNS server and hosts the server end of the tool, and the other server hosts the access end of the tool. The DNS server is deployed as the DNS server that resolves a specific domain name. The specific domain name is set only in the test environment between the two servers, so it neither affects nor is affected by the external network environment. Data of arbitrary content is prepared as the transmission sample data, with no limit on its size. The tcpdump tool is deployed on the DNS server to collect DNS traffic, which is stored as PCAP (packet capture) files and used as the DNS hidden tunnel traffic packets.
2) A black sample standardization module preprocesses the DNS hidden tunnel traffic packet data and extracts its features
Key fields are extracted from the PCAP traffic packets using the Wireshark tool; the key fields mainly comprise the source IP, source port, destination IP, destination port, requested domain name and request type. The domain name suffix is removed, and the remaining sub-domain name is split into several character strings, using the '.' character as the boundary, to give several sub-domain name fragments. Characters in the sub-domain name are randomly replaced according to an expansion rule to augment the data, so that multiple groups of sub-domain name fragments are obtained. The expansion rule is as follows: a character is replaced only by a character of the same type; the replacement positions and the number of replaced characters are chosen at random; at least 1 character is replaced and at most half of the characters in the string are replaced; and the sub-domain name after replacement has the same length as the original sub-domain name. The domain name length is extracted. The number of domain name labels is extracted, where the number of domain name labels is the number of domain name fragments obtained by splitting on '.'. The DNS request record type is extracted. One group of sub-domain name fragments, together with the domain name length, the number of domain name labels and the DNS request record type, forms one DNS hidden tunnel traffic packet data sample, which is called a black sample.
3) A white sample standardization module obtains normal DNS request samples
DNS traffic from daily work is collected and stored as PCAP (packet capture) files, and key fields are extracted from the PCAP traffic packets using the Wireshark tool; the key fields mainly comprise the source IP, source port, destination IP, destination port, requested domain name and request type. The domain name suffix is removed, and the remaining sub-domain name is split into several character strings, using the '.' character as the boundary, to give several sub-domain name fragments. The domain name length, the number of domain name labels and the DNS request record type are extracted. One group of sub-domain name fragments, together with the domain name length, the number of domain name labels and the DNS request record type, forms one normal DNS traffic packet data sample, which is called a white sample.
4) Constructing the neural network model module
The domain name characters and features are numbered and a word list is built for use as neural network model input. For the domain name length feature, a domain name fragment length of less than 10 is coded as 1, 10-20 as 2, 20-30 as 3, 30-50 as 4, and more than 50 as 5. For the domain name label number feature, fewer than 3 labels is coded as 6, 3 to 5 labels as 7, and 5 or more labels as 8. For the DNS record type feature, a TXT record is coded as 9 and any other record type as 10. For the domain name string feature, characters a through z correspond to codes 11 through 36, characters A through Z to codes 37 through 62, and characters 0-9 to codes 63-72. 70% of all samples are randomly taken as the training set, and the remaining samples are randomly divided into equal validation and test sets; all samples comprise the DNS hidden tunnel traffic packet data samples and the normal DNS traffic packet data samples. A padding operation is performed before a sample is input: the maximum length of the domain name fragment code is set to 64, input longer than 64 is truncated, and input shorter than 64 is padded with 0 at the tail. The word vector layer converts each numeric code into vector form. In the CNN (convolutional neural network) layer, the text features of the domain name are learned through one-dimensional convolutions with different kernel sizes; the results of the three convolutional layers are concatenated and max pooling is applied. A Dropout layer is added to reduce model complexity and prevent overfitting. Finally, a fully connected classification layer is used with two classes, the DNS hidden tunnel class and the normal DNS request class; the DNS hidden tunnel class is called the black sample class, and the normal DNS request class is called the white sample class.
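The numbering and padding scheme of this step can be sketched as follows. Several details are assumptions made for illustration: the shared range boundaries (20, 30, and 5 labels) are resolved as half-open intervals, the three feature codes are placed in front of the character codes, and characters outside a-z, A-Z and 0-9 are skipped. All function names are hypothetical.

```python
import string

MAX_LEN = 64  # maximum code length stated in the text
PAD = 0       # padding value stated in the text

# Character codes: a-z -> 11..36, A-Z -> 37..62, 0-9 -> 63..72
CHAR_CODE = {c: 11 + i for i, c in enumerate(string.ascii_lowercase)}
CHAR_CODE.update({c: 37 + i for i, c in enumerate(string.ascii_uppercase)})
CHAR_CODE.update({c: 63 + i for i, c in enumerate(string.digits)})


def length_code(n: int) -> int:
    """Codes 1..5; the handling of the shared boundaries is an assumption."""
    if n < 10:
        return 1
    if n < 20:
        return 2
    if n < 30:
        return 3
    return 4 if n <= 50 else 5


def label_count_code(n: int) -> int:
    """Codes 6..8; exactly 5 labels is assumed to fall in the last bucket."""
    if n < 3:
        return 6
    return 7 if n < 5 else 8


def encode_sample(fragments, domain_length, label_count, record_type):
    """Fuse the three feature codes with the character codes and pad to MAX_LEN."""
    codes = [
        length_code(domain_length),
        label_count_code(label_count),
        9 if record_type.upper() == "TXT" else 10,
    ]
    codes += [CHAR_CODE[c] for f in fragments for c in f if c in CHAR_CODE]
    codes = codes[:MAX_LEN]                         # truncate anything beyond 64
    return codes + [PAD] * (MAX_LEN - len(codes))   # pad with 0 at the tail


print(encode_sample(["ab", "Z9"], 5, 2, "TXT")[:8])
```

The resulting fixed-length integer vector is the kind of input a word vector (embedding) layer would consume.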
5) Constructing a fast pre-screening module using white samples
Before its class is judged, newly acquired DNS traffic is treated as the white sample class by default; its class is changed to the black sample class only when the neural network model module judges it to be black. The fast pre-screening module quickly judges newly received white samples and eliminates those with a low probability of being black samples, which speeds up the classification of newly acquired DNS traffic and reduces the computation load of the neural network model module.
The steps of constructing the fast pre-screening module comprise:
First, the white samples acquired by the white sample standardization module are taken as training samples. Each training sample is recorded as a sub-domain name sequence S of length m, and the occurrence probability of the whole sequence is expressed as follows:
Secondly, according to the Markov assumption, the occurrence of a word is related only to the preceding n-1 words, where n takes the value 3, and the conditional probability in the formula is simplified as follows:
Thirdly, using the Bayesian formula, each term is calculated as follows:
where count(…) is the number of times the given words occur consecutively together in the sample set. To avoid a zero denominator, smoothing is applied, and the existence probability of each sample is calculated as follows:
where V is the number of words in the word list. For the whole sample set, the probabilities of all ternary combinations can be calculated and stored as a model for use in subsequent prediction.
Fourthly, the existence probability p of each sample sequence in the training samples is calculated. Because sequences differ in length, they contain different numbers of triples, and the probabilities obtained from the product formula differ greatly; a conversion is therefore applied. With t denoting the number of ternary combinations in the sequence, the existence probability of each sample sequence is expressed as:
Fifthly, the segmentation threshold is determined: the median of the existence probabilities of all training samples is taken as the threshold. Samples whose probability is larger than the threshold are directly marked as white samples, and samples whose probability is smaller than the threshold enter the neural network model module for category judgment.
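The formulas referred to in the first four steps are not reproduced in this text. A reconstruction consistent with the surrounding definitions (chain rule, Markov assumption with ternary combinations, count-based estimate, smoothing over a word list of size V, and normalization by the number of triples t) would be the following. The add-one form of the smoothing and the t-th root used for the length conversion are assumptions.

```latex
% Step 1: chain rule for a sequence S = (w_1, ..., w_m)
P(S) = P(w_1, w_2, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})

% Step 2: Markov assumption with ternary combinations (n = 3)
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})

% Step 3: count-based estimate of each term
P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i)}{\mathrm{count}(w_{i-2}, w_{i-1})}

% Step 3, after smoothing (assumed add-one form, V = size of the word list)
P(w_i \mid w_{i-2}, w_{i-1}) = \frac{\mathrm{count}(w_{i-2}, w_{i-1}, w_i) + 1}{\mathrm{count}(w_{i-2}, w_{i-1}) + V}

% Step 4: length conversion over the t ternary combinations (assumed geometric mean)
p = P(S)^{1/t}
```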
Advantageous effects
The fast pre-screening module makes a simple, highly efficient distinction between normal request domain names and tunnel request domain names, so the normal request domain names that make up most traffic in practice are eliminated quickly and do not reach the subsequent deep learning model, whose detection is complex and slow. For deep learning detection, the method combines general rule features with deep domain name text features for DNS hidden tunnel detection. Compared with earlier DNS detection methods, features such as domain name complexity and information entropy need not be designed by hand: the deep network model learns the information in the domain name text automatically, which gives a better effect than manual design. In addition, network request features that are not in the domain name text, such as the request record type, are also encoded as model input. For deep model training, the method uses domain name data augmentation, which effectively enlarges the training data, improves the model effect, and reduces the labor and material cost of collecting data manually. Finally, the method uses a black-and-white list mechanism so that earlier detection results are reused, improving detection efficiency and effect.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Example one
Referring to FIG. 1, the implementation steps of the multi-feature fusion DNS hidden tunnel detection method provided by the invention comprise:
S01: A black sample collector obtains DNS hidden tunnel traffic packets through a self-built DNS hidden tunnel
The black sample collector builds a DNS hidden tunnel using two servers and a DNS hidden tunnel implementation tool: one server acts as the DNS server and hosts the server end of the tool, and the other server hosts the access end of the tool. The DNS server is deployed as the DNS server that resolves a specific domain name. The specific domain name is set only in the test environment between the two servers, so it neither affects nor is affected by the external network environment. Data of arbitrary content is prepared as the transmission sample data, with no limit on its size. The tcpdump tool is deployed on the DNS server to collect DNS traffic, which is stored as PCAP files and used as the DNS hidden tunnel traffic packets.
S02: A black sample standardization module preprocesses the DNS hidden tunnel traffic packet data and extracts its features
Key fields are extracted from the PCAP traffic packets using the Wireshark tool; the key fields mainly comprise the source IP, source port, destination IP, destination port, requested domain name and request type. The domain name suffix is removed, and the remaining sub-domain name is split into several character strings, using the '.' character as the boundary, to give several sub-domain name fragments. Characters in the sub-domain name are randomly replaced according to an expansion rule to augment the data, so that multiple groups of sub-domain name fragments are obtained. The expansion rule is as follows: a character is replaced only by a character of the same type; the replacement positions and the number of replaced characters are chosen at random; at least 1 character is replaced and at most half of the characters in the string are replaced; and the sub-domain name after replacement has the same length as the original sub-domain name. The domain name length is extracted. The number of domain name labels is extracted, where the number of domain name labels is the number of domain name fragments obtained by splitting on '.'. The DNS request record type is extracted. One group of sub-domain name fragments, together with the domain name length, the number of domain name labels and the DNS request record type, forms one DNS hidden tunnel traffic packet data sample, which is called a black sample.
S03: A white sample standardization module obtains normal DNS request samples
DNS traffic from daily work is collected and stored as PCAP files, and key fields are extracted from the PCAP traffic packets using the Wireshark tool; the key fields mainly comprise the source IP, source port, destination IP, destination port, requested domain name and request type. The domain name suffix is removed, and the remaining sub-domain name is split into several character strings, using the '.' character as the boundary, to give several sub-domain name fragments. The domain name length, the number of domain name labels and the DNS request record type are extracted. One group of sub-domain name fragments, together with the domain name length, the number of domain name labels and the DNS request record type, forms one normal DNS traffic packet data sample, which is called a white sample.
S04: Constructing the neural network model module
The domain name characters and features are numbered and a word list is built for use as neural network model input. For the domain name length feature, a domain name fragment length of less than 10 is coded as 1, 10-20 as 2, 20-30 as 3, 30-50 as 4, and more than 50 as 5. For the domain name label number feature, fewer than 3 labels is coded as 6, 3 to 5 labels as 7, and 5 or more labels as 8. For the DNS record type feature, a TXT record is coded as 9 and any other record type as 10. For the domain name string feature, characters a through z correspond to codes 11 through 36, characters A through Z to codes 37 through 62, and characters 0-9 to codes 63-72. 70% of all samples are randomly taken as the training set, and the remaining samples are randomly divided into equal validation and test sets; all samples comprise the DNS hidden tunnel traffic packet data samples and the normal DNS traffic packet data samples. A padding operation is performed before a sample is input: the maximum length of the domain name fragment code is set to 64, input longer than 64 is truncated, and input shorter than 64 is padded with 0 at the tail. The word vector layer converts each numeric code into vector form. In the CNN (convolutional neural network) layer, the text features of the domain name are learned through one-dimensional convolutions with different kernel sizes; the results of the three convolutional layers are concatenated and max pooling is applied. A Dropout layer is added to reduce model complexity and prevent overfitting. Finally, a fully connected classification layer is used with two classes, the DNS hidden tunnel class and the normal DNS request class; the DNS hidden tunnel class is called the black sample class, and the normal DNS request class is called the white sample class.
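The 70% training split with equal validation and test sets described in S04 could be done as in this small sketch. The function name is hypothetical, and integer arithmetic is used for the 70% cut to avoid floating-point rounding.

```python
import random


def split_samples(samples, rng):
    """Shuffle, take 70% for training, and halve the rest into validation and test."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    rest = shuffled[n_train:]
    half = len(rest) // 2
    return shuffled[:n_train], rest[:half], rest[half:]


train, val, test = split_samples(range(100), random.Random(42))
print(len(train), len(val), len(test))  # 70 15 15
```

In practice the input list would hold both black and white samples, so shuffling before the split mixes the two classes across all three sets.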
S05: Constructing a fast pre-screening module using white samples
Before its class is judged, newly acquired DNS traffic is treated as the white sample class by default; its class is changed to the black sample class only when the neural network model module judges it to be black. The fast pre-screening module quickly judges newly received white samples and eliminates those with a low probability of being black samples, which speeds up the classification of newly acquired DNS traffic and reduces the computation load of the neural network model module.
The steps for constructing the rapid pre-screening module are:
Firstly, take the white samples acquired by the white sample standardization module as training samples, treat each training sample as a sub-domain name sequence S, and, for a sequence of length m, express the occurrence probability of the whole sequence as follows:
Secondly, according to the Markov assumption, the occurrence of a word depends only on the previous n words; with n = 3, the conditional probability in the formula simplifies as follows:
Thirdly, using the Bayesian formula, each term is calculated as follows:
where count(…) is the number of times the words occur together consecutively in the sample set; to avoid a zero denominator, smoothing is applied, and the resulting formula for the existence probability of each sample is as follows:
where V is the number of words in the vocabulary; for the whole sample set, the probabilities of all three-word combinations (triples) can be calculated and stored as a model for use in subsequent prediction;
Fourthly, calculate the existence probability p of each sample sequence in the training samples. Because sequences differ in length, they contain different numbers of triples, and the probabilities obtained from the product formula differ widely; a conversion is therefore applied: with t the number of triples in the sequence, the existence probability of each sample sequence is expressed as:
Fifthly, determine the segmentation threshold: take the median of the existence probabilities of all training samples as the threshold; samples whose probability is greater than the threshold are marked directly as white samples, and white samples whose probability is less than the threshold are passed to the neural network model module for category judgment.
Example two
Classification of newly acquired DNS network traffic
1) Input the newly acquired DNS network traffic into the white sample standardization module to obtain white samples;
2) Input the white samples into the rapid pre-screening module to filter out those with a low probability of being black samples;
3) Input the remaining white samples, which have a high probability of being black samples, into the neural network model module, which makes the final classification.
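The pre-screening step and the two-stage flow above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: each character of a sub-domain name is treated as a "word", add-one smoothing and a geometric mean over triples are used, and `TrigramPrescreen`, `classify` and `model_predict` are hypothetical names, with `model_predict` standing in for the trained neural network model module.

```python
import math
from collections import Counter
from statistics import median


def triples(seq):
    """Return the consecutive three-word combinations of a sequence."""
    return [tuple(seq[i:i + 3]) for i in range(len(seq) - 2)]


class TrigramPrescreen:
    """Trigram model fitted on white samples only (assumed design)."""

    def __init__(self, white_sequences):
        self.tri = Counter()   # counts of (a, b, c)
        self.bi = Counter()    # counts of the context (a, b)
        vocab = set()
        for seq in white_sequences:
            vocab.update(seq)
            for a, b, c in triples(seq):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1
        self.v = len(vocab)    # V: number of words in the vocabulary
        # Threshold: median existence probability of the training samples.
        self.threshold = median(self.score(s) for s in white_sequences)

    def score(self, seq):
        """Length-normalised existence probability of one sequence."""
        ts = triples(seq)
        if not ts:
            return 0.0  # assumption: too short to score, defer to the model
        log_p = sum(
            math.log((self.tri[t] + 1) / (self.bi[t[:2]] + self.v)) for t in ts
        )
        return math.exp(log_p / len(ts))  # geometric mean over the triples

    def is_clearly_white(self, seq):
        return self.score(seq) > self.threshold


def classify(seq, prescreen, model_predict):
    """Mark high-probability samples white; send the rest to the model."""
    if prescreen.is_clearly_white(seq):
        return "white"
    return model_predict(seq)
```

Because the threshold is the median of the training scores, roughly half of traffic that resembles the training data bypasses the neural network, which is where the reduction in computation comes from.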
Claims (1)
1. A DNS hidden tunnel detection method with multi-feature fusion is characterized by comprising the following implementation steps:
1) Obtaining DNS hidden tunnel traffic packets from a self-built DNS hidden tunnel with the black sample collector
The black sample collector builds a DNS hidden tunnel using two servers and a DNS hidden tunnel implementation tool: one server acts as the DNS server and hosts the server end of the tool, and the other server hosts the access end of the tool. The DNS server is configured as the DNS server that resolves a specific domain name, and that domain name is set up only in the test environment between the two servers, so that it neither affects nor is affected by the external network environment. Data of arbitrary content and unrestricted size is prepared as the transmission sample data. The tcpdump tool is deployed on the DNS server to collect DNS traffic, which is stored as a PCAP file and used as the DNS hidden tunnel traffic packet;
2) Preprocessing the DNS hidden tunnel traffic packet data with the black sample standardization module and extracting its features
Extract the key fields from the PCAP traffic packet using the Wireshark tool; the key fields mainly comprise the source IP, source port, destination IP, destination port, requested domain name and request type. Remove the domain name suffix and split the remaining sub-domain name into several character strings, namely sub-domain name fragments, using the character '.' as the boundary. Expand the data by randomly replacing characters in the sub-domain names according to an expansion rule, so that multiple groups of sub-domain name fragments are obtained. The expansion rule is: a character is replaced only by a character of the same type; the replacement positions and the number of replaced characters are chosen randomly; at least 1 character is replaced, and at most half the length of the character string; and the replaced sub-domain name has the same length as the original sub-domain name. Extract the domain name length; extract the number of domain name labels, which is the number of domain name fragments obtained by splitting on '.'; extract the DNS request record type. A group consisting of the sub-domain name fragments, the domain name length, the number of domain name labels and the DNS request record type forms one DNS hidden tunnel traffic packet data sample, which is called a black sample;
3) Obtaining normal DNS request samples with the white sample standardization module
Collect the DNS traffic generated in daily work and store it as a PCAP (packet capture) file; extract the key fields from the PCAP traffic packet using the Wireshark tool, where the key fields mainly comprise the source IP, source port, destination IP, destination port, requested domain name and request type. Remove the domain name suffix and split the remaining sub-domain name into several character strings, namely sub-domain name fragments, using the character '.' as the boundary. Extract the domain name length; extract the number of domain name labels; extract the DNS request record type. A group consisting of the sub-domain name fragments, the domain name length, the number of domain name labels and the DNS request record type forms one normal DNS traffic packet data sample, which is called a white sample;
4) Constructing the neural network model module
Number the domain name characters and features and build a vocabulary for input to the neural network model. For the domain name length feature, a domain name fragment length of less than 10 is coded as 1, 10-20 as 2, 20-30 as 3, 30-50 as 4, and more than 50 as 5; for the domain name label number feature, fewer than 3 labels is coded as 6, 3 to 5 labels as 7, and 5 or more labels as 8; for the DNS record type feature, a TXT record is coded as 9 and any other record type as 10; for the domain name string feature, the characters a through z correspond to codes 11 through 36, the characters A through Z to codes 37 through 62, and the characters 0-9 to codes 63-72. Randomly take 70% of all samples as the training set and randomly divide the remaining samples equally into a validation set and a test set, where all samples comprise both DNS hidden tunnel traffic packet data samples and normal DNS traffic packet data samples. Before a sample is input, pad it: the maximum length of a domain name fragment code is set to 64, input longer than 64 is truncated, and input shorter than 64 is padded with 0 at the tail. The word vector layer converts each numeric code into vector form. In the CNN convolutional neural network layer, the text features of the domain name are learned through one-dimensional convolutions with different kernel sizes; the outputs of the three convolutional layers are concatenated and max-pooled, a Dropout layer is added to reduce model complexity and prevent overfitting, and a fully connected classification layer is used last. There are two classes, the DNS hidden tunnel class and the normal DNS request class; the DNS hidden tunnel class is called the black sample class, and the normal DNS request class is called the white sample class;
5) Constructing the rapid pre-screening module using white samples
Before newly acquired DNS traffic is classified, it is assigned to the white sample class by default, and its class is changed to the black sample class only once the neural network model module judges it to be a black sample. The rapid pre-screening module quickly evaluates newly received white samples and filters out those with a low probability of being black samples, which speeds up classification of newly acquired DNS traffic and reduces the computation required of the neural network model module;
The steps for constructing the rapid pre-screening module are:
Firstly, take the white samples acquired by the white sample standardization module as training samples, treat each training sample as a sub-domain name sequence S, and, for a sequence of length m, express the occurrence probability of the whole sequence as follows:
Secondly, according to the Markov assumption, the occurrence of a word depends only on the previous n words; with n = 3, the conditional probability in the formula simplifies as follows:
Thirdly, using the Bayesian formula, each term is calculated as follows:
where count(…) is the number of times the words occur together consecutively in the sample set; to avoid a zero denominator, smoothing is applied, and the resulting formula for the existence probability of each sample is as follows:
where V is the number of words in the vocabulary; for the whole sample set, the probabilities of all three-word combinations (triples) can be calculated and stored as a model for use in subsequent prediction;
Fourthly, calculate the existence probability p of each sample sequence in the training samples. Because sequences differ in length, they contain different numbers of triples, and the probabilities obtained from the product formula differ widely; a conversion is therefore applied: with t the number of triples in the sequence, the existence probability of each sample sequence is expressed as:
Fifthly, determine the segmentation threshold: take the median of the existence probabilities of all training samples as the threshold; samples whose probability is greater than the threshold are marked directly as white samples, and white samples whose probability is less than the threshold are passed to the neural network model module for category judgment.
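As an illustration of the sample construction in steps 2) and 3) of the claim, the following Python sketch splits a requested domain name into sub-domain name fragments, derives the length, label count and record type features, and applies the expansion rule for black samples. The function names, the dictionary layout and the example suffix `tunnel.example.com` are hypothetical. Taking "domain name length" as the total fragment length, always substituting a different character, and allowing one replacement in a one-character fragment are assumptions.

```python
import random
import string

CHAR_TYPES = (string.ascii_lowercase, string.ascii_uppercase, string.digits)


def char_pool(ch):
    """Return the characters of the same type as ch, or '' if none applies."""
    for pool in CHAR_TYPES:
        if ch in pool:
            return pool
    return ""


def split_subdomain(qname, suffix):
    """Strip the domain name suffix and split the rest on '.'."""
    name = qname.rstrip(".")
    suffix = suffix.strip(".")
    if name == suffix:
        return []
    if name.endswith("." + suffix):
        name = name[: -(len(suffix) + 1)]
    return [part for part in name.split(".") if part]


def build_sample(qname, suffix, record_type):
    """Group fragments, length, label count and record type into one sample."""
    fragments = split_subdomain(qname, suffix)
    return {
        "fragments": fragments,
        "length": sum(len(f) for f in fragments),  # assumed definition
        "labels": len(fragments),
        "record_type": record_type,
    }


def expand_fragment(fragment, rng):
    """Replace 1 to len//2 randomly chosen characters with same-type characters."""
    replaceable = [i for i, ch in enumerate(fragment) if char_pool(ch)]
    if not replaceable:
        return fragment
    # Assumption: a one-character fragment still gets one replacement.
    max_k = min(len(replaceable), max(1, len(fragment) // 2))
    k = rng.randint(1, max_k)
    chars = list(fragment)
    for i in rng.sample(replaceable, k):
        pool = char_pool(chars[i])
        chars[i] = rng.choice([c for c in pool if c != chars[i]])
    return "".join(chars)  # same length as the original fragment
```

Calling `expand_fragment` repeatedly on each fragment of a captured black sample yields additional synthetic fragments with the same length and character-type layout, which is the stated purpose of the expansion rule.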
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210198998.5A CN114567487B (en) | 2022-03-03 | 2022-03-03 | Multi-feature fusion type DNS hidden tunnel detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114567487A (en) | 2022-05-31 |
CN114567487B (en) | 2024-08-06 |
Family
ID=81716791
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210198998.5A Active CN114567487B (en) | 2022-03-03 | 2022-03-03 | Multi-feature fusion type DNS hidden tunnel detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114567487B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120054860A1 (en) * | 2010-09-01 | 2012-03-01 | Raytheon Bbn Technologies Corp. | Systems and methods for detecting covert dns tunnels |
US20170318035A1 (en) * | 2016-04-29 | 2017-11-02 | International Business Machines Corporation | Cognitive and contextual detection of malicious dns |
US20180063162A1 (en) * | 2016-08-25 | 2018-03-01 | International Business Machines Corporation | Dns tunneling prevention |
CN110149418A (en) * | 2018-12-12 | 2019-08-20 | 国网信息通信产业集团有限公司 | A kind of hidden tunnel detection method of DNS based on deep learning |
CN111835763A (en) * | 2020-07-13 | 2020-10-27 | 北京邮电大学 | DNS tunnel traffic detection method and device and electronic equipment |
CN111953673A (en) * | 2020-08-10 | 2020-11-17 | 深圳市联软科技股份有限公司 | DNS hidden tunnel detection method and system |
CN113347210A (en) * | 2021-08-03 | 2021-09-03 | 北京观成科技有限公司 | DNS tunnel detection method and device and electronic equipment |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115086080A (en) * | 2022-08-03 | 2022-09-20 | 上海欣诺通信技术股份有限公司 | DNS hidden tunnel detection method based on flow characteristics |
CN115086080B (en) * | 2022-08-03 | 2024-05-07 | 上海欣诺通信技术股份有限公司 | DNS hidden tunnel detection method based on flow characteristics |
CN115134168A (en) * | 2022-08-29 | 2022-09-30 | 成都盛思睿信息技术有限公司 | Method and system for detecting cloud platform hidden channel based on convolutional neural network |
CN115348188A (en) * | 2022-10-18 | 2022-11-15 | 安徽华云安科技有限公司 | DNS tunnel traffic detection method and device, storage medium and terminal |
CN115348188B (en) * | 2022-10-18 | 2023-03-24 | 安徽华云安科技有限公司 | DNS tunnel traffic detection method and device, storage medium and terminal |
CN115643087A (en) * | 2022-10-24 | 2023-01-24 | 天津大学 | DNS tunnel detection method based on fusion of coding characteristics and statistical behavior characteristics |
CN115643087B (en) * | 2022-10-24 | 2024-04-30 | 天津大学 | DNS tunnel detection method based on fusion of coding features and statistical behavior features |
CN116614262A (en) * | 2023-04-27 | 2023-08-18 | 华能信息技术有限公司 | Hidden network channel detection method |
CN116614262B (en) * | 2023-04-27 | 2024-10-25 | 华能信息技术有限公司 | Hidden network channel detection method |
CN118041698A (en) * | 2024-04-11 | 2024-05-14 | 深圳大学 | DNS hidden tunnel detection method, device and storage medium |
CN118041698B (en) * | 2024-04-11 | 2024-06-18 | 深圳大学 | DNS hidden tunnel detection method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114567487B (en) | 2024-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||