CN110535821A - A compromised host detection method based on multiple DNS features - Google Patents

A compromised host detection method based on multiple DNS features

Info

Publication number
CN110535821A
Authority
CN
China
Prior art keywords
dns
domain name
data
request
host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910413662.4A
Other languages
Chinese (zh)
Inventor
陈虎
唐开达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Juming Network Technology Co Ltd
Original Assignee
Nanjing Juming Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Juming Network Technology Co Ltd
Priority to CN201910413662.4A
Publication of CN110535821A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a compromised host detection method based on multiple DNS features. The method comprises the following steps: step 1) unpacking DNS packets and assembling sessions; step 2) classifying domain names; step 3) establishing a name server list; step 4) extracting data features; step 5) building a training dataset; step 6) examining DNS requests in practice. The scheme can locate hosts that may be compromised, examines DNS behaviour from several aspects, and improves the accuracy of the detection.

Description

A compromised host detection method based on multiple DNS features
Technical field
The present invention relates to a detection method, and in particular to a compromised host detection method based on multiple DNS features, and belongs to the technical field of host detection.
Background art
As the internet and related systems are applied ever more widely, the underground black-market industry has flourished correspondingly. Malicious programs such as rogue software, Trojans, ransomware and cryptocurrency miners are implanted into users' computers without their knowledge. The main purpose of the attackers is profit (for example, encrypting critical files in order to extort the owner of the computer, who generally has to pay in an electronic currency such as Bitcoin before the files can be decrypted), although intrusions for other purposes cannot be ruled out. A host into which such software has been implanted is called a compromised host.
In the field of computer security, an important task is to identify, detect and isolate hosts that may have been compromised, so as to prevent the infection from spreading within the office network or even into the production network and causing irreparable loss. Such hosts can be identified only from the host side or from the network side. Identification from the host side requires installing special-purpose software such as 360 or Huorong. Such software generally targets the Windows platform and needs a feature database that is updated in real time. In reality, however, Windows hosts can be infected by all kinds of harmful software, and even Linux hosts can be implanted with many types of malware; Lucky (a variant of the Satan Trojan), for example, is a combined ransomware and mining program in ELF format. Monitoring hosts from the network side therefore becomes necessary. Yet even network activity monitoring generally relies on the detection feature databases of products such as IDS, firewalls and security walls, which cannot be updated in time for the features of new malware. Its effectiveness is thus greatly reduced and the best opportunity to respond is missed: the harm caused by the malware cannot be reduced at the first moment, which leads to serious consequences such as data leakage, abuse of computing resources and encryption of important files.
Can unknown threats then be detected without relying on known features? One can of course consider sandbox technology for this purpose. However, sandbox technology is generally based on restoring files transferred over known (and unencrypted) protocols and then running them in a sandbox to observe their behaviour and infer whether they are harmful. Several limitations are apparent: first, file restoration depends on unencrypted transport protocols such as HTTP, FTP and unencrypted mail transport protocols; second, file restoration depends on known network transport protocols, and files carried over unknown protocols cannot be restored; third, files introduced through other media, such as USB flash drives, generally cannot be detected or restored. From this analysis, sandbox technology can detect some unknown threats but has significant limitations.
In general, Trojans, ransomware, mining software and the like all need to communicate over the network. A typical Trojan needs to communicate with its control centre to carry out more complex activities, ransomware needs to communicate to obtain keys, and mining software needs to exchange task information. The destinations of these communications are usually overseas, so it is impossible for such software to remain completely hidden, and inspecting communication data should yield some clues. Early malware hard-coded the destination IP addresses of its communications, which made it easy for detection software to find. Modern malware therefore generally communicates through dynamic addresses (although the addresses in fact vary within a limited range). Because the address changes, a domain name is needed to map to it, so such malware can also be detected through its DNS requests. In response, attackers use a dynamic generation algorithm (DGA) to change domain names continually, which defeats detection based on specific blacklisted domain names.
Modern network security inspection systems are typically equipped with a module for detecting dynamically generated domain names, which can detect such domain names to a certain degree and within a certain range, but some problems remain, mainly:
1. Dynamically generated domain names are not necessarily produced by malicious software; some legitimate software also uses this technique;
2. Detection of dynamically generated domain names depends on a training set and may therefore produce significant errors: if a new generation algorithm appears, the existing training result may become invalid;
3. If illegal communication by malware is detected only from the domain name itself and other aspects are ignored, the detection is one-dimensional, and the likelihood of false positives and false negatives tends to be high.
In view of the above, detecting the network communication data of compromised hosts, and DNS protocol data in particular, is a sound approach, but detection based only on the requested domain name introduces a certain bias in the results, so a more comprehensive means is needed.
Summary of the invention
To address the problems in the prior art, the present invention provides a compromised host detection method based on multiple DNS features. The method detects possibly compromised hosts by extracting several features of the DNS protocol. These features cover not only the detection of dynamically generated domain names mentioned above but also features of the DNS packets, historical features, and features of the request results. A mathematical tool is then used to partition the data so as to locate hosts that may be compromised, achieving comprehensive detection.
To achieve the above object, the technical solution of the present invention is as follows: a compromised host detection method based on multiple DNS features, characterized in that the method comprises the following steps:
Step 1) unpacking DNS packets and assembling sessions;
Step 2) classifying domain names;
Step 3) establishing a name server list;
Step 4) extracting data features;
Step 5) building a training dataset;
Step 6) examining DNS requests in practice.
As an improvement of the present invention, step 1), DNS packet unpacking and session processing, is as follows: DNS packets are unpacked according to the relevant RFCs. Unpacking covers the data link layer, network layer, transport layer (mainly UDP) and part of the application layer. At the application layer, the requested domain name and, from the response packet, the returned IP addresses, TTL values and other information are parsed out for subsequent analysis. In addition, the system associates each request packet with its response packet according to the transaction ID in the DNS protocol, forming complete DNS session data. Failed requests (mainly those for which the name server returns a "no such domain name" message) are marked.
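The session assembly described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it decodes only the fixed 12-byte DNS header defined in RFC 1035 and pairs a response with its query by client address and transaction ID. The names `parse_header` and `pair_sessions` and the `(client_ip, udp_payload)` input shape are assumptions made for the example.

```python
import struct
from collections import namedtuple

# Fields taken from the fixed 12-byte DNS header (RFC 1035, section 4.1.1).
DnsHeader = namedtuple("DnsHeader", "txid is_response rcode qdcount ancount")

RCODE_NXDOMAIN = 3  # "Name Error": the queried domain does not exist


def parse_header(payload: bytes) -> DnsHeader:
    """Decode the transaction ID, QR bit, RCODE and record counts."""
    if len(payload) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    txid, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", payload[:12])
    return DnsHeader(txid, bool(flags & 0x8000), flags & 0x000F, qdcount, ancount)


def pair_sessions(packets):
    """Pair each response with its pending query.

    `packets` is an iterable of (client_ip, udp_payload) tuples, where
    client_ip is the host that issued the query (hypothetical input shape).
    """
    pending = {}   # (client_ip, txid) -> header of the unanswered query
    sessions = []
    for client_ip, payload in packets:
        hdr = parse_header(payload)
        key = (client_ip, hdr.txid)
        if not hdr.is_response:
            pending[key] = hdr
        elif key in pending:
            pending.pop(key)
            sessions.append({
                "client": client_ip,
                "txid": hdr.txid,
                "answers": hdr.ancount,
                # Mark failed requests, as the step requires.
                "failed": hdr.rcode == RCODE_NXDOMAIN,
            })
    return sessions
```

A fuller parser would also decode the question and answer sections to recover the domain name, returned IP addresses and TTL values; that part is omitted here to keep the sketch short.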
As an improvement of the present invention, step 2), domain name classification, is as follows: all domain name requests are classified according to a pre-established domain name whitelist. Requests that match the whitelist are assigned to the positive data, and requests for domain names not ranked on the whitelist are provisionally assigned to the negative data (although not all of them are necessarily problematic).
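One way to picture this provisional labelling is the sketch below. It assumes the whitelist is a set of registered domains and that a request matches when the name or any of its parent domains is on the list; the matching rule and the function name `label_domain` are illustrative assumptions, since the source does not specify how matching is done.

```python
def label_domain(domain: str, whitelist: set) -> int:
    """Return +1 (positive) if the domain or a parent domain is whitelisted,
    otherwise -1 (provisionally negative)."""
    labels = domain.lower().rstrip(".").split(".")
    # Try "www.example.com", then "example.com", then "com".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in whitelist:
            return 1
    return -1
```

The -1 label is only a starting point for training, in line with the remark that not every non-whitelisted request is actually malicious.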
As an improvement of the present invention, step 3), establishing the name server list, is as follows: the purpose of the name server list is to filter out name servers, that is, to distinguish ordinary hosts from name servers. Domain name resolution is hierarchical, and units such as large enterprises or schools have multiple levels of name servers. These name servers should not take part in the calculation and are therefore excluded first.
As an improvement of the present invention, step 4), data feature extraction, extracts the following features and forms the corresponding feature vector:
41) Domain name spelling features: these are extracted by analysing the spelling of the domain name. They mainly include the vowel ratio: for baidu, for example, the vowel ratio is 60% because its spelling contains three vowels, and the corresponding consonant ratio is 40%. The letter ratio of baidu is 100%, whereas for a domain name such as 163 (NetEase) the letter ratio is 0% and the digit ratio is 100%. In addition, the proportion of dictionary words in the domain name can be counted, which is also a very effective way to distinguish normal from abnormal domain names. It is also possible to learn from known whitelisted domain names and extract two-letter or three-letter ratio features through a Markov transition probability matrix, but this is only one of the features the present invention includes;
42) TTL statistical features: using the TTL values parsed earlier from the DNS packets, statistical features of the TTL values within a unit of time (the present invention uses a 1-hour time window) are analysed. To obtain positive and negative data, the TTL statistics, namely the TTL mean and the TTL standard deviation, are calculated separately for the DNS responses of domain names on the whitelist and of domain names not on the whitelist (the TTL is carried in the DNS response packet):
$$AVG_{TTL} = \frac{1}{N}\sum_{i=1}^{N} TTL_{ip,i}, \qquad STD_{TTL} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(TTL_{ip,i} - AVG_{TTL}\right)^{2}}$$
In the above formulas, $TTL_{ip}$ is the time to live of each IP returned in the DNS response packets, $AVG_{TTL}$ is the average time to live of the IPs, and $N$ is the number of samples. In addition, the present invention also obtains the number of distinct TTL values (i.e. how many different TTL values occur) and the TTL value ranges for whitelisted and non-whitelisted domains. Finally, the Min-Max method is used to normalize them (constraining their values to between 0 and 1; the same applies below) to obtain the corresponding part of the vector;
43) DNS return address features: features are extracted from the addresses returned in DNS response packets. This part is relatively simple. The average number of returned addresses is calculated (a large legitimate portal site may return more than ten IP addresses, whereas a C&C server generally has only one public IP address over a period of time). The returned IP addresses are mapped to country information using an IP geolocation database (such as IP-GEO), and the country distribution is calculated. Using the returned IP address together with historical information, the number of domain names the IP address has belonged to in the past is queried, i.e. a reverse IP-to-domain lookup. A malicious address may have been associated with multiple public IP addresses in the past, which is also an important feature for identifying malicious call-back addresses;
44) Domain name request failure ratio: the resolution failure ratio is obtained from the success or failure results in the DNS response packets of the name server. A failed resolution generally returns a marker such as "No such domain name" and carries no corresponding IP address;
45) Domain name ranking feature: this feature is simple to extract. It is assigned 1 if the domain name is in the domain name ranking list and 0 otherwise. This feature is also the basis for obtaining the initial training data. Domain name ranking lists can generally be obtained in some way (free of charge or for a fee);
46) Request frequency features: within a unit of time (the present invention uses a 1-hour time window), the domain name request frequency of each server or terminal is counted. In addition, the requested domain names are grouped and the entropy of their distribution is calculated as follows:
$$H = -\sum_{i=1}^{N} p_i \log p_i$$
In the above formula, $N$ is the number of distinct domain names requested and $p_i$ is the proportion of requests that go to the $i$-th domain name. The larger the entropy, the more dispersed the requests; the smaller the entropy, the more concentrated they are. A more dispersed distribution may indicate a higher likelihood that the host is compromised;
47) DNS packet entropy feature: this feature is extracted mainly to check whether the DNS packets contain encrypted data, because the character distribution of encrypted data is completely different from that of normal data, and malware sometimes embeds encrypted data, such as SSH, in DNS request packets. The calculation is similar to the request frequency entropy above, but the distribution is that of individual characters: all payload data in the request packet are mapped onto the 256 byte values 0 to 255, the proportion of each value is counted, and the entropy is then calculated.
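A few of the features listed above can be sketched as follows: the spelling ratios of item 41), the TTL statistics of item 42), and the Min-Max normalization applied to the resulting columns. This is an illustrative sketch under stated assumptions, not the patent's code: the function names are hypothetical, the standard deviation is the population form, and a constant column is mapped to 0.0 by choice.

```python
import math
import string

VOWELS = set("aeiou")
LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)


def spelling_features(label: str) -> dict:
    """Character-class ratios for one domain label such as 'baidu'."""
    label = label.lower()
    n = len(label)
    if n == 0:
        raise ValueError("empty label")
    vowels = sum(c in VOWELS for c in label)
    letters = sum(c in LETTERS for c in label)
    digits = sum(c in DIGITS for c in label)
    return {
        "vowel_ratio": vowels / n,
        "consonant_ratio": (letters - vowels) / n,
        "letter_ratio": letters / n,
        "digit_ratio": digits / n,
    }


def ttl_features(ttls) -> dict:
    """Mean, population standard deviation and distinct-value count."""
    n = len(ttls)
    mean = sum(ttls) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in ttls) / n)
    return {"ttl_mean": mean, "ttl_std": std, "ttl_distinct": len(set(ttls))}


def min_max_normalize(rows):
    """Scale every column of equal-length feature rows into [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant column carries no information; map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

For the label `baidu` the vowel ratio comes out as 3/5 and the consonant ratio as 2/5, matching the worked example in item 41). The dictionary-word ratio and the Markov transition features need external word lists or training data and are left out of the sketch.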
As an improvement of the present invention, step 5), building the training dataset, is as follows: from the data collected in the training environment, vector data are generated according to the extracted information of each dimension, the positive and negative data are labelled, and a support vector machine is used to partition the data, producing the basis for screening actual operating data. The explanation is as follows:
■ Definition: suppose there is an object set $X$ of size $N$ whose elements are vectors in the $p$-dimensional real space, i.e. $x \in R^p$, and each object has a class label $u_i \in \{-1, 1\}$. A linearly separable support vector machine is a classifier whose output classes are expressed as $Y_1 = (x, w^T x + b)$ and $Y_2 = (x, -w^T x - b)$; if $x_k \in Y_1$ then $w^T x_k + b \geq 1$, and if $x_k \in Y_2$ then $w^T x_k + b \leq -1$.
■ Support vectors: from the point-to-plane distance formula, the distance from a sample to the class-separating plane is:
$$d = \frac{\left| w^T x + b \right|}{\lVert w \rVert}$$
The two hyperplanes above, $w^T x + b = 1$ and $w^T x + b = -1$, are parallel to each other and no sample point falls between them. The separating hyperplane lies midway between the two (at the same distance from each), and $w$ is the normal vector of the hyperplane; the samples that lie on the two boundary hyperplanes are the so-called support vectors. The solution is obtained by substituting the training data, and the libsvm tool can be used for the actual calculation. In practice, because data of more than ten dimensions have been extracted, a radial basis function (RBF) kernel, specifically a Gaussian kernel, is also used to raise the dimension in order to guarantee the detection result:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$$
where $\gamma > 0$ sets the width of the kernel. Its meaning is simply to compute the distance between one sample point and the other sample points and convert it into a floating-point number between 0 and 1. If there are $m$ sample points, this method raises the sample points from $l$ dimensions ($l$ is generally much smaller than $m$) to $m$ dimensions.
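The dimension-raising effect of the kernel can be illustrated with a small sketch. It assumes the common Gaussian form exp(-gamma * squared distance), which is the RBF parameterization libsvm uses; the value `gamma=0.5` and the function names are placeholders chosen for the example, not values from the source.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2), a value in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_features(samples, gamma=0.5):
    """Map each of the m samples to its m kernel values against all samples,
    turning l-dimensional points into m-dimensional ones."""
    return [[rbf(s, t, gamma) for t in samples] for s in samples]
```

With three 2-dimensional samples the result is a 3 by 3 symmetric matrix with ones on the diagonal, since every point is at distance zero from itself. In real use a library such as libsvm evaluates the kernel internally rather than materializing this matrix by hand.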
As an improvement of the present invention, step 6), examining DNS requests in practice, is as follows: the obtained support vector parameters are stored in a file. For real data, features are extracted according to the feature extraction method of step 4) and then checked using the support vectors, to determine whether the host shows signs of compromise.
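A hedged sketch of this detection step is shown below. It assumes the trained parameters were exported to a JSON file with the hypothetical keys `support_vectors`, `coefficients` (each the product of a multiplier and its label), `gamma` and `bias`; libsvm's own model file uses a different layout. It also assumes that a negative decision value corresponds to the negative (suspicious) class.

```python
import json
import math


def rbf(x, y, gamma):
    """Gaussian kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def load_model(path):
    """Read stored parameters from a JSON file (hypothetical layout)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def decision_value(model, x):
    """Kernel SVM decision function: sum_i coef_i * K(sv_i, x) + bias."""
    total = model["bias"]
    for sv, coef in zip(model["support_vectors"], model["coefficients"]):
        total += coef * rbf(sv, x, model["gamma"])
    return total


def is_suspicious(model, x):
    """Treat vectors on the negative side of the boundary as suspicious."""
    return decision_value(model, x) < 0
```

The feature vector `x` must be built and normalized exactly as during training, otherwise the stored support vectors and the new data are not comparable.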
Compared with the prior art, the present invention has the following advantages. 1) It can effectively detect, from several aspects, anomalies in the DNS communication of a compromised host, rather than relying on the single aspect of domain name spelling. This effectively reduces false negatives and false positives for compromised hosts, improves the credibility of compromised host detection, and brings greater value to users. 2) The scheme extracts the following features of the DNS protocol:
Domain name spelling features: this part is not identical to a dynamically generated domain name detection algorithm. It mainly collects statistics on the requested domain name such as the vowel ratio, consonant ratio, digit ratio, character ratio and dictionary word ratio;
TTL statistical features: TTL is the abbreviation of Time To Live and characterizes how often a domain name record is refreshed. Because harmful software generally needs to change domain names frequently within a short time to evade detection software, its TTL values are generally short, much smaller than the time to live of ordinary domain names. The mean and variance of the TTL and the number of distinct TTL values sampled (for example 3600, 86400) are extracted;
DNS return address features: features of the DNS response packet are extracted, including the number of addresses returned, the number of countries the addresses map to, and the number of domain names found by reverse lookup of the IP address;
Domain name request failure ratio: because DGA domain names are randomly generated, only a small fraction of them resolve successfully to an IP address and the vast majority fail, so this feature also indicates whether a host may be compromised;
Requested domain name ranking feature: by establishing a domain name ranking list, the domain names requested by a host can be compared against it. A domain name outside the ranking, for example not within the top 10,000,000, may be considered problematic;
Request frequency features: the frequency with which various domain names are requested per unit of time (generally 1 hour). A higher frequency means a greater likelihood of compromise;
DNS packet entropy feature: the Shannon entropy of the packets is calculated. The entropy is extracted because the DNS request packets of some harmful software carry encrypted data.
The extracted features are organized into vectors and partitioned with a two-class classification method (the present invention mainly uses a support vector machine, abbreviated SVM; other classification algorithms such as logistic regression can of course also be used, but in a high-dimensional space a support vector machine gives more acceptable results). Data assigned to the negative class are considered possibly problematic. The detailed embodiment is described below. The scheme establishes a complete multi-feature extraction method for DNS network communication data and comprehensively examines the domain name requests and responses of hosts that may be compromised, rather than examining only the spelling of the domain name.
Specific embodiment:
To deepen understanding of the present invention, it is described in further detail below with reference to an embodiment.
The definitions used in this scheme are as follows:
DNS: the Domain Name System, which assigns an easily remembered name to an IP address that is hard to remember. Names generally consist of English letters, digits, '-' and '.';
DGA domain name: a domain name produced by a dynamic generation algorithm, generally some pseudo-random algorithm. It is widely used for communication between Trojans and their C&C control servers;
Compromised host: a host infected by harmful software such as Trojans or miners, which can generally be remotely controlled by an attacker.
Embodiment 1: a compromised host detection method based on multiple DNS features, the method comprising the following steps:
Step 1) unpacking DNS packets and assembling sessions;
Step 2) classifying domain names;
Step 3) establishing a name server list;
Step 4) extracting data features;
Step 5) building a training dataset;
Step 6) examining DNS requests in practice.
Step 1), DNS packet unpacking and session processing, is as follows: DNS packets are unpacked according to the relevant RFCs. Unpacking covers the data link layer, network layer, transport layer (mainly UDP) and part of the application layer. At the application layer, the requested domain name and, from the response packet, the returned IP addresses, TTL values and other information are parsed out for subsequent analysis. In addition, the system associates each request packet with its response packet according to the transaction ID in the DNS protocol, forming complete DNS session data. Failed requests (mainly those for which the name server returns a "no such domain name" message) are marked.
Step 2), domain name classification, is as follows: all domain name requests are classified according to a pre-established domain name whitelist. Requests that match the whitelist are assigned to the positive data, and requests for domain names not ranked on the whitelist are provisionally assigned to the negative data (although not all of them are necessarily problematic).
Step 3), establishing the name server list, is as follows: the purpose of the name server list is to filter out name servers, that is, to distinguish ordinary hosts from name servers. Domain name resolution is hierarchical, and units such as large enterprises or schools have multiple levels of name servers. These name servers should not take part in the calculation and are therefore excluded first.
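As a minimal illustration of this exclusion, the sketch below drops every session whose requesting host appears in the name server list. The session dictionaries with a `client` key and the function name are assumptions for the example; how the list itself is compiled is not specified here.

```python
def exclude_name_servers(sessions, name_servers):
    """Keep only sessions issued by ordinary hosts, not by DNS servers."""
    servers = set(name_servers)
    return [s for s in sessions if s["client"] not in servers]
```

Filtering at this point keeps forwarded queries from internal resolvers out of the per-host statistics computed in step 4).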
Step 4), data feature extraction, extracts the following features and forms the corresponding feature vector:
41) Domain name spelling features: these are extracted by analysing the spelling of the domain name. They mainly include the vowel ratio: for baidu, for example, the vowel ratio is 60% because its spelling contains three vowels, and the corresponding consonant ratio is 40%. The letter ratio of baidu is 100%, whereas for a domain name such as 163 (NetEase) the letter ratio is 0% and the digit ratio is 100%. In addition, the proportion of dictionary words in the domain name can be counted, which is also a very effective way to distinguish normal from abnormal domain names. It is also possible to learn from known whitelisted domain names and extract two-letter or three-letter ratio features through a Markov transition probability matrix, but this is only one of the features the present invention includes;
42) TTL statistical features: using the TTL values parsed earlier from the DNS packets, statistical features of the TTL values within a unit of time (the present invention uses a 1-hour time window) are analysed. To obtain positive and negative data, the TTL statistics, namely the TTL mean and the TTL standard deviation, are calculated separately for the DNS responses of domain names on the whitelist and of domain names not on the whitelist (the TTL is carried in the DNS response packet):
$$AVG_{TTL} = \frac{1}{N}\sum_{i=1}^{N} TTL_{ip,i}, \qquad STD_{TTL} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(TTL_{ip,i} - AVG_{TTL}\right)^{2}}$$
In the above formulas, $TTL_{ip}$ is the time to live of each IP returned in the DNS response packets, $AVG_{TTL}$ is the average time to live of the IPs, and $N$ is the number of samples. In addition, the present invention also obtains the number of distinct TTL values (i.e. how many different TTL values occur) and the TTL value ranges for whitelisted and non-whitelisted domains. Finally, the Min-Max method is used to normalize them (constraining their values to between 0 and 1; the same applies below) to obtain the corresponding part of the vector;
43) DNS return address features: features are extracted from the addresses returned in DNS response packets. This part is relatively simple. The average number of returned addresses is calculated (a large legitimate portal site may return more than ten IP addresses, whereas a C&C server generally has only one public IP address over a period of time). The returned IP addresses are mapped to country information using an IP geolocation database (such as IP-GEO), and the country distribution is calculated. Using the returned IP address together with historical information, the number of domain names the IP address has belonged to in the past is queried, i.e. a reverse IP-to-domain lookup. A malicious address may have been associated with multiple public IP addresses in the past, which is also an important feature for identifying malicious call-back addresses;
44) domain name request failure ratio characteristic extracts: passing through relevant solution in the dns response data packet of name server Analysis failure or successful result, obtain related resolution failure ratio, and the parsing to fail in these data packets can generally return to " No The mark such as such domain name ", and be corresponding to it without any corresponding IP address;
45) Domain name request ranking feature extraction: this feature is simple to extract. It is assigned 1 if the domain name is on the domain name ranking list and 0 otherwise. It is also the basis for obtaining the initial training data. Domain name ranking lists can generally be obtained from free or paid sources;
46) Request frequency feature extraction: within the unit time (the present invention uses a 1-hour time window), the domain name request frequency of the server or terminal concerned is counted. In addition, the requested domain names are grouped and the entropy of the request distribution is calculated with the following formula:
In the above formula, N is the number of distinct domain names requested and p_i is the share of requests that go to the i-th distinct domain name. A larger entropy means the requests are spread more evenly over many domain names, while a smaller entropy means they are concentrated on a few; the more dispersed case may indicate a greater likelihood that the host concerned has been compromised;
47) DNS packet entropy feature extraction: this feature is extracted mainly to check whether the DNS packets contain encrypted data, because the character distribution of encrypted data differs markedly from that of ordinary data, and malware sometimes embeds encrypted data, such as SSH, in DNS request packets. The calculation is similar to the request frequency feature above, except that the distribution is that of individual characters: all payload bytes in the request packet are mapped to the values 0 to 255, the proportion of each value is counted, and the entropy is then calculated.
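Both entropy features (the request-domain distribution in 46 and the payload byte distribution in 47) can be computed with one Shannon entropy routine. The sketch below assumes a base-2 logarithm, which the text does not specify; the function name and sample inputs are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items.

    Works for a list of requested domain names as well as for a bytes
    object, whose elements are integers in the range 0 to 255.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# All requests go to one domain: fully concentrated, zero entropy.
concentrated = shannon_entropy(["a.example"] * 4)
# Requests spread evenly over four domains: log2(4) = 2 bits.
dispersed = shannon_entropy(["a.example", "b.example", "c.example", "d.example"])
# A payload using every byte value once reaches the 8-bit maximum,
# which is what well-encrypted data tends to approach.
uniform_bytes = shannon_entropy(bytes(range(256)))
```

Ordinary text payloads usually score well below 8 bits per byte, so a value close to the maximum is one hint that the packet carries encrypted or compressed content.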
Step 5), building the training data set, is specifically as follows: from the data collected in the training environment, vector data are generated according to the extracted information of each dimension, and the positive and negative data are labelled. A support vector machine is then used to partition the data, producing a model for screening actual operating data. This is explained as follows:
■ Definition: suppose there is an object set X of size N whose elements are vectors in p-dimensional real space, i.e. $x \in \mathbb{R}^p$, and each object has a class label $u_i \in \{-1, 1\}$. A linearly separable support vector machine is a classifier whose output classes are expressed as $Y_1 = (x, w^T x + b)$ and $Y_2 = (x, -w^T x - b)$: if $x_k \in Y_1$ then $w^T x + b \ge 1$, and if $x_k \in Y_2$ then $w^T x + b \le -1$.
■ Support vector: using the point-to-plane distance formula, the distance from a sample to the planes that separate the two classes is:
The two hyperplanes above are parallel to each other, and no sample point falls between them. The separating hyperplane lies exactly midway between the two planes (equidistant from both), and w is the normal vector of the hyperplane, which is what is referred to here as the support vector; its value is obtained by substituting the data and solving. The libsvm tool can be used for the actual computation. In practice, because features of more than ten dimensions have been extracted, a radial basis function (RBF) kernel (a Gaussian kernel in actual use) is also used to raise the dimension in order to ensure the detection results. The formula is:
Its meaning is simply to compute the distance between a given sample point and the other sample points and convert it to a floating-point number between 0 and 1. With m sample points, this method raises each sample point from l dimensions (l is generally much smaller than m) to m dimensions.
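The dimension-raising idea can be illustrated with a Gaussian kernel written in the common form exp(-gamma * ||x - y||^2). This parameterisation and the value of `gamma` are assumptions, since the patent's own formula is not reproduced in this text; the function names are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors.

    Returns a value in (0, 1]; identical points give exactly 1.0.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def lift(samples, gamma=0.5):
    """Map each of the m samples to an m-dimensional vector of its
    kernel similarities to every sample (the m x m kernel matrix)."""
    return [[rbf(x, y, gamma) for y in samples] for x in samples]


# Three hypothetical 2-dimensional samples become three 3-dimensional rows.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
lifted = lift(samples)
```

In practice a library such as libsvm evaluates the kernel internally, so this explicit matrix is only meant to show where the "l dimensions to m dimensions" statement comes from.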
Step 6), examining DNS requests in practice, is specifically as follows: the obtained support vector parameters are saved in a file; for real data, features are extracted using the feature extraction methods of step 4), and the support vectors are then used for detection to determine whether the host concerned shows signs of compromise.
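The detection step can be sketched as loading saved parameters and evaluating the usual kernel SVM decision function, the sign of sum_i c_i K(s_i, x) + b. The JSON layout, field names and toy model below are hypothetical (they are not libsvm's model file format), and the convention that +1 means whitelist-like and -1 means suspected compromise is an assumption based on the whitelist being the positive class.

```python
import json
import math


def rbf(x, y, gamma):
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decide(model, x):
    """Return +1 (whitelist-like) or -1 (suspected compromise)."""
    score = model["bias"] + sum(
        coef * rbf(sv, x, model["gamma"])
        for coef, sv in zip(model["coefficients"], model["support_vectors"])
    )
    return 1 if score >= 0 else -1


# Toy parameters standing in for a trained model saved to a file.
saved = json.dumps({
    "gamma": 1.0,
    "bias": 0.0,
    "support_vectors": [[0.0, 0.0], [1.0, 1.0]],
    "coefficients": [1.0, -1.0],
})

model = json.loads(saved)
benign_like = decide(model, [0.1, 0.1])   # near the positive support vector
suspect_like = decide(model, [0.9, 0.9])  # near the negative support vector
```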
It should be noted that the above embodiment is not intended to limit the scope of protection of the present invention; equivalent changes or substitutions made on the basis of the above technical solution all fall within the scope protected by the claims of the present invention.

Claims (7)

1. A compromised-host detection method based on multiple DNS features, characterized in that the method comprises the following steps:
Step 1) DNS packet unpacking and session processing;
Step 2) domain name classification;
Step 3) establishing a name server list;
Step 4) data feature extraction;
Step 5) building the training data set;
Step 6) examining DNS requests in practice.
2. The compromised-host detection method based on multiple DNS features according to claim 1, characterized in that step 1), DNS packet unpacking and session processing, is specifically as follows: DNS packets are unpacked according to the relevant RFCs; unpacking covers the data link layer, network layer, transport layer and application layer, and at the application layer information such as the requested domain name and the returned IP addresses and TTL values in the response packet is parsed for use in subsequent analysis; the system associates request packets with response packets according to the transaction ID in the DNS protocol to form complete DNS session data; failed requests are marked.
3. The compromised-host detection method based on multiple DNS features according to claim 2, characterized in that step 2), domain name classification, is specifically as follows: all domain name requests are classified according to a pre-established domain name whitelist; requests that match the whitelist are classified as positive data, and those not ranked on the whitelist are provisionally classified as negative data.
4. The compromised-host detection method based on multiple DNS features according to claim 3, characterized in that step 3), establishing a name server list, is specifically as follows: the purpose of the name server list is to filter out name servers, distinguishing ordinary hosts from name servers.
5. The compromised-host detection method based on multiple DNS features according to claim 3, characterized in that step 4), data feature extraction, extracts the various features to form the corresponding feature vector, specifically as follows:
41) Domain name spelling feature extraction: features are extracted by analysing the spelling of the domain name, mainly statistics such as vowel ratio, consonant ratio, digit ratio, character ratio and dictionary-word ratio;
42) TTL statistical feature extraction: building on the earlier parsing of TTL values from DNS packets, statistical features of the TTL values within a unit of time are analysed; to obtain positive and negative data, the TTL statistical features, i.e. the TTL mean and the TTL standard deviation, are calculated separately for the DNS response data of domain names on the whitelist and of domain names not on the whitelist:
In the above formula, TTL_ip is the time-to-live of each IP returned in the DNS response packets, AVG_TTL is the average time-to-live of the IPs, and N is the number of samples;
43) DNS return address feature extraction: features are extracted from the addresses returned in DNS response packets, including the number of returned addresses, the number of countries the addresses map to, and the number of domain names found by reverse lookup of the IP address;
44) Domain name request failure ratio feature extraction: the resolution failure ratio is obtained from the failed and successful resolution results in the name server's DNS response packets; a failed resolution generally returns an indication such as "No such domain name" and has no corresponding IP address;
45) Domain name request ranking feature extraction: a domain name request ranking list is established and the domain names requested by the host are compared against it; a domain name that is not within the ranking may be considered problematic;
46) Request frequency feature extraction: the domain name request frequency of the server or terminal concerned is counted within the unit time; in addition, the requested domain names are grouped and the entropy of the request distribution is calculated with the following formula:
In the above formula, Entropy_request_domain is the request-domain entropy, N is the number of distinct domain names requested, and p_i is the share of requests that go to the i-th distinct domain name; a larger entropy means the requests are spread more evenly over many domain names, while a smaller entropy means they are concentrated on a few, and the more dispersed case may indicate a greater likelihood that the host concerned has been compromised;
47) DNS packet entropy feature extraction: the entropy of the relevant packets is calculated using the Shannon entropy method (as above) to obtain their entropy feature; the entropy is extracted because the DNS request packets of some malicious software carry encrypted data.
6. The compromised-host detection method based on multiple DNS features according to claim 2, characterized in that step 5), building the training data set, is specifically as follows: from the data collected in the training environment, vector data are generated according to the extracted information of each dimension, and the positive and negative data are labelled; a support vector machine is used to partition the data, producing a model for screening actual operating data, explained as follows:
■ Definition: suppose there is an object set X of size N whose elements are vectors in p-dimensional real space, i.e. $x \in \mathbb{R}^p$, and each object has a class label $u_i \in \{-1, 1\}$. A linearly separable support vector machine is a classifier whose output classes are expressed as $Y_1 = (x, w^T x + b)$ and $Y_2 = (x, -w^T x - b)$: if $x_k \in Y_1$ then $w^T x + b \ge 1$, and if $x_k \in Y_2$ then $w^T x + b \le -1$.
■ Support vector: using the point-to-plane distance formula, the distance from a sample to the planes that separate the two classes is:
The two hyperplanes above are parallel to each other, and no sample point falls between them. The separating hyperplane lies exactly midway between the two planes (equidistant from both), and w is the normal vector of the hyperplane, which is what is referred to here as the support vector; its value is obtained by substituting the data and solving. The libsvm tool can be used for the actual computation. In practice, because features of more than ten dimensions have been extracted, a radial basis function (RBF) kernel (a Gaussian kernel in actual use) is also used to raise the dimension in order to ensure the detection results. The formula is:
Its meaning is simply to compute the distance between a given sample point and the other sample points and convert it to a floating-point number between 0 and 1. With m sample points, this method raises each sample point from l dimensions (l is generally much smaller than m) to m dimensions.
7. The compromised-host detection method based on multiple DNS features according to claim 2, characterized in that step 6), examining DNS requests in practice, is specifically as follows: the obtained support vector parameters are saved in a file; for real data, features are extracted using the feature extraction methods of step 4), and the support vectors are then used for detection to determine whether the host concerned shows signs of compromise.
CN201910413662.4A 2019-05-17 2019-05-17 A kind of Host Detection method of falling based on DNS multiple features Pending CN110535821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910413662.4A CN110535821A (en) 2019-05-17 2019-05-17 A kind of Host Detection method of falling based on DNS multiple features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910413662.4A CN110535821A (en) 2019-05-17 2019-05-17 A kind of Host Detection method of falling based on DNS multiple features

Publications (1)

Publication Number Publication Date
CN110535821A true CN110535821A (en) 2019-12-03

Family

ID=68659219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910413662.4A Pending CN110535821A (en) 2019-05-17 2019-05-17 A kind of Host Detection method of falling based on DNS multiple features

Country Status (1)

Country Link
CN (1) CN110535821A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113315739A (en) * 2020-02-26 2021-08-27 深信服科技股份有限公司 Malicious domain name detection method and system
WO2021169730A1 (en) * 2020-02-25 2021-09-02 深信服科技股份有限公司 Method and device for data processing, and storage medium
CN113704749A (en) * 2020-05-20 2021-11-26 中国移动通信集团浙江有限公司 Malicious excavation detection processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103001825A (en) * 2012-11-15 2013-03-27 中国科学院计算机网络信息中心 Method and system for detecting DNS (domain name system) traffic abnormality
CN106060067A (en) * 2016-06-29 2016-10-26 上海交通大学 Passive DNS iterative clustering-based malicious domain name detection method
CN107786575A (en) * 2017-11-11 2018-03-09 北京信息科技大学 A kind of adaptive malice domain name detection method based on DNS flows
CN108462675A (en) * 2017-02-20 2018-08-28 沪江教育科技(上海)股份有限公司 A kind of network accesses recognition methods and system
CN108737439A (en) * 2018-06-04 2018-11-02 上海交通大学 A kind of large-scale malicious domain name detecting system and method based on self feed back study
CN109450842A (en) * 2018-09-06 2019-03-08 南京聚铭网络科技有限公司 A kind of network malicious act recognition methods neural network based

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103001825A (en) * 2012-11-15 2013-03-27 中国科学院计算机网络信息中心 Method and system for detecting DNS (domain name system) traffic abnormality
CN106060067A (en) * 2016-06-29 2016-10-26 上海交通大学 Passive DNS iterative clustering-based malicious domain name detection method
CN108462675A (en) * 2017-02-20 2018-08-28 沪江教育科技(上海)股份有限公司 A kind of network accesses recognition methods and system
CN107786575A (en) * 2017-11-11 2018-03-09 北京信息科技大学 A kind of adaptive malice domain name detection method based on DNS flows
CN108737439A (en) * 2018-06-04 2018-11-02 上海交通大学 A kind of large-scale malicious domain name detecting system and method based on self feed back study
CN109450842A (en) * 2018-09-06 2019-03-08 南京聚铭网络科技有限公司 A kind of network malicious act recognition methods neural network based

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
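林成虎; 李晓东; 金键; 尉迟学彪; 吴军: "DNS traffic anomaly detection based on the W_Kmeans algorithm" (基于W_Kmeans算法的DNS流量异常检测), Computer Engineering and Design (《计算机工程与设计》) *
殷聪贤: "Research and implementation of malicious domain name detection technology based on big data analysis" (基于大数据分析的恶意域名检测技术研究与实现), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
王震: "Research on SVM-based DGA domain name detection methods" (基于SVM的DGA域名检测方法研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *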

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169730A1 (en) * 2020-02-25 2021-09-02 深信服科技股份有限公司 Method and device for data processing, and storage medium
CN113381962A (en) * 2020-02-25 2021-09-10 深信服科技股份有限公司 Data processing method, device and storage medium
CN113381962B (en) * 2020-02-25 2023-02-03 深信服科技股份有限公司 Data processing method, device and storage medium
CN113315739A (en) * 2020-02-26 2021-08-27 深信服科技股份有限公司 Malicious domain name detection method and system
CN113704749A (en) * 2020-05-20 2021-11-26 中国移动通信集团浙江有限公司 Malicious excavation detection processing method and device
CN113704749B (en) * 2020-05-20 2024-03-19 中国移动通信集团浙江有限公司 Malicious mining detection processing method and device

Similar Documents

Publication Publication Date Title
Bhavsar et al. Intrusion detection system using data mining technique: Support vector machine
US10178107B2 (en) Detection of malicious domains using recurring patterns in domain names
CN112953933A (en) Abnormal attack behavior detection method, device, equipment and storage medium
CN110535821A (en) A kind of Host Detection method of falling based on DNS multiple features
CN109922065B (en) Quick identification method for malicious website
WO2010126733A1 (en) Systems and methods for sensitive data remediation
Krishnaveni et al. Ensemble approach for network threat detection and classification on cloud computing
US20200153865A1 (en) Sensor based rules for responding to malicious activity
Therdphapiyanak et al. Applying Hadoop for log analysis toward distributed IDS
US10462170B1 (en) Systems and methods for log and snort synchronized threat detection
CN107895122A (en) A kind of special sensitive information active defense method, apparatus and system
CN116860489A (en) System and method for threat risk scoring of security threats
CN107231383B (en) CC attack detection method and device
CN110855716B (en) Self-adaptive security threat analysis method and system for counterfeit domain names
CN106790025B (en) Method and device for detecting link maliciousness
CN117478433B (en) Network and information security dynamic early warning system
CN113746952B (en) DGA domain name detection method and device, electronic equipment and computer storage medium
Harbola et al. Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set
Ruiling et al. A dns-based data exfiltration traffic detection method for unknown samples
Gautam et al. Anomaly detection system using entropy based technique
CN117391214A (en) Model training method and device and related equipment
Anand et al. Enchanced multiclass intrusion detection using supervised learning methods
CN114925365A (en) File processing method and device, electronic equipment and storage medium
Vyas et al. Intrusion detection systems: a modern investigation
Patel et al. Hybrid relabeled model for network intrusion detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191203