CN102685145A - Domain name server (DNS) data packet-based bot-net domain name discovery method - Google Patents
- Publication number
- CN102685145A, CN2012101683406A, CN201210168340A
- Authority
- CN
- China
- Prior art keywords
- domain name
- botnet
- host
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a botnet domain name discovery method based on DNS data packets. Working at the network layer with DNS data packets as the basic data source, the method uses a domain name co-occurrence score to track and discover further botnet domain names when some of a botnet's domain names are already known, exploiting two key features of botnets: group behavior and persistence. Starting from known local features of a botnet (its known domain names), the method discovers the unknown domain names that the botnet is updated or changed to over time, so that dynamic changes in the access behavior of a given botnet can be discovered, understood and tracked, which overcomes the shortcomings of conventional botnet detection methods. Because domain names are used as the features, the method avoids the limitations that botnet protocol diversity, information encryption and the like impose on signature-based detection; because the co-occurrence behavior of domain names is the object of observation, unknown botnet domain names can be discovered by making full use of the botnet's group behavior and persistence.
Description
Technical Field
The invention relates to the field of computer network security, and in particular to a botnet domain name discovery method based on DNS (Domain Name System) data packets.
Background
A botnet is a group of zombie hosts (zombies) that are infected by bot programs (bots) and stand in a command-and-control relationship. The zombie hosts are distributed across homes, enterprises, government organizations and other settings. They receive instructions from a controller (botmaster) and carry out network attacks such as DDoS (distributed denial of service), information theft, phishing, spam, advertisement click fraud and illegal voting. As a means of large-scale group attack, botnets pose a serious security threat to the civil internet, industrial production control systems, military networks and the like. One-to-many command and control (C&C) is the fundamental characteristic that distinguishes botnets from traditional attack technologies such as viruses, trojans and backdoors; botnets are typically large in scale, organized, highly controllable, highly concealed and able to lie dormant for long periods.
At present, the traditional way to detect a botnet is to discover controlled zombie hosts by using signatures (feature codes). Non-signature methods mainly include: collecting and classifying network features; scoring threats and the correlation among hosts; mining malicious domain names through lexical and semantic analysis of domain names; and detecting botnets through the Fast-Flux phenomenon of IP addresses and domain names. These methods face the following problems:
1) Because botnets lie dormant for long periods, the interaction between the controller and the zombie hosts is a dynamic command-and-control process, so known feature information is updated quickly. Signature-based detection cannot keep pace with these updates, and its failure rate rises over time.
2) Network-based detection methods place high demands on data sources and are difficult to apply to large networks because of their computational complexity.
3) Detection based on lexical semantic analysis and detection based on the Fast-Flux phenomenon are each narrowly applicable and cannot effectively detect the wide variety of botnets.
Disclosure of Invention
The invention aims to provide a botnet domain name discovery method based on DNS data packets. Starting from known local features of a botnet (its known domain names), the method discovers the unknown domain names that the botnet is updated or changed to over time, and discovers, understands and tracks dynamic changes in the access behavior of a given botnet, thereby overcoming the shortcomings of existing botnet detection methods. Because the method uses domain names as the features, it avoids the limitations that botnet protocol diversity, information encryption and the like impose on signature-based detection; because it observes the co-occurrence behavior of domain names, it can discover unknown botnet domain names by making full use of the botnet's group behavior and persistence. Experiments show that the method can effectively and reliably discover unknown botnet domain names in networks on the scale of tens of thousands of hosts.
To achieve this purpose, the invention adopts the following technical solution:
A botnet domain name discovery method based on DNS data packets comprises the following steps.
Data preprocessing:
Step 1.1: Taking the egress traffic of a given network as the data source, parse DNS query data from the data packets and extract a set of quadruples $r = (t, h, p, d)$ containing DNS query feature information, where $t$ is the time the request was initiated, $h$ is the host that initiated the request, $p$ is the requested resource record type, and $d$ is the requested domain name;
Step 1.2: Reduce the set of quadruples $r = (t, h, p, d)$ by filtering against a domain name whitelist, removing from the set every quadruple whose domain name appears in the whitelist;
Step 1.3: Identify NAT (Network Address Translation) hosts and filter out the NAT hosts' access records for domain names in the NAT network, removing these records from the quadruple set that remains after the whitelist filtering of step 1.2; this yields the reduced quadruple set;
Step 1.4: On the reduced quadruple set obtained in step 1.3, compute statistics per time window with the domain name as the subject: count the number of times each host queries each domain name in each time window, and define the result as the quadruple $s = (T_i, h, d, n_{all})$, where $T_i$ denotes the time range from $t_i$ to $t_{i+1}$ with $t_{i+1} = t_i + T$, $h$ is the host that initiated the request, $d$ is the requested domain name, $n_{all}$ is the number of times the host queries the domain name within the time window, and $T$ is the size of the time window;
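Steps 1.2 to 1.4 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `preprocess`, the use of seconds for $t$, the integer window index standing in for $T_i$, and the choice to drop every record of a host flagged as NAT are all hypothetical and are not specified by the patent text.

```python
from collections import Counter

def preprocess(records, whitelist, nat_hosts, window_size, t0=0):
    """Reduce quadruples r = (t, h, p, d) to s = (i, h, d, n_all).

    records     : iterable of (t, h, p, d), with t in seconds (assumption)
    whitelist   : set of domain names to drop (step 1.2)
    nat_hosts   : set of hosts already identified as NAT (step 1.3)
    window_size : T, the window length in seconds (step 1.4)
    The window index i stands in for T_i = [t0 + i*T, t0 + (i+1)*T).
    """
    counts = Counter()
    for t, h, p, d in records:
        # Steps 1.2 and 1.3: whitelist and NAT filtering.
        if d in whitelist or h in nat_hosts:
            continue
        # Step 1.4: bucket the query into its time window and count it.
        i = int((t - t0) // window_size)
        counts[(i, h, d)] += 1
    return [(i, h, d, n) for (i, h, d), n in sorted(counts.items())]
```

With a one-day window (`window_size=86400`), two queries from the same host to the same domain on the first day collapse into a single tuple with a count of 2.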
Preferably, the process of identifying NAT hosts in step 1.3 comprises the following steps:
Step a: Divide time into periods, and record the observed value $x_{ij}$ of the number of domain names accessed by each host $i$ in each time period $j$, where $j = 1, 2, 3, \dots, n$ and $n$ is a natural number;
Step b: Calculate the average number of domain names accessed by host $i$ over the $n$ time periods as $\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$;
Step c: Calculate the threshold $M_k$, where $M_k$ is the upper $k$ quantile of the random variable $X_{ij}$, i.e. $P\{X_{ij} > M_k\} = k$, $k \in (0,1)$, and the random variable $X_{ij}$ represents the number of domain name queries made by host $i$ in time period $j$; experiments show that $k = 0.05$ gives the best results;
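The NAT identification steps can be sketched in Python as follows. This is a rough illustration, not the patented procedure: the names `upper_quantile` and `identify_nat_hosts` are hypothetical, the threshold $M_k$ is estimated empirically from the pooled observations of all hosts (an assumption), and the final decision rule, flagging a host whose average exceeds $M_k$, is assumed here because the text above stops at the threshold.

```python
from statistics import mean

def upper_quantile(values, k):
    """Empirical upper-k quantile: roughly a fraction k of values lie above it."""
    ordered = sorted(values)
    # Position of the (1 - k) empirical quantile, clamped to a valid index.
    idx = min(len(ordered) - 1, max(0, int((1 - k) * len(ordered))))
    return ordered[idx]

def identify_nat_hosts(counts, k=0.05):
    """counts maps host i -> [x_i1, ..., x_in], domains accessed per period.

    Assumption: M_k is estimated from all observations pooled together,
    and a host is flagged when its average exceeds M_k.
    """
    pooled = [x for xs in counts.values() for x in xs]
    m_k = upper_quantile(pooled, k)
    return {h for h, xs in counts.items() if mean(xs) > m_k}
```

A host that sits behind a NAT gateway aggregates the queries of many machines, so its per-period domain count tends to fall far into the upper tail of the distribution.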
Preferably, the botnet domain name discovery method based on DNS data packets further comprises a domain name co-occurrence scoring step:
Step 2.1: For a given botnet, determine from the botnet's known domain name set the set of domain names to be tested among the quadruples $s = (T_i, h, d, n_{all})$: a host that sends a DNS query for any known domain name of the botnet is a zombie host, and the unknown domain names in the data accessed by all zombie hosts form the set of domain names to be tested;
Step 2.2: Divide time into windows and, for each time window $T_i$, calculate the co-occurrence score of each domain name in the set to be tested against the given botnet's domain names:
1) Calculate, for time window $T_i$, the similarity coefficient between the domain name to be tested and each domain name in the given botnet's domain name set (the formula is not preserved in this text), where $D(h, T_i)$ is the set of domain names accessed by host $h$ within time window $T_i$, $d_i$ is the domain name to be tested in the time window, and $d_j$ is a given domain name from the given botnet's domain name set in the time window;
2) Calculate, for time window $T_i$, the sum of the similarity coefficients between the domain name to be tested $d$ and all known domain names in the given botnet's domain name set $Z_b$: $\sum_{d_b \in Z_b} C(d_b, d, T_i)$;
3) Calculate, for time window $T_i$, the correction coefficient $W(d, T_i)$ of the domain name to be tested $d$. The correction coefficient is the number of zombie hosts that accessed the domain name divided by the number of all hosts that accessed it, i.e. $W(d, T_i) = \dfrac{|\{h \in H(d, T_i) : h \text{ is a zombie host}\}|}{|H(d, T_i)|}$, where $H(d, T_i)$ denotes the set of hosts that accessed domain name $d$ within time window $T_i$;
4) Calculate, for time window $T_i$, the co-occurrence score $S_b(d, T_i)$ of the domain name to be tested against the given botnet's domain name set: $S_b(d, T_i) = \left(\sum_{d_b \in Z_b} C(d_b, d, T_i)\right) \cdot W(d, T_i)$;
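The single-window score can be sketched in Python as below. The sketch takes the similarity coefficients $C(d_b, d, T_i)$ as already computed input, because their formula is not reproduced in this text; the function names and the dictionary layout are hypothetical.

```python
def correction_coefficient(hosts_of_d, zombie_hosts):
    """W(d, T_i): zombie hosts among the visitors of d, over all visitors of d."""
    if not hosts_of_d:
        return 0.0  # no host visited d in this window
    return len(hosts_of_d & zombie_hosts) / len(hosts_of_d)

def window_score(similarities, hosts_of_d, zombie_hosts):
    """S_b(d, T_i) = (sum over d_b in Z_b of C(d_b, d, T_i)) * W(d, T_i).

    similarities : dict mapping each known botnet domain d_b to the
                   precomputed coefficient C(d_b, d, T_i) (assumed input)
    hosts_of_d   : H(d, T_i), the hosts that accessed d in the window
    zombie_hosts : hosts known to have queried a known botnet domain
    """
    w = correction_coefficient(hosts_of_d, zombie_hosts)
    return sum(similarities.values()) * w
```

The correction coefficient damps the score of popular benign domains: a domain visited by many non-zombie hosts receives a small $W$ even if it co-occurs with known botnet domains.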
Step 2.3: Calculate the multi-time-window domain name co-occurrence score $S_b(d)$;
1) For $x$ consecutive time windows, calculate the average co-occurrence score of domain name $d$ with botnet $B_b$: $\bar{S}_b(d) = \left(S_b(d, T_1) + S_b(d, T_2) + \cdots + S_b(d, T_x)\right) / x$;
2) For $x$ consecutive time windows, calculate the maximum co-occurrence score of domain name $d$ with botnet $B_b$: $S_{bmax}(d) = \max_i S_b(d, T_i)$;
3) Normalize and calculate the multi-time-window domain name co-occurrence score $S_b(d)$ (the formula is not preserved in this text),
where $\max(\bar{S}_b(d_i))$ is the maximum of the average co-occurrence scores over all domain names to be tested for botnet $B_b$, $\max(S_{bmax}(d_i))$ is the maximum of the maximum co-occurrence scores over all domain names to be tested for botnet $B_b$, and $\alpha$ is a scale factor reflecting the proportion between the average and the maximum domain name co-occurrence scores in the network; experiments show that the results are most accurate when $\alpha = 0.8$.
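Sub-steps 1) and 2) of step 2.3 reduce the per-window scores of one domain to an average and a maximum, which a short Python sketch can show. The function name `multi_window_stats` is hypothetical, and the normalization of sub-step 3) is deliberately left out because its formula is not available in this text.

```python
def multi_window_stats(window_scores):
    """Return (average, maximum) of S_b(d, T_1..T_x) for one domain d.

    window_scores : list of per-window scores S_b(d, T_i), in window order.
    Only sub-steps 1) and 2) are covered; the final normalized score
    S_b(d) is not computed here.
    """
    x = len(window_scores)
    if x == 0:
        raise ValueError("need at least one time window")
    average = sum(window_scores) / x
    maximum = max(window_scores)
    return average, maximum
```

The average rewards domains that co-occur persistently across windows, while the maximum keeps a domain visible when it co-occurs strongly in only a few windows.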
Preferably, the botnet domain name discovery method based on DNS data packets further comprises a botnet domain name screening step:
Sort the domain name co-occurrence scores of all domain names to be tested for the given botnet and select the domain names with a score greater than 0.2; then judge the maliciousness of the selected domain names using the domain name maliciousness judgment rules, and recommend them.
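The sorting and thresholding part of the screening step is simple enough to sketch directly in Python. The function name `screen_candidates` and the dictionary input are hypothetical; only the threshold of 0.2 comes from the text.

```python
def screen_candidates(scores, threshold=0.2):
    """Rank candidate domains by co-occurrence score and keep those above the threshold.

    scores : dict mapping each candidate domain d to its multi-window score S_b(d)
    Returns (domain, score) pairs, highest score first, with score > threshold.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked if s > threshold]
```

The comparison is strict, so a domain scoring exactly 0.2 is not selected, which matches the wording "greater than 0.2".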
Preferably, a domain name is judged malicious if it satisfies any one or more of the following conditions:
(1) a security vendor has published the domain name as malicious, or a malicious URL exists under the domain name;
(2) the domain name shares a second-level domain with a known malicious domain name, and that second-level domain does not belong to a dynamic DNS provider;
(3) the domain name has the same prefix as a known malicious domain name;
(4) a search engine returns no information at all about the domain name, yet the domain name does exist and resolves to the same IP address as a known malicious domain name.
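Conditions (2) and (3) are purely lexical and can be approximated in Python; conditions (1) and (4) need external data (vendor feeds, search results, DNS resolution) and are not covered. The sketch below rests on loud assumptions: the second-level domain is taken naively as the last two labels (which is wrong for suffixes such as `co.uk`), "prefix" is interpreted as the leftmost label, and all function names are hypothetical.

```python
def second_level_domain(name):
    """Naive second-level domain: the last two labels (assumption)."""
    labels = name.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def first_label(name):
    """Leftmost label, used here as the 'prefix' of condition (3) (assumption)."""
    return name.lower().split(".")[0]

def matches_rule_2_or_3(candidate, known_malicious, dynamic_dns_providers):
    """True if the candidate satisfies condition (2) or condition (3)."""
    sld = second_level_domain(candidate)
    for bad in known_malicious:
        # Condition (2): same second-level domain, not a dynamic DNS provider.
        if sld == second_level_domain(bad) and sld not in dynamic_dns_providers:
            return True
        # Condition (3): same prefix as a known malicious domain name.
        if first_label(candidate) == first_label(bad):
            return True
    return False
```

A production version would likely use a public suffix list to find the registrable domain instead of counting labels.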
Preferably, the domain names in the whitelist of step 1.2 are one or more of: common domain names, misconfigured domain names, and domain names that programs query frequently.
Compared with the prior art, the invention has the following beneficial effects. The method needs no host-level data: it takes DNS data packets at the network layer as its basic data source and, given that some botnet domain names are known, uses domain name co-occurrence scoring to track and discover further botnet domain names by exploiting two key characteristics of botnets, group behavior and persistence. The method comprises three parts: data preprocessing, domain name co-occurrence score calculation, and botnet domain name screening. The data preprocessing part eliminates the interference of NAT hosts in the network. The co-occurrence score calculation part analyzes scores along both the spatial and the temporal dimension, so that the domain name co-occurrence behavior shown by a botnet is clearly distinguishable from that shown by normal applications. Finally, the domain name co-occurrence scores are sorted and the domain names most strongly correlated with the known botnet domain names are selected.
Drawings
Figure 1 is a schematic diagram of botnet co-occurrence behavior according to the method.
Figure 2 is a detailed flow diagram of botnet domain name discovery according to the method.
Detailed Description
For a clearer understanding of the method, it is described in further detail below through specific embodiments, with reference to the accompanying drawings.
Figure 1 is a schematic diagram of botnet co-occurrence behavior.
Botnets, whether centralized or distributed, and whether based on IRC or HTTP, share the following characteristics: (1) Group behavior in space. The zombie hosts are controlled by the same hacker or hacker organization, receive the same or coordinated attack commands, and follow the same network access pattern. (2) Persistence in time. Zombie hosts keep accessing the relevant target servers (control servers, update servers and so on) over time and stay in contact with the botnet controller throughout. Botnets typically use a number of different domain names in their command-and-control process, and zombie hosts keep accessing these specific domain names throughout their lifetime in order to stay connected, to receive the attacker's commands and control, and to preserve concealment and reliability. In a typical botnet control scenario, a zombie host accesses domain names in the following order: first, it accesses the command-and-control server's domain name to receive control commands; then it accesses related servers' domain names to carry out commands such as updating the bot program, downloading malicious code or uploading stolen information; finally, it accesses the victim server's domain name to launch network attacks. Because they are controlled by the same botnet controller, zombie hosts in the same botnet necessarily show the same or similar behavior when accessing server domain names.
In other words, the domain name accesses of a botnet show definite domain name co-occurrence behavior: the given domain name set consists of known botnet domain names, such as captured command-and-control domain names, while the co-occurring domain name set covers various unknown botnet domain names, such as related-server domain names and victim domain names. The method therefore scores and discovers unknown botnet domain names on the basis of the botnet's domain name co-occurrence behavior.
Figure 2 is a detailed flow chart of the discovery of botnet domain names using the present method.
The data source of the method is the traffic data of a given network: either a traffic mirror of the network egress or the inbound traffic of a regional network's DNS server may be used. Packets are parsed with WinPcap, and the quadruples containing DNS query feature information are extracted from the DNS query traffic and stored in a database as metadata. This step must run over a long period so that long-term data is collected. The whitelist filtering, NAT host filtering and data statistics steps all operate on this metadata.
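To make the extraction of $p$ and $d$ concrete, the following Python sketch decodes the first question of a DNS query from a raw UDP payload, following the standard DNS message layout (12-byte header, length-prefixed labels, then QTYPE and QCLASS). It is an illustration only: the function name `parse_dns_query` is hypothetical, the payload is assumed to have been extracted from the packet already, and $t$ and $h$ would come from the capture timestamp and the source IP address, which are outside this sketch.

```python
import struct

def parse_dns_query(payload):
    """Return (domain, qtype) for the first question of a DNS query, else None."""
    if len(payload) < 12:
        return None  # shorter than a DNS header
    _msg_id, flags, qdcount = struct.unpack("!HHH", payload[:6])
    if flags & 0x8000 or qdcount < 1:
        return None  # QR bit set (a response) or no question present
    labels, pos = [], 12
    while True:
        if pos >= len(payload):
            return None  # truncated name
        length = payload[pos]
        pos += 1
        if length == 0:
            break  # root label ends the name
        if length > 63 or pos + length > len(payload):
            return None  # compression pointer or truncated label
        labels.append(payload[pos:pos + length].decode("ascii", errors="replace"))
        pos += length
    if pos + 4 > len(payload):
        return None  # QTYPE/QCLASS missing
    qtype, _qclass = struct.unpack("!HH", payload[pos:pos + 4])
    return ".".join(labels).lower(), qtype
```

Lower-casing the name keeps later set comparisons consistent, since DNS names are case-insensitive.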
When data preprocessing is finished, the filtered domain name feature quadruples and the resulting statistical quadruples serve as the input to the domain name co-occurrence score calculation. First, the set of co-occurring domain names to be tested is selected according to the known domain names of the given botnet; any host that sends a DNS query for a known domain name of the given botnet is treated as a zombie host. Time is then divided into windows; the co-occurrence score of each domain name to be tested is calculated for each single time window following the steps in the Disclosure of Invention, and the multi-time-window co-occurrence score of each domain name to be tested is then calculated following those same steps.
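The selection of zombie hosts and of the domain names to be tested can be sketched in a few lines of Python. The function name `candidate_domains` and the tuple layout are hypothetical; the logic follows the description above: hosts that queried a known botnet domain are zombie hosts, and every other domain those hosts queried is a candidate.

```python
def candidate_domains(stats, known_botnet_domains):
    """Derive zombie hosts and the domain names to be tested.

    stats                : iterable of s = (T_i, h, d, n_all)
    known_botnet_domains : set of known domain names of the given botnet
    Returns (zombie_hosts, candidates).
    """
    stats = list(stats)  # allow two passes over a generator
    zombie_hosts = {h for _, h, d, _ in stats if d in known_botnet_domains}
    candidates = {
        d for _, h, d, _ in stats
        if h in zombie_hosts and d not in known_botnet_domains
    }
    return zombie_hosts, candidates
```

Domains queried only by hosts that never touched a known botnet domain are excluded up front, which keeps the scoring stage small.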
When the co-occurrence score calculation is finished, a list of co-occurrence scores for the domain names to be tested is obtained. The domain names are sorted by score, and those most strongly correlated with the known botnet domain names are selected. Domain names with a score greater than 0.2 are judged for maliciousness and recommended.
The method comprises three parts: data preprocessing, domain name co-occurrence score calculation, and botnet domain name screening. The steps of each part are as follows:
Data preprocessing part:
Step 1: Taking the egress traffic of the given network as the data source, parse DNS query data from the data packets and extract the set of quadruples $r = (t, h, p, d)$ containing DNS query feature information ($t$ is the time the request was initiated, $h$ is the host that initiated the request, $p$ is the requested resource record type, and $d$ is the requested domain name), which prepares the data for the subsequent steps.
Step 2: Reduce the set of quadruples $r = (t, h, p, d)$ by domain name whitelist filtering, removing from the set every quadruple whose domain name is in the whitelist. The whitelist mainly contains the following types of domain name: common domain names, misconfigured domain names, and domain names that programs query frequently.
Step 3: Identify NAT (Network Address Translation) hosts. Identification needs no hardware support and is based on the statistical characteristics of domain name accesses in the network. Filter out the NAT hosts' access records for domain names in the NAT network by removing them from the quadruple set obtained in Step 2, which yields the reduced quadruple set.
NAT hosts are identified as follows:
1) Divide time into periods, and record the observed value $x_{ij}$ of the number of domain names accessed by each host $i$ in each time period $j$ ($j = 1, 2, 3, \dots$).
2) Calculate the average number of domain names accessed by host $i$ over $n$ time periods as $\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$.
3) Calculate the threshold $M_k$, where $M_k$ is the upper $k$ quantile of the random variable $X_{ij}$, i.e. $P\{X_{ij} > M_k\} = k$, $k \in (0,1)$, and the random variable $X_{ij}$ represents the number of domain name queries made by host $i$ in time period $j$. Experiments show that $k = 0.05$ gives the best results.
4) If $\bar{x}_i > M_k$, host $i$ is considered to be a NAT host.
Step 4: On the reduced quadruple set obtained in Step 3, compute statistics per time window with the domain name as the subject: count the number of times each host queries each domain name in each time window (the time window is preferably one calendar day), and define the result as the quadruple $s = (T_i, h, d, n_{all})$, where $T_i$ denotes the time range from $t_i$ to $t_{i+1}$ with $t_{i+1} = t_i + T$, $h$ is the host that initiated the request, $d$ is the requested domain name, $n_{all}$ is the number of times the host queries the domain name within the time window, and $T$ is the size of the time window. This highlights the statistical characteristics of the domain names and reduces the data set without losing useful information.
Domain name co-occurrence score calculation part:
Step 1: For a given botnet, determine from the botnet's known domain name set the set of domain names to be tested among the quadruples $s = (T_i, h, d, n_{all})$. A host that sends a DNS query for any known domain name of the botnet is a zombie host, and the unknown domain names (that is, the domain names outside the given botnet's domain name set) in the quadruples $s = (T_i, h, d, n_{all})$ accessed by all zombie hosts form the set of domain names to be tested.
Step 2: Divide time into windows and, for each time window $T_i$, calculate the co-occurrence score of each domain name in the set to be tested against the given botnet's domain names:
1) Calculate, for time window $T_i$, the similarity coefficient between the domain name to be tested and each domain name in the given botnet's domain name set; the coefficient is a modified form of the Jaccard similarity coefficient. $D(h, T_i)$ is the set of domain names accessed by host $h$ within time window $T_i$, $d_i$ is the domain name to be tested in the time window, and $d_j$ is a given domain name from the given botnet's domain name set in the time window (the formula for the similarity coefficient is not preserved in this text).
2) Calculate, for time window $T_i$, the sum of the similarity coefficients between the domain name to be tested $d$ and all known domain names in the given botnet's domain name set $Z_b$: $\sum_{d_b \in Z_b} C(d_b, d, T_i)$.
3) Calculate, for time window $T_i$, the correction coefficient of the domain name to be tested $d$: the number of zombie hosts that accessed the domain name to be tested divided by the number of all hosts that accessed it, i.e. $W(d, T_i) = \dfrac{|\{h \in H(d, T_i) : h \text{ is a zombie host}\}|}{|H(d, T_i)|}$, where $H(d, T_i)$ denotes the set of hosts that accessed domain name $d$ within time window $T_i$.
4) Calculate, for time window $T_i$, the co-occurrence score $S_b(d, T_i)$ of the domain name to be tested against the given botnet's domain name set: $S_b(d, T_i) = \left(\sum_{d_b \in Z_b} C(d_b, d, T_i)\right) \cdot W(d, T_i)$.
Step 3: Calculate the multi-time-window domain name co-occurrence score $S_b(d)$.
1) For $x$ consecutive time windows, calculate the average co-occurrence score of domain name $d$ with botnet $B_b$: $\bar{S}_b(d) = \left(S_b(d, T_1) + S_b(d, T_2) + \cdots + S_b(d, T_x)\right) / x$.
2) For $x$ consecutive time windows, calculate the maximum co-occurrence score of domain name $d$ with botnet $B_b$: $S_{bmax}(d) = \max_i S_b(d, T_i)$.
3) Normalize and calculate the multi-time-window domain name co-occurrence score $S_b(d)$ (the formula is not preserved in this text),
where $\max(\bar{S}_b(d_i))$ is the maximum of the average co-occurrence scores over all domain names to be tested for botnet $B_b$, $\max(S_{bmax}(d_i))$ is the maximum of the maximum co-occurrence scores over all domain names to be tested for botnet $B_b$, and $\alpha$ is a scale factor reflecting the proportion between the average and the maximum domain name co-occurrence scores in the network. Experiments show that the results are most accurate when $\alpha = 0.8$.
Botnet domain name screening part:
Step 1: Sort the domain name co-occurrence scores of all domain names to be tested for the given botnet.
Step 2: Select the domain names to be tested with a score greater than 0.2 (a domain name to be tested with a score greater than 0.2 is strongly correlated with the known botnet domain names).
Step 3: Judge the maliciousness of the selected domain names. A domain name is judged malicious if it satisfies any one or more of the following conditions:
1) A security vendor has published the domain name as malicious, or a malicious URL exists under the domain name.
2) The domain name shares a second-level domain with a known malicious domain name, and that second-level domain does not belong to a dynamic DNS provider.
3) The domain name has the same prefix as a known malicious domain name.
4) A search engine returns no information at all about the domain name, yet the domain name does exist and resolves to the same IP address as a known malicious domain name.
Claims (6)
1. A botnet domain name discovery method based on DNS data packets, characterized by comprising the following data preprocessing steps:
Step 1.1: Taking the egress traffic of a given network as the data source, parse DNS query data from the data packets and extract a set of quadruples $r = (t, h, p, d)$ containing DNS query feature information, where $t$ is the time the request was initiated, $h$ is the host that initiated the request, $p$ is the requested resource record type, and $d$ is the requested domain name;
Step 1.2: Reduce the set of quadruples $r = (t, h, p, d)$ by filtering against a domain name whitelist, removing from the set every quadruple whose domain name appears in the whitelist;
Step 1.3: Identify NAT hosts and filter out the NAT hosts' access records for domain names in the NAT network, removing these records from the quadruple set that remains after the whitelist filtering of step 1.2; this yields the reduced quadruple set;
Step 1.4: On the reduced quadruple set obtained in step 1.3, compute statistics per time window with the domain name as the subject: count the number of times each host queries each domain name in each time window $T_i$, and define the result as the quadruple $s = (T_i, h, d, n_{all})$, where $T_i$ denotes the time range from $t_i$ to $t_{i+1}$ with $t_{i+1} = t_i + T$, $h$ is the host that initiated the request, $d$ is the requested domain name, $n_{all}$ is the number of times the host queries the domain name within the time window, and $T$ is the size of the time window.
2. The botnet domain name discovery method based on DNS data packets according to claim 1, characterized in that the process of identifying NAT hosts in step 1.3 comprises the following steps:
Step a: Divide time into periods, and record the observed value $x_{ij}$ of the number of domain names accessed by each host $i$ in each time period $j$, where $j = 1, 2, 3, \dots, n$ and $n$ is a natural number;
Step b: Calculate the average number of domain names accessed by host $i$ over the $n$ time periods as $\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$;
Step c: Calculate the threshold $M_k$, where $M_k$ is the upper $k$ quantile of the random variable $X_{ij}$, i.e. $P\{X_{ij} > M_k\} = k$, $k \in (0,1)$, and the random variable $X_{ij}$ represents the number of domain name queries made by host $i$ in time period $j$; $k = 0.05$;
3. The DNS packet-based botnet domain name discovery method according to claim 1, wherein the DNS packet-based botnet domain name discovery method further comprises a step of domain name co-occurrence scoring:
step 2.1: for a given botnet, determining, from the set of domain names for the given botnet, that there is a quad s = (T =)i,h,d,nall) The domain name set to be detected; a host sending a DNS query request to a domain name in any known zombie network is a zombie host, and an unknown domain name in a data set accessed by all the zombie hosts is a domain name set to be tested;
step 2.2: dividing the time windows, for each time window TiCalculating the co-occurrence score of each domain name in the domain name set to be detected and the given botnet domain name:
1) calculating a time window TiAnd the similarity coefficient between the domain name to be detected and each domain name in the domain name set of the given botnet is as follows:wherein, D (h, T)i) For the time window TiSet of domain names visited by the inner host h, diFor the domain name to be measured in the time window, djA given domain name in a domain name set of a given botnet within the time window;
2) calculating, within the time window $T_i$, the sum of the similarity coefficients between the domain name $d$ to be detected and all known domain names in the set $Z_b$ of the given botnet, $\sum_{d_b \in Z_b} C(d_b, d, T_i)$;
3) calculating, within the time window $T_i$, the correction coefficient $W(d, T_i)$ of the domain name $d$ to be detected, which is the number of zombie hosts that visited the domain name divided by the number of all hosts that visited it, i.e., $W(d, T_i) = |\{h \in H(d, T_i) : h \text{ is a zombie host}\}| \,/\, |H(d, T_i)|$, wherein $H(d, T_i)$ denotes the set of hosts that accessed the domain name $d$ within the time window $T_i$;
4) calculating, within the time window $T_i$, the co-occurrence score $S_b(d, T_i)$ between the domain name to be detected and the domain name set of the given botnet: $S_b(d, T_i) = \left(\sum_{d_b \in Z_b} C(d_b, d, T_i)\right) \cdot W(d, T_i)$;
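A minimal sketch of the single-window score, i.e. the sum of similarity coefficients multiplied by the correction coefficient W, is given below. The similarity coefficient used here (Jaccard overlap of the host sets that queried each domain) is a stand-in chosen purely for illustration and is not taken from the claim; the function name `window_scores` and the input layout are likewise assumptions.

```python
def window_scores(visits, known):
    """Score candidate domains within one time window.

    visits: host -> set of domains queried in the window, i.e. D(h, T_i).
    known:  set of known domains of the given botnet, i.e. Z_b.
    Returns candidate domain -> co-occurrence score for this window.
    """
    # H(d, T_i): hosts that queried each domain in this window.
    hosts_of = {}
    for h, doms in visits.items():
        for d in doms:
            hosts_of.setdefault(d, set()).add(h)

    # A host querying any known botnet domain is treated as a zombie host.
    zombies = {h for h, doms in visits.items() if doms & known}
    # Unknown domains queried by zombie hosts are the candidates.
    candidates = {d for h in zombies for d in visits[h]} - known

    def similarity(a, b):
        # ASSUMPTION: Jaccard overlap of host sets, used as a placeholder
        # for the similarity coefficient C.
        ha, hb = hosts_of.get(a, set()), hosts_of.get(b, set())
        union = ha | hb
        return len(ha & hb) / len(union) if union else 0.0

    scores = {}
    for d in candidates:
        total = sum(similarity(db, d) for db in known)
        # W: zombie hosts that queried d divided by all hosts that queried d.
        w = len(hosts_of[d] & zombies) / len(hosts_of[d])
        scores[d] = total * w
    return scores


visits = {
    "h1": {"cc1.example", "x.example"},
    "h2": {"cc1.example", "x.example", "news.example"},
    "h3": {"news.example"},
}
scores = window_scores(visits, known={"cc1.example"})
```

In this toy window, `x.example` is queried by exactly the zombie hosts and scores highest, while `news.example` is also queried by a clean host, so both its similarity and its correction coefficient are lower.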
Step 2.3: calculating the domain name co-occurrence score $S_b(d)$ over multiple time windows:
1) for $x$ consecutive time windows, calculating the average co-occurrence score of domain name $d$ with botnet $B_b$: $\bar{S}_b(d) = \left(S_b(d, T_1) + S_b(d, T_2) + \cdots + S_b(d, T_x)\right) / x$;
2) for $x$ consecutive time windows, calculating the maximum co-occurrence score of domain name $d$ with botnet $B_b$: $S_{b\max}(d) = \max(S_b(d, T_i))$;
3) normalizing and calculating the domain name co-occurrence score $S_b(d)$ over multiple time windows, wherein $\max(\bar{S}_b(d_i))$ is the maximum of the average co-occurrence scores of all domain names to be detected for botnet $B_b$, $\max(S_{b\max}(d_i))$ is the maximum of the maximum co-occurrence scores of all domain names to be detected for botnet $B_b$, and $\alpha$ is a scale factor reflecting the ratio of the average value to the maximum value of the domain name co-occurrence scores in the network; $\alpha = 0.8$.
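The ingredients of step 2.3 can be sketched as below: the per-domain average and maximum over x windows, plus the two normalizing maxima taken over all candidate domains. How these are combined with α is not assumed here. Treating a domain that is absent from a window as scoring 0 in that window is an assumption, and the function name `aggregate` is hypothetical.

```python
def aggregate(per_window):
    """per_window: list with one dict per time window, domain -> window score.

    Returns (avg, mx, norm_avg, norm_max):
      avg[d]   average score of d over all windows,
      mx[d]    maximum score of d over all windows,
      norm_avg largest average over all domains,
      norm_max largest maximum over all domains.
    """
    x = len(per_window)
    domains = set().union(*per_window)
    # ASSUMPTION: a domain missing from a window contributes 0 for that window.
    avg = {d: sum(w.get(d, 0.0) for w in per_window) / x for d in domains}
    mx = {d: max(w.get(d, 0.0) for w in per_window) for d in domains}
    norm_avg = max(avg.values(), default=0.0)
    norm_max = max(mx.values(), default=0.0)
    return avg, mx, norm_avg, norm_max


windows = [
    {"a.example": 1.0, "b.example": 0.25},
    {"a.example": 0.5},
    {"a.example": 0.0, "b.example": 0.5},
]
avg, mx, norm_avg, norm_max = aggregate(windows)
```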
4. The botnet domain name discovery method based on DNS packets according to claim 3, wherein the botnet domain name discovery method based on DNS packets further comprises a botnet domain name screening step:
sorting the domain name co-occurrence scores of all domain names to be detected for the given botnet, and selecting the domain names to be detected whose score is greater than 0.2; and judging the maliciousness of each selected domain name by a domain name maliciousness determination rule and making a recommendation accordingly.
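The sorting and thresholding part of the screening step is simple enough to sketch directly. The function name `screen` and the dictionary input are assumptions; the 0.2 threshold comes from the claim, and the maliciousness judgment itself is not modeled.

```python
def screen(scores, threshold=0.2):
    """Rank candidate domains by score (highest first) and keep those
    whose score is strictly greater than the threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked if s > threshold]


candidates = {
    "a.example": 0.9,
    "b.example": 0.2,
    "c.example": 0.35,
    "d.example": 0.05,
}
shortlist = screen(candidates)
```

A domain scoring exactly 0.2 is dropped because the claim requires a score greater than 0.2.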
5. The DNS packet-based botnet domain name discovery method according to claim 4, wherein the domain name maliciousness determination rule is that any one or more of the following are satisfied:
(1) a security vendor has published the domain name as a malicious domain name, or a malicious URL exists under the domain name;
(2) the domain name has the same second-level domain name as a known malicious domain name, and the second-level domain name does not belong to a dynamic domain name provider;
(3) the domain name has the same prefix as a known malicious domain name;
(4) a search engine returns no information at all about the domain name, yet the domain name does exist, and it resolves to the same IP address as a known malicious domain name.
6. The DNS packet-based botnet domain name discovery method according to claim 1, wherein the domain names in the domain name white list in step 1.2 are one or more of common domain names, misconfigured domain names, and domain names queried frequently by programs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101683406A CN102685145A (en) | 2012-05-28 | 2012-05-28 | Domain name server (DNS) data packet-based bot-net domain name discovery method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101683406A CN102685145A (en) | 2012-05-28 | 2012-05-28 | Domain name server (DNS) data packet-based bot-net domain name discovery method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102685145A (en) | 2012-09-19 |
Family
ID=46816508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012101683406A Pending CN102685145A (en) | 2012-05-28 | 2012-05-28 | Domain name server (DNS) data packet-based bot-net domain name discovery method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102685145A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110283357A1 (en) * | 2010-05-13 | 2011-11-17 | Pandrangi Ramakant | Systems and methods for identifying malicious domains using internet-wide dns lookup patterns |
US20120084860A1 (en) * | 2010-10-01 | 2012-04-05 | Alcatel-Lucent Usa Inc. | System and method for detection of domain-flux botnets and the like |
CN101986642A (en) * | 2010-10-18 | 2011-03-16 | 中国科学院计算技术研究所 | Detection system and method of Domain Flux data stream |
Non-Patent Citations (1)
Title |
---|
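夏秦 (Xia Qin), 王志文 (Wang Zhiwen), 刘璐 (Liu Lu): "基于域名共现行为的僵尸网络行为追踪" [Botnet behavior tracking based on domain name co-occurrence behavior], 《西安交通大学学报》 (Journal of Xi'an Jiaotong University) * |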
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103345605B (en) * | 2013-06-06 | 2016-01-06 | 西安交通大学 | A kind of malicious code infections main frame size estim ate system and method |
CN103345605A (en) * | 2013-06-06 | 2013-10-09 | 西安交通大学 | System and method for estimating scale of hosts infected by malicious codes |
US9680842B2 (en) | 2013-08-09 | 2017-06-13 | Verisign, Inc. | Detecting co-occurrence patterns in DNS |
CN103685230A (en) * | 2013-11-01 | 2014-03-26 | 上海交通大学 | Distributed cooperation detection system and method for botnet malicious domain name |
CN103685230B (en) * | 2013-11-01 | 2016-11-30 | 上海交通大学 | The distributed collaboration detecting system of Botnet malice domain name and method |
CN104579773B (en) * | 2014-12-31 | 2016-08-24 | 北京奇虎科技有限公司 | Domain name system analyzes method and device |
CN105430112A (en) * | 2015-11-03 | 2016-03-23 | 中国互联网络信息中心 | Temporary domain name identification method and system |
CN105430112B (en) * | 2015-11-03 | 2019-02-22 | 中国互联网络信息中心 | Provisional domain name recognition methods and system |
CN105610830A (en) * | 2015-12-30 | 2016-05-25 | 山石网科通信技术有限公司 | Method and device for detecting domain name |
CN105897714A (en) * | 2016-04-11 | 2016-08-24 | 天津大学 | Botnet detection method based on DNS (Domain Name System) flow characteristics |
CN105897714B (en) * | 2016-04-11 | 2018-11-09 | 天津大学 | Botnet detection method based on DNS traffic characteristics |
CN106060067A (en) * | 2016-06-29 | 2016-10-26 | 上海交通大学 | Passive DNS iterative clustering-based malicious domain name detection method |
CN106060067B (en) * | 2016-06-29 | 2018-12-25 | 上海交通大学 | Malice domain name detection method based on Passive DNS iteration cluster |
CN106375345A (en) * | 2016-10-28 | 2017-02-01 | 中国科学院信息工程研究所 | Malware domain name detection method and system based on periodic detection |
CN108076027A (en) * | 2016-11-16 | 2018-05-25 | 蓝盾信息安全技术有限公司 | A kind of adaptive black and white lists access control method and system based on attribute |
CN106790062A (en) * | 2016-12-20 | 2017-05-31 | 国家电网公司 | A kind of method for detecting abnormality and system based on the polymerization of inverse dns nailing attribute |
CN106790062B (en) * | 2016-12-20 | 2020-05-08 | 国家电网公司 | Anomaly detection method and system based on reverse DNS query attribute aggregation |
US11431742B2 (en) | 2017-04-01 | 2022-08-30 | NSFOCUS Information Technology Co., Ltd. | DNS evaluation method and apparatus |
CN107071084A (en) * | 2017-04-01 | 2017-08-18 | 北京神州绿盟信息安全科技股份有限公司 | A kind of DNS evaluation method and device |
CN107071084B (en) * | 2017-04-01 | 2019-07-26 | 北京神州绿盟信息安全科技股份有限公司 | A kind of evaluation method and device of DNS |
CN107480190A (en) * | 2017-07-11 | 2017-12-15 | 国家计算机网络与信息安全管理中心 | A kind of filter method and device of non-artificial access log |
CN107360185B (en) * | 2017-08-18 | 2020-09-25 | 中国移动通信集团海南有限公司 | Network evaluation method and device based on DNS behavior characteristics |
CN107360185A (en) * | 2017-08-18 | 2017-11-17 | 中国移动通信集团海南有限公司 | A kind of assessing network method and system based on DNS behavioural characteristics |
CN107659564B (en) * | 2017-09-15 | 2020-07-31 | 广州唯品会研究院有限公司 | Method for actively detecting phishing website and electronic equipment |
CN107659564A (en) * | 2017-09-15 | 2018-02-02 | 广州唯品会研究院有限公司 | A kind of method and electronic equipment of active detecting fishing website |
CN109063106B (en) * | 2018-07-27 | 2022-03-04 | 北京字节跳动网络技术有限公司 | Website correction method and device, computer equipment and storage medium |
CN109063106A (en) * | 2018-07-27 | 2018-12-21 | 北京字节跳动网络技术有限公司 | Network address modification method, device, computer equipment and storage medium |
CN109413079A (en) * | 2018-11-09 | 2019-03-01 | 四川大学 | Fast-Flux Botnet detection method and system under a kind of high speed network |
CN111371735A (en) * | 2018-12-26 | 2020-07-03 | 中兴通讯股份有限公司 | Botnet detection method, system and storage medium |
CN111371735B (en) * | 2018-12-26 | 2022-06-21 | 中兴通讯股份有限公司 | Botnet detection method, system and storage medium |
CN110177140B (en) * | 2019-05-27 | 2022-06-07 | 湖南快乐阳光互动娱乐传媒有限公司 | IP scheduling system and method for client data downloading |
CN110177140A (en) * | 2019-05-27 | 2019-08-27 | 湖南快乐阳光互动娱乐传媒有限公司 | IP scheduling system and method for client data downloading |
CN110472191B (en) * | 2019-07-02 | 2021-03-12 | 北京大学 | Dynamic self-adaptive service evaluation calculation method and device |
CN110472191A (en) * | 2019-07-02 | 2019-11-19 | 北京大学 | A kind of the service evaluation calculation method and device of dynamic self-adapting |
CN113765841A (en) * | 2020-06-01 | 2021-12-07 | 中国电信股份有限公司 | Malicious domain name detection method and device |
CN111818049A (en) * | 2020-07-08 | 2020-10-23 | 宝牧科技(天津)有限公司 | Botnet flow detection method and system based on Markov model |
CN115174521A (en) * | 2022-06-09 | 2022-10-11 | 浙江远望信息股份有限公司 | NAT subnet discovery method based on domain name resolution protocol analysis |
CN116032604A (en) * | 2022-12-28 | 2023-04-28 | 广州大学 | Internet of things zombie equipment detection method based on long-term and short-term memory network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20120919 |