CN110830505A - Abnormity detection method for DNS query - Google Patents

Info

Publication number
CN110830505A
CN110830505A CN201911197648.1A CN201911197648A CN110830505A CN 110830505 A CN110830505 A CN 110830505A CN 201911197648 A CN201911197648 A CN 201911197648A CN 110830505 A CN110830505 A CN 110830505A
Authority
CN
China
Prior art keywords
data, cluster, point, dns, calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911197648.1A
Other languages
Chinese (zh)
Inventor
黄韬
余思雨
鄂新华
潘恬
张娇
刘江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-29
Filing date
2019-11-29
Publication date
2020-02-21
Application filed by Beijing University of Technology
Priority to CN201911197648.1A
Publication of CN110830505A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Abstract

The invention discloses an anomaly detection method for DNS queries, comprising the following steps: (1) extracting log information from a DNS (Domain Name System) server by using a data acquisition unit; (2) preprocessing the extracted data according to the set features; (3) reducing the dimensionality of the preprocessed data; (4) performing cluster analysis on the data in the low-dimensional space; (5) calculating the credibility of each dimension-reduced data point based on its relative density; and (6) marking abnormal IPs according to the distribution of the data and its credibility. By extracting and processing the query log information in the DNS server, the behavior of each IP is analyzed so that abnormal behavior can be detected.

Description

Abnormity detection method for DNS query
Technical Field
The invention belongs to the technical field of data information networks, and particularly relates to a method for detecting DNS query abnormity.
Background
A DNS server (Domain Name Server) is a server that translates between a domain name and the IP address corresponding to that domain name. The DNS server stores a table of domain names and their corresponding IP addresses and uses it to resolve the domain names carried in messages. After registering a domain name and purchasing a hosting service, the owner must resolve the domain name to the purchased host before the website content can be reached. At present, network attacks that exploit DNS queries remain a problem.
Disclosure of Invention
To address the various problems caused by network attacks that exploit DNS queries, the invention aims to provide an anomaly detection method for DNS queries that identifies an IP (Internet Protocol) address sending abnormal DNS requests by analyzing the processed low-dimensional data together with a credibility index for each data point.
An anomaly detection method for DNS query, comprising the following steps:
extracting log information in a DNS (domain name system) server by using a data acquisition unit;
preprocessing the extracted data according to the set characteristics;
performing dimensionality reduction on the preprocessed data;
performing cluster analysis on the data in a low-dimensional space;
calculating the credibility of the data points after the dimensionality reduction based on the relative density;
and marking abnormal IPs according to the distribution of the data and the credibility thereof.
Preferably, the collected data is derived from a query log in the DNS server; the collected data comprises source IP, destination IP, source port number and DNS message information.
Preferably, the data preprocessing operation comprises:
the characteristic attributes of the data comprise the number of DNS requests per source IP per unit time, the peak number of DNS requests, the proportion of failed DNS requests, the information entropy of the source port, the information entropy of the domain name types, the peak number of domain name types, the proportion of illegal domain names, the proportion of abnormal packets, and the denial-of-service rate of the server; the data preprocessing process comprises standardization followed by normalization, wherein standard scores (z-scores) are used for standardization when the actual minimum and maximum values of a characteristic attribute are unknown, and all data are then normalized.
Preferably, the data dimensionality reduction operation centers the multidimensional data set, calculates its covariance matrix, performs eigenvalue decomposition, and selects the eigenvectors corresponding to the several largest eigenvalues to form a projection matrix.
Preferably, the cluster analysis operation is divided into three parts: determining the optimal number of clusters, determining the initial centroids, and performing the optimal cluster partition. The optimal number of clusters is determined by calculating, for increasing numbers of clusters, the ratio of the inter-cluster variance to the global variance and selecting the number of clusters at the inflection point. The initial centroids are determined as follows: a point is selected at random from the input data set as the first centroid; for each point in the data set, its distance to the nearest already-selected centroid is calculated and stored in an array, and the sum of these distances is calculated; a random value is then drawn and the next centroid is selected by distance-weighted sampling, and this is repeated until all centroids are selected. The optimal cluster partition takes the determined number of clusters and the initial centroids, calculates the Euclidean distance between each point in the data set and each centroid, assigns each point to the cluster of its nearest centroid, and updates the centroids until convergence.
Preferably, the operation of calculating the credibility of the data points calculates the relative distance of the data in each cluster after the cluster analysis; the relative density of a data point is the reciprocal of its relative distance, and the credibility of the data points in each cluster is expressed by their relative density.
Preferably, the operation of marking abnormal data points marks abnormal data points within each cluster according to their credibility, and compares the data across clusters to determine whether any cluster as a whole is abnormal.
According to the method for detecting the DNS query abnormity, the IP sending the abnormal DNS request can be identified through the analysis of the processed low-dimensional data and the reliability index of the data point.
Drawings
FIG. 1 is a flow chart of an anomaly detection method for DNS query according to an embodiment of the present invention.
FIG. 2 is a block diagram of an anomaly detection method for DNS query according to an embodiment of the present invention.
FIG. 3 is a flow chart of the network organization for an anomaly detection method for DNS query according to an embodiment of the present invention.
Detailed Description
The following is a detailed description of embodiments of the invention, illustrated in the accompanying drawings in which like or similar reference numerals refer to the same or similar components or components having the same or similar functions throughout the several views. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of illustrating the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 shows a flowchart of a method for detecting DNS query anomalies according to an embodiment of the present invention:
Step 101: extracting log information in a DNS server by using a data acquisition unit;
Step 102: preprocessing the extracted data according to the set features;
Step 103: performing dimensionality reduction on the preprocessed data;
Step 104: performing cluster analysis on the data in a low-dimensional space;
Step 105: calculating the credibility of the data points after dimensionality reduction based on the relative density;
Step 106: marking abnormal IPs according to the distribution of the data and the credibility thereof.
In step 101, the data collected by the data collector includes:
the collected data mainly come from query logs in a DNS server;
the collected data includes, but is not limited to, source IP, destination IP, source port number, DNS packet information, etc.
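The collected fields can be held in a small record type and grouped per source IP before any features are computed. The following is a minimal sketch, not the patent's implementation: the class name, every field name, and the helper functions are hypothetical, and the patent does not specify a log format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsLogRecord:
    """One parsed query-log entry; every field name here is hypothetical."""
    timestamp: float  # seconds since the epoch
    src_ip: str       # client that sent the query
    dst_ip: str       # DNS server that received it
    src_port: int     # client-side port
    qname: str        # queried domain name, taken from the DNS message
    rcode: int        # DNS response code; 0 is NOERROR


def group_by_source(records):
    """Bucket log records by source IP so each IP can be profiled separately."""
    groups = defaultdict(list)
    for record in records:
        groups[record.src_ip].append(record)
    return dict(groups)


def failure_ratio(records):
    """Fraction of one IP's queries that came back with a non-zero rcode."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.rcode != 0) / len(records)
```

Grouping by source IP first keeps each later statistic a simple function of one IP's record list.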
In step 102, the data preprocessing operation includes:
the characteristic attributes of the data include, but are not limited to, the number of DNS requests per source IP per unit time, the peak number of DNS requests, the proportion of failed DNS requests, the information entropy of the source port, the information entropy of the domain name types, the peak number of domain name types, the proportion of illegal domain names, the proportion of abnormal packets, and the denial-of-service rate of the server. The data preprocessing process comprises standardization followed by normalization. When the actual minimum and maximum values of a characteristic attribute are unknown, standardization is performed using the standard score (z-score). All data are then normalized.
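Two pieces of this step lend themselves to a short illustration: the information-entropy features and the two-stage scaling. The sketch below assumes Shannon entropy in bits, a population standard deviation for the standard score, and min-max scaling to [0, 1] as the final normalization; the patent names none of these specifics, and all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def z_score(column):
    """Standard score: shift to zero mean, scale to unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant feature carries no signal
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(rows):
    """Apply z-score, then min-max, to every feature column of `rows`."""
    columns = list(zip(*rows))
    scaled = [min_max(z_score(list(col))) for col in columns]
    return [list(row) for row in zip(*scaled)]
```

A source IP that always uses the same port yields a port entropy of 0, while one that spreads queries evenly over four ports yields 2 bits. Both scaling steps are affine, so on a single fixed batch the composed result equals plain min-max scaling; the standard-score step presumably matters when its mean and deviation are estimated on a different reference window than the one being scored.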
In step 103, the data dimension reduction operation includes:
the data dimensionality reduction operation first centers the multidimensional data set, then calculates its covariance matrix, performs eigenvalue decomposition, and selects the eigenvectors corresponding to the several largest eigenvalues to form a projection matrix. Dimensionality reduction maps data points from the high-dimensional space into a low-dimensional space; because the low-dimensional data must be analyzed and visualized, the data are typically reduced to two or three dimensions.
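The centering, covariance, and projection steps can be sketched in plain Python. One substitution should be stated plainly: instead of a full eigenvalue decomposition, this sketch approximates the leading eigenvectors by power iteration with deflation, which is an assumption made to keep the example dependency-free; a production implementation would more likely call a linear-algebra library. All function names are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Approximate the k largest eigenpairs of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start = math.sqrt(sum(x * x for x in v))
        v = [x / start for x in v]
        for _ in range(iterations):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract from the deflated matrix
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        pairs.append((eigenvalue, v))
        # Deflation: remove the component just found before the next pass.
        for i in range(d):
            for j in range(d):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def pca_project(rows, k=2):
    """Project centered rows onto the k leading eigenvectors."""
    centered = center(rows)
    components = [vec for _, vec in top_eigenpairs(covariance(centered), k)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]
```

The sign of each eigenvector is arbitrary, so projected coordinates may be mirrored from run to run without changing distances between points.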
In step 104, the cluster analysis operation includes:
the cluster analysis operation is divided into three parts: determining the optimal number of clusters, determining the initial centroids, and performing the optimal cluster partition. The optimal number of clusters is determined by calculating, for increasing numbers of clusters, the ratio of the inter-cluster variance to the global variance and selecting the number of clusters at the inflection point. The initial centroids are determined as follows: a point is selected at random from the input data set as the first centroid; for each point in the data set, its distance to the nearest already-selected centroid is calculated and stored in an array, and the sum of these distances is calculated; a random value is then drawn and the next centroid is selected by distance-weighted sampling, and this is repeated until all centroids are selected. The optimal cluster partition takes the determined number of clusters and the initial centroids, calculates the Euclidean distance between each point in the data set and each centroid, assigns each point to the cluster of its nearest centroid, and updates the centroids until convergence.
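The three parts described here resemble k-means++ seeding, Lloyd-style k-means iteration, and an elbow criterion, and can be sketched as follows. Several details are assumptions not stated in the patent: the seeding weights use squared Euclidean distance (as standard k-means++ does, whereas the text only says "distance"), and the inflection point is picked as the cluster count with the largest drop in marginal gain. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k, rng):
    """Pick k seeds; each new seed is drawn with distance-based weights."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        dists = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(dists)
        if total == 0:  # every point coincides with an existing seed
            centroids.append(rng.choice(points))
            continue
        threshold = rng.random() * total
        chosen = points[dists.index(max(dists))]  # fallback: farthest point
        running = 0.0
        for p, d in zip(points, dists):
            running += d
            if d > 0 and running >= threshold:
                chosen = p
                break
        centroids.append(chosen)
    return [list(c) for c in centroids]


def kmeans(points, k, rng, max_iter=100):
    """Assign each point to its nearest centroid, update, repeat."""
    centroids = init_centroids(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def between_ratio(points, labels, centroids):
    """Inter-cluster sum of squares divided by the global sum of squares."""
    n = len(points)
    overall = [sum(col) / n for col in zip(*points)]
    total = sum(sq_dist(p, overall) for p in points)
    between = sum(labels.count(c) * sq_dist(centroids[c], overall)
                  for c in range(len(centroids)))
    return between / total if total else 0.0


def choose_k(points, k_max, rng):
    """Elbow heuristic (an assumption): largest drop in marginal gain."""
    ratios = {}
    for k in range(1, k_max + 1):
        labels, centroids = kmeans(points, k, rng)
        ratios[k] = between_ratio(points, labels, centroids)
    return max(range(2, k_max),
               key=lambda k: (ratios[k] - ratios[k - 1]) - (ratios[k + 1] - ratios[k]))
```

`choose_k` needs `k_max` of at least 3 so that a second difference exists. Because a single k-means run can settle in a poor local optimum, running each cluster count several times and keeping the best ratio would likely make the elbow more stable.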
In step 105, the operation of calculating data point credibility includes:
the relative distance of the data in each cluster is calculated after the cluster analysis, and the relative density of a data point is the reciprocal of its relative distance. The credibility of the data points in each cluster is expressed by relative density: the higher the relative density, the higher the credibility of the data point, and the lower the relative density, the lower its credibility.
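The patent does not define "relative distance". The sketch below assumes one plausible reading: a point's distance to its cluster centroid divided by the mean centroid distance within that cluster, so that a value of 1 marks a typical member. That definition, the `eps` guard, and the function name are all hypothetical.

```python
import math


def relative_density(points, labels, centroids, eps=1e-12):
    """Credibility score per point: reciprocal of its relative distance."""
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    scores = []
    for d, label in zip(dists, labels):
        same = [x for x, m in zip(dists, labels) if m == label]
        mean_d = sum(same) / len(same)
        # Assumed definition: distance relative to the cluster's mean distance.
        rel = d / mean_d if mean_d > 0 else 1.0
        scores.append(1.0 / max(rel, eps))  # eps avoids division by zero
    return scores
```

Under this reading, a point twice as far from the centroid as the cluster average scores 0.5, and a point at half the average distance scores 2.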
In step 106, the operation of marking abnormal data points includes:
abnormal data points within each cluster are marked according to their credibility, and the data are compared across clusters to determine whether any cluster as a whole is abnormal.
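The within-cluster marking can be sketched as a simple threshold rule. The patent gives no cutoff, so the rule here is an assumption: an IP is flagged when its credibility falls below a fixed fraction of its cluster's median credibility. The function name, the record layout, and the default `ratio` are hypothetical; the cross-cluster comparison is left out because the source describes it only as comparative or visual analysis.

```python
from statistics import median


def flag_low_credibility(records, ratio=0.5):
    """Return IPs whose credibility is below ratio * their cluster median.

    `records` is an iterable of (ip, cluster_label, credibility) tuples.
    """
    by_cluster = {}
    for ip, label, score in records:
        by_cluster.setdefault(label, []).append((ip, score))
    flagged = []
    for members in by_cluster.values():
        cutoff = ratio * median(score for _, score in members)
        flagged.extend(ip for ip, score in members if score < cutoff)
    return sorted(flagged)
```

Using the per-cluster median keeps the cutoff local, so a sparse but internally consistent cluster is not flagged wholesale just because a denser cluster exists elsewhere.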
Fig. 2 is a block diagram of an anomaly detection method for DNS query according to an embodiment of the present invention. The data acquisition unit extracts the source IP, destination IP, source port number, and DNS message from the log information of the DNS server; the data preprocessing module extracts the set features and applies standardization and then normalization to the data; the dimensionality reduction module reduces the dimensionality of the multidimensional data set to expose correlations among dimensions and to make the distribution of the data easier to observe visually in low dimensions; the cluster analysis module partitions the data into reasonable clusters; the anomaly detection module calculates the credibility of each data point based on relative density to detect abnormal IPs within a cluster, and performs a comparative analysis based on the centroid of each cluster to detect abnormal clusters.
Fig. 3 shows a flow chart of the network organization for an anomaly detection method for DNS query according to an embodiment of the present invention. A data acquisition unit extracts log information from the DNS server, including the source IP, destination IP, source port number, and DNS message; statistical attribute features are extracted from this information and standardized or normalized; dimensionality reduction is applied to the multidimensional data set to find correlations among dimensions and to make the distribution of the data easier to observe; cluster analysis is performed on the dimension-reduced data in the low-dimensional space to assign each data point to a cluster; outlier detection within each cluster, combined with visual analysis of the overall characteristics of the data across clusters, allows IPs issuing abnormal DNS queries to be detected at both the global and local levels.
With the technical solution provided by the invention, host IPs issuing abnormal DNS queries can be detected according to their credibility, and implicit relationships among the behaviors of host IPs can be observed visually to analyze abnormal conditions in the network.
Those skilled in the art will appreciate that the present invention may be directed to an apparatus for performing one or more of the operations described in the present application. The apparatus may be specially designed and constructed for the required purposes, or it may comprise any known apparatus in a general purpose computer selectively activated or reconfigured by a program stored in the general purpose computer.
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the methods specified in the block or blocks of the block diagrams and/or flowchart block or blocks.
Those of skill in the art will appreciate that various operations, methods, steps in the processes, acts, or solutions discussed in the present application may be alternated, modified, combined, or deleted. Further, various operations, methods, steps in the flows, which have been discussed in the present application, may be interchanged, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, schemes in the various operations, methods, procedures disclosed in the prior art and the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. An anomaly detection method for DNS query is characterized by comprising the following steps:
extracting log information in a DNS (domain name system) server by using a data acquisition unit;
preprocessing the extracted data according to the set characteristics;
performing dimensionality reduction on the preprocessed data;
performing cluster analysis on the data in a low-dimensional space;
calculating the credibility of the data points after the dimensionality reduction based on the relative density;
and marking abnormal IPs according to the distribution of the data and the credibility thereof.
2. The method of claim 1, wherein the collected data is derived from a query log in a DNS server; the collected data comprises source IP, destination IP, source port number and DNS message information.
3. The method of claim 1, wherein the data preprocessing operation comprises:
the characteristic attributes of the data comprise the number of DNS requests per source IP per unit time, the peak number of DNS requests, the proportion of failed DNS requests, the information entropy of the source port, the information entropy of the domain name types, the peak number of domain name types, the proportion of illegal domain names, the proportion of abnormal packets, and the denial-of-service rate of the server; the data preprocessing process comprises standardization followed by normalization, wherein standard scores (z-scores) are used for standardization when the actual minimum and maximum values of a characteristic attribute are unknown, and all data are then normalized.
4. The method of claim 1, wherein the data dimensionality reduction operation is to perform a centering process on the multidimensional data set, then calculate a covariance matrix of the multidimensional data set, perform eigenvalue decomposition, and select eigenvectors corresponding to larger eigenvalues to form a projection matrix.
5. The method of claim 1, wherein the cluster analysis operation comprises determining an optimal number of clusters, determining initial centroids, and performing an optimal cluster partition; the optimal number of clusters is determined by calculating, for increasing numbers of clusters, the ratio of the inter-cluster variance to the global variance and selecting the number of clusters at the inflection point; the initial centroids are determined by selecting a point at random from the input data set as the first centroid; for each point in the data set, calculating its distance to the nearest already-selected centroid, storing it in an array, and calculating the sum of these distances; then drawing a random value and selecting the next centroid by distance-weighted sampling, repeated until all centroids are selected; the optimal cluster partition takes the determined number of clusters and the initial centroids, calculates the Euclidean distance between each point in the data set and each centroid, assigns each point to the cluster of its nearest centroid, and updates the centroids until convergence.
6. The method of claim 1, wherein calculating the credibility of the data points comprises calculating the relative distance of the data in each cluster after the cluster analysis, wherein the relative density of a data point is the reciprocal of its relative distance; the credibility of the data points in each cluster is expressed by relative density.
7. The method of claim 1, wherein the operation of marking abnormal data points marks abnormal data points within each cluster according to their credibility, and compares data across clusters to determine whether any cluster as a whole is abnormal.
CN201911197648.1A 2019-11-29 2019-11-29 Abnormity detection method for DNS query Pending CN110830505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197648.1A CN110830505A (en) 2019-11-29 2019-11-29 Abnormity detection method for DNS query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911197648.1A CN110830505A (en) 2019-11-29 2019-11-29 Abnormity detection method for DNS query

Publications (1)

Publication Number Publication Date
CN110830505A true CN110830505A (en) 2020-02-21

Family

ID=69543126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197648.1A Pending CN110830505A (en) 2019-11-29 2019-11-29 Abnormity detection method for DNS query

Country Status (1)

Country Link
CN (1) CN110830505A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111726810A (en) * 2020-06-17 2020-09-29 华中科技大学 Wireless signal monitoring and wireless communication behavior auditing system in numerical control processing environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103001825A (en) * 2012-11-15 2013-03-27 中国科学院计算机网络信息中心 Method and system for detecting DNS (domain name system) traffic abnormality
US20130174253A1 (en) * 2011-12-29 2013-07-04 Verisign, Inc. Systems and methods for detecting similarities in network traffic
CN110336789A (en) * 2019-05-28 2019-10-15 北京邮电大学 Domain-flux Botnet detection method based on blended learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
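吉星 et al.: "DNS query anomaly detection algorithm based on log information" (基于日志信息的 DNS 查询异常检测算法), 《北京邮电大学学报》 *
岑咏华 et al.: "Research on the implementation of a document clustering algorithm based on improved K-means" (一种基于改进K-means的文档聚类算法的实现研究), 《现代图书情报技术》 *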

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221