WO2018032936A1 - Method and apparatus for detecting algorithm-generated domain names - Google Patents

Method and apparatus for detecting algorithm-generated domain names

Info

Publication number
WO2018032936A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain name
algorithm
tuple
normal
character
Application number
PCT/CN2017/093890
Other languages
English (en)
French (fr)
Inventor
孙默
罗熙
王利明
杨婧
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2018032936A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Definitions

  • The present application relates to, but is not limited to, the field of communication technologies, and in particular to a method and apparatus for detecting algorithm-generated domain names.
  • DGA: Domain Generation Algorithm.
  • Detection based on the characteristics of DNS access behavior is a common method for detecting algorithm-generated domain names.
  • This method analyzes the behavior of accesses to algorithm-generated domain names, extracts features that describe the domain names, builds a corresponding domain name detection model or derives a specific threshold, and then uses that model or threshold to detect unknown algorithm-generated domain names.
  • The embodiments of the present invention provide a method and an apparatus for detecting algorithm-generated domain names, so as to solve the problem that algorithm-generated domain names cannot be detected quickly.
  • An embodiment of the present invention provides a method for detecting algorithm-generated domain names, including:
  • based on a probability model generation algorithm, establishing a random model from a set of algorithm-generated domain names, and establishing a normal model from a set of normal domain names;
  • detecting algorithm-generated domain names according to the random model and the normal model.
  • the step of establishing the random model from the algorithm-generated domain name set and the normal model from the normal domain name set, based on the probability model generation algorithm, includes:
  • the method further includes:
  • the valid information tuple includes: access ip, domain name, and timestamp.
  • the step of performing noise filtering on the access data of the Domain Name System (DNS) includes:
  • filtering out erroneous information records in the DNS access data and domain names that appear in the whitelist.
  • the step of detecting algorithm-generated domain names according to the random model and the normal model includes:
  • performing window voting on the labelled domain names: if the number of algorithm-generated domain names in the to-be-detected domain name queue is greater than the preset threshold t_m, the ip and the algorithm-generated domain names in the queue are marked as abnormal.
  • in the probability formulas, the initial-character terms belong to the initial character probability matrices π1 and π2, and the character conversion terms belong to the character conversion probability matrices B1 and B2, respectively;
  • CharSeq_i is the i-th character conversion tuple, k is a natural number, and n is the total number of conversion tuple sequences.
  • the step of extracting the conversion tuple sequence set CharSeqSet includes:
  • the application further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above method.
  • An embodiment of the present invention provides an apparatus for detecting algorithm-generated domain names, including:
  • a detecting unit, configured to detect algorithm-generated domain names according to the random model and the normal model.
  • the apparatus further includes:
  • a filtering unit, configured to perform noise filtering on the DNS access data to obtain a valid information tuple Info; the valid information tuple includes: access ip, domain name, and timestamp.
  • the filtering unit is further configured to filter out erroneous information records in the DNS access data and domain names in the whitelist, to obtain the valid information tuple Info.
  • the detecting unit further includes:
  • an extracting module, configured to maintain, per access ip, a to-be-detected domain name queue of predetermined length t_w and, when the queue is full, to process the domain names in it and extract the conversion tuple sequence set CharSeqSet;
  • a calculation module, configured to calculate, for each character conversion tuple sequence in the CharSeqSet of each access ip, the probability P1_i that it belongs to a normal domain name and the probability P2_i that it belongs to an algorithm-generated domain name;
  • a judging module, configured to perform window voting on the labelled domain names: if the number of algorithm-generated domain names in the to-be-detected domain name queue is greater than the preset threshold t_m, the ip and the algorithm-generated domain names in the queue are marked as abnormal;
  • the extracting module is further configured to maintain, per access ip, a to-be-detected domain name queue of predetermined length t_w and, when the queue is full, to extract the character conversion tuples of each domain name in the queue, obtaining character conversion tuple sequences that form a conversion tuple sequence set of predetermined size.
  • Using a probability model generation algorithm, the embodiments of the present invention build a random model and a normal model from the algorithm-generated domain name set and the normal domain name set, respectively, and detect algorithm-generated domain names with these models, thereby achieving rapid detection and effectively solving the problem that algorithm-generated domain names could not be detected quickly.
  • FIG. 1 is a schematic flowchart of a method for detecting a domain name generated by an algorithm according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart diagram of another method for detecting a domain name generated by an algorithm according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of still another method for detecting a domain name generated by an algorithm according to an embodiment of the present invention
  • FIG. 4 is a schematic flow chart of a method for noise filtering according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method for maintaining a queue according to an embodiment of the present invention
  • FIG. 6 is a schematic flow chart of a method for detecting an abnormality according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an apparatus for detecting a domain name generated by an algorithm according to an embodiment of the present invention.
  • Embodiments of the present invention provide a method and apparatus for detecting algorithm-generated domain names that model character conversion probabilities and describe the difference in character distribution between algorithm-generated and normal domain names, so that algorithm-generated domain names can be responded to quickly, even when only a single ip exists in the environment.
  • the embodiments of the present invention are further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.
  • An embodiment of the present invention provides a method for detecting algorithm-generated domain names, as shown in FIG. 1.
  • the method includes:
  • based on the probability model generation algorithm, establishing a random model from the algorithm-generated domain name set, and establishing a normal model from the normal domain name set;
  • that is, the embodiment of the present invention builds a random model and a normal model from the algorithm-generated domain name set and the normal domain name set, respectively, and uses these models to detect algorithm-generated domain names, thereby achieving rapid detection and effectively solving the problem that such domain names could not be detected quickly.
  • step S101 in the embodiment of the present invention includes:
  • the embodiment of the present invention relies on the difference in character distribution between normal and algorithm-generated domain names and uses a probability model to characterize the conversion features of domain name characters, thereby detecting algorithm-generated domain names quickly and effectively.
  • the probability model generation algorithm described in the embodiment of the present invention includes calculating an initial character probability matrix π and a character conversion probability matrix B:
  • the initial character probability matrix π gives the initial character probability P_α that a domain name begins with the character α ∈ [a,b,...,z,0,...,9,others], and is calculated as follows:
  • the initial character probability matrix π = [P_a, P_b, ..., P_z, P_0, ..., P_9, P_others] is formed.
  • the domain name is treated as a sequence of characters, the character conversion tuples (α, β) are extracted, and the number of occurrences n_αβ of each character conversion tuple is counted;
  • the embodiment of the present invention further includes:
  • the valid information tuple includes: access ip, domain name, and timestamp.
  • erroneous information records in the DNS access data and domain names in the whitelist are filtered out, to obtain the valid information tuple Info.
  • step S102 described in the embodiment of the present invention includes:
  • window voting is performed on the labelled domain names: if the number of algorithm-generated domain names in the to-be-detected domain name queue is greater than the preset threshold t_m, the ip and the algorithm-generated domain names in the queue are marked as abnormal and output in the format <access ip, domain name, timestamp>.
  • the method in the embodiment of the present invention further includes a method for maintaining the to-be-detected domain name queue, with the following steps:
  • the conversion tuple sequence set is extracted by taking, for each domain name in the to-be-detected domain name queue, its character conversion tuples to obtain a character conversion tuple sequence, finally forming a conversion tuple sequence set of predetermined size t_w.
  • the method according to the embodiment of the present invention includes two processes: training to build the models, and applying the models for detection.
  • in detection, noise filtering is first performed on the access data of the Domain Name System (DNS) to obtain valid information tuples; then the conversion tuple sequence set is extracted from the domain names in the to-be-detected domain name queue; finally, using the models M1 and M2 together with the conversion tuple sequence set, anomaly detection is performed on the domain names in the queue and the detection result is output, as shown in FIG. 2.
  • the probability model generation method learns from the normal domain name set and the algorithm-generated domain name set in the training data, which includes calculating an initial character probability matrix π and a character conversion probability matrix B, and generates a normal model M1 and a random model M2; see FIG. 3. The specific process is as follows:
  • the noise filtering method in the embodiment of the present invention includes three steps: valid information tuple extraction, record-error filtering, and whitelist filtering. Referring to FIG. 4, the specific process is as follows:
  • the destination port is not 53;
  • domain names ranked in the Alexa top 1 million;
  • the queue maintenance method of the embodiment of the present invention maintains the to-be-detected domain name queue of each access ip.
  • when the queue is full, the domain names in the queue are converted into a conversion tuple sequence set, the head element of the queue is deleted, and a new requested domain name is awaited; queues whose wait times out are deleted.
  • the specific process is as follows:
  • when this module receives an Info tuple generated by the noise filtering module, it determines whether the access ip in the tuple is a new ip address: if so, it creates a to-be-detected domain name queue for that ip and appends the domain name in the Info tuple to the tail of the queue; if not, it checks whether the domain name is already in that ip's queue: if it is, nothing is done; otherwise the domain name is appended to the tail of the queue;
  • the anomaly detection method combines the normal model M1 and the random model M2 output by the training process, decides for each character conversion sequence in the conversion tuple sequence set whether it is algorithm-generated, and uses a window voting mechanism to detect anomalies and output the detection result; see FIG. 6. The specific process is as follows:
  • that is, the embodiment of the present invention relies on the difference in character distribution between normal domain names and algorithm-generated domain names and uses a probability model to characterize the conversion features of domain name characters, which allows the random character of a domain name to be detected quickly and effectively; furthermore, the access ip is chosen as the unit of detection, which copes with the case where only a single ip in the network environment accesses DGA domain names; when performing anomaly detection on an access ip, the probabilities from two probability models are compared and combined with a window voting mechanism, which greatly reduces the false positive rate.
  • DGA domain name detection using the embodiment of the present invention achieves good results in both detection efficiency and detection performance.
  • Embodiments of the present invention further provide a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above method.
  • An embodiment of the present invention provides an apparatus for detecting algorithm-generated domain names.
  • the apparatus includes:
  • a detecting unit, configured to detect algorithm-generated domain names according to the random model and the normal model.
  • using the probability model generation algorithm, the embodiment of the present invention builds a random model and a normal model from the algorithm-generated domain name set and the normal domain name set, respectively, and detects algorithm-generated domain names with these models, thereby achieving rapid detection and effectively solving the problem that algorithm-generated domain names could not be detected quickly.
  • the apparatus further includes a filtering unit, which performs noise filtering on the DNS access data to obtain a valid information tuple Info; the valid information tuple includes: access ip, domain name, and timestamp.
  • the filtering unit in the embodiment of the present invention filters out erroneous information records in the DNS access data and domain names in the whitelist, to obtain the valid information tuple Info.
  • the detecting unit of the apparatus in the embodiment of the present invention further includes:
  • the extraction module of the embodiment of the present invention maintains, per access ip, a to-be-detected domain name queue of predetermined length t_w. When the queue is full, it extracts the character conversion tuples of each domain name in the queue, obtaining character conversion tuple sequences that form a conversion tuple sequence set of predetermined size.
  • Using a probability model generation algorithm, the embodiment of the present invention builds a random model and a normal model from the algorithm-generated domain name set and the normal domain name set, respectively, and detects algorithm-generated domain names with these models, thereby achieving rapid detection and effectively solving the problem that algorithm-generated domain names could not be detected quickly.
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and can include any information delivery media.
  • A random model and a normal model are built from the algorithm-generated domain name set and the normal domain name set, respectively, and algorithm-generated domain names are detected with these models, which achieves rapid detection of algorithm-generated domain names.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

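The embodiments of the present invention disclose a method and apparatus for detecting algorithm-generated domain names. Using a probability model generation algorithm, a random model and a normal model are built from a set of algorithm-generated domain names and a set of normal domain names, respectively, and algorithm-generated domain names are detected with these models. This enables rapid detection of algorithm-generated domain names and thus effectively solves the problem that they could not be detected quickly.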

Description

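Method and apparatus for detecting algorithm-generated domain names
Technical Field
The present application relates to, but is not limited to, the field of communication technologies, and in particular to a method and apparatus for detecting algorithm-generated domain names.
Background
Compared with early uncontrolled malware such as worms and viruses, the vast majority of today's attackers control their malware through a Command and Control (C&C) channel in order to carry out more targeted attacks. At the same time, because of the convenience offered by the Domain Name System (DNS), using DNS to locate the C&C server has become a mainstream approach.
To evade detection, attackers use a Domain Generation Algorithm (DGA) to generate a large number of random domain names at regular intervals and query them in order to determine the real C&C domain name; these domain names are also called algorithm-generated domain names. The well-known botnet conficker, for example, generates 250 domain names per hour and randomly selects 32 of them for connection attempts.
In related research, detection based on the characteristics of DNS access behavior is a common method for detecting algorithm-generated domain names. This method analyzes the behavior of accesses to algorithm-generated domain names, extracts features that describe the domain names, builds a corresponding domain name detection model or derives a specific threshold, and then uses that model or threshold to detect unknown algorithm-generated domain names.
However, extracting the temporal characteristics of DNS access traffic often requires substantial computing resources, so this detection method may well be unable to respond quickly to algorithm-generated domain names. Meanwhile, attackers change the domain names they control very frequently; if these domain names cannot be responded to quickly, the effect of any response on them is greatly reduced.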
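Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The embodiments of the present invention provide a method and apparatus for detecting algorithm-generated domain names, so as to solve the problem that algorithm-generated domain names cannot be detected quickly.
In one aspect, an embodiment of the present invention provides a method for detecting algorithm-generated domain names, including:
based on a probability model generation algorithm, establishing a random model from a set of algorithm-generated domain names, and establishing a normal model from a set of normal domain names;
detecting algorithm-generated domain names according to the random model and the normal model.
Optionally, the step of establishing the random model from the algorithm-generated domain name set and the normal model from the normal domain name set, based on the probability model generation algorithm, includes:
based on the probability model generation algorithm, calculating the initial character probability matrix π1 and the character conversion probability matrix B1 of the normal domain name set to establish the normal model M1=<B1,π1>, and calculating the initial character probability matrix π2 and the character conversion probability matrix B2 of the algorithm-generated domain name set to establish the random model M2=<B2,π2>.
Optionally, after the step of establishing the random model and the normal model, the method further includes:
performing noise filtering on the access data of the Domain Name System (DNS) to obtain a valid information tuple Info;
the valid information tuple includes: access ip, domain name, and timestamp.
Optionally, the step of performing noise filtering on the DNS access data includes:
filtering out erroneous information records in the DNS access data and domain names in the whitelist.
Optionally, the step of detecting algorithm-generated domain names according to the random model and the normal model includes:
maintaining, per access ip, a to-be-detected domain name queue of predetermined length t_w and, when the queue is full, processing the domain names in it to extract the conversion tuple sequence set CharSeqSet;
for each character conversion tuple sequence in the conversion tuple sequence set CharSeqSet of each access ip
Figure PCTCN2017093890-appb-000001
calculating the probability P1_i that it belongs to a normal domain name and the probability P2_i that it belongs to an algorithm-generated domain name, where:
Figure PCTCN2017093890-appb-000002
when P1_i > P2_i, marking the i-th domain name in the to-be-detected domain name queue as a normal domain name; otherwise, marking it as an algorithm-generated domain name;
performing window voting on the labelled domain names: if the number of algorithm-generated domain names in the to-be-detected domain name queue is greater than the preset threshold t_m, marking the ip and the algorithm-generated domain names in the queue as abnormal;
where
Figure PCTCN2017093890-appb-000003
belongs to the initial character probability matrix π1,
Figure PCTCN2017093890-appb-000004
belongs to the character conversion probability matrix B1,
Figure PCTCN2017093890-appb-000005
belongs to the initial character probability matrix π2,
Figure PCTCN2017093890-appb-000006
belongs to the character conversion probability matrix B2, CharSeq_i is the i-th character conversion tuple,
Figure PCTCN2017093890-appb-000007
is the n-th character conversion element, k is a natural number, and n is the total number of conversion tuple sequences.
Optionally, the step of extracting the conversion tuple sequence set CharSeqSet includes:
for each domain name in the to-be-detected domain name queue, extracting its character conversion tuples to obtain the character conversion tuple sequence
Figure PCTCN2017093890-appb-000008
forming a conversion tuple sequence set of predetermined size
Figure PCTCN2017093890-appb-000009
The application further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above method.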
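In another aspect, an embodiment of the present invention provides an apparatus for detecting algorithm-generated domain names, including:
an establishing unit, configured to establish, based on a probability model generation algorithm, a random model from a set of algorithm-generated domain names and a normal model from a set of normal domain names;
a detecting unit, configured to detect algorithm-generated domain names according to the random model and the normal model.
Optionally, the establishing unit is further configured to calculate, based on the probability model generation algorithm, the initial character probability matrix π1 and the character conversion probability matrix B1 of the normal domain name set to establish the normal model M1=<B1,π1>, and to calculate the initial character probability matrix π2 and the character conversion probability matrix B2 of the algorithm-generated domain name set to establish the random model M2=<B2,π2>.
Optionally, the apparatus further includes:
a filtering unit, configured to perform noise filtering on the DNS access data to obtain a valid information tuple Info; the valid information tuple includes: access ip, domain name, and timestamp.
Optionally, the filtering unit is further configured to filter out erroneous information records in the DNS access data and domain names in the whitelist, to obtain the valid information tuple Info.
Optionally, the detecting unit further includes:
an extracting module, configured to maintain, per access ip, a to-be-detected domain name queue of predetermined length t_w and, when the queue is full, to process the domain names in it and extract the conversion tuple sequence set CharSeqSet;
a calculation module, configured to take each character conversion tuple sequence in the conversion tuple sequence set CharSeqSet of each access ip
Figure PCTCN2017093890-appb-000010
and calculate the probability P1_i that it belongs to a normal domain name and the probability P2_i that it belongs to an algorithm-generated domain name, where:
Figure PCTCN2017093890-appb-000011
when P1_i > P2_i, the i-th domain name in the to-be-detected domain name queue is marked as a normal domain name; otherwise, it is marked as an algorithm-generated domain name;
a judging module, configured to perform window voting on the labelled domain names: if the number of algorithm-generated domain names in the to-be-detected domain name queue is greater than the preset threshold t_m, the ip and the algorithm-generated domain names in the queue are marked as abnormal;
where
Figure PCTCN2017093890-appb-000012
belongs to the initial character probability matrix π1,
Figure PCTCN2017093890-appb-000013
belongs to the character conversion probability matrix B1,
Figure PCTCN2017093890-appb-000014
belongs to the initial character probability matrix π2,
Figure PCTCN2017093890-appb-000015
belongs to the character conversion probability matrix B2, CharSeq_i is the i-th character conversion tuple,
Figure PCTCN2017093890-appb-000016
is the n-th character conversion element, k is a natural number, and n is the total number of conversion tuple sequences.
Optionally, the extracting module is further configured to maintain, per access ip, a to-be-detected domain name queue of predetermined length t_w and, when the queue is full, to extract the character conversion tuples of each domain name in the queue, obtaining the character conversion tuple sequence
Figure PCTCN2017093890-appb-000017
Figure PCTCN2017093890-appb-000018
forming a conversion tuple sequence set of predetermined size
Figure PCTCN2017093890-appb-000019
The beneficial effects of the embodiments of the present invention are as follows:
Using a probability model generation algorithm, the embodiments of the present invention build a random model and a normal model from the algorithm-generated domain name set and the normal domain name set, respectively, and detect algorithm-generated domain names with these models, thereby achieving rapid detection and effectively solving the problem that algorithm-generated domain names could not be detected quickly.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.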
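Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for detecting algorithm-generated domain names according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of another method for detecting algorithm-generated domain names according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of yet another method for detecting algorithm-generated domain names according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a noise filtering method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a queue maintenance method according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of an anomaly detection method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus for detecting algorithm-generated domain names according to an embodiment of the present invention.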
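Detailed Description of the Embodiments
Because attackers need to generate a large number of domain names while keeping their C&C domain names from colliding with normal domain names, algorithm-generated domain names differ greatly from normal domain names in their character features and are highly random. The embodiments of the present invention therefore provide a method and apparatus for detecting algorithm-generated domain names that model character conversion probabilities and describe the difference in character distribution between algorithm-generated and normal domain names, so that algorithm-generated domain names can be responded to quickly, even when only a single ip exists in the environment. The embodiments of the present invention are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it.
Method Embodiment
An embodiment of the present invention provides a method for detecting algorithm-generated domain names. Referring to FIG. 1, the method includes:
S101: based on a probability model generation algorithm, establishing a random model from a set of algorithm-generated domain names, and establishing a normal model from a set of normal domain names;
S102: detecting algorithm-generated domain names according to the random model and the normal model.
That is, using the probability model generation algorithm, the embodiment of the present invention builds a random model and a normal model from the algorithm-generated domain name set and the normal domain name set, respectively, and detects algorithm-generated domain names with these models, thereby achieving rapid detection and effectively solving the problem that algorithm-generated domain names could not be detected quickly.
Optionally, step S101 of the embodiment of the present invention includes:
based on the probability model generation algorithm, calculating the initial character probability matrix π1 and the character conversion probability matrix B1 of the normal domain name set to establish the normal model M1=<B1,π1>, and calculating the initial character probability matrix π2 and the character conversion probability matrix B2 of the algorithm-generated domain name set to establish the random model M2=<B2,π2>.
That is, the embodiment of the present invention relies on the difference in character distribution between normal and algorithm-generated domain names and uses a probability model to characterize the conversion features of domain name characters, thereby detecting algorithm-generated domain names quickly and effectively.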
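In a specific implementation, the probability model generation algorithm described in the embodiment of the present invention includes calculating the initial character probability matrix π and the character conversion probability matrix B:
The initial character probability matrix π gives the initial character probability P_α that a domain name begins with the character α∈[a,b,...,z,0,...,9,others]. It is calculated as follows:
In the training data, for any character α in [a,b,...,z,0,...,9,others], count the number n_α of domain names beginning with α and the number N of all domain names, where others stands for characters that are neither digits nor letters;
calculate the initial probability of the character α:
P_α = n_α / N
forming the initial character probability matrix π=[P_a,P_b,...,P_z,P_0,...,P_9,P_others].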
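The counting procedure for π can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function and variable names are hypothetical, the matrix is stored as a dictionary keyed by symbol, and folding characters to lower case is an assumption the text does not state.

```python
from collections import Counter
import string

# 37 symbols: a-z, 0-9, and a catch-all "others" bucket, as in the text.
_KNOWN = set(string.ascii_lowercase + string.digits)
ALPHABET = list(string.ascii_lowercase + string.digits) + ["others"]


def symbol(ch):
    """Map one character to one of the 37 symbols (case folding is an assumption)."""
    ch = ch.lower()
    return ch if ch in _KNOWN else "others"


def initial_probabilities(domains):
    """Return pi as {symbol: P_alpha}, with P_alpha = n_alpha / N."""
    domains = [d for d in domains if d]           # skip empty strings
    counts = Counter(symbol(d[0]) for d in domains)
    total = len(domains)                          # N, the number of domain names
    return {a: (counts[a] / total if total else 0.0) for a in ALPHABET}
```

For example, `initial_probabilities(["google", "github", "9gag", "-x"])` assigns probability 0.5 to `g` and 0.25 each to `9` and `others`.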
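The character conversion probability matrix B gives the conversion probability between characters, P_αβ=P(β|α) (α,β∈[a,b,...,z,0,...,9,others]), that is, the probability that the next character is β given that the current character is α. It is calculated as follows:
In the training data, treat each domain name as a character sequence, extract its character conversion tuples (α,β), and count the number of occurrences n_αβ of each character conversion tuple;
let n_α* be the total number of character conversion tuples beginning with α; the conversion probability from character α to character β is then:
P_αβ = n_αβ / n_α*
Calculate all P_αβ (α,β∈[a,b,...,z,0,...,9,others]) to obtain the 37×37 character conversion probability matrix B=(P_αβ).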
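The conversion matrix B can be estimated with the same kind of counting. The sketch below is illustrative only: names are hypothetical, B is stored as a nested dictionary rather than a 37×37 array, and rows for symbols that never occur as a first element are filled with zeros, which is one possible convention the text does not specify.

```python
from collections import Counter
import string

_KNOWN = set(string.ascii_lowercase + string.digits)
ALPHABET = list(string.ascii_lowercase + string.digits) + ["others"]


def symbol(ch):
    """Map one character to one of the 37 symbols (case folding is an assumption)."""
    ch = ch.lower()
    return ch if ch in _KNOWN else "others"


def transition_probabilities(domains):
    """Return B as {alpha: {beta: P_alpha_beta}}, P_alpha_beta = n_alpha_beta / n_alpha*."""
    pair_counts = Counter()   # n_alpha_beta
    row_totals = Counter()    # n_alpha*: tuples whose first element is alpha
    for domain in domains:
        seq = [symbol(c) for c in domain]
        for a, b in zip(seq, seq[1:]):   # adjacent character pairs
            pair_counts[(a, b)] += 1
            row_totals[a] += 1
    return {
        a: {
            b: (pair_counts[(a, b)] / row_totals[a] if row_totals[a] else 0.0)
            for b in ALPHABET
        }
        for a in ALPHABET
    }
```

With the toy input `["abab", "ac"]`, the tuples starting with `a` are (a,b), (a,b) and (a,c), so P(b|a) is 2/3 and P(c|a) is 1/3.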
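After step S101 and before step S102, the embodiment of the present invention further includes:
performing noise filtering on the access data of the Domain Name System (DNS) to obtain a valid information tuple Info;
the valid information tuple includes: access ip, domain name, and timestamp.
That is, erroneous information records in the DNS access data and domain names in the whitelist are filtered out to obtain the valid information tuple Info.
It should be noted that, in order to detect algorithm-generated domain names better and more accurately, those skilled in the art may also define other valid information tuples as needed.
Optionally, step S102 of the embodiment of the present invention includes:
maintaining, per access ip, a to-be-detected domain name queue of predetermined length t_w and, when the queue is full, processing the domain names in it to extract the conversion tuple sequence set CharSeqSet;
for each character conversion tuple sequence in the conversion tuple sequence set CharSeqSet of each access ip
Figure PCTCN2017093890-appb-000022
calculating the probability P1_i that it belongs to a normal domain name and the probability P2_i that it belongs to an algorithm-generated domain name, where:
Figure PCTCN2017093890-appb-000023
when P1_i > P2_i, marking the i-th domain name in the to-be-detected domain name queue as a normal domain name; otherwise, marking it as an algorithm-generated domain name;
performing window voting on the labelled domain names: if the number of algorithm-generated domain names in the to-be-detected domain name queue is greater than the preset threshold t_m, marking the ip and the algorithm-generated domain names in the queue as abnormal and outputting them in the format <access ip, domain name, timestamp>.
where
Figure PCTCN2017093890-appb-000024
belongs to the initial character probability matrix π1,
Figure PCTCN2017093890-appb-000025
belongs to the character conversion probability matrix B1,
Figure PCTCN2017093890-appb-000026
belongs to the initial character probability matrix π2,
Figure PCTCN2017093890-appb-000027
belongs to the character conversion probability matrix B2, CharSeq_i is the i-th character conversion tuple,
Figure PCTCN2017093890-appb-000028
is the n-th character conversion element, k is a natural number, and n is the total number of conversion tuple sequences.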
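The exact expressions for P1_i and P2_i are given only as figures in the source. The sketch below therefore assumes the usual first-order Markov chain form: one initial-character term from π multiplied by one conversion term from B per adjacent character pair. It evaluates that product in log space, which preserves the comparison P1_i > P2_i. The `floor` value for unseen characters and all function names are hypothetical additions.

```python
import math
import string

_KNOWN = set(string.ascii_lowercase + string.digits)


def symbol(ch):
    """Map one character to one of the 37 symbols (a-z, 0-9, others)."""
    ch = ch.lower()
    return ch if ch in _KNOWN else "others"


def log_likelihood(label, B, pi, floor=1e-6):
    """Log-probability of a label under a model <B, pi> (assumed Markov form)."""
    seq = [symbol(c) for c in label]
    if not seq:
        return float("-inf")
    # Initial-character term; `floor` avoids log(0) for unseen symbols.
    score = math.log(max(pi.get(seq[0], 0.0), floor))
    # One conversion term per adjacent character pair.
    for a, b in zip(seq, seq[1:]):
        score += math.log(max(B.get(a, {}).get(b, 0.0), floor))
    return score


def classify(label, normal_model, random_model):
    """Return 'normal' when P1 > P2, otherwise 'dga', following the decision rule."""
    p1 = log_likelihood(label, *normal_model)   # model M1 = <B1, pi1>
    p2 = log_likelihood(label, *random_model)   # model M2 = <B2, pi2>
    return "normal" if p1 > p2 else "dga"
```

Each model is passed as a `(B, pi)` pair to mirror the notation M=<B,π>. Working with logarithms is a design choice to avoid numerical underflow on long labels; it is not something the source prescribes.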
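In a specific embodiment, the method described in the embodiment of the present invention further includes a method for maintaining the to-be-detected domain name queue, with the following steps:
1) For each access ip, maintain a to-be-detected domain name queue of size t_w, whose elements are the domain names that ip requested;
2) When the to-be-detected domain name queue is full, extract the conversion tuple set;
3) Delete the head element of the to-be-detected domain name queue and enter a waiting state; when the waiting time exceeds the preset threshold t_over, delete the queue; if the ip accesses a new domain name, go to step 1).
The conversion tuple sequence set is extracted as follows: for each domain name in the to-be-detected domain name queue, extract its character conversion tuples to obtain the character conversion tuple sequence
Figure PCTCN2017093890-appb-000029
Figure PCTCN2017093890-appb-000030
finally forming a conversion tuple sequence set of predetermined size t_w
Figure PCTCN2017093890-appb-000031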
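Extracting the conversion tuples of a queue can be sketched in a few lines. The function names are hypothetical, and the sketch works on whatever strings it is given; it does not remove the top-level domain.

```python
def char_seq(label):
    """Return the character conversion tuples (adjacent character pairs) of one label."""
    return list(zip(label, label[1:]))


def char_seq_set(queue):
    """Return one conversion tuple sequence per domain name in a full queue."""
    return [char_seq(domain) for domain in queue]
```

A label of length L yields L-1 tuples, so a single-character label yields an empty sequence; a queue of t_w domain names yields a set of t_w sequences.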
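The method described in the embodiment of the present invention is explained in detail below through a specific example, with reference to the drawings:
The method described in the embodiment of the present invention includes two processes: training to build the models, and applying the models for detection. In training, the probability model generation algorithm learns from the training data (which consists of the algorithm-generated domain name set and the normal domain name set) to obtain the normal model M1=<B1,π1> and the random model M2=<B2,π2>. In detection, noise filtering is first performed on the access data of the Domain Name System (DNS) to obtain valid information tuples; then the conversion tuple sequence set is extracted from the domain names in the to-be-detected domain name queue; finally, using the models M1 and M2 together with the conversion tuple sequence set, anomaly detection is performed on the domain names in the queue and the detection result is output, as shown in FIG. 2.
The probability model generation method described in the embodiment of the present invention learns from the normal domain name set and the algorithm-generated domain name set in the training data, which includes calculating the initial character probability matrix π and the character conversion probability matrix B, and generates the normal model M1 and the random model M2. Referring to FIG. 3, the specific process is as follows:
1) Initial character probability matrix:
a) Count the number n_α of domain names beginning with the character α∈[a,b,...,z,0,...,9,others] and the total number N of domain names;
b) For each character α∈[a,b,...,z,0,...,9,others], calculate its initial probability
P_α = n_α / N
c) For the normal domain name set and the algorithm-generated domain name set, separately calculate the initial probabilities P_α of all characters in [a,b,...,z,0,...,9,others], obtaining the normal initial character probability matrix
Figure PCTCN2017093890-appb-000033
and the random initial character probability matrix
Figure PCTCN2017093890-appb-000034
2) Character conversion probability matrix:
a) Treat each domain name as a character sequence and treat every two adjacent characters αβ as a conversion tuple (α,β), where α,β∈[a,b,...,z,0,...,9,others];
b) For every conversion tuple, count its number of occurrences n_αβ;
c) Let n_α* be the number of occurrences of tuples beginning with α; the conversion probability from character α to character β is then:
P_αβ = n_αβ / n_α*
d) For the normal domain name set and the algorithm-generated domain name set, separately calculate all P_αβ (α,β∈[a,b,...,z,0,...,9,others]), obtaining the 37×37 normal character conversion probability matrix B1=(P1_αβ) and the 37×37 random character conversion probability matrix B2=(P2_αβ).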
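The noise filtering method described in the embodiment of the present invention includes three steps: valid information tuple extraction, record-error filtering, and whitelist filtering. Referring to FIG. 4, the specific process is as follows:
1) Extract the valid information tuple Info=<access ip, domain name, timestamp> from every access record in the DNS access data;
2) Filter out noise data caused by information recording errors, that is, data satisfying either of the following conditions:
a) the destination port is not 53;
b) the domain name is empty or '-';
3) Apply whitelist filtering to the valid information tuples remaining after step 2): when the requested domain name is in the whitelist, its valid information tuple is filtered out. The whitelist includes:
a) domain names ranked in the Alexa top 1 million;
b) domain names that match the following keyword regular expression: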
'in-addr|dns|cdn|cache|che|download|update|tracker|weather|read|msg|yun|pan|tui|trade|name|message|session|tel|akamai|img|tag|reg|sdk|app|api|time|timing|3g|4g|wifi|msn|game|profile|file|config|cfg|device|dvs|data|check|play|mobile|mail|cloud|tool|resolver|analy|log|open|service|pay|talk|gov|ads|stat|letv|tv\.|live|radio|video|show|movie|online|air|dianshi|qzone|iie|cartoon|ip4|ipv4|ip6|ipv6|http|tcp|wpad|workgroup'
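A minimal sketch of the three filtering steps is given below. Several details are assumptions made for illustration and do not come from the source: the record is a dict with the hypothetical keys `ip`, `domain`, `timestamp`, and `dst_port`; either error condition alone is treated as sufficient to drop a record; the Alexa list is represented by a plain set and matched exactly; and the regular expression is an abbreviated subset of the keyword list above.

```python
import re

# Abbreviated subset of the keyword expression quoted above (illustration only).
KEYWORDS = re.compile(r"in-addr|dns|cdn|cache|update|akamai|tv\.|wpad")


def filter_record(record, top_domains):
    """Return the Info tuple (ip, domain, timestamp), or None if filtered out."""
    # Step 2: drop records that look like logging errors.
    if record.get("dst_port") != 53:
        return None
    domain = record.get("domain") or ""
    if domain in ("", "-"):
        return None
    # Step 3: drop whitelisted domains (ranked list or keyword match).
    if domain in top_domains or KEYWORDS.search(domain):
        return None
    return (record["ip"], domain, record["timestamp"])
```

Because `re.search` matches anywhere in the name, a keyword such as `cdn` filters any domain that merely contains those letters, which is how the expression in the text reads as well.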
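The queue maintenance method of this embodiment maintains the queue of domain names to be detected for each accessing ip. When a queue is full, the domain names in it are converted into a transition tuple sequence set, the head element is deleted, and the queue waits for newly requested domain names; queues whose wait times out are deleted. Referring to Figure 5, the procedure is as follows:
1) When this module receives an Info tuple produced by the noise filtering module, it determines whether the accessing ip in the tuple is a new ip address. If so, it creates a queue of domain names to be detected for that ip and appends the domain name in the Info tuple to the tail. If not, it checks whether the domain name is already in that ip's queue: if it is, nothing is done; otherwise, the domain name is appended to the tail of the queue;
2) When the queue for some ip is full, for each domain name in the queue, the top-level domain data is used to strip its top-level domain and keep its domain label, obtaining the domain label set
Figure PCTCN2017093890-appb-000036
3) For each domain label in sLabelSet, extract the character transition tuple sequence
Figure PCTCN2017093890-appb-000037
finally forming a transition tuple sequence set of size tw
Figure PCTCN2017093890-appb-000038
4) Delete the head element of the queue and enter a waiting state; if the waiting time exceeds the threshold tover, delete the queue; if a new access tuple is received, return to step 1).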
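The queue logic in steps 1) to 4) can be sketched as below. The class name `DomainQueues` and its methods are hypothetical. The sketch simplifies two points: the timeout is measured from the last request seen for an ip rather than from the moment the queue enters its waiting state, and top-level domain stripping (step 2) is left out because it depends on external top-level domain data.

```python
from collections import deque


class DomainQueues:
    """Per-ip sliding queues of distinct requested domain names."""

    def __init__(self, tw, tover):
        self.tw = tw            # queue size
        self.tover = tover      # idle timeout, same unit as the timestamps
        self.queues = {}        # ip -> deque of domain names
        self.last_seen = {}     # ip -> timestamp of the latest request

    def expire(self, now):
        """Delete queues that have been idle for longer than tover."""
        stale = [ip for ip, t in self.last_seen.items() if now - t > self.tover]
        for ip in stale:
            del self.queues[ip]
            del self.last_seen[ip]

    def add(self, ip, domain, now):
        """Record one request; return the full window when the queue fills."""
        self.expire(now)
        queue = self.queues.setdefault(ip, deque())
        self.last_seen[ip] = now
        if domain in queue:     # already queued: nothing to do
            return None
        queue.append(domain)
        if len(queue) < self.tw:
            return None
        window = list(queue)    # snapshot handed to tuple extraction
        queue.popleft()         # drop the head so the window slides by one
        return window
```

Each returned window holds exactly tw domain names, so consecutive windows for the same ip overlap in tw-1 names.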
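The anomaly detection method of this embodiment uses the normal model M1 and the random model M2 output by the training process to decide, for each character transition sequence in the transition tuple sequence set, whether it comes from an algorithm-generated domain name. It then applies a window voting mechanism to detect anomalies and outputs the detection result. Referring to Figure 6, the procedure is as follows:
1) For each transition tuple sequence in the transition tuple sequence set CharSeqSet of each accessing ip,
Figure PCTCN2017093890-appb-000039
compute the probability P1i that it belongs to a normal domain name and the probability P2i that it belongs to an algorithm-generated domain name:
a) The probability P1i that it belongs to a normal domain name is computed as follows:
Figure PCTCN2017093890-appb-000040
where
Figure PCTCN2017093890-appb-000041
b) The probability P2i that it belongs to an algorithm-generated domain name is computed as follows:
Figure PCTCN2017093890-appb-000042
where
Figure PCTCN2017093890-appb-000043
2) If P1i>P2i, mark the i-th domain name in the queue of domain names to be detected as a normal domain name; otherwise, mark it as an algorithm-generated domain name;
3) Perform window voting on the detection results: if the number of algorithm-generated domain names in the queue of domain names to be detected exceeds the threshold tm, mark the corresponding ip and domain names as abnormal and output them in the format <access ip, domain name, timestamp>.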
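The decision rule and the window vote can be sketched as follows. The scoring formulas are not reproduced in this text, so the sketch assumes a first-order Markov chain score (initial probability times the product of transition probabilities). It also works in log space and floors zero probabilities at a small constant; both are choices made here for numerical safety, not part of the source. The names `log_score` and `detect` are hypothetical.

```python
import math
import string

_KNOWN = set(string.ascii_lowercase + string.digits)


def symbols(label):
    """Map a label to the 37-symbol alphabet (a-z, 0-9, 'others')."""
    return [c if c in _KNOWN else "others" for c in label.lower()]


def log_score(label, pi, B, floor=1e-6):
    """Log-probability of a label under one model; zeros are floored."""
    seq = symbols(label)
    score = math.log(max(pi.get(seq[0], 0.0), floor))
    for a, b in zip(seq, seq[1:]):
        score += math.log(max(B.get(a, {}).get(b, 0.0), floor))
    return score


def detect(window, normal, random_model, tm):
    """Return the flagged names if more than tm of them look generated."""
    flagged = [d for d in window
               if d and not log_score(d, *normal) > log_score(d, *random_model)]
    return flagged if len(flagged) > tm else []
```

Here `normal` and `random_model` are (π, B) pairs stored as nested dicts. Comparing log scores gives the same ordering as comparing P1i and P2i directly whenever no probability is floored.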
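In other words, this embodiment exploits the difference in character distribution between normal domain names and algorithm-generated domain names, and uses probability models to characterize the character transition features of domain names, so that the randomness of a domain name can be detected quickly and effectively. Furthermore, using the accessing ip as the unit of detection effectively handles the case where only a single ip in the network environment accesses DGA domain names. When performing anomaly detection on an accessing ip, comparing the probabilities under the two probability models, combined with the window voting mechanism, substantially reduces the false positive rate. Using this embodiment for DGA domain name detection achieves good results in both detection efficiency and detection performance.
This embodiment of the invention additionally provides a computer-readable storage medium storing computer-executable instructions that, when executed, implement the above method.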
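Device embodiment
This embodiment of the invention provides a device for detecting algorithm-generated domain names. Referring to Figure 7, the device includes:
an establishing unit, configured to build, based on a probability model generation algorithm, a random model from the algorithm-generated domain name set and a normal model from the normal domain name set;
a detection unit, configured to detect algorithm-generated domain names according to the random model and the normal model.
In other words, in this embodiment the establishing unit uses the probability model generation algorithm to build the random model and the normal model from the algorithm-generated domain name set and the normal domain name set respectively, and the detection unit detects algorithm-generated domain names according to these models. This enables fast detection of algorithm-generated domain names and thereby effectively solves the problem that algorithm-generated domain names could not be detected quickly.
Optionally, the establishing unit of this embodiment is further configured to, based on the probability model generation algorithm, compute the initial character probability matrix π1 and the character transition probability matrix B1 of the normal domain name set to build the normal model M1=<B1,π1>, and compute the initial character probability matrix π2 and the character transition probability matrix B2 of the algorithm-generated domain name set to build the random model M2=<B2,π2>.
That is, this embodiment exploits the difference in character distribution between normal domain names and algorithm-generated domain names, and uses probability models to characterize the character transition features of domain names, so that algorithm-generated domain names are detected quickly and effectively.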
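In a specific implementation, the probability model generation algorithm of this embodiment includes computing the initial character probability matrix π and the character transition probability matrix B:
The initial character probability matrix π represents the initial character probability Pα that a domain name begins with character α∈[a,b,...,z,0,...,9,others], and is computed as follows:
In the training data, for any character α in [a,b,...,z,0,...,9,others], count the number nα of domain names beginning with α and the number N of all domain names, where others stands for characters that are neither digits nor letters;
Compute the initial probability of character α
Figure PCTCN2017093890-appb-000044
forming the initial character probability matrix π=[Pa,Pb,...,Pz,P0,...,P9,Pothers].
The character transition probability matrix B represents the transition probabilities between characters, Pαβ=P(β|α) (α,β∈[a,b,...,z,0,...,9,others]), that is, the probability that the next character is β given that the current character is α. It is computed as follows:
In the training data, treat each domain name as a character sequence, extract its character transition tuples (α,β), and count the number of occurrences nαβ of each character transition tuple;
Let nα* be the total number of character transition tuples beginning with α; the transition probability from character α to character β is then:
Figure PCTCN2017093890-appb-000045
Compute all Pαβ (α,β∈[a,b,...,z,0,...,9,others]) to obtain the character transition probability matrix B=(Pαβ), of size 37×37.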
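Optionally, the device of this embodiment further includes a filtering unit, which performs noise filtering on the access data of the Domain Name System (DNS) to obtain valid information tuples Info. The valid information tuple includes: access ip, domain name, and timestamp.
In a specific implementation, the filtering unit of this embodiment filters out information recording errors and domain names on the whitelist from the DNS access data, thereby obtaining the valid information tuples Info.
Optionally, the detection unit of the device of this embodiment further includes:
an extraction module, configured to maintain, per accessing ip, a queue of domain names to be detected of predetermined length tw and, when the queue is full, process the domain names in it to extract the transition tuple sequence set CharSeqSet;
a computation module, configured to, for each character transition tuple sequence in the transition tuple sequence set CharSeqSet of each accessing ip,
Figure PCTCN2017093890-appb-000046
compute the probability P1i that it belongs to a normal domain name and the probability P2i that it belongs to an algorithm-generated domain name, where:
Figure PCTCN2017093890-appb-000047
If P1i>P2i, the i-th domain name in the queue of domain names to be detected is marked as a normal domain name; otherwise, it is marked as an algorithm-generated domain name;
a judgment module, configured to perform window voting on the marked domain names: if the number of algorithm-generated domain names in the queue of domain names to be detected exceeds the preset threshold tm, the ip and the algorithm-generated domain names in the queue are marked as abnormal;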
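where
Figure PCTCN2017093890-appb-000048
belongs to the initial character probability matrix π1,
Figure PCTCN2017093890-appb-000049
belongs to the character transition probability matrix B1,
Figure PCTCN2017093890-appb-000050
belongs to the initial character probability matrix π2,
Figure PCTCN2017093890-appb-000051
belongs to the character transition probability matrix B2, CharSeqi is the i-th character transition tuple,
Figure PCTCN2017093890-appb-000052
is the n-th character transition element, k is a natural number, and n is the total number of transition tuple sequences.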
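The extraction module of this embodiment maintains, per accessing ip, a queue of domain names to be detected of predetermined length tw. When the queue is full, for each domain name in the queue it extracts the character transition tuples to obtain the character transition tuple sequence
Figure PCTCN2017093890-appb-000053
Figure PCTCN2017093890-appb-000054
forming a transition tuple sequence set of predetermined size
Figure PCTCN2017093890-appb-000055
The relevant details of this embodiment can be understood by reference to the method embodiment and are not repeated here.
This embodiment of the invention achieves at least the following beneficial effects:
Using the probability model generation algorithm, a random model and a normal model are built from the algorithm-generated domain name set and the normal domain name set respectively, and algorithm-generated domain names are detected with these models. This enables fast detection of algorithm-generated domain names and thereby effectively solves the problem that algorithm-generated domain names could not be detected quickly.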
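Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or any suitable combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Although preferred embodiments of the invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions, and substitutions are also possible; the scope of this application should therefore not be limited to the embodiments described above.
Industrial applicability
Using the probability model generation algorithm, a random model and a normal model are built from the algorithm-generated domain name set and the normal domain name set respectively, and algorithm-generated domain names are detected with these models. This enables fast detection of algorithm-generated domain names and thereby effectively solves the problem that algorithm-generated domain names could not be detected quickly.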

Claims (13)

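  1. A method for detecting algorithm-generated domain names, comprising:
    building, based on a probability model generation algorithm, a random model from an algorithm-generated domain name set and a normal model from a normal domain name set;
    detecting algorithm-generated domain names according to the random model and the normal model.
  2. The method according to claim 1, wherein the step of building, based on the probability model generation algorithm, the random model from the algorithm-generated domain name set and the normal model from the normal domain name set comprises:
    computing, based on the probability model generation algorithm, the initial character probability matrix π1 and the character transition probability matrix B1 of the normal domain name set to build the normal model M1=<B1,π1>, and computing the initial character probability matrix π2 and the character transition probability matrix B2 of the algorithm-generated domain name set to build the random model M2=<B2,π2>.
  3. The method according to claim 1, further comprising, after the step of building, based on the probability model generation algorithm, the random model from the algorithm-generated domain name set and the normal model from the normal domain name set:
    performing noise filtering on access data of the Domain Name System (DNS) to obtain valid information tuples Info;
    wherein the valid information tuple comprises: access ip, domain name, and timestamp.
  4. The method according to claim 3, wherein the step of performing noise filtering on the access data of the Domain Name System (DNS) comprises:
    filtering out information recording errors and domain names on a whitelist from the access data of the Domain Name System (DNS).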
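  5. The method according to claim 3, wherein the step of detecting algorithm-generated domain names according to the random model and the normal model comprises:
    maintaining, per accessing ip, a queue of domain names to be detected of predetermined length tw and, when the queue is full, processing the domain names in it to extract the transition tuple sequence set CharSeqSet;
    for each character transition tuple sequence in the transition tuple sequence set CharSeqSet of each accessing ip,
    Figure PCTCN2017093890-appb-100001
    computing the probability P1i that it belongs to a normal domain name and the probability P2i that it belongs to an algorithm-generated domain name, where:
    Figure PCTCN2017093890-appb-100002
    if P1i>P2i, marking the i-th domain name in the queue of domain names to be detected as a normal domain name, and otherwise marking it as an algorithm-generated domain name;
    performing window voting on the marked domain names: if the number of algorithm-generated domain names in the queue of domain names to be detected exceeds the preset threshold tm, marking the ip and the algorithm-generated domain names in the queue as abnormal;
    where
    Figure PCTCN2017093890-appb-100003
    belongs to the initial character probability matrix π1,
    Figure PCTCN2017093890-appb-100004
    belongs to the character transition probability matrix B1,
    Figure PCTCN2017093890-appb-100005
    belongs to the initial character probability matrix π2,
    Figure PCTCN2017093890-appb-100006
    belongs to the character transition probability matrix B2, CharSeqi is the i-th character transition tuple,
    Figure PCTCN2017093890-appb-100007
    is the n-th character transition element, k is a natural number, and n is the total number of transition tuple sequences.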
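  6. The method according to claim 5, wherein the step of extracting the transition tuple sequence set CharSeqSet comprises:
    for each domain name in the queue of domain names to be detected, extracting its character transition tuples to obtain the character transition tuple sequence
    Figure PCTCN2017093890-appb-100008
    forming a transition tuple sequence set of predetermined size
    Figure PCTCN2017093890-appb-100009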
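  7. A device for detecting algorithm-generated domain names, comprising:
    an establishing unit, configured to build, based on a probability model generation algorithm, a random model from an algorithm-generated domain name set and a normal model from a normal domain name set;
    a detection unit, configured to detect algorithm-generated domain names according to the random model and the normal model.
  8. The device according to claim 7, wherein the establishing unit is further configured to, based on the probability model generation algorithm, compute the initial character probability matrix π1 and the character transition probability matrix B1 of the normal domain name set to build the normal model M1=<B1,π1>, and compute the initial character probability matrix π2 and the character transition probability matrix B2 of the algorithm-generated domain name set to build the random model M2=<B2,π2>.
  9. The device according to claim 7, further comprising:
    a filtering unit, configured to perform noise filtering on access data of the Domain Name System (DNS) to obtain valid information tuples Info; wherein the valid information tuple comprises: access ip, domain name, and timestamp.
  10. The device according to claim 9, wherein the filtering unit is further configured to filter out information recording errors and domain names on a whitelist from the access data of the Domain Name System (DNS) to obtain the valid information tuples Info.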
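  11. The device according to claim 9, wherein the detection unit further comprises:
    an extraction module, configured to maintain, per accessing ip, a queue of domain names to be detected of predetermined length tw and, when the queue is full, process the domain names in it to extract the transition tuple sequence set CharSeqSet;
    a computation module, configured to, for each character transition tuple sequence in the transition tuple sequence set CharSeqSet of each accessing ip,
    Figure PCTCN2017093890-appb-100010
    compute the probability P1i that it belongs to a normal domain name and the probability P2i that it belongs to an algorithm-generated domain name, where:
    Figure PCTCN2017093890-appb-100011
    if P1i>P2i, the i-th domain name in the queue of domain names to be detected is marked as a normal domain name; otherwise, it is marked as an algorithm-generated domain name;
    a judgment module, configured to perform window voting on the marked domain names: if the number of algorithm-generated domain names in the queue of domain names to be detected exceeds the preset threshold tm, the ip and the algorithm-generated domain names in the queue are marked as abnormal;
    where
    Figure PCTCN2017093890-appb-100012
    belongs to the initial character probability matrix π1,
    Figure PCTCN2017093890-appb-100013
    belongs to the character transition probability matrix B1,
    Figure PCTCN2017093890-appb-100014
    belongs to the initial character probability matrix π2,
    Figure PCTCN2017093890-appb-100015
    belongs to the character transition probability matrix B2, CharSeqi is the i-th character transition tuple,
    Figure PCTCN2017093890-appb-100016
    is the n-th character transition element, k is a natural number, and n is the total number of transition tuple sequences.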
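  12. The device according to claim 11, wherein the extraction module is further configured to maintain, per accessing ip, a queue of domain names to be detected of predetermined length tw and, when the queue is full, for each domain name in the queue, extract its character transition tuples to obtain the character transition tuple sequence
    Figure PCTCN2017093890-appb-100017
    forming a transition tuple sequence set of predetermined size
    Figure PCTCN2017093890-appb-100018
  13. A computer-readable storage medium storing computer-executable instructions that, when executed, implement the above method.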
PCT/CN2017/093890 2016-08-18 2017-07-21 Method and device for detecting algorithm-generated domain names WO2018032936A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610686248.7 2016-08-18
CN201610686248.7A CN107770132B (zh) 2016-08-18 2016-08-18 Method and device for detecting algorithm-generated domain names

Publications (1)

Publication Number Publication Date
WO2018032936A1 true WO2018032936A1 (zh) 2018-02-22

Family

ID=61196330

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/093890 WO2018032936A1 (zh) 2016-08-18 2017-07-21 Method and device for detecting algorithm-generated domain names

Country Status (2)

Country Link
CN (1) CN107770132B (zh)
WO (1) WO2018032936A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110233830A (zh) * 2019-05-20 2019-09-13 中国银行股份有限公司 域名识别和域名识别模型生成方法、装置及存储介质
CN110392064A (zh) * 2019-09-04 2019-10-29 中国工商银行股份有限公司 风险识别方法、装置、计算设备以及计算机可读存储介质
WO2020199029A1 (zh) * 2019-03-29 2020-10-08 华为技术有限公司 一种数据处理方法及其装置
CN112771523A (zh) * 2018-08-14 2021-05-07 北京嘀嘀无限科技发展有限公司 用于检测生成域的系统和方法
CN112995360A (zh) * 2021-04-30 2021-06-18 新华三技术有限公司 一种域名检测方法、装置、dga服务设备及存储介质

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020014916A1 (zh) * 2018-07-19 2020-01-23 华为技术有限公司 一种用户识别方法和相关设备
CN109241483B (zh) * 2018-08-31 2021-10-12 中国科学院计算技术研究所 一种基于域名推荐的网站发现方法和系统
CN109450845B (zh) * 2018-09-18 2020-08-04 浙江大学 一种基于深度神经网络的算法生成恶意域名检测方法
CN110213255B (zh) * 2019-05-27 2022-03-04 北京奇艺世纪科技有限公司 一种对主机进行木马检测的方法、装置及电子设备
CN111314291A (zh) * 2020-01-15 2020-06-19 北京小米移动软件有限公司 网址安全性检测方法及装置、存储介质
CN111340574B (zh) * 2020-05-15 2020-08-25 支付宝(杭州)信息技术有限公司 风险用户的识别方法、装置和电子设备
CN114666077B (zh) * 2020-12-08 2022-11-15 北京中科网威信息技术有限公司 Dga域名检测方法及系统、电子设备及存储介质
CN113572770B (zh) * 2021-07-26 2022-09-02 清华大学 检测域名生成算法生成的域名的方法及装置

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105119876A (zh) * 2015-06-29 2015-12-02 中国科学院信息工程研究所 一种自动生成的域名的检测方法及系统

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103957191A (zh) * 2014-04-03 2014-07-30 中国科学院计算机网络信息中心 一种中文域名仿冒攻击的检测方法
CN105024969B (zh) * 2014-04-17 2018-04-03 北京启明星辰信息安全技术有限公司 一种实现恶意域名识别的方法及装置
CN105577660B (zh) * 2015-12-22 2019-03-08 国家电网公司 基于随机森林的dga域名检测方法
CN105610830A (zh) * 2015-12-30 2016-05-25 山石网科通信技术有限公司 域名的检测方法及装置

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112771523A (zh) * 2018-08-14 2021-05-07 北京嘀嘀无限科技发展有限公司 用于检测生成域的系统和方法
WO2020199029A1 (zh) * 2019-03-29 2020-10-08 华为技术有限公司 一种数据处理方法及其装置
CN110233830A (zh) * 2019-05-20 2019-09-13 中国银行股份有限公司 域名识别和域名识别模型生成方法、装置及存储介质
CN110392064A (zh) * 2019-09-04 2019-10-29 中国工商银行股份有限公司 风险识别方法、装置、计算设备以及计算机可读存储介质
CN110392064B (zh) * 2019-09-04 2022-03-15 中国工商银行股份有限公司 风险识别方法、装置、计算设备以及计算机可读存储介质
CN112995360A (zh) * 2021-04-30 2021-06-18 新华三技术有限公司 一种域名检测方法、装置、dga服务设备及存储介质

Also Published As

Publication number Publication date
CN107770132A (zh) 2018-03-06
CN107770132B (zh) 2021-11-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17840911

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17840911

Country of ref document: EP

Kind code of ref document: A1