CN108768954B - DGA malicious software identification method - Google Patents

DGA malicious software identification method

Info

Publication number
CN108768954B
Authority
CN
China
Prior art keywords
host
dga
infected
random walk
domain name
Prior art date
Legal status
Active
Application number
CN201810419555.8A
Other languages
Chinese (zh)
Other versions
CN108768954A
Inventor
罗熙
徐震
王利明
杨婧
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date
2018-05-04
Filing date
2018-05-04
Publication date
2020-07-10
Application filed by Institute of Information Engineering of CAS (2018-05-04)
Priority to CN201810419555.8A (2018-05-04)
Publication of CN108768954A (2018-11-06)
Application granted
Publication of CN108768954B (2020-07-10)
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Abstract

The invention discloses a DGA malware identification method that can quickly identify DGA malware by exploiting a weakness of the DGA technique. Because a host infected with DGA malware does not know the domain name of its control server, it must keep randomly generating domain names and attempting to connect until it reaches the control server. Exploiting this weakness and drawing on the idea of a random walk, the invention treats each failed domain name connection of a host as one random walk step, provides a method for calculating the random walk increment, and judges whether the host is infected with DGA malware by comparing the number of random walk steps and the accumulated random walk increment with preset thresholds. The method can complete detection before the infected host connects to the control server, effectively hindering the use of DGA malware, and has broad application prospects in the field of network security.

Description

DGA malicious software identification method
Technical Field
The invention belongs to the technical field of network security, and particularly relates to a DGA (Domain Generation Algorithm) malicious software identification method.
Background
Early DGA techniques were mainly used in botnets, and many related detection methods identify DGA malware from botnet characteristics (synchronicity, periodicity, and node correlation). Recent DGA techniques, however, are also used in ransomware, which lacks these botnet characteristics, so conventional detection methods are difficult to apply in such scenarios.
The key weakness of DGA malware is that the infected host does not know the domain name of its control server: it must keep generating domain names with a random algorithm and attempting to connect until it reaches the control server's domain name. DGA malware can therefore be identified by analyzing the number of domain names whose connections fail together with the characteristics of those domain names.
The detection methods proposed in patent CN106576058A "System and method to detect domain generation algorithm malware and systems infected by such malware", patent CN106992969A "Detection method for DGA-generated domain names based on statistical features of domain name strings", patent CN105577660A "DGA domain name detection method based on random forest", patent CN107046586A "Algorithm-generated domain name detection method based on natural language features", and patent US2013191915A1 "Method and system for detecting DGA-based malware" all take a single domain name, or parameters related to it, as the object of analysis and detection. Because real network environments contain large numbers of normal domain names, especially short ones, these methods all have high false positive rates.
Disclosure of Invention
Problem solved by the invention: targeting the key weakness of DGA malware, namely that an infected host does not know the domain name of its control server, the invention provides a DGA malware identification method that can detect an infected host before it connects to the control server.
Technical solution: to achieve the above objective, the invention adopts the following technical solution.
A DGA malware identification method comprising the steps of:
1) each failed domain name connection by the host is treated as one step of a random walk; the random walk increment $\Delta_i$ is calculated from the domain name of the host's $i$-th failed connection, and the increments of the first $n$ steps are summed to obtain $\Lambda_n$;
2) when $\Lambda_n$ is greater than a preset upper threshold $B_u$, or the number of random walk steps $n$ exceeds a preset threshold $B_s$, the host is judged to be infected with DGA malware; when $\Lambda_n$ is less than a lower threshold $B_l$, the host is judged not to be infected with DGA malware;
3) when a host is judged to be infected, an alarm is raised and $\Lambda_1$ is reset to 0; when the host is judged to be normal, $\Lambda_1$ is reset to 0 directly.
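The three steps above amount to a small per-host state machine. The Python sketch below is illustrative only: it takes each $\Delta_i$ as an input because the increment formula survives only as an image, and it assumes the step counter $n$ restarts together with $\Lambda$ whenever a verdict is reached. The names `HostWalk`, `Verdict` and `observe` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PENDING = "pending"    # walk still between the thresholds
    INFECTED = "infected"  # Lambda_n > B_u or n > B_s: caller raises an alarm
    NORMAL = "normal"      # Lambda_n < B_l


@dataclass
class HostWalk:
    """Per-host threshold random walk over failed domain-name connections."""
    b_upper: float      # B_u
    b_lower: float      # B_l
    b_steps: int        # B_s
    total: float = 0.0  # Lambda_n, back to 0 after every verdict
    steps: int = 0      # n, failed connections in the current walk (assumed to restart too)

    def reset(self) -> None:
        self.total = 0.0
        self.steps = 0

    def observe(self, connected: bool, delta: float = 0.0) -> Verdict:
        """Feed one connection attempt; `delta` is Delta_i for a failed domain."""
        if connected:
            # A successful connection leaves Lambda_n unchanged.
            return Verdict.PENDING
        self.steps += 1
        self.total += delta
        if self.total > self.b_upper or self.steps > self.b_steps:
            self.reset()  # infected: walk restarts from 0
            return Verdict.INFECTED
        if self.total < self.b_lower:
            self.reset()  # normal: walk restarts from 0
            return Verdict.NORMAL
        return Verdict.PENDING
```

A caller keeps one `HostWalk` per monitored host and raises its alarm whenever `observe` returns `Verdict.INFECTED`; the threshold values passed in are whatever the deployment derives from fnr, fpr and $B_s$.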
Further, in step 1), the random walk increment $\Delta_i$ is calculated as
[Formula for $\Delta_i$ not reproduced (images BDA0001650317870000021, BDA0001650317870000022)]
where $l$ is the domain name length, and $\Pr(\alpha_0)$ and $\Pr(\alpha_k \mid \alpha_{k-1})$ are derived statistically from the top 10 million Alexa-ranked domain names: $\Pr(\alpha_0)$ is the probability that a domain name begins with the character $\alpha_0$, and $\Pr(\alpha_k \mid \alpha_{k-1})$ is the probability that the $k$-th character of a domain name is $\alpha_k$ given that its $(k-1)$-th character is $\alpha_{k-1}$.
Further, $\Pr(\alpha_k \mid \alpha_{k-1})$ is calculated as
$$\Pr(\alpha_k \mid \alpha_{k-1}) = \frac{N(\alpha_{k-1}\alpha_k)}{N(\alpha_{k-1}\ast)}$$
where $N(\alpha_{k-1}\alpha_k)$ is the number of times the character bigram $\alpha_{k-1}\alpha_k$ occurs in all domain names, and $N(\alpha_{k-1}\ast)$ is the number of occurrences in all domain names of bigrams whose first character is $\alpha_{k-1}$.
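The statistics described above are a first-order (character bigram) Markov model of legitimate domain names. The sketch below estimates $\Pr(\alpha_0)$ and $\Pr(\alpha_k \mid \alpha_{k-1})$ from a list of domain names and computes $\log \Pr(\alpha_0) + \sum_k \log \Pr(\alpha_k \mid \alpha_{k-1})$ for one name. It is only an ingredient: how the patent turns these probabilities into $\Delta_i$ is in the unreproduced image. The function names, the `floor` value for unseen characters and bigrams, and the use of raw strings (no lowercasing or TLD stripping) are all assumptions.

```python
import math
from collections import Counter


def fit_bigram_model(domains):
    """Estimate Pr(a_0) and Pr(a_k | a_{k-1}) from a list of domain names.

    Pr(a_0): share of domains whose first character is a_0.
    Pr(a_k | a_{k-1}): count of bigram a_{k-1}a_k divided by the count of
    all bigrams whose first character is a_{k-1}.
    """
    first = Counter(d[0] for d in domains if d)
    bigrams = Counter()
    for d in domains:
        bigrams.update(d[k - 1:k + 1] for k in range(1, len(d)))
    starts = Counter()  # N(a_{k-1} *): bigram counts grouped by first character
    for pair, count in bigrams.items():
        starts[pair[0]] += count
    n_domains = sum(first.values())
    p_first = {c: n / n_domains for c, n in first.items()}
    p_next = {pair: n / starts[pair[0]] for pair, n in bigrams.items()}
    return p_first, p_next


def log_likelihood(domain, p_first, p_next, floor=1e-6):
    """log Pr(a_0) + sum_k log Pr(a_k | a_{k-1}); unseen events get `floor` (assumed smoothing)."""
    if not domain:
        raise ValueError("empty domain name")
    total = math.log(p_first.get(domain[0], floor))
    for k in range(1, len(domain)):
        total += math.log(p_next.get(domain[k - 1:k + 1], floor))
    return total
```

In practice `fit_bigram_model` would be run once over the reference domain list, and `log_likelihood` per failed domain; random-looking DGA names tend to score much lower than natural ones.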
Further, in step 2), the upper threshold $B_u$ and lower threshold $B_l$ are calculated from the false negative (missed detection) rate fnr and the false positive (false alarm) rate fpr as follows:
[Formula for $B_u$ and $B_l$ not reproduced (image BDA0001650317870000026)]
Here the false positive rate fpr is the probability that a host that is not infected is judged to be infected.
Further, in step 2), the false negative rate fnr, the false positive rate fpr, and the step threshold $B_s$ can be set jointly according to factors such as system security requirements and current network conditions.
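The formula for $B_u$ and $B_l$ survives only as an image (BDA0001650317870000026). In the classical threshold random walk, the two bounds come from Wald's sequential probability ratio test: with detection probability $P_D$ and false positive probability $P_F$, the upper and lower likelihood-ratio bounds are $P_D/P_F$ and $(1-P_D)/(1-P_F)$. The sketch below assumes, without confirmation from the text, that the patent uses these bounds in log form with $P_D = 1 - \text{fnr}$ and $P_F = \text{fpr}$; the function name `sprt_thresholds` is hypothetical, and the printed values hold only under that assumption.

```python
import math


def sprt_thresholds(fnr: float, fpr: float) -> tuple[float, float]:
    """Return (B_u, B_l) as log-space Wald SPRT bounds (ASSUMED form).

    Detection probability is taken as P_D = 1 - fnr and false-positive
    probability as P_F = fpr; then B_u = ln(P_D / P_F) and
    B_l = ln((1 - P_D) / (1 - P_F)).
    """
    if not (0 < fnr < 1 and 0 < fpr < 1):
        raise ValueError("fnr and fpr must lie strictly between 0 and 1")
    b_upper = math.log((1 - fnr) / fpr)
    b_lower = math.log(fnr / (1 - fpr))
    return b_upper, b_lower


# Example values from the detailed description: fnr = 0.01, fpr = 0.001.
b_u, b_l = sprt_thresholds(fnr=0.01, fpr=0.001)
print(f"B_u = {b_u:.3f}, B_l = {b_l:.3f}")  # B_u = 6.898, B_l = -4.604
```

Log-space bounds fit a walk that accumulates $\Lambda_n$ as a sum of increments; if the patent's $\Delta_i$ is on a different scale, the thresholds must be on that scale too.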
Compared with the prior art, the invention has the following beneficial effects. Existing DGA malware identification methods fall broadly into two categories. The first judges DGA domain names by extracting and analyzing the features of individual domain names; because real networks contain many legitimate but irregular domain names, especially short ones, such methods have high false positive rates. The second is based on botnet characteristics and judges whether domain names are abnormal by analyzing the features of multiple connections, so DGA domain names can be detected only after the connection requests have completed. The DGA malware identification method of the invention, based on the threshold random walk algorithm, needs no malicious samples as a training set and can complete detection before an infected host connects to its control server; the threshold random walk algorithm maximizes the detection rate while maintaining detection accuracy. Experiments verify that the false positive rate can be kept below 3%, demonstrating the method's effectiveness.
Drawings
FIG. 1 is a diagram of a finite state machine according to the present invention.
Detailed Description
The following detailed description of specific embodiments of the invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but not to limit the scope of the invention.
The invention first sets a false negative rate fnr and a false positive rate fpr, and from them calculates the upper threshold $B_u$ and lower threshold $B_l$ for the accumulated random walk increment. The step threshold $B_s$ is not set at this stage.
For example, with fnr = 0.01 and fpr = 0.001, the thresholds are:
[Computed values of $B_u$ and $B_l$ not reproduced (images BDA0001650317870000031, BDA0001650317870000032)]
Under these parameters, the maximum number of steps S that normal access takes to terminate is then measured, and the step threshold $B_s$ is set according to S.
For example, if S = 12, $B_s$ may be set to 15.
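The calibration of $B_s$ described above can be sketched as follows: run the walk without any step cap over increment sequences recorded from known-normal hosts, take S as the largest number of steps any of them needed to fall below $B_l$, and report the interval $[S, 1.5S]$ that claim 4 recommends for $B_s$. The input format (one list of $\Delta_i$ values per normal host) and the helper names are assumptions for illustration.

```python
import math


def steps_to_normal(deltas, b_upper, b_lower):
    """Steps until Lambda_n < B_l with no step cap, or None if the walk
    never ends as normal (it crosses B_u first or the trace runs out)."""
    total = 0.0
    for n, delta in enumerate(deltas, start=1):
        total += delta
        if total > b_upper:
            return None
        if total < b_lower:
            return n
    return None


def calibrate_step_threshold(normal_traces, b_upper, b_lower):
    """Return S and the integer interval [S, floor(1.5 * S)] suggested for B_s."""
    counts = [steps_to_normal(t, b_upper, b_lower) for t in normal_traces]
    finished = [c for c in counts if c is not None]
    if not finished:
        raise ValueError("no normal trace reached the lower threshold")
    s = max(finished)
    return s, (s, math.floor(1.5 * s))
```

With the patent's example S = 12, the interval is [12, 18], which contains the chosen $B_s$ = 15.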
In the detection phase (FIG. 1), $\Lambda_1 = 0$ in the initial state. Each time the host attempts a domain name connection, $\Lambda_n$ remains unchanged if the connection succeeds; if the connection fails, the random walk increment $\Delta_i$ and the increment sum $\Lambda_n = \sum_i \Delta_i$ are calculated as follows:
[Formula for $\Delta_i$ not reproduced (image BDA0001650317870000033)]
where $l$ is the domain name length, $\Pr(\alpha_0)$ is the probability that a domain name begins with the character $\alpha_0$, and $\Pr(\alpha_k \mid \alpha_{k-1})$ is the probability that the $k$-th character of a domain name is $\alpha_k$ given that its $(k-1)$-th character is $\alpha_{k-1}$. $\Pr(\alpha_k \mid \alpha_{k-1})$ is calculated as:
$$\Pr(\alpha_k \mid \alpha_{k-1}) = \frac{N(\alpha_{k-1}\alpha_k)}{N(\alpha_{k-1}\ast)}$$
where $N(\alpha_{k-1}\alpha_k)$ is the number of times the character bigram $\alpha_{k-1}\alpha_k$ occurs in all domain names, and $N(\alpha_{k-1}\ast)$ is the number of occurrences in all domain names of bigrams whose first character is $\alpha_{k-1}$.
When $\Lambda_n$ is greater than the preset upper threshold $B_u$, or the number of random walk steps exceeds the preset threshold $B_s$, the host is judged to be infected with DGA malware; when $\Lambda_n$ is less than the lower threshold $B_l$, the host is judged not to be infected with DGA malware.
When a host is judged to be infected, an alarm is raised and the host returns to the initial state, i.e. $\Lambda_1$ is reset to 0; when the host is judged to be normal, it returns directly to the initial state and $\Lambda_1$ is reset to 0.
The above embodiments are merely illustrative, and not restrictive, and various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and therefore all equivalent technical solutions are intended to be included within the scope of the invention.

Claims (4)

1. A DGA malware identification method is characterized by comprising the following steps:
(1) each failed domain name connection by the host is treated as one step of a random walk; the random walk increment $\Delta_i$ is calculated from the domain name of the host's $i$-th failed connection, and the increments of the first $n$ steps are summed to obtain $\Lambda_n$;
(2) when $\Lambda_n$ is greater than a preset upper threshold $B_u$, or $n$ exceeds a preset threshold $B_s$, the host is judged to be infected with DGA malware; when $\Lambda_n$ is less than a lower threshold $B_l$, the host is judged not to be infected with DGA malware;
(3) when a host is judged to be infected, an alarm is raised and $\Lambda_1$ is reset to 0; when the host is judged to be normal, $\Lambda_1$ is reset to 0 directly;
in step (1), the random walk increment $\Delta_i$ is calculated as
[Formula for $\Delta_i$ not reproduced (images FDA0002493689310000011, FDA0002493689310000012)]
where $l$ is the domain name length, $\Pr(\alpha_0)$ is the probability that a domain name begins with the character $\alpha_0$, and $\Pr(\alpha_k \mid \alpha_{k-1})$ is the probability that the $k$-th character of a domain name is $\alpha_k$ given that its $(k-1)$-th character is $\alpha_{k-1}$;
in step (1), the random walk increment sum $\Lambda_n$ is $\Lambda_n = \sum_i \Delta_i$;
in step (2), the upper threshold $B_u$ and lower threshold $B_l$ are calculated from the false negative rate fnr and the false positive rate fpr as
[Formula for $B_u$ and $B_l$ not reproduced (image FDA0002493689310000013)]
where the false positive rate fpr is the probability that a host that is not infected is judged to be infected.
2. The DGA malware identification method of claim 1, wherein $\Pr(\alpha_k \mid \alpha_{k-1})$ is calculated as:
$$\Pr(\alpha_k \mid \alpha_{k-1}) = \frac{N(\alpha_{k-1}\alpha_k)}{N(\alpha_{k-1}\ast)}$$
where $N(\alpha_{k-1}\alpha_k)$ is the number of times the character bigram $\alpha_{k-1}\alpha_k$ occurs in all domain names, and $N(\alpha_{k-1}\ast)$ is the number of occurrences in all domain names of bigrams whose first character is $\alpha_{k-1}$.
3. The DGA malware identification method of claim 1, wherein in step (1), if the host successfully connects to the domain name it accesses, $\Lambda_n$ remains unchanged.
4. The DGA malware identification method of claim 1, wherein in step (2), the false negative rate fnr, the false positive rate fpr and the threshold $B_s$ are set as follows: fnr is set in the interval (0, 0.01] so that more abnormal accesses are identified; fpr is set in the interval (0, 0.001] so that normal access completes the whole identification process quickly; $B_s$ is set in the interval [S, 1.5S], where S is the maximum number of steps at which normal access terminates when $B_s$ is not set, i.e. the maximum number of random walk steps required before a host is judged to be normal.
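The setting rules of claim 4 can be checked mechanically before a deployment starts. A minimal sketch; `check_parameters` is a hypothetical helper, and `s` stands for the measured maximum number of steps for normal access defined in the claim.

```python
def check_parameters(fnr: float, fpr: float, b_steps: int, s: int) -> list[str]:
    """Return the claim-4 setting rules violated by (fnr, fpr, B_s); empty if none."""
    problems = []
    if not 0 < fnr <= 0.01:
        problems.append("fnr should lie in (0, 0.01]")
    if not 0 < fpr <= 0.001:
        problems.append("fpr should lie in (0, 0.001]")
    if not s <= b_steps <= 1.5 * s:
        problems.append(f"B_s should lie in [{s}, {1.5 * s:g}]")
    return problems
```

For the example in the detailed description (fnr = 0.01, fpr = 0.001, S = 12, $B_s$ = 15) the returned list is empty.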

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810419555.8A CN108768954B (en) 2018-05-04 2018-05-04 DGA malicious software identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810419555.8A CN108768954B (en) 2018-05-04 2018-05-04 DGA malicious software identification method

Publications (2)

Publication Number Publication Date
CN108768954A 2018-11-06
CN108768954B 2020-07-10

Family

ID=64010106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810419555.8A Active CN108768954B (en) 2018-05-04 2018-05-04 DGA malicious software identification method

Country Status (1)

Country Link
CN (1) CN108768954B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110278212A (en) * 2019-06-26 2019-09-24 中国工商银行股份有限公司 Link detection method and device
CN112468484B (en) * 2020-11-24 2022-09-20 山西三友和智慧信息技术股份有限公司 Internet of things equipment infection detection method based on abnormity and reputation

Citations (13)

Publication number Priority date Publication date Assignee Title
CN1859199A (en) * 2006-02-20 2006-11-08 华为技术有限公司 System and method for detecting network worm
CN101175078A (en) * 2006-10-30 2008-05-07 丛林网络公司 Identification of potential network threats using a distributed threshold random walk
CN101626377A (en) * 2009-08-07 2010-01-13 成都市华为赛门铁克科技有限公司 Method and device for detecting viruses
CN101707539A (en) * 2009-11-26 2010-05-12 成都市华为赛门铁克科技有限公司 Method and device for detecting worm virus and gateway equipment
CN103973663A (en) * 2013-02-01 2014-08-06 中国移动通信集团河北有限公司 Method and device for dynamic threshold anomaly traffic detection of DDOS (distributed denial of service) attack
CN105072089A (en) * 2015-07-10 2015-11-18 中国科学院信息工程研究所 WEB malicious scanning behavior abnormity detection method and system
CN105577660A (en) * 2015-12-22 2016-05-11 国家电网公司 DGA domain name detection method based on random forest
CN105681313A (en) * 2016-01-29 2016-06-15 博雅网信(北京)科技有限公司 Flow detection system and method for virtualization environment
CN106170002A (en) * 2016-09-08 2016-11-30 中国科学院信息工程研究所 A kind of Chinese counterfeit domain name detection method and system
CN106576058A (en) * 2014-08-22 2017-04-19 迈克菲股份有限公司 System and method to detect domain generation algorithm malware and systems infected by such malware
CN106992969A (en) * 2017-03-03 2017-07-28 南京理工大学 DGA based on domain name character string statistical nature generates the detection method of domain name
CN107046586A (en) * 2017-04-14 2017-08-15 四川大学 A kind of algorithm generation domain name detection method based on natural language feature
CN107592312A (en) * 2017-09-18 2018-01-16 济南互信软件有限公司 A kind of malware detection method based on network traffics

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8800036B2 (en) * 2010-01-22 2014-08-05 The School Of Electrical Engineering And Computer Science (Seecs), National University Of Sciences And Technology (Nust) Method and system for adaptive anomaly-based intrusion detection
US9922190B2 (en) * 2012-01-25 2018-03-20 Damballa, Inc. Method and system for detecting DGA-based malware

Also Published As

Publication number Publication date
CN108768954A (en) 2018-11-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant