CN107645503A - A rule-based method for detecting the DGA family of a malicious domain name

Publication number: CN107645503A
Application number: CN201710855704.0A
Authority: CN (China)
Legal status: Granted (published as CN107645503B)
Language: Chinese (zh)
Inventors: 程华才, 范渊, 李凯
Original and current assignee: DBAPPSecurity Co Ltd
Keywords: domain name, DGA, sample, rule, feature

Abstract

The present invention relates to the field of network security APT detection and provides a rule-based method for detecting the DGA family to which malicious domain names belong. The method analyses and detects malicious domain names in order to identify the DGA family of the virus that has infected an attacked computer in the network. It computes features of the large number of abnormal domain names that a bot program requests within a short time, matches the results against domain-name feature rules derived from known DGA algorithms, and thereby quickly identifies the DGA family type associated with the bot program infecting a given computer in the current network. This supports the subsequent tracing of the network attack, the removal of the bot program, and remedial measures.

Description

A rule-based method for detecting the DGA family of a malicious domain name
Technical field
The present invention relates to the field of network security APT (Advanced Persistent Threat) detection, and in particular to a rule-based method for detecting the DGA family to which a malicious domain name belongs.
Background
The Domain Name System (DNS) is one of the key infrastructure services of the Internet. As a distributed database that maps domain names and IP addresses to each other, it lets users access the Internet without having to remember the numeric IP address strings that machines read directly. Most Internet applications today must use the DNS to resolve a domain name to an IP address before carrying out their actual business.
Botnets are one research direction in network security. A botnet is a network in which a controller can command many infected hosts one-to-many; it is formed by infecting a large number of hosts with bot programs (malicious programs such as worms or Trojan horses) through one or more propagation methods. Most botnets use DNS to obtain resources, select servers, receive instructions, and so on. To improve their survivability, hide better, stay flexible, and extend their lifetime, these botnets use detection-evasion techniques, notably the domain generation algorithm (Domain Generation Algorithm, usually called a DGA algorithm). A DGA is commonly used for communication between a bot program and its controller's C&C (Command & Control) server. It is usually a private random-string generation algorithm that takes the date or another variable parameter as its random seed (the input parameter of the algorithm) and generates a number of random strings in each period (for example every day, every week, or every 10 days). The attacker registers some of these strings as domain names for the C&C server (called malicious domain names or DGA domain names). The malicious program generates the same random domain names internally with the same algorithm and then issues DNS resolution requests for them. When the resolution of some domain name succeeds, the malicious program tries to communicate with the IP address returned. If communication succeeds, the malicious program has found the C&C server that controls it and carries out further operations, such as receiving instructions for subsequent tasks from the C&C server or uploading internal network information it has already collected.
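The date-seeded generation described above can be sketched as follows. This is a toy illustration only: the function `toy_dga`, its hashing scheme, and its parameters are assumptions made for this example and do not reproduce the algorithm of any real malware family.

```python
import hashlib
from datetime import date

def toy_dga(seed_date: date, count: int = 5, sld_len: int = 10, tld: str = "com") -> list[str]:
    """Hypothetical date-seeded generator; NOT the algorithm of any real family."""
    domains = []
    for i in range(count):
        # Attacker and bot share the algorithm, so both can recompute this digest
        # from the same seed (here: the date plus a counter).
        digest = hashlib.sha256(f"{seed_date.isoformat()}-{i}".encode()).hexdigest()
        # Map each hex digit onto the letters a..p to get a letters-only SLD.
        sld = "".join(chr(ord("a") + int(c, 16)) for c in digest[:sld_len])
        domains.append(f"{sld}.{tld}")
    return domains

# The same date always yields the same candidate list.
todays_domains = toy_dga(date(2017, 9, 20))
```

Because the output depends only on the seed, every domain in one period shares the same SLD length, character set, and TLD, which is the kind of regularity the method below relies on.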
There are many botnets. A botnet that communicates through domain names generated by a DGA algorithm is also called a DGA family; one DGA family usually corresponds to one DGA algorithm or to a group of similar DGA algorithms. Accordingly there are many DGA algorithms, such as conficker, virut, simda, and mirai, and the features of the domain names they generate differ. Domain names generated by the same DGA algorithm, however, show identical or similar lexical features: the SLD (Second Level Domain) length (a fixed length or a range), the character set used for the SLD (which letters and digits may appear), the set of TLD (Top Level Domain) values (a single fixed TLD, several optional TLDs, or a TLD determined by the algorithm's input parameters), and so on.
Once a worm or Trojan infection is detected in the network, technical staff trace the attack to its source, identify and remove the virus and any related botnet, and take the available remedial measures. Removal methods and remedies may differ from virus to virus. Identifying the DGA family of the virus that has infected a computer in the network is therefore a very important step in the whole attack-tracing process.
Summary of the invention
The main object of the present invention is to overcome the deficiencies of the prior art by providing a rule-based method for detecting the DGA family to which malicious domain names belong. To solve the above technical problem, the solution of the present invention is a rule-based detection method that analyses and detects malicious domain names and identifies the DGA family type of the virus infecting an attacked computer in the network (a botnet that communicates through DGA-generated domain names is also called a DGA family; one DGA family usually corresponds to one DGA algorithm or a group of similar DGA algorithms). The method comprises the following steps:
Step 1: Collect material on DGA algorithms from the Internet and obtain samples of the domain names these algorithms generate;
Step 2: For each DGA algorithm, choose the domain-set features to be analysed, and merge them into one overall feature list. Select at least one training sample for each DGA algorithm and, for every feature in the feature list, calculate or infer the corresponding feature value of the training sample, finally forming a feature matrix. This step comprises the following sub-steps:
Step (2A): For each DGA algorithm from step 1, take the set of domain names it generates within one period, such as one day or one week (for example, the 100 domain names that the proslikefan algorithm generates each day), as one sample; that is, one domain-name set is one sample. List the domain-name features that need to be analysed;
The domain-name features that need to be analysed are the one or more features that distinguish the domain names in this set from ordinary domain names and from domain names generated by other DGA algorithms, for example: SLD length, whether the SLD string contains vowels, the minimum and maximum proportion of vowel characters in the SLD, whether the SLD contains digits, the number of domain names the algorithm generates in one period, the TLD list, and so on;
Step (2B): Step (2A) yields the features to analyse for each sample (one sample is selected per DGA algorithm and its features are listed; if 50 DGA algorithms were collected in step 1, 50 samples are selected and the features are listed for each). Take the union of these features to obtain one feature list;
For example, for domain-name samples generated by the DGA algorithm pykspa, the features to analyse are:
TLD list: TLD (Top Level Domain) denotes the top-level domain; the TLD list gives the top-level domains that domain names generated by this DGA algorithm may use;
SLD length range: SLD (Second Level Domain) denotes the second-level domain; the SLD length range is the range of lengths of the second-level domain part of the generated domain names;
SLD character range: the characters from which the second-level domain may be composed;
Generation period: how often the algorithm generates a different set of domain names; within one period, every domain name that a malicious program using this algorithm asks to resolve belongs to that set;
Number of domain names per generation period: the number of domain names the algorithm generates within one generation period;
Minimum ratio of SLDs containing vowels: the minimum percentage of pykspa-generated domain names whose SLD contains a vowel character;
For example, for domain-name samples generated by the DGA algorithm qadars, the features to analyse are:
TLD list;
SLD length range;
SLD character range;
Generation period;
Number of domain names per generation period; these first five features are the same as those analysed for pykspa;
Whether the SLD switches between letters and digits: whether letters and digits alternate within the second-level domain, as in the domain name 05qj09mf4d2b.com;
Taking the union of the features to analyse for the samples of these two DGA algorithms gives a feature list of seven features:
TLD list, SLD length range, SLD character range, generation period, number of domain names per generation period, minimum ratio of SLDs containing vowels, and whether the SLD switches between letters and digits;
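The union operation of step (2B) can be sketched as an order-preserving merge. The identifier names below (`tld_list`, `alpha_digit_switch`, and so on) are labels invented for this illustration; the patent does not prescribe any naming.

```python
# Per-algorithm feature lists from the pykspa and qadars examples (names are hypothetical).
SHARED = ["tld_list", "sld_length_range", "sld_charset",
          "generation_period", "domains_per_period"]
PYKSPA_FEATURES = SHARED + ["min_vowel_ratio"]
QADARS_FEATURES = SHARED + ["alpha_digit_switch"]

def union_features(*per_algorithm_lists):
    """Merge several feature lists, keeping first-seen order and dropping duplicates."""
    merged = []
    for features in per_algorithm_lists:
        for name in features:
            if name not in merged:
                merged.append(name)
    return merged

FEATURE_LIST = union_features(PYKSPA_FEATURES, QADARS_FEATURES)
```

Keeping a stable order matters later, because each position in the list becomes one column of the feature matrix.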
Step (2C): For each DGA algorithm, select at least two generated domain-name sets (that is, at least two samples per DGA algorithm) and divide them into two parts: one part serves as training samples for creating rules, and the number of training samples is denoted M; the other part serves as test samples for testing the recognition accuracy of the rules;
For the M training samples, determine the value of each feature in the feature list from step (2B), either by calculation or by induction from the domain names in the sample. If the feature list contains N features, the values of these N features are calculated or inferred for every sample and the DGA family type of the sample is appended, so that the M training samples yield a feature matrix of M rows and N+1 columns, where M and N are natural numbers greater than zero;
For example, suppose the feature list from step (2B) is the following (in a specific embodiment, the feature list may contain dozens of features): TLD list, SLD length range, SLD character range, generation period, number of domain names per generation period, minimum ratio of SLDs containing vowels, and whether the SLD switches between letters and digits;
The following illustrates how the value of each feature is calculated and summarised for the algorithms pykspa and qadars according to the feature list:
The algorithm pykspa generates 800 domain names per day; four examples are:
hrxzdi.net, llwfnz.info, tknutifsxwh.com, kqcmxjplngd.org
Calculation and induction over the 800 domain names give the following feature values, where the text to the left of ':' is the feature name and the text to the right is its value:
TLD list: com, net, org, info; the top-level domain of a generated domain name may be any of these four;
SLD length range: 6~12; the second-level domain of a generated domain name is 6 to 12 characters long;
SLD character range: a~z; the characters of the second-level domain are drawn from the 26 letters a~z;
Generation period: day; the algorithm generates one set of domain names per day;
Number of domain names per generation period: 800; the set generated each day contains 800 domain names;
Minimum ratio of SLDs containing vowels: 70%; at least 70% of the second-level domains contain a vowel character;
Whether the SLD switches between letters and digits: no; the generated domain names never switch between letters and digits;
DGA family type: pykspa
The algorithm qadars generates 1800 domain names per week; four examples are:
7wpyj01ijol2.org, k9ijkhiz8hy7.org, jkhu7w123whu.net, 1if05u3gtevs.com
Calculation and induction over the 1800 domain names give the following feature values:
TLD list: com, net, org; the top-level domain of a generated domain name may be any of these three;
SLD length range: 12; the second-level domain has a fixed length of 12 characters;
SLD character range: a~z, 0~9; the characters of the second-level domain are drawn from a~z and 0~9;
Generation period: week; the algorithm generates one set of domain names per week;
Number of domain names per generation period: 1800; the set generated each week contains 1800 domain names;
Minimum ratio of SLDs containing vowels: 95%; at least 95% of the second-level domains contain a vowel character;
Whether the SLD switches between letters and digits: yes; the generated domain names switch between letters and digits;
DGA family type: qadars
The example above yields a feature matrix of two rows (two DGA algorithms) and eight columns (the values of the seven features plus one column for the DGA family type); this feature matrix can be regarded as a numerical description of the samples;
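As a rough sketch of how such feature values might be computed from the domain names themselves, the code below summarises a domain-name set and assembles feature-matrix rows. It is an illustration under stated assumptions: the function names and the dictionary layout are invented here, each domain is assumed to consist of exactly an SLD and a TLD, and the generation period and per-period count are left out because they come from knowledge of the algorithm, not from the names. It runs on the four published example domains per family only, so its values differ from the figures quoted above for the full 800 and 1800 domain sets.

```python
VOWELS = set("aeiou")

def split_domain(domain):
    """Return (SLD, TLD), assuming the last two labels are the SLD and the TLD."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2], labels[-1]

def has_alpha_digit_switch(sld):
    """True if a letter is directly followed by a digit, or a digit by a letter."""
    return any(
        (a.isalpha() and b.isdigit()) or (a.isdigit() and b.isalpha())
        for a, b in zip(sld, sld[1:])
    )

def extract_features(domains):
    """Summarise one domain-name set (one sample) as a dict of feature values."""
    slds, tlds = zip(*(split_domain(d) for d in domains))
    lengths = [len(s) for s in slds]
    return {
        "tld_list": sorted(set(tlds)),
        "sld_length_range": (min(lengths), max(lengths)),
        "sld_charset": "".join(sorted(set("".join(slds)))),
        "vowel_ratio": sum(1 for s in slds if VOWELS & set(s)) / len(slds),
        "alpha_digit_switch": any(has_alpha_digit_switch(s) for s in slds),
    }

FEATURE_ORDER = ["tld_list", "sld_length_range", "sld_charset",
                 "vowel_ratio", "alpha_digit_switch"]

def feature_matrix(samples):
    """samples: list of (domains, family). Each row holds N feature values plus the family."""
    rows = []
    for domains, family in samples:
        features = extract_features(domains)
        rows.append([features[name] for name in FEATURE_ORDER] + [family])
    return rows

PYKSPA = ["hrxzdi.net", "llwfnz.info", "tknutifsxwh.com", "kqcmxjplngd.org"]
QADARS = ["7wpyj01ijol2.org", "k9ijkhiz8hy7.org", "jkhu7w123whu.net", "1if05u3gtevs.com"]
matrix = feature_matrix([(PYKSPA, "pykspa"), (QADARS, "qadars")])
```

With M = 2 samples and N = 5 computed features, `matrix` has 2 rows and N + 1 = 6 columns, mirroring the M rows by N+1 columns layout of step (2C).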
Step 3: A rule-creation function creates DGA family type classification rules from the feature matrix of the training samples. The classification rules are obtained by processing the output of step (2C) (the feature matrix of M rows and N+1 columns) with a machine learning algorithm such as the C4.5 decision tree algorithm or the K-nearest-neighbour algorithm; they identify the DGA family type to which a given domain-name set belongs;
Once created, the classification rules are stored in a configuration file or a relational database and are loaded and used by the detection module, which detects the DGA family type of a specific domain-name set sample;
For example, the domain names that a given computer requests within a period such as 30 minutes are treated as one domain-name set, i.e. one sample. The value of each feature in the final feature list of step (2B) is calculated and summarised for this sample, and these feature values are the input to the detection module;
The detection module is the program with the DGA family type detection function, that is, a program that can detect the DGA family type of a specific domain-name set sample. Given the feature values of an input sample, the detection module uses the classification rules already created to determine the DGA family type of the sample, or some other outcome; for example, the domain names may be ordinary normal domain names, i.e. the domain names in the set were not generated by any DGA algorithm (see the description of step 8);
For the DGA family type classification rules, two creation modes are proposed below (modes (3A) and (3B)). Figure 3 of the accompanying drawings shows an example model of DGA family classification rules created with a decision tree as proposed in the present invention. The model contains only seven DGA family types (pykspa, madmax, shifu, qadars, mirai, rovnix, and murofet) and uses only three features (SLD length range, TLD list, and SLD character range). In a specific embodiment there may be dozens of DGA family types and more than ten features. Given the features of an input domain-name set sample, the detection module follows the branch matching each feature value level by level from the root of the tree until it reaches a specific DGA family type or obtains the result "unknown" (the lookup failed, i.e. no DGA family type matches the features; see step 8). The other mode uses configured rules and is described in mode (3B) below;
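The root-to-leaf lookup described above can be sketched with a small hand-written tree. The tree below is hypothetical: it is not the model of Figure 3, it covers only the two families whose feature values appear in the worked example, and it matches feature values exactly, whereas a learned tree would typically test thresholds or subsets.

```python
# An internal node is (feature_name, {feature_value: subtree}); a leaf is a family name.
# Hypothetical tree built from the pykspa and qadars example values only.
TREE = ("sld_length_range", {
    (12, 12): ("sld_charset", {"a-z,0-9": "qadars"}),
    (6, 12): ("tld_list", {("com", "info", "net", "org"): "pykspa"}),
})

def classify(tree, sample):
    """Walk from the root, following the branch that matches each feature value.

    Returns a family name, or "unknown" when no branch matches.
    """
    node = tree
    while not isinstance(node, str):
        feature, branches = node
        node = branches.get(sample.get(feature), "unknown")
    return node

qadars_like = {"sld_length_range": (12, 12), "sld_charset": "a-z,0-9",
               "tld_list": ("com", "net", "org")}
unmatched = {"sld_length_range": (12, 12), "sld_charset": "a-z",
             "tld_list": ("com",)}
result_known = classify(TREE, qadars_like)
result_unknown = classify(TREE, unmatched)
```

An "unknown" result corresponds to the failed lookup that hands the sample over to step 8.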
Step 4: The acquisition module collects DNS protocol traffic and HTTP protocol traffic to obtain the raw traffic data;
The acquisition module captures network traffic; it can capture data directly from a network interface card or receive traffic data sent by other systems;
DNS protocol traffic consists of the requests a computer sends to a DNS server to resolve a domain name to its IP address and the resolution results the DNS server returns. By collecting the DNS protocol traffic of the computers to be protected (the domain-resolution traffic they send and receive) and checking whether the domain names they ask to resolve were generated by some DGA algorithm, the method determines whether these computers have been infected by malicious programs such as viruses or Trojans, and which DGA family type the malicious program is associated with;
HTTP protocol traffic is recorded so that, after a malicious program's resolution request for a malicious domain name succeeds, the HTTP operations it may then request (for example, downloading an updated version of itself or uploading collected sensitive information to the C&C server) can be detected, which facilitates subsequent risk detection;
Step 5: The protocol parsing module parses the DNS and HTTP protocol traffic according to the protocol specifications, restores the original network behaviour information, and produces traffic data that the subsequent modules can process: source IP, destination IP, source port, destination port, domain name requested, resolution result, request time, HTTP request operation, return information, and so on;
The protocol parsing module can parse the information of both communicating parties from network traffic data according to the protocol specifications, including source IP, destination IP, source port, destination port, request content, and response information;
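To make the parsing step concrete, the sketch below extracts the requested domain name from the payload of a DNS query, following the message layout of RFC 1035 (a 12-byte header followed by a length-prefixed list of labels). It is a minimal illustration, not the patent's parser: `build_query` and `parse_query_name` are names invented here, only the first question is read, and compressed names are rejected because they do not normally occur in the question of a plain query.

```python
import struct

def build_query(domain, txid=0x1234):
    """Build a minimal DNS query payload for an A record (used here as test input)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD flag set, one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in domain.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def parse_query_name(message):
    """Return the domain name from the first question of a DNS message."""
    _txid, _flags, qdcount, _an, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    if qdcount < 1:
        raise ValueError("message carries no question")
    labels, pos = [], 12
    while True:
        length = message[pos]
        if length == 0:          # zero-length label terminates the name
            break
        if length & 0xC0:        # pointer bits set: compressed name
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(message[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)

requested = parse_query_name(build_query("05qj09mf4d2b.com"))
```

In a real deployment the source and destination addresses and ports would come from the IP and UDP headers of the captured packet, which this sketch does not cover.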
Step 6: The protocol parsing module filters the domain names that the protected computers (the computers in the enterprise internal network protected by this detection method) ask to resolve against a domain-name whitelist. If a domain name is found in the whitelist, it is considered a normal, ordinary domain name, its DGA family type is not checked, and processing continues with the next domain name. If it is not found in the whitelist, the domain name and the IP address of the computer that requested it are sent to the detection module for processing in step 7;
The domain-name whitelist is a collection of commonly used domain names that are clearly harmless and pose no threat, stored in a file or a relational database. A resolution request for a domain name that belongs to the whitelist is regarded as normal behaviour;
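A minimal sketch of the whitelist filter follows. The function name, the `(client_ip, domain)` record shape, and the exact-match comparison are assumptions for illustration; the text does not specify whether subdomains of a whitelisted name should also pass.

```python
def filter_requests(requests, whitelist):
    """Keep only the (client_ip, domain) pairs whose domain is not whitelisted.

    Comparison is case-insensitive and ignores a trailing dot; it is an exact match.
    """
    allowed = {d.lower().rstrip(".") for d in whitelist}
    return [
        (ip, domain) for ip, domain in requests
        if domain.lower().rstrip(".") not in allowed
    ]

WHITELIST = ["example.com", "example.org"]          # hypothetical whitelist entries
REQUESTS = [
    ("10.0.0.5", "Example.com."),                   # whitelisted, dropped
    ("10.0.0.5", "k9ijkhiz8hy7.org"),               # forwarded to the detection module
    ("10.0.0.6", "example.org"),                    # whitelisted, dropped
]
suspicious = filter_requests(REQUESTS, WHITELIST)
```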
Step 7: The detection module detects the DGA family type according to the rules, in the following sub-steps:
Step (7A): The detection module loads and uses the DGA family type classification rules created in step 3 and receives from step 6 the IP address of a computer (a computer in the enterprise intranet protected by this detection method) and the domain names that computer asked to resolve. The detection module treats the domain names a computer asks to resolve within a period such as 10 minutes or half an hour as one domain-name set sample (if the intranet contains several computers, the domain names each computer requests within the period form a separate sample, and the samples are processed in turn). Using the feature list from step (2B), it calculates and summarises the features of the sample, that is, the features in the feature list: what the TLD list is, what the SLD length range is, what the SLD character range is, and so on;
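The per-computer, per-period grouping of step (7A) might look like the sketch below. The record shape `(timestamp_seconds, client_ip, domain)` and the use of fixed, non-overlapping windows are assumptions for illustration; the text only says that a period such as 10 minutes or half an hour is used.

```python
from collections import defaultdict

def group_into_samples(records, window_seconds=600):
    """Group requested domains into one sample per (client IP, time window).

    records: iterable of (timestamp_seconds, client_ip, domain).
    Returns {(client_ip, window_index): sorted list of distinct domains}.
    """
    samples = defaultdict(set)
    for timestamp, client_ip, domain in records:
        window_index = int(timestamp // window_seconds)
        samples[(client_ip, window_index)].add(domain)
    return {key: sorted(domains) for key, domains in samples.items()}

RECORDS = [
    (0, "10.0.0.5", "7wpyj01ijol2.org"),
    (599, "10.0.0.5", "k9ijkhiz8hy7.org"),
    (600, "10.0.0.5", "jkhu7w123whu.net"),   # falls into the next 10-minute window
    (10, "10.0.0.6", "hrxzdi.net"),          # a different computer, separate sample
]
samples = group_into_samples(RECORDS)
```

Each value in `samples` is one domain-name set whose features are then calculated and matched against the rules.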
Step (7B): The sample features calculated in step (7A) are matched against the DGA family type classification rules loaded by the detection module;
If the match succeeds, the domain names in the sample were requested by a malicious program associated with that DGA family; the malicious program asked to resolve them in order to communicate with its C&C server, which in turn shows that the computer (a computer in the enterprise intranet protected by this detection method) is infected with a malicious program. If the match fails, continue with step 8;
Step 8: If no DGA family type matches the features of the requested domain names, two cases are distinguished. (To judge whether a domain name is normal or malicious, one can search for it on the Internet with a search engine; check its registration information, filing information, and change history; query it on security websites to see whether it is associated with network attacks or viruses; or use a third-party security detection tool to check whether the domain name is safe.)
(1) The requested domain names are normal domain names: supplement and update the domain-name whitelist;
(2) The requested domain names are malicious domain names, which covers three situations:
A. It is an existing virus program, but the features of the domain names it requests are not covered by the rules created so far;
B. It is a new variant of an existing virus program, or the virus program uses different input parameters for an existing DGA algorithm, so that the features of the output domain-name set change considerably;
C. It is a brand-new virus program, and no material on the features of the domain names it requests is yet available on the network.
For the three situations above, network security researchers can determine which applies by reverse-engineering the malicious program on the infected computer, but there is no simple method for deciding directly. It is therefore necessary to follow regularly the updates published by network security information websites and network security companies, such as information on worms, Trojans, and botnets. In situation C, where a new virus or Trojan type has appeared and generates malicious domain names with a DGA algorithm different from the existing ones, steps 1 to 3 must be repeated for that DGA algorithm: find the relevant algorithm or virus sample; run the algorithm, or submit the virus sample to a sandbox for a simulated run while capturing network traffic to obtain the domain names the virus sample asks to resolve; take the domain-name set generated by the DGA algorithm the sample calls as a sample; choose sample features; and re-create the rules.
In the present invention, step 1 comprises the following sub-steps:
Step (1A): Search the network for material on a given DGA algorithm, including program code that generates the domain names (code in a specific programming language) and pseudocode describing the algorithm;
If code related to the DGA algorithm is found on the network:
For program code that generates domain names, run the code and collect the domain names it outputs;
For pseudocode describing the algorithm, rewrite it as executable code in a specific programming language and run it to obtain the output domain names, or deduce from the description which domain names the algorithm would output when actually executed;
Then jump to step (1D);
If no code related to the DGA algorithm is found on the network, continue with step (1B);
Step (1B): Search for samples of domain names generated by the DGA algorithm; the samples must exhibit the main features of the domain names the algorithm generates;
If samples of domain names generated by the DGA algorithm are found, jump to step (1D);
If no such samples are found, continue with step (1C);
Step (1C): Download a malicious program sample associated with the DGA algorithm from a virus detection website or a network security website and run it in a sandbox. While it runs, capture network traffic with a network sniffing tool (a traffic capture tool such as tcpdump or wireshark) to obtain the domain names the sample asks to resolve, and collect them as domain-name samples. Run the sample in the sandbox repeatedly to obtain the main features of the domain names it requests;
A sandbox is a virtual system program that lets you run a browser or other programs inside the sandbox environment so that the changes made while running can be deleted afterwards. It creates an isolated working environment in which programs cannot make permanent changes to the hard disk. It is an independent virtual environment that can be used to test untrusted applications or online behaviour;
Step (1D): Check whether any known DGA algorithm still lacks a corresponding domain-name set sample. If so, repeat from step (1A); stop when domain-name set samples have been obtained for all currently known DGA algorithms.
In the present invention, the typical DGA family types currently considered in step 1 include, but are not limited to, the following (in alphabetical order; the family names are taken from the Internet): bamital, banjori, blackhole, chinad, conficker, cryptolocker, dircrypt, dyre, emotet, fobber, gameover, gspy, locky, madmax, matsnu, mirai, murofet, necurs, nymaim, proslikefan, pykspa, qadars, ramnit, ranbyus, rovnix, shifu, simda, suppobox, symmi, tempedreve, tinba, tofsee, vawtrak, vidro, virut.
In the present invention, rule creation in step 3 uses one of the following two modes:
Mode (3A): Select a classification algorithm, for example the C4.5 decision tree algorithm. A decision tree is a predictive model that represents a mapping between object attributes and object values: each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf. A decision tree has a single output; if several outputs are needed, independent decision trees can be built to handle the different outputs. Decision trees are a frequently used technique in data mining, both for analysing data and for prediction. The machine learning technique of producing a decision tree from data is called decision tree learning, or simply decision tree. Decision tree algorithms come in several versions, such as ID3, C4.5, and CART;
The algorithm learns the feature matrix calculated in step (2C) and outputs a classification model; this model is the DGA family type classification rule that is finally needed, for example the decision-tree model that a decision tree algorithm builds from the feature matrix;
To make the model (the classification rule) more accurate, several groups of training samples (the training samples prepared in step (2C)) can be used for repeated training, with the classification algorithm learning a different group each time, to obtain several models. Each model is then tested on the test samples (the test samples prepared in step (2C)), and the model with the highest DGA family type recognition accuracy is selected;
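The train-several-models-and-keep-the-best procedure can be sketched as follows. To stay self-contained, the sketch swaps in a very simple 1-nearest-neighbour classifier over categorical features (distance is the number of mismatching feature values) instead of C4.5; the function names, the feature encoding, and the tiny data sets are all assumptions made for this illustration.

```python
def train_nearest_neighbour(training_rows):
    """training_rows: list of (feature_tuple, family). Returns a predict function."""
    def predict(features):
        def mismatches(row):
            return sum(a != b for a, b in zip(row[0], features))
        return min(training_rows, key=mismatches)[1]
    return predict

def accuracy(model, test_rows):
    """Fraction of test samples whose family the model identifies correctly."""
    return sum(model(features) == family for features, family in test_rows) / len(test_rows)

def select_best(training_sets, test_rows):
    """Train one model per training set and return the most accurate one."""
    models = [train_nearest_neighbour(rows) for rows in training_sets]
    return max(models, key=lambda model: accuracy(model, test_rows))

# Feature tuples: (SLD length range, SLD character range, TLD list), from the worked example.
PYKSPA_ROW = (((6, 12), "a-z", "com,net,org,info"), "pykspa")
QADARS_ROW = (((12, 12), "a-z,0-9", "com,net,org"), "qadars")

TRAINING_SETS = [
    [PYKSPA_ROW],                # a poor training set that only knows one family
    [PYKSPA_ROW, QADARS_ROW],    # a training set covering both families
]
TEST_ROWS = [PYKSPA_ROW, QADARS_ROW]

best_model = select_best(TRAINING_SETS, TEST_ROWS)
best_accuracy = accuracy(best_model, TEST_ROWS)
```

Here the model trained on the second set is selected because it classifies both test samples correctly, while the first reaches only half.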
Mode (3B): Convert the feature matrix obtained in step (2C) into a configurable form: each feature in the feature list finally obtained in step (2B) is treated as a configurable attribute, and one rule is configured for each type of DGA algorithm; these rules serve as the DGA family type classification rules;
For example, for the qadars algorithm cited in step (2C), the following rule can be configured (in the rule below, the character '#' means that the rest of the current line is a comment; the text to the left of the character ':' is a feature name, i.e. a configurable attribute, and the text to the right of ':' is the value of that feature):
TLD: com, net, org  # the corresponding list of TLDs
SLD SIZE: 12  # the corresponding SLD length range
VALUE SOURCE: a~z, 0~9  # the source of the SLD character values
ALPHA DIGIT SWITCH: 1  # whether the SLD switches between alphabetic and numeric characters;
# configured as 1 here, meaning that letters and digits do switch
MIN VOWEL RATIO: 95  # minimum vowel ratio for the SLD
COUNT: 1800/W  # this type of algorithm generates 1800 different domain names per week; W means the time period for generating domain names is one week
...: ...  # configure other possible features as needed in the specific implementation
...: ...  # configure other possible features as needed in the specific implementation
DGA: qadars  # the DGA family type corresponding to this rule
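A rule written in this notation can be loaded with a few lines of parsing code. The following is only a sketch under assumptions: the helper name `parse_rule` and the choice of a plain dictionary as the in-memory form are hypothetical and not specified by the patent.

```python
def parse_rule(text):
    """Parse 'NAME: value  # comment' lines into a dict (hypothetical helper)."""
    rule = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # everything after '#' is a comment
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        name, value = name.strip(), value.strip()
        if name == "...":                    # placeholder lines carry no feature
            continue
        rule[name] = value
    return rule

QADARS_RULE = """
TLD: com, net, org      # the corresponding list of TLDs
SLD SIZE: 12            # the corresponding SLD length range
VALUE SOURCE: a~z, 0~9  # the source of the SLD character values
ALPHA DIGIT SWITCH: 1
MIN VOWEL RATIO: 95
COUNT: 1800/W           # 1800 domain names per week
...: ...                # other features as needed
DGA: qadars
"""

rule = parse_rule(QADARS_RULE)
tlds = [t.strip() for t in rule["TLD"].split(",")]
count, period = rule["COUNT"].split("/")
```

Keeping the values as strings at this stage leaves the interpretation of each attribute (range, list, flag) to the detection module.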
After the rules are created, they are tested with the test samples prepared in step (2C) to measure how accurately the created rules classify real samples. If the accuracy does not reach the preset effect threshold, the cause is analyzed and handled accordingly: depending on the cause, it is decided whether features need to be re-selected, whether training samples need to be re-selected, and so on. If the accuracy reaches the preset effect threshold, for example the DGA family type of the test samples is identified with 95% accuracy, the rule creation is considered successful.
In the present invention, the matching process in step (7B) is as follows:
If the DGA family type classification rule is a decision tree, then after matching against the rule, a successful match yields an output value, and this output value is the DGA family type; otherwise the match fails;
If the DGA family type classification rules do not use a decision tree model, the detection module uses the sample features to match against all rules (i.e. the rules configured in mode (3B)). During matching, a matching degree is computed (for example, expressed as a score), and rules whose features do not match are eliminated one by one. If the matching degree of at least one rule exceeds the matching degree threshold, the rule with the highest matching degree is selected, i.e. the DGA family type that best matches the features of the domain name set (the sample features computed in step (7A)). If no rule's matching degree exceeds the threshold, the match fails. The matching degree threshold is set from testing experience; for example, even the rule with the highest matching degree is required to have at least 90% of the domain names in the set under detection match the features that the rule describes.
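The score-based matching for configured rules can be sketched as follows. This is an illustration under assumptions: the rule dictionaries, the three features checked, and the function names are hypothetical, and the matching degree is taken to be the fraction of domain names in the set that satisfy a rule.

```python
HEX = set("0123456789abcdef")
ALNUM = set("abcdefghijklmnopqrstuvwxyz0123456789")

# Hypothetical in-memory form of two configured rules.
RULES = [
    {"family": "bamital", "tlds": {"co.cc", "cz.cc", "info", "org"},
     "sld_len": 32, "charset": HEX},
    {"family": "qadars", "tlds": {"com", "net", "org"},
     "sld_len": 12, "charset": ALNUM},
]

def domain_matches(domain, rule):
    """True if one domain name satisfies every feature the rule describes."""
    sld, _, tld = domain.lower().partition(".")
    return (tld in rule["tlds"]
            and len(sld) == rule["sld_len"]
            and set(sld) <= rule["charset"])

def match_family(domains, rules, threshold=0.9):
    """Return (family, score) of the best rule at or above the threshold, else (None, 0.0)."""
    if not domains:
        return None, 0.0
    best_family, best_score = None, 0.0
    for rule in rules:
        score = sum(domain_matches(d, rule) for d in domains) / len(domains)
        if score >= threshold and score > best_score:
            best_family, best_score = rule["family"], score
    return best_family, best_score

# Made-up domain name sets, for illustration only.
suspicious = ["0a" * 16 + ".info"] * 9 + ["example.com"]
ordinary = ["example.com", "example.org"]
hit = match_family(suspicious, RULES)
miss = match_family(ordinary, RULES)
```

With nine of ten names matching, the bamital rule scores exactly at the 90% threshold; a failed match corresponds to the handling in step 8.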
The working principle of the present invention: each typical botnet has its own DGA algorithm for generating the domain names that the control end (C&C server) uses to communicate with the bot program on an infected host. Because the DGA algorithms differ, the domain name sets they output each have different features. Before communicating with the control end, a bot program usually requests resolution, within a short time, of a large number of malicious domain names that differ markedly from ordinary domain names. If the domain names requested by a given computer device (identified by its IP address) are collected at the network gateway and treated as one domain name set, the features of that set can be computed and matched against the features of domain names generated by known DGA algorithms. The DGA family type related to the bot program infecting that computer device is then probably the one with the highest matching degree, which benefits subsequent attack tracing and removal of the bot program.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention computes the features of the abnormal domain names that a bot program requests in large numbers within a short time and matches the result against the domain name feature rules of known DGA algorithms. It thereby quickly identifies the DGA family type related to the bot program infecting a given computer device in the current network, which benefits subsequent tracing of the network attack, removal of the bot program, and remedial measures.
Brief description of the drawings
Fig. 1 is a flowchart of creating the DGA family type classification rules in the present invention.
Fig. 2 is a flowchart of detecting the DGA family type to which suspected malicious domain names belong in the present invention.
Fig. 3 is an example model of the DGA family classification rules created with a decision tree as proposed in the present invention.
Embodiment
It should first be explained that the method of the present invention for detecting the DGA family to which malicious domain names belong is an application of computer technology in the field of information security. Implementing the invention involves several software function modules. The applicant believes that, after reading the application documents carefully and accurately understanding the implementation principle and purpose of the invention, a person skilled in the art can, in combination with existing known techniques, implement the invention with their software programming skills; there is no possibility that it cannot be understood or reproduced. The software function modules include, but are not limited to: a network traffic acquisition module, a network traffic protocol parsing module, a DGA family type classification rule creation module and a DGA family type detection module. They can be implemented in many ways, all of which fall within the scope of this application, and the applicant does not enumerate them further.
The domain name whitelist used in the present invention can be stored in text files, or in a relational database management system (RDBMS) such as MySQL or Oracle.
The DGA family type classification rules created in the present invention can be stored in text files, or in a relational database management system such as MySQL or Oracle.
The decision tree C4.5 classification algorithm described in the present invention is one optional classification algorithm; in a specific implementation, if rules are created with a classification algorithm, another classification algorithm can be selected according to the actual situation.
The protocol parsing results (original network behavior information: source IP address, source port, destination IP address, destination port, the domain name requested for resolution and the request time, the domain name resolution result, the HTTP request operation, request time and returned information, etc.), as well as information such as the DGA family type to which malicious domain names belong, can be stored in a relational database management system such as MySQL or Oracle, or in a NoSQL non-relational database based on a distributed computing framework.
The present invention is described in further detail below with reference to the drawings and an embodiment:
The rule-based method for detecting the DGA family to which malicious domain names belong, shown in Fig. 1 and Fig. 2, is used to quickly detect the DGA family type related to the bot program infecting a given computer device in a network, which benefits subsequent tracing of the network attack, removal of the bot program, and remedial measures. The detection method comprises the following steps:
Step 1: Collect material on DGA algorithms from the internet:
(1) Search the network for material on DGA algorithms, and obtain samples of the domain names generated by these DGA algorithms.
(2) Run the program code obtained in step (1) (adding input parameters if necessary) and collect the domain names that the program outputs. If step (1) yielded only pseudocode of the algorithm, write executable code in a specific programming language that implements it, or work out from the algorithm's description which domain names it would output when actually executed.
(3) If no code for the DGA algorithm is found, look for samples of the domain names generated by that DGA algorithm (the samples should show the main features of the domain names generated by the algorithm as fully as possible).
(4) If steps (2) and (3) do not yield domain name samples for the DGA algorithm, try to download malware samples related to that DGA algorithm from virus detection websites or network security websites, then run the malware sample in a sandbox. While it runs, capture the network traffic with network sniffing tools (network traffic capture tools such as tcpdump or wireshark) to obtain the domain names that the sample may request during execution. Because a DGA algorithm may generate different domain names on different dates, a single sandbox run may collect only a few domain names; in that case the sample must be run in the sandbox repeatedly, so that the collected names reflect the main features of the domain names requested by the malware sample.
(5) Repeat the steps above to obtain domain name samples for DGA family types including, but not limited to, the following (in alphabetical order; the family names below are taken from the internet): bamital, banjori, blackhole, chinad, conficker, cryptolocker, dircrypt, dyre, emotet, fobber, gameover, gspy, locky, madmax, matsnu, mirai, murofet, necurs, nymaim, proslikefan, pykspa, qadars, ramnit, ranbyus, rovnix, shifu, simda, suppobox, symmi, tempedreve, tinba, tofsee, vawtrak, vidro, virut.
Step 2: Choose the domain name set features to be analyzed:
(6) Take the domain name set generated by each DGA algorithm in step (5) (for example, the proslikefan algorithm generates 100 domain names per day) as one sample (i.e. one domain name set is one sample), and list the domain name set features that may need to be analyzed, i.e. the features by which the domain names in the sample set can be distinguished from ordinary domain names and from domain names generated by other DGA algorithms. These can include, but are not limited to, the following features:
The range of SLD lengths; for example, a single fixed length, several possible fixed lengths, or an interval;
The range of characters in the SLD string; for example, the 26 letters, or random sequential combinations of a fixed few alphabetic characters together with any of the digits '0'~'9', or only the hexadecimal characters '0'~'9' and 'a'~'f';
Whether the SLD contains vowel characters;
The minimum and maximum ratio of vowel characters in the SLD;
Whether the SLD contains numeric characters;
The minimum and maximum ratio of numeric characters in the SLD;
Whether the SLD contains a common string; if so, whether the value of the common string is fixed or is determined by the algorithm's input parameters, and whether the position of the common string within the SLD is fixed;
Whether alphabetic and numeric characters switch frequently in the SLD; if so, the minimum and maximum number of switches, or the minimum and maximum ratio of the number of switches to the SLD length;
The minimum ratio of the number of SLDs in the sample set containing at least one vowel to the total number of domain names in the sample;
The value of the TLD: a fixed value, a random choice among several fixed TLDs, or undetermined and decided by the algorithm's input parameters;
The number of domain names the algorithm generates in one time period;
The time period with which the algorithm generates domain names.
(7) Repeat step (6), and take the set union of the one or more features listed for each DGA algorithm, finally forming a single feature list.
(8) Using the feature list obtained in step (7), divide the multiple domain name sets generated by each DGA algorithm (i.e. multiple samples are selected for each DGA algorithm, at least two) into two parts: one part serves as training samples for creating rules, and the other part as test samples for testing the recognition accuracy of the rules. Then, for each feature in the feature list finally obtained in step (2B), compute or summarize the value of the feature for the training samples: if the feature list contains N features, the values of these N features are computed or summarized for each domain name set. Together with the DGA family type to which each domain name set belongs, the M training samples form, after computation, a feature matrix of M rows and N+1 columns.
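Steps (6) to (8) can be sketched as follows: a handful of the listed features are computed for each domain name set, and each training sample becomes one row of N feature values plus its family label. The feature names, the choice of five features, and the made-up domain names and family labels are assumptions for illustration only.

```python
VOWELS = set("aeiou")

def switches(sld):
    """Count letter/digit transitions between adjacent alphanumeric characters."""
    return sum(1 for a, b in zip(sld, sld[1:])
               if a.isalnum() and b.isalnum() and a.isdigit() != b.isdigit())

def domain_set_features(domains):
    """Compute a few set-level features; assumes every SLD is non-empty."""
    slds = [d.lower().partition(".")[0] for d in domains]
    digit_ratios = [sum(c.isdigit() for c in s) / len(s) for s in slds]
    return {
        "sld_len_min": min(len(s) for s in slds),
        "sld_len_max": max(len(s) for s in slds),
        "digit_ratio_max": max(digit_ratios),
        # Share of SLDs in the set that contain at least one vowel.
        "vowel_sld_ratio": sum(any(c in VOWELS for c in s) for s in slds) / len(slds),
        "switch_max": max(switches(s) for s in slds),
    }

# Hypothetical feature list (N = 5).
FEATURE_LIST = ["sld_len_min", "sld_len_max", "digit_ratio_max",
                "vowel_sld_ratio", "switch_max"]

def build_feature_matrix(training_samples, feature_list=FEATURE_LIST):
    """One row per sample: N feature values followed by the family label."""
    matrix = []
    for domains, family in training_samples:
        feats = domain_set_features(domains)
        matrix.append([feats[name] for name in feature_list] + [family])
    return matrix

# M = 2 made-up training samples with placeholder family labels.
training_samples = [
    (["ab12cd.com", "xyz9.net"], "family_a"),
    (["0a0a0a0a.info", "1b1b1b1b.info"], "family_b"),
]
matrix = build_feature_matrix(training_samples)
```

The resulting `matrix` has M rows and N+1 columns and can be handed to a classification algorithm or summarized into configured rules.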
Step 3: The rule creation function creates the DGA family type classification rules from the feature matrix of the training samples:
(9) Select a classification algorithm; for example, the decision tree C4.5 algorithm can be chosen to learn the feature data computed in step (8). The decision tree model that the algorithm creates from the feature data is the DGA family type classification rule that is ultimately needed. To make the model more accurate, more samples can be prepared and training and testing repeated; the rule with the highest recognition accuracy in testing is finally selected.
If a classification algorithm such as a decision tree is not used, the feature matrix obtained in step (8) can instead be converted into a configurable form: each feature is treated as a configurable attribute, and one rule is configured for each type of DGA algorithm. Rules created in this way must likewise be tested for accuracy with the test samples after creation, and the thresholds of some attributes adjusted. In a specific implementation, rules can be described with a syntax similar to the following (in the rule below, the character '#' means that the rest of the current line is a comment; the text to the left of the character ':' is a feature name, i.e. a configurable attribute, and the text to the right of ':' is the value of that feature):
DGA: bamital  # the DGA family type
TLD: co.cc, cz.cc, info, org  # TLD value range; the TLD of a domain name output by the algorithm is one of these four TLDs
SLD SIZE: 32  # the length of the SLD part of the domain name is fixed at 32 characters
COUNT: 104/D  # this type of algorithm generates 104 different domain names per day; D means the time period for generating domain names is one day
ALPHA DIGIT SWITCH: 1  # in the SLD, alphabetic and numeric characters switch frequently
VALUE SOURCE: HASH  # the SLD value consists of hash characters (i.e. the characters '0'~'9' and 'a'~'f')
HASH POS: 0  # start position of the hash string in the SLD; 0 means it starts at the first character of the SLD
HASH LENGTH: 32  # length of the hash string in the SLD
...  # configure other possible features as needed in the specific implementation
...  # configure other possible features as needed in the specific implementation
EXP: 51bdc61022f0108b7053c5518ae87761.cz.cc, b7422ac536814a6bc6af0cf574e5d60d.info, 00c58006323de055d35ef57ff97f8036.co.cc, 9bbf4817211f069df3befe28af3e0ebf.org  # domain name samples
After the rules are created, they are stored in a configuration file or a relational database, and are loaded and used by the detection module.
(10) After the rules are created, they are tested with the test samples prepared in step (8) to measure how accurately the created rules classify real samples. If the accuracy does not reach the expected effect, the cause is analyzed, and depending on the cause it is decided whether features need to be re-selected, whether training samples need to be re-selected, and so on. If the accuracy reaches the expected effect, for example the DGA family type of the test samples is identified with 95% accuracy, the rule creation is considered successful.
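The acceptance test in step (10) amounts to measuring accuracy on labelled test samples and comparing it with a threshold. The sketch below assumes a classifier passed in as a function; `toy_classify` is a deliberately crude, hypothetical stand-in that decides by SLD length only, and the test samples are made up.

```python
def accuracy(classify, test_samples):
    """Fraction of (domains, family) test samples that `classify` labels correctly."""
    if not test_samples:
        raise ValueError("at least one test sample is required")
    correct = sum(classify(domains) == family for domains, family in test_samples)
    return correct / len(test_samples)

def rule_is_acceptable(classify, test_samples, threshold=0.95):
    """True if the rules reach the expected accuracy (95% in the example above)."""
    return accuracy(classify, test_samples) >= threshold

def toy_classify(domains):
    # Hypothetical stand-in for the created rules: looks at SLD length only.
    lengths = {len(d.split(".", 1)[0]) for d in domains}
    if lengths == {32}:
        return "bamital"
    if lengths == {12}:
        return "qadars"
    return None

# Made-up test samples; the third one is built so that the toy classifier fails on it.
test_samples = [
    (["0a" * 16 + ".info"], "bamital"),
    (["abc123def456.com"], "qadars"),
    (["abc123def45.com"], "qadars"),
    (["1f" * 16 + ".org"], "bamital"),
]
measured = accuracy(toy_classify, test_samples)
accepted = rule_is_acceptable(toy_classify, test_samples)
```

Here the measured accuracy falls short of the threshold, which in the method would trigger re-selection of features or training samples.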
Step 4: The acquisition module collects DNS protocol traffic and HTTP protocol traffic to obtain the raw traffic data.
The DNS traffic is used to collect the domain name resolution traffic sent and received by the computer devices protected by this detection method. By checking whether the domain names these computers request were generated by some DGA algorithm, it is determined whether the devices have been infected by malware (i.e. a virus or trojan) and which DGA family type the malware is related to. The HTTP traffic is used to record the HTTP operations that the malware may go on to request once a malicious domain name has been resolved successfully (for example, downloading a newer version of the malware itself, or uploading collected sensitive information to the C&C server), which benefits subsequent risk detection.
Step 5: The protocol parsing module parses the DNS and HTTP protocol traffic according to the protocol specifications and generates the original network behavior information, i.e. traffic data that the subsequent function modules can process: source IP, destination IP, source port, destination port, the domain name requested for resolution, the result of the resolution, the request time, the HTTP request operation and the returned information, etc.
Step 6: The protocol parsing module filters the domain names requested by the enterprise's computer devices against the domain name whitelist: if a requested domain name can be found in the whitelist, it is considered a normal, ordinary domain name, and the DGA family type it belongs to is not detected further.
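The whitelist filtering in step 6 can be sketched as a simple set lookup. The record layout `(source_ip, domain)` and the function name are assumptions; the lookup is an exact match on the normalised name, since the text only says the name must be "found" in the whitelist.

```python
def filter_whitelisted(requests, whitelist):
    """Keep only (source_ip, domain) records whose domain is not in the whitelist."""
    normalised = {d.lower().rstrip(".") for d in whitelist}
    return [(ip, domain) for ip, domain in requests
            if domain.lower().rstrip(".") not in normalised]

# Made-up records and whitelist entries, for illustration only.
whitelist = {"example.com", "example.org"}
requests = [
    ("10.0.0.5", "example.com"),
    ("10.0.0.5", "Example.ORG."),
    ("10.0.0.5", "0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a.info"),
]
remaining = filter_whitelisted(requests, whitelist)
```

Only the records that survive the filter are passed, together with the requesting IP address, to the detection module.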
Step 7: The detection module detects the DGA family type according to the rules:
(11) After the rules have been created in step 3, the detection module loads and uses them. When the detection module detects in live traffic that a computer is frequently requesting resolution of suspicious domain names, the detection program takes the domain names that the computer requested within a period of time, such as 10 minutes or half an hour, as one sample and, using the feature list obtained in step (7), computes the features of that sample.
(12) The sample features computed in step (11) are matched against the loaded rules.
If the classification rule is a decision tree, matching against the rule yields an output value, and this output value is the DGA family type; otherwise the match fails and processing goes to step 8.
If the classification rules do not use a decision tree model, the detection program uses the sample features to match against all rules. During matching it computes a matching degree, for example expressed as a score, eliminates the rules whose features do not match one by one, and finally selects the rule with the highest matching degree, i.e. the DGA family type that best matches the features of the domain name set. A threshold for the matching degree can be set from testing experience; for example, even the rule with the highest matching degree is required to have at least 90% of the domain names in the set under detection match the features that the rule describes.
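The sampling described in step (11), where the domain names one computer requests within a period of time form one sample, can be sketched by bucketing parsed DNS records per source IP and time window. The record layout `(timestamp_seconds, source_ip, domain)` and the fixed, non-overlapping 10-minute windows are assumptions made for this illustration.

```python
from collections import defaultdict

def group_by_ip_and_window(records, window_seconds=600):
    """Map (source_ip, window_index) to the set of domains requested in that window."""
    groups = defaultdict(set)
    for timestamp, source_ip, domain in records:
        window_index = int(timestamp // window_seconds)
        groups[(source_ip, window_index)].add(domain)
    return dict(groups)

# Made-up parsed DNS records: (timestamp in seconds, source IP, requested domain).
records = [
    (0, "10.0.0.1", "a.example"),
    (599, "10.0.0.1", "b.example"),
    (600, "10.0.0.1", "c.example"),
    (10, "10.0.0.2", "a.example"),
]
samples = group_by_ip_and_window(records)
```

Each value in `samples` is one domain name set whose features are then computed and matched against the loaded rules.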
Step 8: Handling the case where no DGA family type is detected.
If no DGA family type with matching features is found for the requested domain names, one of the following two situations probably applies:
(13) The requested domain names are normal domain names; the domain name whitelist then needs to be supplemented and updated.
(14) The requested domain names are malicious domain names, which divides into three cases:
a. It is an existing virus program, but the features of the domain names it requests are not covered by the rules already created.
b. It is a new variant of an existing virus program, or the virus program uses different input parameters for an existing DGA algorithm, so that the features of the output domain name set change greatly.
c. It is a brand-new virus program, and no complete material on the features of the domain names it requests is available on the network yet.
For the three cases above, it is necessary to regularly follow network security news and the latest information (on worms, trojans and botnets) published on the websites of network security companies. If a new type of virus or trojan appears and it generates malicious domain names with a DGA algorithm that differs from the existing ones, that DGA algorithm must be processed according to steps (1) to (10) above: look for the algorithm or virus samples, run the algorithm, or submit the virus sample to a sandbox for simulated execution (capturing the network traffic at the same time to obtain the domain names the virus sample requests), obtain the domain name set generated by the DGA algorithm that the virus sample calls, use it as a sample, choose sample features, and create the rules again.
In the detection method above, steps 1 to 3 require the participation of data analysts: collecting material on the samples (domain name sets), obtaining samples, selecting features, training with the training samples, and testing with the test samples. Steps 1 and 8 require the participation of network security researchers, since they involve tracing attacks back to their source and analyzing malware samples.
The function implementing the above method for detecting the DGA family type of malicious domain names can be deployed as a function module or subsystem of a network security detection system, such as an APT intrusion detection system, which is typically deployed at the ingress and egress of an enterprise network to monitor and analyze the network traffic of the whole enterprise.
Finally, it should be noted that the above is only a specific embodiment of the present invention. The invention is obviously not limited to the above embodiment and admits many variations. All variations that a person of ordinary skill in the art can directly derive or infer from the disclosure of the present invention are to be considered within the scope of protection of the present invention.

Claims (5)

  1. A rule-based method for detecting the DGA family to which malicious domain names belong, for analyzing and detecting malicious domain names and identifying the DGA family of the virus infecting an attacked computer device in a network, characterized in that the method comprises the following steps:
    Step 1: Collect material on DGA algorithms from the internet, and obtain samples of the domain names generated by these DGA algorithms;
    Step 2: For each DGA algorithm, choose the domain name set features to be analyzed, finally forming one overall feature list; select at least one training sample for each DGA algorithm and, for each feature in the feature list, compute and summarize the corresponding feature value of the training samples, finally forming a feature matrix; this specifically comprises the following sub-steps:
    Step (2A): Take the domain name set that each DGA algorithm of step 1 generates in one time period as one sample, and list the domain name features that need to be analyzed;
    Step (2B): Take the set union of the domain name features to be analyzed that step (2A) lists for each sample, finally obtaining one feature list;
    Step (2C): Divide the at least two domain name sets generated by each DGA algorithm into two parts: one part serves as training samples, the number of training samples being M; the other part serves as test samples;
    For the M training samples, compute the feature values respectively: if the feature list obtained in step (2B) contains N features, the values of these N features are computed for each sample; together with the DGA family type to which each sample belongs, the M training samples form, after computation, a feature matrix of M rows and N+1 columns, where M and N are natural numbers greater than zero;
    Step 3: The rule creation function creates the DGA family type classification rules from the feature matrix of the training samples;
    After the DGA family type classification rules are created, they are stored in a configuration file or a relational database, and are loaded and used by the detection module; the detection module is used to detect the DGA family type of a specific domain name set sample;
    The detection module is a program with the DGA family type detection function, i.e. a program that can detect the DGA family type of a specific domain name set sample; given the feature values of an input domain name set sample, the detection module determines, according to the created DGA family type classification rules, the DGA family type of the domain name set sample, or that another situation applies;
    Step 4: The acquisition module collects DNS protocol traffic and HTTP protocol traffic to obtain the raw traffic data;
    The acquisition module is used for network traffic acquisition; it can capture data directly from a network card, or directly receive traffic data sent by other systems;
    The DNS protocol traffic refers to the requests that a computer device sends to a DNS server to resolve the IP address corresponding to a domain name, and the resolution results that the DNS server returns; by collecting the DNS protocol traffic of the computer devices to be protected and checking whether the domain names these computers request were generated by some DGA algorithm, it is determined whether the computer devices have been infected by malware and which DGA family type the malware is related to;
    The HTTP protocol traffic is used to record the HTTP operations that the malware may go on to request once a malicious domain name has been resolved successfully, which facilitates subsequent risk detection;
    Step 5: The protocol parsing module parses the DNS protocol traffic and HTTP protocol traffic according to the protocol specifications and restores the original network behavior information, obtaining traffic data that the subsequent function modules can process;
    The protocol parsing module can parse, according to the protocol specifications, the information of both communicating parties in the network traffic data, including source IP, destination IP, source port, destination port, request content and response information;
    Step 6: The protocol parsing module filters the domain names requested by the computer devices to be protected against the domain name whitelist; if a domain name can be found in the whitelist, it is considered a normal, ordinary domain name, the DGA family type it belongs to is not detected further, and processing continues with the next domain name; if it is not found in the whitelist, the domain name and the IP address of the computer that requested it are sent to the detection module, and processing proceeds to step 7;
    Step 7: The detection module detects the DGA family type according to the rules, specifically comprising the following sub-steps:
    Step (7A): The detection module loads and uses the DGA family type classification rules created in step 3 and receives the computer IP and the domain names requested by that computer sent from step 6; the detection module takes the multiple domain names that the computer requested within a period of time as one domain name set sample and, using the feature list obtained in step (2B), computes and summarizes the features of the domain name set sample;
    Step (7B): The sample features computed in step (7A) are matched against the DGA family type classification rules loaded by the detection module;
    If the match succeeds, the domain names in the domain name set sample were requested by malware related to that DGA family, which further indicates that the computer has been infected by the malware; if the match fails, processing continues with step 8;
    Step 8: If no DGA family type with matching features is found for the requested domain names, one of the following two situations applies:
    (1) The requested domain names are normal domain names; the domain name whitelist is then supplemented and updated;
    (2) The requested domain names are malicious domain names, which divides into three cases:
    a. It is an existing virus program, but the features of the domain names it requests are not covered by the rules already created;
    b. It is a new variant of an existing virus program, or the virus program uses different input parameters for an existing DGA algorithm, so that the features of the output domain name set change greatly;
    c. It is a brand-new virus program, and no complete material on the features of the domain names it requests is available on the network yet.
  2. The rule-based method for detecting the DGA family to which a malicious domain name belongs according to claim 1, characterized in that step 1 specifically comprises the following sub-steps:
    Step (1A): Search the network for material related to a given DGA algorithm, including program code for generating domain names and pseudocode describing the algorithm;
    If code related to the DGA algorithm is found on the network:
    For obtained program code that generates domain names, run the program code and collect the domain names that the program outputs;
    For obtained pseudocode describing the algorithm, rewrite it as executable code in a specific programming language and run it to obtain the output domain names, or deduce from the description of the algorithm the domain names it would output when actually executed;
    Then jump to step (1D);
    If no code related to the DGA algorithm is found on the network, continue with step (1B);
    Step (1B): Search for domain name samples generated by the DGA algorithm;
    If domain name samples generated by the DGA algorithm are found, jump to step (1D);
    If no domain name samples generated by the DGA algorithm are found, continue with step (1C);
    Step (1C): Download a malicious program sample related to the DGA algorithm from a virus detection website or a network security website, then run the sample in a sandbox; while it runs, capture network traffic using network sniffing technology, obtain the domain names that the sample requests to resolve, and collect them as domain name samples; by running the sample repeatedly in the sandbox, obtain the principal features of the domain names that the sample requests to resolve;
    Step (1D): Check whether any known DGA algorithm still lacks a corresponding set of domain name samples; if so, repeat from step (1A); stop the loop once a set of domain name samples has been obtained for every currently known DGA algorithm.
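The fallback order of steps (1A) to (1D) can be sketched as a loop that tries each source in turn. This is a minimal illustration, not the patented implementation: the three source callables (`from_code`, `from_public_samples`, `from_sandbox`) are hypothetical placeholders for running published generator code, looking up published samples, and sandbox capture, and the domain names in the usage lines are made up.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A source takes a DGA family name and returns domain names, or None if nothing was found.
Source = Callable[[str], Optional[List[str]]]


def collect_samples(families: Iterable[str],
                    from_code: Source,
                    from_public_samples: Source,
                    from_sandbox: Source) -> Dict[str, List[str]]:
    """Try the sources in the order of steps (1A), (1B), (1C) for every family."""
    samples: Dict[str, List[str]] = {}
    for family in families:  # step (1D): continue until every known family was visited
        for source in (from_code, from_public_samples, from_sandbox):
            domains = source(family)
            if domains:  # the first source that yields domains wins
                samples[family] = domains
                break
    return samples


# Usage with stub sources (all names and domains here are invented for illustration).
code_db = {"family_a": ["qwhzxk.example", "plmtrv.example"]}
public_db = {"family_b": ["a1b2c3d4.example"]}

result = collect_samples(
    ["family_a", "family_b", "family_c"],
    from_code=code_db.get,
    from_public_samples=public_db.get,
    from_sandbox=lambda family: None,  # pretend the sandbox run produced nothing
)
print(result)
```

A family for which no source yields domains (here `family_c`) is simply absent from the result, which makes it easy to see which families still need samples.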
  3. The rule-based method for detecting the DGA family to which a malicious domain name belongs according to claim 1, characterized in that, in step 1, current typical DGA family types include but are not limited to: bamital, banjori, blackhole, chinad, conficker, cryptolocker, dircrypt, dyre, emotet, fobber, gameover, gspy, locky, madmax, matsnu, mirai, murofet, necurs, nymaim, proslikefan, pykspa, qadars, ramnit, ranbyus, rovnix, shifu, simda, suppobox, symmi, tempedreve, tinba, tofsee, vawtrak, vidro, virut.
  4. The rule-based method for detecting the DGA family to which a malicious domain name belongs according to claim 1, characterized in that, in step 3, rule creation specifically comprises the following two modes:
    Mode (3A): Select a classification algorithm, learn from the feature matrix calculated in step (2C), and output a classification model; this model is the DGA family type classification rule that is ultimately needed;
    To make the model more accurate, multiple groups of training samples can be prepared and training repeated to obtain multiple models; each model is then evaluated on the test samples, and the model with the highest DGA family type identification accuracy in testing is finally selected;
    Mode (3B): Convert the feature matrix obtained in step (2C) into a configurable form, that is, treat each feature in the feature list finally obtained in step (2B) as a configurable attribute, and configure one rule for each type of DGA algorithm; these rules serve as the DGA family type classification rules;
    After the rules are created, test them with the test samples prepared in step (2C) to measure the accuracy with which the created rules classify actual samples; if the accuracy does not reach the preset expected threshold, analyze the cause and handle it accordingly; if the accuracy reaches the preset expected threshold, rule creation has succeeded.
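The evaluation described above (test each candidate on the test samples, keep the most accurate one, and accept it only if it reaches the expected threshold) can be sketched as follows. This is an illustrative sketch under assumptions: a candidate is modeled as any callable from a feature dictionary to a family label, and the feature name `length`, the labels, and the threshold value are hypothetical, since the claim does not fix them.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], Optional[str]]
Sample = Tuple[Features, str]  # (feature values, true DGA family label)


def accuracy(classify: Classifier, test_samples: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted family equals the true family."""
    if not test_samples:
        return 0.0
    correct = sum(1 for feats, label in test_samples if classify(feats) == label)
    return correct / len(test_samples)


def select_best(candidates: Dict[str, Classifier],
                test_samples: Sequence[Sample],
                threshold: float) -> Optional[Tuple[str, float]]:
    """Return (name, accuracy) of the most accurate candidate, or None if it misses the threshold."""
    if not candidates:
        return None
    scored = {name: accuracy(clf, test_samples) for name, clf in candidates.items()}
    best = max(scored, key=lambda name: scored[name])
    return (best, scored[best]) if scored[best] >= threshold else None


# Usage with two toy candidates (hypothetical feature and labels).
test_samples = [
    ({"length": 12.0}, "family_a"),
    ({"length": 30.0}, "family_b"),
    ({"length": 11.0}, "family_a"),
    ({"length": 28.0}, "family_b"),
]
candidates = {
    "by_length": lambda f: "family_a" if f["length"] < 20 else "family_b",
    "always_a": lambda f: "family_a",
}
print(select_best(candidates, test_samples, threshold=0.9))
```

Returning `None` when the best accuracy is below the threshold corresponds to the branch in which the cause has to be analyzed and the rules reworked.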
  5. The rule-based method for detecting the DGA family to which a malicious domain name belongs according to claim 1, characterized in that the matching process in step (7B) is as follows:
    If the DGA family type classification rule is a decision tree, then after the rule is matched, a successful match yields an output value, which is the DGA family type; otherwise the match fails;
    If the DGA family type classification rule does not use a decision tree model, the detection module uses the sample features to match against all rules; during matching, it calculates the matching degree and eliminates, one by one, the rules whose features do not match: if the matching degree of at least one rule exceeds the matching degree threshold, the rule with the highest matching degree is selected, giving the DGA family type that best matches the features of the domain name set; if no rule's matching degree exceeds the matching degree threshold, the match fails.
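The non-decision-tree branch of the matching process can be sketched as below. The claim does not define how the matching degree is computed, so this sketch assumes one simple option: each rule lists an allowed range per attribute, and the matching degree is the fraction of attributes whose sample value falls inside its range. The attribute names, ranges, family names, and threshold are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Rule = Dict[str, Tuple[float, float]]  # attribute -> inclusive (min, max) range


def matching_degree(features: Dict[str, float], rule: Rule) -> float:
    """Assumed definition: share of rule attributes satisfied by the sample features."""
    if not rule:
        return 0.0
    hits = sum(1 for attr, (lo, hi) in rule.items()
               if attr in features and lo <= features[attr] <= hi)
    return hits / len(rule)


def match_family(features: Dict[str, float],
                 rules: Dict[str, Rule],
                 threshold: float) -> Optional[str]:
    """Return the family whose rule has the highest degree above the threshold, else None."""
    best_family: Optional[str] = None
    best_degree = threshold  # a rule must strictly exceed the threshold to be considered
    for family, rule in rules.items():
        degree = matching_degree(features, rule)
        if degree > best_degree:
            best_family, best_degree = family, degree
    return best_family


# Usage with two made-up rules over the features of a set of requested domain names.
rules = {
    "family_a": {"length": (10, 16), "digit_ratio": (0.0, 0.1)},
    "family_b": {"length": (24, 32), "digit_ratio": (0.3, 0.6)},
}
print(match_family({"length": 12, "digit_ratio": 0.05}, rules, threshold=0.8))
print(match_family({"length": 20, "digit_ratio": 0.2}, rules, threshold=0.8))
```

A result of `None` corresponds to a failed match; a stricter or weighted matching degree could be swapped in without changing the selection logic.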
CN201710855704.0A 2017-09-20 2017-09-20 Rule-based method for detecting DGA family to which malicious domain name belongs Active CN107645503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710855704.0A CN107645503B (en) 2017-09-20 2017-09-20 Rule-based method for detecting DGA family to which malicious domain name belongs

Publications (2)

Publication Number Publication Date
CN107645503A true CN107645503A (en) 2018-01-30
CN107645503B CN107645503B (en) 2020-01-24

Family

ID=61112668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710855704.0A Active CN107645503B (en) 2017-09-20 2017-09-20 Rule-based method for detecting DGA family to which malicious domain name belongs

Country Status (1)

Country Link
CN (1) CN107645503B (en)

Also Published As

Publication number Publication date
CN107645503B (en) 2020-01-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310051 No. 188 Lianhui Street, Xixing Street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Annan information technology Limited by Share Ltd

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Applicant before: Dbappsecurity Co.,ltd.

GR01 Patent grant