CN105430112B - Provisional domain name recognition methods and system - Google Patents

Provisional domain name recognition methods and system

Info

Publication number
CN105430112B
Authority
CN
China
Prior art keywords
domain name
domain
inquiry
child node
provisional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510736531.1A
Other languages
Chinese (zh)
Other versions
CN105430112A (en)
Inventor
尉迟学彪
潘蓝兰
李晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Internet Network Information Center
Original Assignee
China Internet Network Information Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Internet Network Information Center
Priority to CN201510736531.1A
Publication of CN105430112A
Application granted
Publication of CN105430112B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/30 Managing network names, e.g. use of aliases or nicknames
    • H04L61/3015 Name registration, generation or assignment
    • H04L61/3025 Domain name generation or assignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/256 NAT traversal
    • H04L61/2575 NAT traversal using address mapping retrieval, e.g. simple traversal of user datagram protocol through session traversal utilities for NAT [STUN]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5076 Update or notification mechanisms, e.g. DynDNS

Abstract

The present invention provides a provisional domain name recognition method based on a domain name query database, comprising the following steps: reading the domain name query database and constructing a domain name query tree from the domain name query request information; performing feature extraction, according to the characteristics of provisional domain names, on all child nodes of the domain name query tree except leaf nodes, to obtain the domain name features of each child node; clustering all child nodes of the domain name query tree except leaf nodes according to the extracted domain name features, to obtain multiple subsets; and screening out, from the multiple subsets, the subsets that contain fewer child nodes than a threshold as suspected provisional domain subsets, and outputting a list of suspected provisional domain names according to the suspected provisional domain subsets. A system implementing the above method is also provided.

Description

Provisional domain name recognition methods and system
Technical field
The present invention relates to the field of information technology, and in particular to a provisional domain name recognition method and system.
Background art
In recent years, provisional domain names have come into heavy use as a new type of domain name by certain Internet services (such as antivirus and instant messaging services that need frequent updates). Some of the fields in such a domain name are usually generated at random by a specific algorithm. These names are very numerous, but their overall frequency of use is very low, somewhat as if each were used only temporarily. Although they are normal domain names used by legitimate Internet services, their appearance in large numbers inevitably has a considerable impact on the efficiency of domain name services, especially the caching service of recursive domain name servers. It is therefore necessary to discover and identify such domain names specifically, so that domain name service providers can learn of the situation in time and take countermeasures when necessary.
Two kinds of related domain name identification techniques currently exist in the industry. One identifies domain names used for malicious purposes (such as spam or botnets); the other identifies abnormal domain names (such as invalid or misconfigured domain names). Because provisional domain names are normal domain names used by legitimate Internet services, their characteristics differ greatly from those of malicious or abnormal domain names, so neither technique can identify provisional domain names effectively.
Summary of the invention
In view of the above problems, the object of the present invention is to provide a provisional domain name recognition method and system that identifies provisional domain names efficiently and accurately according to their characteristics.
To achieve this object, the invention adopts the following technical solution:
A provisional domain name recognition method, based on a domain name query database, comprising the following steps:
Reading the domain name query database, and constructing a domain name query tree from the domain name query request information in the database; the root of the domain name query tree is the root domain of the name space, and the child nodes under the root correspond to the domain fields of domain names; the further left a domain field is in a domain name, the lower the level of its corresponding child node; the child node corresponding to the leftmost domain field of a domain name is a leaf node; and each child node carries a weight indicating how frequently its domain field occurs in the domain name query database;
Performing feature extraction, according to the characteristics of provisional domain names, on all child nodes of the domain name query tree except leaf nodes, to obtain the domain name features of each child node;
Clustering all child nodes of the domain name query tree except leaf nodes according to the extracted domain name features, to obtain multiple subsets;
Screening out, from the multiple subsets, the subsets that contain fewer child nodes than a threshold as suspected provisional domain subsets, and outputting a list of suspected provisional domain names according to the suspected provisional domain subsets.
The domain name query request information comprises: the records, stored in the domain name query request raw log of the domain name query database, that are generated when a domain name is used.
The characteristics of provisional domain names include:
1) the usage frequency of a provisional domain name, and of most domain names in the domain it belongs to, is close to 0;
2) the leftmost field of a provisional domain name, and of most domain names in the domain it belongs to, is a randomly generated string.
The domain name features of a child node include:
1) the number of branch child nodes under the child node;
2) the median of the occurrence frequencies of the branch child nodes under the child node;
3) the mean of the entropies of the domain fields corresponding to the branch child nodes under the child node;
4) the variance of the entropies of the domain fields corresponding to the branch child nodes under the child node.
Further, the threshold is 50.
Further, the clustering algorithm may be K-MEANS or K-MEDOIDS.
Further, each subset obtained after clustering comprises the domain fields corresponding to its child nodes and the domain name features of those child nodes.
Outputting a list of suspected provisional domain names according to the suspected provisional domain subsets comprises: determining whether each suspected provisional domain subset contains one or more child nodes whose corresponding domain field is a known provisional domain field; if so, outputting in turn the domain names corresponding to each child node in that subset and to all branch child nodes of that child node, thereby forming the list of suspected provisional domain names.
A provisional domain name identification system, based on a domain name query database, comprising:
A domain name query tree construction module, configured to read the domain name query database and construct a domain name query tree from the domain name query request information in the database; wherein the root of the domain name query tree is the root domain of the name space, the child nodes under the root correspond to the domain fields of domain names, the further left a domain field is in a domain name the lower the level of its corresponding child node, the child node corresponding to the leftmost domain field of a domain name is a leaf node, and each child node carries a weight indicating how frequently its domain field occurs in the domain name query database;
A domain name feature extraction module, configured to perform feature extraction, according to the characteristics of provisional domain names, on all child nodes of the domain name query tree except leaf nodes, to obtain the domain name features of each child node;
A domain name clustering module, configured to cluster all child nodes of the domain name query tree except leaf nodes according to the domain name features of each child node, to obtain multiple subsets; to screen out, from the multiple subsets, the subsets that contain fewer child nodes than a threshold as suspected provisional domain subsets; and to output a list of suspected provisional domain names according to the suspected provisional domain subsets.
By adopting the above technical solution, the provisional domain name recognition method and system of the invention have the following advantages over the prior art:
(1) by performing domain name recognition specifically for provisional domain names, provisional domain names can be identified quickly through fast screening;
(2) because domain name features are extracted from domain name query data, the entire recognition process is independent of the domain name service and does not affect it;
(3) the recognition process does not require collecting sample data or training on it, which reduces labor cost;
(4) the feature extraction rules can be freely customized, and the clustering algorithm can be chosen flexibly.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of provisional domain name recognition in an embodiment of the invention.
Fig. 2 is a schematic diagram of the structure of the domain name query tree in an embodiment of the invention.
Fig. 3 is the list of subsets after clustering in an embodiment of the invention.
Fig. 4 shows part of the contents of one subset in an embodiment of the invention.
Detailed description of the embodiments
To make the above features and advantages of the invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
First, the working principle and technical concept of the invention are explained.
Provisional domain names generally have the following characteristics:
(1) the usage frequency of the domain name, and of most domain names in the domain it belongs to, is close to 0;
(2) the leftmost field of the domain name, and of most domain names in the domain it belongs to, is a randomly generated string.
The present invention identifies provisional domain names on the basis of these characteristics.
The provisional domain name recognition method provided by the invention is shown in Fig. 1.
The method is based on a domain name query database and comprises the following steps:
Reading the domain name query database, and constructing a domain name query tree from the domain name query request information in the database; the root of the domain name query tree is the root domain of the name space, and the child nodes under the root correspond to the domain fields of domain names; the further left a domain field is in a domain name, the lower the level of its corresponding child node; the child node corresponding to the leftmost domain field of a domain name is a leaf node; and each child node carries a weight indicating how frequently its domain field occurs in the domain name query database;
Performing feature extraction, according to the characteristics of provisional domain names, on all child nodes of the domain name query tree except leaf nodes, to obtain the domain name features of each child node;
Clustering all child nodes of the domain name query tree except leaf nodes according to the extracted domain name features, to obtain multiple subsets;
Screening out, from the multiple subsets, the subsets that contain fewer child nodes than a threshold as suspected provisional domain subsets, and outputting a list of suspected provisional domain names according to the suspected provisional domain subsets.
Here, the domain name query database records the raw log of domain name query requests from end users received by the recursive domain name server, and is connected to the provisional domain name identification system as its input. Each time a domain name is used, one record is generated; the record contains all fields of the domain name and is stored in the log.
The provisional domain name identification system that implements the above method consists mainly of three modules: a domain name query tree construction module, a domain name feature extraction module, and a domain name clustering module.
The domain name query tree construction module reads the domain name query database and constructs the domain name query tree from the domain name query request information in the database. The root of the tree is the root domain "root" of the name space; the child nodes of the root are the top-level domain fields of the name space (such as "com" and "cn"), and the second-level child nodes are the second-level domain fields (such as "baidu" and "taobao"). In addition, every node in the tree carries a weight indicating how frequently its field occurs in the domain name query database.
As a simple example, suppose the following sequence of domain name query requests is observed in the domain name query database over a certain period:
aa.example.cn
a.example.com
aa.example.cn
ab.bb.example.cn
cd.bb.example.cn
The domain name query tree built by the construction module is then as shown in Fig. 2.
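The tree construction for this query sequence can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class and the `build_query_tree` and `find` helpers are hypothetical names, and the weight is assumed to be the number of logged queries that pass through a node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the query tree: a domain field plus its occurrence count."""
    label: str
    weight: int = 0
    children: dict = field(default_factory=dict)


def build_query_tree(queries):
    """Build the tree from an iterable of queried domain names."""
    root = Node("root")
    for name in queries:
        node = root
        # Walk the fields right to left: top-level field first, leftmost last.
        for label in reversed(name.lower().rstrip(".").split(".")):
            node = node.children.setdefault(label, Node(label))
            node.weight += 1  # assumption: each query passing through counts once
    return root


def find(root, domain):
    """Return the node for a domain such as 'example.cn' (KeyError if absent)."""
    node = root
    for label in reversed(domain.split(".")):
        node = node.children[label]
    return node


queries = [
    "aa.example.cn",
    "a.example.com",
    "aa.example.cn",
    "ab.bb.example.cn",
    "cd.bb.example.cn",
]
tree = build_query_tree(queries)
for label, child in sorted(find(tree, "example.cn").children.items()):
    print(label, child.weight)
```

Under this counting rule, the node for example.cn has the branch child nodes "aa" and "bb", each with weight 2, and the leftmost fields ("aa", "a", "ab", "cd") end up as leaf nodes.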
The domain name feature extraction module performs feature extraction on all child nodes of the domain name query tree except leaf nodes. Based on how provisional domain names differ from non-provisional ones, the feature extraction rules include but are not limited to:
1) the number of branch child nodes under the child node; a child node corresponding to a provisional domain usually has relatively many branch child nodes.
2) the median of the occurrence frequencies of the branch child nodes under the child node; the branch child nodes under a child node corresponding to a provisional domain occur relatively infrequently.
3) the mean of the entropies of the domain fields corresponding to the branch child nodes under the child node; for a child node corresponding to a provisional domain, this mean is relatively large.
4) the variance of the entropies of the domain fields corresponding to the branch child nodes under the child node; for a child node corresponding to a provisional domain, this variance is relatively large.
For the child node corresponding to the domain example.cn in the query request sequence above, the four features are:
Number of branch child nodes: 2
Median occurrence frequency of the branch child nodes' domain fields: 2
Mean entropy of the branch child nodes' domain fields: 0
Variance of the entropy of the branch child nodes' domain fields: 0
Likewise, for the child node corresponding to the domain bb.example.cn in the query request sequence above, the four features are:
Number of branch child nodes: 2
Median occurrence frequency of the branch child nodes' domain fields: 1
Mean entropy of the branch child nodes' domain fields: 1
Variance of the entropy of the branch child nodes' domain fields: 0
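The four features can be computed with a short sketch like the one below. The text does not define how entropy is measured, so two assumptions are made here: entropy is the character-level Shannon entropy of a domain field in bits, and variance is the population variance. The function names `label_entropy` and `node_features` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median, pvariance


def label_entropy(label):
    """Character-level Shannon entropy of a domain field, in bits (assumed definition)."""
    counts = Counter(label)
    total = len(label)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def node_features(children):
    """Four features of a non-leaf node.

    `children` maps each branch child node's domain field to its weight
    (occurrence count in the query log).
    """
    entropies = [label_entropy(label) for label in children]
    return (
        len(children),               # 1) number of branch child nodes
        median(children.values()),   # 2) median occurrence frequency
        mean(entropies),             # 3) mean entropy of the child fields
        pvariance(entropies),        # 4) population variance of those entropies
    )


# Branch child nodes and weights taken from the example query sequence.
print(node_features({"aa": 2, "bb": 2}))  # node for example.cn
print(node_features({"ab": 1, "cd": 1}))  # node for bb.example.cn
```

With this entropy definition a field such as "aa" scores 0 bits and a field such as "ab" scores 1 bit, so long random-looking fields push the mean entropy of a provisional domain's child nodes upward.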
As can be seen from the above, the domain name features of child nodes corresponding to provisional domains differ considerably from those of other child nodes. After domain name feature extraction, the domain name clustering module clusters all child nodes in the domain name query tree according to these features, so that provisional domains are separated into distinct subsets. Moreover, because provisional domains are far fewer than non-provisional ones among the vast number of domain names, the great majority of provisional domains end up in subsets that contain few subdomains. The clustering algorithm used may be a general-purpose algorithm such as K-MEANS or K-MEDOIDS.
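The clustering step can be illustrated with a bare-bones k-means (Lloyd's algorithm) over the four-dimensional feature vectors. This is only a sketch under stated assumptions: the `kmeans` function is hypothetical, it seeds the centroids deterministically with the first k distinct points, and it skips the feature scaling that real data would likely need. The last two feature vectors are made-up values for zones with many random-looking child nodes.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    # Deterministic initialisation (assumption): first k distinct points.
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


# (child count, median frequency, entropy mean, entropy variance)
features = [
    (2, 2.0, 0.0, 0.0),    # example.cn
    (1, 1.0, 0.0, 0.0),    # example.com
    (2, 1.0, 1.0, 0.0),    # bb.example.cn
    (900, 1.0, 3.4, 0.2),  # hypothetical zone with many random-looking children
    (950, 1.0, 3.5, 0.3),  # another hypothetical zone of the same kind
]
labels = kmeans(features, k=2)
print(labels)
```

In this toy data the two hypothetical zones land in one cluster and the three ordinary domains in the other. A library implementation of k-means or k-medoids would normally replace this loop in practice.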
After clustering, all domains in the domain name query tree are divided into K independent subsets. Each subset comprises the domain fields corresponding to its child nodes and the domain name features of those child nodes. The subsets are first screened against a threshold: those containing fewer child nodes than the threshold are taken as suspected provisional domain subsets. The threshold is generally set to 50. This value was obtained by computing, for multiple subsets, the ratio of the number of known provisional domains in the subset to the number of all subdomains in the subset; subsets in which this ratio exceeds one twentieth generally contain fewer than 50 subdomains.
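The screening step amounts to grouping domains by cluster and keeping only the small clusters. The sketch below assumes the clustering result is available as a mapping from domain to cluster id; `suspected_subsets` and the sample domains are hypothetical, and 50 is the threshold value given in the text.

```python
from collections import defaultdict

THRESHOLD = 50  # threshold value given in the text


def suspected_subsets(assignment, threshold=THRESHOLD):
    """Group domains by cluster id and keep clusters smaller than the threshold.

    `assignment` maps a domain (non-leaf node) to its cluster id.
    """
    subsets = defaultdict(list)
    for domain, cluster_id in assignment.items():
        subsets[cluster_id].append(domain)
    return {cid: doms for cid, doms in subsets.items() if len(doms) < threshold}


# Hypothetical clustering result: one large cluster and one small one.
assignment = {f"zone{i}.example": 0 for i in range(120)}
assignment.update({"tmp.example.net": 1, "t.example.org": 1})
print(suspected_subsets(assignment))
```

Only the two-member cluster survives the filter here, because the 120-member cluster is not below the threshold.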
For these suspected provisional domain subsets, if a subset contains one or more known provisional domains, the domain names corresponding to all child nodes under each domain in that subset are output in turn; these constitute the list of suspected provisional domain names produced by the system.
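The output step can be sketched as below. The function `suspected_names`, its three input mappings, and all sample domains are hypothetical. Following the wording of the summary section, the sketch lists each domain of a qualifying subset together with the names of its branch child nodes.

```python
def suspected_names(subsets, known_provisional, children):
    """List domain names from every subset that holds a known provisional domain.

    subsets:           cluster id -> list of domains in that suspected subset
    known_provisional: set of domains already known to be provisional
    children:          domain -> list of fields of its branch child nodes
    """
    names = []
    for domains in subsets.values():
        if not any(d in known_provisional for d in domains):
            continue  # no known provisional domain: skip this subset
        for domain in domains:
            names.append(domain)
            names.extend(f"{label}.{domain}" for label in children.get(domain, []))
    return names


subsets = {
    1: ["tmp.example.net", "t.example.org"],
    2: ["misc.example"],
}
known = {"tmp.example.net"}
children = {
    "tmp.example.net": ["x7f3a", "q9z2k"],
    "t.example.org": ["k3j8d"],
    "misc.example": ["www"],
}
for name in suspected_names(subsets, known, children):
    print(name)
```

Subset 2 contributes nothing because it contains no known provisional domain, while t.example.org is reported as suspected only because it shares a subset with a known one.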
An embodiment is described below. The log of a large public recursive server was analyzed (550 million queries per day, containing more than 31 million distinct domain names after deduplication). The domain name features of all child nodes were obtained by the foregoing method and system and clustered with the k-means algorithm with k=12, which successfully grouped common provisional domain names into small subsets. As shown in Fig. 3, in the subset list after clustering, row [1] contains nine subsets and row [10] contains three subsets, each subset being represented by the number of subdomains it contains; seven of the subsets contain fewer than 50 subdomains.
One subset in row [1] was analyzed. Fig. 4 shows a fragment of that subset's contents; common provisional domains (such as avqs.mcafee.com and cdntip.com) are all contained in it.

Claims (7)

1. A provisional domain name recognition method, based on a domain name query database, comprising the following steps:
Reading the domain name query database, which records the raw log of domain name query requests from end users received by a recursive domain name server; constructing a domain name query tree from the domain name query request information in the domain name query database; wherein the root of the domain name query tree is the root domain of the name space, the child nodes under the root correspond to the domain fields of domain names, the further left a domain field is in a domain name the lower the level of its corresponding child node, the child node corresponding to the leftmost domain field of a domain name is a leaf node, and each child node carries a weight indicating how frequently its domain field occurs in the domain name query database;
Performing feature extraction, according to the characteristics of provisional domain names, on all child nodes of the domain name query tree except leaf nodes, to obtain the domain name features of each child node;
The characteristics of provisional domain names include:
1) the usage frequency of a provisional domain name, and of most domain names in the domain it belongs to, is close to 0;
2) the leftmost field of a provisional domain name, and of most domain names in the domain it belongs to, is a randomly generated string;
The domain name features of a child node include:
1) the number of branch child nodes under the child node;
2) the median of the occurrence frequencies of the branch child nodes under the child node;
3) the mean of the entropies of the domain fields corresponding to the branch child nodes under the child node;
4) the variance of the entropies of the domain fields corresponding to the branch child nodes under the child node;
Clustering all child nodes of the domain name query tree except leaf nodes according to the extracted domain name features, to obtain multiple subsets;
Screening out, from the multiple subsets, the subsets that contain fewer child nodes than a threshold as suspected provisional domain subsets, and outputting a list of suspected provisional domain names according to the suspected provisional domain subsets.
2. The provisional domain name recognition method according to claim 1, characterized in that the domain name query request information comprises: the records, stored in the domain name query request raw log of the domain name query database, that are generated when a domain name is used.
3. The provisional domain name recognition method according to claim 1, characterized in that the threshold is 50.
4. The provisional domain name recognition method according to claim 1, characterized in that the clustering algorithm may be K-MEANS or K-MEDOIDS.
5. The provisional domain name recognition method according to claim 1, characterized in that each subset obtained after clustering comprises the domain fields corresponding to its child nodes and the domain name features of those child nodes.
6. The provisional domain name recognition method according to claim 5, characterized in that outputting a list of suspected provisional domain names according to the suspected provisional domain subsets comprises: determining whether each suspected provisional domain subset contains one or more child nodes whose corresponding domain field is a known provisional domain field; if so, outputting in turn the domain names corresponding to each child node in that subset and to all branch child nodes of that child node, thereby forming the list of suspected provisional domain names.
7. A provisional domain name identification system using the method of claim 1, based on a domain name query database, comprising:
A domain name query tree construction module, configured to read the domain name query database and construct a domain name query tree from the domain name query request information in the database; wherein the root of the domain name query tree is the root domain of the name space, the child nodes under the root correspond to the domain fields of domain names, the further left a domain field is in a domain name the lower the level of its corresponding child node, the child node corresponding to the leftmost domain field of a domain name is a leaf node, and each child node carries a weight indicating how frequently its domain field occurs in the domain name query database;
A domain name feature extraction module, configured to perform feature extraction, according to the characteristics of provisional domain names, on all child nodes of the domain name query tree except leaf nodes, to obtain the domain name features of each child node;
A domain name clustering module, configured to cluster all child nodes of the domain name query tree except leaf nodes according to the domain name features of each child node, to obtain multiple subsets; to screen out, from the multiple subsets, the subsets that contain fewer child nodes than a threshold as suspected provisional domain subsets; and to output a list of suspected provisional domain names according to the suspected provisional domain subsets.
CN201510736531.1A 2015-11-03 2015-11-03 Provisional domain name recognition methods and system Active CN105430112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510736531.1A CN105430112B (en) 2015-11-03 2015-11-03 Provisional domain name recognition methods and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510736531.1A CN105430112B (en) 2015-11-03 2015-11-03 Provisional domain name recognition methods and system

Publications (2)

Publication Number Publication Date
CN105430112A CN105430112A (en) 2016-03-23
CN105430112B CN105430112B (en) 2019-02-22

Family

ID=55508048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510736531.1A Active CN105430112B (en) 2015-11-03 2015-11-03 Provisional domain name recognition methods and system

Country Status (1)

Country Link
CN (1) CN105430112B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685145A (en) * 2012-05-28 2012-09-19 西安交通大学 Domain name server (DNS) data packet-based bot-net domain name discovery method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2569711A4 (en) * 2010-05-13 2017-03-15 VeriSign, Inc. Systems and methods for identifying malicious domains using internet-wide dns lookup patterns

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DNS Noise: Measuring the Pervasiveness of Disposable Domains in Modern DNS Traffic; Yizheng Chen et al.; 《2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks》; 20140922; page 604, left column, to page 605, right column; Fig. 8

Also Published As

Publication number Publication date
CN105430112A (en) 2016-03-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant