CN110099059A - Domain name recognition method, device and storage medium - Google Patents

Domain name recognition method, device and storage medium

Info

Publication number
CN110099059A
Authority
CN
China
Prior art keywords
domain name
analyzed
feature
vector
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910373033.3A
Other languages
Chinese (zh)
Other versions
CN110099059B (en)
Inventor
郭真毅
董志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910373033.3A priority Critical patent/CN110099059B/en
Publication of CN110099059A publication Critical patent/CN110099059A/en
Application granted granted Critical
Publication of CN110099059B publication Critical patent/CN110099059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/30 Managing network names, e.g. use of aliases or nicknames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a domain name recognition method, device and storage medium. The method receives a domain name recognition instruction; obtains, according to the instruction, the access records of multiple clients for the domain names to be analyzed, and constructs a feature vector for each domain name to be analyzed from those access records; performs feature dimensionality reduction on each feature vector to obtain a reduced feature vector for each domain name to be analyzed; calculates the similarity between the domain names to be analyzed from their reduced feature vectors; clusters the domain names to be analyzed according to that similarity to obtain clustered domain name sets; and identifies the domain names to be analyzed according to the clustered domain name sets. The scheme can improve the accuracy of domain name recognition.

Description

Domain name recognition method, device and storage medium
Technical field
The present invention relates to the field of communication technology, and in particular to a domain name recognition method, device and storage medium.
Background
At present, malicious domain names have become one of the threats of greatest concern in network security, both domestically and globally. A malicious domain name, also called a malicious website, is a website that exploits vulnerabilities in a browser or application software to embed malicious code and to tamper with or damage the user's machine without the user's knowledge. Websites that counterfeit other sites, such as bank websites or e-commerce websites, are also defined as malicious domain names even though they do not tamper with or damage the user's machine.
Malicious domain names can form a huge network system that controls infected systems over the network and causes many kinds of network harm, such as rapidly spreading Trojans and worms, stealing large amounts of sensitive information in a short time, seizing system resources for illegal profit, and launching large-scale distributed denial-of-service attacks. This makes tracing the harm and containing the losses very difficult.
Traditional malicious domain name detection mainly relies on reverse analysis of malicious programs. Reverse analysis of binary Trojans is very time-consuming and must also account for packing, and the resulting rules depend on the reverse engineers who derive them, so recognition based on reverse analysis is low in both accuracy and efficiency.
Summary of the invention
Embodiments of the present invention provide a domain name recognition method, device and storage medium that can improve the accuracy and efficiency of domain name recognition.
To solve the above technical problem, embodiments of the present invention provide the following technical solution:
receiving a domain name recognition instruction;
obtaining, according to the domain name recognition instruction, the access records of multiple clients for the domain names to be analyzed, and constructing the feature vector corresponding to each domain name to be analyzed according to the access records;
performing feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed to obtain the reduced feature vector corresponding to that domain name;
calculating the similarity between the domain names to be analyzed based on their reduced feature vectors;
clustering the domain names to be analyzed based on the similarity between them to obtain clustered domain name sets;
identifying the domain names to be analyzed according to the clustered domain name sets to obtain a recognition result.
Correspondingly, embodiments of the present invention also provide a domain name recognition device, comprising:
a receiving unit, configured to receive a domain name recognition instruction;
an acquiring unit, configured to obtain, according to the domain name recognition instruction, the access records of multiple clients for the domain names to be analyzed, and to construct the feature vector corresponding to each domain name to be analyzed according to the access records;
a dimensionality reduction unit, configured to perform feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed to obtain the reduced feature vector corresponding to that domain name;
a computing unit, configured to calculate the similarity between the domain names to be analyzed based on their reduced feature vectors;
a clustering unit, configured to cluster the domain names to be analyzed based on the similarity between them to obtain clustered domain name sets;
a recognition unit, configured to identify the domain names to be analyzed according to the clustered domain name sets.
Optionally, in some embodiments, the dimensionality reduction unit includes a determining subunit and a hash transformation subunit:
the determining subunit is configured to determine the hash functions required for the feature vector corresponding to the domain name to be analyzed;
the hash transformation subunit is configured to apply a hash transformation to the feature vector corresponding to the domain name to be analyzed based on the hash functions, to obtain the reduced feature vector.
Optionally, in some embodiments, the hash transformation subunit includes an adding module, an initialization module, a comparison module and a dimensionality reduction module;
the adding module is configured to add corresponding hash sub-feature vectors to the feature vector according to the hash functions;
the initialization module is configured to initialize the feature values in the hash sub-feature vectors to obtain initialized hash feature sub-vectors;
the comparison module is configured to compare the initialized hash feature sub-vectors with the sub-feature vectors corresponding to the domain name to be analyzed to obtain a comparison result;
the dimensionality reduction module is configured to reduce the dimensionality of the sub-feature vectors corresponding to the domain name to be analyzed according to the comparison result, to obtain the reduced feature vector.
Optionally, in some embodiments, the comparison module includes a comparison submodule;
the comparison submodule is configured to compare the magnitude of the feature values of the initialized hash feature sub-vector with the feature values of the sub-feature vector corresponding to the domain name to be analyzed, to obtain the comparison result.
Optionally, in some embodiments, the dimensionality reduction module includes a replacement submodule, a combination submodule and a dimensionality reduction submodule:
the replacement submodule is configured to replace the feature value of the initialized hash feature sub-vector if that feature value is greater than the feature value of the sub-feature vector corresponding to the domain name to be analyzed and the latter is greater than a preset value; and to leave the feature value of the initialized hash feature sub-vector unchanged if it is greater than the feature value of the sub-feature vector corresponding to the domain name to be analyzed and the latter is equal to the preset value;
the combination submodule is configured to combine the replaced feature values of the hash feature sub-vector with the feature values that remained unchanged, to obtain a combined hash feature sub-vector;
the dimensionality reduction submodule is configured to reduce the dimensionality of the sub-feature vector corresponding to the domain name to be analyzed according to the combined hash feature sub-vector, to obtain the reduced feature vector.
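The initialize, compare and replace procedure described above resembles the way a MinHash signature is computed. The following is a minimal sketch under that assumption; it is not taken from the patent. The function name `minhash_signature`, the use of seeded random permutations as the hash functions, and the choice of 0 as the preset value are all illustrative assumptions.

```python
import random


def minhash_signature(vector, num_hashes=4, seed=0):
    """Reduce a 0/1 access vector to a short signature (MinHash-style sketch).

    Assumption: each "hash function" is modelled as a seeded random
    permutation of the row indices, and the preset value is 0.
    """
    n = len(vector)
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(n))
        rng.shuffle(perm)
        perms.append(perm)

    # Initialise every signature entry to a value larger than any hash value.
    signature = [n] * num_hashes
    for row, value in enumerate(vector):
        if value == 0:
            # The client has no access record: leave the signature unchanged.
            continue
        for k, perm in enumerate(perms):
            # Replace the stored value when it is greater than this row's hash.
            if signature[k] > perm[row]:
                signature[k] = perm[row]
    return signature
```

With the same seed, every domain name is hashed with the same permutations, so two domain names accessed by the same clients receive the same signature. The signature length equals `num_hashes`, regardless of how many clients the original vector covers.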
Optionally, in some embodiments, the device further includes a filtering unit;
the filtering unit is configured to filter the feature vector corresponding to the domain name to be analyzed to obtain a filtered feature vector, and to use the filtered feature vector as the feature vector corresponding to the domain name to be analyzed.
Optionally, in some embodiments, the computing unit includes a first obtaining subunit and a computation subunit:
the first obtaining subunit is configured to obtain the similarity calculation coefficient between the domain names to be analyzed, and to obtain the number of vector rows of the reduced feature vectors corresponding to the domain names to be analyzed;
the computation subunit is configured to calculate the similarity between the domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
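One possible reading of the calculation above is sketched here. It assumes, without confirmation from the source, that the similarity calculation coefficient is the number of rows on which two reduced vectors agree, and that the similarity is that count divided by the number of vector rows. The function name `signature_similarity` is hypothetical.

```python
def signature_similarity(sig_a, sig_b):
    """Estimate similarity as (matching rows) / (number of rows).

    Assumption: the "similarity calculation coefficient" is the count of
    rows where the two reduced vectors hold the same value.
    """
    if not sig_a or len(sig_a) != len(sig_b):
        raise ValueError("signatures must be non-empty and of equal length")
    matching_rows = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matching_rows / len(sig_a)
```

For example, `signature_similarity([1, 2, 3, 4], [1, 2, 0, 4])` returns 0.75, because three of the four rows agree.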
Optionally, in some embodiments, the acquiring unit further includes a sorting subunit, a detection subunit and a construction subunit;
the sorting subunit is configured to sort the multiple clients according to a sorting track;
the detection subunit is configured to detect whether each sorted client has an access record for the domain name to be analyzed, to obtain the detection result corresponding to each sorted client;
the construction subunit is configured to construct the feature vector corresponding to the domain name to be analyzed based on the detection results of the sorted clients and the order of the clients.
Optionally, in some embodiments, the clustering unit includes a first comparison subunit and a clustering subunit;
the first comparison subunit is configured to compare the similarity between the domain names to be analyzed with a preset value;
the clustering subunit is configured to cluster the domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, to obtain the clustered domain name sets.
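The threshold rule above can be sketched as follows. The source only states that domain names whose similarity exceeds the preset value go into the same set; merging the pairs transitively with a union-find structure is an assumption made here for illustration, and the names `cluster_domains`, `pair_similarity` and `threshold` are hypothetical.

```python
def cluster_domains(domains, pair_similarity, threshold):
    """Group domains whose pairwise similarity is greater than the threshold.

    pair_similarity maps (domain_a, domain_b) tuples to a similarity value.
    Assumption: pairs above the threshold are merged transitively.
    """
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the set representative, halving the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), similarity in pair_similarity.items():
        if similarity > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), set()).add(d)
    return list(clusters.values())
```

Because the merge is transitive, two domain names can land in the same set through a shared neighbour even when their own similarity is below the threshold. A stricter rule would need a different clustering step.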
Optionally, in some embodiments, the recognition unit includes an extraction subunit, a second comparison subunit and a determining subunit;
the extraction subunit is configured to extract the features of a target domain name;
the second comparison subunit is configured to compare the features of the target domain name with the features of the domain names in the clustered domain name set;
the determining subunit is configured to determine that the domain names in the clustered domain name set are target domain names if the set contains a domain name whose features are consistent with those of the target domain name.
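The recognition step above can be illustrated with a short sketch. The source does not specify how domain name features are extracted or compared, so the feature match is modelled here, as an assumption, by membership in a set of already known target domain names. The names `identify_target_domains` and `known_targets` are hypothetical.

```python
def identify_target_domains(clusters, known_targets):
    """Mark every domain in a cluster that contains a known target domain.

    Assumption: "features consistent with the target domain name" is
    approximated by membership in the known_targets set.
    """
    identified = set()
    for cluster in clusters:
        if cluster & known_targets:
            # One match is enough to label the whole cluster.
            identified |= cluster
    return identified
```

Only one comparison per cluster has to succeed, which reflects the point made in the text that not every domain name needs to be compared individually.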
Optionally, in some embodiments, the device further includes a determination unit and a sending unit;
the determination unit is configured to determine the clients that accessed the domain names in a domain name set if the domain names in that set are identified as target domain names;
the sending unit is configured to send warning information to those clients.
Optionally, in some embodiments, the acquiring unit further includes a display subunit and a second obtaining subunit;
the display subunit is configured to display a domain name detection page, the page including a domain name recognition control;
the second obtaining subunit is configured to obtain the access records of multiple clients for the domain names to be analyzed based on the user's trigger operation on the domain name recognition control.
Optionally, in some embodiments, the device further includes a display unit;
the display unit is configured to display the domain name recognition result in the domain name detection page.
In addition, embodiments of the present invention also provide a storage medium storing a plurality of instructions, the instructions being suitable for loading by a processor to execute the steps of any domain name recognition method provided by embodiments of the present invention.
Embodiments of the present invention receive a domain name recognition instruction; obtain, according to the instruction, the access records of multiple clients for the domain names to be analyzed; construct the feature vector corresponding to each domain name to be analyzed according to the access records; perform feature dimensionality reduction on the constructed feature vectors to obtain the reduced feature vector corresponding to each domain name to be analyzed; calculate the similarity between the domain names to be analyzed based on the reduced feature vectors; cluster the domain names to be analyzed based on that similarity to obtain clustered domain name sets; and identify the domain names to be analyzed according to the clustered domain name sets. The scheme clusters domain names into domain name sets according to similarity and then performs recognition on the set of each category after clustering. Because the domain names in the same set are of the same or a similar category, recognition of the wrong type is effectively avoided, which improves the accuracy of domain name recognition. And because the domain names in a set are of a similar or identical type, the type of a domain name can be obtained without comparing the features of every domain name, which reduces recognition time and improves the efficiency of domain name recognition.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a scenario of the domain name recognition method provided by an embodiment of the present invention;
Fig. 1b is another schematic diagram of a scenario of the domain name recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the domain name recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the domain name detection page of the domain name recognition method provided by an embodiment of the present invention;
Fig. 4 is another flowchart of the domain name recognition method provided by an embodiment of the present invention;
Fig. 5a is a schematic diagram of the service cluster provided by an embodiment of the present invention;
Fig. 5b is an architecture diagram of the domain name recognition method provided by an embodiment of the present invention;
Fig. 6a is a structural schematic diagram of the domain name recognition device provided by an embodiment of the present invention;
Fig. 6b is another structural schematic diagram of the domain name recognition device provided by an embodiment of the present invention;
Fig. 6c is a further structural schematic diagram of the domain name recognition device provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of the server provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiments of the present invention provide a domain name recognition method, device and storage medium, which are described in detail below.
The domain name recognition device can be integrated in a network device. The network device may be a server, such as a cloud server, or a device such as a terminal.
For example, with reference to Fig. 1 a, by taking the domain name identification device specifically integrates in the server as an example, firstly, having in client There is domain name to detect the page, user can identify that control triggers domain name identification instruction, domain by clicking the domain name in the domain name detection page Name identifies that the domain name that user ready to receive triggers identifies instruction, is then based on domain name identification instruction acquisition client and treats analysis domain The access record of name, specifically, all clients treat the access record of analysis domain name in available server, can also be with It obtains a certain range of client and treats the access record of analysis domain name, for example obtain the access that client treats analysis domain name Record etc. can specifically obtain it is not limited here according to the range that user inputs, can be with when user does not limit The all clients that default obtains in server treat the access record of analysis domain name.Then according to the access record building to The corresponding feature vector of domain name is analyzed, specifically traversal multiple client is recorded for the access of domain name to be analyzed first;If client There is access record to the domain name to be analyzed in end, then be recorded as the first element, for example be 1, if it does not exist, be then recorded as second Element, for example be 0;Then the first element and second element that record obtains are arranged according to the sequence of traversal Building obtains the corresponding feature vector of domain name to be analyzed.Then Feature Dimension Reduction processing is carried out to the feature vector that building obtains, obtained Feature vector after to the corresponding dimensionality reduction of domain name to be analyzed, to reduce the quantity of domain name to be analyzed;It is then based on described to 
be analyzed Feature vector calculates the similarity between domain name to be analyzed after the corresponding dimensionality reduction of domain name;Based on the similarity between domain name to be analyzed The domain name to be analyzed is clustered, the high domain name to be analyzed cluster of similarity is concentrated to same domain name, after obtaining cluster Domain name collection, specific cluster process and result are as shown in Figure 1 b, can treat analysis domain name according to the domain name collection after the cluster Classification and Identification is carried out, since this programme is to identify the high domain name cluster of similarity again in domain name concentration, is not needed to every A domain name is identified the type you can learn that domain name, so that the quantity for greatly reducing domain name identification improves domain with the time The efficiency of name identification.And first domain name is clustered to be identified again, due to clustering the similar of the domain name concentrated in same domain name Degree is higher, generally same type domain name, therefore decreases the situation of domain name type identification mistake, to improve domain name identification Accuracy.
Detailed descriptions are given below. It should be noted that the order of the following descriptions does not limit the preferred order of the embodiments.
This embodiment is described from the perspective of the domain name recognition device, which can be integrated in a network device such as a terminal or a server.
A domain name recognition method comprises: receiving a domain name recognition instruction; obtaining, according to the instruction, the access records of multiple clients for the domain names to be analyzed, and constructing the feature vector corresponding to each domain name to be analyzed according to the access records; performing feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed to obtain the reduced feature vector corresponding to that domain name; calculating the similarity between the domain names to be analyzed based on their reduced feature vectors; clustering the domain names to be analyzed based on the similarity between them to obtain clustered domain name sets; and identifying the domain names to be analyzed according to the clustered domain name sets.
As shown in Fig. 2, the specific flow of the domain name recognition device may be as follows:
S201: receive a domain name recognition instruction;
First, a domain name detection page is provided in the server. A domain name is the name of a computer or group of computers on the Internet, consisting of a string of names separated by dots, and is used to identify the electronic location (sometimes also the geographical location) of the computer during data transmission. The Domain Name System (DNS, sometimes simply called "domain name") is a core service of the Internet. As a distributed database that maps domain names and IP addresses to each other, it makes the Internet easier to access, because users do not have to remember the IP address strings that machines read directly. The user triggers the domain name recognition instruction by clicking the domain name recognition control in the domain name detection page, so that the server can receive it.
The user can also schedule domain name recognition. When recognition is scheduled and the time set by the user is reached, the domain name recognition instruction is triggered automatically and received by the server, so the user does not need to trigger it manually each time. After scheduling recognition, the user can still trigger the instruction manually. For example, when an emergency requires domain name recognition and the current time is not the scheduled recognition time, the user can trigger the domain name recognition instruction manually so that the server receives it and performs domain name recognition.
S202: obtain, according to the domain name recognition instruction, the access records of multiple clients for the domain names to be analyzed, and construct the feature vector corresponding to each domain name to be analyzed according to the access records;
After receiving the domain name recognition instruction, the server can obtain, based on the instruction, the access records of clients for the domain names to be analyzed. A client, also called the user side, is the program that corresponds to the server and provides local services to the user. Apart from some applications that run only locally, clients are generally installed on ordinary client machines and need to work together with the server side. Common clients include web browsers used for the World Wide Web, e-mail clients used to send and receive e-mail, and instant messaging client software. Applications of this kind need a corresponding server and service program in the network to provide the corresponding service, such as a database service or an e-mail service, so a specific communication connection must be established between the client and the server to ensure that the application runs normally. Specifically, the access records of all clients in the server for the domain names to be analyzed may be obtained, or the access records of clients within a certain range may be obtained; this is not limited here. The records may be obtained according to a range entered by the user, and when the user sets no limit, the access records of all clients in the server may be obtained by default. The access records of the clients in the server for a domain name to be analyzed can be obtained by extracting and analyzing the access records of that domain name: if the access records of the domain name to be analyzed include a client in the server, it is determined that the included client has an access record for the domain name to be analyzed; a client that is not included has no access record for the domain name to be analyzed.
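The extraction described above can be sketched as grouping raw access log entries by domain name. The log format, a sequence of (client identifier, domain name) pairs, and the function name `group_access_records` are assumptions made for illustration; the source does not describe how access records are stored.

```python
def group_access_records(log_entries, domains_to_analyze):
    """Collect, for each domain to analyze, the set of clients that accessed it.

    Assumption: log_entries is an iterable of (client_id, domain) pairs.
    """
    clients_by_domain = {domain: set() for domain in domains_to_analyze}
    for client_id, domain in log_entries:
        # Entries for domains that are not being analyzed are ignored.
        if domain in clients_by_domain:
            clients_by_domain[domain].add(client_id)
    return clients_by_domain
```

A client that appears in the set for a domain name has an access record for it; a client that does not appear has none.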
Therefore, the step obtains multiple client and records for the access of domain name to be analyzed, comprising:
Show that domain name detects the page, the page includes domain name identification control;
Based on user for the trigger action of domain name identification control, multiple client is obtained for domain name to be analyzed Access record.
The domain name detection page is provided in server, user can search domain name by fixed interface or network address and detect page Face, after the domain name detection page is searched in triggering, the lookup domain name detection page that server can be triggered according to user refers to user It enables and carries out domain name detection page lookup, and the domain name found the detection page is shown, wherein as shown in figure 3, the page Face includes domain name identification control, the operation such as clicks or slide for user, user is by clicking or touching domain name identification control Part can trigger domain name identification instruction, and server can identify the trigger action of control triggering based on user for domain name, The access that multiple client is obtained for domain name to be analyzed records.
The feature vector corresponding to each domain name to be analyzed is then constructed from the access records. Specifically, the multiple clients are sorted according to a sorting rule; each sorted client is checked for an access record of the domain name to be analyzed, giving a detection result for each sorted client; and the feature vector of the domain name is constructed from the detection results and the order of the clients.
Therefore, referring to Fig. 4, step S202 further includes:
S401: sorting the multiple clients according to the sorting rule;
S402: detecting whether each sorted client has an access record for the domain name to be analyzed, to obtain the detection result corresponding to each sorted client;
S403: constructing the feature vector corresponding to the domain name to be analyzed based on the detection results of the sorted clients and the order of the clients.
Specifically, the multiple clients are sorted according to the sorting rule. The particular sorting method may be random, as long as the same client order is used when determining, for every domain name to be analyzed, whether an access record exists. After sorting, each client is checked for an access record of the domain name to be analyzed, which yields the detection result for each sorted client. The result may be recorded with specific elements: if the client has an access record for the domain name, a first element is recorded; if not, a second element is recorded. The first element may be defined as 1 and the second element as 0. The recorded elements are then arranged in the order of the clients to construct the feature vector of the domain name to be analyzed. For example, if the first client obtained has an access record for the domain name (1), the second client also has one (1), and the third client has none (0), the feature vector of the domain name is (1,1,0). If, for another domain name to be analyzed, the first client has no access record (0), the second has none (0), and the third has one (1), the feature vector of that domain name is (0,0,1). It should be understood that there are multiple clients and multiple domain names to be analyzed, so the elements are defined one by one from the access records of the multiple clients for each domain name, and the feature vector of each domain name to be analyzed is constructed accordingly.
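The construction of the feature vectors can be sketched as follows. This is only a minimal illustration: the input shape (a mapping from domain name to the set of client identifiers that accessed it), the function name `build_feature_vectors`, and the example domain and client names are assumptions made for the sketch and do not come from the embodiment.

```python
from typing import Dict, List, Set


def build_feature_vectors(
    access_log: Dict[str, Set[str]], clients: List[str]
) -> Dict[str, List[int]]:
    """Map each domain name to a 0/1 vector over one fixed client ordering.

    access_log: domain name -> set of client ids with an access record
                (hypothetical input shape).
    clients:    the client ordering shared by every domain name.
    """
    vectors = {}
    for domain, visitors in access_log.items():
        # 1 = the client has an access record for the domain, 0 = it has none.
        vectors[domain] = [1 if c in visitors else 0 for c in clients]
    return vectors


clients = ["client_1", "client_2", "client_3"]
access_log = {
    "a.example": {"client_1", "client_2"},
    "b.example": {"client_3"},
}
vectors = build_feature_vectors(access_log, clients)
print(vectors["a.example"])  # [1, 1, 0]
print(vectors["b.example"])  # [0, 0, 1]
```

The only property the sketch relies on is that the same client order is reused for every domain name, which is what makes the vectors comparable.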
S203: performing feature dimension reduction on the feature vector corresponding to the domain name to be analyzed, to obtain the dimension-reduced feature vector corresponding to the domain name to be analyzed;
Further, after the feature vector corresponding to the domain name to be analyzed is obtained, feature dimension reduction can first be performed on it to reduce the amount of calculation for the domain names to be analyzed.
Specifically, performing feature dimension reduction on the feature vector corresponding to the domain name to be analyzed may include:
determining the hash functions required for the feature vector corresponding to the domain name to be analyzed;
performing a hash transformation on the feature vector corresponding to the domain name to be analyzed based on the hash functions, to obtain the dimension-reduced feature vector.
Specifically, the rows of the feature vectors of the domain names to be analyzed would need to be shuffled many times. To simulate the effect of such row shuffling, n random hash functions h1, h2, h3, ..., hn are selected, and the hash transformation is performed on the feature vectors based on these hash functions to obtain the dimension-reduced feature vectors. The hash transformation includes:
adding the corresponding hash sub-feature vectors to the feature vectors according to the hash functions;
initializing the characteristic values in the hash sub-feature vectors to obtain initialized hash feature sub-vectors;
comparing the initialized hash feature sub-vectors with the sub-feature vectors corresponding to the domain names to be analyzed, to obtain a comparison result;
performing dimension reduction on the sub-feature vectors corresponding to the domain names to be analyzed according to the comparison result, to obtain the dimension-reduced feature vectors.
The corresponding hash sub-feature vectors are added to the feature vectors according to the hash functions. For example, assume the rows of the feature vectors of the domain names to be analyzed are a, b, c, d and e, with a = (1,0,0,1), b = (0,0,1,0), c = (0,1,0,1), d = (1,0,1,1) and e = (0,0,1,0), so that each of the columns S1 to S4 is the feature vector of one domain name. The rows a to e are replaced by their row numbers 0 to 4, and two hash functions are then added, h1(x) = (x+1) mod 5 and h2(x) = (3*x+1) mod 5, where x is the row number.
Row S1 S2 S3 S4 h1 h2
0 1 0 0 1 1 1
1 0 0 1 0 2 4
2 0 1 0 1 3 2
3 1 0 1 1 4 0
4 0 0 1 0 0 3
The characteristic values in the hash sub-feature vectors are initialized to obtain the initialized hash feature sub-vectors. Specifically, let SIG(i, c) denote the element of the i-th hash function in column c. At the start, every SIG(i, c) is initialized to Inf (infinity):
S1 S2 S3 S4
h1 Inf Inf Inf Inf
h2 Inf Inf Inf Inf
Further, performing dimension reduction on the sub-feature vectors corresponding to the domain names to be analyzed according to the comparison result, to obtain the dimension-reduced feature vectors, includes:
if the characteristic value of the initialized hash feature sub-vector is greater than the characteristic value of the sub-feature vector corresponding to the domain name to be analyzed, and the characteristic value of that sub-feature vector is greater than a preset value, replacing the characteristic value of the initialized hash feature sub-vector;
if the characteristic value of the initialized hash feature sub-vector is greater than the characteristic value of the sub-feature vector corresponding to the domain name to be analyzed, and the characteristic value of that sub-feature vector is equal to the preset value, keeping the characteristic value of the initialized hash feature sub-vector unchanged;
combining the replaced characteristic values of the hash feature sub-vectors with the unchanged characteristic values of the hash feature sub-vectors, to obtain combined hash feature sub-vectors;
performing dimension reduction on the sub-feature vectors corresponding to the domain names to be analyzed according to the combined hash feature sub-vectors, to obtain the dimension-reduced feature vectors.
The initialized hash feature sub-vectors are compared with the sub-feature vectors corresponding to the domain names to be analyzed to obtain the comparison result. For each row r, h1(r), h2(r), ..., hn(r) are calculated. For each column c, if the value of column c in row r is 0, nothing is done; if it is 1, then for each i = 1, 2, ..., n, SIG(i, c) is set to the minimum of the current SIG(i, c) and hi(r). The signature matrix is calculated as follows. Row 0 of the feature vectors is examined first. The values of S2 and S3 are 0, so they are not changed; the values of S1 and S4 are 1, so they must be changed. Here h1 = 1 and h2 = 1. Since 1 is smaller than Inf, the values at the S1 and S4 positions are replaced; that is, the characteristic values of the initialized hash feature sub-vectors are compared in size with the characteristic values of the sub-feature vectors of the domain names to be analyzed to obtain the comparison result. The result after replacement is:
S1 S2 S3 S4
h1 1 Inf Inf 1
h2 1 Inf Inf 1
Next, row 1 of the feature vectors is examined. Only S3 has the value 1, and here h1 = 2 and h2 = 4, so the S3 column is replaced, giving:
S1 S2 S3 S4
h1 1 Inf 2 1
h2 1 Inf 4 1
Next, row 2 of the feature vectors is examined. The values of S2 and S4 are 1, and h1 = 3, h2 = 2. Because both values already in the S4 column of the signature matrix are 1, which is smaller than 3 and 2, only the S2 column needs to be replaced:
S1 S2 S3 S4
h1 1 3 2 1
h2 1 2 4 1
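Next, row 3 of the feature vectors is examined. The values of S1, S3 and S4 are all 1, and h1 = 4, h2 = 0. Since 4 is not smaller than any existing h1 value, the h1 row is unchanged, while the h2 values of S1, S3 and S4 are replaced by 0. The result after replacement is: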
S1 S2 S3 S4
h1 1 3 2 1
h2 0 2 0 0
Next, row 4 of the feature vectors is examined. The value of S3 is 1, and h1 = 0, h2 = 3, so the h1 value of S3 is replaced by 0 and its h2 value stays 0. The final result is:
S1 S2 S3 S4
h1 1 3 0 1
h2 0 2 0 0
In this way, every row has been traversed once, and the signature matrix finally obtained is:
S1 S2 S3 S4
h1 1 3 0 1
h2 0 2 0 0
Dimension reduction is thus performed on the sub-feature vectors of the domain names to be analyzed according to the comparison result. The final signature of S1 is (1,0), whereas its original feature vector was (1,0,0,1,0), so its length is shortened; likewise S2 becomes (3,2), S3 becomes (0,0) and S4 becomes (1,0). These are the dimension-reduced feature vectors. The feature vector of every domain name to be analyzed is clearly shorter, which reduces the amount of calculation needed to compute the similarity between domain names. The signature matrix obtained after the hash transformation is therefore used as the dimension-reduced feature vectors of the domain names to be analyzed.
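The worked example above follows the usual min-hash signature procedure, and it can be reproduced with a short sketch. The function name `minhash_signature` and the row-major list layout are illustrative assumptions; the matrix and the two hash functions are the ones given in the example.

```python
import math
from typing import Callable, List


def minhash_signature(
    rows: List[List[int]], hash_funcs: List[Callable[[int], int]]
) -> List[List[float]]:
    """Return SIG, where SIG[i][c] is the value of hash function i for column c."""
    n_cols = len(rows[0])
    # Every SIG(i, c) starts at Inf.
    sig = [[math.inf] * n_cols for _ in hash_funcs]
    for r, row in enumerate(rows):
        hashed = [h(r) for h in hash_funcs]  # h1(r), h2(r), ..., hn(r)
        for c, value in enumerate(row):
            if value == 1:  # columns holding 0 in this row are left untouched
                for i, h_r in enumerate(hashed):
                    sig[i][c] = min(sig[i][c], h_r)
    return sig


# Rows 0..4 of the example; columns are S1..S4.
rows = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
h1 = lambda x: (x + 1) % 5
h2 = lambda x: (3 * x + 1) % 5

sig = minhash_signature(rows, [h1, h2])
print(sig)  # [[1, 3, 0, 1], [0, 2, 0, 0]]
```

Reading the result column by column gives S1 = (1,0), S2 = (3,2), S3 = (0,0) and S4 = (1,0), the same signatures as in the example.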
Before dimension reduction is performed on the domain names to be analyzed, the domain names may first be filtered in order to reduce the number that must be analyzed. Specifically:
the feature vectors corresponding to the domain names to be analyzed are filtered to obtain filtered feature vectors;
the filtered feature vectors are used as the feature vectors corresponding to the domain names to be analyzed.
That is, the feature vectors corresponding to the domain names to be analyzed are filtered, and the feature vectors that remain after filtering are the ones processed further.
In a specific implementation, the filtering may include filtering out general domain names and internally used domain names among the domain names to be analyzed, as well as domain names of already identified malicious groups. It may specifically include:
(1) First, the known general domain names are obtained, that is, domain names that essentially every client accesses, such as the top 10000 domain names in the Alexa ranking (for example Google.com, the Google search engine) and general CDN (Content Delivery Network) domain names. The domain names to be analyzed are then compared with the general domain names and general CDN domain names obtained. If a domain name to be analyzed is consistent with a known general domain name or general CDN domain name, that domain name and its feature vector are removed from the list of domain names to be analyzed. Such a domain name is determined to be normal or general and does not need to be analyzed again, which reduces the amount of calculation for the domain names to be analyzed.
(2) After the general domain names and general CDN domain names have been removed from the domain names to be analyzed, the domain names in a whitelist and a blacklist are further obtained. The whitelist is obtained first; it may include internal general domain names, domain names used for testing, and the like. The domain names to be analyzed are compared with the whitelist. If a domain name to be analyzed is consistent with a whitelisted domain name, it is a known internal domain name or a test domain name, so it and its feature vector are likewise removed from the list of domain names to be analyzed, further reducing the amount of calculation. Next, the blacklist is obtained; it contains known malicious domain names, such as the domain names of malicious groups. The domain names to be analyzed are compared with the blacklist. If a domain name to be analyzed is consistent with a blacklisted domain name, it is determined to be malicious and is first removed from the list of domain names to be analyzed, again reducing the amount of calculation. The client access records of that domain name are then obtained, the clients that accessed it are extracted, and preset warning information is retrieved and sent to those clients to remind their users that they may be attacked or compromised, so that the users can take precautions in time.
The domain names that remain after removing those consistent with the known general domain names or general CDN domain names, the whitelisted domain names and the blacklisted domain names, together with their feature vectors, are then used as the filtered domain names to be analyzed and their corresponding feature vectors.
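The filtering step can be sketched as a single pass over the domain names. The function name `filter_domains`, the use of exact string matching, and the example lists are assumptions made for illustration; the embodiment does not specify how the lists are stored or matched.

```python
from typing import Dict, List, Set, Tuple


def filter_domains(
    vectors: Dict[str, List[int]],
    general: Set[str],
    whitelist: Set[str],
    blacklist: Set[str],
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Drop general, whitelisted and blacklisted domain names.

    Returns the remaining domain -> feature vector mapping and the list of
    blacklisted domain names that were seen, so their clients can be warned.
    """
    kept: Dict[str, List[int]] = {}
    known_malicious: List[str] = []
    for domain, vec in vectors.items():
        if domain in general or domain in whitelist:
            continue  # normal, internal or test domain: no analysis needed
        if domain in blacklist:
            known_malicious.append(domain)  # already known to be malicious
            continue
        kept[domain] = vec
    return kept, known_malicious


vectors = {
    "google.com": [1, 1, 1],
    "intranet.example": [1, 0, 1],
    "bad.example": [0, 1, 0],
    "unknown.example": [0, 1, 1],
}
kept, known_malicious = filter_domains(
    vectors,
    general={"google.com"},
    whitelist={"intranet.example"},
    blacklist={"bad.example"},
)
print(kept)             # {'unknown.example': [0, 1, 1]}
print(known_malicious)  # ['bad.example']
```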
S404: calculating the similarity between the domain names to be analyzed based on the dimension-reduced feature vectors corresponding to the domain names to be analyzed;
After dimension reduction has been performed on the feature vectors of the domain names to be analyzed, the similarity between the domain names can be calculated from their dimension-reduced feature vectors.
Further, step S404 includes:
obtaining the similarity calculation coefficient between the domain names to be analyzed;
obtaining the number of vector rows of the dimension-reduced feature vectors corresponding to the domain names to be analyzed;
calculating the similarity between the domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
Specifically, the dimension-reduced feature vectors of the domain names to be analyzed are divided into a preset number of row strips (bands), each consisting of a corresponding number of rows. For each band there is a hash function that maps the column vectors formed by the rows of that band (each column within the band) into some domain name set. The same hash function may be used for all bands, but each band uses its own independent array of domain name sets, so identical column vectors in different bands are never hashed into the same domain name set. In this way, as long as two feature vectors fall into the same domain name set in at least one band, they are regarded as likely to be highly similar and become a candidate pair for subsequent calculation. Then the similarity calculation coefficient between the domain names to be analyzed is obtained; this coefficient is a fixed preset Jaccard coefficient, for example 0.4. The number of vector rows of the dimension-reduced feature vectors is obtained, that is, the number of rows in each preset band. The similarity between the domain names to be analyzed can then be calculated from the similarity calculation coefficient and the number of vector rows. The formula is $P = 1 - (1 - r^C)^B$, where P denotes the similarity between the domain names to be analyzed, C is the number of rows, r is the similarity calculation coefficient, and B is the number of domain name sets selected. Substituting the obtained values, for example r = 0.4, C = 3 and B = 100, gives $P = 1 - (1 - 0.4^3)^{100} \approx 0.9987$; that is, the similarity between the domain names to be analyzed is approximately 0.9987.
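The banding step and the formula can be illustrated with a small sketch. The function names, the use of the band's tuple of values as the key of a per-band dictionary (which plays the role of the per-band array of domain name sets), and the toy signatures are all assumptions for illustration; the formula is evaluated with the values 0.4, 3 and 100 that are substituted into it in the text.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def candidate_probability(r: float, c: int, b: int) -> float:
    """Evaluate P = 1 - (1 - r**c)**b."""
    return 1.0 - (1.0 - r ** c) ** b


def lsh_candidate_pairs(
    signatures: Dict[str, List[int]], bands: int, rows_per_band: int
) -> Set[Tuple[str, str]]:
    """Return pairs of domain names that share a set in at least one band."""
    candidates: Set[Tuple[str, str]] = set()
    for band in range(bands):
        buckets = defaultdict(list)  # a fresh, independent table per band
        start = band * rows_per_band
        for domain, sig in signatures.items():
            key = tuple(sig[start:start + rows_per_band])
            buckets[key].append(domain)
        for members in buckets.values():
            for pair in combinations(sorted(members), 2):
                candidates.add(pair)
    return candidates


p = candidate_probability(0.4, 3, 100)
print(p)  # close to 0.9987

signatures = {
    "a.example": [1, 0, 2, 3],
    "b.example": [1, 0, 5, 6],
    "c.example": [4, 4, 7, 8],
}
pairs = lsh_candidate_pairs(signatures, bands=2, rows_per_band=2)
print(pairs)  # {('a.example', 'b.example')}
```

Only the candidate pairs need a full similarity comparison afterwards, which is where the saving in calculation comes from.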
S405: clustering the domain names to be analyzed based on the similarity between them to obtain clustered domain name sets, and identifying the domain names to be analyzed according to the clustered domain name sets.
After the similarity between the domain names to be analyzed has been calculated, the domain names whose similarity is greater than a preset value are assigned to the same domain name set, which gives the clustered domain name sets.
Specifically, clustering the domain names to be analyzed based on the similarity between them, to obtain the clustered domain name sets, includes:
comparing the similarity between the domain names to be analyzed with the preset value;
clustering the domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, to obtain the clustered domain name sets.
Specifically, the similarity between domain names to be analyzed is compared with the preset value. If the similarity is greater than the preset value, the domain names are clustered into the same domain name set. Clustering proceeds in this way until all domain names to be analyzed have been clustered, which gives the clustered domain name sets. It should be noted that, in order to reduce the number of subsequent domain name identifications and to improve their accuracy, the preset value used for the similarity comparison is set to a relatively large value, for example greater than 0.9 and less than or equal to 1.
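One simple way to realize this threshold-based clustering is to merge every pair of domain names whose similarity exceeds the preset value, using a union-find structure. This is a sketch under stated assumptions: the embodiment does not name a clustering algorithm, so the transitive merging used here, the function name `cluster_by_similarity` and the example similarities are illustrative choices.

```python
from typing import Dict, List, Tuple


def cluster_by_similarity(
    domains: List[str],
    similarities: Dict[Tuple[str, str], float],
    threshold: float = 0.9,
) -> List[List[str]]:
    """Group domain names whose pairwise similarity exceeds the threshold."""
    parent = {d: d for d in domains}

    def find(d: str) -> str:
        # Follow parent links to the set representative, compressing the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), sim in similarities.items():
        if sim > threshold:
            parent[find(a)] = find(b)  # merge the two domain name sets

    groups: Dict[str, List[str]] = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())


domains = ["a.example", "b.example", "c.example", "d.example"]
similarities = {
    ("a.example", "b.example"): 0.95,
    ("b.example", "c.example"): 0.92,
    ("c.example", "d.example"): 0.30,
}
clusters = cluster_by_similarity(domains, similarities, threshold=0.9)
print(clusters)  # [['a.example', 'b.example', 'c.example'], ['d.example']]
```

Note that with this choice a.example and c.example end up in the same set through b.example even though their direct similarity is not given; a stricter variant would require every pair in a set to exceed the threshold.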
The domain names to be analyzed are then identified according to the clustered domain name sets. Because the domain names within a clustered set are highly similar, identifying only some of the domain names in a set is enough to learn the type of the remaining ones. The specific identification process is therefore to identify some of the domain names in a set and obtain their types. If the types of the identified domain names are consistent, that type is used as the type of the remaining unidentified domain names in the set. If the types are inconsistent, the number of identified domain names can be increased to improve accuracy, and the type whose share among the identified domain names exceeds a preset ratio is used as the type of the remaining unidentified domain names.
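The sample-based identification of a domain name set can be sketched as follows. The classifier `classify` is a hypothetical callable standing in for whatever identification method is available, and the initial sample size, the doubling of the sample and the 0.6 ratio are illustrative assumptions rather than values from the embodiment.

```python
from collections import Counter
from typing import Callable, List, Optional


def infer_cluster_type(
    cluster: List[str],
    classify: Callable[[str], str],
    sample_size: int = 2,
    min_share: float = 0.6,
) -> Optional[str]:
    """Infer one type for a whole domain name set from a partial sample."""
    if not cluster:
        return None
    size = min(sample_size, len(cluster))
    labels = [classify(d) for d in cluster[:size]]
    if len(set(labels)) == 1:
        return labels[0]  # the sampled types agree: use that type for the rest
    while True:
        # The types disagree: identify more members, then accept a dominant type.
        size = min(len(cluster), size * 2)
        labels = [classify(d) for d in cluster[:size]]
        label, count = Counter(labels).most_common(1)[0]
        if count / size > min_share:
            return label
        if size == len(cluster):
            return None  # no type reaches the required share


known_types = {
    "x1.example": "malicious",
    "x2.example": "benign",
    "x3.example": "malicious",
    "x4.example": "malicious",
}
cluster = ["x1.example", "x2.example", "x3.example", "x4.example"]
result = infer_cluster_type(cluster, known_types.__getitem__)
print(result)  # malicious
```

In this example the first two sampled domain names disagree, so the sample grows to four, where "malicious" holds a share of 0.75 and is taken as the type of the set.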
Specifically, identifying the domain names to be analyzed according to the clustered domain name sets may include:
extracting the features of a target domain name;
comparing the features of the target domain name with the features of the domain names in the clustered domain name sets;
if a clustered domain name set contains a domain name whose features are consistent with those of the target domain name, determining that the domain names in that clustered domain name set are target domain names.
In the specific identification process, the features of the target domain name can be extracted and compared with the domain names in a clustered domain name set. Because the domain names in the same set are of highly similar or identical type, only some of the domain names in the set need to be compared with the features of the target domain name, which reduces the number of comparisons, shortens the comparison time and improves comparison efficiency. It should be noted that the extracted features of the target domain name may be the features of the domain names of a malicious group. If a clustered domain name set contains a domain name consistent with the features of the target domain name, the domain names in that set are determined to be target domain names, for example domain names of the malicious group. Further, because some malicious domain names are active only for a short time, their features may not yet have been obtained; in that case manual screening can also be used to judge whether a domain name set contains malicious domain names.
Further, after the step of clustering the domain names to be analyzed based on the similarity between them to obtain the clustered domain name sets, and identifying the domain names to be analyzed according to the clustered domain name sets, the method further includes:
if the domain names in a domain name set are identified as target domain names, determining the access clients of the domain names in that domain name set;
sending alarm information to the access clients.
If the domain names in a certain domain name set are determined to be target domain names, then, to prevent the clients that accessed those domain names from being attacked, the access clients of the domain names in the set need to be determined and alarm information sent to them. Specifically, preset alarm information may be retrieved and sent to the clients that accessed the domain names in the set, or alarm information may be generated in real time and sent to the access clients, to remind users that they may be attacked or compromised so that they can take precautions in time.
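Determining the access clients and sending the alarm can be sketched as follows. The `send` callable is a hypothetical transport supplied by the caller, and the access-log shape, function names and message text are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Set


def clients_to_alert(cluster: List[str], access_log: Dict[str, Set[str]]) -> List[str]:
    """Collect every client that accessed any domain name in the set."""
    clients: Set[str] = set()
    for domain in cluster:
        clients |= access_log.get(domain, set())
    return sorted(clients)


def send_alerts(
    clients: List[str], message: str, send: Callable[[str, str], None]
) -> None:
    """Send the same alarm information to each access client."""
    for client in clients:
        send(client, message)


access_log = {
    "bad1.example": {"client_2", "client_1"},
    "bad2.example": {"client_2", "client_3"},
    "ok.example": {"client_4"},
}
sent = []  # stands in for a real delivery channel
targets = clients_to_alert(["bad1.example", "bad2.example"], access_log)
send_alerts(
    targets,
    "You may have visited a malicious domain name.",
    lambda client, message: sent.append((client, message)),
)
print(targets)  # ['client_1', 'client_2', 'client_3']
```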
The domain name identification method proposed in this embodiment receives a domain name identification instruction; obtains, according to the instruction, the access records of multiple clients for the domain names to be analyzed; constructs the feature vectors corresponding to the domain names to be analyzed from the access records; performs feature dimension reduction on the constructed feature vectors to obtain the dimension-reduced feature vectors; calculates the similarity between the domain names to be analyzed based on the dimension-reduced feature vectors; and clusters the domain names to be analyzed based on the similarity to obtain clustered domain name sets, according to which the domain names to be analyzed can be identified. This scheme clusters domain names into domain name sets by similarity and then identifies each clustered set. Because the domain names in the same set are of the same or a similar category, misidentification of types is effectively avoided and the accuracy of domain name identification is improved. And because the domain names in a set are of similar or identical type, the type of a domain name can be obtained without comparing all domain name features, which reduces identification time and improves identification efficiency.
The method described in the preceding embodiment is explained in further detail below by way of example.
In this embodiment, the domain name identification device is described as being integrated in a cloud server. The cloud server may include multiple types of servers, and the number of servers of each type may depend on the specific application scenario. These servers may be deployed in different regions or machine rooms, or use different network providers. The server includes a user input layer and a domain name identification layer, and different layers may be implemented with different types of servers. For example, as shown in Fig. 5a, taking a configuration that includes at least one input server and at least one domain name identification server, the functions of the various types of servers may be as follows:
(1) Input server;
The input server mainly performs the "access layer" function and connects the communication between the client and the server cluster, as shown in Fig. 5a.
For example, as shown in Fig. 5a, the input server (the access layer) can receive the various requests sent by the client, such as a domain name identification request, and forward the received requests to the domain name identification server (the domain name identification layer). For example, it sends the domain name identification request to the domain name identification server and, when it receives the alarm information returned by the domain name identification server, sends the alarm information to the client.
(2) Domain name identification server;
The domain name identification server mainly performs the domain name identification function and connects the communication between the client and the server cluster. For example, the domain name identification server can receive the various requests sent by the input server, such as a domain name identification request, and perform the corresponding operation according to the received request; if the received request is a domain name identification request, it performs domain name identification according to that request. When a target domain name is identified, it sends alarm information to the input server.
As shown in Fig. 5b, the specific flow of a domain name identification method may be as follows:
S501: the input server receives the input domain name identification instruction and sends it to the domain name identification server;
The input server provides a domain name detection page, which the user can look up through a fixed interface or web address. After the user triggers the lookup, the server searches for the domain name detection page according to the lookup instruction triggered by the user and displays the page it finds. As shown in Fig. 3, the page includes a domain name identification control for the user to click. By clicking the control, the user triggers the domain name identification instruction; the input server thereby obtains the domain name identification instruction input by the user and sends it to the domain name identification server. The user may also schedule domain name identification to run at set times. In that case, when the scheduled identification time is reached, the domain name identification instruction is triggered automatically and received by the input server, without the user having to trigger it manually each time. After scheduling timed identification, the user can still trigger the domain name identification instruction manually. For example, when an emergency requires domain name identification and the current time is not a scheduled identification time, the user can trigger the instruction manually so that the input server receives it.
S502: the domain name identification server receives the domain name identification instruction sent by the input server;
After the input server sends the domain name identification instruction, the domain name identification server receives it. In addition, timed domain name identification may also be configured in the domain name identification server. When the user has scheduled timed identification and the scheduled time is reached, the domain name identification instruction is triggered automatically and received by the domain name identification server, without the user having to trigger it manually each time. After scheduling timed identification, the user can still trigger the domain name identification instruction manually. For example, when an emergency requires domain name identification and the current time is not a scheduled identification time, the user can trigger the instruction manually so that the domain name identification server receives it directly, without receiving an instruction sent by the input server.
S503, the domain name identification server obtains, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and constructs the feature vector corresponding to each domain name to be analyzed from the access records;
After receiving the domain name identification instruction, the domain name identification server can obtain the clients' access records for the domain names to be analyzed based on that instruction. A client, also called a user terminal, is the counterpart of a server: a program that provides local services to the user. Apart from applications that run purely locally, a client is typically installed on an ordinary client machine and must cooperate with a server side. Common clients include web browsers used for the World Wide Web, e-mail clients for sending and receiving e-mail, and instant-messaging client software. Applications of this kind require corresponding servers and service programs in the network, such as database services or e-mail services, and a specific communication connection must be established between the client and the server to keep the application running normally. Specifically, the access records of all clients in the server for the domain names to be analyzed may be obtained, or only those of clients within a certain range; this is not limited here. The range may be taken from user input, and when the user specifies no range, the access records of all clients in the server are obtained by default. A client's access record for a domain name to be analyzed can be obtained by extracting and analyzing the access records of that domain name: if the access records of the domain name include a client in the server, that client is determined to have an access record for the domain name; a client that is not included has no access record for it.
The feature vector corresponding to a domain name to be analyzed is then constructed from the access records according to whether each client has an access record: if an access record exists, the client is assigned a first element; if not, a second element. The assigned elements are arranged in the order of the clients whose access records were obtained and used as the elements of the feature vector, which yields the feature vector corresponding to the domain name to be analyzed.
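The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `build_feature_vectors`, the `(client, domain)` log format, and the choice of 1 and 0 for the first and second elements are all assumptions made for the example (the 0/1 encoding is at least consistent with the worked example later in the text).

```python
def build_feature_vectors(access_log, clients):
    """Return {domain: 0/1 vector}, one position per client in `clients` order."""
    visitors = {}
    for client, domain in access_log:
        # Remember which clients have an access record for each domain.
        visitors.setdefault(domain, set()).add(client)
    return {
        # Assumed encoding: 1 = access record exists, 0 = no access record.
        domain: [1 if client in seen else 0 for client in clients]
        for domain, seen in visitors.items()
    }

# Hypothetical access log of (client, domain) pairs.
clients = ["c0", "c1", "c2", "c3", "c4"]
access_log = [
    ("c0", "S1"), ("c3", "S1"),
    ("c2", "S2"),
    ("c1", "S3"), ("c3", "S3"), ("c4", "S3"),
    ("c0", "S4"), ("c2", "S4"), ("c3", "S4"),
]
vectors = build_feature_vectors(access_log, clients)
print(vectors["S1"])  # [1, 0, 0, 1, 0]
```

Each vector has one position per client, so its length grows with the number of clients, which is what motivates the dimensionality reduction in the next step.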
S504, the domain name identification server performs feature dimensionality reduction on the feature vector corresponding to the domain name to be analyzed, obtaining the dimension-reduced feature vector corresponding to the domain name to be analyzed;
Further, after the feature vector corresponding to the domain name to be analyzed is obtained, feature dimensionality reduction can first be applied to it in order to reduce the amount of computation for the domain names to be analyzed.
Specifically, performing feature dimensionality reduction on the feature vector corresponding to the domain name to be analyzed may include:
Determining the hash functions required for the feature vector corresponding to the domain name to be analyzed;
Performing a hash transformation on the feature vector corresponding to the domain name to be analyzed based on the hash functions, to obtain the dimension-reduced feature vector.
Specifically, the rows of the feature vectors of the domain names to be analyzed would need to be randomly permuted several times. To simulate the effect of such row permutations, n random hash functions h1, h2, h3, ..., hn are selected, and a hash transformation is performed on the feature vectors based on these hash functions to obtain the dimension-reduced feature vectors. The hash transformation includes:
Adding the corresponding hash sub-feature vectors to the feature vector according to the hash functions;
Initializing the feature values in the hash sub-feature vectors to obtain initialized hash feature sub-vectors;
Comparing the initialized hash feature sub-vectors with the sub-feature vectors corresponding to the domain name to be analyzed, to obtain a comparison result;
Reducing the dimension of the sub-feature vectors corresponding to the domain name to be analyzed according to the comparison result, to obtain the dimension-reduced feature vector.
The corresponding hash sub-feature vectors are added to the feature vector according to the hash functions. For example, suppose the rows of the feature matrix of the domain names to be analyzed are a, b, c, d and e, with a = (1, 0, 0, 1), b = (0, 0, 1, 0), c = (0, 1, 0, 1), d = (1, 0, 1, 1) and e = (0, 0, 1, 0). Rows a to e are replaced by their row numbers 0 to 4, and two hash functions are then added, h1(x) = (x + 1) mod 5 and h2(x) = (3*x + 1) mod 5, where x is the row number:
Row  S1  S2  S3  S4  h1  h2
0    1   0   0   1   1   1
1    0   0   1   0   2   4
2    0   1   0   1   3   2
3    1   0   1   1   4   0
4    0   0   1   0   0   3
The feature values in the hash sub-feature vectors are then initialized to obtain the initialized hash feature sub-vectors. Let SIG(i, c) denote the element of the i-th hash function in column c. At the start, every SIG(i, c) is initialized to Inf (infinity):
S1 S2 S3 S4
h1 Inf Inf Inf Inf
h2 Inf Inf Inf Inf
Further, reducing the dimension of the sub-feature vectors corresponding to the domain name to be analyzed according to the comparison result, to obtain the dimension-reduced feature vector, includes:
If the feature value of the initialized hash feature sub-vector is greater than the feature value of the sub-feature vector corresponding to the domain name to be analyzed, and the feature value of that sub-feature vector is greater than a preset value, replacing the feature value of the initialized hash feature sub-vector;
If the feature value of the initialized hash feature sub-vector is greater than the feature value of the sub-feature vector corresponding to the domain name to be analyzed, and the feature value of that sub-feature vector is equal to the preset value, keeping the feature value of the initialized hash feature sub-vector unchanged;
Combining the replaced feature values of the hash feature sub-vectors with the feature values that remain unchanged, to obtain a combined hash feature sub-vector;
Reducing the dimension of the sub-feature vectors corresponding to the domain name to be analyzed according to the combined hash feature sub-vector, to obtain the dimension-reduced feature vector.
The initialized hash feature sub-vectors are compared with the sub-feature vectors corresponding to the domain name to be analyzed to obtain the comparison result. For each row r, h1(r), h2(r), ..., hn(r) are computed. For each column c, if the entry of column c in row r is 0, nothing is done; if the entry is 1, then for each i = 1, 2, ..., n, SIG(i, c) is set to the minimum of its current value and hi(r). The signature matrix is computed as follows. Start with row 0 of the feature matrix. The values of S2 and S3 are 0, so they are left unchanged; the values of S1 and S4 are 1, so they must be updated. Here h1 = 1 and h2 = 1, and 1 is smaller than Inf, so the entries for S1 and S4 are replaced. In other words, the feature values of the initialized hash feature sub-vectors are compared in size with the feature values of the sub-feature vectors corresponding to the domain name to be analyzed, and the result after replacement is:
S1 S2 S3 S4
h1 1 Inf Inf 1
h2 1 Inf Inf 1
Next, consider row 1 of the feature matrix. Only S3 has the value 1, and here h1 = 2 and h2 = 4. Replacing the entries in column S3 gives:
S1 S2 S3 S4
h1 1 Inf 2 1
h2 1 Inf 4 1
Next, consider row 2 of the feature matrix. S2 and S4 have the value 1, and h1 = 3, h2 = 2. Both entries in column S4 of the signature matrix are already 1, which is smaller than 3 and 2, so only column S2 is replaced:
S1 S2 S3 S4
h1 1 3 2 1
h2 1 2 4 1
Next, consider row 3 of the feature matrix. S1, S3 and S4 all have the value 1, and h1 = 4, h2 = 0. Only the h2 entries become smaller, and the result after replacement is:
S1 S2 S3 S4
h1 1 3 2 1
h2 0 2 0 0
Finally, consider row 4 of the feature matrix. Only S3 has the value 1, and h1 = 0, h2 = 3. The result is:
S1 S2 S3 S4
h1 1 3 0 1
h2 0 2 0 0
All rows have now been traversed once, and the final signature matrix is:
S1 S2 S3 S4
h1 1 3 0 1
h2 0 2 0 0
The dimension of the sub-feature vectors corresponding to the domain names to be analyzed is reduced according to the comparison result. The final signature of S1 is (1, 0), whereas its original feature vector was (1, 0, 0, 1, 0), so its length has been shortened; likewise S2 becomes (3, 2), S3 becomes (0, 0) and S4 becomes (1, 0). The feature vectors of all domain names to be analyzed are thus shortened, which reduces the computation needed to calculate the similarity between them. The signature matrix obtained after the hash transformation is therefore used as the dimension-reduced feature vectors corresponding to the domain names to be analyzed.
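The walkthrough above follows the usual MinHash signature update rule, and it can be reproduced with a short Python sketch. The function name `minhash_signatures` is hypothetical; the input matrix and the two hash functions are taken directly from the example in the text.

```python
import math

def minhash_signatures(rows, hash_funcs):
    """Compute SIG(i, c): the minimum h_i(r) over rows r where column c is 1."""
    n_cols = len(rows[0])
    # Every SIG(i, c) starts at infinity.
    sig = [[math.inf] * n_cols for _ in hash_funcs]
    for r, row in enumerate(rows):
        hashes = [h(r) for h in hash_funcs]
        for c, bit in enumerate(row):
            if bit == 1:  # rows with a 0 in this column are skipped
                for i, value in enumerate(hashes):
                    sig[i][c] = min(sig[i][c], value)
    return sig

# Rows 0..4 of the example; columns are S1..S4.
rows = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
hash_funcs = [lambda x: (x + 1) % 5, lambda x: (3 * x + 1) % 5]
signature = minhash_signatures(rows, hash_funcs)
print(signature)  # [[1, 3, 0, 1], [0, 2, 0, 0]]
```

Reading the result column by column gives S1 = (1, 0), S2 = (3, 2), S3 = (0, 0) and S4 = (1, 0), matching the final signature matrix in the text.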
Before dimensionality reduction is applied, the domain names to be analyzed can first be filtered in order to reduce the number of domain names that must be analyzed. Specifically:
Filtering the feature vectors corresponding to the domain names to be analyzed, to obtain filtered feature vectors;
Using the filtered feature vectors as the feature vectors corresponding to the domain names to be analyzed.
In a specific implementation, filtering may include removing general-purpose domain names and internally used domain names from the domain names to be analyzed, as well as removing domain names already attributed to malicious groups. It may specifically include:
(1) First, the known general-purpose domain names are obtained, that is, domain names that nearly every client accesses, such as the top 10,000 domain names in the Alexa ranking (for example Google.com, a search engine), together with general-purpose CDN (Content Delivery Network) domain names. The domain names to be analyzed are then compared with these known general-purpose domain names and CDN domain names. Any domain name that matches one of them is removed, along with its feature vector, from the list of domain names to be analyzed: such domain names are considered normal or general-purpose and need no further analysis, which reduces the amount of computation.
(2) After the general-purpose domain names and CDN domain names have been removed, the domain names in a whitelist and a blacklist are obtained. The whitelist is obtained first; it may include internal general-purpose domain names, domain names used for testing, and the like. The domain names to be analyzed are compared with the whitelist, and any domain name that matches a whitelist entry is a known internal or test domain name, so it and its feature vector are likewise removed from the list of domain names to be analyzed, further reducing the amount of computation. Next, the blacklist is obtained; it contains known malicious domain names, such as the domain names of malicious groups. The domain names to be analyzed are compared with the blacklist, and any match is determined to be a malicious domain name and is removed from the list, again reducing the amount of computation. The client access records of the matching domain names are then obtained, the clients that accessed them are extracted, and a preset warning message is sent to those clients to remind users that they may be under attack or may have been compromised, so that they can take precautions in time.
The domain names that remain after removing those matching known general-purpose or CDN domain names, whitelist entries and blacklist entries, together with their feature vectors, are then used as the filtered feature vectors corresponding to the domain names to be analyzed.
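As a rough illustration of this filtering order, the sketch below drops general-purpose and whitelisted domain names first and then separates out blacklisted ones so their clients can be warned. The function name `filter_domains`, the use of plain Python sets for the lists, and the example domain names are assumptions for illustration only.

```python
def filter_domains(vectors, general, whitelist, blacklist):
    """Split domains into those kept for analysis and known-malicious ones."""
    kept, malicious = {}, []
    for domain, vector in vectors.items():
        if domain in general or domain in whitelist:
            continue  # normal, internal or test domain: no analysis needed
        if domain in blacklist:
            malicious.append(domain)  # already known bad: warn its clients
            continue
        kept[domain] = vector
    return kept, malicious

# Hypothetical inputs.
vectors = {
    "google.com": [1, 1, 1],
    "intranet.example": [1, 0, 1],
    "bad.example": [0, 1, 0],
    "unknown.example": [0, 1, 1],
}
kept, malicious = filter_domains(
    vectors,
    general={"google.com"},
    whitelist={"intranet.example"},
    blacklist={"bad.example"},
)
print(kept)       # {'unknown.example': [0, 1, 1]}
print(malicious)  # ['bad.example']
```

Only the domain names in `kept` go on to dimensionality reduction and similarity calculation.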
S505, the domain name identification server calculates the similarity between domain names to be analyzed based on their dimension-reduced feature vectors;
After the feature vectors of the domain names to be analyzed have been reduced, the similarity between domain names to be analyzed can be calculated from the dimension-reduced feature vectors.
The calculation includes:
Obtaining the similarity calculation coefficient between domain names to be analyzed;
Obtaining the number of vector rows of the dimension-reduced feature vectors corresponding to the domain names to be analyzed;
Calculating the similarity between domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
Specifically, the dimension-reduced feature vectors of the domain names to be analyzed are divided into a preset number of row bands, each consisting of a given number of rows. For each band, a hash function maps the column vector formed by the integers in the rows of that band (each column within the band) to a domain name set. The same hash function may be used for all bands, but each band uses its own independent array of domain name sets, so that identical column vectors in different bands are not hashed into the same domain name set. If two feature vectors fall into the same domain name set in at least one band, they are considered likely to be highly similar and become a candidate pair for subsequent calculation. The similarity calculation coefficient between domain names to be analyzed is then obtained; it is a fixed preset Jaccard coefficient, for example 0.4. The number of vector rows of the dimension-reduced feature vectors, that is, the number of rows in each band, is also obtained, and the similarity between domain names to be analyzed is calculated from the similarity calculation coefficient and the number of vector rows. The formula is $P = 1 - (1 - r^{C})^{B}$, where P is the similarity between domain names to be analyzed, C is the number of rows, r is the similarity calculation coefficient, and B is the number of domain name sets chosen. Substituting the obtained values, for example r = 0.4, C = 3 and B = 100, gives $P = 1 - (1 - 0.4^{3})^{100} \approx 0.9987$, that is, the similarity between the domain names to be analyzed is about 0.9987.
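The formula can be checked numerically with a one-line function. The name `candidate_probability` is hypothetical; the symbols r, C and B follow the definitions given in the text.

```python
def candidate_probability(r, C, B):
    """P = 1 - (1 - r**C)**B, with r, C and B as defined in the text."""
    return 1.0 - (1.0 - r ** C) ** B

# Values from the worked example: r = 0.4, C = 3, B = 100.
p = candidate_probability(0.4, 3, 100)
print(round(p, 4))  # 0.9987
```

The term r**C shrinks quickly as C grows, while a larger B pushes P back toward 1, so the two parameters trade off against each other when choosing the band layout.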
S506, the domain name identification server clusters the domain names to be analyzed based on the similarity between them, obtaining clustered domain name sets;
S507, the domain names to be analyzed are identified according to the clustered domain name sets.
After the similarity between domain names to be analyzed has been calculated, domain names whose similarity is greater than a preset value are assigned to the same domain name set, which yields the clustered domain name sets.
Specifically, clustering the domain names to be analyzed based on the similarity between them, to obtain the clustered domain name sets, includes:
Comparing the similarity between domain names to be analyzed with a preset value;
Clustering the domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, to obtain the clustered domain name sets.
Specifically, the similarity between domain names to be analyzed is compared with the preset value; if the similarity is greater than the preset value, the domain names are clustered into the same domain name set. Clustering proceeds in turn until all domain names to be analyzed have been clustered, which yields the clustered domain name sets. It should be noted that, in order to reduce the number of subsequent identifications and improve identification accuracy, the preset value used for the similarity comparison is set to a relatively large value, for example greater than 0.9 and less than or equal to 1.
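The text does not specify how pairs above the threshold are merged into sets, so the following sketch makes an assumption: it treats every pair whose similarity exceeds the preset value as linked and groups linked domain names with a small union-find structure. The function name `cluster_by_threshold`, the pairwise `similarity` dictionary and the example domain names are all hypothetical.

```python
def cluster_by_threshold(domains, similarity, threshold=0.9):
    """Group domains connected by pairs whose similarity exceeds `threshold`."""
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the representative, shortening the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), score in similarity.items():
        if score > threshold:
            parent[find(a)] = find(b)  # merge the two sets

    groups = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())

domains = ["a.example", "b.example", "c.example", "d.example"]
similarity = {
    ("a.example", "b.example"): 0.95,
    ("b.example", "c.example"): 0.97,
    ("a.example", "d.example"): 0.20,
}
clusters = cluster_by_threshold(domains, similarity, threshold=0.9)
print(clusters)  # [['a.example', 'b.example', 'c.example'], ['d.example']]
```

Note that this grouping is transitive: a.example and c.example end up together through b.example even though their direct similarity is not listed. Whether that is desirable depends on how strictly the preset value should apply within a set.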
The domain names to be analyzed are then identified according to the clustered domain name sets. Because the domain names within a clustered set are highly similar, identifying only some of the domain names in a set reveals the type of the remaining ones. The identification therefore proceeds by identifying part of the domain names in a set and obtaining their types. If the types of the identified domain names are consistent, that type is taken as the type of the remaining unidentified domain names. If the types are inconsistent, the number of identified domain names can be increased to improve accuracy, and the type whose share among the identified domain names exceeds a preset ratio is taken as the type of the remaining unidentified domain names.
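A minimal sketch of this type-propagation rule follows. The function name `infer_cluster_type`, the type labels and the default ratio of 0.8 are assumptions chosen for illustration; the text only states that a preset ratio is used.

```python
from collections import Counter

def infer_cluster_type(sample_types, min_ratio=0.8):
    """Return the dominant type if its share exceeds `min_ratio`, else None."""
    if not sample_types:
        return None
    label, count = Counter(sample_types).most_common(1)[0]
    # A fully consistent sample has share 1.0 and always passes.
    return label if count / len(sample_types) > min_ratio else None

consistent = infer_cluster_type(["malicious", "malicious", "malicious"])
mixed = infer_cluster_type(["malicious", "benign"])
print(consistent)  # malicious
print(mixed)       # None
```

A `None` result corresponds to the inconsistent case in the text, where more domain names in the set would be identified before a type is assigned.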
Specifically, identifying the domain names to be analyzed according to the clustered domain name sets may include:
Extracting the features of a target domain name;
Comparing the features of the target domain name with the features of the domain names in a clustered domain name set;
If the clustered domain name set contains a domain name whose features are consistent with those of the target domain name, determining that the domain names in the clustered domain name set are target domain names.
In the identification process, the features of the target domain name can be extracted and compared with the domain names in a clustered domain name set. Because the domain names in the same set are highly similar in type, or identical in type, only some of the domain names in the set need to be compared with the features of the target domain name, which reduces the number of comparisons, shortens the comparison time and improves comparison efficiency. It should be noted that the extracted features of the target domain name may be the features of the domain names of a malicious group. If the clustered domain name set contains a domain name consistent with the features of the target domain name, the domain names in that set are determined to be target domain names, for example domain names of the malicious group. Further, because some malicious domain names are active only for a short time, their features may not yet be available; in that case, manual screening can be used to judge whether the domain name set contains malicious domain names.
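The comparison step might look like the sketch below. The text does not say what the domain name features are, so the feature tuples here are purely hypothetical placeholders, as are the function name `cluster_matches_target` and the `sample_size` parameter that limits how many domain names in the set are compared.

```python
def cluster_matches_target(cluster_features, target_feature, sample_size=None):
    """Check whether any sampled domain in the cluster has the target feature."""
    # Compare only the first `sample_size` domains (all of them if None).
    sample = list(cluster_features.values())[:sample_size]
    return any(feature == target_feature for feature in sample)

# Hypothetical feature tuples; real features are not specified in the text.
cluster_features = {
    "x1.example": ("dga-like", "short-ttl"),
    "x2.example": ("dga-like", "short-ttl"),
}
hit = cluster_matches_target(cluster_features, ("dga-like", "short-ttl"), sample_size=1)
miss = cluster_matches_target(cluster_features, ("static", "long-ttl"))
print(hit, miss)  # True False
```

When the function returns `True`, every domain name in the set would be treated as a target domain name, which is what makes sampling only part of the set sufficient.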
S508, if the domain name identification server identifies that the domain names in a domain name set are target domain names, it determines the clients that accessed the domain names in that set;
S509, the domain name identification server sends warning information to the access clients.
If the domain names in a certain domain name set are determined to be target domain names, then, to prevent the clients that have accessed them from being attacked, the access clients of the domain names in the set must be determined and warning information sent to them. Specifically, preset warning information may be retrieved and sent to the clients that accessed domain names in the set, or warning information may be generated in real time and sent to the access clients, to remind users that they may be under attack or may have been compromised so that they can take precautions in time.
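Determining the access clients amounts to a lookup over the access records. In this sketch the function name `clients_to_alert` and the `(client, domain)` record format are assumptions, and actually delivering the warning is left out because the text does not describe a transport.

```python
def clients_to_alert(access_log, flagged_domains):
    """Return the clients that accessed any domain in the flagged set."""
    flagged = set(flagged_domains)
    return sorted({client for client, domain in access_log if domain in flagged})

# Hypothetical access records of (client, domain) pairs.
access_log = [
    ("c0", "x1.example"),
    ("c1", "safe.example"),
    ("c2", "x2.example"),
    ("c0", "x2.example"),
]
to_alert = clients_to_alert(access_log, ["x1.example", "x2.example"])
print(to_alert)  # ['c0', 'c2']
```

Each client appears once even if it accessed several flagged domain names, so a single warning per client is enough.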
As can be seen from the above, this embodiment receives a domain name identification instruction; obtains, according to the instruction, the access records of multiple clients for the domain names to be analyzed; constructs the feature vectors corresponding to the domain names to be analyzed from the access records; performs feature dimensionality reduction on the constructed feature vectors to obtain the dimension-reduced feature vectors; calculates the similarity between domain names to be analyzed based on the dimension-reduced feature vectors; clusters the domain names to be analyzed based on that similarity to obtain clustered domain name sets; and identifies the domain names to be analyzed according to the clustered domain name sets. In this scheme, domain names are clustered into sets by similarity and identification is then performed per clustered set. Because the domain names in one set belong to the same or a similar category, misidentification of types is effectively avoided and identification accuracy is improved. Moreover, because the domain names in a set are of similar or identical type, their types can be obtained without comparing the features of every domain name, which shortens identification time and improves identification efficiency.
To better implement the above method, an embodiment of the present invention further provides a domain name identification device. The device may be integrated in a network device, which may be a terminal, a server, or similar equipment.
For example, as shown in Figure 6a, the domain name identification device may include a receiving unit 601, an acquiring unit 602, a dimensionality reduction unit 603, a computing unit 604, a clustering unit 605 and a recognition unit 606, as follows:
(1) receiving unit 601;
Receiving unit 601, configured to receive a domain name identification instruction.
For example, it receives the domain name identification instruction sent by the input server. Alternatively, timed domain name identification may be configured in the domain name identification server: when the user has configured timed identification and the configured time arrives, the domain name identification instruction is triggered automatically and received by the domain name identification server, so the user does not need to trigger it manually each time. After configuring timed identification, the user can still trigger the instruction manually. For example, when an emergency requires domain name identification but the current time is not the configured identification time, the user can trigger the instruction manually so that the domain name identification server receives it, without receiving an instruction sent by the input server.
(2) acquiring unit 602;
Acquiring unit 602, configured to obtain, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and to construct the feature vector corresponding to each domain name to be analyzed from the access records.
After the domain name identification server receives the domain name identification instruction, the clients' access records for the domain names to be analyzed can be obtained based on that instruction. Specifically, the access records of all clients in the server for the domain names to be analyzed may be obtained, or only those of clients within a certain range; this is not limited here. The range may be taken from user input, and when the user specifies no range, the access records of all clients in the server are obtained by default. A client's access record for a domain name to be analyzed can be obtained by extracting and analyzing the access records of that domain name: if the access records of the domain name include a client in the server, that client is determined to have an access record for the domain name; a client that is not included has no access record for it.
The feature vector corresponding to a domain name to be analyzed is then constructed from the access records according to whether each client has an access record: if an access record exists, the client is assigned a first element; if not, a second element. The assigned elements are arranged in the order of the clients whose access records were obtained and used as the elements of the feature vector, which yields the feature vector corresponding to the domain name to be analyzed.
(3) dimensionality reduction unit 603;
Dimensionality reduction unit 603, configured to perform feature dimensionality reduction on the feature vector corresponding to the domain name to be analyzed, to obtain the dimension-reduced feature vector corresponding to the domain name to be analyzed.
After the feature vector corresponding to the domain name to be analyzed is obtained, feature dimensionality reduction can first be applied to it in order to reduce the amount of computation for the domain names to be analyzed.
(4) computing unit 604;
Computing unit 604, configured to calculate the similarity between domain names to be analyzed based on their dimension-reduced feature vectors.
After the feature vectors of the domain names to be analyzed have been reduced, the similarity between domain names to be analyzed can be calculated from the dimension-reduced feature vectors. The dimension-reduced feature vectors are divided into a preset number of row bands, each consisting of a given number of rows. For each band, a hash function maps the column vector formed by the integers in the rows of that band (each column within the band) to a domain name set. The same hash function may be used for all bands, but each band uses its own independent array of domain name sets, so that identical column vectors in different bands are not hashed into the same domain name set. If two feature vectors fall into the same domain name set in at least one band, they are considered likely to be highly similar and become a candidate pair for subsequent calculation. A fixed preset Jaccard coefficient is then chosen, for example 0.4, and the probability that a candidate pair falls into the same domain name set is calculated with the formula $P = 1 - (1 - r^{C})^{B}$, where P is the probability, r is the Jaccard coefficient, C is the number of rows (a fixed value of 3) and B is the number of domain name sets chosen. Substituting the corresponding values gives the similarity between the candidate pairs among the feature vectors of the domain names to be analyzed.
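The banding step can be sketched as follows. This is an illustrative approximation rather than the device's actual logic: the function name `candidate_pairs` is hypothetical, and instead of an explicit hash function it uses each band's slice of the signature directly as a dictionary key, which behaves like a collision-free hash. A fresh dictionary per band plays the role of the independent array of domain name sets. The signatures are the ones derived in the earlier worked example.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(signatures, rows_per_band):
    """Return pairs of names whose signatures agree on at least one band."""
    n_rows = len(next(iter(signatures.values())))
    pairs = set()
    for start in range(0, n_rows, rows_per_band):
        buckets = defaultdict(list)  # separate bucket array for each band
        for name, sig in signatures.items():
            band = tuple(sig[start:start + rows_per_band])
            buckets[band].append(name)
        for names in buckets.values():
            # Everything sharing a bucket becomes a candidate pair.
            pairs.update(combinations(sorted(names), 2))
    return pairs

signatures = {"S1": [1, 0], "S2": [3, 2], "S3": [0, 0], "S4": [1, 0]}
one_band = candidate_pairs(signatures, rows_per_band=2)
two_bands = candidate_pairs(signatures, rows_per_band=1)
print(one_band)  # {('S1', 'S4')}
```

With a single two-row band only S1 and S4 collide, because their signatures are identical. With two one-row bands, S3 also pairs with S1 and S4 through the shared value in the second row, which shows how more, narrower bands admit more candidates.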
(5) clustering unit 605;
Clustering unit 605, configured to cluster the domain names to be analyzed based on the similarity between them, to obtain the clustered domain name sets;
(6) recognition unit 606;
Recognition unit 606, configured to identify the domain names to be analyzed according to the clustered domain name sets.
After the similarity between domain names to be analyzed has been calculated, domain names whose similarity is greater than a preset value are assigned to the same domain name set, which yields the clustered domain name sets.
Specifically, as shown in Figure 6b, the clustering unit 605 may include:
First comparison subunit 607, configured to compare the similarity between domain names to be analyzed with a preset value;
Clustering subunit 608, configured to cluster the domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, to obtain the clustered domain name sets.
Specifically, the similarity between domain names to be analyzed is compared with the preset value; if the similarity is greater than the preset value, the domain names are clustered into the same domain name set. Clustering proceeds in turn until all domain names to be analyzed have been clustered, which yields the clustered domain name sets. It should be noted that, in order to reduce the number of subsequent identifications and improve identification accuracy, the preset value used for the similarity comparison is set to a relatively large value, for example greater than 0.9 and less than or equal to 1.
The domain names to be analyzed are then identified according to the clustered domain name sets. Because the domain names within a clustered set are highly similar, identifying only some of the domain names in a set reveals the type of the remaining ones. The identification therefore proceeds by identifying part of the domain names in a set and obtaining their types. If the types of the identified domain names are consistent, that type is taken as the type of the remaining unidentified domain names. If the types are inconsistent, the number of identified domain names can be increased to improve accuracy, and the type whose share among the identified domain names exceeds a preset ratio is taken as the type of the remaining unidentified domain names.
Specifically, as fig. 6 c, cluster cell 605 can also include extracting subelement 609, the second contrast subunit 610 and determine subelement 611:
The extraction subunit 609 is configured to extract a feature of a target domain name;
The second comparison subunit 610 is configured to compare the feature of the target domain name with the features of the domain names in the clustered domain name set;
The determination subunit 611 is configured to determine, if the clustered domain name set contains a domain name whose feature is consistent with the feature of the target domain name, that the domain names in the clustered domain name set are target domain names.
In the specific domain name identification process, the feature of a target domain name may be extracted and compared with the domain names in the clustered domain name set. Because the domain names in the same set are highly similar in type, or of the same type, only some of the domain names in the set need to be compared with the feature of the target domain name, which reduces the number of comparisons, shortens the comparison time, and improves comparison efficiency. It should be noted that the extracted target domain name feature may be a feature of the domain names of a malicious group. If the clustered domain name set contains a domain name consistent with the feature of the target domain name, the domain names in that set are determined to be target domain names, for example domain names of the malicious group. Further, because some malicious domain names are active only for a short outbreak period, a domain name feature may not have been obtained. In that case, manual screening may also be used to judge whether a domain name set contains malicious domain names.
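The comparison against a target feature can be sketched roughly as follows. The embodiment does not say what form a domain name feature takes, so `extract_feature` is a hypothetical caller-supplied function, and `find_target_sets` and `sample_size` are likewise illustrative names.

```python
from typing import Callable, Hashable, List, Sequence


def find_target_sets(
    domain_sets: Sequence[Sequence[str]],
    extract_feature: Callable[[str], Hashable],
    target_feature: Hashable,
    sample_size: int = 1,
) -> List[int]:
    """Return indexes of clustered sets containing a domain that matches the target.

    Only the first `sample_size` domains of each set are compared, reflecting
    the idea that a set's members are similar enough for a partial check.
    """
    matches: List[int] = []
    for index, domain_set in enumerate(domain_sets):
        sample = domain_set[:sample_size]
        if any(extract_feature(d) == target_feature for d in sample):
            matches.append(index)
    return matches
```

Every domain name in a matched set would then be treated as a target domain name, for example as belonging to the malicious group whose feature was supplied.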
Further, if the domain names in a domain name set are determined to be target domain names, then, to prevent clients that access domain names in that set from being attacked, the access clients of the domain names in the set need to be determined and warning information sent to them. Specifically, preset warning information may be extracted and sent to the clients that have accessed domain names in the set, or warning information may be generated in real time and sent to the access clients. The warning reminds users that they may be attacked or intruded upon, so that they can take precautions in time.
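One possible way to pick the clients that should receive the warning is sketched below. The shape of `access_records` (a mapping from a client identifier to the domain names it accessed), the function names, and the default message are assumptions for illustration; how the warning is actually delivered is outside this sketch.

```python
from typing import Dict, Iterable, List, Set


def clients_to_warn(
    access_records: Dict[str, Set[str]],
    flagged_domains: Iterable[str],
) -> List[str]:
    """List clients that accessed at least one domain in a flagged domain set."""
    flagged = set(flagged_domains)
    return sorted(
        client for client, visited in access_records.items() if flagged & set(visited)
    )


def build_warnings(
    clients: Iterable[str],
    message: str = "A domain name you accessed may be malicious.",
) -> Dict[str, str]:
    """Pair each affected client with the warning text to send (preset by default)."""
    return {client: message for client in clients}
```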
As can be seen from the above, in the domain name identification device of this embodiment, the receiving unit 601 receives a domain name identification instruction. The acquiring unit 602 then obtains, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and constructs the feature vectors corresponding to the domain names to be analyzed according to the access records. The dimensionality reduction unit 603 then performs feature dimensionality reduction on the constructed feature vectors to obtain the reduced feature vectors corresponding to the domain names to be analyzed. The computing unit 604 calculates the similarity between the domain names to be analyzed based on the reduced feature vectors. The clustering unit then clusters the domain names to be analyzed based on the similarity between them to obtain the clustered domain name sets, and the domain names to be analyzed can be identified according to the clustered domain name sets. Domain names are thus clustered into domain name sets according to similarity and then identified set by set, which greatly reduces the time needed for domain name identification and improves its accuracy.
An embodiment of the present invention further provides a server. Fig. 7 shows a schematic structural diagram of the server involved in the embodiment of the present invention. Specifically:
The server may include components such as a processor 701 with one or more processing cores, a memory 702 comprising one or more computer-readable storage media, a power supply 703, and an input unit 704. Those skilled in the art will understand that the server structure shown in Fig. 7 does not limit the server, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
The processor 701 is the control center of the server. It connects the various parts of the entire server through various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 702 and calling the data stored in the memory 702, thereby monitoring the server as a whole. Optionally, the processor 701 may include one or more processing cores. Preferably, the processor 701 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 701.
The memory 702 may be used to store software programs and modules. The processor 701 performs various functional applications and data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data created through use of the server, and the like. In addition, the memory 702 may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 702 may further include a memory controller to provide the processor 701 with access to the memory 702.
The server further includes a power supply 703 that supplies power to the components. Preferably, the power supply 703 may be logically connected to the processor 701 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system. The power supply 703 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may further include an input unit 704, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 701 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 702 according to the following instructions, and runs the application programs stored in the memory 702, thereby implementing various functions, as follows:
Receive a domain name identification instruction;
Obtain, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and construct the feature vectors corresponding to the domain names to be analyzed according to the access records;
Perform feature dimensionality reduction on the feature vectors corresponding to the domain names to be analyzed to obtain the reduced feature vectors corresponding to the domain names to be analyzed;
Calculate the similarity between the domain names to be analyzed based on the reduced feature vectors corresponding to the domain names to be analyzed;
Cluster the domain names to be analyzed based on the similarity between them to obtain the clustered domain name sets;
Identify the domain names to be analyzed according to the clustered domain name sets.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
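For illustration, the vector construction, dimensionality reduction, and similarity steps listed above might be sketched as follows. This section describes a hash-based reduction without naming an algorithm, so a MinHash-style signature is assumed here; it is not confirmed as the method of the embodiments. The function names, the `access_records` layout, the choice of `hashlib.blake2b`, and the `num_hashes` parameter are all hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Set

_EMPTY = 2 ** 64  # sentinel larger than any 64-bit hash value


def build_feature_vector(
    domain: str, clients: Sequence[str], access_records: Dict[str, Set[str]]
) -> List[int]:
    """One row per client (in a fixed order): 1 if that client accessed the domain."""
    return [1 if domain in access_records.get(c, set()) else 0 for c in clients]


def _row_hash(seed: int, row: int) -> int:
    # Deterministic 64-bit hash of a row index under a given seed.
    digest = hashlib.blake2b(f"{seed}:{row}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def reduce_vector(vector: Sequence[int], num_hashes: int = 64) -> List[int]:
    """Assumed MinHash reduction: keep the smallest row hash among non-zero rows."""
    signature: List[int] = []
    for seed in range(num_hashes):
        smallest = _EMPTY
        for row, value in enumerate(vector):
            if value:
                smallest = min(smallest, _row_hash(seed, row))
        signature.append(smallest)
    return signature


def estimate_similarity(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Share of signature rows on which the two reduced vectors agree."""
    if not sig_a or len(sig_a) != len(sig_b):
        raise ValueError("signatures must be non-empty and of equal length")
    matching = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matching / len(sig_a)
```

Under this assumption, the similarity is the number of matching rows divided by the total number of rows in the reduced vectors, which approximates the overlap between the sets of clients that accessed each domain name. The resulting values can then be compared with the preset value during clustering.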
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be completed by instructions, or by instructions controlling the relevant hardware. The instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium storing a plurality of instructions that can be loaded by a processor to perform the steps in any domain name identification method provided by the embodiments of the present invention. For example, the instructions may perform the following steps:
Receive a domain name identification instruction;
Obtain, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and construct the feature vectors corresponding to the domain names to be analyzed according to the access records;
Perform feature dimensionality reduction on the feature vectors corresponding to the domain names to be analyzed to obtain the reduced feature vectors corresponding to the domain names to be analyzed;
Calculate the similarity between the domain names to be analyzed based on the reduced feature vectors corresponding to the domain names to be analyzed;
Cluster the domain names to be analyzed based on the similarity between them to obtain the clustered domain name sets;
Identify the domain names to be analyzed according to the clustered domain name sets.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can perform the steps in any domain name identification method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method. For details, see the preceding embodiments, which are not repeated here.
The domain name identification method, device, and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, those skilled in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (15)

1. A domain name identification method, comprising:
receiving a domain name identification instruction;
obtaining, according to the domain name identification instruction, access records of multiple clients for domain names to be analyzed, and constructing feature vectors corresponding to the domain names to be analyzed according to the access records;
performing feature dimensionality reduction on the feature vectors corresponding to the domain names to be analyzed to obtain reduced feature vectors corresponding to the domain names to be analyzed;
calculating the similarity between the domain names to be analyzed based on the reduced feature vectors corresponding to the domain names to be analyzed;
clustering the domain names to be analyzed based on the similarity between the domain names to be analyzed to obtain clustered domain name sets;
identifying the domain names to be analyzed according to the clustered domain name sets to obtain an identification result.
2. The domain name identification method according to claim 1, wherein performing feature dimensionality reduction on the feature vectors corresponding to the domain names to be analyzed comprises:
determining the hash function required for the feature vectors corresponding to the domain names to be analyzed;
performing a hash transformation on the feature vectors corresponding to the domain names to be analyzed based on the hash function to obtain the reduced feature vectors.
3. The domain name identification method according to claim 2, wherein the feature vector comprises subfeature vectors corresponding to the domain names to be analyzed, and performing a hash transformation on the feature vectors corresponding to the domain names to be analyzed based on the hash function to obtain the reduced feature vectors comprises:
adding a corresponding hash subfeature vector to the feature vector according to the hash function;
initializing the feature values in the hash subfeature vector to obtain an initialized hash feature subvector;
comparing the initialized hash feature subvector with the subfeature vector corresponding to the domain name to be analyzed to obtain a comparison result;
reducing the dimensionality of the subfeature vector corresponding to the domain name to be analyzed according to the comparison result to obtain the reduced feature vector.
4. The domain name identification method according to claim 3, wherein comparing the initialized hash feature subvector with the subfeature vector corresponding to the domain name to be analyzed to obtain a comparison result comprises:
comparing the magnitude of the feature value of the initialized hash feature subvector with the feature value of the subfeature vector corresponding to the domain name to be analyzed to obtain the comparison result.
5. The domain name identification method according to claim 3, wherein reducing the dimensionality of the subfeature vector corresponding to the domain name to be analyzed according to the comparison result to obtain the reduced feature vector comprises:
if the feature value of the initialized hash feature subvector is greater than the feature value of the subfeature vector corresponding to the domain name to be analyzed, and the feature value of the subfeature vector corresponding to the domain name to be analyzed is greater than a preset value, replacing the feature value of the initialized hash feature subvector;
if the feature value of the initialized hash feature subvector is greater than the feature value of the subfeature vector corresponding to the domain name to be analyzed, and the feature value of the subfeature vector corresponding to the domain name to be analyzed is equal to the preset value, keeping the feature value of the initialized hash feature subvector unchanged;
combining the feature values of the replaced hash feature subvector with the feature values of the hash feature subvector that were kept unchanged to obtain a combined hash feature subvector;
reducing the dimensionality of the subfeature vector corresponding to the domain name to be analyzed according to the combined hash feature subvector to obtain the reduced feature vector.
6. The domain name identification method according to claim 2, wherein before determining the hash function required for the feature vectors corresponding to the domain names to be analyzed, the method further comprises:
filtering the feature vectors corresponding to the domain names to be analyzed to obtain filtered feature vectors;
using the filtered feature vectors as the feature vectors corresponding to the domain names to be analyzed.
7. The domain name identification method according to claim 1, wherein calculating the similarity between the domain names to be analyzed based on the reduced feature vectors corresponding to the domain names to be analyzed comprises:
obtaining a similarity calculation coefficient between the domain names to be analyzed;
obtaining the number of vector rows of the reduced feature vectors corresponding to the domain names to be analyzed;
calculating the similarity between the domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
8. The domain name identification method according to claim 1, wherein constructing the feature vectors corresponding to the domain names to be analyzed according to the access records comprises:
sorting the multiple clients according to a sort order;
detecting whether each sorted client has an access record for the domain name to be analyzed to obtain a detection result corresponding to each sorted client;
constructing the feature vector corresponding to the domain name to be analyzed based on the detection results corresponding to the sorted clients and the order in which the clients are arranged.
9. The domain name identification method according to claim 1, wherein clustering the domain names to be analyzed based on the similarity between the domain names to be analyzed to obtain clustered domain name sets comprises:
comparing the similarity between the domain names to be analyzed with a preset value;
clustering the domain names to be analyzed whose similarity is greater than the preset value into the same domain name set to obtain the clustered domain name sets.
10. The malicious domain name identification method according to claim 1, wherein identifying the domain names to be analyzed according to the clustered domain name sets comprises:
extracting a feature of a target domain name;
comparing the feature of the target domain name with the features of the domain names in the clustered domain name set;
if the clustered domain name set contains a domain name whose feature is consistent with the feature of the target domain name, determining that the domain names in the clustered domain name set are target domain names.
11. The domain name identification method according to any one of claims 1 to 10, wherein after the step of identifying the domain names to be analyzed according to the clustered domain name sets, the method further comprises:
if the domain names in a domain name set are identified as target domain names, determining the access clients of the domain names in the domain name set;
sending warning information to the access clients.
12. The domain name identification method according to claim 1, wherein obtaining the access records of multiple clients for the domain names to be analyzed comprises:
displaying a domain name detection page, the page comprising a domain name identification control;
obtaining the access records of multiple clients for the domain names to be analyzed based on a user's trigger operation on the domain name identification control.
13. The domain name identification method according to claim 12, wherein after the step of identifying the domain names to be analyzed according to the clustered domain name sets, the method further comprises:
displaying the domain name identification result on the domain name detection page.
14. A domain name identification device, comprising:
a receiving unit, configured to receive a domain name identification instruction;
an acquiring unit, configured to obtain, according to the domain name identification instruction, access records of multiple clients for domain names to be analyzed, and to construct feature vectors corresponding to the domain names to be analyzed according to the access records;
a dimensionality reduction unit, configured to perform feature dimensionality reduction on the feature vectors corresponding to the domain names to be analyzed to obtain reduced feature vectors corresponding to the domain names to be analyzed;
a computing unit, configured to calculate the similarity between the domain names to be analyzed based on the reduced feature vectors corresponding to the domain names to be analyzed;
a clustering unit, configured to cluster the domain names to be analyzed based on the similarity between the domain names to be analyzed to obtain clustered domain name sets;
an identification unit, configured to identify the domain names to be analyzed according to the clustered domain name sets.
15. A storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform the steps in the domain name identification method according to any one of claims 1 to 13.
CN201910373033.3A 2019-05-06 2019-05-06 Domain name identification method and device and storage medium Active CN110099059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910373033.3A CN110099059B (en) 2019-05-06 2019-05-06 Domain name identification method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910373033.3A CN110099059B (en) 2019-05-06 2019-05-06 Domain name identification method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110099059A true CN110099059A (en) 2019-08-06
CN110099059B CN110099059B (en) 2021-08-31

Family

ID=67446989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910373033.3A Active CN110099059B (en) 2019-05-06 2019-05-06 Domain name identification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110099059B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104112018A (en) * 2014-07-21 2014-10-22 南京大学 Large-scale image retrieval method
CN104486461A (en) * 2014-12-29 2015-04-01 北京奇虎科技有限公司 Domain name classification method and device and domain name recognition method and system
CN108282450A (en) * 2017-01-06 2018-07-13 阿里巴巴集团控股有限公司 The detection method and device of abnormal domain name
CN109698820A (en) * 2018-09-03 2019-04-30 长安通信科技有限责任公司 A kind of domain name Similarity measures and classification method and system

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826067B (en) * 2019-10-31 2022-08-09 深信服科技股份有限公司 Virus detection method and device, electronic equipment and storage medium
CN110826067A (en) * 2019-10-31 2020-02-21 深信服科技股份有限公司 Virus detection method and device, electronic equipment and storage medium
CN112910832A (en) * 2019-12-03 2021-06-04 国家计算机网络与信息安全管理中心 International domain name spoofing attack recognition and analysis method and system
CN111049949A (en) * 2019-12-31 2020-04-21 奇安信科技集团股份有限公司 Domain name identification method, device, electronic equipment and medium
CN113381962A (en) * 2020-02-25 2021-09-10 深信服科技股份有限公司 Data processing method, device and storage medium
WO2021169730A1 (en) * 2020-02-25 2021-09-02 深信服科技股份有限公司 Method and device for data processing, and storage medium
CN113381963A (en) * 2020-02-25 2021-09-10 深信服科技股份有限公司 Domain name detection method, device and storage medium
CN113381962B (en) * 2020-02-25 2023-02-03 深信服科技股份有限公司 Data processing method, device and storage medium
CN113381963B (en) * 2020-02-25 2024-01-02 深信服科技股份有限公司 Domain name detection method, device and storage medium
CN113542202A (en) * 2020-04-21 2021-10-22 深信服科技股份有限公司 Domain name identification method, device, equipment and computer readable storage medium
CN113542202B (en) * 2020-04-21 2022-09-30 深信服科技股份有限公司 Domain name identification method, device, equipment and computer readable storage medium
CN113556308A (en) * 2020-04-23 2021-10-26 深信服科技股份有限公司 Method, system, equipment and computer storage medium for detecting flow security
CN112367338A (en) * 2020-11-27 2021-02-12 腾讯科技(深圳)有限公司 Malicious request detection method and device
CN112565283A (en) * 2020-12-15 2021-03-26 厦门服云信息科技有限公司 APT attack detection method, terminal device and storage medium
CN112615861A (en) * 2020-12-17 2021-04-06 赛尔网络有限公司 Malicious domain name identification method and device, electronic equipment and storage medium
CN112583738A (en) * 2020-12-29 2021-03-30 北京浩瀚深度信息技术股份有限公司 Method, equipment and storage medium for analyzing and classifying network flow
CN115361358B (en) * 2022-08-19 2024-02-06 山石网科通信技术股份有限公司 IP extraction method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN110099059B (en) 2021-08-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant