CN110099059A - Domain name recognition method, device and storage medium - Google Patents
Domain name recognition method, device and storage medium
- Publication number
- CN110099059A (application CN201910373033.3A)
- Authority
- CN
- China
- Prior art keywords
- domain name
- analyzed
- feature
- vector
- feature vector
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/30—Managing network names, e.g. use of aliases or nicknames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1433—Vulnerability analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention disclose a domain name recognition method, device and storage medium. A domain name recognition instruction is received; according to the instruction, the access records of multiple clients for the domain names to be analyzed are obtained, and a feature vector is constructed for each domain name to be analyzed from those access records. Feature dimensionality reduction is performed on each feature vector to obtain a reduced feature vector; the similarity between domain names to be analyzed is calculated from the reduced feature vectors; the domain names are clustered based on that similarity to obtain clustered domain name sets; and the domain names to be analyzed are identified according to the clustered domain name sets. The scheme can improve the accuracy of domain name recognition.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a domain name recognition method, device and storage medium.
Background
Malicious domain names have become one of the threats most closely watched in network security, both domestically and globally.
A malicious domain name (also called a malicious website) is a website that exploits vulnerabilities in browsers or application software and embeds malicious code in order to tamper with or damage a user's machine without the user's knowledge. Websites that counterfeit others, such as fake bank or e-commerce websites, are also classed as malicious domain names even though they do not tamper with or damage the user's machine.
Malicious domain names can form a huge network system through which infected systems are controlled over the network and jointly cause harm: rapidly spreading trojans and worms, stealing large amounts of sensitive information in a short time, seizing system resources for illegal profit, launching large-scale distributed denial-of-service attacks, and so on. This makes tracing the harm and containing the losses very difficult.
Traditional malicious domain name detection mainly relies on reverse analysis of malicious programs. Reverse-engineering binary trojans is very time-consuming, and packed samples must also be handled, so obtaining analysis rules depends on reverse engineers, and both the accuracy and the efficiency of recognition are low.
Summary of the invention
The embodiments of the present invention provide a domain name recognition method, device and storage medium that can improve the accuracy and efficiency of domain name recognition.
To solve the above technical problems, the embodiments of the present invention provide the following technical solutions:
Receive a domain name recognition instruction;
Obtain, according to the instruction, the access records of multiple clients for the domain names to be analyzed, and construct the feature vector of each domain name to be analyzed from the access records;
Perform feature dimensionality reduction on the feature vector of each domain name to be analyzed, obtaining its reduced feature vector;
Calculate the similarity between domain names to be analyzed based on their reduced feature vectors;
Cluster the domain names to be analyzed based on that similarity, obtaining clustered domain name sets;
Identify the domain names to be analyzed according to the clustered domain name sets, obtaining a recognition result.
Correspondingly, the embodiments of the present invention also provide a domain name recognition device, comprising:
A receiving unit, for receiving a domain name recognition instruction;
An acquiring unit, for obtaining, according to the instruction, the access records of multiple clients for the domain names to be analyzed, and constructing the feature vector of each domain name to be analyzed from the access records;
A dimensionality reduction unit, for performing feature dimensionality reduction on the feature vector of each domain name to be analyzed, obtaining its reduced feature vector;
A computing unit, for calculating the similarity between domain names to be analyzed based on their reduced feature vectors;
A clustering unit, for clustering the domain names to be analyzed based on that similarity, obtaining clustered domain name sets;
A recognition unit, for identifying the domain names to be analyzed according to the clustered domain name sets.
Optionally, in some embodiments, the dimensionality reduction unit includes a determining subunit and a hash transformation subunit:
The determining subunit is configured to determine the hash functions required for the feature vectors of the domain names to be analyzed;
The hash transformation subunit is configured to apply a hash transformation to the feature vector of each domain name to be analyzed based on the hash functions, obtaining the reduced feature vector.
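The patent does not state which hash functions the determining subunit selects. The sketch below assumes the universal family h(x) = (a*x + b) mod p that is commonly used for MinHash; `make_hash_functions` and `PRIME` are hypothetical names. Under this assumption, the number of hash functions chosen fixes the length of the reduced feature vector.

```python
import random

# Assumption: the patent does not name its hash family. We use the common
# universal family h(x) = (a*x + b) mod p, as is typical for MinHash.
PRIME = (1 << 61) - 1  # Mersenne prime, larger than any client index used here


def make_hash_functions(num_hashes, seed=0):
    """Return `num_hashes` hash functions mapping a row (client) index to an int.

    `num_hashes` fixes the length of the reduced feature vector.
    """
    rng = random.Random(seed)
    functions = []
    for _ in range(num_hashes):
        a = rng.randrange(1, PRIME)
        b = rng.randrange(0, PRIME)
        # Default arguments bind this iteration's a and b to the lambda.
        functions.append(lambda x, a=a, b=b: (a * x + b) % PRIME)
    return functions


if __name__ == "__main__":
    hashes = make_hash_functions(num_hashes=4, seed=42)
    print([h(3) for h in hashes])
```

A fixed seed keeps the hash functions identical across runs, which matters because every domain name must be reduced with the same functions for the similarities to be comparable.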
Optionally, in some embodiments, the hash transformation subunit includes an adding module, an initialization module, a comparison module and a dimensionality reduction module;
The adding module is configured to add, according to the hash functions, a corresponding hash sub-feature vector to the feature vector;
The initialization module is configured to initialize the feature values in the hash sub-feature vector, obtaining an initialized hash feature sub-vector;
The comparison module is configured to compare the initialized hash feature sub-vector with the sub-feature vector corresponding to the domain name to be analyzed, obtaining a comparison result;
The dimensionality reduction module is configured to reduce the dimension of the sub-feature vector corresponding to the domain name to be analyzed according to the comparison result, obtaining the reduced feature vector.
Optionally, in some embodiments, the comparison module includes a comparison submodule;
The comparison submodule is configured to compare the magnitude of each feature value of the initialized hash feature sub-vector with the corresponding sub-feature value of the domain name to be analyzed, obtaining the comparison result.
Optionally, in some embodiments, the dimensionality reduction module includes a replacement submodule, a combination submodule and a dimensionality reduction submodule:
The replacement submodule is configured so that, if a feature value of the initialized hash feature sub-vector is greater than the corresponding sub-feature value of the domain name to be analyzed, and that sub-feature value is greater than a preset value, the feature value of the initialized hash feature sub-vector is replaced; if the feature value of the initialized hash feature sub-vector is greater than the corresponding sub-feature value of the domain name to be analyzed, and that sub-feature value is equal to the preset value, the feature value of the initialized hash feature sub-vector remains unchanged;
The combination submodule is configured to combine the replaced feature values and the unchanged feature values of the hash feature sub-vector, obtaining a combined hash feature sub-vector;
The dimensionality reduction submodule is configured to reduce the dimension of the sub-feature vector of the domain name to be analyzed according to the combined hash feature sub-vector, obtaining the reduced feature vector.
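The initialize, compare, replace and combine steps above closely resemble the standard MinHash signature update. The sketch below is one reading under that assumption, not the patent's confirmed procedure. Each slot starts at infinity; a row whose access bit equals the preset value (taken here to be 0) leaves the slots unchanged; otherwise a slot is replaced whenever its current value is larger than the row's hash. `minhash_signature` and `preset_value` are hypothetical names.

```python
import math


def minhash_signature(feature_vector, hash_functions, preset_value=0):
    """Reduce a 0/1 access vector to one value per hash function (MinHash).

    feature_vector: entry k is 1 if the k-th sorted client accessed the domain.
    hash_functions: callables mapping a row index to an integer.
    """
    # Initialized hash feature sub-vector: every slot starts at +infinity.
    signature = [math.inf] * len(hash_functions)
    for row, value in enumerate(feature_vector):
        # A row equal to the preset value (no access) leaves every slot unchanged.
        if value <= preset_value:
            continue
        for i, h in enumerate(hash_functions):
            hashed = h(row)
            # Replace the slot only when its current value is larger.
            if signature[i] > hashed:
                signature[i] = hashed
    # The combined slots form the reduced feature vector.
    return signature


if __name__ == "__main__":
    h1 = lambda x: (x + 1) % 5
    h2 = lambda x: (3 * x + 1) % 5
    print(minhash_signature([1, 0, 0, 1, 0], [h1, h2]))  # [1, 0]
```

However many clients the original vector covers, the signature has exactly one entry per hash function. This is where the dimensionality reduction comes from.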
Optionally, in some embodiments, the device further includes a filtering unit;
The filtering unit is configured to filter the feature vector of each domain name to be analyzed, obtaining a filtered feature vector, and to use the filtered feature vector as the feature vector of that domain name.
Optionally, in some embodiments, the computing unit includes a first acquisition subunit and a computation subunit:
The first acquisition subunit is configured to obtain the similarity calculation coefficient between domain names to be analyzed, and to obtain the number of vector rows of the reduced feature vectors of the domain names to be analyzed;
The computation subunit is configured to calculate the similarity between domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
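The patent does not define the "similarity calculation coefficient". With MinHash-style reduced vectors, a common choice is the number of rows on which two vectors agree. Dividing that count by the number of vector rows estimates the Jaccard similarity of the two domains' client sets. The sketch assumes this reading; `signature_similarity` is a hypothetical name.

```python
def signature_similarity(sig_a, sig_b):
    """Estimate the similarity of two domains from their reduced vectors."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    # Assumed "similarity calculation coefficient": rows where the values agree.
    coefficient = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    # Divide by the number of vector rows.
    return coefficient / len(sig_a)


if __name__ == "__main__":
    print(signature_similarity([1, 0, 2, 5], [1, 3, 2, 5]))  # 0.75
```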
Optionally, in some embodiments, the acquiring unit further includes a sorting subunit, a detection subunit and a building subunit;
The sorting subunit is configured to sort the multiple clients according to a sorting order;
The detection subunit is configured to detect whether each sorted client has an access record for the domain name to be analyzed, obtaining a detection result for each sorted client;
The building subunit is configured to construct the feature vector of the domain name to be analyzed based on the detection results of the sorted clients and the order of the clients.
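The sorting, detection and building subunits can be sketched as follows. The input format is an assumption: a mapping from client id to the set of domains that client accessed. `build_feature_vector` is a hypothetical name. The first element is 1 and the second element is 0, matching the example given later in this description.

```python
def build_feature_vector(domain, sorted_clients, domains_by_client):
    """Return the 0/1 feature vector of `domain` over the sorted clients.

    domains_by_client: client id -> set of domains that client accessed.
    The first element is 1 (access record exists); the second element is 0.
    """
    return [1 if domain in domains_by_client.get(client, set()) else 0
            for client in sorted_clients]


if __name__ == "__main__":
    visits = {"c1": {"a.example"}, "c2": {"a.example"}, "c3": {"b.example"}}
    clients = sorted(visits)  # any fixed order works if reused for every domain
    print(build_feature_vector("a.example", clients, visits))  # [1, 1, 0]
    print(build_feature_vector("b.example", clients, visits))  # [0, 0, 1]
```

The specific order does not matter. What matters is that the same client order is reused for every domain name, so that position k always refers to the same client.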
Optionally, in some embodiments, the clustering unit includes a first comparison subunit and a clustering subunit;
The first comparison subunit is configured to compare the similarity between domain names to be analyzed with a preset value;
The clustering subunit is configured to cluster domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, obtaining the clustered domain name sets.
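The clustering subunit places domain names whose similarity exceeds a preset value in the same set. The sketch below assumes that grouping is transitive: if A is similar to B and B is similar to C, all three share a set. It implements this with a small union-find. The function name and the agreeing-rows similarity are assumptions, not taken from the patent.

```python
from itertools import combinations


def cluster_domains(signatures, threshold):
    """Group domains whose pairwise similarity exceeds `threshold` (union-find).

    signatures: domain -> reduced feature vector (all of equal length).
    """
    parent = {d: d for d in signatures}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    for a, b in combinations(signatures, 2):
        sa, sb = signatures[a], signatures[b]
        similarity = sum(x == y for x, y in zip(sa, sb)) / len(sa)
        if similarity > threshold:  # "greater than the preset value"
            parent[find(a)] = find(b)

    clusters = {}
    for d in signatures:
        clusters.setdefault(find(d), set()).add(d)
    return list(clusters.values())


if __name__ == "__main__":
    sigs = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 9], "c": [7, 8, 9, 0]}
    print(cluster_domains(sigs, threshold=0.5))
```

The pairwise loop is quadratic in the number of domains. For large inputs, locality-sensitive hashing (banding) over the signatures is the usual way to avoid comparing every pair, although the patent does not mention it.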
Optionally, in some embodiments, the recognition unit includes an extraction subunit, a second comparison subunit and a determining subunit;
The extraction subunit is configured to extract the features of a target domain name;
The second comparison subunit is configured to compare the features of the target domain name with the features of the domain names in the clustered domain name sets;
The determining subunit is configured to determine, if a clustered domain name set contains a domain name whose features are consistent with those of the target domain name, that the domain names in that clustered set are target domain names.
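The patent does not say what the target domain name's features are. The sketch below treats them as a hypothetical set of feature labels and flags a whole cluster as soon as one member's features equal the target's. `flag_clusters` and the label format are assumptions for illustration only.

```python
def flag_clusters(clusters, domain_features, target_features):
    """Return clusters holding a domain whose features match the target's.

    domain_features: domain -> frozenset of feature labels (hypothetical format).
    target_features: feature labels extracted from the target domain.
    """
    target = frozenset(target_features)
    flagged = []
    for cluster in clusters:
        # any() stops at the first match, so one domain can confirm a whole set.
        if any(domain_features.get(d) == target for d in cluster):
            flagged.append(cluster)
    return flagged


if __name__ == "__main__":
    features = {"a": frozenset({"dga", "young"}), "b": frozenset({"cdn"}),
                "d": frozenset({"cdn"})}
    print(flag_clusters([{"a", "b"}, {"d"}], features, {"dga", "young"}))
```

The early exit in `any()` is what allows the type of every domain name in a set to be determined without comparing the features of each member.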
Optionally, in some embodiments, the device further includes a determination unit and a transmission unit;
The determination unit is configured to determine, if the domain names in a domain name set are identified as target domain names, the clients that accessed the domain names in that set;
The transmission unit is configured to send warning information to those access clients.
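The determination unit's job can be sketched as follows, assuming access records are available as (client, domain) pairs, which is a hypothetical format. `build_alerts` returns, for each client to be warned, the flagged domains it visited. The actual sending of warning information is not shown.

```python
from collections import defaultdict


def build_alerts(flagged_domains, access_log):
    """Map each client that accessed a flagged domain to those domains.

    access_log: iterable of (client_id, domain) pairs (assumed format).
    """
    alerts = defaultdict(set)
    for client, domain in access_log:
        if domain in flagged_domains:
            alerts[client].add(domain)
    # Sorted lists give stable warning messages; the sending step is not shown.
    return {client: sorted(domains) for client, domains in alerts.items()}


if __name__ == "__main__":
    log = [("c1", "a"), ("c2", "b"), ("c3", "x"), ("c1", "b")]
    print(build_alerts({"a", "b"}, log))
```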
Optionally, in some embodiments, the acquiring unit further includes a display subunit and a second acquisition subunit;
The display subunit is configured to display a domain name detection page that includes a domain name recognition control;
The second acquisition subunit is configured to obtain, based on the user's trigger operation on the domain name recognition control, the access records of multiple clients for the domain names to be analyzed.
Optionally, in some embodiments, the device further includes a display unit;
The display unit is configured to display the domain name recognition result on the domain name detection page.
In addition, the embodiments of the present invention also provide a storage medium storing a plurality of instructions suitable for being loaded by a processor to perform the steps of any domain name recognition method provided by the embodiments of the present invention.
In the embodiments of the present invention, a domain name recognition instruction is received. According to the instruction, the access records of multiple clients for the domain names to be analyzed are obtained, and a feature vector is constructed for each domain name to be analyzed from those records. Feature dimensionality reduction is performed on the constructed feature vectors to obtain reduced feature vectors. The similarity between domain names to be analyzed is calculated from the reduced feature vectors, and the domain names are clustered on that basis to obtain clustered domain name sets, according to which the domain names to be analyzed can be identified. This scheme groups domain names into sets by similarity and then identifies each resulting set. Because the domain names in one set belong to the same or similar categories, the scheme effectively prevents domain names from being assigned the wrong type during recognition, which improves accuracy. And because the domain names in a set are of similar or identical types, the type of a domain name can be obtained without comparing the features of every domain name, which reduces recognition time and improves efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a scenario of the domain name recognition method provided by an embodiment of the present invention;
Fig. 1b is another schematic diagram of a scenario of the domain name recognition method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of the domain name recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the domain name detection page of the domain name recognition method provided by an embodiment of the present invention;
Fig. 4 is another flow chart of the domain name recognition method provided by an embodiment of the present invention;
Fig. 5a is a schematic diagram of a service cluster provided by an embodiment of the present invention;
Fig. 5b is an architecture diagram of the domain name recognition method provided by an embodiment of the present invention;
Fig. 6a is a structural schematic diagram of the domain name recognition device provided by an embodiment of the present invention;
Fig. 6b is another structural schematic diagram of the domain name recognition device provided by an embodiment of the present invention;
Fig. 6c is another structural schematic diagram of the domain name recognition device provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of the server provided by an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention provide a domain name recognition method, device and storage medium, each of which is described in detail below.
The domain name recognition device can be integrated in a network device, which can be a server (such as a cloud server) or a terminal.
For example, referring to Fig. 1a, take the case where the domain name recognition device is integrated in a server. The client has a domain name detection page. The user triggers a domain name recognition instruction by clicking the domain name recognition control on that page, and the device receives the instruction. Based on the instruction, the device then obtains the clients' access records for the domain names to be analyzed. It can obtain the access records of all clients in the server, or only those of clients within a certain range. The range is not limited here and can be set according to user input; if the user sets no range, the access records of all clients in the server are obtained by default.
Next, the feature vector of each domain name to be analyzed is constructed from the access records. The clients' access records for the domain name are traversed: if a client has an access record for the domain name, a first element (for example, 1) is recorded; otherwise a second element (for example, 0) is recorded. The recorded elements are then arranged in traversal order to form the feature vector of the domain name. Feature dimensionality reduction is performed on the constructed feature vectors to obtain the reduced feature vector of each domain name, reducing the amount of data to be processed for the domain names to be analyzed. The similarity between domain names is then calculated from the reduced feature vectors, and the domain names are clustered on that basis: highly similar domain names are placed in the same domain name set, yielding the clustered domain name sets (the clustering process and result are shown in Fig. 1b). The domain names can then be classified and identified according to the clustered sets.
Because this scheme first clusters highly similar domain names into a set and then identifies the set, the type of each domain name can be learned without identifying every domain name individually, which greatly reduces the number of recognition operations and the time needed, and so improves efficiency. And because the domain names clustered into one set are highly similar and generally of the same type, misidentification of domain name types is reduced, which improves accuracy.
It is described in detail separately below.It should be noted that the following description sequence is not as excellent to embodiment
The restriction of choosing sequence.
This embodiment is described from the perspective of the domain name recognition device, which can be integrated in a network device such as a terminal or a server.
A domain name recognition method includes: receiving a domain name recognition instruction; obtaining, according to the instruction, the access records of multiple clients for the domain names to be analyzed, and constructing the feature vector of each domain name to be analyzed from the access records; performing feature dimensionality reduction on each feature vector to obtain the reduced feature vector of each domain name; calculating the similarity between domain names to be analyzed based on the reduced feature vectors; clustering the domain names based on that similarity to obtain clustered domain name sets; and identifying the domain names to be analyzed according to the clustered domain name sets.
As shown in Fig. 2, the specific flow of the domain name recognition method can be as follows:
S201: Receive a domain name recognition instruction.
First, the server has a domain name detection page. A domain name is a name made of a string of labels separated by dots that identifies a computer or group of computers on the Internet, and is used to identify the electronic location (sometimes called the geographical location) of the computer during data transmission. The Domain Name System (DNS) is a core Internet service: as a distributed database that maps domain names and IP addresses to each other, it makes the Internet easier to use, because people do not need to remember the numeric IP addresses that machines read directly. The user triggers a domain name recognition instruction by clicking the domain name recognition control on the domain name detection page, and the server receives it.
The user can also schedule domain name recognition. When the scheduled time set by the user arrives, a domain name recognition instruction is triggered automatically and received by the server, so the user does not need to trigger it manually each time. After scheduling recognition, the user can still trigger the instruction manually. For example, when an emergency requires domain name recognition and the current time is not a scheduled recognition time, the user can trigger the instruction manually so that the server receives it and performs recognition.
S202: Obtain, according to the domain name recognition instruction, the access records of multiple clients for the domain names to be analyzed, and construct the feature vector of each domain name to be analyzed from the access records.
After receiving the domain name recognition instruction, the server can obtain the clients' access records for the domain names to be analyzed based on the instruction. A client, also called the user side, is the program that corresponds to the server and provides local services to the user. Apart from some applications that run purely locally, clients are usually installed on ordinary user computers and must operate in cooperation with a server. Common clients include web browsers for the World Wide Web, e-mail clients for sending and receiving e-mail, and instant messaging client software. Such applications need a corresponding server and service program in the network, such as a database service or an e-mail service, and a specific communication connection must be established between client and server to ensure the application runs normally.
Specifically, the access records of all clients in the server for the domain names to be analyzed can be obtained, or only those of clients within a certain range. The range is not limited here and can be set according to user input; if the user sets no range, the access records of all clients in the server are obtained by default. A client's access record for a domain name to be analyzed can be obtained by extracting and analyzing the domain name's access records: if the domain name's access records include a client in the server, that client is determined to have an access record for the domain name, and clients not included are determined to have no access record for it.
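The extraction step above can be sketched as building, for each domain name, the set of clients that appear in its access records. The (client, domain) pair format is an assumption, as are the names `clients_per_domain` and `has_access_record`. The optional `client_scope` mirrors the user-supplied range, and leaving it as `None` keeps all clients, matching the default described above.

```python
from collections import defaultdict


def clients_per_domain(access_records, client_scope=None):
    """Map each domain to the set of clients found in its access records.

    access_records: iterable of (client_id, domain) pairs (assumed log format).
    client_scope: optional set of client ids; None keeps all clients (the default).
    """
    result = defaultdict(set)
    for client, domain in access_records:
        if client_scope is None or client in client_scope:
            result[domain].add(client)
    return dict(result)


def has_access_record(client, domain, index):
    """A client has an access record if it appears in the domain's records."""
    return client in index.get(domain, set())


if __name__ == "__main__":
    records = [("c1", "a"), ("c2", "a"), ("c3", "b")]
    index = clients_per_domain(records)
    print(has_access_record("c2", "a", index), has_access_record("c3", "a", index))
```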
Accordingly, obtaining the access records of multiple clients for the domain names to be analyzed in this step includes:
Displaying a domain name detection page that includes a domain name recognition control;
Obtaining, based on the user's trigger operation on the domain name recognition control, the access records of multiple clients for the domain names to be analyzed.
The domain name detection page is provided in the server, and the user can locate it through a fixed interface or URL. After the user triggers a lookup of the page, the server looks it up according to the user's instruction and displays it. As shown in Fig. 3, the page includes a domain name recognition control that the user can click, slide, and so on. Clicking or touching the control triggers a domain name recognition instruction, and based on this trigger operation the server obtains the access records of multiple clients for the domain names to be analyzed.
The feature vector of each domain name to be analyzed is then constructed from the access records. Specifically, the multiple clients are sorted according to a sorting order; each sorted client is checked for an access record for the domain name to be analyzed, yielding a detection result for each sorted client; and the feature vector of the domain name is built from these detection results in the order of the clients.
Therefore, referring to Fig. 4, step S202 further includes:
S401: Sort the multiple clients according to a sorting order;
S402: Detect whether each sorted client has an access record for the domain name to be analyzed, obtaining a detection result for each sorted client;
S403 is put in order based on client after sequence corresponding testing result, client, constructs domain name pair to be analyzed
The feature vector answered.
Specifically, the clients are sorted according to the sorting rule. The concrete order may be random, as long as the same client order is used for every domain name when checking whether access records exist. For each sorted client, it is then detected whether the client has an access record for the domain name to be analyzed, giving the detection result of that client. The result can be recorded with specific elements: if the client has an access record for the domain name, a first element is recorded; otherwise a second element is recorded. For example, the first element can be defined as 1 and the second element as 0. The recorded elements are then arranged in the order of the clients to build the feature vector of the domain name to be analyzed. For example, if the first client obtained has an access record for a domain name, it is recorded as 1; if the second client also has one, it is recorded as 1; and if the third client has none, it is recorded as 0; the feature vector of that domain name is then (1,1,0). For another domain name, if the first and second clients have no access record (0 and 0) and the third client has one (1), its feature vector is (0,0,1). Since there are multiple clients and multiple domain names to be analyzed, the access records of all clients are encoded one by one in this way for each domain name, building the feature vector of every domain name to be analyzed.
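The encoding above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes access records are available as a hypothetical mapping from client ID to the set of domains that client accessed, and it uses sorted client IDs as the fixed order (any order works if it is reused for every domain).

```python
def build_feature_vectors(access_records, domains):
    """Encode each domain as a 0/1 vector over a fixed client order.

    access_records: hypothetical mapping {client_id: set of accessed domains}.
    """
    # Any fixed order works; sorting is one deterministic choice,
    # and the same order must be reused for every domain.
    ordered_clients = sorted(access_records)
    vectors = {}
    for domain in domains:
        # First element (1): the client has an access record; second element (0): none.
        vectors[domain] = tuple(
            1 if domain in access_records[client] else 0
            for client in ordered_clients
        )
    return vectors


records = {
    "client1": {"a.example"},
    "client2": {"a.example"},
    "client3": {"b.example"},
}
print(build_feature_vectors(records, ["a.example", "b.example"]))
# {'a.example': (1, 1, 0), 'b.example': (0, 0, 1)}
```

The two outputs reproduce the (1,1,0) and (0,0,1) vectors from the example above.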
S203: performing feature dimensionality reduction on the feature vectors of the domain names to be analyzed, to obtain the reduced-dimension feature vector of each domain name.
Further, once the feature vectors of the domain names to be analyzed have been obtained, feature dimensionality reduction can first be applied to them to reduce the amount of computation required for the domain names to be analyzed.
Specifically, performing feature dimensionality reduction on the feature vectors of the domain names to be analyzed may include:
determining the hash functions required for the feature vectors of the domain names to be analyzed;
applying a hash transformation to these feature vectors based on the hash functions, to obtain the reduced-dimension feature vectors.
Specifically, the feature vectors of the domain names to be analyzed need to be shuffled by multiple row permutations. To simulate the effect of these row permutations, n random hash functions h1, h2, h3, ..., hn are selected, and the feature vectors are hash-transformed based on them to obtain the reduced-dimension feature vectors. The hash transformation includes:
adding, according to the hash functions, corresponding hash sub-feature vectors to the feature vectors;
initializing the values in the hash sub-feature vectors to obtain initialized hash feature sub-vectors;
comparing the initialized hash feature sub-vectors with the sub-feature vectors of the domain names to be analyzed to obtain a comparison result;
reducing the dimensionality of the sub-feature vectors of the domain names to be analyzed according to the comparison result, to obtain the reduced-dimension feature vectors.
The hash sub-feature vectors are added to the feature vectors according to the hash functions. For example, suppose the feature matrix of the domain names to be analyzed has rows a, b, c, d, e, with a = (1,0,0,1), b = (0,0,1,0), c = (0,1,0,1), d = (1,0,1,1) and e = (0,0,1,0). Each column S1 to S4 is the feature vector of one domain name, and each row corresponds to one client position. Rows a to e are replaced by row numbers 0 to 4, and two hash functions are added: h1(x) = (x + 1) mod 5 and h2(x) = (3x + 1) mod 5, where x is the row number.
Row | S1 | S2 | S3 | S4 | h1 | h2 |
0 | 1 | 0 | 0 | 1 | 1 | 1 |
1 | 0 | 0 | 1 | 0 | 2 | 4 |
2 | 0 | 1 | 0 | 1 | 3 | 2 |
3 | 1 | 0 | 1 | 1 | 4 | 0 |
4 | 0 | 0 | 1 | 0 | 0 | 3 |
The values in the hash sub-feature vectors are then initialized to obtain the initialized hash feature sub-vectors. Let SIG(i, c) denote the element of the i-th hash function in column c. At the start, every SIG(i, c) is initialized to Inf (infinity):
Hash | S1 | S2 | S3 | S4 |
h1 | Inf | Inf | Inf | Inf |
h2 | Inf | Inf | Inf | Inf |
Further, reducing the dimensionality of the sub-feature vectors of the domain names to be analyzed according to the comparison result, to obtain the reduced-dimension feature vectors, includes:
if a value of the initialized hash feature sub-vector is greater than the corresponding hash value for the current row, and the domain name's sub-feature vector value at that row is greater than the preset value (in the example below, an entry of 1 rather than 0), replacing that value of the initialized hash feature sub-vector;
if a value of the initialized hash feature sub-vector is greater than the corresponding hash value for the current row, but the domain name's sub-feature vector value at that row is equal to the preset value, keeping that value of the initialized hash feature sub-vector unchanged;
combining the replaced values and the unchanged values of the hash feature sub-vector to obtain a combined hash feature sub-vector;
reducing the dimensionality of the sub-feature vectors of the domain names to be analyzed according to the combined hash feature sub-vector, to obtain the reduced-dimension feature vectors.
The initialized hash feature sub-vectors are compared with the sub-feature vectors of the domain names to be analyzed as follows. For each row r, compute h1(r), h2(r), ..., hn(r). For each column c: if column c has a 0 in row r, do nothing; if column c has a 1 in row r, then for each i = 1, 2, ..., n set SIG(i, c) to the minimum of the current SIG(i, c) and hi(r). The signature matrix is computed row by row, starting with row 0 of the feature matrix. There, S2 and S3 are 0, so they are left unchanged; S1 and S4 are 1, so they must be updated. Here h1 = 1 and h2 = 1; since 1 is smaller than Inf, the values in columns S1 and S4 are replaced. In other words, the values of the initialized hash feature sub-vector are compared in size with the corresponding values of the domain names' sub-feature vectors, which gives the comparison result. After replacement:
Hash | S1 | S2 | S3 | S4 |
h1 | 1 | Inf | Inf | 1 |
h2 | 1 | Inf | Inf | 1 |
Next comes row 1 of the feature matrix. Only S3 is 1; here h1 = 2 and h2 = 4, so column S3 is replaced, giving:
Hash | S1 | S2 | S3 | S4 |
h1 | 1 | Inf | 2 | 1 |
h2 | 1 | Inf | 4 | 1 |
Next comes row 2. S2 and S4 are 1; here h1 = 3 and h2 = 2. Both values already in column S4 are 1, which is smaller than 3 and 2, so only column S2 is replaced:
Hash | S1 | S2 | S3 | S4 |
h1 | 1 | 3 | 2 | 1 |
h2 | 1 | 2 | 4 | 1 |
Next comes row 3. S1, S3 and S4 are all 1; here h1 = 4 and h2 = 0. Every h2 entry in these columns drops to 0, while their h1 entries, all smaller than 4, are kept. After replacement:
Hash | S1 | S2 | S3 | S4 |
h1 | 1 | 3 | 2 | 1 |
h2 | 0 | 2 | 0 | 0 |
Finally comes row 4. Only S3 is 1; here h1 = 0 and h2 = 3, so the h1 entry of S3 becomes 0. The result is:
Hash | S1 | S2 | S3 | S4 |
h1 | 1 | 3 | 0 | 1 |
h2 | 0 | 2 | 0 | 0 |
With every row traversed once, the final signature matrix is:
Hash | S1 | S2 | S3 | S4 |
h1 | 1 | 3 | 0 | 1 |
h2 | 0 | 2 | 0 | 0 |
The sub-feature vectors of the domain names to be analyzed are then reduced according to the comparison result. The final signature of S1 is (1,0), whereas its original feature vector was (1,0,0,1,0), so its length is shortened; likewise, S2 becomes (3,2), S3 becomes (0,0) and S4 becomes (1,0). The feature vector of every domain name to be analyzed is clearly shortened, which reduces the computation needed to calculate the similarity between domain names. The signature matrix obtained after the final hash transformation is therefore used as the reduced-dimension feature vectors of the domain names to be analyzed.
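The row-by-row update above is the standard MinHash signature computation, with hash functions standing in for row permutations. A minimal Python sketch, assuming the feature matrix is given as a list of rows (client positions) by columns (domains), reproduces the worked example using the same h1 and h2:

```python
import math


def minhash_signatures(matrix, hash_funcs):
    """Compute MinHash signatures.

    matrix[r][c] is 1 if row r (a client position) is 1 for column c (a domain).
    Returns one signature tuple per column: (SIG(1, c), ..., SIG(n, c)).
    """
    n_cols = len(matrix[0])
    # SIG(i, c) starts at infinity for every hash function i and column c.
    sig = [[math.inf] * n_cols for _ in hash_funcs]
    for r, row in enumerate(matrix):
        row_hashes = [h(r) for h in hash_funcs]  # h1(r), ..., hn(r)
        for c, value in enumerate(row):
            if value == 0:
                continue  # a 0 in row r leaves column c unchanged
            for i, hv in enumerate(row_hashes):
                sig[i][c] = min(sig[i][c], hv)
    return [tuple(sig[i][c] for i in range(len(hash_funcs))) for c in range(n_cols)]


def h1(x):
    return (x + 1) % 5


def h2(x):
    return (3 * x + 1) % 5


# Rows a..e (row numbers 0..4), columns S1..S4, as in the worked example.
matrix = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
print(minhash_signatures(matrix, [h1, h2]))
# [(1, 0), (3, 2), (0, 0), (1, 0)]  -> S1, S2, S3, S4
```

In practice n would be much larger than 2 and the hash functions would be drawn at random; the two fixed functions here only mirror the example.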
Before dimensionality reduction is applied, the domain names to be analyzed can first be filtered to reduce the number of domain names that must be analyzed. Specifically:
filtering the feature vectors of the domain names to be analyzed to obtain filtered feature vectors;
using the filtered feature vectors as the feature vectors of the domain names to be analyzed.
Filtering the feature vectors in this way yields the feature vectors of the domain names that remain after filtering.
In a specific implementation, filtering may remove general-purpose domain names, internally used domain names and the like from the domain names to be analyzed, and may also remove the domain names of known malicious groups. Specifically:
(1) First, fixed general-purpose domain names are obtained, i.e., domain names that practically every client accesses, such as the top 10,000 domain names in the Alexa ranking (for example Google.com, the Google search engine), together with general CDN (Content Delivery Network) domain names. The domain names to be analyzed are compared with these general domain names and general CDN domain names. Any domain name that matches one of them is removed, together with its feature vector, from the list of domain names to be analyzed: a domain name matching a known general domain name or general CDN domain name is considered normal or general and needs no further analysis, which reduces the computation for the domain names to be analyzed.
(2) After the general domain names and general CDN domain names have been removed, the domain names in a whitelist and a blacklist are obtained. The whitelist is obtained first; it may include internal general domain names, domain names used for testing, and so on. The domain names to be analyzed are compared with the whitelist; a domain name that matches a whitelist entry is a known internal domain name or testing domain name, so it and its feature vector are likewise removed from the list of domain names to be analyzed, further reducing computation. Next, the blacklist is obtained; it contains known malicious domain names, such as the domain names of malicious groups. The domain names to be analyzed are compared with the blacklist, and any domain name that matches a blacklist entry is determined to be malicious. Such domain names are first removed from the list of domain names to be analyzed, further reducing computation. Then the client access records for the matching domain names are obtained, and the clients that accessed domain names matching the blacklist are extracted. Preset warning information is retrieved and sent to these clients to remind their users that they may have been attacked or compromised, so that they can take precautions in time.
The domain names that remain after removing those matching the fixed general domain names or general CDN domain names, the whitelist and the blacklist, together with their feature vectors, are then taken as the filtered feature vectors of the domain names to be analyzed.
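The filtering steps can be sketched as below. This assumes exact string matching against in-memory sets for the general/CDN list, whitelist and blacklist, and the same hypothetical client-to-domains mapping used earlier; a real deployment would likely match on registered domains or suffixes instead.

```python
def filter_domains(vectors, general, whitelist, blacklist, access_records):
    """Remove general/CDN, whitelisted and blacklisted domains.

    Returns the remaining feature vectors and the set of clients that
    accessed a blacklisted domain (these clients should be warned).
    """
    kept = {}
    clients_to_warn = set()
    for domain, vec in vectors.items():
        if domain in general or domain in whitelist:
            continue  # normal or internal traffic: no further analysis needed
        if domain in blacklist:
            # Known-malicious domain: drop it, but remember who visited it.
            clients_to_warn.update(
                client for client, visited in access_records.items()
                if domain in visited
            )
            continue
        kept[domain] = vec
    return kept, clients_to_warn


records = {
    "c1": {"google.com", "bad.example"},
    "c2": {"google.com", "test.internal", "x.example"},
    "c3": {"google.com", "x.example"},
}
vectors = {
    "google.com": (1, 1, 1),
    "test.internal": (0, 1, 0),
    "bad.example": (1, 0, 0),
    "x.example": (0, 1, 1),
}
kept, warn = filter_domains(vectors, {"google.com"}, {"test.internal"},
                            {"bad.example"}, records)
print(kept, warn)  # {'x.example': (0, 1, 1)} {'c1'}
```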
S404: calculating the similarity between the domain names to be analyzed based on their reduced-dimension feature vectors.
After the feature vectors of the domain names to be analyzed have been reduced, the similarity between the domain names can be calculated from their reduced-dimension feature vectors.
Further, step S404 includes:
obtaining the similarity calculation coefficient between the domain names to be analyzed;
obtaining the number of vector rows of the reduced-dimension feature vectors of the domain names to be analyzed;
calculating the similarity between the domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
Specifically, the reduced-dimension feature vectors of the domain names to be analyzed are divided into a preset number of bands, each made up of consecutive rows. For each band there is a hash function that maps the column vector formed by the rows of that band (i.e., each column within the band) into a domain-name set (bucket). The same hash function can be used for all bands, but each band uses its own independent array of domain-name sets, so identical column vectors in different bands are never hashed into the same set. As long as two feature vectors fall into the same domain-name set in at least one band, they are regarded as potentially highly similar and become a candidate pair for subsequent calculation. The similarity calculation coefficient between the domain names to be analyzed is then obtained; it is a fixed preset Jaccard coefficient, for example 0.4. Next, the number of rows of the reduced-dimension feature vectors in each band is obtained, and the similarity between the domain names is calculated from the similarity calculation coefficient and the number of rows as

$$P = 1 - \left(1 - r^{C}\right)^{B}$$

where P is the similarity between the domain names to be analyzed (in standard LSH terms, the probability that two domain names whose Jaccard similarity is r share a domain-name set in at least one band), C is the number of rows per band, r is the similarity calculation coefficient, and B is the number of bands, i.e., of domain-name sets used. Substituting the obtained values, for example r = 0.4, C = 3 and B = 100, gives $P = 1 - (1 - 0.4^{3})^{100} \approx 0.9987$; that is, the similarity between the domain names to be analyzed is about 0.9987.
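The banding step and the formula can be sketched as follows. This is a minimal illustration under stated assumptions: signatures are tuples such as those produced by MinHash, a fresh Python dict per band plays the role of the per-band domain-name set array, and Python's built-in tuple hashing stands in for the band hash function.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, rows_per_band):
    """Return domain pairs that share a bucket in at least one band."""
    sig_len = len(next(iter(signatures.values())))
    candidates = set()
    for start in range(0, sig_len, rows_per_band):
        # A fresh bucket table per band, so equal slices from different
        # bands never land in the same bucket.
        buckets = defaultdict(list)
        for domain, sig in signatures.items():
            # The band slice itself is the dict key; tuple hashing acts as
            # the per-band hash function.
            buckets[tuple(sig[start:start + rows_per_band])].append(domain)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


def candidate_probability(r, rows_per_band, bands):
    """P = 1 - (1 - r**C)**B for Jaccard similarity r, C rows per band, B bands."""
    return 1 - (1 - r ** rows_per_band) ** bands


sigs = {"S1": (1, 0), "S2": (3, 2), "S3": (0, 0), "S4": (1, 0)}
print(lsh_candidate_pairs(sigs, rows_per_band=2))  # {('S1', 'S4')}
print(round(candidate_probability(0.4, 3, 100), 4))  # 0.9987
```

With a single two-row band, only S1 and S4 (whose signatures are identical) become a candidate pair; splitting the same signatures into two one-row bands would also pair S3 with S1 and S4, showing how smaller bands make candidacy more permissive.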
S405: clustering the domain names to be analyzed based on the similarity between them to obtain clustered domain-name sets, and identifying the domain names to be analyzed according to the clustered domain-name sets.
After the similarity between the domain names to be analyzed has been calculated, domain names whose similarity exceeds a preset value are assigned to the same domain-name set, which yields the clustered domain-name sets.
Specifically, clustering the domain names to be analyzed based on the similarity between them, to obtain the clustered domain-name sets, includes:
comparing the similarity between the domain names to be analyzed with the preset value;
clustering domain names whose similarity exceeds the preset value into the same domain-name set, to obtain the clustered domain-name sets.
Specifically, the similarity between two domain names to be analyzed is compared with the preset value; if it exceeds the preset value, the two domain names are clustered into the same domain-name set. This is repeated until all domain names to be analyzed have been clustered, giving the clustered domain-name sets. Note that, to reduce the number of subsequent identifications and improve identification accuracy, the preset similarity threshold is set to a relatively large value, for example greater than 0.9 and at most 1.
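One way to realise "repeat until all domain names are clustered" is a union-find over all pairs whose similarity exceeds the threshold. Note that this is an assumption: it yields single-linkage clusters (if A~B and B~C, then A, B and C share one set), which the text does not explicitly require. Pairwise similarities are assumed to be given as a dict keyed by domain pairs.

```python
def cluster_domains(domains, similarity, threshold=0.9):
    """Group domains whose pairwise similarity exceeds threshold (single linkage).

    similarity: hypothetical {(domain_a, domain_b): score} mapping.
    """
    parent = {d: d for d in domains}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    for (a, b), score in similarity.items():
        if score > threshold:
            parent[find(a)] = find(b)  # merge the two sets

    clusters = {}
    for d in domains:
        clusters.setdefault(find(d), set()).add(d)
    return list(clusters.values())


sims = {("x", "y"): 0.95, ("y", "z"): 0.5, ("z", "w"): 0.99}
print(sorted(sorted(c) for c in cluster_domains(["x", "y", "z", "w"], sims)))
# [['w', 'z'], ['x', 'y']]
```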
The domain names to be analyzed are then identified according to the clustered domain-name sets. Because the domain names in a clustered set are highly similar, identifying only some of the domain names in a set is enough to learn the type of the remaining ones. The identification process is therefore as follows: some of the domain names in a set are identified to obtain their types. If the identified types are consistent, that type is used as the type of the remaining unidentified domain names in the set. If the identified types are inconsistent, the number of identified domain names can be increased to improve accuracy, and the type held by more than a preset proportion of the identified domain names is used as the type of the remaining unidentified domain names.
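A sketch of this propagation rule, assuming the identified types are available as a dict and using a hypothetical `min_share` parameter for the "preset proportion". A return value of None signals that more members of the set should be identified before a type is assigned.

```python
from collections import Counter


def propagate_type(cluster, known_types, min_share=0.5):
    """Infer types for the unidentified members of one cluster.

    known_types: {domain: type} for the members identified so far.
    min_share: hypothetical stand-in for the preset proportion.
    Returns {domain: type} for the whole cluster, or None when no type
    exceeds min_share (more members should then be identified first).
    """
    labels = [known_types[d] for d in cluster if d in known_types]
    if not labels:
        return None
    top_type, count = Counter(labels).most_common(1)[0]
    if count / len(labels) <= min_share:
        return None
    # Identified members keep their own type; the rest inherit the majority type.
    return {d: known_types.get(d, top_type) for d in cluster}


cluster = {"a.test", "b.test", "c.test", "d.test"}
known = {"a.test": "malicious", "b.test": "malicious", "c.test": "benign"}
print(propagate_type(cluster, known)["d.test"])  # malicious
```

When all identified types agree, the share is 1, so the consistent case is handled by the same rule.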
Specifically, identifying the domain names to be analyzed according to the clustered domain-name sets may include:
extracting the features of a target domain name;
comparing the features of the target domain name with the features of the domain names in the clustered domain-name sets;
if a clustered domain-name set contains a domain name consistent with the features of the target domain name, determining that the domain names in that set are target domain names.
In the actual identification process, the features of the target domain name can be extracted and compared with the domain names in the clustered domain-name sets. Because the domain names in the same set are highly similar in type, or even of the same type, only some of the domain names in each set need to be compared with the features of the target domain name. This reduces the number of comparisons, shortens comparison time and improves efficiency. Note that the extracted target features may be the features of domain names belonging to a malicious group. If a clustered set contains a domain name consistent with the target features, the domain names in that set are determined to be target domain names, for example domain names of a malicious group. Furthermore, because some malicious domain names are active only for a short time, their features may not yet be available; in that case the sets can also be screened manually to determine whether they contain malicious domain names.
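The sampling idea can be sketched as follows. The patent does not specify which domain features are compared, so `matches_target` is a hypothetical predicate supplied by the caller, and `sample_size` is an illustrative parameter for how many members of each set are checked.

```python
def flag_target_clusters(clusters, matches_target, sample_size=3):
    """Return clusters in which a sampled member matches the target features.

    matches_target(domain) -> bool is a hypothetical feature check.
    """
    flagged = []
    for cluster in clusters:
        # Members of one cluster are assumed to share a type, so only a
        # few of them are compared against the target features.
        sample = sorted(cluster)[:sample_size]
        if any(matches_target(domain) for domain in sample):
            flagged.append(cluster)
    return flagged


clusters = [{"evil1.test", "evil2.test", "evil3.test"}, {"good.test", "news.test"}]
flagged = flag_target_clusters(clusters, lambda d: d.startswith("evil"))
print(len(flagged))  # 1
```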
Further, after the step of clustering the domain names to be analyzed based on the similarity between them to obtain clustered domain-name sets and identifying the domain names to be analyzed according to those sets, the method further includes:
if the domain names in a domain-name set are identified as target domain names, determining the clients that accessed the domain names in that set;
sending warning information to those access clients.
If the domain names in a domain-name set are determined to be target domain names, then, to prevent the clients that accessed them from being attacked, the clients that accessed domain names in the set must be determined and warning information must be sent to them. Specifically, preset warning information can be retrieved and sent to the clients that accessed domain names in the set, or warning information can be generated in real time and sent to those clients. Either way, the users are reminded that they may be under attack or compromised, so that they can take precautions in time.
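A minimal sketch of the alerting step, assuming the same hypothetical client-to-domains access-record mapping and a caller-supplied `send` callback standing in for whatever delivery channel (push notification, e-mail, and so on) is actually available:

```python
def alert_clients(target_domains, access_records, send):
    """Warn every client whose access records include a target domain.

    send(client, message) is a hypothetical delivery callback.
    """
    notified = []
    for client, visited in sorted(access_records.items()):
        hits = visited & target_domains
        if hits:
            send(client, "Possible attack or intrusion via: " + ", ".join(sorted(hits)))
            notified.append(client)
    return notified


outbox = []
records = {"c1": {"evil1.test"}, "c2": {"good.test"}, "c3": {"evil2.test", "good.test"}}
print(alert_clients({"evil1.test", "evil2.test"}, records,
                    lambda client, msg: outbox.append((client, msg))))
# ['c1', 'c3']
```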
The domain name recognition method proposed in this embodiment receives a domain name identification instruction; obtains, according to the instruction, the access records of multiple clients for the domain names to be analyzed; constructs the feature vectors of the domain names to be analyzed from the access records; performs feature dimensionality reduction on the constructed feature vectors to obtain the reduced-dimension feature vectors of the domain names; calculates the similarity between the domain names based on these reduced-dimension feature vectors; clusters the domain names based on that similarity to obtain clustered domain-name sets; and identifies the domain names to be analyzed according to the clustered sets. This scheme clusters domain names into domain-name sets according to their similarity and then identifies each resulting set. Because the domain names in the same set are of the same or similar type, misidentification of types is effectively avoided, which improves the accuracy of domain name identification. Moreover, because the domain names in a set are of similar or identical type, their type can be obtained without comparing the features of every domain name, which shortens identification time and improves identification efficiency.
The method described in the preceding embodiment is described in further detail below by way of example.
In this embodiment, the description takes as an example the case where the domain name recognition device is integrated in a cloud server. The cloud server may include several types of servers, and the number of servers of each type depends on the application scenario; these servers may be deployed in different regions or machine rooms, or use different network providers. The servers comprise a user input layer and a domain name identification layer, and different layers can be implemented with different types of servers. For example, as shown in Fig. 5a, there is at least one input server and at least one domain name identification server, and the functions of each type of server may be as follows:
(1) Input server
The input server mainly serves as the "access layer", connecting the clients with the server cluster, as shown in Fig. 5a.
For example, as shown in Fig. 5a, the input server (the access layer) can receive requests of various kinds sent by clients, such as domain name identification requests, and forward them to the domain name identification server (the domain name identification layer); for instance, a domain name identification request is forwarded to the domain name identification server. When warning information returned by the domain name identification server is received, the input server forwards it to the client.
(2) Domain name identification server
The domain name identification server mainly performs domain name identification and also handles communication between the clients and the server cluster. For example, it can receive requests of various kinds sent by the input server, such as domain name identification requests, and perform the corresponding operation; if the request received is a domain name identification request, it performs domain name identification accordingly. When a target domain name is identified, it sends warning information to the input server.
As shown in Figure 5 b, a kind of domain name recognition methods, detailed process can be such that
S501: the input server receives a domain name identification instruction and sends it to the domain name identification server.
The input server provides a domain name detection page, which the user can look up through a fixed interface or web address. After the user triggers the lookup, the server searches for the page according to the user's lookup instruction and displays the domain name detection page it finds. As shown in Fig. 3, the page includes a domain name identification control that the user can click; clicking it triggers a domain name identification instruction. The input server thus obtains the domain name identification instruction entered by the user and sends it to the domain name identification server. The user can also schedule domain name identification: when the identification time set by the user is reached, the domain name identification instruction is triggered automatically and received by the input server, so the user does not need to trigger it manually each time. Even after scheduling identification, the user can still trigger the instruction manually. For example, when an emergency requires domain name identification and the current time is not a user-set identification time, the user can trigger the instruction manually so that the input server receives it.
S502: the domain name identification server receives the domain name identification instruction sent by the input server.
After the input server sends the domain name identification instruction, the domain name identification server receives it. Scheduled identification can also be configured on the domain name identification server itself: when the identification time set by the user is reached, the instruction is triggered automatically and received by the domain name identification server, so the user does not need to trigger it manually each time. Even after scheduling identification, the user can still trigger the instruction manually. For example, when an emergency requires domain name identification and the current time is not a user-set identification time, the user can trigger the instruction manually; the domain name identification server then receives it directly, without needing an instruction from the input server.
S503: the domain name identification server obtains, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and constructs the feature vectors of the domain names to be analyzed from the access records.
After receiving the domain name identification instruction, the domain name identification server can obtain the clients' access records for the domain names to be analyzed. A client is a user terminal, the counterpart of the server, and is a program that provides local services to the user. Apart from some applications that run purely locally, clients are usually installed on ordinary user computers and must work together with a server. Common clients include web browsers used for the World Wide Web, e-mail clients used to send and receive e-mail, and instant messaging client software. For such applications, a corresponding server and service program in the network must provide the corresponding service, such as a database service or e-mail service, so a specific communication connection must be established between the client machine and the server to ensure that the application runs normally. Specifically, the access records of all clients in the server for the domain names to be analyzed can be obtained, or only those of clients within a certain range; this is not limited here. The range can be taken from user input; when the user specifies no range, the access records of all clients in the server are obtained by default. The clients' access records for a domain name can be obtained by extracting and analyzing the access records of that domain name: if a client in the server appears in the domain name's access records, that client is determined to have an access record for the domain name, and clients that do not appear have no access record for it.
The feature vector corresponding to a domain name to be analyzed is then constructed from the access records according to whether each client has an access record: if a client has an access record for the domain name, the corresponding element is defined as a first element; otherwise it is defined as a second element. The defined elements are then arranged in the order of the obtained client access records and used as the elements of the feature vector of the domain name to be analyzed, which yields the feature vector corresponding to that domain name.
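The construction of the feature vector can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: access records are assumed to arrive as hypothetical (client, domain) pairs, and the first and second elements are assumed to be 1 and 0, which matches the 0/1 columns used in the worked example later in this section.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

FIRST_ELEMENT = 1   # assumed value: the client has an access record for the domain
SECOND_ELEMENT = 0  # assumed value: the client has no access record for the domain


def build_feature_vectors(
    access_records: Iterable[Tuple[str, str]],
    clients: Sequence[str],
    domains: Sequence[str],
) -> Dict[str, List[int]]:
    """Build one feature vector per domain; element k describes clients[k]."""
    accessed = set(access_records)  # observed (client, domain) pairs
    return {
        domain: [
            FIRST_ELEMENT if (client, domain) in accessed else SECOND_ELEMENT
            for client in clients
        ]
        for domain in domains
    }


# Hypothetical records: clients c0 and c3 visited a.example, c2 visited b.example.
records = [("c0", "a.example"), ("c3", "a.example"), ("c2", "b.example")]
vectors = build_feature_vectors(records, ["c0", "c1", "c2", "c3", "c4"], ["a.example", "b.example"])
print(vectors["a.example"])  # [1, 0, 0, 1, 0]
```

The client order passed in fixes the position of every element, so vectors of different domain names stay comparable element by element.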
S504: The domain name identification server performs feature dimension reduction on the feature vectors corresponding to the domain names to be analyzed, to obtain the dimension-reduced feature vectors corresponding to the domain names to be analyzed.
Further, after the feature vector corresponding to each domain name to be analyzed is obtained, feature dimension reduction can first be performed on it to reduce the amount of computation required for the domain names to be analyzed.
Specifically, performing feature dimension reduction on the feature vector corresponding to the domain name to be analyzed may include:
determining the hash functions required for the feature vector corresponding to the domain name to be analyzed; and
performing a hash transformation on the feature vector corresponding to the domain name to be analyzed based on the hash functions, to obtain the dimension-reduced feature vector.
Specifically, because the rows of the feature vectors corresponding to the domain names to be analyzed would otherwise have to be randomly permuted many times, n random hash functions h1, h2, h3, ..., hn are determined and selected to simulate the effect of these row permutations, and a hash transformation is performed on the feature vectors based on the hash functions to obtain the dimension-reduced feature vectors. The hash transformation process includes:
adding corresponding hash sub-feature vectors to the feature vector according to the hash functions;
initializing the feature values in the hash sub-feature vectors to obtain initialized hash feature subvectors;
comparing the initialized hash feature subvectors with the sub-feature vectors corresponding to the domain names to be analyzed to obtain a comparison result; and
performing dimension reduction on the sub-feature vectors corresponding to the domain names to be analyzed according to the comparison result, to obtain the dimension-reduced feature vectors.
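One common way to obtain the n random hash functions is sketched below. The form h(x) = (a*x + b) mod p, with p a prime no smaller than the number of rows, is an assumption of this sketch; it is the same form as h1 and h2 in the example that follows, but the source does not say how the coefficients are chosen. The helper names are hypothetical.

```python
import random
from typing import Callable, List


def is_prime(k: int) -> bool:
    """Trial-division primality test (adequate for small illustrative sizes)."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def random_hash_functions(n: int, num_rows: int, seed: int = 0) -> List[Callable[[int], int]]:
    """Return n functions h(x) = (a*x + b) mod p mapping row numbers to distinct values."""
    p = next_prime(num_rows)
    rng = random.Random(seed)
    functions = []
    for _ in range(n):
        a = rng.randrange(1, p)  # a != 0, so x -> (a*x + b) mod p is a bijection on 0..p-1
        b = rng.randrange(0, p)
        functions.append(lambda x, a=a, b=b: (a * x + b) % p)
    return functions
```

Because a is nonzero and p is prime, each function assigns distinct values to distinct row numbers, which is what lets it stand in for one random row permutation.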
Corresponding hash sub-feature vectors are added to the feature vector according to the hash functions. For example, assume that the rows of the feature vectors of the domain names to be analyzed are a, b, c, d, and e, where a = (1, 0, 0, 1), b = (0, 0, 1, 0), c = (0, 1, 0, 1), d = (1, 0, 1, 1), and e = (0, 0, 1, 0), and each of the columns S1 to S4 is the feature vector of one domain name to be analyzed. Rows a to e are replaced by their row numbers 0 to 4, and two hash functions are then added: h1(x) = (x + 1) mod 5 and h2(x) = (3x + 1) mod 5, where x is the row number.
Row | S1 | S2 | S3 | S4 | h1 | h2
0 | 1 | 0 | 0 | 1 | 1 | 1
1 | 0 | 0 | 1 | 0 | 2 | 4
2 | 0 | 1 | 0 | 1 | 3 | 2
3 | 1 | 0 | 1 | 1 | 4 | 0
4 | 0 | 0 | 1 | 0 | 0 | 3
The feature values in the hash sub-feature vectors are then initialized to obtain the initialized hash feature subvectors. Specifically, let SIG(i, c) denote the element of the i-th hash function in column c. At the start, all SIG(i, c) are initialized to Inf (infinity):
Hash | S1 | S2 | S3 | S4
h1 | Inf | Inf | Inf | Inf
h2 | Inf | Inf | Inf | Inf
Further, performing dimension reduction on the sub-feature vectors corresponding to the domain names to be analyzed according to the comparison result, to obtain the dimension-reduced feature vectors, includes:
if the feature value of the initialized hash feature subvector is greater than the corresponding feature value (the hash value of the current row) of the sub-feature vector corresponding to the domain name to be analyzed, and the feature value of the domain name to be analyzed in that row is greater than a preset value (0 in this example), replacing the feature value of the initialized hash feature subvector with that hash value;
if the feature value of the initialized hash feature subvector is greater than the corresponding hash value, but the feature value of the domain name to be analyzed in that row is equal to the preset value, keeping the feature value of the initialized hash feature subvector unchanged;
combining the replaced feature values and the unchanged feature values of the hash feature subvectors to obtain a combined hash feature subvector; and
performing dimension reduction on the sub-feature vectors corresponding to the domain names to be analyzed according to the combined hash feature subvector, to obtain the dimension-reduced feature vectors.
The initialized hash feature subvectors are compared with the sub-feature vectors corresponding to the domain names to be analyzed to obtain a comparison result. For each row r, h1(r), h2(r), ..., hn(r) are computed; then, for each column c: if column c has a 0 in row r, nothing is done; if column c has a 1 in row r, then for each i = 1, 2, ..., n, SIG(i, c) is set to the minimum of the current SIG(i, c) and hi(r). The signature matrix is computed as follows. Start with row 0 of the feature vectors: the values of S2 and S3 are 0, so they are not changed; the values of S1 and S4 are 1 and must be updated. Here h1 = 1 and h2 = 1; since 1 is smaller than Inf, the corresponding values at the S1 and S4 positions are replaced. In other words, the feature values of the initialized hash feature subvector are compared in size with the corresponding feature values of the sub-feature vectors of the domain names to be analyzed to obtain the comparison result. After replacement, the result is:
Hash | S1 | S2 | S3 | S4
h1 | 1 | Inf | Inf | 1
h2 | 1 | Inf | Inf | 1
Next, move to row 1 of the feature vectors: only S3 has the value 1, and h1 = 2, h2 = 4, so the S3 column is replaced, giving:
Hash | S1 | S2 | S3 | S4
h1 | 1 | Inf | 2 | 1
h2 | 1 | Inf | 4 | 1
Next, move to row 2: S2 and S4 have the value 1, and h1 = 3, h2 = 2. Both values in the S4 column of the signature matrix are already 1, which is smaller than 3 and 2, so only the S2 column is replaced:
Hash | S1 | S2 | S3 | S4
h1 | 1 | 3 | 2 | 1
h2 | 1 | 2 | 4 | 1
Next, move to row 3: S1, S3, and S4 all have the value 1, and h1 = 4, h2 = 0. Since h1 = 4 is larger than the current h1 values of these columns, only their h2 entries are replaced with 0. After replacement, the result is:
Hash | S1 | S2 | S3 | S4
h1 | 1 | 3 | 2 | 1
h2 | 0 | 2 | 0 | 0
Finally, move to row 4: only S3 has the value 1, with h1 = 0 and h2 = 3, so the h1 entry of S3 is replaced with 0, while its h2 entry is already 0 and stays unchanged. All rows have now been traversed once, and the final signature matrix is:
Hash | S1 | S2 | S3 | S4
h1 | 1 | 3 | 0 | 1
h2 | 0 | 2 | 0 | 0
Dimension reduction is then performed on the sub-feature vectors corresponding to the domain names to be analyzed according to the comparison result. The signature of S1 is (1, 0), whereas its original feature vector was (1, 0, 0, 1, 0), so its length has been shortened; likewise, the signature of S2 is (3, 2), that of S3 is (0, 0), and that of S4 is (1, 0). These are the dimension-reduced feature vectors. Clearly, the feature vectors of all domain names to be analyzed are shortened, which reduces the amount of computation needed to calculate the similarity between the domain names to be analyzed. The signature matrix obtained after the final hash transformation is therefore used as the dimension-reduced feature vectors corresponding to the domain names to be analyzed.
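The row-by-row update described above is the MinHash signature computation. The sketch below is a minimal illustration that reproduces the worked example; the function name and the matrix layout (rows are row numbers 0 to 4, columns are the domain names S1 to S4) are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def minhash_signatures(
    matrix: Sequence[Sequence[int]], hashes: Sequence[Callable[[int], int]]
) -> List[List[float]]:
    """Return SIG[i][c]: min of hashes[i](r) over rows r where column c is 1."""
    n_cols = len(matrix[0])
    sig = [[math.inf] * n_cols for _ in hashes]  # every SIG(i, c) starts at Inf
    for r, row in enumerate(matrix):
        hr = [h(r) for h in hashes]  # h1(r), ..., hn(r)
        for c, value in enumerate(row):
            if value == 0:
                continue  # row r is 0 in column c: leave the column unchanged
            for i, h_value in enumerate(hr):
                sig[i][c] = min(sig[i][c], h_value)
    return sig


def h1(x: int) -> int:
    return (x + 1) % 5


def h2(x: int) -> int:
    return (3 * x + 1) % 5


rows = [
    [1, 0, 0, 1],  # a (row 0)
    [0, 0, 1, 0],  # b (row 1)
    [0, 1, 0, 1],  # c (row 2)
    [1, 0, 1, 1],  # d (row 3)
    [0, 0, 1, 0],  # e (row 4)
]
signature = minhash_signatures(rows, [h1, h2])
print(signature)  # [[1, 3, 0, 1], [0, 2, 0, 0]]
```

Reading the result column by column gives the dimension-reduced vectors S1 = (1, 0), S2 = (3, 2), S3 = (0, 0), and S4 = (1, 0).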
Before dimension reduction is performed, the domain names to be analyzed can first be filtered to reduce the number of domain names that need to be analyzed. Specifically:
filtering is performed on the feature vectors corresponding to the domain names to be analyzed, to obtain filtered feature vectors; and
the filtered feature vectors are used as the feature vectors corresponding to the domain names to be analyzed.
That is, after filtering, the feature vectors corresponding to the remaining domain names to be analyzed are obtained.
In a specific implementation, the filtering may include filtering out general-purpose domain names and internally used domain names among the domain names to be analyzed, as well as the domain names of known malicious groups. This may specifically include:
(1) First, known general-purpose domain names are obtained, i.e., domain names that essentially every client accesses, such as the top 10,000 domain names in the Alexa ranking (for example Google.com, a search engine), together with general-purpose CDN (Content Delivery Network) domain names. The domain names to be analyzed are then compared with the obtained general-purpose domain names and general-purpose CDN domain names. If a domain name to be analyzed matches a known general-purpose domain name or general-purpose CDN domain name, that domain name and its feature vector are removed from the list of domain names to be analyzed; that is, such a domain name is determined to be a normal or general-purpose domain name that does not need further analysis, which reduces the amount of computation.
(2) After the general-purpose domain names and general-purpose CDN domain names have been removed from the domain names to be analyzed, the domain names in a whitelist and a blacklist are further obtained. First, the domain names in the whitelist are obtained; these may include internal general-purpose domain names, domain names used for testing, and the like. The domain names to be analyzed are compared with the whitelist domain names; if a domain name to be analyzed matches a whitelist domain name, it is a known internal domain name or a test domain name, so it and its feature vector are likewise removed from the list of domain names to be analyzed, further reducing the amount of computation. Next, the domain names in the blacklist are obtained; these include known malicious domain names, such as the domain names of malicious groups. The domain names to be analyzed are compared with the blacklist domain names; if a domain name to be analyzed matches a blacklist domain name, it is determined to be a malicious domain name and is first removed from the list of domain names to be analyzed, again reducing the amount of computation. Then the client access records for the matching domain names are obtained, and the clients that accessed those domain names are extracted. Preset warning information is retrieved and sent to those clients to remind their users that they may have been attacked or compromised, so that they can take precautions in time.
The domain names that remain after removing those matching known general-purpose or general-purpose CDN domain names, the whitelist domain names, and the blacklist domain names, together with their feature vectors, are then used as the filtered feature vectors corresponding to the domain names to be analyzed.
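The filtering stage can be sketched as a single pass over the feature vectors. This is a minimal illustration under assumptions: "matches" is taken to mean exact string equality (the source does not say whether subdomains of a listed domain also match), and the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def filter_domains(
    vectors: Dict[str, List[int]],
    general_domains: Set[str],
    whitelist: Set[str],
    blacklist: Set[str],
    access_records: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[int]], Dict[str, Set[str]]]:
    """Drop general/CDN, whitelisted and blacklisted domains before dimension reduction.

    Returns the remaining feature vectors and, for each blacklisted domain that was
    present, the set of clients that accessed it (these clients are to be warned).
    """
    remaining: Dict[str, List[int]] = {}
    blacklisted_hits: Set[str] = set()
    for domain, vector in vectors.items():
        if domain in general_domains or domain in whitelist:
            continue  # normal, internal, or test domain: no further analysis needed
        if domain in blacklist:
            blacklisted_hits.add(domain)  # known malicious: remove, but remember it
            continue
        remaining[domain] = vector

    clients_to_warn: Dict[str, Set[str]] = {d: set() for d in blacklisted_hits}
    for client, domain in access_records:
        if domain in clients_to_warn:
            clients_to_warn[domain].add(client)
    return remaining, clients_to_warn
```

Returning the blacklist hits separately keeps the removal step and the warning step independent, so the warning can be sent without the removed domains re-entering the analysis.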
S505: The domain name identification server calculates the similarity between the domain names to be analyzed based on the dimension-reduced feature vectors corresponding to the domain names to be analyzed.
After dimension reduction has been performed on the feature vectors of the domain names to be analyzed, the similarity between the domain names to be analyzed can be calculated from their dimension-reduced feature vectors.
The calculation process includes:
obtaining a similarity calculation coefficient between the domain names to be analyzed;
obtaining the number of vector rows of the dimension-reduced feature vectors corresponding to the domain names to be analyzed; and
calculating the similarity between the domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
Specifically, the dimension-reduced feature vectors corresponding to the domain names to be analyzed are divided into a preset number of bands, each consisting of a corresponding number of rows. For each band, a hash function maps the column vector of each domain name within the band (i.e., the integers in those rows) to one of a number of buckets (domain name sets). The same hash function may be used for all bands, but a separate bucket array is used for each band, so identical column vectors in different bands are never hashed into the same bucket. In this way, as long as two feature vectors fall into the same bucket in at least one band, they are regarded as potentially highly similar and become a candidate pair for subsequent calculation.
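The banding step is locality-sensitive hashing over the signature matrix. The sketch below is a minimal illustration, not the patented implementation: instead of an explicit band hash function it uses the band slice itself as a dictionary key, which sends identical slices to the same bucket, and it creates a fresh bucket table per band as the text requires. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def lsh_candidate_pairs(
    signatures: Dict[str, Sequence[int]], rows_per_band: int
) -> Set[Tuple[str, str]]:
    """Return pairs of domains whose signatures collide in at least one band."""
    if not signatures:
        return set()
    length = len(next(iter(signatures.values())))
    candidates: Set[Tuple[str, str]] = set()
    for start in range(0, length, rows_per_band):
        # A separate bucket table per band, so equal slices from different bands never mix.
        buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        for domain, sig in signatures.items():
            buckets[tuple(sig[start:start + rows_per_band])].append(domain)
        for members in buckets.values():
            for a, b in combinations(sorted(members), 2):
                candidates.add((a, b))
    return candidates


# Signatures from the worked example, one band of two rows.
sigs = {"S1": (1, 0), "S2": (3, 2), "S3": (0, 0), "S4": (1, 0)}
print(lsh_candidate_pairs(sigs, 2))  # {('S1', 'S4')}
```

With one row per band, S1, S3, and S4 also collide in the second band (all have 0 there), so smaller bands produce more candidate pairs.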
The similarity calculation coefficient between the domain names to be analyzed is then obtained; it is a fixed preset Jaccard coefficient, for example 0.4. The number of rows of the dimension-reduced feature vectors corresponding to the domain names to be analyzed, i.e., the number of rows in each band, is also obtained, and the similarity between the domain names to be analyzed can then be calculated from the similarity calculation coefficient and the number of rows. The calculation formula is P = 1 - (1 - r^C)^B, where P is the similarity between the domain names to be analyzed (the probability that the pair falls into the same bucket in at least one band), C is the number of rows per band, r is the similarity calculation coefficient, and B is the number of selected domain name set (bucket) arrays, i.e., one per band. Substituting the obtained values into the formula: for example, if the similarity calculation coefficient is 0.4, C is 3, and B is 100, then P = 1 - (1 - 0.4^3)^100 ≈ 0.9987; that is, the similarity between the domain names to be analyzed is about 0.9987.
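The formula can be checked numerically with a one-line function. This is a direct transcription of P = 1 - (1 - r^C)^B as defined above; the function name is hypothetical.

```python
def band_collision_probability(r: float, rows_per_band: int, bands: int) -> float:
    """P = 1 - (1 - r**C)**B for coefficient r, C rows per band, and B bands."""
    return 1.0 - (1.0 - r ** rows_per_band) ** bands


p = band_collision_probability(0.4, 3, 100)
print(round(p, 4))  # 0.9987
```

Raising C makes a collision within a single band rarer, while raising B gives a pair more chances to collide; the two parameters together set how sharply P rises with r.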
S506: The domain name identification server clusters the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets.
S507: The domain names to be analyzed are identified according to the clustered domain name sets.
After the similarity between the domain names to be analyzed has been calculated, domain names to be analyzed whose similarity is greater than a preset value are assigned to the same domain name set, which yields the clustered domain name sets.
Specifically, clustering the domain names to be analyzed based on the similarity between them to obtain the clustered domain name sets includes:
comparing the similarity between the domain names to be analyzed with a preset value; and
clustering the domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, to obtain the clustered domain name sets.
Specifically, the similarity between domain names to be analyzed is compared with the preset value; if it is greater than the preset value, the domain names are clustered into the same domain name set. Clustering proceeds in this way until all domain names to be analyzed have been clustered, which yields the clustered domain name sets. It should be noted that, to reduce the number of subsequent domain name identifications and improve identification accuracy, the preset value for the similarity comparison is set relatively high, for example greater than 0.9 and less than or equal to 1.
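Threshold clustering can be sketched with a union-find structure. One assumption here is not stated in the source: clusters are treated as transitive, so if a is similar to b and b to c, all three land in the same domain name set. The function name and input format (a dictionary of pairwise similarities) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_threshold(
    domains: Iterable[str],
    similarities: Dict[Tuple[str, str], float],
    threshold: float = 0.9,
) -> List[List[str]]:
    """Group domains whose pairwise similarity exceeds `threshold`."""
    parent = {d: d for d in domains}

    def find(d: str) -> str:
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    for (a, b), sim in similarities.items():
        if sim > threshold:
            parent[find(a)] = find(b)  # merge the two sets

    groups: Dict[str, List[str]] = {}
    for dom in parent:
        groups.setdefault(find(dom), []).append(dom)
    return sorted(sorted(g) for g in groups.values())


sims = {("a", "b"): 0.95, ("b", "c"): 0.5, ("c", "d"): 0.99}
print(cluster_by_threshold(["a", "b", "c", "d", "e"], sims))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Domains with no pair above the threshold, such as "e", end up as single-member sets.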
The domain names to be analyzed are then identified according to the clustered domain name sets. Because the domain names within a clustered domain name set are highly similar, identifying only some of the domain names to be analyzed in a set is enough to learn the type of the remaining domain names in that set. The identification process is therefore as follows: some of the domain names to be analyzed in a domain name set are identified to obtain their types. If the identified types are consistent, that type is used as the type of the remaining unidentified domain names to be analyzed. If the identified types are inconsistent, the number of identified domain names to be analyzed can be increased to improve identification accuracy, and the type whose share among the identified domain names exceeds a preset ratio is used as the type of the remaining unidentified domain names to be analyzed.
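The sample-then-propagate procedure can be sketched as follows. The sample sizes, the preset ratio of 0.6, taking the first members of the set as the sample, and the `identify` callback (standing in for whatever per-domain identification is used) are all assumptions of this illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def label_cluster(
    cluster: List[str],
    identify: Callable[[str], str],
    initial_sample: int = 2,
    enlarged_sample: int = 5,
    preset_ratio: float = 0.6,
) -> Dict[str, Optional[str]]:
    """Identify part of a domain name set and propagate the result to the rest."""
    if not cluster:
        return {}
    labels = {d: identify(d) for d in cluster[:initial_sample]}
    if len(set(labels.values())) > 1:
        # Inconsistent types: identify more members of the set.
        for d in cluster[initial_sample:enlarged_sample]:
            labels[d] = identify(d)
    counts = Counter(labels.values())
    top_type, top_count = counts.most_common(1)[0]
    if len(counts) == 1 or top_count / len(labels) > preset_ratio:
        inferred: Optional[str] = top_type
    else:
        inferred = None  # no dominant type; leave for further analysis
    # Identified members keep their own label; the rest inherit the inferred type.
    return {d: labels.get(d, inferred) for d in cluster}
```

Only the sampled members cost an identification call, which is where the efficiency gain described above comes from.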
Specifically, identifying the domain names to be analyzed according to the clustered domain name sets may include:
extracting features of a target domain name;
comparing the features of the target domain name with the domain name features in the clustered domain name sets; and
if a clustered domain name set contains a domain name whose features are consistent with the features of the target domain name, determining that the domain names in that clustered domain name set are target domain names.
In the actual domain name identification process, the features of the target domain name can be extracted and compared with the domain names in the clustered domain name sets. Because the domain names in the same domain name set are highly similar in type, or even of the same type, only some of the domain names in each set need to be compared with the features of the target domain name, which reduces the number of comparisons, shortens the comparison time, and improves comparison efficiency. It should be noted that the extracted target domain name features may be features of the domain names of a malicious group. If a clustered domain name set contains a domain name consistent with the features of the target domain name, the domain names in that set are determined to be target domain names, for example domain names of a malicious group. Furthermore, because some malicious domain names have only been active for a short time, their domain name features may not yet be available; in this case, the domain name sets can also be screened manually to determine whether they contain malicious domain names.
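Comparing target features against only a few members per set can be sketched as below. The source does not define the features or what "consistent" means, so this sketch assumes features are sets of strings and a domain matches when it has all target features; `toy_features` is a purely hypothetical extractor for demonstration.

```python
from typing import Callable, FrozenSet, List, Sequence


def find_target_clusters(
    clusters: Sequence[List[str]],
    target_features: FrozenSet[str],
    extract_features: Callable[[str], FrozenSet[str]],
    max_checked: int = 3,
) -> List[int]:
    """Return indices of sets in which a checked member has all target features."""
    hits = []
    for idx, cluster in enumerate(clusters):
        # Only a few members are compared, because members of a set share a type.
        for domain in cluster[:max_checked]:
            if target_features <= extract_features(domain):
                hits.append(idx)
                break
    return hits


def toy_features(domain: str) -> FrozenSet[str]:
    """Hypothetical extractor: top-level label plus whether the name contains digits."""
    tld = domain.rsplit(".", 1)[-1]
    kind = "digits" if any(ch.isdigit() for ch in domain) else "alpha"
    return frozenset({tld, kind})


clusters = [["a1b2.xyz", "c3d4.xyz"], ["news.com", "shop.com"]]
print(find_target_clusters(clusters, frozenset({"xyz", "digits"}), toy_features))  # [0]
```

Once a set is flagged, every member of it is treated as a target domain name, including members that were never compared.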
S508: If the domain name identification server identifies that the domain names in a domain name set are target domain names, it determines the access clients of the domain names in that domain name set.
S509: The domain name identification server sends warning information to the access clients.
If the domain names in a domain name set are determined to be target domain names, then, to prevent the clients that accessed these domain names from being attacked, the access clients of the domain names in the set need to be determined and warning information sent to them. Specifically, preset warning information can be retrieved and sent to the clients that accessed domain names in the set, or warning information can be generated in real time and sent to the access clients, to remind the users that they may have been attacked or compromised so that they can take precautions in time.
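The access clients can be read directly off the binary feature vectors built in S503, because element k of each vector records whether client k accessed the domain. The sketch below assumes that layout; the message text is a hypothetical preset warning, and actual delivery to clients is left out.

```python
from typing import Dict, List, Sequence


def clients_to_alert(
    target_set: Sequence[str],
    vectors: Dict[str, List[int]],
    clients: Sequence[str],
) -> List[str]:
    """Clients whose feature-vector element is 1 for at least one domain in the set."""
    hit = set()
    for domain in target_set:
        for client, accessed in zip(clients, vectors[domain]):
            if accessed:
                hit.add(client)
    return sorted(hit)


def warning_messages(clients: Sequence[str], target_set: Sequence[str]) -> Dict[str, str]:
    """Hypothetical preset warning text, one copy per client to be notified."""
    text = (
        "Warning: you accessed suspected malicious domain(s) "
        + ", ".join(sorted(target_set))
        + "; your device may have been attacked or compromised."
    )
    return {client: text for client in clients}
```

Reusing the feature vectors avoids a second pass over the raw access logs, at the cost of only seeing clients that were in scope when the vectors were built.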
As can be seen from the above, in this embodiment a domain name identification instruction is received; the access records of multiple clients for the domain names to be analyzed are obtained according to the instruction; the feature vectors corresponding to the domain names to be analyzed are constructed from the access records; feature dimension reduction is performed on the constructed feature vectors to obtain the dimension-reduced feature vectors; the similarity between the domain names to be analyzed is calculated based on the dimension-reduced feature vectors; the domain names to be analyzed are clustered based on this similarity to obtain clustered domain name sets; and the domain names to be analyzed are identified according to the clustered domain name sets. This solution clusters domain names into domain name sets according to similarity and then performs identification for each clustered domain name set. Because the domain names in the same set belong to the same or similar categories, identification type errors are effectively avoided and domain name identification accuracy is improved; and because the domain names in a set are of similar or the same type, their type can be obtained without comparing the features of every domain name, which shortens domain name identification time and improves identification efficiency.
To better implement the above method, an embodiment of the present invention may further provide a domain name identification apparatus, which may be integrated in a network device such as a terminal or a server.
For example, as shown in Fig. 6a, the domain name identification apparatus may include a receiving unit 601, an acquiring unit 602, a dimension reduction unit 603, a calculating unit 604, a clustering unit 605, and an identifying unit 606, as follows:
(1) Receiving unit 601
The receiving unit 601 is configured to receive a domain name identification instruction.
For example, it receives the domain name identification instruction sent by the input server. Alternatively, scheduled domain name identification can be configured in the domain name identification server: when the user has configured scheduled identification and the configured identification time is reached, the domain name identification instruction is triggered automatically and the domain name identification server receives it, so the user does not need to trigger it manually each time. The user can also trigger the domain name identification instruction manually after scheduled identification has been configured: for example, when an emergency requires domain name identification and the current time is not a configured identification time, the user can trigger the instruction manually, so that the domain name identification server receives it without needing to receive the instruction sent by the input server.
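The two trigger paths (scheduled and manual) can be sketched as a small polling object. This is an illustrative assumption about how the timing could be organized, not a description of the patented receiving unit; the class and method names are hypothetical, and a real server would call `poll` from its own scheduler loop.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class IdentificationTrigger:
    """Decides when a domain name identification instruction should be raised."""

    scheduled_times: List[datetime] = field(default_factory=list)
    manual_request: bool = False

    def request_manually(self) -> None:
        """Record a manual trigger, e.g. an emergency outside the schedule."""
        self.manual_request = True

    def poll(self, now: datetime) -> bool:
        """Return True if an instruction should be issued at `now`."""
        if self.manual_request:
            self.manual_request = False  # consume the manual request
            return True
        due = [t for t in self.scheduled_times if t <= now]
        if due:
            # Drop every schedule entry that has been reached, then fire once.
            self.scheduled_times = [t for t in self.scheduled_times if t > now]
            return True
        return False
```

A manual request fires immediately and does not disturb the remaining schedule, matching the emergency case described above.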
(2) Acquiring unit 602
The acquiring unit 602 is configured to obtain, according to the domain name identification instruction, the access records of multiple clients for the domain names to be analyzed, and to construct the feature vectors corresponding to the domain names to be analyzed according to the access records.
After the domain name identification server receives the domain name identification instruction, it can obtain the clients' access records for the domain names to be analyzed based on the instruction. Specifically, the access records of all clients in the server for the domain names to be analyzed may be obtained, or only the access records of clients within a certain range may be obtained; this is not limited here. The range may be specified by user input, and when the user does not specify a range, the access records of all clients in the server are obtained by default. The access records of the clients in the server for a domain name to be analyzed can be obtained by extracting and analyzing the access records of that domain name: if a client in the server appears in the access records of the domain name to be analyzed, that client is determined to have an access record for it; a client that does not appear has no access record for it.
The feature vector corresponding to a domain name to be analyzed is then constructed from the access records according to whether each client has an access record: if a client has an access record for the domain name, the corresponding element is defined as a first element; otherwise it is defined as a second element. The defined elements are then arranged in the order of the obtained client access records and used as the elements of the feature vector of the domain name to be analyzed, which yields the feature vector corresponding to that domain name.
(3) Dimension reduction unit 603
The dimension reduction unit 603 is configured to perform feature dimension reduction on the feature vectors corresponding to the domain names to be analyzed, to obtain the dimension-reduced feature vectors corresponding to the domain names to be analyzed.
After the feature vectors corresponding to the domain names to be analyzed are obtained, feature dimension reduction can first be performed on them to reduce the amount of computation required for the domain names to be analyzed.
(4) Calculating unit 604
The calculating unit 604 is configured to calculate the similarity between the domain names to be analyzed based on the dimension-reduced feature vectors corresponding to the domain names to be analyzed.
After dimension reduction has been performed on the feature vectors of the domain names to be analyzed, the similarity between the domain names to be analyzed can be calculated from their dimension-reduced feature vectors. The dimension-reduced feature vectors are divided into a preset number of bands, each consisting of a corresponding number of rows. For each band, a hash function maps the column vector of each domain name within the band (the integers in those rows) to one of the buckets (domain name sets). The same hash function may be used for all bands, but a separate bucket array is used for each band, so identical column vectors in different bands are never hashed into the same bucket. As long as two feature vectors fall into the same bucket in at least one band, they are regarded as potentially highly similar and become a candidate pair for subsequent calculation. A fixed preset Jaccard coefficient, for example 0.4, is then chosen, and the probability that a candidate pair falls into the same bucket is calculated as P = 1 - (1 - r^C)^B, where P is that probability, r is the Jaccard (similarity calculation) coefficient, C is the number of rows per band (fixed at 3 here), and B is the number of selected domain name set (bucket) arrays. Substituting the corresponding values into the formula gives the similarity between candidate pairs of domain names to be analyzed.
(5) Clustering unit 605
The clustering unit 605 is configured to cluster the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets.
(6) Identifying unit 606
The identifying unit 606 is configured to identify the domain names to be analyzed according to the clustered domain name sets.
After the similarity between the domain names to be analyzed has been calculated, domain names to be analyzed whose similarity is greater than a preset value are assigned to the same domain name set, which yields the clustered domain name sets.
Specifically, as shown in Figure 6 b, cluster cell 605 may include:
First contrast subunit 607, the similarity for being analysed between domain name are compared with preset value;
Subelement 608 is clustered, the domain name to be analyzed for similarity to be greater than preset value is clustered to same domain name and concentrated, and is obtained
Domain name collection after to cluster.
Specifically, the similarity between domain names to be analyzed is compared with the preset value; if it is greater than the preset value, the domain names are clustered into the same domain name set. Clustering proceeds in this way until all domain names to be analyzed have been clustered, which yields the clustered domain name sets. It should be noted that, to reduce the number of subsequent domain name identifications and improve identification accuracy, the preset value for the similarity comparison is set relatively high, for example greater than 0.9 and less than or equal to 1.
The domain names to be analyzed are then identified according to the clustered domain name sets. Because the domain names within a clustered set are highly similar, identifying only some of the domain names to be analyzed in a set is enough to learn the type of the remaining ones. The identification process is therefore as follows: some of the domain names to be analyzed in a set are identified to obtain their types. If the identified types are consistent, that type is used as the type of the remaining unidentified domain names. If the identified types are inconsistent, the number of domain names identified can be increased to improve accuracy, and the type held by more than a preset proportion of the identified domain names is used as the type of the remaining unidentified domain names to be analyzed.
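A minimal sketch of identifying a set from part of its members, assuming a hypothetical `classify` function that returns the type of one domain name. The helper name `label_cluster`, the sample size of 3, doubling the sample when types disagree, the 0.6 ratio and returning `None` when no type dominates are all illustrative assumptions; the text only says that more domain names are identified and a preset proportion is applied.

```python
from collections import Counter


def label_cluster(cluster, classify, sample_size=3, ratio=0.6):
    """Infer a type for every domain in `cluster` by classifying only part of it.

    Hypothetical helper: `classify`, `sample_size`, `ratio` and the doubling
    rule are illustrative assumptions, not values from the patent.
    """
    if not cluster:
        return {}
    size = min(sample_size, len(cluster))
    labels = {name: classify(name) for name in cluster[:size]}
    counts = Counter(labels.values())
    if len(counts) == 1:
        # The sampled types agree, so reuse that type for the rest of the set.
        inferred = next(iter(counts))
    else:
        # The sampled types disagree: identify more domains (here, double the sample).
        size = min(2 * size, len(cluster))
        for name in cluster[:size]:
            if name not in labels:
                labels[name] = classify(name)
        counts = Counter(labels.values())
        top_type, top_count = counts.most_common(1)[0]
        # Only propagate a type whose share exceeds the preset ratio.
        inferred = top_type if top_count / len(labels) > ratio else None
    result = dict(labels)
    for name in cluster:
        result.setdefault(name, inferred)  # None means "still unidentified"
    return result


TRUTH = {"a": "malicious", "b": "benign", "c": "malicious", "d": "malicious",
         "e": "malicious", "f": "malicious", "g": "benign"}
print(label_cluster(list("abcdefg"), TRUTH.get))
# 'g' is inferred as 'malicious' from the 5-of-6 majority of identified names
```

In the toy table "g" is actually benign, which shows the trade-off: propagating a majority type saves identification work at the cost of occasional mislabels.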
Specifically, as shown in Fig. 6c, the clustering unit 605 may further include an extraction subunit 609, a second comparison subunit 610, and a determination subunit 611:
the extraction subunit 609, configured to extract features of a target domain name;
the second comparison subunit 610, configured to compare the features of the target domain name with the features of the domain names in the clustered domain name sets; and
the determination subunit 611, configured to determine, if a clustered domain name set contains a domain name consistent with the features of the target domain name, that the domain names in that set are target domain names.
During identification, the features of the target domain name may be extracted and compared with the domain names in the clustered domain name sets. Because domain names in the same set are highly similar in type, or even of the same type, only some of the domain names in each set need to be compared with the features of the target domain name. This reduces the number of comparisons and shortens comparison time, improving comparison efficiency. It should be noted that the extracted target domain name features may be features of domain names used by a malicious group. If a clustered domain name set contains a domain name consistent with the features of the target domain name, the domain names in that set are determined to be target domain names, for example domain names of that malicious group. Further, because some malicious domain names have only been active for a short time, their features may not yet be available; in that case the sets can also be screened manually to determine whether they contain malicious domain names.
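The sampled feature comparison might look like the following. It is a sketch under assumptions: `find_target_clusters`, `extract_features`, `per_cluster` and the dict-of-features representation are hypothetical, and "consistent" is read as "every target feature has the same value in the sampled domain name".

```python
def find_target_clusters(clusters, extract_features, target_features, per_cluster=2):
    """Return the clusters in which a sampled domain matches the target features.

    Hypothetical helper: features are dicts, and only `per_cluster` members of
    each set are compared, since members of one set are assumed to share a type.
    """
    matched = []
    for cluster in clusters:
        for name in cluster[:per_cluster]:
            features = extract_features(name)
            if all(features.get(key) == value for key, value in target_features.items()):
                matched.append(cluster)
                break  # one match is enough to flag the whole set
    return matched


FEATURES = {"a.example": {"ns": "ns1.bad.example"},
            "b.example": {"ns": "ns1.bad.example"},
            "c.example": {"ns": "ns.good.example"}}
print(find_target_clusters([["a.example", "b.example"], ["c.example"]],
                           lambda name: FEATURES.get(name, {}),
                           {"ns": "ns1.bad.example"}))
# [['a.example', 'b.example']]
```

Limiting the comparison to `per_cluster` members is what saves time; the cost is that a set whose sampled members lack the features is not flagged, which is where the manual screening mentioned above would come in.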
Further, if the domain names in a set are determined to be target domain names, then, to prevent clients that access the domain names in that set from being attacked, the clients accessing those domain names need to be determined and warning information sent to them. Specifically, preset warning information may be retrieved and sent to the clients that accessed domain names in the set, or warning information may be generated in real time and sent to those clients, reminding users that they may be under attack or may have been compromised so that they can take precautions in time.
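Selecting the clients to warn can be sketched as follows, assuming access records are available as (client_id, domain) pairs. `clients_to_warn` is a hypothetical helper, and delivering the warning (preset or generated in real time) is reduced to a print statement.

```python
def clients_to_warn(access_records, flagged_domains):
    """Map each client that accessed a flagged domain to the flagged domains it accessed.

    Hypothetical helper: records are assumed to be (client_id, domain) pairs.
    """
    flagged = set(flagged_domains)
    targets = {}
    for client, domain in access_records:
        if domain in flagged:
            targets.setdefault(client, set()).add(domain)
    return {client: sorted(domains) for client, domains in targets.items()}


records = [("c1", "a.example"), ("c2", "x.example"), ("c1", "b.example"), ("c3", "a.example")]
for client, domains in clients_to_warn(records, ["a.example", "b.example"]).items():
    print(f"warn {client}: accessed {', '.join(domains)}")
# warn c1: accessed a.example, b.example
# warn c3: accessed a.example
```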
As can be seen from the above, in the domain name identification apparatus of this embodiment, the receiving unit 601 receives a domain name identification instruction; the acquisition unit 602 then obtains, according to the instruction, the access records of multiple clients for the domain names to be analyzed, and constructs the feature vector corresponding to each domain name to be analyzed from those records; the dimensionality reduction unit 603 performs feature dimensionality reduction on the constructed feature vectors to obtain the reduced feature vector corresponding to each domain name to be analyzed; the calculation unit 604 calculates the similarity between domain names to be analyzed based on their reduced feature vectors; and the clustering unit then clusters the domain names to be analyzed based on that similarity to obtain clustered domain name sets, according to which the domain names to be analyzed can be identified. Domain names are thus clustered into domain name sets according to similarity and then identified set by set, which greatly reduces domain name identification time and improves identification accuracy.
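The feature vectors mentioned above are built from client access records; claim 8 describes sorting the clients and recording, for each client, whether it accessed the domain name. A minimal sketch, assuming records are (client_id, domain) pairs and that a lexicographic sort of client IDs stands in for the unspecified client ordering; `build_access_vectors` is a hypothetical name.

```python
def build_access_vectors(access_records):
    """Build a 0/1 feature vector per domain over a fixed, sorted client order.

    Hypothetical helper: lexicographic client order is an assumption.
    """
    clients = sorted({client for client, _ in access_records})
    position = {client: i for i, client in enumerate(clients)}
    vectors = {}
    for client, domain in access_records:
        vector = vectors.setdefault(domain, [0] * len(clients))
        vector[position[client]] = 1  # this client has an access record for the domain
    return clients, vectors


clients, vectors = build_access_vectors(
    [("c2", "x.example"), ("c1", "x.example"), ("c3", "y.example")]
)
print(clients)   # ['c1', 'c2', 'c3']
print(vectors)   # {'x.example': [1, 1, 0], 'y.example': [0, 0, 1]}
```

A fixed client order matters because vectors of different domain names are only comparable position by position if every position refers to the same client.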
An embodiment of the present invention further provides a server. Fig. 7 shows a schematic structural diagram of the server involved in the embodiments of the present invention. Specifically:
The server may include components such as a processor 701 with one or more processing cores, a memory 702 comprising one or more computer-readable storage media, a power supply 703, and an input unit 704. Those skilled in the art will understand that the server structure shown in Fig. 7 does not limit the server, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. In detail:
The processor 701 is the control center of the server. It connects all parts of the server through various interfaces and lines, and performs the various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 702 and calling data stored in the memory 702, thereby monitoring the server as a whole. Optionally, the processor 701 may include one or more processing cores. Preferably, the processor 701 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It will be understood that the modem processor may also not be integrated into the processor 701.
The memory 702 may be used to store software programs and modules, and the processor 701 performs various functional applications and data processing by running the software programs and modules stored in the memory 702. The memory 702 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, applications required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the server, and the like. In addition, the memory 702 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Correspondingly, the memory 702 may also include a memory controller to provide the processor 701 with access to the memory 702.
The server further includes the power supply 703 that powers the components. Preferably, the power supply 703 may be logically connected to the processor 701 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system. The power supply 703 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may further include the input unit 704, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
Although not shown, the server may also include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 701 in the server loads the executable files corresponding to the processes of one or more applications into the memory 702 according to the following instructions, and runs the applications stored in the memory 702, thereby implementing various functions as follows:
Receive a domain name identification instruction;
Obtain, according to the domain name identification instruction, access records of multiple clients for domain names to be analyzed, and construct the feature vector corresponding to each domain name to be analyzed according to the access records;
Perform feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed, to obtain its reduced feature vector;
Calculate the similarity between domain names to be analyzed based on their reduced feature vectors;
Cluster the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets;
Identify the domain names to be analyzed according to the clustered domain name sets.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be completed by instructions, or by instructions controlling relevant hardware. The instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium storing a plurality of instructions, which can be loaded by a processor to execute the steps of any domain name identification method provided by the embodiments of the present invention. For example, the instructions may execute the following steps:
Receive a domain name identification instruction;
Obtain, according to the domain name identification instruction, access records of multiple clients for domain names to be analyzed, and construct the feature vector corresponding to each domain name to be analyzed according to the access records;
Perform feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed, to obtain its reduced feature vector;
Calculate the similarity between domain names to be analyzed based on their reduced feature vectors;
Cluster the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets;
Identify the domain names to be analyzed according to the clustered domain name sets.
For the specific implementation of each of the above operations, refer to the preceding embodiments; details are not repeated here.
The storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can execute the steps of any domain name identification method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any such method; see the preceding embodiments for details, which are not repeated here.
The domain name identification method, apparatus, and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (15)
1. A domain name identification method, characterized by comprising:
receiving a domain name identification instruction;
obtaining, according to the domain name identification instruction, access records of multiple clients for domain names to be analyzed, and constructing, according to the access records, a feature vector corresponding to each domain name to be analyzed;
performing feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed, to obtain a reduced feature vector corresponding to that domain name;
calculating the similarity between domain names to be analyzed based on their reduced feature vectors;
clustering the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets; and
identifying the domain names to be analyzed according to the clustered domain name sets, to obtain an identification result.
2. The domain name identification method according to claim 1, characterized in that performing feature dimensionality reduction on the feature vector corresponding to the domain name to be analyzed comprises:
determining hash functions required for the feature vector corresponding to the domain name to be analyzed; and
performing a hash transformation on the feature vector corresponding to the domain name to be analyzed based on the hash functions, to obtain a reduced feature vector.
3. The domain name identification method according to claim 2, characterized in that the feature vector comprises sub-feature vectors corresponding to the domain name to be analyzed, and performing the hash transformation on the feature vector corresponding to the domain name to be analyzed based on the hash functions, to obtain a reduced feature vector, comprises:
adding corresponding hash sub-feature vectors to the feature vector according to the hash functions;
initializing the feature values in the hash sub-feature vectors, to obtain initialized hash sub-feature vectors;
comparing the initialized hash sub-feature vectors with the sub-feature vectors corresponding to the domain name to be analyzed, to obtain a comparison result; and
reducing the dimensionality of the sub-feature vectors corresponding to the domain name to be analyzed according to the comparison result, to obtain the reduced feature vector.
4. The domain name identification method according to claim 3, characterized in that comparing the initialized hash sub-feature vectors with the sub-feature vectors corresponding to the domain name to be analyzed, to obtain a comparison result, comprises:
comparing the magnitudes of the feature values of the initialized hash sub-feature vectors with the corresponding sub-feature vector feature values of the domain name to be analyzed, to obtain the comparison result.
5. The domain name identification method according to claim 3, characterized in that reducing the dimensionality of the sub-feature vectors corresponding to the domain name to be analyzed according to the comparison result, to obtain the reduced feature vector, comprises:
if a feature value of an initialized hash sub-feature vector is greater than the corresponding sub-feature vector feature value of the domain name to be analyzed, and that sub-feature vector feature value is greater than a preset value, replacing the feature value of the initialized hash sub-feature vector;
if a feature value of an initialized hash sub-feature vector is greater than the corresponding sub-feature vector feature value of the domain name to be analyzed, and that sub-feature vector feature value is equal to the preset value, keeping the feature value of the initialized hash sub-feature vector unchanged;
combining the replaced feature values and the unchanged feature values of the hash sub-feature vectors, to obtain a combined hash sub-feature vector; and
reducing the dimensionality of the sub-feature vectors corresponding to the domain name to be analyzed according to the combined hash sub-feature vector, to obtain the reduced feature vector.
6. The domain name identification method according to claim 2, characterized in that, before determining the hash functions required for the feature vector corresponding to the domain name to be analyzed, the method further comprises:
filtering the feature vector corresponding to the domain name to be analyzed, to obtain a filtered feature vector; and
using the filtered feature vector as the feature vector corresponding to the domain name to be analyzed.
7. The domain name identification method according to claim 1, characterized in that calculating the similarity between domain names to be analyzed based on their reduced feature vectors comprises:
obtaining a similarity calculation coefficient between the domain names to be analyzed;
obtaining the number of vector rows of the reduced feature vectors corresponding to the domain names to be analyzed; and
calculating the similarity between the domain names to be analyzed according to the similarity calculation coefficient and the number of vector rows.
8. The domain name identification method according to claim 1, characterized in that constructing the feature vector corresponding to the domain name to be analyzed according to the access records comprises:
sorting the multiple clients in a sort order;
detecting whether each sorted client has an access record for the domain name to be analyzed, to obtain a detection result for each sorted client; and
constructing the feature vector corresponding to the domain name to be analyzed based on the detection results of the sorted clients and the order in which the clients are arranged.
9. The domain name identification method according to claim 1, characterized in that clustering the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets, comprises:
comparing the similarity between domain names to be analyzed with a preset value; and
clustering domain names to be analyzed whose similarity is greater than the preset value into the same domain name set, to obtain the clustered domain name sets.
10. The domain name identification method according to claim 1, characterized in that identifying the domain names to be analyzed according to the clustered domain name sets comprises:
extracting features of a target domain name;
comparing the features of the target domain name with the domain name features in the clustered domain name sets; and
if a clustered domain name set contains a domain name consistent with the features of the target domain name, determining that the domain names in that set are target domain names.
11. The domain name identification method according to any one of claims 1 to 10, characterized in that, after the step of identifying the domain names to be analyzed according to the clustered domain name sets, the method further comprises:
if the domain names in a domain name set are identified as target domain names, determining the clients that access the domain names in that set; and
sending warning information to those clients.
12. The domain name identification method according to claim 1, characterized in that obtaining the access records of multiple clients for the domain names to be analyzed comprises:
displaying a domain name detection page, the page comprising a domain name identification control; and
obtaining, based on a trigger operation by a user on the domain name identification control, the access records of multiple clients for the domain names to be analyzed.
13. The domain name identification method according to claim 12, characterized in that, after the step of identifying the domain names to be analyzed according to the clustered domain name sets, the method further comprises:
displaying the domain name identification result on the domain name detection page.
14. A domain name identification apparatus, characterized by comprising:
a receiving unit, configured to receive a domain name identification instruction;
an acquisition unit, configured to obtain, according to the domain name identification instruction, access records of multiple clients for domain names to be analyzed, and to construct, according to the access records, a feature vector corresponding to each domain name to be analyzed;
a dimensionality reduction unit, configured to perform feature dimensionality reduction on the feature vector corresponding to each domain name to be analyzed, to obtain a reduced feature vector corresponding to that domain name;
a calculation unit, configured to calculate the similarity between domain names to be analyzed based on their reduced feature vectors;
a clustering unit, configured to cluster the domain names to be analyzed based on the similarity between them, to obtain clustered domain name sets; and
a recognition unit, configured to identify the domain names to be analyzed according to the clustered domain name sets.
15. A storage medium, characterized in that the storage medium stores a plurality of instructions, the instructions being suitable for loading by a processor to perform the steps of the domain name identification method according to any one of claims 1 to 13.
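The hash-based reduction in claims 3 to 5 and the row-count similarity in claim 7 read much like MinHash signatures: each hash row starts at an initial (here infinite) value and is replaced by a smaller hash value only for rows where the domain name's feature value exceeds the preset value (here 0, meaning "accessed"). The sketch below follows that interpretation only; the helper names, the hash family (a*r + b) mod (2**31 - 1), the seed and the number of hash functions are assumptions, not taken from the patent.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, a prime larger than any row index used here


def make_hash_functions(num_hashes, seed=0):
    """Draw hypothetical hash functions h(r) = (a * r + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(vector, hashes, preset=0):
    """Reduce a 0/1 access vector to len(hashes) values (a MinHash signature)."""
    signature = [float("inf")] * len(hashes)  # initialise every hash row
    for row, value in enumerate(vector):
        if value > preset:  # only rows (clients) that accessed the domain
            for i, (a, b) in enumerate(hashes):
                hashed = (a * row + b) % PRIME
                if signature[i] > hashed:
                    signature[i] = hashed  # keep the smaller hash value
        # rows whose value equals `preset` leave the signature unchanged
    return signature


def signature_similarity(sig_a, sig_b):
    """Share of signature rows that agree; an estimate of Jaccard similarity."""
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


hashes = make_hash_functions(128)
sig_x = minhash_signature([1, 1, 0, 1, 0], hashes)
sig_y = minhash_signature([1, 1, 0, 0, 1], hashes)
print(signature_similarity(sig_x, sig_y))  # an estimate of the exact Jaccard value 2/4
```

Dividing the count of agreeing rows by the number of signature rows is one plausible reading of claim 7's "similarity calculation coefficient" and "number of vector rows". An all-zero vector yields an all-infinity signature, so such domain names are best filtered out before hashing.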
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910373033.3A CN110099059B (en) | 2019-05-06 | 2019-05-06 | Domain name identification method and device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110099059A true CN110099059A (en) | 2019-08-06 |
CN110099059B CN110099059B (en) | 2021-08-31 |
Family ID: 67446989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910373033.3A Active CN110099059B (en) | 2019-05-06 | 2019-05-06 | Domain name identification method and device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110099059B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104112018A (en) * | 2014-07-21 | 2014-10-22 | 南京大学 | Large-scale image retrieval method |
CN104486461A (en) * | 2014-12-29 | 2015-04-01 | 北京奇虎科技有限公司 | Domain name classification method and device and domain name recognition method and system |
CN108282450A (en) * | 2017-01-06 | 2018-07-13 | 阿里巴巴集团控股有限公司 | Method and device for detecting abnormal domain names |
CN109698820A (en) * | 2018-09-03 | 2019-04-30 | 长安通信科技有限责任公司 | Domain name similarity calculation and classification method and system |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826067B (en) * | 2019-10-31 | 2022-08-09 | 深信服科技股份有限公司 | Virus detection method and device, electronic equipment and storage medium |
CN110826067A (en) * | 2019-10-31 | 2020-02-21 | 深信服科技股份有限公司 | Virus detection method and device, electronic equipment and storage medium |
CN112910832A (en) * | 2019-12-03 | 2021-06-04 | 国家计算机网络与信息安全管理中心 | International domain name spoofing attack recognition and analysis method and system |
CN111049949A (en) * | 2019-12-31 | 2020-04-21 | 奇安信科技集团股份有限公司 | Domain name identification method, device, electronic equipment and medium |
CN113381962A (en) * | 2020-02-25 | 2021-09-10 | 深信服科技股份有限公司 | Data processing method, device and storage medium |
WO2021169730A1 (en) * | 2020-02-25 | 2021-09-02 | 深信服科技股份有限公司 | Method and device for data processing, and storage medium |
CN113381963A (en) * | 2020-02-25 | 2021-09-10 | 深信服科技股份有限公司 | Domain name detection method, device and storage medium |
CN113381962B (en) * | 2020-02-25 | 2023-02-03 | 深信服科技股份有限公司 | Data processing method, device and storage medium |
CN113381963B (en) * | 2020-02-25 | 2024-01-02 | 深信服科技股份有限公司 | Domain name detection method, device and storage medium |
CN113542202A (en) * | 2020-04-21 | 2021-10-22 | 深信服科技股份有限公司 | Domain name identification method, device, equipment and computer readable storage medium |
CN113542202B (en) * | 2020-04-21 | 2022-09-30 | 深信服科技股份有限公司 | Domain name identification method, device, equipment and computer readable storage medium |
CN113556308A (en) * | 2020-04-23 | 2021-10-26 | 深信服科技股份有限公司 | Method, system, equipment and computer storage medium for detecting flow security |
CN112367338A (en) * | 2020-11-27 | 2021-02-12 | 腾讯科技(深圳)有限公司 | Malicious request detection method and device |
CN112565283A (en) * | 2020-12-15 | 2021-03-26 | 厦门服云信息科技有限公司 | APT attack detection method, terminal device and storage medium |
CN112615861A (en) * | 2020-12-17 | 2021-04-06 | 赛尔网络有限公司 | Malicious domain name identification method and device, electronic equipment and storage medium |
CN112583738A (en) * | 2020-12-29 | 2021-03-30 | 北京浩瀚深度信息技术股份有限公司 | Method, equipment and storage medium for analyzing and classifying network flow |
CN115361358B (en) * | 2022-08-19 | 2024-02-06 | 山石网科通信技术股份有限公司 | IP extraction method and device, storage medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN110099059B (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110099059A (en) | Domain name identification method, device and storage medium | |
Rao et al. | Jail-Phish: An improved search engine based phishing detection system | |
Zhu et al. | OFS-NN: an effective phishing websites detection model based on optimal feature selection and neural network | |
CN105590055B (en) | Method and device for identifying user credible behaviors in network interaction system | |
CN112866023B (en) | Network detection method, model training method, device, equipment and storage medium | |
CN111949803B (en) | Knowledge graph-based network abnormal user detection method, device and equipment | |
US20170078310A1 (en) | Identifying phishing websites using dom characteristics | |
CN111565171B (en) | Abnormal data detection method and device, electronic equipment and storage medium | |
CN111355697B (en) | Detection method, device, equipment and storage medium for botnet domain name family | |
CN104899508B (en) | Multistage phishing website detection method and system | |
CN104579773B (en) | Domain name system analyzes method and device | |
RU2722693C1 (en) | Method and system for detecting the infrastructure of a malicious software or a cybercriminal | |
CN108023868B (en) | Malicious resource address detection method and device | |
CN106209488A (en) | Method and apparatus for detecting website attacks | |
CN107612911B (en) | Method for detecting infected host and C & C server based on DNS traffic | |
CN108322428A (en) | Abnormal access detection method and device | |
CN113132311A (en) | Abnormal access detection method, device and equipment | |
Kozik et al. | Modelling HTTP requests with regular expressions for detection of cyber attacks targeted at web applications | |
Eldos et al. | On the KDD'99 Dataset: Statistical Analysis for Feature Selection | |
Platzer et al. | A synopsis of critical aspects for darknet research | |
CN113535823A (en) | Abnormal access behavior detection method and device and electronic equipment | |
WO2016173327A1 (en) | Method and device for detecting website attack | |
CN115001724B (en) | Network threat intelligence management method, device, computing equipment and computer readable storage medium | |
CN115643044A (en) | Data processing method, device, server and storage medium | |
CN113395268A (en) | Online and offline fusion-based web crawler interception method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |