CN107682348A

CN107682348A - Machine-learning-based method and device for rapid discrimination of DGA domain names

Info

Publication number: CN107682348A
Application number: CN201710976231.XA
Authority: CN
Inventors: 莫凡; 范渊; 刘博�
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2017-10-19
Filing date: 2017-10-19
Publication date: 2018-02-09

Abstract

The invention provides a machine-learning-based method and device for rapid discrimination of DGA domain names, and relates to the technical field of network security. The method comprises: building a training set containing multiple DGA domain names and normal domain names; extracting the domain name features of each domain name in the training set; normalizing the domain name features to obtain a feature data set; and building a domain name classifier model based on the feature data set. Through a study of domain names, the method and device extract richer and more representative domain name features. Normalizing the feature data speeds up training and testing and thereby improves computational efficiency. Finally, a machine learning algorithm is trained on the feature data set to obtain the domain name classifier model, which improves generalization ability as well as discrimination accuracy.

Description

Machine-learning-based method and device for rapid discrimination of DGA domain names

Technical field

The present invention relates to the technical field of network security, and in particular to a machine-learning-based method and device for rapid discrimination of DGA domain names.

Background art

DGA domain names are random-looking domain names produced by a domain generation algorithm (Domain Generation Algorithm, DGA). The technique is common in botnets such as Conficker and Zeus: using a private random-string generation algorithm seeded with the date or another random seed, they generate a number of random-string domain names every day and then register some of them, in order to commit fraud, spread malware, distribute pornographic content, and carry out other malicious acts.

As techniques such as Domain-Flux and Fast-Flux are used more and more widely by attackers, network attacks carried out with DGA domain names become more covert and harder to trace. An infected machine in a botnet only needs to generate the same random domain names with the same algorithm and hit one that has been registered; it can then be controlled by the attacker and used to launch network attacks such as distributed denial of service and spam.

The traditional approach relies mainly on the experience of white-hat analysts; it consumes a great deal of manpower and can hardly cope with today's enormous workload. Another class of methods is based on feature construction: starting from a similarity measure, it computes a threshold from samples and uses it to decide whether a domain name under test is a DGA domain name. Such methods use relatively simple similarity measures and consider only a narrow set of features, so they generalize poorly and their accuracy is low.

Summary of the invention

The object of the present invention is to provide a machine-learning-based method and device for rapid discrimination of DGA domain names that can effectively mitigate the problems described above.

Embodiments of the invention are realized as follows:

In a first aspect, an embodiment of the invention provides a machine-learning-based method for rapid discrimination of DGA domain names. The method comprises: building a training set containing multiple DGA domain names and normal domain names; extracting the domain name features of each domain name in the training set; normalizing the domain name features to obtain a feature data set; and building a domain name classifier model based on the feature data set.

In a second aspect, an embodiment of the invention further provides a machine-learning-based device for rapid discrimination of DGA domain names, comprising: a training set building module, configured to build a training set containing multiple DGA domain names and normal domain names; a feature extraction module, configured to extract the domain name features of each domain name in the training set; a normalization module, configured to normalize the domain name features to obtain a feature data set; and a model building module, configured to build a domain name classifier model based on the feature data set.

The method and device provided by the embodiments of the invention first build a training set containing multiple DGA domain names and normal domain names, which supplies enough samples for subsequently building the domain name classifier model. They then extract the domain name features of each domain name in the training set, so that representative domain name features serve as the criterion for judging whether a domain name is a DGA domain name. Next, they normalize the domain name features to obtain a feature data set, which unifies the scale of the features and improves computational efficiency. Finally, they build a domain name classifier model based on the feature data set, so that the model obtained by machine learning can be used to check all kinds of unknown domain names and to judge quickly and accurately whether a domain name under test is a DGA domain name. Compared with the prior art, the method and device extract richer and more representative domain name features through a study of domain names; normalizing the feature data speeds up training and testing and thereby improves computational efficiency; and training a machine learning algorithm on the feature data set to obtain the domain name classifier model improves generalization ability as well as discrimination accuracy.

Brief description of the drawings

To explain the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.

Fig. 1 is a structural block diagram of an electronic device to which embodiments of the invention can be applied;

Fig. 2 is a flow chart of the machine-learning-based method for rapid discrimination of DGA domain names provided by the first embodiment of the invention;

Fig. 3 is a flow chart of the sub-steps of step S210 in the first embodiment of the invention;

Fig. 4 is a flow chart of the sub-steps of step S300 in the first embodiment of the invention;

Fig. 5 is a flow chart of the sub-steps of step S310 in the first embodiment of the invention;

Fig. 6 is a flow chart of the sub-steps of step S320 in the first embodiment of the invention;

Fig. 7 is a flow chart of the sub-steps of step S330 in the first embodiment of the invention;

Fig. 8 is a flow chart of the sub-steps of step S230 in the first embodiment of the invention;

Fig. 9 is a flow chart of steps S500 and S510 provided by the first embodiment of the invention;

Fig. 10 is a structural block diagram of the machine-learning-based device for rapid discrimination of DGA domain names provided by the second embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention; it merely represents selected embodiments. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it need not be defined or explained again in subsequent drawings. In the description of the invention, the terms "first", "second", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.

Fig. 1 shows a structural block diagram of an electronic device 100 to which embodiments of the present application can be applied. As shown in Fig. 1, the electronic device 100 may include a memory 110, a memory controller 120, a processor 130, a display screen 140, and a machine-learning-based DGA domain name rapid discrimination device. For example, the electronic device 100 may be a personal computer (PC), a tablet computer, a smartphone, a personal digital assistant (PDA), or the like.

The memory 110, memory controller 120, processor 130, and display screen 140 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal buses. The machine-learning-based DGA domain name rapid discrimination method involves at least one software function module that can be stored in the memory 110 in the form of software or firmware, for example a software function module or computer program included in the machine-learning-based DGA domain name rapid discrimination device.

The memory 110 can store various software programs and modules, such as the program instructions/modules corresponding to the method and device provided by the embodiments of the present application. By running the software programs and modules stored in the memory 110, the processor 130 performs various functional applications and data processing, that is, it implements the machine-learning-based DGA domain name rapid discrimination method of the embodiments. The memory 110 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.

The processor 130 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor.

Besides implementing the machine-learning-based rapid discrimination of DGA domain names, the electronic device 100 used in embodiments of the invention may also have its own display function. The display screen 140 may provide an interactive interface (for example a user operation interface) between the electronic device 100 and the user, or display image data for the user's reference. For example, it may display data such as the domain name training set built by the machine-learning-based DGA domain name rapid discrimination device and the extracted domain name features.

Before the specific embodiments are introduced, it should be noted that the present invention is an application of computer technology in the field of information security. Its implementation may involve multiple software function modules. The applicant believes that, after carefully reading the application documents and accurately understanding the implementation principle and purpose of the invention, a person skilled in the art can, in combination with existing well-known techniques, fully implement the invention using the software programming skills they possess. All software function modules mentioned in the application documents fall into this category, and the applicant does not enumerate them one by one.

First embodiment

Referring to Fig. 2, this embodiment provides a machine-learning-based method for rapid discrimination of DGA domain names, applied to a machine-learning-based DGA domain name rapid discrimination device. The method comprises:

Step S200: build a training set containing multiple DGA domain names and normal domain names;

In this embodiment, the DGA domain names may also be called positive examples; they may include DGA domain names generated by common DGA algorithms as well as malicious domain names obtained from open-source channels. The normal domain names may also be called negative examples; they may include domain names currently recognized as benign, for example the top-ranked domain names in the Alexa website ranking.

For example, the domain name "www.google.com" is a normal domain name.

Step S210: extract the domain name features of each domain name in the training set;

In this embodiment, before the domain name features are extracted, each domain name in the training set may first be preprocessed to extract its representative main parts, for example its main domain and its TLD suffix (top-level domain, TLD), which is the last part of the domain name.

For example, for the domain name "www.google.com", the main domain is google and the TLD suffix is com.
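The preprocessing in step S210 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the helper name `split_domain` is hypothetical, and it assumes that the TLD suffix is the last dot-separated label and the main domain is the label immediately before it, which does not hold for multi-label suffixes such as "co.uk".

```python
def split_domain(domain):
    """Return (main_domain, tld) using a naive label split.

    Assumption: the TLD is the last dot-separated label and the main
    domain is the label immediately before it.
    """
    labels = domain.strip().rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least two labels: {domain!r}")
    return labels[-2], labels[-1]


main_domain, tld = split_domain("www.google.com")
print(main_domain, tld, len(main_domain))  # prints: google com 6
```

A production system would more likely consult a public suffix list so that registrable domains under multi-label suffixes are split correctly.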

It will be appreciated that, in this embodiment, a single domain name feature may be extracted, for example discriminating DGA domain names by the main domain alone. Multiple domain name features may also be extracted, for example by extracting the main domain and TLD suffix of each domain name and deriving further features from them to refine the decision rules and improve the accuracy of DGA domain name discrimination. For example, the length of the main domain, the linguistic characteristics of the main domain, the character transition probability within the main domain, and the TLD suffix of the domain name may be extracted together as the domain name features.

Step S220: normalize the domain name features to obtain a feature data set;

In this embodiment, normalizing the domain name features extracted in the previous step unifies the scale of the features and improves computational efficiency, which facilitates the subsequent machine learning training and the building of the discrimination model.

Step S230: build a domain name classifier model based on the feature data set.

In this embodiment, a machine learning algorithm may be trained on the feature data set to build the domain name classifier model. The model obtained by machine learning can identify DGA domain names quickly and accurately from their domain name features, and can be used to make predictions for unknown domain names.

Referring to Fig. 3, in this embodiment, step S210 may further include the following sub-steps:

Step S300: extract the length feature of each domain name in the training set;

In this embodiment, the length feature of a domain name may be the length of its main domain.

For example, for the domain name "www.google.com", the main domain google has length 6.

Step S310: extract the n-gram features of each domain name in the training set;

In this embodiment, n-gram refers to the n-gram language model, where an n-gram is a run of n consecutive characters; the frequency with which n-grams occur reflects the characteristics of a language.

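For example, when n is 1, 2, and 3 respectively, the runs of n consecutive characters in the main domain "google" of the domain name "www.google.com" are as shown in Table 1:

Table 1

n    Runs of n consecutive characters in "google"
1    g, o, o, g, l, e
2    go, oo, og, gl, le
3    goo, oog, ogl, gle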
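The enumeration for Table 1 follows directly from the definition of an n-gram and can be reproduced with a few lines of Python. The function name `char_ngrams` is a hypothetical helper introduced only for this sketch.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text, in order."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


for n in (1, 2, 3):
    print(n, char_ngrams("google", n))
```

A main domain of length L yields L - n + 1 n-grams, so "google" gives 6, 5, and 4 entries for n = 1, 2, and 3.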

Step S320: extract the transition probability feature of each domain name in the training set;

Transition probability is a key concept in Markov chains. If a Markov chain is divided into m states, the historical data is converted into a sequence made up of these m states. Starting from any state, after any single transition one of the states 1, 2, ..., m must occur; the probability of such a transition between states is called the transition probability. Each character in a main domain can be regarded as a state in a Markov chain whose value depends on a finite number of preceding states; usually one preceding state is used.

For example, for the main domain "google" of the domain name "www.google.com", the transition probability is:

$$P = p(g) \times p(g \to o) \times p(o \to o) \times p(o \to g) \times p(g \to l) \times p(l \to e)$$

Step S330: extract the TLD suffix feature of each domain name in the training set.

In general, DGA domain names choose TLD suffixes that are cheap and loosely reviewed. In this embodiment, the TLD suffix feature extracted from each domain name can therefore serve as a basis for discriminating DGA domain names.

Referring to Fig. 4, in this embodiment, step S300 may further include the following sub-step:

Step S301: extract the main domain length of each domain name in the training set, and use the main domain length of a given main domain as the length feature of that main domain.

Because short domain names have been registered in large numbers, short domain name resources are increasingly scarce, so the main domain length of DGA domain names tends to grow. In this embodiment, the main domain length of each domain name is extracted as the length feature and can be used to discriminate DGA domain names. For example, when the main domain length of a domain name under test exceeds a certain threshold, it can be considered likely to be a DGA domain name.

Referring to Fig. 5, in this embodiment, step S310 may further include the following sub-steps:

Step S311: count the frequency with which each run of n consecutive characters occurs across all main domains in the training set, and rank the runs by frequency from high to low;

Specifically, for n = 1, count the frequency with which each single character occurs across all main domains of the training set and rank from high to low, denoted $P^1(x_1)$;

for n = 2, count the frequency with which each pair of consecutive characters occurs across all main domains of the training set and rank from high to low, denoted $P^2(x_1 x_2)$;

and so on: count the frequency with which each run of n consecutive characters occurs across all main domains of the training set and rank from high to low, denoted $P^n(x_1 x_2 \ldots x_n)$.

In particular, the larger n is, the weaker the association between characters and the less informative the resulting frequency statistics become, so the suggested value is n ≤ 3, with n an integer.

Step S312: based on the frequency ranking of the runs of n consecutive characters across all main domains, compute the mean and variance of the frequency ranks of the runs of n consecutive characters that occur in a given main domain, and use that mean and variance as the n-gram feature of that main domain.

Specifically, for a given main domain $A = a_1 a_2 \ldots a_{n-2} a_{n-1} a_n$, the mean and variance of the frequency ranks of its runs of n consecutive characters can be computed for n = 1, 2, and 3 in turn, using the rankings of single characters, pairs of consecutive characters, and triples of consecutive characters obtained in the preceding step.

In this embodiment, the mean and variance of the frequency ranks of the runs of n consecutive characters in a main domain are used as the n-gram feature of that main domain. The smaller the rank mean, the more frequent the n-grams in the main domain are, and the lower the probability that the domain name is a DGA domain name.
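Steps S311 and S312 can be sketched as follows. The text does not spell out the rank statistics in detail, so several choices here are assumptions made only for illustration: rank 1 is the most frequent n-gram, ties are broken alphabetically, n-grams not seen in the training set are given the rank just past the end of the table, and the population variance is used. The function names and the toy training set are hypothetical.

```python
from collections import Counter
from statistics import fmean, pvariance


def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_rank_table(main_domains, n):
    """Rank n-grams by frequency over all main domains (rank 1 = most frequent)."""
    counts = Counter(g for d in main_domains for g in char_ngrams(d, n))
    # Assumption: ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(counts, key=lambda g: (-counts[g], g))
    return {g: rank for rank, g in enumerate(ordered, start=1)}


def rank_features(main_domain, rank_table, n):
    """Return (mean, variance) of the ranks of the n-grams in one main domain."""
    unseen_rank = len(rank_table) + 1  # assumption: unseen n-grams rank last
    ranks = [float(rank_table.get(g, unseen_rank))
             for g in char_ngrams(main_domain, n)]
    if not ranks:
        raise ValueError("main domain is shorter than n")
    return fmean(ranks), pvariance(ranks)


training_main_domains = ["google", "gogo"]  # toy data for illustration only
table = build_rank_table(training_main_domains, n=1)
print(rank_features("go", table, n=1))  # prints: (1.5, 0.25)
```

Running the same two functions with n = 2 and n = 3 yields the remaining four n-gram features, giving six values per main domain in total.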

Referring to Fig. 6, in this embodiment, step S320 may further include the following sub-steps:

Step S321: compute the Markov chain transition matrix from statistics over all main domains in the training set;

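Specifically, the Markov chain transition matrix obtained from the statistics is:

$$
\begin{array}{c|ccccc}
 & a_1 & a_2 & \cdots & a_j & \cdots \\
\hline
a_1 & p(a_1 \to a_1) & p(a_1 \to a_2) & \cdots & p(a_1 \to a_j) & \cdots \\
a_2 & p(a_2 \to a_1) & p(a_2 \to a_2) & \cdots & p(a_2 \to a_j) & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
a_i & p(a_i \to a_1) & p(a_i \to a_2) & \cdots & p(a_i \to a_j) & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{array}
$$

where $a_j$ is a character that occurs in the main domains of the training set;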

Step S322: based on the Markov chain transition matrix, compute the transition probability of a given main domain and use it as the transition probability feature of that main domain.

Specifically, for a given main domain $A = a_1 a_2 \ldots a_{n-2} a_{n-1} a_n$, the transition probability can be computed as:

$$P = p(a_1) \times p(a_1 \to a_2) \times \cdots \times p(a_{n-2} \to a_{n-1}) \times p(a_{n-1} \to a_n)$$

where each $p(\cdot)$ can be read directly from the transition matrix.

Research shows that normal domain names are common, readable, and easy to remember, and their transition probability values tend to be large, whereas DGA domain names are the opposite and their transition probability values tend to be small. In this embodiment, the transition probability P is used as a feature of the main domain and can be used to discriminate DGA domain names.
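The following Python sketch illustrates steps S321 and S322 with a first-order Markov chain. It is an illustration under stated assumptions rather than the applicant's implementation: p(a₁) is estimated from first-character frequencies, transitions never seen in training are given a small hypothetical floor probability, and the product is accumulated in log space to avoid numerical underflow on long main domains. The function names and the toy training set are invented for the example.

```python
import math
from collections import Counter, defaultdict


def fit_markov(main_domains):
    """Estimate first-character and first-order transition probabilities."""
    start = Counter(d[0] for d in main_domains if d)
    trans = defaultdict(Counter)
    for d in main_domains:
        for a, b in zip(d, d[1:]):
            trans[a][b] += 1
    total = sum(start.values())
    p_start = {c: k / total for c, k in start.items()}
    p_trans = {a: {b: k / sum(row.values()) for b, k in row.items()}
               for a, row in trans.items()}
    return p_start, p_trans


def log_transition_probability(main_domain, p_start, p_trans, floor=1e-6):
    """log P = log p(a1) + sum of log p(a_i -> a_{i+1}).

    Assumption: unseen characters and transitions use the probability `floor`.
    """
    if not main_domain:
        raise ValueError("main domain must not be empty")
    logp = math.log(p_start.get(main_domain[0], floor))
    for a, b in zip(main_domain, main_domain[1:]):
        logp += math.log(p_trans.get(a, {}).get(b, floor))
    return logp


toy_training = ["google", "good", "gold"]  # toy data for illustration only
p_start, p_trans = fit_markov(toy_training)
readable = log_transition_probability("good", p_start, p_trans)
random_like = log_transition_probability("gqxz", p_start, p_trans)
print(readable > random_like)  # prints: True
```

Because the logarithm is monotonic, ordering main domains by log P gives the same ordering as ordering them by P.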

Referring to Fig. 7, in this embodiment, step S330 may further include the following sub-steps:

Step S331: extract all distinct TLD suffixes of the domain names in the training set and construct a TLD vector;

In this embodiment, the TLD suffixes may be encoded with OneHotEncoder.

Specifically, construct the TLD vector $(\mathrm{TLD}_1 \; \mathrm{TLD}_2 \; \cdots \; \mathrm{TLD}_N)$.

Step S332: for each sample TLD, take the value 1 in the corresponding dimension of the TLD vector and 0 in the remaining dimensions, to obtain a TLD matrix;

In this embodiment, each sample TLD extracted from the training set takes the value 1 in its own dimension and 0 in all other dimensions; the resulting rows together form the TLD matrix.

Step S333: based on the TLD matrix, obtain the TLD suffix feature of a given domain name.

In this embodiment, if a domain name under test uses a TLD suffix that is not well known, not mainstream, cheap, and easy to get approved, the probability that it is a DGA domain name can be considered high.
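The one-hot construction of steps S331 to S333 can be illustrated with plain Python. This is a standard-library stand-in for the encoding idea, not the OneHotEncoder named in the text; the function names and sample TLDs are hypothetical, and mapping a TLD that never occurred in training to an all-zero row is an assumption.

```python
def build_tld_vocabulary(tlds):
    """Return the distinct TLDs in first-seen order: (TLD_1, ..., TLD_N)."""
    return list(dict.fromkeys(tlds))


def one_hot(tld, vocabulary):
    """Return a row with 1 in the dimension of `tld` and 0 elsewhere."""
    return [1 if tld == known else 0 for known in vocabulary]


sample_tlds = ["com", "net", "com", "top"]  # illustrative values only
vocabulary = build_tld_vocabulary(sample_tlds)
tld_matrix = [one_hot(t, vocabulary) for t in sample_tlds]

for tld, row in zip(sample_tlds, tld_matrix):
    print(tld, row)
```

Each row of `tld_matrix` is the TLD suffix feature of one training sample; the same `one_hot` call with the training vocabulary produces the feature for a domain name under test.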

In this embodiment, after steps S300, S310, S320, and S330 have been carried out, step S220 may normalize each of the domain name features.

Specifically, the length feature, the n-gram features (for n = 1, n = 2, and n = 3), and the transition probability feature are each normalized.

For the TLD suffix feature, the largest element of the TLD matrix obtained in step S332 is 1 and the smallest is 0, so normalizing its elements leaves their values unchanged. The TLD suffix feature is therefore the same after normalization as before.

In particular, if some column of the TLD matrix, say $\mathrm{TLD}_1$, has equal maximum and minimum values, that column can be dropped, because it provides no useful feature information.
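The remarks about the TLD matrix (values of 0 and 1 are unchanged by normalization, and a column whose maximum equals its minimum is dropped) are consistent with min-max scaling, $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. The sketch below assumes that form; the text does not state it explicitly, and the function name `min_max_normalize` and the sample rows are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column to [0, 1] and drop columns whose max equals their min.

    Returns (normalized_rows, kept_column_indices).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # A constant column would cause division by zero and carries no information.
    kept = [j for j in range(len(columns)) if highs[j] != lows[j]]
    scaled = [[(row[j] - lows[j]) / (highs[j] - lows[j]) for j in kept]
              for row in rows]
    return scaled, kept


# Columns: main domain length, a constant column, a one-hot TLD column.
rows = [[6, 1, 0], [12, 1, 1], [9, 1, 0]]
scaled, kept = min_max_normalize(rows)
print(kept)    # prints: [0, 2]
print(scaled)  # prints: [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]
```

The minimum and maximum of each column would be learned on the training set and reused unchanged when normalizing a domain name under test.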

Referring to Fig. 8, in this embodiment, step S230 may further include the following sub-steps:

Step S400: perform feature dimensionality reduction on the feature data set to obtain a reduced feature data set;

In this embodiment, steps S210 and S220 convert the sample data into a feature data set in a high-dimensional space. Reducing its dimensionality greatly lowers the computational complexity, reduces the recognition errors caused by redundant information, and improves recognition precision.

In particular, the present invention uses the PCA dimensionality reduction method on the feature data set; this method can greatly reduce the number of feature dimensions while retaining most of the information.

Step S410: train a GBDT classifier algorithm on the reduced feature data set to build the domain name classifier model.

In this embodiment, the reduced feature data set obtained in step S400 is used to train a GBDT (Gradient Boosting Decision Tree) classifier algorithm; when training ends, the domain name classifier model is built. The model can recognize the features of a domain name under test, so that DGA domain names are discriminated quickly and effectively, and it can be used to make predictions for unknown domain names.
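Steps S400 and S410 can be sketched with scikit-learn, which is an assumed choice of library: the text names PCA and GBDT but no particular implementation. The feature rows below are synthetic placeholders generated only so that the example runs; they are not data from the patent. The number of principal components and all hyperparameters are likewise assumptions.

```python
import random

from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = random.Random(0)


def synthetic_row(is_dga):
    """Placeholder features: [length, 1-gram rank mean, log transition prob]."""
    if is_dga:
        return [rng.uniform(12, 24), rng.uniform(10, 20), rng.uniform(-60, -30)]
    return [rng.uniform(4, 12), rng.uniform(2, 10), rng.uniform(-30, -8)]


labels = [i % 2 for i in range(200)]  # 1 = DGA (positive), 0 = normal
features = [synthetic_row(y) for y in labels]

# Normalization, PCA reduction, then GBDT training, chained in one pipeline.
model = make_pipeline(
    MinMaxScaler(),
    PCA(n_components=2),  # assumption: keep two principal components
    GradientBoostingClassifier(random_state=0),
)
model.fit(features, labels)

# A domain name under test passes through the same three stages.
candidate = [[20.0, 18.0, -55.0]]
print(model.predict(candidate))
```

Keeping the scaler, the PCA projection, and the classifier in one pipeline means that a domain name under test is transformed with the parameters learned during training, which matches the requirement that detection reuse the same feature extraction, normalization, and dimensionality reduction.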

Referring to Fig. 9, in this embodiment, the following steps may further be included after step S230:

Step S500: perform feature extraction, feature normalization, and feature dimensionality reduction in turn on a domain name to be detected, to obtain reduced feature data to be detected;

In this embodiment, feature extraction, feature normalization, and feature dimensionality reduction may be performed in turn on the domain name to be detected using methods similar to steps S210, S220, and S400.

Step S510: load the feature data to be detected into the domain name classifier model and judge whether the domain name to be detected is a DGA domain name.

The machine-learning-based method for rapid discrimination of DGA domain names provided by this embodiment first extracts richer and more representative features through a study of domain names, then reduces the feature dimensionality with the principal component analysis (PCA) algorithm, which speeds up training and testing and thereby improves computational efficiency, and finally uses a machine learning algorithm to improve generalization ability as well as domain name discrimination accuracy.

Second embodiment

Referring to Fig. 10, this embodiment provides a machine-learning-based DGA domain name rapid discrimination device 600, comprising:

a training set building module 610, configured to build a training set containing multiple DGA domain names and normal domain names;

a feature extraction module 620, configured to extract the domain name features of each domain name in the training set;

a normalization module 630, configured to normalize the domain name features to obtain a feature data set;

a model building module 640, configured to build a domain name classifier model based on the feature data set.

In summary, the machine-learning-based method and device for rapid discrimination of DGA domain names provided by the embodiments of the invention first build a training set containing multiple DGA domain names and normal domain names, which supplies enough samples for subsequently building the domain name classifier model. They then extract the domain name features of each domain name in the training set, so that representative domain name features serve as the criterion for judging whether a domain name is a DGA domain name. Next, they normalize the domain name features to obtain a feature data set, which unifies the scale of the features and improves computational efficiency. Finally, they build a domain name classifier model based on the feature data set, so that the model obtained by machine learning can be used to check all kinds of unknown domain names and to judge quickly and accurately whether a domain name under test is a DGA domain name. Compared with the prior art, the method and device extract richer and more representative domain name features through a study of domain names; normalizing the feature data speeds up training and testing and thereby improves computational efficiency; and training a machine learning algorithm on the feature data set to obtain the domain name classifier model improves generalization ability as well as discrimination accuracy.

The foregoing describes only preferred embodiments of the invention and is not intended to limit it; for a person skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A machine-learning-based method for rapid discrimination of DGA domain names, characterized in that the method comprises:

building a training set containing multiple DGA domain names and normal domain names;

extracting the domain name features of each domain name in the training set;

normalizing the domain name features to obtain a feature data set;

building a domain name classifier model based on the feature data set.

2. The method according to claim 1, characterized in that extracting the domain name features of each domain name in the training set comprises:

extracting the length feature of each domain name in the training set;

extracting the n-gram features of each domain name in the training set;

extracting the transition probability feature of each domain name in the training set;

extracting the TLD suffix feature of each domain name in the training set.

3. The method according to claim 2, characterized in that extracting the length feature of each domain name in the training set comprises:

extracting the main domain length of each domain name in the training set, and using the main domain length of a given main domain as the length feature of that main domain.

4. The method according to claim 2, characterized in that extracting the n-gram features of each domain name in the training set comprises:

counting the frequency with which each run of n consecutive characters occurs across all main domains in the training set, and ranking the runs by frequency from high to low;

based on the frequency ranking of the runs of n consecutive characters across all main domains, computing the mean and variance of the frequency ranks of the runs of n consecutive characters that occur in a given main domain, and using that mean and variance as the n-gram feature of that main domain.

5. The method according to claim 2, characterized in that extracting the transition probability feature of each domain name in the training set comprises:

computing a Markov chain transition matrix from statistics over all main domains in the training set;

based on the Markov chain transition matrix, computing the transition probability of a given main domain, and using it as the transition probability feature of that main domain.

6. The method according to claim 2, characterized in that extracting the TLD suffix feature of each domain name in the training set comprises:

extracting all distinct TLD suffixes of the domain names in the training set and constructing a TLD vector;

for each sample TLD, taking the value 1 in the corresponding dimension of the TLD vector and 0 in the remaining dimensions, to obtain a TLD matrix;

based on the TLD matrix, obtaining the TLD suffix feature of a given domain name.

7. The method according to claim 1, characterized in that building a domain name classifier model based on the feature data set comprises:

performing feature dimensionality reduction on the feature data set to obtain a reduced feature data set;

training a GBDT classifier algorithm on the reduced feature data set to build the domain name classifier model.

8. The method according to claim 7, characterized in that performing feature dimensionality reduction on the feature data set to obtain a reduced feature data set comprises:

performing feature dimensionality reduction on the feature data set using the PCA dimensionality reduction method to obtain the reduced feature data set.

9. The method according to claim 1, characterized in that the method further comprises:

performing feature extraction, feature normalization, and feature dimensionality reduction in turn on a domain name to be detected, to obtain reduced feature data to be detected;

loading the feature data to be detected into the domain name classifier model and judging whether the domain name to be detected is a DGA domain name.

10. A machine-learning-based device for rapid discrimination of DGA domain names, characterized by comprising:

a training set building module, configured to build a training set containing multiple DGA domain names and normal domain names;

a feature extraction module, configured to extract the domain name features of each domain name in the training set;

a normalization module, configured to normalize the domain name features to obtain a feature data set;

a model building module, configured to build a domain name classifier model based on the feature data set.