CN107566389A - A method for identifying URL-imitating phishing domain names based on C4.5 decision trees - Google Patents

A method for identifying URL-imitating phishing domain names based on C4.5 decision trees

Info

Publication number
CN107566389A
Authority
CN
China
Prior art keywords
domain name
url link
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710843991.3A
Other languages
Chinese (zh)
Inventor
张永斌
姚强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji'nan Mutual Trust Software Co Ltd
Original Assignee
Ji'nan Mutual Trust Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ji'nan Mutual Trust Software Co Ltd filed Critical Ji'nan Mutual Trust Software Co Ltd
Priority to CN201710843991.3A priority Critical patent/CN107566389A/en
Publication of CN107566389A publication Critical patent/CN107566389A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for identifying URL-imitating phishing domain names based on C4.5 decision trees, comprising the following steps: S1, extract URL-imitating domain names and their features; S2, classify the URL-imitating domain names with the C4.5 algorithm and build a classification tree; S3, intercept domain names that match the type identified by the classification tree. The invention can extract the high-risk domain names among them and assess the security of such domain names in real time.

Description

A method for identifying URL-imitating phishing domain names based on C4.5 decision trees
Technical field
The present invention relates to the field of Internet technology, and in particular to a method for identifying URL-imitating phishing domain names based on C4.5 decision trees.
Background
Phishing is a form of electronic theft in which an attacker poses as a trustworthy entity in e-commerce to obtain sensitive information from unsuspecting users. With the spread of the Internet, the harm that phishing causes to Internet users has become increasingly common, and a large number of phishing websites exist on the network. The Anti-Phishing Working Group (APWG) found 1,220,523 phishing attacks in the fourth quarter of 2016 [1]. The Anti-Phishing Alliance of China (APAC) found 4,958 phishing websites in the first quarter of 2017 [2]. The phishing situation is severe and seriously affects the network environment. Research finds that many phishing domain names share an obvious characteristic, for example: www.paypal.com.signin.country.en.locale.en.diamondzapper.com. Users who lack networking knowledge easily mistake such a domain name for a URL link. This document calls such domain names URL-imitating domain names. Because such domain names are highly deceptive to users, rapidly assessing their security is important for improving the user's online experience and cleaning up the network.
Summary of the invention
The invention provides a method for identifying URL-imitating phishing domain names based on C4.5 decision trees, which extracts the high-risk domain names among URL-imitating domain names and assesses the security of such domain names in real time.
To solve the above technical problem, an embodiment of the present application provides a method for identifying URL-imitating phishing domain names based on C4.5 decision trees, comprising the following steps:
S1, extract URL-imitating domain names and their features;
S2, classify the URL-imitating domain names with the C4.5 algorithm and build a classification tree;
S3, intercept domain names that match the type identified by the classification tree.
As a preferred technical scheme of the invention, URL-imitating domain names have the following features:
1) the domain name has many levels and is long;
2) characters in the domain name switch type frequently, and the maximum run of consecutive letters or of consecutive digits is short;
3) the domain name contains many hyphens;
4) the domain name contains a brand name, and the brand name is in a prominent position;
5) the level of the longest subdomain is high.
As a preferred technical scheme of the invention, the classification tree is built as follows:
Step 1: Preprocess the sample data and normalize the data format to form the training set for the decision tree;
Step 2: Compute the information gain ratio of each attribute;
Suppose the training sample set is S and the training samples fall into k classes, C = {C_1, C_2, ..., C_k}, and let p(S_i) denote the proportion of samples that belong to class C_i. The information entropy of S is then given by formula (1),
$$I(S) = -\sum_{i=1}^{k} p(S_i)\,\log_2 p(S_i) \tag{1}$$
Suppose the attribute set is A = {A_1, A_2, ..., A_m}, attribute A_j is selected as the test attribute for splitting the samples, and Values(A_j) is the set of values that A_j can take. The information gain of A_j is then given by formula (2),
$$Gain(S, A_j) = I(S) - \sum_{v \in Values(A_j)} \frac{|S_v|}{|S|}\, I(S_v) \tag{2}$$
where |S| is the number of elements in the sample set and |S_v| is the number of elements of S for which attribute A_j has the value v. The breadth and uniformity with which A_j splits S (the split information) is given by formula (3),
$$SplitInfo(S, A_j) = -\sum_{v \in Values(A_j)} \frac{|S_v|}{|S|}\,\log_2 \frac{|S_v|}{|S|} \tag{3}$$
The information gain ratio of A_j then follows from the information gain and the split information, as shown in formula (4),
$$Ratio(S, A_j) = \frac{Gain(S, A_j)}{SplitInfo(S, A_j)} \tag{4}$$
Step 3: Build the decision-tree model
Select the attribute with the highest information gain ratio (for example, the maximum subdomain level) as the root node of the decision tree. Among the remaining candidate attributes, select the attribute with the highest information gain ratio as the branching node, and recurse to form the decision-tree model.
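As a worked illustration of the entropy, information gain, split information, and gain ratio defined in Step 2, consider a hypothetical training set (the counts are assumed for illustration and are not taken from the patent's data) of 8 domain names, 4 phishing and 4 legitimate, and a binary attribute A_j that splits it into two groups of 4: one holding 3 phishing and 1 legitimate domain name, the other 1 phishing and 3 legitimate. Under these assumed counts the quantities work out as:

$$
\begin{aligned}
I(S) &= -\tfrac{4}{8}\log_2\tfrac{4}{8} - \tfrac{4}{8}\log_2\tfrac{4}{8} = 1 \\
I(S_v) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811 \quad \text{(the same for both groups)} \\
Gain(S, A_j) &= 1 - \left(\tfrac{4}{8}\cdot 0.811 + \tfrac{4}{8}\cdot 0.811\right) \approx 0.189 \\
SplitInfo(S, A_j) &= -\tfrac{4}{8}\log_2\tfrac{4}{8} - \tfrac{4}{8}\log_2\tfrac{4}{8} = 1 \\
Ratio(S, A_j) &= \frac{0.189}{1} \approx 0.189
\end{aligned}
$$

An attribute whose two groups each held 2 phishing and 2 legitimate domain names would have zero gain and therefore a zero gain ratio, so Step 3 would prefer the attribute above.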
The one or more technical schemes provided in the embodiments of the present application have at least the following technical effect or advantage:
The high-risk domain names among URL-imitating domain names can be extracted, and the security of such domain names can be assessed in real time.
Brief description of the drawings
To illustrate the embodiments of the invention or the prior-art technical schemes more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow diagram of an embodiment of the present application;
Fig. 2 shows the distribution of brand-name position prominence in domain names for an embodiment of the present application;
Fig. 3 shows the distribution of the maximum consecutive-letter length of domain names for an embodiment of the present application;
Fig. 4 shows the maximum consecutive-digit length for an embodiment of the present application;
Fig. 5 shows the longest-subdomain prominence for an embodiment of the present application.
Detailed description
The invention provides a method for identifying URL-imitating phishing domain names based on C4.5 decision trees, which extracts the high-risk domain names among URL-imitating domain names and assesses the security of such domain names in real time.
For a better understanding of the above technical scheme, it is described in detail below with reference to the drawings and a specific embodiment.
The method of this embodiment for identifying URL-imitating phishing domain names based on C4.5 decision trees comprises the following steps:
S1, extract URL-imitating domain names and their features;
S2, classify the URL-imitating domain names with the C4.5 algorithm and build a classification tree;
S3, intercept domain names that match the type identified by the classification tree.
In this embodiment, URL-imitating domain names have the following features:
1) The domain name has many levels and is long. Legitimate domain names, by contrast, are usually short and have few levels so that users can remember them.
2) Characters in the domain name switch type frequently, and the maximum run of consecutive letters or of consecutive digits is short. Legitimate domain names are usually named by people and, for ease of memory, tend to use consecutive letters or digits, so their character switching frequency is low.
3) The domain name contains many hyphens. The character structure of legitimate domain names is simpler, and they contain few hyphens.
4) The domain name contains a brand name, and the brand name is in a prominent position. Legitimate domain names that are not well known rarely contain a brand name. To raise the probability that users visit, phishers use a brand name as a subdomain and place it in a prominent position. In addition, some phishers nest a well-known domain name inside the domain name to make it more deceptive.
5) The level of the longest subdomain is high. So that users cannot easily find the real main domain, the level of the longest subdomain of a phishing domain name is usually relatively low, whereas legitimate domain names do not have this feature.
In this embodiment, the classification tree is built as follows:
Step 1: Preprocess the sample data and normalize the data format to form the training set for the decision tree;
Step 2: Compute the information gain ratio of each attribute;
Suppose the training sample set is S and the training samples fall into k classes, C = {C_1, C_2, ..., C_k}, and let p(S_i) denote the proportion of samples that belong to class C_i. The information entropy of S is then given by formula (1),
$$I(S) = -\sum_{i=1}^{k} p(S_i)\,\log_2 p(S_i) \tag{1}$$
Suppose the attribute set is A = {A_1, A_2, ..., A_m}, attribute A_j is selected as the test attribute for splitting the samples, and Values(A_j) is the set of values that A_j can take. The information gain of A_j is then given by formula (2),
$$Gain(S, A_j) = I(S) - \sum_{v \in Values(A_j)} \frac{|S_v|}{|S|}\, I(S_v) \tag{2}$$
where |S| is the number of elements in the sample set and |S_v| is the number of elements of S for which attribute A_j has the value v. The breadth and uniformity with which A_j splits S (the split information) is given by formula (3),
$$SplitInfo(S, A_j) = -\sum_{v \in Values(A_j)} \frac{|S_v|}{|S|}\,\log_2 \frac{|S_v|}{|S|} \tag{3}$$
The information gain ratio of A_j then follows from the information gain and the split information, as shown in formula (4),
$$Ratio(S, A_j) = \frac{Gain(S, A_j)}{SplitInfo(S, A_j)} \tag{4}$$
Step 3: Build the decision-tree model
Select the attribute with the highest information gain ratio (for example, the maximum subdomain level) as the root node of the decision tree. Among the remaining candidate attributes, select the attribute with the highest information gain ratio as the branching node, and recurse to form the decision-tree model.
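Steps 2 and 3 can be sketched in a few lines of Python. The sketch below assumes attributes that have already been discretized (full C4.5 also chooses thresholds for continuous attributes, which is omitted here); the function names and the toy rows are hypothetical and are not the patent's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information entropy I(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain(S, A_j) / SplitInfo(S, A_j) for one discrete attribute."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    # An attribute with a single value cannot split the set; treat its ratio as 0.
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Attribute with the highest gain ratio, i.e. the next node to split on."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Toy data (assumed): 4 phishing and 4 legitimate names, two boolean attributes.
labels = ["phishing"] * 4 + ["legit"] * 4
rows = [
    {"many_levels": True, "has_hyphen": True},
    {"many_levels": True, "has_hyphen": False},
    {"many_levels": True, "has_hyphen": True},
    {"many_levels": False, "has_hyphen": False},
    {"many_levels": False, "has_hyphen": True},
    {"many_levels": False, "has_hyphen": False},
    {"many_levels": False, "has_hyphen": True},
    {"many_levels": True, "has_hyphen": False},
]
root = best_attribute(rows, labels, ["many_levels", "has_hyphen"])
```

Building the full tree would apply `best_attribute` recursively to each group until a group is pure or no attributes remain.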
While the decision tree is being created, noise and outliers in the data can produce anomalous branches that merely fit the training set. Pruning is then needed to handle this overfitting: unreliable branches are cut off using statistical measures, so that the pruned decision tree classifies the data under test faster and better.
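The patent does not say which statistical measure it uses for pruning. One common textbook description of C4.5-style pruning compares a pessimistic (upper-bound) error estimate for a subtree with the estimate obtained if the subtree were collapsed into a single leaf. The sketch below uses the normal-approximation form of that bound with z = 0.69 (roughly a 25% confidence level); the formula choice, the parameter, and the function names are assumptions for illustration.

```python
from math import sqrt


def pessimistic_error(errors, n, z=0.69):
    """Upper-bound estimate of the true error rate of a leaf that
    misclassifies `errors` of its `n` training samples (normal approximation)."""
    f = errors / n
    numerator = f + z * z / (2 * n) + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)


def should_prune(parent_errors, leaves, z=0.69):
    """Decide whether to replace a subtree by a single leaf.

    parent_errors: training errors if the whole subtree became one leaf.
    leaves: list of (errors, n) pairs, one per leaf of the subtree.
    """
    total_n = sum(n for _, n in leaves)
    # Weighted average of the leaves' pessimistic error estimates.
    subtree_estimate = sum(n * pessimistic_error(e, n, z) for e, n in leaves) / total_n
    leaf_estimate = pessimistic_error(parent_errors, total_n, z)
    return leaf_estimate <= subtree_estimate


# Small leaves with noisy counts: collapsing them is estimated to be no worse.
prune_noisy = should_prune(5, [(2, 6), (1, 2), (2, 6)])
# A clean split into two pure leaves is kept.
prune_clean = should_prune(7, [(0, 7), (0, 7)])
```

With these assumed counts the first subtree is pruned and the second is kept, which reflects the idea that branches supported by few, noisy samples are the unreliable ones.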
Test results and analysis
Data source
A large number of known phishing domain names were collected from sites such as Phishtank, Openphish [15], and Watcherlab. The domain-name screening conditions described here were used to extract the set of URL-imitating domain names, from which 2,008 domain names with obvious features were extracted as negative samples.
Most domain names on the Internet are legitimate and phishing domain names are comparatively rare; the volume of domain-name data is very large and cannot be labeled manually. This experiment therefore collected access data from an education network, filtered out the phishing domain names in the data set, and extracted 171,834 URL-imitating domain names from it as positive samples.
Classification performance evaluation
Test feature analysis
Statistical analysis was carried out on the 2,008 labeled URL-imitating phishing domain names; part of the results are shown in Figs. 2, 3, and 4. The analysis finds that brand-name prominence, maximum consecutive-letter length, maximum consecutive-digit length, and longest-subdomain prominence discriminate well when detecting phishing domain names.
As Fig. 2 shows, URL-imitating phishing domain names often contain brand names (about 36% of them), and the brand name occupies a fairly prominent position in the domain name. Among legitimate domain names, about 93% contain no brand name, and the position prominence of the brand name is low.
As Fig. 3 shows, the maximum consecutive-letter length is below 20 for about 56% of secure domain names and for about 94% of URL-imitating phishing domain names. The maximum consecutive-letter length of URL-imitating phishing domain names is therefore comparatively low, and that of secure domain names comparatively high.
As Fig. 4 shows, about 65% of URL-imitating phishing domain names contain no consecutive digits, whereas only about 13% of legitimate domain names contain none. The maximum consecutive-digit length of URL-imitating phishing domain names is comparatively low, and that of secure domain names comparatively high.
As Fig. 5 shows, URL-imitating phishing domain names with a longest-subdomain prominence below 0.67 make up about 21% of all such domain names, whereas 45% of legitimate domain names fall within that range. The longest subdomain of URL-imitating phishing domain names is more prominent, and the mean for legitimate domain names is lower than that for phishing domain names.
Classifier performance evaluation
1,041 domain names were drawn at random from each of the phishing domain-name set and the secure domain-name set, to serve as the negative and positive samples of the training set, respectively. The data set was classified with a C4.5 decision-tree classifier and validated with ten-fold cross-validation; the results are shown in Table 1.
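Ten-fold cross-validation partitions the 2,082 training samples (1,041 per class) into ten folds and, in turn, holds each fold out for testing while training on the other nine. The following is a minimal sketch of the index bookkeeping only; the function name and seed are hypothetical, the split is not stratified by class, and the patent does not state which tool performed the validation.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # fixed seed for a reproducible shuffle
    folds = [idx[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 1,041 phishing + 1,041 secure domain names, as in the experiment above.
splits = list(k_fold_indices(2 * 1041, k=10))
```

Each sample appears in exactly one test fold, so averaging the ten per-fold accuracies uses every labeled domain name once for evaluation.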
Table 1. Classification results on the URL-imitating domain-name training set
As Table 1 shows, the classifier's recognition accuracy reaches 91.80% for phishing domain names and 96.80% for secure domain names, so the classifier can effectively extract the high-risk domain names among URL-imitating domain names. Analysis of the misreports in the experiment shows that a small number of phishing domain names were misreported as secure, for example Wp-secured-accout.com, because their phishing features are not pronounced. Some secure domain names were misreported as phishing, for example a Kingsoft Cloud domain name, bd7316f02e7e46499eda436584d213dc.trace-ldns.ksyun.com: its fourth-level label is a random string, which resembles some URL-imitating phishing domain names and causes the misreport.
Classification results of the classifier
Because phishing domain names make up a very small share of real network traffic, the experiment used far more secure domain names than phishing domain names, simulating a real network detection scenario so as to reflect how the model's classifier performs on a real network. The experiment used 30,000 secure domain names and 967 phishing domain names. The classification results are shown in Table 2.
Table 2. Classification results for URL-imitating domain names
As Table 2 shows, 1.00% of the secure domain names and 2.70% of the phishing domain names were misreported. Analysis of the misreports shows that the misreported secure domain names are mainly content delivery network (CDN) domain names and proxy-software domain names. For example, 1445516683-state-connected.D4EE071C9C86.1445535542.cc.hiwifi.com is a HiWiFi router domain name: when a connection is requested, the information of the site to be connected is converted into random characters, and because its character structure resembles that of URL-imitating phishing domain names, it is misreported. 128a5743c1148cd503b9ced8e549480b.google.com.dnsbl7.mailshell.net is a secure domain name that the network-security service company Mailshell uses to check data messages; because a subdomain contains the brand name Google, the brand name is in a prominent position, and letters and digits switch frequently, the domain name is misjudged as a phishing domain name. A small number of phishing domain names were misreported as secure; analysis shows that this is because their features are not pronounced.
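Per-class misreport rates of the kind quoted above are the share of each true class that the classifier labels as the other class. The sketch below shows the calculation on a small hypothetical label list; the function name, class names, and toy predictions are assumptions, and the numbers are not the patent's results.

```python
def misreport_rates(y_true, y_pred, classes=("secure", "phishing")):
    """Fraction of samples of each true class that were assigned another class."""
    rates = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        wrong = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        rates[c] = wrong / total if total else 0.0
    return rates


# Toy example (assumed): 10 secure and 5 phishing names, one error in each class.
y_true = ["secure"] * 10 + ["phishing"] * 5
y_pred = ["secure"] * 9 + ["phishing"] + ["phishing"] * 4 + ["secure"]
rates = misreport_rates(y_true, y_pred)
```

Reporting the two rates separately matters when the classes are as imbalanced as in this experiment, because overall accuracy would be dominated by the much larger secure class.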
Analysis and discussion of experimental results
In summary, the C4.5-decision-tree-based identification model for URL-imitating phishing domain names detects phishing domain names effectively. The experiment still has some misreports and missed detections, however: many proxy-software domain names and CDN domain names of well-known websites are misreported as phishing domain names. In later work these domain names can be compiled into a list and added to a whitelist to filter out secure domain names. The domain names with inconspicuous features that are easily missed will be studied further to mine more effective feature information and raise the detection rate for phishing domain names.
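The whitelist step proposed here could be as simple as a suffix match on registered domains, applied before the classifier's verdict is acted on. This is a minimal sketch under that assumption; the function name is hypothetical, and the whitelist entries are merely the suffixes of the misreported secure domain names mentioned in the experiment, not a list given by the patent.

```python
def is_whitelisted(domain, whitelist):
    """True if `domain` equals a whitelisted domain or is a subdomain of one."""
    domain = domain.lower().rstrip(".")
    return any(domain == w or domain.endswith("." + w) for w in whitelist)


# Assumed whitelist built from the false positives discussed above.
WHITELIST = {"ksyun.com", "hiwifi.com", "mailshell.net"}

cdn_ok = is_whitelisted(
    "bd7316f02e7e46499eda436584d213dc.trace-ldns.ksyun.com", WHITELIST
)
phish_ok = is_whitelisted(
    "www.paypal.com.signin.country.en.locale.en.diamondzapper.com", WHITELIST
)
```

Matching on a leading dot plus the suffix avoids treating an unrelated name such as `notksyun.com` as whitelisted.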
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Although the invention is disclosed above by way of a preferred embodiment, that embodiment is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical scheme of the invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent change, or refinement made to the above embodiment according to the technical essence of the invention, without departing from the content of its technical scheme, still falls within the scope of the technical scheme of the invention.

Claims (3)

1. A method for identifying URL-imitating phishing domain names based on C4.5 decision trees, characterized in that it comprises the following steps:
S1, extract URL-imitating domain names and their features;
S2, classify the URL-imitating domain names with the C4.5 algorithm and build a classification tree;
S3, intercept domain names that match the type identified by the classification tree.
2. The method for identifying URL-imitating phishing domain names based on C4.5 decision trees according to claim 1, characterized in that URL-imitating domain names have the following features:
1) the domain name has many levels and is long;
2) characters in the domain name switch type frequently, and the maximum run of consecutive letters or of consecutive digits is short;
3) the domain name contains many hyphens;
4) the domain name contains a brand name, and the brand name is in a prominent position;
5) the level of the longest subdomain is high.
3. The method for identifying URL-imitating phishing domain names based on C4.5 decision trees according to claim 1, characterized in that the classification tree is built as follows:
Step 1: Preprocess the sample data and normalize the data format to form the training set for the decision tree;
Step 2: Compute the information gain ratio of each attribute;
Suppose the training sample set is S and the training samples fall into k classes, C = {C_1, C_2, ..., C_k}, and let p(S_i) denote the proportion of samples that belong to class C_i. The information entropy of S is then given by formula (1),
$$I(S) = -\sum_{i=1}^{k} p(S_i)\,\log_2 p(S_i) \tag{1}$$
Suppose the attribute set is A = {A_1, A_2, ..., A_m}, attribute A_j is selected as the test attribute for splitting the samples, and Values(A_j) is the set of values that A_j can take. The information gain of A_j is then given by formula (2),
$$Gain(S, A_j) = I(S) - \sum_{v \in Values(A_j)} \frac{|S_v|}{|S|}\, I(S_v) \tag{2}$$
where |S| is the number of elements in the sample set and |S_v| is the number of elements of S for which attribute A_j has the value v. The breadth and uniformity with which A_j splits S (the split information) is given by formula (3),
$$SplitInfo(S, A_j) = -\sum_{v \in Values(A_j)} \frac{|S_v|}{|S|}\,\log_2 \frac{|S_v|}{|S|} \tag{3}$$
The information gain ratio of A_j then follows from the information gain and the split information, as shown in formula (4),
$$Ratio(S, A_j) = \frac{Gain(S, A_j)}{SplitInfo(S, A_j)} \tag{4}$$
Step 3: Build the decision-tree model
Select the attribute with the highest information gain ratio (for example, the maximum subdomain level) as the root node of the decision tree; among the remaining candidate attributes, select the attribute with the highest information gain ratio as the branching node, and recurse to form the decision-tree model.
CN201710843991.3A 2017-09-19 2017-09-19 A method for identifying URL-imitating phishing domain names based on C4.5 decision trees Pending CN107566389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710843991.3A CN107566389A (en) 2017-09-19 2017-09-19 A method for identifying URL-imitating phishing domain names based on C4.5 decision trees

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710843991.3A CN107566389A (en) 2017-09-19 2017-09-19 A method for identifying URL-imitating phishing domain names based on C4.5 decision trees

Publications (1)

Publication Number Publication Date
CN107566389A true CN107566389A (en) 2018-01-09

Family

ID=60980150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710843991.3A Pending CN107566389A (en) 2017-09-19 2017-09-19 A method for identifying URL-imitating phishing domain names based on C4.5 decision trees

Country Status (1)

Country Link
CN (1) CN107566389A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109450880A (en) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 Detection method for phishing site, device and computer equipment based on decision tree
CN111049816A (en) * 2019-12-04 2020-04-21 北京奇虎科技有限公司 Method and device for filtering domain name address and computer readable storage medium
CN111209683A (en) * 2020-01-15 2020-05-29 山东超越数控电子股份有限公司 Method for constructing recognition model of working state of aircraft engine, model and recognition method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739679A (en) * 2012-06-29 2012-10-17 东南大学 URL(Uniform Resource Locator) classification-based phishing website detection method
CN102790762A (en) * 2012-06-18 2012-11-21 东南大学 Phishing website detection method based on uniform resource locator (URL) classification
CN104077396A (en) * 2014-07-01 2014-10-01 清华大学深圳研究生院 Method and device for detecting phishing website
CN105357221A (en) * 2015-12-04 2016-02-24 北京奇虎科技有限公司 Method and apparatus for identifying phishing website
CN106096748A (en) * 2016-04-28 2016-11-09 武汉宝钢华中贸易有限公司 Entrucking forecast model in man-hour based on cluster analysis and decision Tree algorithms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DARLING M , HEILEMAN G , GRESSEL G , ET AL: "A lexical approach for classifying malicious URLs", 《IEEE》 *

Similar Documents

Publication Publication Date Title
CN109510815A (en) A kind of multistage detection method for phishing site and detection system based on supervised learning
CN103544436B (en) System and method for distinguishing phishing websites
CN104040963B (en) The system and method for carrying out spam detection for the frequency spectrum using character string
CN105224600B (en) A kind of detection method and device of Sample Similarity
CN108833139B (en) OSSEC alarm data aggregation method based on category attribute division
CN107566389A (en) A method for identifying URL-imitating phishing domain names based on C4.5 decision trees
CN102790762A (en) Phishing website detection method based on uniform resource locator (URL) classification
CN107438083B (en) Detection method for phishing site and its detection system under a kind of Android environment
CN109873810A (en) A kind of phishing detection method based on cup ascidian group's algorithm support vector machines
WO2020082763A1 (en) Decision trees-based method and apparatus for detecting phishing website, and computer device
CN110493262A (en) It is a kind of to improve the network attack detecting method classified and system
CN108337255A (en) A kind of detection method for phishing site learnt based on web automatic tests and width
CN107046586A (en) A kind of algorithm generation domain name detection method based on natural language feature
CN106960040A (en) A kind of URL classification determines method and device
CN109359137A (en) Based on user's growth of Feature Selection and semi-supervised learning portrait construction method
CN109391584A (en) A kind of recognition methods of doubtful malicious websites and device
Nakamura et al. A multifaceted approach to analyzing taxonomic, functional, and phylogenetic β diversity
CN108734159A (en) The detection method and system of sensitive information in a kind of image
CN104933365B (en) A kind of malicious code based on calling custom automates homologous decision method and system
CN107612911A (en) Method based on the infected main frame of DNS flow detections and C&C servers
Leão et al. Evolutionary patterns in the geographic range size of Atlantic Forest plants
CN113052577A (en) Method and system for estimating category of virtual address of block chain digital currency
CN113438209B (en) Phishing website detection method based on improved Stacking strategy
Wu et al. Extracting link spam using biased random walks from spam seed sets
CN104750828A (en) Induction and deduction knowledge unconsciousness seal-learning method based on 6w rule

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180109
