CN105338001A - Method and device for recognizing phishing website - Google Patents

Publication number
CN105338001A
Authority
CN
China
Prior art keywords
page
feature vector
word segment
phishing website
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510886094.1A
Other languages
Chinese (zh)
Inventor
李晓波
尹露
杨晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201510886094.1A
Publication of CN105338001A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for identifying phishing websites and relates to the field of Internet security. The invention addresses the low accuracy of existing phishing website identification. The method comprises: obtaining a page of a known phishing website as a sample page; extracting feature vectors of the page from the sample page; training a detection model with the feature vectors; and using the detection model to detect an unknown page, obtaining a detection result indicating whether the unknown page is a phishing website page. The method and device are mainly applied where a third-party security organization provides network security services to Internet users.

Description

Method and device for identifying a phishing website
Technical field
The present invention relates to the field of Internet security, and in particular to a method and a device for identifying phishing websites.
Background
A phishing website is typically a website that disguises itself as a bank or e-commerce site in order to steal the personal information that users submit. Attackers counterfeit the Uniform Resource Locator (URL) and page content of a genuine site by various means and lure users into visiting the counterfeit pages, thereby obtaining the account numbers, passwords, and other personal information that the users enter. The emergence of phishing websites has seriously hindered the development of online financial services and undermined public confidence in the Internet. Effectively identifying phishing websites has therefore become an important task in the field of Internet security.
The existing way of identifying a phishing website is as follows: the domain registration information or certificate information of the target website is queried from a third-party domain registration site, and that information is examined to judge whether the target website is a phishing website. For example, when the domain registration time of the target website is very close to the time of the query, the target website was registered recently, which is one characteristic of phishing websites. Likewise, when a domain name has expired and has not been renewed, the target website is more likely to be a phishing website.
The existing approach relies mainly on domain information from the domain registrar as the basis for identification, but domain information cannot directly reflect the page characteristics of a phishing website, such as imitating the page style of other websites or displaying fraudulent information. The existing approach can only summarize regularities in the domain information of phishing websites and identify phishing websites by those regularities, so its accuracy is low. For example, a legitimate website may also have been registered and launched recently, so a recent registration time alone does not justify classifying it as a phishing website. Similarly, the operator of a legitimate website may forget to renew the domain after it expires (the domain can, of course, be redeemed within a certain period), and classifying a website as a phishing website merely because its domain has expired is clearly inappropriate.
Summary of the invention
The present invention provides a method and a device for identifying phishing websites that address the low accuracy of phishing website identification.
To solve the above problem, in one aspect the invention provides a method for identifying a phishing website, the method comprising:
Obtaining a page of a known phishing website as a sample page;
Extracting feature vectors of the page from the sample page;
Training a detection model with the feature vectors of the page;
Using the detection model to detect an unknown page, and obtaining a detection result indicating whether the unknown page is a phishing website page.
In another aspect, the invention provides a device for identifying a phishing website, the device comprising:
An acquiring unit, configured to obtain a page of a known phishing website as a sample page;
An extraction unit, configured to extract feature vectors of the page from the sample page;
A training unit, configured to train a detection model with the feature vectors of the page;
A detecting unit, configured to use the detection model to detect an unknown page and obtain a detection result indicating whether the unknown page is a phishing website page.
With the method and device provided by the invention, pages of known phishing websites serve as sample pages, and a detection model is trained with the feature vectors in those pages. The detection model is then used to detect an unknown page; if the unknown page has feature vectors identical or similar to those of a sample page, the unknown page can be determined to be a phishing website page. Compared with the prior art, the invention uses page feature vectors, which directly reflect the page characteristics of phishing websites, as the basis for identification, and lets the detection model learn from the feature vectors of a large number of phishing pages so as to capture the characteristics of the various kinds of phishing pages as comprehensively as possible. The accuracy of identifying phishing websites can therefore be improved.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a method for identifying a phishing website provided by an embodiment of the invention;
Fig. 2 shows a flow chart of another method for identifying a phishing website provided by an embodiment of the invention;
Fig. 3 shows a block diagram of a device for identifying a phishing website provided by an embodiment of the invention;
Fig. 4 shows a block diagram of another device for identifying a phishing website provided by an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the invention provides a method for identifying a phishing website. As shown in Fig. 1, the method comprises:
101. Obtain a page of a known phishing website as a sample page.
The embodiment can be implemented by a third-party security server or by a monitoring client. The physical form of the monitoring client includes, but is not limited to, a mobile phone, a personal computer (PC), a tablet computer, or a wearable electronic device. For ease of description, the embodiment is described below using a security server as an example.
In this embodiment, the training data for the detection model consists of web pages of known phishing websites. The server can collect such pages by existing means, including but not limited to: 1. requesting the page by its URL; 2. receiving reports from network users; 3. obtaining pages from a third-party monitoring organization; 4. collecting pages with a web crawler. In practice, the training data can be supplied all at once or updated incrementally over time; this embodiment does not limit the amount of training data.
102. Extract feature vectors of the page from the sample page.
Pages of phishing websites (hereinafter "phishing pages") differ from pages of legitimate websites in content and in structure. For example, the HyperText Markup Language (HTML) of a phishing page often contains nested title tags, whereas legitimate pages do not; or the body text of a phishing page contains prize announcements that imitate a bank or e-commerce website. In this embodiment, information that reflects such characteristics of phishing websites is collectively called a feature vector. The server extracts these feature vectors from phishing pages and uses them to train the detection model, so that the model acquires criteria for identifying phishing websites and can subsequently identify phishing pages effectively. A feature vector may come from the HTML source code of the page or from the text and images presented in the page; this embodiment does not limit this.
In practice, the greater the number and variety of sample pages, the higher the recognition accuracy of the detection model.
103. Train the detection model with the feature vectors of the page.
The feature vectors extracted from the sample pages are used to train the detection model. In this embodiment, the detection model can be trained by machine learning. The learning mode may be supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and so on. The learning algorithm may be a regression algorithm, an instance-based algorithm, a regularization algorithm, a decision tree algorithm, a Bayesian algorithm, a kernel-based algorithm, a clustering algorithm, a classification algorithm, an association rule algorithm, an artificial neural network, deep learning, a dimensionality reduction algorithm, an ensemble algorithm, and so on. This embodiment does not limit the way the detection model is trained.
104. Use the detection model to detect an unknown page, and obtain a detection result indicating whether the unknown page is a phishing website page.
Once the detection model has been obtained, it can be used to detect unknown pages. An unknown page is a page of the target website under inspection. The purpose of detecting an unknown page is to judge whether it is a page of a phishing website.
In one implementation of this embodiment, the detection result need not be a qualitative conclusion on whether the unknown page is a phishing page; it may instead give the probability that the unknown page is a phishing page. This embodiment does not limit the form of the detection result: any information that is produced by feature-vector-based detection and can guide users in screening out phishing websites falls within the scope of the detection result.
To detect an unknown page, feature vectors must be extracted from it. These feature vectors are the information that corresponds, in structure or content, to the feature vectors of the sample pages, but the two are not necessarily identical in structure or content. The reason is that an unknown page may be either a phishing page or a legitimate page. If it is a phishing page, its feature vectors may be identical to those of one or more sample pages; if it is a legitimate page, its feature vectors differ from those of the sample pages. When the feature vectors of an unknown page are extracted, their nature need not be determined, and indeed cannot be; it suffices to extract the content at the corresponding positions in the unknown page in the same way as for the sample pages.
After the feature vectors of the unknown page have been obtained, the detection model examines them. If they satisfy the detection criteria or detection rules, the page is determined to be a phishing page; otherwise it is determined to be a safe page. In practice, the probability that the unknown page is a phishing page can also be given according to how closely the feature vectors match the detection criteria or rules.
In this embodiment, sample pages and unknown pages can also be fed directly into the detection model, without a separate feature vector extraction step. In practice, the detection model can be given the storage path of a page, or a page supplied from outside can be sent to the detection model through a dedicated human-computer interface; this embodiment does not limit this.
With the method provided by this embodiment, pages of known phishing websites serve as sample pages, and a detection model is trained with the feature vectors in those pages. The detection model is then used to detect an unknown page; if the unknown page has feature vectors identical or similar to those of a sample page, the unknown page can be determined to be a phishing website page. Compared with the prior art, the embodiment uses page feature vectors, which directly reflect the page characteristics of phishing websites, as the basis for identification, and lets the detection model learn from the feature vectors of a large number of phishing pages so as to capture the characteristics of the various kinds of phishing pages as comprehensively as possible. The accuracy of identifying phishing websites can therefore be improved.
Further, as a refinement and extension of the method shown in Fig. 1, an embodiment of the invention provides another method for identifying a phishing website. As shown in Fig. 2, the method comprises:
201. Obtain a page of a known phishing website as a sample page.
This step is implemented in the same way as step 101 of Fig. 1 and is not described again here.
Further, in one option of this embodiment, a snapshot page of the phishing website can be obtained and used as the sample page. In practice, some phishing websites have a defense mechanism against third-party security detection sites. When a third party requests a page by URL, this mechanism checks the host address of the requester, for example its Internet Protocol (IP) address. If the host address is found to belong to a security detection site, the phishing website blocks that address, so that the security detection site cannot obtain the pages of the phishing website. In this scheme, the server can instead request a snapshot page of the phishing website from a search engine server. A third-party search engine usually caches the main information of a page (called snapshot information) when the page is first accessed, and this information can equally be used as page information. Because a third-party search engine does not block the host addresses of security detection sites, the server can send the URL of the phishing website to the search engine and obtain the snapshot page of the phishing website from the search engine server.
Further, in step 205 of Fig. 2, if the server's request for the unknown page fails, the server can likewise request the corresponding snapshot page from the search engine and use it instead.
202. Extract feature vectors of the page from the sample page.
In this embodiment, the server can obtain two types of feature vector: the first is obtained from the HTML source code of the sample page, and the second from the body text of the page. In practice, the server can obtain only one of these types or both; this embodiment does not limit this. Note, however, that, as stated above, the feature vectors extracted from an unknown page must be of the same type as those extracted from the sample pages. If the detection model is trained with feature vectors extracted from both source code and body text, then step 205 must examine the feature vectors in both the source code and the body text of the unknown page.
The ways of extracting feature vectors from the HTML source code and from the page body text are described in turn below:
1. Extracting feature vectors from the HTML source code of the sample page
This approach has three different implementations. One of them can be used alone, or any two or more can be combined.
1.1. Extract the number of preset tags from the HTML source code as a feature vector
The HTML source code of a page contains various tags, and the server can count these tags to obtain feature vectors of the phishing website. Take the "title" tag as an example. To defend against security detection sites, a phishing website usually nests title tags: an empty title tag is set up, and the real title tag is placed inside it. This does not occur in the source code of legitimate websites, which has only one level of title tag. The server can therefore extract the number of title tags and use it as a feature vector of the sample page.
In practice, the tags that the server can count include, but are not limited to: <title>, <body>, <h>, <p>, <a>, <img>, and so on.
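The tag counting described in 1.1 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `count_preset_tags`, the regular-expression approach to finding opening tags, and the reading of <h> as the heading tags h1 to h6 are all assumptions.

```python
import re
from collections import Counter

# Preset tags taken from the list in the text; "h" stands for h1..h6 (an assumption).
PRESET_TAGS = ("title", "body", "h", "p", "a", "img")

# Matches the name of an opening tag; closing tags such as </title> are skipped
# because "/" follows "<". A simplification, not a full HTML parser.
OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b")


def count_preset_tags(html_source):
    """Return the number of opening tags for each preset tag, in PRESET_TAGS order."""
    counts = Counter()
    for name in OPEN_TAG.findall(html_source):
        name = name.lower()
        if re.fullmatch(r"h[1-6]", name):
            name = "h"  # fold all heading levels into one counter
        if name in PRESET_TAGS:
            counts[name] += 1
    return [counts[tag] for tag in PRESET_TAGS]
```

A page with nested title tags then yields a title count of 2, where a page with a single title tag yields 1, which is the difference the text describes.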
1.2. Extract preset keywords from the HTML source code as feature vectors
Some keywords in the HTML source code can also reflect the characteristics of a phishing page, for example "window.location", "ducment.net", and so on. The server can collect statistics on these specific keywords to obtain feature vectors of the sample page. In practice, the server can record whether a specific keyword occurs in the sample page, or count how many times it occurs; this embodiment does not limit this.
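Both variants of the keyword statistics, presence and occurrence count, can be sketched in a few lines. The function name `keyword_features` and its `count` flag are hypothetical, and the second keyword in the usage note is chosen only for illustration.

```python
def keyword_features(html_source, keywords, count=True):
    """Return one value per keyword: its occurrence count, or 1/0 for presence.

    Matching is case-insensitive plain substring search, which is an
    assumption; the text does not specify how keywords are matched.
    """
    text = html_source.lower()
    if count:
        return [text.count(keyword.lower()) for keyword in keywords]
    return [int(keyword.lower() in text) for keyword in keywords]
```

For instance, `keyword_features(source, ["window.location", "document.cookie"])` returns the two occurrence counts, and passing `count=False` returns presence flags instead.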
1.3. Extract a preset number of top-ranked code fragments, ordered by contribution, from the HTML source code as feature vectors
In this approach, the server extracts feature vectors from the HTML source code as a whole. The server extracts from the HTML source code the preset number of code fragments that rank highest by contribution and uses them as feature vectors. A code fragment is a string of code cut out at a preset length. The contribution characterizes how much a code fragment helps to distinguish the sample page from other pages.
Specifically, the server can extract the preset number of top-ranked code fragments according to the following procedure:
S1. Convert the HTML source code into a binary data stream.
The server converts the source code into a binary data stream that the machine can process.
S2. Slice the binary data stream with a sliding window (called a time window in this embodiment) of a preset byte length to obtain multiple code fragments.
In this embodiment, the preset byte length of the window can be set to 10 or 16; the embodiment does not limit the specific byte length. After the data stream has been obtained, the window is used to cut out a string of 10 or 16 bytes starting from the first byte of the stream and moving backward (it is of course also possible to start from the last byte and move forward). The window is then moved by one byte and the next string is cut out, and so on until the window reaches the end of the data stream.
As an example, suppose the data stream is 256 bytes long and the window is 10 bytes long; the server then cuts out 247 code fragments. The number of code fragments is determined jointly by the length of the data stream and the byte length of the window: for a data stream of N bytes sliced with a window of X bytes, "N-(X-1)" code fragments are obtained.
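Steps S1 and S2 can be sketched as follows, as a minimal illustration under stated assumptions: the function name `slice_bytes` is hypothetical, UTF-8 is assumed as the byte encoding, and only the front-to-back direction is shown.

```python
def slice_bytes(html_source, window=10):
    """Cut the source into overlapping byte fragments with a step of one byte.

    A stream of N bytes sliced with a window of X bytes yields N - (X - 1)
    fragments; a stream shorter than the window yields none.
    """
    data = html_source.encode("utf-8")  # S1: source code to byte stream
    return [data[i:i + window] for i in range(len(data) - window + 1)]  # S2
```

With a 256-byte stream and a 10-byte window this produces 247 fragments, which matches the count given in the text.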
S3. Sort the code fragments obtained by contribution.
In this embodiment, an information gain algorithm can be used to rank the code fragments by contribution; code fragments with a larger contribution rank higher.
S4. From the sorted code fragments, extract the preset number of fragments that rank highest by contribution.
The preset number can be set in advance, for example to 1000, 3000, or 10000. Starting from the first-ranked code fragment, the server extracts the top 1000 or 3000 code fragments and uses them as feature vectors.
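The ranking in steps S3 and S4 can be sketched with the standard information gain of a binary "fragment occurs in page" feature with respect to the page class. The text does not give the exact formula used, so this is an assumption, as are the function names and the representation of each page as a set of fragments; both page lists are assumed to be non-empty overall.

```python
import math


def entropy(pos, neg):
    """Binary entropy in bits of a group with pos phishing and neg other pages."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(feature, phishing_pages, other_pages):
    """Information gain of the presence of feature with respect to the page class."""
    n_pos, n_neg = len(phishing_pages), len(other_pages)
    total = n_pos + n_neg
    pos_with = sum(feature in page for page in phishing_pages)
    neg_with = sum(feature in page for page in other_pages)
    with_n = pos_with + neg_with
    conditional = (with_n / total) * entropy(pos_with, neg_with)
    conditional += ((total - with_n) / total) * entropy(n_pos - pos_with, n_neg - neg_with)
    return entropy(n_pos, n_neg) - conditional


def top_features(phishing_pages, other_pages, k):
    """Return the k features of the phishing pages with the highest gain."""
    candidates = set().union(*phishing_pages)
    ranked = sorted(
        candidates,
        key=lambda f: (-information_gain(f, phishing_pages, other_pages), f),
    )
    return ranked[:k]
```

A fragment that occurs in every phishing page and in no other page receives the maximum gain and ranks first, while a fragment spread evenly across both groups receives a gain of zero. The same ranking can be applied to word segments instead of code fragments.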
Further, in this approach, the server can slice the data stream once from front to back with the sliding window and sort the result to obtain a preset number of code fragments, then slice the data stream once from back to front with the same window and sort the result to obtain another preset number of code fragments. The code fragments obtained in the two passes are then used together as feature vectors.
Further, in this approach, the server can also perform steps S1 to S4 on the data stream with sliding windows of different lengths and use the code fragments obtained in all passes together as feature vectors.
2. Extracting feature vectors from the body text of the sample page
The body text is the content of the page that is visible to the user, including text, pictures, links, and so on. This embodiment mainly processes the text. Unlike HTML source code, the text here consists of natural-language words; this embodiment takes Chinese as an example and extracts feature vectors in the form of Chinese characters. The server extracts the textual information from the page and removes parts such as titles and tags to obtain the body content. It then segments the body content into multiple word segments that conform to natural-language usage, and finally extracts the preset number of word segments that rank highest by contribution as feature vectors. The contribution characterizes how much a word segment helps to distinguish the sample page from other pages.
Specifically, the server can extract the preset number of top-ranked word segments according to the following procedure:
S1. Segment the character string of the body text with a word segmentation algorithm to obtain multiple word segments.
The server can segment the character string of the body text with an N-gram algorithm. As an example, for the body content "my mood today is all well and good", the word segments obtained after step S1 are "I", "today", "mood", "very", "well". In practice, the number of word segments obtained depends on the length and content of the body text; this embodiment does not limit the number of word segments obtained.
Further, to obtain a more accurate segmentation result, this embodiment can also apply forward segmentation and reverse segmentation to the body content separately, compare the word segments obtained by the two passes, and select the word segments that better conform to language usage according to their weights. Forward segmentation segments the body content in order from its beginning toward its end; reverse segmentation segments it in order from its end toward its beginning. The weight is the probability that a word segment conforms to language usage, and the server can look up the weight of each word segment in a preset segmentation dictionary.
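The comparison of forward and reverse segmentation can be sketched with dictionary-based maximum matching. This is one common technique chosen for illustration and is not necessarily the algorithm of the embodiment. The function names, the toy dictionary, its weights, and the choice of the mean weight as the score are all assumptions.

```python
def forward_segment(text, lexicon, max_len):
    """Greedy longest match from the start; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_segment(text, lexicon, max_len):
    """Greedy longest match from the end, returned in reading order."""
    tokens, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                end -= size
                break
    return tokens[::-1]


def best_segmentation(text, lexicon):
    """Pick the pass whose tokens have the higher mean dictionary weight."""
    max_len = max(map(len, lexicon), default=1)
    candidates = [
        forward_segment(text, lexicon, max_len),
        reverse_segment(text, lexicon, max_len),
    ]

    def score(tokens):
        if not tokens:
            return 0.0
        return sum(lexicon.get(token, 0.0) for token in tokens) / len(tokens)

    return max(candidates, key=score)
```

With the hypothetical dictionary `{"研究": 0.9, "研究生": 0.6, "生命": 0.9, "命": 0.1, "起源": 0.8}` and the text "研究生命起源", the forward pass yields 研究生 / 命 / 起源 and the reverse pass yields 研究 / 生命 / 起源; the reverse result has the higher mean weight and is selected.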
Further, for proper nouns such as personal names and place names, the server can also use a proper-noun dictionary for segmentation to ensure accuracy. Take a phrase that consists of the word "defendant" followed by a personal name whose characters can also be read as ordinary words. The segmentation that conforms to language usage is "defendant" followed by the name. Because the server cannot know that the name is a proper noun, forward segmentation yields the wrong result "defendant", "population", "construction". In this embodiment, using a proper-noun dictionary avoids such wrong segmentation results.
S2. Sort the word segments obtained by contribution.
In this embodiment, an information gain algorithm can be used to rank all word segments by contribution; word segments with a larger contribution rank higher.
S3. From the sorted word segments, extract the preset number of word segments that rank highest by contribution.
As in the approach above, the preset number can be set in advance, for example to 1000, 3000, or 10000. Starting from the first-ranked word segment, the server extracts the top 1000 or 3000 word segments and uses them as feature vectors.
The content that can serve as feature vectors in this embodiment has been described above. In practice, the server can use a few of these types of feature vector (for example, tag counts and keywords) or all of them. In the latter case, the number of feature vectors obtained in actual tests generally ranges from tens of thousands to hundreds of thousands, and the accuracy of identifying phishing pages can exceed 95%.
203. Deduplicate the extracted feature vectors.
Some of the feature vectors that the server obtains are usually repeated. For example, when the word "hope" occurs several times in the page body, every occurrence has the same contribution rank, and the server obtains several copies of the word segment "hope" as feature vectors. In this case the feature vectors must be deduplicated: of several identical feature vectors, only one is kept and the repeated ones are deleted.
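The deduplication in step 203 can be sketched in one line. The function name `deduplicate` is hypothetical, and keeping the first occurrence so that the contribution order is preserved is an assumption.

```python
def deduplicate(features):
    """Keep the first occurrence of each feature and preserve the original order."""
    return list(dict.fromkeys(features))
```

Applied to the example in the text, several copies of "hope" collapse into a single feature vector.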
204. Train the detection model with the feature vectors of the page.
The server trains the detection model with the deduplicated feature vectors.
205. Use the detection model to detect an unknown page, and obtain a detection result indicating whether the unknown page is a phishing website page.
The server feeds the unknown page into the detection model, extracts the feature vectors in the page, and matches them against the feature vectors of the sample pages. If the proportion of feature vectors in the unknown page that match a given sample page exceeds a certain ratio, the unknown page is concluded to be a phishing page.
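The matching rule of step 205 can be sketched as a set overlap test. The function names and the default threshold of 0.8 are hypothetical, since the text only speaks of "a certain ratio".

```python
def match_ratio(unknown_features, sample_features):
    """Share of the unknown page's features that also occur in one sample page."""
    unknown = set(unknown_features)
    if not unknown:
        return 0.0
    return len(unknown & set(sample_features)) / len(unknown)


def is_phishing(unknown_features, sample_pages, threshold=0.8):
    """True if the match ratio with any sample page exceeds the threshold."""
    return any(
        match_ratio(unknown_features, sample) > threshold for sample in sample_pages
    )
```

Returning the largest `match_ratio` instead of a boolean would give the probability-style result mentioned for the method of Fig. 1, under the same assumptions.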
206. If the unknown page is a phishing website page, add the phishing website page to the sample page set and perform evolutionary training on the detection model.
Further, in one possible implementation of this embodiment, when the detection result shows that the unknown page is a phishing page, the server can also add the unknown page to the sample page set, so that evolutionary training can subsequently be performed on the detection model based on the enlarged sample set to obtain a new version of the detection model. In general, the more sample pages there are and the wider the range they cover, the more accurate the trained detection model is.
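The feedback loop of step 206 might look like the following sketch. The class `DetectionModel`, its set-based matching, the version counter and the helper `detect_and_evolve` are all illustrative assumptions; the source does not specify how the model is stored or retrained.

```python
class DetectionModel:
    """Toy model: remembers the feature set of every sample page."""

    def __init__(self):
        self.samples = []   # one feature set per known phishing page
        self.version = 0

    def train(self, sample_feature_sets):
        # Retraining here simply rebuilds the stored sample list.
        self.samples = [set(s) for s in sample_feature_sets]
        self.version += 1

    def detect(self, features, threshold=0.6):
        # threshold is an assumed value, not taken from the source.
        page = set(features)
        if not page:
            return False
        return any(len(page & s) / len(page) > threshold for s in self.samples)


def detect_and_evolve(model, sample_set, features):
    """Detect a page; if it is phishing, enlarge the sample set and retrain."""
    verdict = model.detect(features)
    if verdict:
        sample_set.append(set(features))
        model.train(sample_set)
    return verdict


sample_set = [{"login", "password", "bank", "verify"}]
model = DetectionModel()
model.train(sample_set)
print(model.detect(["otp", "password", "sms"]))   # checked against the initial sample only
detect_and_evolve(model, sample_set, ["login", "password", "bank", "otp"])
print(model.detect(["otp", "password", "sms"]))   # checked against the enlarged sample set
print(model.version)
```

The second check succeeds only because the newly confirmed phishing page was added to the sample set, which is the effect the paragraph above attributes to a larger and broader set of samples.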
Further, in one implementation of this embodiment, a distributed server cluster can be used to execute the flow shown in Fig. 1 or Fig. 2. As mentioned above, the number of feature vectors in the sample pages ranges from tens of thousands to hundreds of thousands, so learning from a large number of sample pages on a single machine is very time-consuming. In actual measurements, training the detection model on tens of thousands of sample pages with a single machine takes 2 to 3 days, whereas a server cluster of a predetermined scale learning the same number of sample pages shortens the time to a few hours. In practice, the more servers that take part in the computation, the less time the overall training takes.
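The partitioning idea behind the cluster can be sketched on a single machine. The example below is only a stand-in: it uses a thread pool from the Python standard library in place of separate servers, and `extract_features`, `split_into_shards` and `extract_in_parallel` are hypothetical names. A real deployment would distribute the shards across machines with a cluster framework, which the source does not name.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(page_text):
    """Placeholder extractor: lower-cased whitespace tokens of the page."""
    return set(page_text.lower().split())


def split_into_shards(pages, shard_count):
    """Distribute pages round-robin over `shard_count` shards."""
    return [pages[i::shard_count] for i in range(shard_count)]


def process_shard(shard):
    """Work done by one worker: extract features for every page in its shard."""
    return [extract_features(page) for page in shard]


def extract_in_parallel(pages, workers=4):
    """Extract features shard by shard and merge the per-shard results."""
    shards = split_into_shards(pages, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_shard, shards)  # map keeps shard order
        return [features for shard_result in results for features in shard_result]


pages = ["Login Bank", "Prize Winner", "Verify Account"]
print(extract_in_parallel(pages, workers=2))
```

Because each shard is processed independently, adding workers reduces the amount of work per worker, which matches the observation that more participating servers shorten the overall training time.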
Further, as an implementation of the methods shown in Fig. 1 and Fig. 2, an embodiment of the present invention also provides a device for identifying phishing websites. The device may reside on a security server or on the monitoring client side. As shown in Fig. 3, the device comprises an acquiring unit 31, an extraction unit 32, a training unit 33 and a detecting unit 34, wherein:
Acquiring unit 31, configured to obtain pages of known phishing websites as sample pages;
Extraction unit 32, configured to extract feature vectors of the pages from the sample pages;
Training unit 33, configured to train a detection model with the feature vectors of the pages;
Detecting unit 34, configured to use the detection model to examine an unknown page and obtain a detection result indicating whether the unknown page is a phishing website page.
Further, as shown in Fig. 4, the extraction unit 32 comprises:
First extraction module 321, configured to extract the feature vectors of the pages from the HyperText Markup Language (HTML) source code of the sample pages.
Further, the first extraction module 321 is configured to extract the number of preset tags from the HTML source code as feature vectors.
Further, the first extraction module 321 is configured to extract preset keywords from the HTML source code as feature vectors.
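The two simplest feature types handled by the first extraction module, preset tag counts and preset keywords, can be sketched with the standard-library HTML parser. The tag list, the keyword list and the function names below are illustrative assumptions; this part of the source does not enumerate the actual presets.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_count_features(html, preset_tags):
    """Return the number of occurrences of each preset tag."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {tag: parser.counts[tag] for tag in preset_tags}


def keyword_features(html, preset_keywords):
    """Return the preset keywords that occur in the source (case-insensitive)."""
    lowered = html.lower()
    return [kw for kw in preset_keywords if kw.lower() in lowered]


html = (
    '<html><body><form action="x"><input type="text">'
    '<input type="password"></form><iframe src="y"></iframe></body></html>'
)
print(tag_count_features(html, ["form", "input", "iframe", "script"]))
print(keyword_features(html, ["password", "verify"]))
```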
Further, the first extraction module 321 is configured to extract, from the HTML source code, a predetermined number of code slices ranked highest by contribution degree as feature vectors, where the contribution degree characterizes the degree to which a code slice distinguishes the sample page from other pages.
Further, the first extraction module 321 is configured to:
Convert the HTML source code into a binary data stream;
Slice the binary data stream with a time window of a preset byte length to obtain a plurality of code slices;
Sort the obtained code slices by contribution degree;
Extract, from the sorted code slices, the predetermined number of code slices ranked highest by contribution degree.
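The four steps above can be sketched as follows. Two points are assumptions rather than facts from the source: the window is applied as consecutive, non-overlapping chunks, and the contribution degree is approximated by how often a slice occurs in the sample divided by one plus the number of other pages containing it. The function names and default values are hypothetical.

```python
def slice_bytes(data, window):
    """Cut a byte string into consecutive, non-overlapping slices of `window` bytes."""
    return [data[i:i + window] for i in range(0, len(data), window)]


def top_code_slices(sample_html, other_pages_html, window=8, top_n=3):
    """Return the `top_n` slices of the sample ranked by an assumed contribution score."""
    stream = sample_html.encode("utf-8")            # HTML source -> binary data stream
    slices = slice_bytes(stream, window)            # the last slice may be shorter
    other_streams = [p.encode("utf-8") for p in other_pages_html]

    def contribution(chunk):
        # Assumed score: occurrences in the sample divided by
        # (1 + number of other pages that contain the same bytes).
        in_others = sum(chunk in other for other in other_streams)
        return stream.count(chunk) / (1 + in_others)

    ranked = sorted(slices, key=contribution, reverse=True)
    return ranked[:top_n]


print(slice_bytes(b"abcdefghij", 4))
print(top_code_slices("AAAABBBBAAAA", ["BBBB"], window=4, top_n=1))
```

Repeated slices are deliberately left in the ranked list, since the source treats deduplication as a separate step.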
Further, as shown in Fig. 4, the extraction unit 32 comprises:
Second extraction module 322, configured to extract the feature vectors of the pages from the text of the sample pages.
Further, the second extraction module 322 is configured to extract, from the text of the sample pages, a predetermined number of word segments ranked highest by contribution degree as feature vectors, where the contribution degree characterizes the degree to which a word segment distinguishes the sample page from other pages.
Further, the second extraction module 322 is configured to:
Segment the character string of the text with a word segmentation algorithm to obtain a plurality of word segments;
Sort the obtained word segments by contribution degree;
Extract, from the sorted word segments, the predetermined number of word segments ranked highest by contribution degree.
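The word-segment pipeline above can be sketched in the same spirit. The segmenter here is a plain regular expression, which is only a stand-in (Chinese text would need a real word segmentation algorithm), and the contribution degree is approximated by a TF-IDF-style score. Both choices, along with the function names, are assumptions and are not stated in the source.

```python
import math
import re
from collections import Counter


def segment(text):
    """Stand-in segmenter: lower-cased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def top_word_segments(sample_text, other_texts, top_n=3):
    """Return the `top_n` word segments of the sample ranked by an assumed score."""
    words = segment(sample_text)
    tf = Counter(words)
    other_sets = [set(segment(t)) for t in other_texts]
    total_docs = len(other_sets) + 1

    def contribution(word):
        # Assumed TF-IDF-style score: frequent in the sample, rare elsewhere.
        docs_with_word = 1 + sum(word in s for s in other_sets)
        return tf[word] * math.log((1 + total_docs) / docs_with_word)

    ranked = sorted(words, key=contribution, reverse=True)
    return ranked[:top_n]


sample = "verify your account verify now"
others = ["your news now", "your weather"]
print(top_word_segments(sample, others))
```

With this input the word "verify" appears twice among the top-ranked segments, which is the kind of repetition that the deduplication step is meant to remove.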
Further, the acquiring unit 31 is configured to obtain snapshot pages of known phishing websites as the sample pages.
Further, as shown in Fig. 4, the device further comprises:
Deduplication unit 35, configured to deduplicate the extracted feature vectors.
Further, the acquiring unit 31 is configured to add the phishing website page to the sample page set when the unknown page is a phishing website page, so that the training unit 33 can perform evolutionary training on the detection model.
The device for identifying phishing websites provided by the embodiments of the present invention takes pages of known phishing websites as sample pages and trains a detection model with the feature vectors in those pages. The detection model is then used to examine an unknown page; if the unknown page has feature vectors identical or similar to those of the sample pages, the unknown page can be determined to be a phishing website page. Compared with the prior art, the embodiments of the present invention use page feature vectors, which directly reflect the pattern characteristics of phishing websites, as the basis for identifying phishing websites, and use the detection model to learn the feature vectors of a large number of phishing pages so as to absorb the pattern characteristics of the various phishing pages as comprehensively as possible. The accuracy of identifying phishing websites can therefore be improved.
Embodiments of the invention disclose:
A1. A method for identifying a phishing website, characterized in that the method comprises:
Obtaining pages of known phishing websites as sample pages;
Extracting feature vectors of the pages from the sample pages;
Training a detection model with the feature vectors of the pages;
Using the detection model to examine an unknown page and obtaining a detection result indicating whether the unknown page is a phishing website page.
A2. The method according to A1, characterized in that extracting the feature vectors of the pages from the sample pages comprises:
Extracting the feature vectors of the pages from the HyperText Markup Language (HTML) source code of the sample pages.
A3. The method according to A2, characterized in that extracting the feature vectors of the pages from the HTML source code of the sample pages comprises:
Extracting the number of preset tags from the HTML source code as the feature vectors.
A4. The method according to A2, characterized in that extracting the feature vectors of the pages from the HTML source code of the sample pages comprises:
Extracting preset keywords from the HTML source code as the feature vectors.
A5. The method according to A2, characterized in that extracting the feature vectors of the pages from the HTML source code of the sample pages comprises:
Extracting, from the HTML source code, a predetermined number of code slices ranked highest by contribution degree as the feature vectors, wherein the contribution degree characterizes the degree to which a code slice distinguishes the sample page from other pages.
A6. The method according to A5, characterized in that extracting, from the HTML source code, the predetermined number of code slices ranked highest by contribution degree comprises:
Converting the HTML source code into a binary data stream;
Slicing the binary data stream with a time window of a preset byte length to obtain a plurality of code slices;
Sorting the obtained code slices by contribution degree;
Extracting, from the sorted code slices, the predetermined number of code slices ranked highest by contribution degree.
A7. The method according to A1, characterized in that extracting the feature vectors of the pages from the sample pages comprises:
Extracting the feature vectors of the pages from the text of the sample pages.
A8. The method according to A7, characterized in that extracting the feature vectors of the pages from the text of the sample pages comprises:
Extracting, from the text of the sample pages, a predetermined number of word segments ranked highest by contribution degree as the feature vectors, wherein the contribution degree characterizes the degree to which a word segment distinguishes the sample page from other pages.
A9. The method according to A8, characterized in that extracting, from the text of the sample pages, the predetermined number of word segments ranked highest by contribution degree comprises:
Segmenting the character string of the text with a word segmentation algorithm to obtain a plurality of word segments;
Sorting the obtained word segments by contribution degree;
Extracting, from the sorted word segments, the predetermined number of word segments ranked highest by contribution degree.
A10. The method according to A1, characterized in that obtaining pages of known phishing websites as sample pages comprises:
Obtaining snapshot pages of known phishing websites as the sample pages.
A11. The method according to A1, characterized in that the method further comprises:
Deduplicating the extracted feature vectors.
A12. The method according to any one of A1 to A11, characterized in that the method further comprises:
If the unknown page is a phishing website page, adding the phishing website page to the sample page set and performing evolutionary training on the detection model.
B13. A device for identifying a phishing website, characterized in that the device comprises:
An acquiring unit, configured to obtain pages of known phishing websites as sample pages;
An extraction unit, configured to extract feature vectors of the pages from the sample pages;
A training unit, configured to train a detection model with the feature vectors of the pages;
A detecting unit, configured to use the detection model to examine an unknown page and obtain a detection result indicating whether the unknown page is a phishing website page.
B14. The device according to B13, characterized in that the extraction unit comprises:
A first extraction module, configured to extract the feature vectors of the pages from the HyperText Markup Language (HTML) source code of the sample pages.
B15. The device according to B14, characterized in that the first extraction module is configured to extract the number of preset tags from the HTML source code as the feature vectors.
B16. The device according to B14, characterized in that the first extraction module is configured to extract preset keywords from the HTML source code as the feature vectors.
B17. The device according to B14, characterized in that the first extraction module is configured to extract, from the HTML source code, a predetermined number of code slices ranked highest by contribution degree as the feature vectors, wherein the contribution degree characterizes the degree to which a code slice distinguishes the sample page from other pages.
B18. The device according to B17, characterized in that the first extraction module is configured to:
Convert the HTML source code into a binary data stream;
Slice the binary data stream with a time window of a preset byte length to obtain a plurality of code slices;
Sort the obtained code slices by contribution degree;
Extract, from the sorted code slices, the predetermined number of code slices ranked highest by contribution degree.
B19. The device according to B13, characterized in that the extraction unit comprises:
A second extraction module, configured to extract the feature vectors of the pages from the text of the sample pages.
B20. The device according to B19, characterized in that the second extraction module is configured to extract, from the text of the sample pages, a predetermined number of word segments ranked highest by contribution degree as the feature vectors, wherein the contribution degree characterizes the degree to which a word segment distinguishes the sample page from other pages.
B21. The device according to B20, characterized in that the second extraction module is configured to:
Segment the character string of the text with a word segmentation algorithm to obtain a plurality of word segments;
Sort the obtained word segments by contribution degree;
Extract, from the sorted word segments, the predetermined number of word segments ranked highest by contribution degree.
B22. The device according to B13, characterized in that the acquiring unit is configured to obtain snapshot pages of known phishing websites as the sample pages.
B23. The device according to B13, characterized in that the device further comprises:
A deduplication unit, configured to deduplicate the extracted feature vectors.
B24. The device according to any one of B13 to B23, characterized in that the acquiring unit is configured to add the phishing website page to the sample page set when the unknown page is a phishing website page, so that the training unit can perform evolutionary training on the detection model.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
It can be understood that the related features of the above method and device may refer to each other. In addition, "first", "second" and the like in the above embodiments are used to distinguish the embodiments and do not indicate that one embodiment is better than another.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and the above description of a specific language is intended to disclose the best mode of the invention.
In the specification provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of the embodiments may be combined into one module, unit or component, and they may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the title of the invention (for example, a device for determining the level of links within a website) according to embodiments of the invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order. These words may be interpreted as names.

Claims (10)

1. A method for identifying a phishing website, characterized in that the method comprises:
Obtaining pages of known phishing websites as sample pages;
Extracting feature vectors of the pages from the sample pages;
Training a detection model with the feature vectors of the pages;
Using the detection model to examine an unknown page and obtaining a detection result indicating whether the unknown page is a phishing website page.
2. The method according to claim 1, characterized in that extracting the feature vectors of the pages from the sample pages comprises:
Extracting the feature vectors of the pages from the HyperText Markup Language (HTML) source code of the sample pages.
3. The method according to claim 2, characterized in that extracting the feature vectors of the pages from the HTML source code of the sample pages comprises:
Extracting the number of preset tags from the HTML source code as the feature vectors.
4. The method according to claim 2, characterized in that extracting the feature vectors of the pages from the HTML source code of the sample pages comprises:
Extracting preset keywords from the HTML source code as the feature vectors.
5. The method according to claim 2, characterized in that extracting the feature vectors of the pages from the HTML source code of the sample pages comprises:
Extracting, from the HTML source code, a predetermined number of code slices ranked highest by contribution degree as the feature vectors, wherein the contribution degree characterizes the degree to which a code slice distinguishes the sample page from other pages.
6. The method according to claim 5, characterized in that extracting, from the HTML source code, the predetermined number of code slices ranked highest by contribution degree comprises:
Converting the HTML source code into a binary data stream;
Slicing the binary data stream with a time window of a preset byte length to obtain a plurality of code slices;
Sorting the obtained code slices by contribution degree;
Extracting, from the sorted code slices, the predetermined number of code slices ranked highest by contribution degree.
7. The method according to claim 1, characterized in that extracting the feature vectors of the pages from the sample pages comprises:
Extracting the feature vectors of the pages from the text of the sample pages.
8. The method according to claim 7, characterized in that extracting the feature vectors of the pages from the text of the sample pages comprises:
Extracting, from the text of the sample pages, a predetermined number of word segments ranked highest by contribution degree as the feature vectors, wherein the contribution degree characterizes the degree to which a word segment distinguishes the sample page from other pages.
9. The method according to claim 8, characterized in that extracting, from the text of the sample pages, the predetermined number of word segments ranked highest by contribution degree comprises:
Segmenting the character string of the text with a word segmentation algorithm to obtain a plurality of word segments;
Sorting the obtained word segments by contribution degree;
Extracting, from the sorted word segments, the predetermined number of word segments ranked highest by contribution degree.
10. A device for identifying a phishing website, characterized in that the device comprises:
An acquiring unit, configured to obtain pages of known phishing websites as sample pages;
An extraction unit, configured to extract feature vectors of the pages from the sample pages;
A training unit, configured to train a detection model with the feature vectors of the pages;
A detecting unit, configured to use the detection model to examine an unknown page and obtain a detection result indicating whether the unknown page is a phishing website page.
CN201510886094.1A 2015-12-04 2015-12-04 Method and device for recognizing phishing website Pending CN105338001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510886094.1A CN105338001A (en) 2015-12-04 2015-12-04 Method and device for recognizing phishing website

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510886094.1A CN105338001A (en) 2015-12-04 2015-12-04 Method and device for recognizing phishing website

Publications (1)

Publication Number Publication Date
CN105338001A true CN105338001A (en) 2016-02-17

Family

ID=55288283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510886094.1A Pending CN105338001A (en) 2015-12-04 2015-12-04 Method and device for recognizing phishing website

Country Status (1)

Country Link
CN (1) CN105338001A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1936892A1 (en) * 2005-10-15 2008-06-25 Huawei Technologies Co., Ltd. A system for controlling the security of network and a method thereof
CN102096781A (en) * 2011-01-18 2011-06-15 南京邮电大学 Fishing detection method based on webpage relevance
CN103544436A (en) * 2013-10-12 2014-01-29 深圳先进技术研究院 System and method for distinguishing phishing websites
CN104166725A (en) * 2014-08-26 2014-11-26 哈尔滨工业大学(威海) Phishing website detection method
CN104217160A (en) * 2014-09-19 2014-12-17 中国科学院深圳先进技术研究院 Method and system for detecting Chinese phishing website
CN104239582A (en) * 2014-10-14 2014-12-24 北京奇虎科技有限公司 Method and device for identifying phishing webpage based on feature vector model
CN104504335A (en) * 2014-12-24 2015-04-08 中国科学院深圳先进技术研究院 Fishing APP detection method and system based on page feature and URL feature
CN104537303A (en) * 2014-12-30 2015-04-22 中国科学院深圳先进技术研究院 Distinguishing system and method for phishing website

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107204956A (en) * 2016-03-16 2017-09-26 腾讯科技(深圳)有限公司 website identification method and device
CN107204956B (en) * 2016-03-16 2020-06-23 腾讯科技(深圳)有限公司 Website identification method and device
CN107438053A (en) * 2016-05-25 2017-12-05 阿里巴巴集团控股有限公司 Domain name recognition methods, device and server
CN107438053B (en) * 2016-05-25 2020-08-25 阿里巴巴集团控股有限公司 Domain name identification method and device and server
CN106227745A (en) * 2016-07-14 2016-12-14 杭州数梦工场科技有限公司 Data carding method between a kind of item set and device
CN106230848A (en) * 2016-08-11 2016-12-14 国家计算机网络与信息安全管理中心 A kind of method of Behavior-based control feature detection fishing website
CN106354800A (en) * 2016-08-26 2017-01-25 中国互联网络信息中心 Undesirable website detection method based on multi-dimensional feature
CN107612893A (en) * 2017-09-01 2018-01-19 北京百悟科技有限公司 The auditing system and method and structure short message examination & verification model method of short message
CN107612893B (en) * 2017-09-01 2020-06-02 北京百悟科技有限公司 Short message auditing system and method and short message auditing model building method
CN108111478A (en) * 2017-11-07 2018-06-01 中国互联网络信息中心 A kind of phishing recognition methods and device based on semantic understanding
CN108092963B (en) * 2017-12-08 2020-05-08 平安科技(深圳)有限公司 Webpage identification method and device, computer equipment and storage medium
CN108092963A (en) * 2017-12-08 2018-05-29 平安科技(深圳)有限公司 Web page identification method, device, computer equipment and storage medium
CN110020249A (en) * 2017-12-28 2019-07-16 中国移动通信集团山东有限公司 A kind of caching method, device and the electronic equipment of URL resource
CN110020249B (en) * 2017-12-28 2021-11-30 中国移动通信集团山东有限公司 URL resource caching method and device and electronic equipment
CN108566399A (en) * 2018-04-23 2018-09-21 中国互联网络信息中心 Fishing website recognition methods and system
CN108566399B (en) * 2018-04-23 2020-11-03 中国互联网络信息中心 Phishing website identification method and system
WO2020036622A1 (en) * 2018-08-14 2020-02-20 Didi Research America, Llc System and method for detecting generated domain
US11329952B2 (en) 2018-08-14 2022-05-10 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for detecting generated domain
CN109617864A (en) * 2018-11-27 2019-04-12 烟台中科网络技术研究所 A kind of website identification method and website identifying system
CN109617864B (en) * 2018-11-27 2021-04-16 烟台中科网络技术研究所 Website identification method and website identification system
CN111291374A (en) * 2020-02-20 2020-06-16 支付宝(杭州)信息技术有限公司 Application program detection method, device and equipment
CN111400705A (en) * 2020-03-04 2020-07-10 支付宝(杭州)信息技术有限公司 Application program detection method, device and equipment
CN111400705B (en) * 2020-03-04 2023-03-14 支付宝(杭州)信息技术有限公司 Application program detection method, device and equipment
CN112688946A (en) * 2020-12-24 2021-04-20 工业信息安全(四川)创新中心有限公司 Method, module, storage medium, device and system for constructing abnormality detection features
CN113067820A (en) * 2021-03-19 2021-07-02 深圳市安络科技有限公司 Method, device and equipment for early warning abnormal webpage and/or APP
CN113779481B (en) * 2021-09-26 2024-04-09 恒安嘉新(北京)科技股份公司 Method, device, equipment and storage medium for identifying fraud websites
CN114095278A (en) * 2022-01-19 2022-02-25 南京明博互联网安全创新研究院有限公司 Phishing website detection method based on mixed feature selection frame
CN114095278B (en) * 2022-01-19 2022-05-24 南京明博互联网安全创新研究院有限公司 Phishing website detection method based on mixed feature selection frame

Similar Documents

Publication Publication Date Title
CN105338001A (en) Method and device for recognizing phishing website
Kharraz et al. Surveylance: Automatically detecting online survey scams
CN105357221A (en) Method and apparatus for identifying phishing website
CN103685307B (en) Method and system for detecting phishing fraud webpages based on a feature library, client, server
CN107204960B (en) Webpage identification method and device and server
CN104156490A (en) Method and device for detecting suspicious phishing webpages based on character recognition
CN104063455B (en) Method and device for acquiring counseling messages of disease based on searching
CN105359140B (en) The vertical access of variable search inquiry
CN106874253A (en) Method and device for recognizing sensitive information
CN110191096B (en) Word vector webpage intrusion detection method based on semantic analysis
CN108134784A (en) Web page classification method and device, storage medium and electronic equipment
CN104158828B (en) Method and system for identifying suspicious phishing webpages based on a cloud content rule base
CN110572359A (en) Phishing webpage detection method based on machine learning
CN107908959A (en) Site information detection method, device, electronic equipment and storage medium
GB2555801A (en) Identifying fraudulent and malicious websites, domain and subdomain names
JP7372707B2 (en) Data acquisition method and device for analyzing cryptocurrency transactions
CN105430101A (en) Method and device for generating promotion link and method and device for analyzing promotion link
CN104239582A (en) Method and device for identifying phishing webpage based on feature vector model
CN106656741A (en) Information push method and system
CN111758098A (en) Named entity identification and extraction using genetic programming
Sánchez-Paniagua et al. Phishing websites detection using a novel multipurpose dataset and web technologies features
CN111967503A (en) Method for constructing multi-type abnormal webpage classification model and abnormal webpage detection method
WO2016177646A1 (en) Computer-implemented methods of website analysis
Nowroozi et al. An adversarial attack analysis on malicious advertisement URL detection framework
Ojewumi et al. Performance evaluation of machine learning tools for detection of phishing attacks on web pages

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160217