CN107992469A - Phishing URL detection method and system based on word sequences - Google Patents

Info

Publication number
CN107992469A
Authority
CN
China
Prior art keywords
url
word
word sequence
phishing
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710952360.5A
Other languages
Chinese (zh)
Inventor
亚静
柳厅文
时金桥
张盼盼
张振宇
王玉斌
李全刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201710952360.5A
Publication of CN107992469A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a phishing URL detection method and system based on word sequences. The URL string is segmented into words to obtain a vector representation of the word sequence, and a deep learning model then automatically learns the contextual information and features in the word sequence, so that word-related text features of the URL do not have to be extracted by hand. The trained model is used to detect phishing URLs. This addresses the problems encountered by existing phishing URL detection based on word features.

Description

Phishing URL detection method and system based on word sequences
Technical field
The present invention relates to the field of information security, and in particular to a phishing URL detection method and system based on word sequences.
Background
A phishing URL is a form of phishing in which the attacker impersonates the website of a reputable legitimate organization to obtain users' sensitive information, such as usernames, passwords, and credit card details. Phishing URLs usually claim to come from popular social networking sites (YouTube, Facebook, Twitter, etc.), auction sites (eBay), online payment and e-commerce sites (PayPal, Alibaba, etc.), or network administrators (Google, Yahoo, ISPs), in order to win the victim's trust. A common trick is to embed keywords in the URL that mislead the user; for example, an attacker may use a URL of the form "login.mydomain.tld/paypal" to lure PayPal users.
At present, both the research community and commercial vendors offer many phishing URL detection methods and security products. Most of them work by manually extracting features from URL-related data, building a classification model, and classifying URLs to detect phishing URLs. Depending on the data analyzed, existing detection methods fall into two broad classes: methods based on multi-source information and methods based on the URL itself.
Methods based on multi-source information need to collect various kinds of data related to the URL, including Alexa rank, WHOIS information, and web page content, and build a complex model trained on labeled data to decide whether an unknown URL is a phishing URL. Such methods usually achieve high accuracy, but collecting these data requires substantial resources and extra time, so they are not suitable for real-time detection in high-speed networks.
Methods based on the URL itself analyze only the text features of the URL string to build a classification model. They are lightweight and suitable for real-time detection.
Specifically, phishing detection methods based on the URL itself extract text features from the URL string and train a classification model to detect phishing URLs. The text features of a URL string can be further divided into character features and word features. Character features describe the characters that make up the URL string, including string length, vowel-to-consonant ratio, number of digits, number of special characters, and the entropy of the character distribution. Word features mainly analyze the semantically meaningful words contained in the URL and their frequencies, for example common words such as login and update, and popular brand names such as paypal and alibaba.
Lightweight phishing detection based on the URL itself better meets the need for real-time response in high-speed networks. However, character-based features ignore the semantic information contained in the URL. URLs are meant to be easy for people to remember, so they are usually readable and memorable and contain several meaningful common words. Moreover, a strategy that attackers often use in phishing attacks is to mislead users with keywords.
Existing phishing URL detection methods based on word features mostly use words and their frequencies as features and do not consider the word sequence features contained in the URL. These features are also designed by hand, which has several limitations. First, manual feature extraction requires considerable labor and resources for statistical analysis and for validating the features. Second, manually extracted features are usually effective only for one kind of data and are not robust. Third, the keywords that attackers use in phishing URLs are usually similar to those in normal URLs precisely in order to confuse users, which degrades the detection performance of the classification model.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a phishing URL detection method and system based on word sequences. The URL string is segmented into words to obtain a vector representation of the word sequence, and a deep learning model then automatically learns the contextual information and features in the word sequence, so that word-related text features of the URL do not have to be extracted by hand. The trained model is used to detect phishing URLs, which addresses the problems of existing word-feature-based phishing URL detection described above.
To achieve the above object, the present invention adopts the following technical solution:
A phishing URL detection method based on word sequences, comprising the following steps:
converting labeled URLs into word sequence vectors to serve as training data;
training a classification model with the training data;
converting an unknown URL into a word sequence vector and inputting it into the trained classification model to be labeled.
Further, converting a labeled URL or an unknown URL into a word sequence vector includes:
filtering out the protocol and the generic top-level domain in the labeled or unknown URL;
splitting the part that remains after filtering, and segmenting each resulting substring into words by forward maximum matching against a dictionary, to obtain a word sequence;
numbering all words in the dictionary starting from 1 so that each word has a unique number, and converting the word sequence of each labeled or unknown URL into a fixed-length numeric vector.
Further, the protocol includes http, https, ftp, ftps, and gopher; the generic top-level domain includes com, org, net, edu, and gov.
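As a minimal sketch of the filtering and splitting steps, the Python function below strips a known protocol and a trailing generic top-level domain and then splits what remains on non-alphanumeric symbols. The function name `split_url`, the lower-casing, and the choice of splitting characters are assumptions made for illustration; the patent does not specify them, and ports or user-info parts of a URL are not handled.

```python
import re

# Protocols and generic top-level domains named in the text.
PROTOCOLS = {"http", "https", "ftp", "ftps", "gopher"}
GENERIC_TLDS = {"com", "org", "net", "edu", "gov"}

def split_url(url):
    """Strip a known protocol and a trailing generic TLD, then split on symbols."""
    url = url.lower()  # assumption: matching is case-insensitive
    scheme = re.match(r"^([a-z][a-z0-9+.-]*)://", url)
    if scheme and scheme.group(1) in PROTOCOLS:
        url = url[scheme.end():]
    host, _, path = url.partition("/")
    labels = host.split(".")
    if labels[-1] in GENERIC_TLDS:
        labels = labels[:-1]
    segments = []
    for part in labels + [path]:
        # Any run of non-alphanumeric characters acts as a separator.
        segments.extend(piece for piece in re.split(r"[^a-z0-9]+", part) if piece)
    return segments

print(split_url("http://shen.mansell.tripod.com/games/gameboy.html"))
# ['shen', 'mansell', 'tripod', 'games', 'gameboy', 'html']
```

Each returned segment would then be passed to the dictionary-based word segmentation step.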
Further, segmenting by forward maximum matching against the dictionary includes:
checking whether the whole string is in the dictionary and, if so, performing no further segmentation;
if not, removing the last character and checking whether the remaining string is in the dictionary;
repeating this check until a word in the dictionary is matched, then removing the matched word;
continuing the above steps on the remainder of the string until the whole string has been processed;
if the string contains no word in the dictionary, splitting it into single characters.
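The matching procedure above can be sketched in a few lines of Python. This is an illustrative reading of the steps, not the patent's own pseudocode: the function name `forward_max_match` and the toy dictionary are assumptions, and a position where no dictionary word starts is emitted as a single character.

```python
def forward_max_match(text, dictionary):
    """Segment text greedily, always taking the longest dictionary word first."""
    words = []
    start = 0
    while start < len(text):
        end = len(text)
        # Drop the last character until the candidate is a dictionary word
        # or only one character is left.
        while end > start + 1 and text[start:end] not in dictionary:
            end -= 1
        words.append(text[start:end])
        start = end  # continue with the remainder of the string
    return words

toy_dictionary = {"game", "boy", "pay", "pal", "login"}  # hypothetical word list
print(forward_max_match("gameboy", toy_dictionary))      # ['game', 'boy']
print(forward_max_match("paypallogin", toy_dictionary))  # ['pay', 'pal', 'login']
print(forward_max_match("xq", toy_dictionary))           # ['x', 'q']
```

Storing the dictionary as a set keeps each lookup constant time on average; with a large dictionary, capping the candidate length at the longest dictionary word would avoid needless checks.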
Further, the dictionary is the Google English word corpus published by Peter Norvig.
Further, the classification model trained with the training data is a bidirectional LSTM model based on word sequences.
Further, training the classification model with the training data includes:
randomly dividing the training data into a training part and a validation part, and training the bidirectional LSTM model by setting the hyperparameters of the neural network model and parameters such as the activation function.
Further, the bidirectional LSTM model is a four-layer neural network comprising an embedding layer, a bidirectional LSTM layer, a dropout layer, and a sigmoid layer, and training the classification model with the training data further includes applying dropout to the output of the bidirectional LSTM layer to prevent overfitting.
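To make the four-layer structure concrete, here is a dependency-free Python sketch of the forward pass only: embedding lookup, one LSTM run in each direction, dropout on the concatenated LSTM output, and a sigmoid output unit. The class names, layer sizes, dropout rate, and random initialization are all assumptions, the weights are untrained, and padding ids are not masked; a real implementation would more likely use a deep learning framework and learn the weights from labeled URLs.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) and candidate (g) gates."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        self.w = {k: init_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation
        pre = {k: [a + b for a, b in zip(matvec(self.w[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            h, c = self.step(x, h, c)
        return h  # final hidden state

class BiLSTMClassifier:
    def __init__(self, vocab_size, embed_dim=8, hidden_dim=8, dropout=0.5, seed=0):
        self.rng = random.Random(seed)
        self.dropout = dropout
        self.embedding = init_matrix(vocab_size + 1, embed_dim, self.rng)  # row 0: padding id
        self.forward_cell = LSTMCell(embed_dim, hidden_dim, self.rng)
        self.backward_cell = LSTMCell(embed_dim, hidden_dim, self.rng)
        self.out_w = [self.rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.out_b = 0.0

    def predict_proba(self, word_ids, training=False):
        embedded = [self.embedding[i] for i in word_ids]                  # embedding layer
        features = (self.forward_cell.run(embedded)
                    + self.backward_cell.run(embedded[::-1]))             # bidirectional LSTM layer
        if training:                                                      # dropout layer (inverted dropout)
            keep = 1.0 - self.dropout
            features = [v / keep if self.rng.random() < keep else 0.0 for v in features]
        logit = sum(w * v for w, v in zip(self.out_w, features)) + self.out_b
        return sigmoid(logit)                                             # sigmoid layer

model = BiLSTMClassifier(vocab_size=20)
print(model.predict_proba([1, 4, 5, 6, 7, 11, 13, 0, 0, 0, 0, 0, 0]))
```

Dropout is active only when `training=True`, so inference is deterministic; the printed value is a probability from untrained weights and carries no meaning beyond showing the data flow.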
A phishing URL detection system based on word sequences, comprising:
a conversion module and a classification training model;
the conversion module converts labeled URLs into word sequence vectors that serve as training data for the classification model, and converts an unknown URL into a word sequence vector that is input into the trained classification model to be labeled.
As described above, the method and system provided by the present invention do not require any manually extracted features. The URL only needs to be converted into a word sequence vector representation, and a deep neural network (a bidirectional LSTM model) automatically learns the contextual information and features in the word sequence in order to detect phishing URLs.
Compared with traditional phishing URL detection techniques, the invention has the following advantages:
First, there is no need to collect additional URL-related data or to extract URL text features by hand. A deep learning model automatically learns the contextual information and features of the URL word sequence and uses them to detect phishing URLs, which significantly reduces overhead.
In addition, by mining in depth the contextual information and features contained in the URL word sequence, the method achieves better detection results on the same data set than machine learning models based on manually extracted word features and a deep learning model based on character strings.
Finally, with the method and system of the present invention, the trained model reaches a single-threaded prediction speed of no less than 600 URLs per second on an ordinary server, so it meets the demand for real-time detection while improving detection accuracy.
Brief description of the drawings
Fig. 1 is a flow diagram of the word-sequence-based phishing URL detection method in one embodiment of the present invention.
Fig. 2 is a structural diagram of the bidirectional LSTM model used in the word-sequence-based phishing URL detection method in one embodiment of the present invention.
Detailed description of the embodiments
The technical solution in the embodiments of the present invention is described clearly and completely below with reference to the accompanying drawings.
In one embodiment of the present invention, a phishing URL detection method and system based on word sequences are provided. The main steps of the method are:
(1) Word sequence vector representation: first, a dictionary-matching method is used to obtain the keyword sequence contained in the URL, and the sequence is then encoded with the dictionary to obtain the vector representation of the URL word sequence;
(2) Model training: the word sequence vectors obtained in the previous step that carry labels are used as training data to train the word-sequence-based bidirectional LSTM model;
(3) Phishing URL detection: the trained word-sequence-based bidirectional LSTM model is used to detect whether an unknown URL is a phishing URL.
The system includes a conversion module and a classification training model;
the conversion module converts labeled URLs into word sequence vector representations that serve as training data for the classification model, and converts an unknown URL into a word sequence vector representation that is input into the trained classification model to be labeled.
The word sequence vector representation step of the method obtains the vector representation of the URL word sequence and consists of the following sub-steps:
i) First, the known protocol and the generic top-level domain are filtered out of the URL. Common protocols include http, https, ftp, ftps, and gopher; there are 14 generic top-level domains, including com, org, net, edu, and gov;
ii) The remaining part is first split on symbols, and each segment is then segmented into words by forward maximum matching against a dictionary prepared in advance (see the pseudocode of Algorithm 1). The segmentation proceeds as follows: first check whether the whole string is in the dictionary; if it is, no further segmentation is needed. If it is not, remove the last character and check whether the remaining string is in the dictionary, repeating until a word in the dictionary is matched; then remove the matched word and continue the same steps on the remainder of the string until the whole string has been processed. If the string contains no word in the dictionary, it is split into single characters.
The dictionary used for segmentation is the Google English word corpus published by Peter Norvig (containing 333,333 English words). Other English word dictionaries are not suitable; this dictionary consists of the common words that Peter Norvig counted in web pages, which better matches the way URLs are named.
iii) Then, all words in the dictionary are numbered starting from 1 so that each word has a unique number, and the word sequence of each URL is converted into a fixed-length numeric vector.
In the model training step, the vectors obtained in the previous step are collected, and the labeled vectors are used as training data to train the word-sequence-based bidirectional LSTM model. The training sample set is randomly divided into a training part and a validation part (80% and 20% of all labeled data, respectively), and the bidirectional LSTM model is trained by setting the hyperparameters of the neural network model (such as the output dimension of each layer) and parameters such as the activation function. The deep learning model is a four-layer neural network consisting of an embedding layer, a bidirectional LSTM layer, a dropout layer, and a sigmoid layer; dropout is applied to the output of the bidirectional LSTM layer to prevent overfitting.
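A random 80/20 split of the labeled samples can be sketched as follows. The function name, the fixed seed, and the `(vector, label)` sample layout are assumptions for illustration; the text only states the proportions and that the split is random.

```python
import random

def train_validation_split(samples, validation_ratio=0.2, seed=42):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = int(round(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical labeled data: (word sequence vector, label) pairs.
labeled = [([i, 0, 0], i % 2) for i in range(10)]
train, validation = train_validation_split(labeled)
print(len(train), len(validation))  # 8 2
```

The validation part is typically used to tune hyperparameters such as layer sizes and the dropout rate without touching the training part.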
The phishing URL detection step handles unlabeled data, that is, it detects whether an unknown URL is a phishing URL. The word sequence vector of the unknown URL is input into the trained bidirectional LSTM model to be labeled: an output of 1 denotes a phishing URL; otherwise the URL is normal.
The method is further described below with an example. The overall flow of the word-sequence-based phishing URL detection method is shown in Fig. 1, and the structure of the word-sequence-based bidirectional LSTM model is shown in Fig. 2.
Take the phishing URL http://shen.mansell.tripod.com/games/gameboy.html as an example; its label is 1. The URL is converted into a fixed-length word sequence vector representation and used to train the bidirectional LSTM model, and the trained model is then used to detect the unknown URL http://fly-project.net//yahoo.link/Yah/T/Y.html.
1) First, word sequence vector representation is performed on the input URLs. Each URL is first segmented using the dictionary prepared in advance.
Then the words in the dictionary are numbered and each word sequence is expressed as a fixed-length vector of length N. The value of N can be obtained statistically: statistics show that more than 90 percent of URLs contain at most 13 words, so N is set to 13. The two URLs then yield the vectors (1,4,5,6,7,11,13,0,0,0,0,0,0) and (2,19,3,9,12,8,14,0,0,0,0,0,0), respectively.
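The numbering and fixed-length encoding can be sketched as below. The helper names and the toy word list are hypothetical, and two details are assumptions the text does not settle: tokens that are not in the dictionary are dropped, and sequences longer than N are truncated. Padding with 0 follows the example vectors above.

```python
def build_vocabulary(dictionary_words):
    """Number the dictionary words from 1; 0 is reserved for padding."""
    return {word: index for index, word in enumerate(dictionary_words, start=1)}

def encode(words, vocabulary, length=13):
    """Map words to their numbers, then truncate or zero-pad to a fixed length."""
    ids = [vocabulary[word] for word in words if word in vocabulary]
    ids = ids[:length]
    return ids + [0] * (length - len(ids))

vocabulary = build_vocabulary(["game", "boy", "login", "paypal"])  # toy dictionary
print(encode(["login", "paypal", "x"], vocabulary, length=5))
# [3, 4, 0, 0, 0]
```

Reserving 0 for padding is why the numbering starts at 1: a padded position can never be confused with a real word.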
The word sequence vector representations of all URLs in the sample set are obtained in the same way. The sample set contains labeled normal URLs and phishing URLs.
2) The labeled data in the word sequence vector set are used as training data and input into the word-sequence-based bidirectional LSTM model shown in Fig. 2 for training. The word sequence vector of a URL is first input into the embedding layer for dimensionality reduction and then into the bidirectional LSTM layer for learning; the learned result is passed to the dropout layer to prevent overfitting, and the final layer uses a sigmoid function to output the detection result. A label of 1 denotes a phishing URL and a label of 0 denotes a normal URL. This is in effect a binary classification problem, so the model output uses a sigmoid function for 0-1 classification.
All labeled data are input into the model as training data, and the trained model is output.
3) For unlabeled data, the vector is input into the trained model, which outputs the labeling result: an output of 1 denotes a phishing URL; otherwise the URL is normal.
Thus, as the example shows, the method does not require any manually extracted features. The URL only needs to be converted into a word sequence vector representation, and a deep neural network (a bidirectional LSTM model) automatically learns the contextual information and features in the word sequence in order to detect phishing URLs.
Its key steps are:
1) Word sequence vector representation: the URLs are first segmented; the URLs here include both labeled and unknown ones. All URLs are converted into vectors, and the model is then trained with the labeled data. A sequence padding method is used to obtain a fixed-length vector representation; "fixed-length" means that the word sequence vectors obtained for all URLs have the same length. Sequence padding handles vectors of different lengths by converting them to the same length.
2) Model training: using the vectors obtained in the previous step, the bidirectional LSTM model is trained with the labeled training data.
3) Phishing URL detection: for an unlabeled URL, its vector representation is input into the trained bidirectional LSTM model to be labeled; a label of 1 denotes a phishing URL.
In step 1), word sequence vector representation first yields a fixed-length vector representation of the URL string; the method trains on and analyzes this vector representation of the URL;
in step 2), the labeled, preprocessed data are used to train the word-sequence-based bidirectional LSTM model;
in step 3), the vector representation of the unknown URL is input into the trained bidirectional LSTM model to be labeled, which detects whether it is a phishing URL.
Detecting phishing URLs with the above method mines in depth the contextual information and features contained in the URL word sequence and performs better than machine learning models based on manually extracted word features and a deep learning model based on character strings; the detection results on the same data set are shown in Table 1.
Moreover, the method is a lightweight phishing URL detection method: with the trained model, the single-threaded prediction speed on an ordinary server is no less than 600 URLs per second, so the method meets the demand for real-time detection while improving detection accuracy.
Table 1. Comparison of the detection results of four detection models
Model                                                Precision  Recall  F1
Decision tree model based on word features           0.8803     0.8700  0.8751
Random forest model based on word features           0.8981     0.8965  0.8973
Bidirectional LSTM model based on character strings  0.9553     0.9474  0.9513
Bidirectional LSTM model based on word sequences     0.9808     0.9716  0.9762
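The F1 column can be checked against the precision and recall columns, since F1 is the harmonic mean of the two. The short Python sketch below recomputes it for each row; the row labels used as dictionary keys are shorthand introduced here, not names from the patent.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from Table 1.
results = {
    "decision tree, word features": (0.8803, 0.8700),
    "random forest, word features": (0.8981, 0.8965),
    "BiLSTM, character strings": (0.9553, 0.9474),
    "BiLSTM, word sequences": (0.9808, 0.9716),
}

for name, (precision, recall) in results.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.4f}")
```

The recomputed values (0.8751, 0.8973, 0.9513, 0.9762) agree with the F1 column of the table.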
Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Claims (9)

1. A phishing URL detection method based on word sequences, comprising the following steps:
converting labeled URLs into word sequence vectors to serve as training data;
training a classification model with the training data;
converting an unknown URL into a word sequence vector and inputting it into the trained classification model to be labeled.
2. The phishing URL detection method based on word sequences according to claim 1, characterized in that converting a labeled URL or an unknown URL into a word sequence vector includes:
filtering out the protocol and the generic top-level domain in the labeled or unknown URL;
splitting the part that remains after filtering, and segmenting each resulting substring into words by forward maximum matching against a dictionary, to obtain a word sequence;
numbering all words in the dictionary starting from 1 so that each word has a unique number, and converting the word sequence of each labeled or unknown URL into a fixed-length numeric vector.
3. The phishing URL detection method based on word sequences according to claim 2, characterized in that the protocol includes http, https, ftp, ftps, and gopher; the generic top-level domain includes com, org, net, edu, and gov.
4. The phishing URL detection method based on word sequences according to claim 2, characterized in that segmenting by forward maximum matching against the dictionary includes:
checking whether the whole string is in the dictionary and, if so, performing no further segmentation;
if not, removing the last character and checking whether the remaining string is in the dictionary;
repeating this check until a word in the dictionary is matched, then removing the matched word;
continuing the above steps on the remainder of the string until the whole string has been processed;
if the string contains no word in the dictionary, splitting it into single characters.
5. The phishing URL detection method based on word sequences according to claim 4, characterized in that the dictionary is the Google English word corpus published by Peter Norvig.
6. The phishing URL detection method based on word sequences according to claim 2, characterized in that the classification model trained with the training data is a bidirectional LSTM model based on word sequences.
7. The phishing URL detection method based on word sequences according to claim 1, characterized in that training the classification model with the training data includes:
randomly dividing the training data into a training part and a validation part, and training the bidirectional LSTM model by setting the hyperparameters of the neural network model and parameters such as the activation function.
8. The phishing URL detection method based on word sequences according to claim 7, characterized in that the bidirectional LSTM model is a four-layer neural network comprising an embedding layer, a bidirectional LSTM layer, a dropout layer, and a sigmoid layer, and training the classification model with the training data further includes applying dropout to the output of the bidirectional LSTM layer to prevent overfitting.
9. A phishing URL detection system based on word sequences, characterized by comprising:
a conversion module and a classification training model;
the conversion module converts labeled URLs into word sequence vectors that serve as training data for the classification model, and converts an unknown URL into a word sequence vector that is input into the trained classification model to be labeled.
CN201710952360.5A 2017-10-13 2017-10-13 Phishing URL detection method and system based on word sequences Pending CN107992469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710952360.5A CN107992469A (en) 2017-10-13 2017-10-13 Phishing URL detection method and system based on word sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710952360.5A CN107992469A (en) 2017-10-13 2017-10-13 Phishing URL detection method and system based on word sequences

Publications (1)

Publication Number Publication Date
CN107992469A true CN107992469A (en) 2018-05-04

Family

ID=62028932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710952360.5A Pending CN107992469A (en) 2017-10-13 2017-10-13 Phishing URL detection method and system based on word sequences

Country Status (1)

Country Link
CN (1) CN107992469A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920463A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 A kind of segmenting method and system based on network attack
CN109101552A (en) * 2018-07-10 2018-12-28 东南大学 A kind of fishing website URL detection method based on deep learning
CN109391706A (en) * 2018-11-07 2019-02-26 顺丰科技有限公司 Domain name detection method, device, equipment and storage medium based on deep learning
CN109450853A (en) * 2018-10-11 2019-03-08 深圳市腾讯计算机系统有限公司 Malicious websites determination method, device, terminal and server
CN109450845A (en) * 2018-09-18 2019-03-08 浙江大学 A kind of algorithm generation malice domain name detection method based on deep neural network
CN109522454A (en) * 2018-11-20 2019-03-26 四川长虹电器股份有限公司 The method for automatically generating web sample data
CN109561084A (en) * 2018-11-20 2019-04-02 四川长虹电器股份有限公司 URL parameter rejecting outliers method based on LSTM autoencoder network
CN110493088A (en) * 2019-09-24 2019-11-22 国家计算机网络与信息安全管理中心 A kind of mobile Internet traffic classification method based on URL
CN111125563A (en) * 2018-10-31 2020-05-08 安碁资讯股份有限公司 Method for evaluating domain name and server thereof
CN111447169A (en) * 2019-01-17 2020-07-24 中国科学院信息工程研究所 Method and system for identifying malicious webpage in real time on gateway
CN112948725A (en) * 2021-03-02 2021-06-11 北京六方云信息技术有限公司 Phishing website URL detection method and system based on machine learning
CN113051500A (en) * 2021-03-25 2021-06-29 武汉大学 Phishing website identification method and system fusing multi-source data
CN114650152A (en) * 2020-12-17 2022-06-21 中国科学院计算机网络信息中心 Method and system for detecting vulnerability of super computing center
CN116633684A (en) * 2023-07-19 2023-08-22 中移(苏州)软件技术有限公司 Phishing detection method, system, electronic device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120158626A1 (en) * 2010-12-15 2012-06-21 Microsoft Corporation Detection and categorization of malicious urls
CN102790762A (en) * 2012-06-18 2012-11-21 东南大学 Phishing website detection method based on uniform resource locator (URL) classification
CN105956472A (en) * 2016-05-12 2016-09-21 宝利九章(北京)数据技术有限公司 Method and system for identifying whether webpage includes malicious content or not
CN106776946A (en) * 2016-12-02 2017-05-31 重庆大学 A kind of detection method of fraudulent website
CN107180077A (en) * 2017-04-18 2017-09-19 北京交通大学 A kind of social networks rumour detection method based on deep learning

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920463A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 A kind of segmenting method and system based on network attack
CN109101552A (en) * 2018-07-10 2018-12-28 东南大学 Phishing website URL detection method based on deep learning
CN109101552B (en) * 2018-07-10 2022-01-28 东南大学 Phishing website URL detection method based on deep learning
CN109450845A (en) * 2018-09-18 2019-03-08 浙江大学 Detection method for algorithmically generated malicious domain names based on deep neural network
CN109450853A (en) * 2018-10-11 2019-03-08 深圳市腾讯计算机系统有限公司 Malicious websites determination method, device, terminal and server
CN109450853B (en) * 2018-10-11 2022-02-18 深圳市腾讯计算机系统有限公司 Malicious website determination method and device, terminal and server
CN111125563A (en) * 2018-10-31 2020-05-08 安碁资讯股份有限公司 Method for evaluating domain name and server thereof
CN109391706A (en) * 2018-11-07 2019-02-26 顺丰科技有限公司 Domain name detection method, device, equipment and storage medium based on deep learning
CN109522454A (en) * 2018-11-20 2019-03-26 四川长虹电器股份有限公司 The method for automatically generating web sample data
CN109561084A (en) * 2018-11-20 2019-04-02 四川长虹电器股份有限公司 URL parameter outlier detection method based on LSTM autoencoder network
CN111447169B (en) * 2019-01-17 2021-06-08 中国科学院信息工程研究所 Method and system for identifying malicious webpage in real time on gateway
CN111447169A (en) * 2019-01-17 2020-07-24 中国科学院信息工程研究所 Method and system for identifying malicious webpage in real time on gateway
CN110493088A (en) * 2019-09-24 2019-11-22 国家计算机网络与信息安全管理中心 A kind of mobile Internet traffic classification method based on URL
CN114650152A (en) * 2020-12-17 2022-06-21 中国科学院计算机网络信息中心 Method and system for detecting vulnerability of super computing center
CN114650152B (en) * 2020-12-17 2023-06-20 中国科学院计算机网络信息中心 Super computing center vulnerability detection method and system
CN112948725A (en) * 2021-03-02 2021-06-11 北京六方云信息技术有限公司 Phishing website URL detection method and system based on machine learning
CN113051500A (en) * 2021-03-25 2021-06-29 武汉大学 Phishing website identification method and system fusing multi-source data
CN113051500B (en) * 2021-03-25 2022-08-16 武汉大学 Phishing website identification method and system fusing multi-source data
CN116633684A (en) * 2023-07-19 2023-08-22 中移(苏州)软件技术有限公司 Phishing detection method, system, electronic device and readable storage medium
CN116633684B (en) * 2023-07-19 2023-10-13 中移(苏州)软件技术有限公司 Phishing detection method, system, electronic device and readable storage medium

Similar Documents

Publication Publication Date Title
CN107992469A (en) Phishing URL detection method and system based on word sequence
CN104899508B (en) A multistage phishing website detection method and system
CN109005145A (en) A malicious URL detection system and method based on automated feature extraction
CN103559235B (en) A kind of online social networks malicious web pages detection recognition methods
CN101820366B (en) Pre-fetching-based phishing web page detection method
CN109450845B (en) Detection method for generating malicious domain name based on deep neural network algorithm
CN107786575A (en) A kind of adaptive malice domain name detection method based on DNS flows
CN106940732A (en) A method for discovering suspected paid posters ("water army") on microblogs
CN103500175B (en) A kind of method based on sentiment analysis on-line checking microblog hot event
CN103313248B (en) Method and device for identifying junk information
CN105072214B (en) C&C domain name recognition methods based on domain name feature
CN109413028A (en) SQL injection detection method based on convolutional neural networks algorithm
CN107566376A (en) Threat intelligence generation method, apparatus and system
CN103136358B (en) A kind of method of Automatic Extraction forum data
CN103577755A (en) Malicious script static detection method based on SVM (support vector machine)
CN109657470A (en) Malicious web pages detection model training method, malicious web pages detection method and system
CN107566391A (en) Method for detecting hidden links in webpages with a machine learning model built on domain identification plus topic identification
CN106874253A (en) Recognize the method and device of sensitive information
CN103577556A (en) Device and method for obtaining association degree of question and answer pair
CN110134876A (en) A cyberspace mass-incident perception and detection method based on crowd-intelligence sensors
CN110830489B (en) Method and system for detecting adversarial fraud websites based on content abstract representation
CN113422761B (en) Malicious social user detection method based on adversarial learning
CN107463844A (en) WEB Trojan detecting methods and system
CN109714356A (en) A kind of recognition methods of abnormal domain name, device and electronic equipment
CN107819790A (en) The recognition methods of attack message and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180504