WO2020082763A1 - Decision-tree-based method and apparatus for detecting a phishing website, and computer device - Google Patents

Decision-tree-based method and apparatus for detecting a phishing website, and computer device

Info

Publication number
WO2020082763A1
Authority
WO
WIPO (PCT)
Prior art keywords
website
url
detected
information
phishing
Prior art date
Application number
PCT/CN2019/091878
Other languages
English (en)
Chinese (zh)
Inventor
谭杰
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020082763A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic: service impersonation, e.g. phishing, pharming or web spoofing

Definitions

  • The present application relates to the field of intelligent decision-making technology, and in particular to a method, an apparatus, and a computer device for detecting a phishing website based on decision trees.
  • A phishing website is one in which criminals spoof the address and page content of a real website, or exploit vulnerabilities in the server program of a real website to insert dangerous HTML code into certain pages of that site, in order to trick users into disclosing private information such as bank or credit card account numbers and passwords.
  • Among the phishing website detection methods in the related art, one scheme determines whether the website to be detected is a phishing website by comparing its domain name information and content identification information with those of a target website.
  • However, detection based on comparison with a target website has low accuracy.
  • The purpose of this application is to provide a method, an apparatus, and a computer device for detecting a phishing website based on decision trees, so as to solve the problems in the prior art.
  • The present application provides a method for detecting a phishing website based on decision trees, including the following steps:
  • Step 01: Construct a random forest in advance, the constructed random forest including several decision trees;
  • Step 02: Determine the webpage information of the website to be detected;
  • Step 03: Extract the feature information of the website to be detected according to the webpage information of the website to be detected;
  • Step 04: Use each decision tree of the random forest to classify and vote on the extracted feature information;
  • Step 05: When the result of the classification vote is that the phishing website receives more votes than the normal website, determine that the website to be detected is a phishing website;
  • wherein the construction of the random forest includes:
  • Step 011: Randomly select n samples from the sample set with replacement, the sample set including webpage information of several phishing websites and webpage information of several normal websites;
  • Step 012: Randomly select k items of feature information from all the set feature information, and use the k randomly selected items to build a decision tree on the selected n samples;
  • Step 013: Repeat steps 011-012 m times to generate m decision trees, the m generated decision trees constituting a random forest;
  • where n, k, and m are all positive integers.
  • The present application also provides a phishing website detection device based on decision trees, including:
  • a random forest construction module, used to construct a random forest in advance to obtain a random forest classifier, the constructed random forest including several decision trees;
  • a webpage information determination module, used to determine the webpage information of the website to be detected;
  • a feature information extraction module, used to extract feature information of the website to be detected according to the webpage information of the website to be detected;
  • the random forest classifier, used to classify and vote on the extracted feature information by using each decision tree of the random forest;
  • a detection result determination module, used to determine that the website to be detected is a phishing website when the result of the classification vote is that the phishing website receives more votes than the normal website;
  • wherein the random forest construction module is specifically used to construct a random forest as follows:
  • Step 011 randomly selecting n samples from the sample set with replacement, and the sample set includes webpage information of several phishing websites and webpage information of several normal websites;
  • Step 012 randomly select k feature information from all the set feature information, and use the randomly selected k feature information to build a decision tree on the selected n samples;
  • Step 013 repeat steps 011-012 m times to generate m decision trees, and the generated m decision trees constitute a random forest;
  • n, k, m are all positive integers.
  • The present application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the following steps are implemented:
  • Step 01 Construct a random forest in advance, and the constructed random forest includes several decision trees;
  • Step 02 Determine the webpage information of the website to be tested
  • Step 03 Extract the feature information of the website to be detected according to the webpage information of the website to be detected;
  • Step 04 Use each decision tree of the random forest to classify and vote on the extracted feature information
  • Step 05 When the classified voting result is that the number of votes of the phishing website is more than the number of votes of the normal website, it is determined that the website to be detected is a phishing website;
  • Step 01 includes:
  • Step 011 randomly selecting n samples from the sample set with replacement, and the sample set includes webpage information of several phishing websites and webpage information of several normal websites;
  • Step 012 randomly select k feature information from all the set feature information, and use the randomly selected k feature information to build a decision tree on the selected n samples;
  • Step 013 repeat steps 011-012 m times to generate m decision trees, and the generated m decision trees constitute a random forest;
  • n, k, m are all positive integers.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps of a method for detecting a phishing website based on a decision tree are implemented:
  • Step 01 Construct a random forest in advance, and the constructed random forest includes several decision trees;
  • Step 02 Determine the webpage information of the website to be tested
  • Step 03 Extract the feature information of the website to be detected according to the webpage information of the website to be detected;
  • Step 04 Use each decision tree of the random forest to classify and vote on the extracted feature information
  • Step 05 When the classified voting result is that the number of votes of the phishing website is more than the number of votes of the normal website, it is determined that the website to be detected is a phishing website;
  • step 01 includes:
  • Step 011 randomly selecting n samples from the sample set with replacement, and the sample set includes webpage information of several phishing websites and webpage information of several normal websites;
  • Step 012 randomly select k feature information from all the set feature information, and use the randomly selected k feature information to build a decision tree on the selected n samples;
  • Step 013 repeat steps 011-012 m times to generate m decision trees, and the generated m decision trees constitute a random forest;
  • n, k, m are all positive integers.
  • With the method, apparatus, and computer device for detecting a phishing website based on decision trees provided by this application, the webpage information of the website to be detected is determined, the feature information of the website is extracted from that webpage information, and the extracted feature information is classified and voted on by a pre-constructed random forest that includes several decision trees.
  • When the result of the classification vote is that the phishing website receives more votes than the normal website, the website to be detected can be determined to be a phishing website.
  • The random forest is constructed by building decision trees from a large number of website samples that cover diverse types of phishing websites, so classification by random forest voting achieves a high accuracy rate.
  • FIG. 1 is a flowchart of Embodiment 1 of a method for detecting a phishing website based on a decision tree;
  • FIG. 2 is a schematic diagram of the program modules of Embodiment 1 of the decision-tree-based phishing website detection device of this application;
  • FIG. 3 is a schematic diagram of another program module of the first embodiment of a phishing website detection device based on a decision tree of the present application;
  • FIG. 4 is a schematic diagram of a hardware structure of a first embodiment of a phishing website detection device based on a decision tree according to the present application;
  • FIG. 5 is a flowchart of Embodiment 2 of a method for detecting a phishing website based on a decision tree in this application.
  • The method, apparatus, and computer device for detecting a phishing website based on decision trees are applicable to the field of intelligent decision-making technology; they detect whether a website is a phishing website through classification voting by each decision tree in a random forest.
  • This application determines the webpage information of the website to be detected, extracts the feature information of the website from that webpage information, and uses a constructed random forest that includes several decision trees to classify and vote on the extracted feature information.
  • When the voting result is that the phishing website receives more votes than the normal website, the website to be detected can be determined to be a phishing website.
  • The random forest is constructed by building decision trees from a large number of website samples.
  • The types of phishing websites included are diverse, so classification by random forest voting achieves a high accuracy rate.
  • The method for detecting a phishing website based on decision trees in this embodiment includes the following steps:
  • Step 01: Construct a random forest in advance, the constructed random forest including several decision trees.
  • A random forest is a classifier that contains multiple decision trees: the trees are trained on samples and then used for prediction.
  • The output category is the mode of the categories output by the individual trees.
  • A random forest can therefore be used to detect whether the website to be detected is a phishing website.
  • Step 011: n samples are randomly selected from the sample set with replacement, and the sample set includes webpage information of several phishing websites and webpage information of several normal websites.
  • The webpage information included in the sample set may be a URL.
  • The phishing websites included in the sample set can be collected and accumulated through experience, or they can be obtained from the blacklist of the Google Safe API.
  • The Google Safe API blacklist includes several URLs, and the websites corresponding to these URLs are all phishing websites. Therefore, when constructing the sample set, all or part of the URLs in the Google Safe API blacklist can be added to the sample set.
  • To make the voting classification of the decision trees built from the sample set more accurate, the sample set also needs to include the webpage information of several normal websites, which can be obtained from websites that are not on the Google Safe API blacklist.
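  • The sampling in step 011 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `bootstrap_sample`, the example URLs, and the fixed random seed are all assumptions made for the example. The function also returns the indices that were never drawn, since those samples can later serve as test data.

```python
import random

def bootstrap_sample(sample_set, n, rng):
    """Draw n samples with replacement; also return the indices never drawn."""
    drawn = [rng.randrange(len(sample_set)) for _ in range(n)]
    drawn_set = set(drawn)
    unsampled = [i for i in range(len(sample_set)) if i not in drawn_set]
    return [sample_set[i] for i in drawn], unsampled

# Hypothetical labelled URLs: 1 = phishing website, 0 = normal website.
sample_set = [
    ("http://192.0.2.7/login", 1),
    ("http://paypal.example@evil.example/", 1),
    ("https://www.example.org/", 0),
    ("https://docs.example.com/help", 0),
]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
samples, unsampled = bootstrap_sample(sample_set, n=4, rng=rng)
print(len(samples), "samples drawn;", len(unsampled), "never drawn")
```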
  • Step 012 randomly select k feature information from all the set feature information, and use the randomly selected k feature information to establish a decision tree for the selected n samples.
  • the data input to the classifier is the feature information, and the decision of each node in the established decision tree is determined based on these feature information.
  • The feature information includes at least one of the following:
  • (1) Whether the URL is in IP format. Each information resource has a uniform and unique address on the Internet.
  • This address is called a URL (Uniform Resource Locator), the uniform resource locator of the WWW, that is, a network address.
  • The most commonly used transfer protocol for URLs is HTTP, which is currently the most widely used protocol on the WWW.
  • Common URL formats include the http, file, ftp, and gopher formats. According to experience, when the URL is in IP format, the website corresponding to the URL may be a phishing website.
  • (2) Whether the time period for which the URL domain name has existed is less than a set number of days. The time period for which the URL domain name has existed can also be used as feature information for building the decision tree.
  • The set number of days can be 30 days.
  • (3) Whether the URL contains the @ character. When the URL contains the @ character, the website may be a phishing website. Therefore, whether the URL contains the @ character is also used as feature information for building the decision tree.
  • (4) Whether the URL includes at least two domain names. Some phishing websites use multiple domain names as a disguise; for example, clicking the URL of a website triggers several intermediate jumps. Therefore, when the URL includes at least two domain names, the website corresponding to the URL may be a phishing website.
  • (5) Whether account and password information is included in the form. The purpose of a phishing website is generally to steal users' account and password information. Therefore, if the form of the website includes account and password information, the website may be a phishing website, and whether the form includes such information is used as feature information for building the decision tree.
  • (6) Whether the value after the URL jump is the same as before the jump. For example, if the value before the URL jump is "Taobao" but the value after the URL jump is not "Taobao", the website may be a phishing website that uses the value before the jump to deceive the user.
  • The value after the URL jump can be obtained by parsing the opened web page.
  • When building the decision tree, the best split may be calculated from the k randomly selected items of feature information.
  • Splitting refers to repeatedly dividing the training data set into two sub-data sets during the training of the decision tree.
  • The number k of randomly selected items of feature information may be the square root of N rounded to an integer, where N is the number of all the set feature information.
  • The square root of N may be rounded up or rounded down, as preset in advance.
  • For example, if the number N of all the set feature information is 10, the square root of 10 is approximately 3.16.
  • With rounding up, the result is 4, so the number k of randomly selected items of feature information is 4.
  • With rounding down, the result is 3, so the number k of randomly selected items of feature information is 3.
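  • The choice of k can be checked with a short calculation. The sketch below assumes a helper named `features_per_tree` (a hypothetical name) and a flag that selects the preset rounding direction; it reproduces the N = 10 example.

```python
import math

def features_per_tree(total_features, round_up):
    """Return k, the square root of N rounded up or down as preset."""
    root = math.sqrt(total_features)
    return math.ceil(root) if round_up else math.floor(root)

print(features_per_tree(10, round_up=True))   # 4
print(features_per_tree(10, round_up=False))  # 3
```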
  • Step 013 Repeat steps 011-012 m times to generate m decision trees.
  • the generated m decision trees form a random forest; where m is a positive integer.
  • a random forest composed of m decision trees is a random forest classifier.
  • Each split of a decision tree in the random forest does not consider all the candidate feature information. Instead, a certain number of items are selected randomly from all the candidate feature information, and the best item is then chosen from that selection. This makes the decision trees in the random forest differ from one another, which increases the diversity of the system and thus improves classification performance.
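  • Steps 011-013 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the application's implementation: features are already 0/1 values, the split criterion is Gini impurity (the text does not name one), and the names `gini`, `build_tree`, `predict_tree`, and `build_forest` as well as the toy data are hypothetical. The sketch draws the k features once per tree, as step 012 states; a per-split variant would draw the subset inside `build_tree` instead.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """Recursively split on the 0/1 feature with the lowest weighted Gini."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority                      # leaf node: a class label

    def split_cost(f):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)

    best = min(features, key=split_cost)
    rest = [f for f in features if f != best]
    branches = {}
    for value in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[best] == value]
        if idx:
            branches[value] = build_tree(
                [rows[i] for i in idx], [labels[i] for i in idx], rest)
        else:
            branches[value] = majority       # empty side: fall back to majority
    return (best, branches)

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree

def build_forest(rows, labels, n, k, m, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(m):                                       # step 013
        idx = [rng.randrange(len(rows)) for _ in range(n)]   # step 011
        features = rng.sample(range(len(rows[0])), k)        # step 012
        forest.append(build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest

# Hypothetical 6-feature training vectors; 1 = phishing, 0 = normal.
rows = [[1, 1, 0, 0, 1, 1], [0, 1, 1, 1, 1, 1], [1, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0]]
labels = [1, 1, 1, 0, 0, 0]
forest = build_forest(rows, labels, n=len(rows), k=3, m=5, seed=1)
print(len(forest), "trees built")
```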
  • The samples in the sample set that were not drawn can be used as test data to verify the accuracy of the random forest classifier. For example, the webpage information of several undrawn websites is selected from the sample set; the type of each of these websites (phishing website or normal website) is known. The feature information of each website is extracted and input into the random forest classifier, and the type detected by the classifier is compared with the true type. If the accuracy rate exceeds the set probability, the accuracy of the random forest classifier meets the requirements and the classifier can be used.
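  • The verification on undrawn samples can be sketched as follows. The function name `accuracy_on_unsampled`, the stand-in classifier, the held-out vectors, and the 0.7 threshold are all assumptions made for illustration; in practice the classifier would be the trained random forest.

```python
def accuracy_on_unsampled(classify, held_out, threshold):
    """Return (accuracy, passes) for a classifier on samples not drawn in training."""
    correct = sum(1 for features, label in held_out if classify(features) == label)
    accuracy = correct / len(held_out)
    return accuracy, accuracy > threshold

# Stand-in classifier (hypothetical): flags a site when two or more features fire.
def classify(features):
    return 1 if sum(features) >= 2 else 0

# Hypothetical held-out samples: (feature vector, true type).
held_out = [
    ([1, 1, 0, 0, 0, 1], 1),
    ([0, 0, 0, 0, 0, 0], 0),
    ([0, 0, 1, 0, 1, 0], 1),
    ([0, 0, 0, 0, 0, 1], 1),
]

accuracy, passes = accuracy_on_unsampled(classify, held_out, threshold=0.7)
print(accuracy, passes)  # 0.75 True
```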
  • Step 02 Determine the webpage information of the website to be detected.
  • the determined webpage information of the website to be detected may be a URL.
  • In an embodiment, a blacklist of phishing websites may be constructed in advance, where the blacklist includes the URLs of several websites, all of which have been determined to be phishing websites.
  • After step 02, the URL of the website to be detected can be obtained from its webpage information and compared with the pre-built blacklist. If the blacklist includes the URL of the website to be detected, the website is determined to be a phishing website; if the blacklist does not include the URL, the website needs to be checked further and step 03 is performed.
  • The blacklist may be the blacklist of the Google Safe API.
  • Step 03 Extract the feature information of the website to be detected according to the webpage information of the website to be detected.
  • The extracted feature information may be the same as the feature information in step 012, and may include at least one of the following: (1) whether the URL is in IP format; (2) whether the time period for which the URL domain name has existed is less than the set number of days; (3) whether the URL contains the @ character; (4) whether the URL includes at least two domain names; (5) whether account and password information is included in the form; (6) whether the value after the URL jump is the same as before the jump.
  • In an example, the feature information extracted for the website to be detected is the above six items.
  • The extracted feature information may be Booleanized and converted into corresponding feature values. For example, for the above six items of feature information:
  • if the URL of the website to be detected is in IP format, it is converted to feature value 1; otherwise it is converted to feature value 0;
  • if the time period for which the URL domain name of the website to be detected has existed is less than the set number of days, it is converted to feature value 1; otherwise it is converted to feature value 0;
  • if the URL contains the @ character, it is converted to feature value 1; otherwise it is converted to feature value 0;
  • if the URL includes at least two domain names, it is converted to feature value 1; otherwise it is converted to feature value 0;
  • if account and password information is included in the form, it is converted to feature value 1; otherwise it is converted to feature value 0;
  • if the value after the URL jump is not the same as before the jump, it is converted to feature value 1; otherwise it is converted to feature value 0.
  • The six feature values can also be combined into a feature vector.
  • For example, suppose the six extracted items of feature information are: the URL is not in IP format; the URL domain name has existed for not less than the set number of days; the URL does not contain the @ character; the URL includes one domain name; the form does not include account and password information; and the value after the URL jump is not the same as before the jump. The converted feature vector is then [0,0,0,0,0,1].
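  • The Booleanization can be sketched as below. Only the IP-format and @ checks are computed from the URL itself; the domain age, the number of domain names, the form contents, and the values before and after the jump are passed in as arguments, because the text does not specify how they are obtained. The function names and all parameter names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

def is_ip_host(url):
    """True if the host part of the URL is an IP address rather than a name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def booleanize(url, domain_age_days, domain_count, form_has_credentials,
               value_before_jump, value_after_jump, min_age_days=30):
    """Convert the six items of feature information into a 0/1 feature vector."""
    return [
        int(is_ip_host(url)),                        # (1) URL in IP format
        int(domain_age_days < min_age_days),         # (2) domain younger than set days
        int("@" in url),                             # (3) URL contains '@'
        int(domain_count >= 2),                      # (4) at least two domain names
        int(form_has_credentials),                   # (5) account/password in form
        int(value_after_jump != value_before_jump),  # (6) value changed after jump
    ]

vector = booleanize("https://www.example.org/", domain_age_days=400,
                    domain_count=1, form_has_credentials=False,
                    value_before_jump="Taobao", value_after_jump="Other")
print(vector)  # [0, 0, 0, 0, 0, 1]
```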
  • Step 04 Use each decision tree of the random forest to classify and vote on the extracted feature information.
  • Step 05 When the result of the classification vote is that the number of votes of the phishing website is more than the number of votes of the normal website, it is determined that the website to be detected is a phishing website.
  • If the result of the classification vote is that the phishing website receives more votes than the normal website, the website to be detected is determined to be a phishing website; if the phishing website receives fewer votes than the normal website, the website to be detected is determined to be a normal website.
  • The random forest classifier reports whether the website is a phishing website or a normal website using the type identifiers 1 and 0, where output 1 indicates a phishing website and output 0 indicates a normal website.
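  • The vote count in steps 04-05 can be sketched as follows, assuming each decision tree has already produced a 0 or 1 output. The function name `classify_by_vote` is hypothetical. The text does not say what happens on a tie; this sketch treats a tie as a normal website, which is an assumption, and choosing an odd number of trees m avoids ties altogether.

```python
def classify_by_vote(votes):
    """votes: one 0/1 output per decision tree; 1 = phishing, 0 = normal."""
    phishing = sum(1 for v in votes if v == 1)
    normal = len(votes) - phishing
    # Assumption: a tie is not "more votes for phishing", so it returns 0.
    return 1 if phishing > normal else 0

print(classify_by_vote([1, 1, 0]))  # 1
print(classify_by_vote([0, 0, 1]))  # 0
```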
  • In an embodiment, the method may further include: when the website to be detected is determined to be a phishing website according to the voting result, adding the URL of the website to be detected to the pre-built blacklist.
  • In this embodiment, the feature information of the website to be detected is extracted from its webpage information, and a constructed random forest that includes several decision trees classifies and votes on the extracted feature information.
  • The random forest is constructed by building decision trees from a large number of website samples. The types of phishing websites included are diverse, so classification by random forest voting achieves a high accuracy rate.
  • The decision-tree-based phishing website detection device 10 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to carry out the present application and implement the above decision-tree-based phishing website detection method.
  • A program module, as referred to in this application, is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself to describing how the decision-tree-based phishing website detection device 10 executes in the storage medium. The following description introduces the functions of the program modules of this embodiment:
  • the random forest construction module 11 is used to construct a random forest in advance to obtain a random forest classifier 14, and the constructed random forest includes several decision trees;
  • the webpage information determination module 12 is used to determine the webpage information of the website to be detected
  • the feature information extraction module 13 is configured to extract feature information of the website to be detected according to the webpage information of the website to be detected;
  • the random forest classifier 14 is used to classify and vote on the extracted feature information by using each decision tree of the random forest;
  • the detection result determination module 15 is used to determine that the website to be detected is a phishing website when the classified voting result is that the number of votes of the phishing website is more than the number of votes of the normal website;
  • The random forest construction module 11 is specifically used to: randomly select n samples from the sample set with replacement, the sample set including webpage information of several phishing websites and webpage information of several normal websites; randomly select k items of feature information from all the set feature information, and use the k randomly selected items to build a decision tree on the selected n samples; and repeat the above steps m times to generate m decision trees, which constitute a random forest; where n, k, and m are all positive integers.
  • The decision-tree-based phishing website detection device 10 may further include a Boolean processing module 16, which is used to Booleanize the extracted feature information, convert it into corresponding feature values, and output the converted feature values to the random forest classifier 14.
  • The decision-tree-based phishing website detection device 10 may further include a primary detection module 17, which is used to obtain the URL of the website to be detected from its webpage information and compare that URL with a pre-built blacklist. If the blacklist includes the URL of the website to be detected, the website is determined to be a phishing website; if the blacklist does not include the URL, the webpage information of the website to be detected is output to the feature information extraction module 13.
  • The decision-tree-based phishing website detection device 10 may further include a blacklist adding module 18, which is used to add the URL of the website to be detected to the pre-built blacklist when the website is determined to be a phishing website according to the voting result.
  • This embodiment also provides a computer device, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, or a tower server (including an independent server or a server cluster composed of multiple servers).
  • The computer device 20 of this embodiment includes at least, but is not limited to, a memory 21 and a processor 22 that can be communicatively connected to each other through a system bus, as shown in FIG. 4. Note that FIG. 4 shows the computer device 20 with only the components 21-22; not all of the components shown need to be implemented, and more or fewer components may be implemented instead.
  • The memory 21 (that is, a readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on.
  • the memory 21 may be an internal storage unit of the computer device 20, such as a hard disk or memory of the computer device 20.
  • The memory 21 may also be an external storage device of the computer device 20, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 20.
  • the memory 21 may also include both the internal storage unit of the computer device 20 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 20, such as the program code of the phishing website detection apparatus 10 based on the decision tree in the first embodiment.
  • the memory 21 may also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 22 is generally used to control the overall operation of the computer device 20.
  • The processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the decision-tree-based phishing website detection device 10, so as to implement the decision-tree-based phishing website detection method of Embodiment 1.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, or App store, on which a computer program is stored; when the program is executed by a processor, the corresponding function is realized.
  • The computer-readable storage medium of this embodiment is used to store the decision-tree-based phishing website detection device 10, which, when executed by a processor, implements the decision-tree-based phishing website detection method of Embodiment 1.
  • the method for detecting a phishing website based on a decision tree in this embodiment is based on Embodiment 1, and includes the following steps:
  • Step 01 Construct a random forest.
  • the constructed random forest includes several decision trees.
  • Step 011: n samples are randomly selected from the sample set with replacement, and the sample set includes webpage information of several phishing websites and webpage information of several normal websites.
  • the webpage information included in the sample set may be a URL.
  • Step 012 randomly select k feature information from all the set feature information, and use the randomly selected k feature information to establish a decision tree for the selected n samples.
  • The feature information includes at least one of the following: (1) whether the URL is in IP format; (2) whether the time period for which the URL domain name has existed is less than the set number of days; (3) whether the URL contains the @ character; (4) whether the URL includes at least two domain names; (5) whether account and password information is included in the form; (6) whether the value after the URL jump is the same as before the jump.
  • Step 013. Repeat steps 011-012 m times to generate m decision trees.
  • the generated m decision trees form a random forest; where m is a positive integer.
  • Step 02 Pre-construct a blacklist including webpage information of several phishing websites.
  • the blacklist may be a blacklist of Google Safe API.
  • Step 03 Determine the webpage information of the website to be detected.
  • Step 04 Obtain the URL of the website to be detected according to the webpage information of the website to be detected, and compare that URL with the pre-built blacklist. If the blacklist includes the URL of the website to be detected, the website to be detected is determined to be a phishing website; if the URL of the website to be detected is not included in the blacklist, step 05 is performed.
  • Step 05 Extract the feature information of the website to be detected according to the webpage information of the website to be detected.
  • The extracted feature information may be the same as the feature information in step 012.
  • Step 06 Booleanize the extracted feature information to convert it into corresponding feature values.
  • If the URL is in IP format, it is converted to feature value 1, and if the URL is not in IP format, it is converted to feature value 0; if the URL domain name of the website to be detected has existed for less than the set number of days, it is converted to feature value 1, and if it has existed for not less than the set number of days, it is converted to feature value 0; if the URL contains the @ character, it is converted to feature value 1, and if the URL does not contain the @ character, it is converted to feature value 0; if the URL includes at least two domain names, it is converted to feature value 1, and if the URL does not include at least two domain names, it is converted to feature value 0; if account password information is included in the form, it is converted to feature value 1, and if account password information is not included in the form, it is converted to feature value 0; if the value after the URL jump is not the same as before the jump, it is converted to feature value 1, and if the value after the URL jump is the same as before the jump, it is converted to feature value 0.
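  • The Booleanization in step 06 could look roughly like the sketch below. Several points are assumptions and not taken from the text: the function name `booleanize`, the 30-day default for the "set number of days", and the heuristic for "at least two domain names" (counting `://` markers). The domain age, the form check, and the URL after the jump are passed in as arguments because obtaining them requires lookups that the text does not describe.

```python
import ipaddress
from urllib.parse import urlsplit


def booleanize(url, domain_age_days, form_has_credentials, final_url,
               min_age_days=30):
    """Return the six feature values (each 0 or 1) in the order listed above."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for non-IP hosts
        is_ip = 1
    except ValueError:
        is_ip = 0

    return [
        is_ip,                                    # (1) URL is in IP format
        1 if domain_age_days < min_age_days else 0,  # (2) young domain
        1 if "@" in url else 0,                   # (3) contains '@'
        1 if url.count("://") >= 2 else 0,        # (4) assumed heuristic
        1 if form_has_credentials else 0,         # (5) form asks for password
        1 if final_url != url else 0,             # (6) URL changed after jump
    ]
```

  • For example, `booleanize("http://paypal.com@192.168.0.1/login", 3, True, "http://evil.example/landing")` yields `[1, 1, 1, 0, 1, 1]`: the real host is the IP address after the `@`, the domain is three days old, the form requests credentials, and the URL changes after the jump.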
  • Step 07 Use each decision tree of the random forest to classify and vote on the extracted feature information.
  • Step 08 When the voting result is that the number of votes for a phishing website is greater than the number of votes for a normal website, the website to be detected is determined to be a phishing website; otherwise, it is determined to be a normal website. If it is a phishing website, the user is warned about the website and step 09 is performed; if it is a normal website, the user is prompted to visit the website.
  • The random forest classifier indicates whether the website is a phishing website or a normal website by using the type identifiers 1 and 0, where output 1 indicates that the website is a phishing website and output 0 indicates that the website is a normal website.
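  • The voting rule of steps 07-08 can be illustrated with a few lines of code. In this sketch each tree is modelled as a plain callable that maps a feature vector to 1 or 0; that representation and the name `forest_vote` are assumptions made for illustration only.

```python
def forest_vote(trees, features):
    """Return 1 (phishing) or 0 (normal) by majority vote over the trees.

    `trees` is any iterable of callables mapping a feature vector to 0 or 1.
    """
    votes = [tree(features) for tree in trees]
    phishing_votes = sum(1 for v in votes if v == 1)
    normal_votes = len(votes) - phishing_votes
    # Phishing only when its votes strictly outnumber the normal votes;
    # a tie therefore falls under "otherwise" and yields a normal website.
    return 1 if phishing_votes > normal_votes else 0
```

  • Choosing an odd number of trees m avoids ties altogether.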
  • Step 09 Add the URL of the website to be detected to the pre-built blacklist.
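  • Taken together, steps 03-09 form a short control flow: check the blacklist first, fall back to the classifier, and feed newly detected phishing URLs back into the blacklist. The sketch below shows only that flow. The blacklist is modelled as an in-memory `set`, and `extract_features` and `classify` are hypothetical callables supplied by the caller, standing in for steps 05-06 and steps 07-08.

```python
def detect(url, blacklist, extract_features, classify):
    """Return 1 if `url` is judged to be a phishing website, otherwise 0."""
    if url in blacklist:               # step 04: known phishing URL
        return 1
    features = extract_features(url)   # steps 05-06: extract and Booleanize
    label = classify(features)         # steps 07-08: random forest vote
    if label == 1:
        blacklist.add(url)             # step 09: remember the new phishing URL
    return label
```

  • Because of step 09, a second request for the same phishing URL is answered by the blacklist lookup alone, without running the classifier again.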

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a decision tree-based method and apparatus for detecting a phishing website, and a computer device, which belong to the technical field of intelligent decision-making. The method comprises: pre-constructing a random forest as a classification model; determining webpage information of a website to be detected, and extracting feature information of the website according to its webpage information; using the constructed random forest, which comprises several decision trees, to perform classification voting on the extracted feature information; and, when the result of the classification voting indicates that the votes for a phishing website outnumber the votes for a normal website, determining that the website is a phishing website. In the present invention, the random forest is constructed by building decision trees from a large number of website samples and covers various types of phishing websites; the random forest is used as the classification model to perform classification voting, which makes it possible to achieve high accuracy.
PCT/CN2019/091878 2018-10-26 2019-06-19 Decision trees-based method and apparatus for detecting phishing website, and computer device WO2020082763A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811256189.5 2018-10-26
CN201811256189.5A CN109450880A (zh) 2018-10-26 2018-10-26 基于决策树的钓鱼网站检测方法、装置及计算机设备

Publications (1)

Publication Number Publication Date
WO2020082763A1 true WO2020082763A1 (fr) 2020-04-30

Family

ID=65548383

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091878 WO2020082763A1 (fr) 2018-10-26 2019-06-19 Decision trees-based method and apparatus for detecting phishing website, and computer device

Country Status (2)

Country Link
CN (1) CN109450880A (fr)
WO (1) WO2020082763A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109450880A (zh) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 基于决策树的钓鱼网站检测方法、装置及计算机设备
CN110061975A (zh) * 2019-03-29 2019-07-26 中国科学院计算技术研究所 一种基于离线流量包解析的仿冒网站识别方法及系统
CN113676374B (zh) * 2021-08-13 2024-03-22 杭州安恒信息技术股份有限公司 目标网站线索检测方法、装置、计算机设备和介质
CN115001763B (zh) * 2022-05-20 2024-03-19 北京天融信网络安全技术有限公司 钓鱼网站攻击检测方法、装置、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049484A (zh) * 2012-11-30 2013-04-17 北京奇虎科技有限公司 一种网页危险性的识别方法和装置
WO2013106354A1 (fr) * 2012-01-12 2013-07-18 Microsoft Corporation Apprentissage automatique basé sur une classification de comptes utilisateur reposant sur des adresses électroniques et d'autres informations de comptes
CN108306878A (zh) * 2018-01-30 2018-07-20 平安科技(深圳)有限公司 钓鱼网站检测方法、装置、计算机设备和存储介质
CN109450880A (zh) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 基于决策树的钓鱼网站检测方法、装置及计算机设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157280B2 (en) * 2009-09-23 2018-12-18 F5 Networks, Inc. System and method for identifying security breach attempts of a website
CN104217160B (zh) * 2014-09-19 2017-11-28 中国科学院深圳先进技术研究院 一种中文钓鱼网站检测方法及系统
CN107404473A (zh) * 2017-06-06 2017-11-28 西安电子科技大学 基于Mshield机器学习多模式Web应用防护方法
CN107566389A (zh) * 2017-09-19 2018-01-09 济南互信软件有限公司 一种基于c4.5决策树的模仿url链接钓鱼域名识别方法
CN108319672B (zh) * 2018-01-25 2023-04-18 南京邮电大学 基于云计算的移动终端不良信息过滤方法及系统
CN108540451A (zh) * 2018-03-13 2018-09-14 北京理工大学 一种用机器学习技术对网络攻击行为进行分类检测的方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周浩 (ZHOU, HAO): "基于决策树的搜索引擎恶意网页检测研究与实现 (The Research and Implementation of Malicious Web Pages Detection from Search Engine Based on Decision Tree)", 湖南大学硕士学位论文 (MASTER'S DISSERTATION OF HUNAN UNIVERSITY), 15 July 2014 (2014-07-15) *

Also Published As

Publication number Publication date
CN109450880A (zh) 2019-03-08

Similar Documents

Publication Publication Date Title
CN109922052B (zh) 一种结合多重特征的恶意url检测方法
CN110808968B (zh) 网络攻击检测方法、装置、电子设备和可读存储介质
WO2020082763A1 (fr) Decision trees-based method and apparatus for detecting phishing website, and computer device
US9621570B2 (en) System and method for selectively evolving phishing detection rules
JP6530786B2 (ja) Webページの悪意のある要素を検出するシステム及び方法
US20180219907A1 (en) Method and apparatus for detecting website security
CN107707545B (zh) 一种异常网页访问片段检测方法、装置、设备及存储介质
CN108156131B (zh) Webshell检测方法、电子设备和计算机存储介质
CN107204960B (zh) 网页识别方法及装置、服务器
US11212297B2 (en) Access classification device, access classification method, and recording medium
CN109768992B (zh) 网页恶意扫描处理方法及装置、终端设备、可读存储介质
CN105224600B (zh) 一种样本相似度的检测方法及装置
US9210189B2 (en) Method, system and client terminal for detection of phishing websites
US10462168B2 (en) Access classifying device, access classifying method, and access classifying program
CN107463844B (zh) Web木马检测方法及系统
US11516235B2 (en) System and method for detecting bots based on anomaly detection of JavaScript or mobile app profile information
Barlow et al. A novel approach to detect phishing attacks using binary visualisation and machine learning
JP2018041442A (ja) Webページの異常要素を検出するためのシステム及び方法
US20160028746A1 (en) Malicious code detection
CN107786529B (zh) 网站的检测方法、装置及系统
CN114024761B (zh) 网络威胁数据的检测方法、装置、存储介质及电子设备
CN114422271A (zh) 数据处理方法、装置、设备及可读存储介质
CN110855635A (zh) Url识别方法、装置及数据处理设备
CN116800518A (zh) 一种网络防护策略的调整方法及装置
CN111125704A (zh) 一种网页挂马识别方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19875482

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19875482

Country of ref document: EP

Kind code of ref document: A1