CN103399912A - Phishing webpage clustering method and device - Google Patents

Phishing webpage clustering method and device

Info

Publication number
CN103399912A
Authority
CN
China
Prior art keywords
domain name
clustering
domain
phishing
counting result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103265762A
Other languages
Chinese (zh)
Other versions
CN103399912B (en)
Inventor
罗焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201310326576.2A
Publication of CN103399912A
Priority to PCT/CN2014/083261 (WO2015014279A1)
Application granted
Publication of CN103399912B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6209 Protecting access to data via a platform, e.g. using keys or access control rules to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2149 Restricted operating environment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/30 Types of network names
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses a phishing webpage clustering method and device. The method comprises: receiving any phishing website; acquiring the domain name of the phishing website; acquiring the domain name type corresponding to the domain name from a preset domain name table; and clustering phishing webpages according to the domain name type. Because clustering is performed after the domain name type of the phishing website has been acquired, the method and device overcome the two defects that the prior-art clustering method exhibits when a phishing perpetrator uses a secondary domain name of a secondary domain name. As a result, the false alarm rate and the missed-report rate for phishing webpages are reduced, the detection rate is improved, and the spread of phishing webpages is stopped at the source.

Description

Phishing webpage clustering method and device
Technical Field
The invention relates to the field of information security, in particular to a phishing webpage clustering method and device.
Background
A phishing webpage is usually disguised as a bank webpage or an e-commerce webpage, and its main harm is the theft of private information such as bank account numbers and passwords submitted by users. Phishing is a form of network fraud in which lawbreakers use various means to imitate the URL (webpage address) and page content of a real website, or exploit vulnerabilities in the server program of a real website to insert dangerous HTML (hypertext markup language) code into some of its webpages, in order to trick users into giving up private data such as bank or credit card account numbers and passwords. Clustering phishing webpages means grouping webpages used for "phishing" together, so that the groups can serve as a comparison criterion for detecting phishing webpages.
Several methods for clustering phishing webpages exist in the prior art. The traditional method comprises the following steps: first, determining a standard time period, such as a natural day; second, presetting a threshold and counting the phishing webpages detected in any site or domain during that period; third, judging whether the count exceeds the preset threshold and, if it does, marking the whole site or the whole domain as phishing.
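The traditional whole-site or whole-domain approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `flag_whole_domains`, the sample detections and the threshold value are hypothetical and do not come from the patent.

```python
from collections import Counter

def flag_whole_domains(detected_domains, threshold):
    """Return the domains whose detection count in one period exceeds threshold."""
    counts = Counter(detected_domains)
    return {domain for domain, n in counts.items() if n > threshold}

# Detections from one standard period (e.g. one natural day), each already
# reduced to the site or domain it was found in. Sample data is made up.
detections = ["cn.ms", "cn.ms", "cn.ms", "example.com"]
flagged = flag_whole_domains(detections, threshold=2)
```

Note that everything under a flagged domain is treated as phishing, which is the source of the two defects discussed next.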
However, because the prior-art method clusters only at the level of a whole site or a whole domain, it has two disadvantages when phishing perpetrators commit crimes using a secondary domain name of a secondary domain name:
First, when a phishing perpetrator uses a secondary domain name of a secondary domain name to commit a crime, the prior art identifies the entire secondary domain as phishing, which produces false positives for those secondary domain names under it that are not used for phishing. For example, suppose a large number of phishing webpages are detected under the secondary domain name cn.ms. Besides the secondary domain name that the perpetrator applied for (e.g., a.cn.ms), other secondary domain names under cn.ms that are not used for "phishing" (e.g., b.cn.ms) may also be misreported as phishing webpages. The prior-art clustering method therefore has a high false alarm rate.
Second, when a phishing perpetrator uses a secondary domain name of a secondary domain name to commit a crime, the technique of wildcard domain name resolution is typically used. For example, b.a.cn.ms, c.a.cn.ms and d.e.a.cn.ms are all secondary domain names of a.cn.ms. The prior-art method would usually identify each of the three sub-sites b.a.cn.ms, c.a.cn.ms and d.e.a.cn.ms as a phishing webpage. But because the perpetrator uses wildcard resolution, a large number of secondary domain names of a.cn.ms (that is, names of the form *.a.cn.ms) can be generated automatically in a very short time, so whole-site or whole-domain clustering cannot stop the spread of the phishing webpages at the source.
Disclosure of Invention
To overcome the two defects that the prior-art clustering method exhibits when a phishing perpetrator commits crimes using a secondary domain name of a secondary domain name, the invention provides a phishing webpage clustering method and device that can reduce the false alarm rate for phishing webpages and stop their spread at the source.
The invention provides a phishing webpage clustering method, which comprises the following steps:
receiving any phishing website;
acquiring a domain name of the phishing website;
acquiring a domain name type corresponding to the domain name from a preset domain name table;
and clustering phishing webpages according to the domain name type.
Preferably, the clustering phishing webpages according to the domain name type includes:
judging whether the domain name type is a secondary domain name or not, and if so, acquiring a secondary domain of the domain name;
when the preset clustering information base does not comprise the secondary domain, increasing the counting result of the secondary domain by 1 to obtain the counting result of the secondary domain;
and judging whether the counting result of the secondary domain meets the clustering condition, and if so, clustering the secondary domain of the domain name to the clustering information base.
Preferably, the method further comprises:
when the domain name type is not a secondary domain name, increasing the counting result of the domain name by 1 to obtain the counting result of the domain name;
and judging whether the counting result of the domain name meets the clustering condition, and if so, clustering the domain name to the clustering information base.
Preferably, the clustering condition includes:
within a preset time, the counting result is larger than a preset threshold value;
or,
within the preset time, the ratio of the counting result to the number of web addresses in the whole domain or the secondary domain is greater than a preset ratio value.
The invention also provides a phishing webpage clustering device, which comprises:
the receiving module is used for receiving any phishing website;
the first acquisition module is used for acquiring the domain name of the phishing website;
the second acquisition module is used for acquiring the domain name type corresponding to the domain name from a preset domain name table;
and the clustering module is used for realizing the clustering of the phishing webpages according to the domain name types.
Preferably, the clustering module includes:
the first judgment sub-module is used for judging whether the domain name type is a secondary domain name;
the first obtaining sub-module is used for obtaining a secondary domain of the domain name when the result of the first judging sub-module is yes;
the first increasing submodule is used for increasing the counting result of the secondary domain by 1 to obtain the counting result of the secondary domain when the preset clustering information base does not comprise the secondary domain;
the second judgment submodule is used for judging whether the counting result of the secondary domain meets the clustering condition or not;
and the first clustering sub-module is used for clustering the secondary domain of the domain name to the clustering information base when the result of the second judging sub-module is yes.
Preferably, the clustering module further comprises:
the second increasing sub-module is used for increasing the counting result of the domain name by 1 to obtain the counting result of the domain name when the domain name type is not the second-level domain name;
the third judgment sub-module is used for judging whether the counting result of the domain name meets the clustering condition;
and the second clustering submodule is used for clustering the domain name to the clustering information base when the result of the third judging submodule is yes.
The method comprises the steps of firstly receiving any phishing website, secondly obtaining a domain name of the phishing website, thirdly obtaining a domain name type corresponding to the domain name from a preset domain name table, and finally realizing phishing webpage clustering according to the domain name type. Compared with the method for clustering the phishing websites to the station or the domain in the prior art, the method can realize the clustering of the phishing webpages according to the domain name type after the domain name type corresponding to the phishing website is obtained, so that two defects generated by the clustering method in the prior art when a phishing criminal uses a secondary domain name of a secondary domain name for crime are effectively overcome, the false alarm rate of the phishing webpages can be reduced, and the spread of the phishing webpages is thoroughly prevented from the source.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a phishing webpage clustering method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the distribution of domain names of phishing websites among various types of domain names;
FIG. 3 is a flowchart of a phishing webpage clustering method according to a second embodiment of the present invention;
FIG. 4 is a structural diagram of a phishing webpage clustering device provided in the third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to a third embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Embodiment One
This embodiment uses the domain name type to solve the problem of clustering phishing webpages and thereby overcomes the defects of the prior art. When a phishing perpetrator commits a crime using a secondary domain name of a secondary domain name, the phishing webpages can be clustered directly to the secondary domain name that the perpetrator applied for, which is then marked as phishing, so that the spread of the phishing webpages is stopped at the source. By exploiting the characteristics of the domain name type, the invention resolves both defects of the prior-art clustering method for perpetrators who use a secondary domain name of a secondary domain name.
Referring to fig. 1, fig. 1 is a flowchart of a phishing webpage clustering method provided in this embodiment, which may specifically include:
Step 101: Receiving any phishing website.
In this embodiment, any phishing website is received, where the phishing website is one that has already been detected; the specific detection method is not limited in this embodiment.
Step 102: Acquiring the domain name of the phishing website.
In this embodiment, after the phishing website is received, its domain name is acquired. A domain name is the character-based address that corresponds to a numeric IP address on a network.
In actual operation, there are many ways to acquire the domain name of the phishing website, and this embodiment does not limit them. For example, if the phishing website is b.a.cn.ms, the domain name obtained is cn.ms; if the phishing website is b.a.com/1.asp, the domain name obtained is a.com.
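Whatever rule is used to pick the domain name, a first step is usually to isolate the host part of the received address. The following is a minimal sketch of that step only, using Python's standard `urllib.parse`; the helper name `extract_host` is hypothetical, and the patent itself leaves the extraction method open.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

def extract_host(url: str) -> str:
    """Return the lowercase host part of a web address, tolerating a missing scheme."""
    # urlsplit only recognises the host when a scheme or a leading '//' is present.
    if not _SCHEME.match(url) and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.rstrip(".")
```

With this helper, `b.a.com/1.asp` yields the host `b.a.com`; reducing that host to the domain name used in the text (a.com) is a separate decision that depends on the preset domain name table.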
Step 103: Acquiring the domain name type corresponding to the domain name from a preset domain name table.
In this embodiment, after the domain name is obtained, the corresponding domain name type is looked up in a preset domain name table, in which a domain name type may be set as a secondary domain name or a non-secondary domain name. Table 1 is a preset domain name table provided in this embodiment. The form of the domain name table is not limited to that of Table 1, and the table may be compiled through manual statistics. The domain name table provided in this embodiment may be as follows:
Domain name type    Domain name
1                   tk
1                   co.cc
2                   in
2                   info
3                   com
4                   cn
5                   cn.ms
5                   net.tf
6                   3322.org
7                   vicp.net
TABLE 1
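One possible way to query such a table is a longest-suffix match on the host's labels, sketched below. The matching rule is an assumption made for illustration (the patent does not say how the lookup is performed), and the names `DOMAIN_TABLE` and `lookup_domain_type` are hypothetical; only the table entries themselves are taken from Table 1.

```python
# Entries copied from Table 1: domain name -> domain name type.
DOMAIN_TABLE = {
    "tk": 1, "co.cc": 1, "in": 2, "info": 2, "com": 3, "cn": 4,
    "cn.ms": 5, "net.tf": 5, "3322.org": 6, "vicp.net": 7,
}

def lookup_domain_type(host, table=DOMAIN_TABLE):
    """Return (matched_entry, type) for the longest table entry that is a
    label-aligned suffix of host, or None when no entry matches."""
    labels = host.lower().split(".")
    # Dropping labels from the left tries the longest candidate first.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in table:
            return candidate, table[candidate]
    return None
```

Under this assumed rule, `b.a.cn.ms` matches the entry `cn.ms` (type 5) and `b.a.com` matches `com` (type 3).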
Domain name types can be divided in many ways, for example by the cost of obtaining the domain name. Free or low-cost domain names directly reduce the cost of phishing crimes and are therefore used by most phishing perpetrators, so dividing domain name types by cost helps, to some extent, to combat phishing crimes effectively.
If domain name types are classified by the cost of acquiring domain names, they can be divided into free domain names (including free top-level domain names and free second-level domain names), cheap domain names (including cheap top-level domain names and cheap second-level domain names such as dynamic domain names), and so on. Cheap and free domain names are increasingly becoming hotbeds of phishing websites. Referring to fig. 2, fig. 2 is a schematic diagram of the distribution of phishing website domain names among the various types of domain names, taken from the 2012 annual report of the Chinese anti-phishing alliance. As fig. 2 shows, apart from com, the domain names are basically free or cheap domain names. Among them, tk, co.cc and pl, which account for a large share, are representative free top-level domain names; the top-level domain names ms and tf contain a large number of free secondary domain names, such as cn.ms, hk.ms, net.tf and eu.tf; cheap top-level domain names include to, info and in; and cheap second-level domain names are mostly offered by domestic dynamic domain name providers, for example 3322.org.
Step 104: Clustering phishing webpages according to the domain name type.
In this embodiment, the clustering of the phishing webpages is realized according to the domain name type corresponding to the domain name.
In actual operation, whether the domain name belongs to a second-level domain name or not can be determined according to the domain name type, and subsequent clustering of phishing webpages is performed according to a determined result.
In this embodiment, firstly, any phishing website is received, secondly, a domain name of the phishing website is obtained, thirdly, a domain name type corresponding to the domain name is obtained from a preset domain name table, and finally, clustering of phishing webpages is achieved according to the domain name type. Compared with the method for clustering the phishing websites to the station or the domain in the prior art, the embodiment can realize the clustering of the phishing webpages according to the domain name type after the domain name type corresponding to the phishing website is obtained, so that two defects generated by the clustering method in the prior art when a phishing criminal uses a secondary domain name of a secondary domain name for crime are effectively overcome, the false alarm rate of the phishing webpages can be reduced, and the propagation of the phishing webpages is thoroughly prevented from the source.
Embodiment Two
Referring to fig. 3, fig. 3 is a flowchart of a phishing webpage clustering method provided in this embodiment, which may specifically include:
Step 301: Receiving any phishing website;
Step 302: Acquiring a domain name of the phishing website;
Step 303: Acquiring a domain name type corresponding to the domain name from a preset domain name table;
Steps 301 to 303 in this embodiment are the same as steps 101 to 103 in the first embodiment and are not described again here.
Step 304: Judging whether the domain name type is a secondary domain name; if so, proceeding to step 305, and if not, proceeding to step 309.
In this embodiment, after the domain name type corresponding to the domain name is obtained, it is determined whether the domain name type is a secondary domain name; if so, the method proceeds to step 305, and otherwise to step 309.
In actual operation, after determining the domain name type, it may be determined whether the domain name type belongs to a second-level domain name with reference to table 2, where table 2 is a domain name type table, and a specific form of the domain name type table is not limited to the form provided in table 2, and meanwhile, the domain name type table may be obtained through manual statistics. In this embodiment, the domain name type corresponding to the domain name may be first obtained through table 1, and then, whether the domain name type belongs to the second-level domain name is queried in table 2. Specifically, table 2 may be as follows:
[Table 2 is an image in the original publication (Figure BDA00003590677700071) and is not reproduced here.]
TABLE 2
Step 305: Acquiring a secondary domain of the domain name.
In this embodiment, when the domain name type is a secondary domain name, the secondary domain of the domain name is obtained. For example, if the phishing website is b.a.cn.ms, its domain name is cn.ms and its secondary domain is a.cn.ms.
Specifically, there are many ways to obtain the secondary domain of the domain name, which is not limited in this embodiment.
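As one illustration of step 305, the secondary domain can be taken as the domain name plus the single label immediately to its left in the host. This is a sketch of one possible approach, not the method prescribed by the patent; the function name `secondary_domain` is hypothetical.

```python
from typing import Optional

def secondary_domain(host: str, domain: str) -> Optional[str]:
    """Return domain plus the one label directly to its left in host,
    or None when host does not lie strictly below domain."""
    host_labels = host.lower().split(".")
    domain_labels = domain.lower().split(".")
    n = len(domain_labels)
    # The host must end with the domain and carry at least one extra label.
    if len(host_labels) <= n or host_labels[-n:] != domain_labels:
        return None
    return ".".join(host_labels[-(n + 1):])
```

Deeper hosts collapse to the same key: both `b.a.cn.ms` and `d.e.a.cn.ms` map to `a.cn.ms`, which is what lets wildcard-generated names be counted together.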
Step 306: When the preset clustering information base does not include the secondary domain, increasing the counting result of the secondary domain by 1 to obtain the counting result of the secondary domain.
In this embodiment, it is first determined whether the obtained secondary domain is already in the preset clustering information base; if not, the counting result of the secondary domain is increased by 1 to obtain its final counting result.
In actual operation, the number of times the secondary domain is detected is counted in real time, that is, the counting result is increased by 1 each time the secondary domain is detected. The counting method is not limited.
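Since the counting method is left open, one simple possibility is a tally keyed by calendar day and secondary domain, so that each day starts from zero. The class below is a hypothetical sketch of that idea; the name `DailyCounter` and the choice of a one-day window are assumptions.

```python
from collections import defaultdict
from datetime import date

class DailyCounter:
    """Counts detections per key, with a separate tally for each calendar day."""

    def __init__(self):
        self._counts = defaultdict(int)

    def increment(self, key: str, day: date) -> int:
        """Add one detection for key on the given day and return the new count."""
        self._counts[(day, key)] += 1
        return self._counts[(day, key)]

    def get(self, key: str, day: date) -> int:
        """Return the count for key on the given day without modifying it."""
        return self._counts.get((day, key), 0)
```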
Step 307: Judging whether the counting result of the secondary domain meets the clustering condition; if so, proceeding to step 308.
In this embodiment, after the counting result of the secondary domain is obtained, it is determined whether the counting result meets a preset clustering condition; if so, the method proceeds to step 308, and otherwise clustering continues with other phishing webpages.
In actual operation, the clustering condition may be: within a preset time, the counting result is greater than a preset threshold; or, within the preset time, the ratio of the counting result to the number of web addresses in the whole domain or the secondary domain is greater than a preset ratio value.
Referring to table 2, the clustering condition may be set to "whole-domain blacklisting threshold: 50 black web addresses per day"; in that case, step 308 is entered when the one-day counting result of the secondary domain is greater than 50. Similarly, the clustering condition may be set to "whole-domain blacklisting threshold: 50% black web addresses per day"; in that case, step 308 is entered when the one-day counting result of the secondary domain accounts for more than 50% of all black web addresses.
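The two alternative clustering conditions can be written as two small predicates, sketched below. The default values 50 and 0.5 mirror the examples in the text; the function names are hypothetical, and the denominator of the ratio is passed in by the caller because the text does not pin down exactly which total is meant.

```python
def exceeds_count(count: int, threshold: int = 50) -> bool:
    """Condition 1: the count within the preset time is greater than the threshold."""
    return count > threshold

def exceeds_ratio(count: int, total: int, ratio: float = 0.5) -> bool:
    """Condition 2: count divided by the caller-supplied total is greater than ratio."""
    if total <= 0:
        return False  # no meaningful ratio without a positive total
    return count / total > ratio
```

Both comparisons are strict, matching the "greater than" wording of the clustering condition.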
Step 308: Clustering the secondary domain of the domain name to the clustering information base.
In this embodiment, when the counting result of the secondary domain meets the preset clustering condition, the secondary domain is clustered into the clustering information base, that is, the secondary domain is determined as a phishing webpage.
Step 309: Increasing the counting result of the domain name by 1 to obtain the counting result of the domain name.
In this embodiment, when the domain name type is not a secondary domain name, the counting result of the domain name is added by 1 to obtain the counting result.
Step 310: Judging whether the counting result of the domain name meets the clustering condition; if so, proceeding to step 311.
In this embodiment, after the counting result of the domain name is obtained, it is determined whether the counting result meets a preset clustering condition; if so, the method proceeds to step 311, and otherwise clustering continues with other phishing webpages.
For example, for a domain name whose domain name type is 1, the clustering condition obtained from table 2 may be "whole-domain blacklisting threshold: more than 50 black web addresses per day", that is, the whole domain is clustered when more than 50 of its web addresses are detected in one day.
Step 311: Clustering the domain name to the clustering information base.
In this embodiment, when the counting result of the domain name meets the preset clustering condition, the domain name is clustered to the clustering information base.
According to the embodiment, the phishing webpage clustering can be realized according to the domain name type after the domain name type corresponding to the phishing website is obtained, so that two defects generated by the clustering method in the prior art when a phishing criminal uses a secondary domain name of a secondary domain name for committing a crime are effectively solved, the false alarm rate of the phishing webpage can be reduced, and the propagation of the phishing webpage can be thoroughly prevented from the source.
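Steps 304 to 311 can be put together in a short sketch. It assumes that the domain name of each detection is already known (steps 301 to 303), that a plain set of secondary domain names stands in for the lookups in table 1 and table 2, and that a single count threshold within one counting window serves as the clustering condition. The class name `PhishingClusterer`, the threshold of 2 and the sample detections are all hypothetical.

```python
from collections import Counter
from typing import Optional

def secondary_domain(host: str, domain: str) -> Optional[str]:
    """Return the domain plus the one label to its left, e.g. a.cn.ms."""
    host_labels, domain_labels = host.split("."), domain.split(".")
    n = len(domain_labels)
    if len(host_labels) <= n or host_labels[-n:] != domain_labels:
        return None
    return ".".join(host_labels[-(n + 1):])

class PhishingClusterer:
    def __init__(self, secondary_domains, threshold):
        self.secondary_domains = set(secondary_domains)  # stand-in for tables 1 and 2
        self.threshold = threshold                       # assumed count threshold
        self.counts = Counter()
        self.cluster_base = set()                        # the clustering information base

    def process(self, host: str, domain: str) -> Optional[str]:
        """Handle one detected phishing host whose domain name is known.
        Return the key that met the clustering condition, or None."""
        if domain in self.secondary_domains:             # step 304
            key = secondary_domain(host, domain)         # step 305
            if key is None or key in self.cluster_base:  # step 306: skip known entries
                return None
        else:
            key = domain                                 # step 309 counts the domain itself
        self.counts[key] += 1                            # steps 306 / 309
        if self.counts[key] > self.threshold:            # steps 307 / 310
            self.cluster_base.add(key)                   # steps 308 / 311
            return key
        return None

clusterer = PhishingClusterer(secondary_domains={"cn.ms", "net.tf"}, threshold=2)
detections = [("b.a.cn.ms", "cn.ms"), ("c.a.cn.ms", "cn.ms"),
              ("d.e.a.cn.ms", "cn.ms"), ("x.b.cn.ms", "cn.ms")]
results = [clusterer.process(host, domain) for host, domain in detections]
```

In this run only a.cn.ms ends up in the clustering information base; b.cn.ms, which was seen once, is left alone, and cn.ms as a whole is never marked.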
Embodiment Three
Referring to fig. 4, fig. 4 is a structural diagram of a phishing webpage clustering device provided in this embodiment, where the device may include:
a receiving module 401, configured to receive any phishing website;
a first obtaining module 402, configured to obtain a domain name of the phishing website;
a second obtaining module 403, configured to obtain a domain name type corresponding to the domain name from a preset domain name table;
and the clustering module 404 is configured to implement clustering of the phishing webpages according to the domain name types.
Wherein the clustering module may include:
the first judgment sub-module is used for judging whether the domain name type is a secondary domain name;
the first obtaining sub-module is used for obtaining a secondary domain of the domain name when the result of the first judging sub-module is yes;
the first increasing submodule is used for increasing the counting result of the secondary domain by 1 to obtain the counting result of the secondary domain when the preset clustering information base does not comprise the secondary domain;
the second judgment submodule is used for judging whether the counting result of the secondary domain meets the clustering condition or not;
and the first clustering sub-module is used for clustering the secondary domain of the domain name to the clustering information base when the result of the second judging sub-module is yes.
Meanwhile, the clustering module may further include:
the second increasing sub-module is used for increasing the counting result of the domain name by 1 to obtain the counting result of the domain name when the domain name type is not the second-level domain name;
the third judgment sub-module is used for judging whether the counting result of the domain name meets the clustering condition;
and the second clustering submodule is used for clustering the domain name to the clustering information base when the result of the third judging submodule is yes.
Referring to fig. 5, fig. 5 shows a server provided in this embodiment, which may be used to implement the methods provided in the foregoing embodiments. Specifically:
The server may include components such as a memory 510 comprising one or more readable storage media, an input unit 520, an output unit 530, a processor 540 comprising one or more processing cores, and a power supply 550. Wherein:
the memory 510 may be used to store software programs and modules, and the processor 540 may execute various functional applications and data processing by operating the software programs and modules stored in the memory 510. The memory 510 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer, and the like. Further, the memory 510 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 510 may also include a memory controller to provide the processor 540 and the input unit 520 access to the memory 510.
The input unit 520 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
The processor 540 is the control center of the server. It connects the various parts of the server using various interfaces and lines, and performs the server's functions and processes data by running or executing the software programs and/or modules stored in the memory 510 and calling the data stored in the memory 510, thereby monitoring the server as a whole. Optionally, the processor 540 may include one or more processing cores.
The server also includes a power supply 550 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically coupled to the processor 540 via a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 550 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Specifically, in this embodiment, the processor 540 loads the executable file corresponding to the process of one or more application programs into the memory 510 according to the following instructions, and the processor 540 runs the application programs stored in the memory 510, so as to implement various functions:
receiving any phishing website;
acquiring a domain name of the phishing website;
acquiring a domain name type corresponding to the domain name from a preset domain name table;
and clustering phishing webpages according to the domain name type.
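The four steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the table contents, the type labels "secondary" and "ordinary", the function names, and the rule that the registered domain is the last two labels of the host are all hypothetical and are not specified by this embodiment.

```python
from urllib.parse import urlsplit

# Hypothetical preset domain name table: registered domain -> domain name type.
DOMAIN_TABLE = {
    "freehost.example": "secondary",  # assumed to hand out secondary domains to users
    "shop.example": "ordinary",
}


def get_domain_name(url: str) -> str:
    """Return the lower-cased host part of a phishing URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def registered_domain(host: str) -> str:
    """Naive assumption: the registered domain is the last two labels."""
    return ".".join(host.split(".")[-2:])


def get_domain_type(host: str) -> str:
    """Look up the domain name type; unknown domains default to 'ordinary'."""
    return DOMAIN_TABLE.get(registered_domain(host), "ordinary")


def clustering_unit(url: str) -> str:
    """Pick the unit that a phishing URL would be counted under."""
    host = get_domain_name(url)
    if get_domain_type(host) == "secondary":
        # Assumed meaning of "secondary domain": one label below the table entry.
        return ".".join(host.split(".")[-3:])
    return registered_domain(host)
```

With this table, a URL hosted at `evil.freehost.example` would be counted under the secondary domain, while a URL hosted at `www.shop.example` would be counted under the whole domain `shop.example`.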
Preferably, clustering the phishing webpages according to the domain name type includes:
judging whether the domain name type is a secondary domain name, and if so, acquiring the secondary domain of the domain name;
when the preset clustering information base does not include the secondary domain, increasing the counting result of the secondary domain by 1 to obtain an updated counting result of the secondary domain;
and judging whether the updated counting result of the secondary domain meets the clustering condition, and if so, clustering the secondary domain of the domain name to the clustering information base.
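The secondary-domain branch described above might look like the following minimal sketch. The class name, the in-memory set standing in for the clustering information base, and the simple "count greater than threshold" condition are assumptions made for illustration only.

```python
from collections import Counter


class SecondaryDomainClusterer:
    """Counts phishing reports per secondary domain (illustrative sketch)."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # hypothetical preset threshold
        self.counts = Counter()      # counting result per secondary domain
        self.cluster_base = set()    # stand-in for the clustering information base

    def report(self, secondary_domain: str) -> bool:
        """Process one report; return True if the secondary domain is clustered."""
        # Count only when the base does not yet include the secondary domain.
        if secondary_domain in self.cluster_base:
            return True
        self.counts[secondary_domain] += 1
        # Assumed clustering condition: count exceeds the preset threshold.
        if self.counts[secondary_domain] > self.threshold:
            self.cluster_base.add(secondary_domain)
            return True
        return False
```

Because the base is checked first, the counter stops growing once a secondary domain has been clustered.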
Preferably, the method further comprises:
when the domain name type is not a secondary domain name, increasing the counting result of the domain name by 1 to obtain an updated counting result of the domain name;
and judging whether the updated counting result of the domain name meets the clustering condition, and if so, clustering the domain name to the clustering information base.
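For the case where the domain name type is not a secondary domain name, a comparable sketch counts reports against the whole domain. The names and threshold are hypothetical, and the `is_covered` helper reflects an assumption about how the clustering information base might later be consulted; the embodiment itself does not describe that lookup.

```python
from collections import Counter

THRESHOLD = 2                 # hypothetical preset threshold
domain_counts = Counter()     # counting result per whole domain
cluster_base = set()          # stand-in for the clustering information base


def report_domain(domain: str) -> bool:
    """Count one phishing report for a whole domain; return True once clustered."""
    domain_counts[domain] += 1
    # Assumed clustering condition: count exceeds the preset threshold.
    if domain_counts[domain] > THRESHOLD:
        cluster_base.add(domain)
    return domain in cluster_base


def is_covered(host: str) -> bool:
    """Assumed lookup: a host is covered if any of its suffixes is in the base."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in cluster_base for i in range(len(labels)))
```

Clustering at the level of the whole domain means that every host under it is treated as covered, which is why the secondary-domain type is handled separately.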
Preferably, the clustering condition includes:
within a preset time, the counting result is greater than a preset threshold;
or,
within the preset time, the ratio of the counting result to the number of websites under the whole domain or the secondary domain is greater than a preset ratio.
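The two alternative clustering conditions can be illustrated with a sliding time window. In this sketch the class name, the use of numeric timestamps in seconds, and the `total_sites` argument (the number of known websites under the domain or secondary domain) are assumptions; timestamps are assumed to be recorded in non-decreasing order.

```python
from collections import deque


class ClusteringCondition:
    """Checks 'count > threshold' or 'count / total > ratio' within a time window."""

    def __init__(self, window_seconds: float, count_threshold: int, ratio_threshold: float):
        self.window_seconds = window_seconds
        self.count_threshold = count_threshold
        self.ratio_threshold = ratio_threshold
        self._hits = deque()  # timestamps of phishing reports, oldest first

    def record(self, timestamp: float) -> None:
        """Record one phishing report observed at the given time."""
        self._hits.append(timestamp)

    def is_met(self, now: float, total_sites: int) -> bool:
        """Return True if either condition holds for reports inside the window."""
        # Discard reports that fall outside the preset time window.
        while self._hits and self._hits[0] <= now - self.window_seconds:
            self._hits.popleft()
        count = len(self._hits)
        if count > self.count_threshold:
            return True
        return total_sites > 0 and count / total_sites > self.ratio_threshold
```

The ratio condition lets a small domain be clustered after only a few reports, while the absolute threshold handles large domains where the ratio stays low.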
In this embodiment, a phishing website is first received; the domain name of the phishing website is then acquired; the domain name type corresponding to the domain name is next looked up in a preset domain name table; and finally the phishing webpages are clustered according to the domain name type. Compared with the prior-art approach of clustering phishing websites to the site or to the domain, this embodiment clusters phishing webpages according to the domain name type once that type has been obtained. This effectively overcomes the two defects of the prior-art clustering approach that arise when a phishing criminal uses a secondary domain of a secondary domain name to commit crimes, reduces the false alarm rate for phishing webpages, and blocks the propagation of phishing webpages at the source.
Since the device embodiments substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The phishing webpage clustering method and device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and core idea of the invention. Meanwhile, a person skilled in the art may, according to the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (7)

1. A phishing webpage clustering method, the method comprising:
receiving any phishing website;
acquiring a domain name of the phishing website;
acquiring a domain name type corresponding to the domain name from a preset domain name table;
and clustering phishing webpages according to the domain name type.
2. The method of claim 1, wherein clustering the phishing webpages according to the domain name type comprises:
judging whether the domain name type is a secondary domain name, and if so, acquiring the secondary domain of the domain name;
when the preset clustering information base does not include the secondary domain, increasing the counting result of the secondary domain by 1 to obtain an updated counting result of the secondary domain;
and judging whether the updated counting result of the secondary domain meets the clustering condition, and if so, clustering the secondary domain of the domain name to the clustering information base.
3. The method of claim 2, further comprising:
when the domain name type is not a secondary domain name, increasing the counting result of the domain name by 1 to obtain an updated counting result of the domain name;
and judging whether the updated counting result of the domain name meets the clustering condition, and if so, clustering the domain name to the clustering information base.
4. The method according to claim 2 or 3, wherein the clustering condition comprises:
within a preset time, the counting result is greater than a preset threshold;
or,
within the preset time, the ratio of the counting result to the number of websites under the whole domain or the secondary domain is greater than a preset ratio.
5. A phishing webpage clustering apparatus, the apparatus comprising:
the receiving module is used for receiving any phishing website;
the first acquisition module is used for acquiring the domain name of the phishing website;
the second acquisition module is used for acquiring the domain name type corresponding to the domain name from a preset domain name table;
and the clustering module is used for realizing the clustering of the phishing webpages according to the domain name types.
6. The apparatus of claim 5, wherein the clustering module comprises:
the first judging sub-module is used for judging whether the domain name type is a secondary domain name;
the first obtaining sub-module is used for obtaining the secondary domain of the domain name when the result of the first judging sub-module is yes;
the first increasing sub-module is used for increasing the counting result of the secondary domain by 1 to obtain an updated counting result of the secondary domain when the preset clustering information base does not include the secondary domain;
the second judging sub-module is used for judging whether the updated counting result of the secondary domain meets the clustering condition;
and the first clustering sub-module is used for clustering the secondary domain of the domain name to the clustering information base when the result of the second judging sub-module is yes.
7. The apparatus of claim 6, wherein the clustering module further comprises:
the second increasing sub-module is used for increasing the counting result of the domain name by 1 to obtain an updated counting result of the domain name when the domain name type is not a secondary domain name;
the third judging sub-module is used for judging whether the updated counting result of the domain name meets the clustering condition;
and the second clustering sub-module is used for clustering the domain name to the clustering information base when the result of the third judging sub-module is yes.
CN201310326576.2A 2013-07-30 2013-07-30 A kind of fishing webpage clustering method and device Active CN103399912B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310326576.2A CN103399912B (en) 2013-07-30 2013-07-30 A kind of fishing webpage clustering method and device
PCT/CN2014/083261 WO2015014279A1 (en) 2013-07-30 2014-07-29 Method and device for clustering phishing webpages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310326576.2A CN103399912B (en) 2013-07-30 2013-07-30 A kind of fishing webpage clustering method and device

Publications (2)

Publication Number Publication Date
CN103399912A true CN103399912A (en) 2013-11-20
CN103399912B CN103399912B (en) 2016-08-17

Family

ID=49563540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310326576.2A Active CN103399912B (en) 2013-07-30 2013-07-30 A kind of fishing webpage clustering method and device

Country Status (2)

Country Link
CN (1) CN103399912B (en)
WO (1) WO2015014279A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015014279A1 (en) * 2013-07-30 2015-02-05 Tencent Technology (Shenzhen) Company Limited Method and device for clustering phishing webpages
CN104933178A (en) * 2015-07-01 2015-09-23 北京奇虎科技有限公司 Official website determining method and system
CN106453208A (en) * 2015-08-07 2017-02-22 北京奇虎科技有限公司 Advertisement material data website verification method and device
CN106649366A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for classifying keyword search results
CN103701951B (en) * 2013-12-27 2018-03-06 北京奇安信科技有限公司 The analysis method of website visiting record and the analytical equipment of website visiting record
CN108696599A (en) * 2017-04-07 2018-10-23 北京上元信安技术有限公司 A kind of method, system and the firewall box of removal redundancy domain name
CN117892801A (en) * 2024-03-13 2024-04-16 鹏城实验室 Training method of domain name generation model, phishing website discovery method and related device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10129276B1 (en) * 2016-03-29 2018-11-13 EMC IP Holding Company LLC Methods and apparatus for identifying suspicious domains using common user clustering
GB2555801A (en) * 2016-11-09 2018-05-16 F Secure Corp Identifying fraudulent and malicious websites, domain and subdomain names
CN114629875B (en) * 2022-02-10 2024-06-04 互联网域名系统北京市工程研究中心有限公司 Active detection domain name brand protection method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060123464A1 (en) * 2004-12-02 2006-06-08 Microsoft Corporation Phishing detection, prevention, and notification
US7698442B1 (en) * 2005-03-03 2010-04-13 Voltage Security, Inc. Server-based universal resource locator verification service
CN102571404A (en) * 2010-12-31 2012-07-11 北京新媒传信科技有限公司 Website access statistical method and website access statistical system
CN102938769A (en) * 2012-11-22 2013-02-20 国家计算机网络与信息安全管理中心 Detection method of Domain flux botnet domain names

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399912B (en) * 2013-07-30 2016-08-17 腾讯科技(深圳)有限公司 A kind of fishing webpage clustering method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015014279A1 (en) * 2013-07-30 2015-02-05 Tencent Technology (Shenzhen) Company Limited Method and device for clustering phishing webpages
CN103701951B (en) * 2013-12-27 2018-03-06 北京奇安信科技有限公司 The analysis method of website visiting record and the analytical equipment of website visiting record
CN104933178A (en) * 2015-07-01 2015-09-23 北京奇虎科技有限公司 Official website determining method and system
CN104933178B (en) * 2015-07-01 2018-09-11 北京奇虎科技有限公司 Official website determines method and system and the sort method of official website
CN106453208A (en) * 2015-08-07 2017-02-22 北京奇虎科技有限公司 Advertisement material data website verification method and device
CN106649366A (en) * 2015-10-30 2017-05-10 北京国双科技有限公司 Method and device for classifying keyword search results
CN106649366B (en) * 2015-10-30 2020-05-22 北京国双科技有限公司 Keyword search result classification method and device
CN108696599A (en) * 2017-04-07 2018-10-23 北京上元信安技术有限公司 A kind of method, system and the firewall box of removal redundancy domain name
CN108696599B (en) * 2017-04-07 2021-02-19 北京上元信安技术有限公司 Method, system and firewall equipment for removing redundant domain names from domain name classification feature library
CN117892801A (en) * 2024-03-13 2024-04-16 鹏城实验室 Training method of domain name generation model, phishing website discovery method and related device

Also Published As

Publication number Publication date
WO2015014279A1 (en) 2015-02-05
CN103399912B (en) 2016-08-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant