WO2013013475A1 - Phishing detection method and apparatus - Google Patents

Phishing detection method and apparatus

Info

Publication number
WO2013013475A1
Authority
WO
WIPO (PCT)
Prior art keywords
phishing
url
suspected
host
host name
Prior art date
Application number
PCT/CN2011/083671
Other languages
English (en)
French (fr)
Inventor
洪博
耿光刚
王利明
肖雅丽
Original Assignee
中国科学院计算机网络信息中心 (Computer Network Information Center, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算机网络信息中心
Publication of WO2013013475A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Definitions

  • The present invention relates to the field of network security technologies, and in particular to a phishing detection method and apparatus.
  • Phishing is a form of cybercrime in which recipients are lured, by spam e-mail or similar means, to a phishing website that closely resembles the website of a target organization, and the personal sensitive information they enter on that website is captured. With the spread of e-commerce and Internet applications, the damage caused by phishing has become increasingly serious.
  • Blacklist techniques maintain a continuously updated list of phishing websites based on user reports or ratings, preventing more users from visiting phishing websites that have already been discovered.
  • Heuristic feature detection techniques use the links, text content, and domain name information of phishing websites as criteria for identifying phishing websites, and apply these criteria to detect unknown phishing websites.
  • Pattern-recognition-based detection techniques extract feature vectors from a large number of phishing website samples and train a discriminant model, which is then used to detect unknown phishing websites.
  • With any of these techniques, a client or browser plug-in passively accepts a Uniform Resource Locator (URL) submitted by the user and sends it to the detecting device, which then uses the above detection techniques to determine whether the URL is the URL of a phishing website (a phishing URL).
  • In the prior art, the detecting device can only passively accept URLs submitted by clients for detection.
  • As phishing attacks become increasingly common, relying solely on user-triggered passive detection is clearly insufficient to counter them.
  • The embodiments of the invention provide a phishing detection method and apparatus that solve the problem in the prior art that user-triggered passive detection cannot cope with increasingly widespread phishing attacks.
  • An embodiment of the present invention provides a phishing detection method, including: acquiring a suspected phishing host name that matches a keyword of a phishing target;
  • An embodiment of the present invention provides a phishing detection apparatus, including: a suspected host acquisition module, configured to acquire a suspected phishing host name that matches a keyword of a phishing target;
  • a URL path acquisition module, configured to acquire a phishing Uniform Resource Locator (URL) path corresponding to the phishing target;
  • a URL construction module, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
  • a detection module, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.
  • By actively acquiring suspected phishing host names that match the keyword of the phishing target and the phishing URL paths corresponding to the phishing target, splicing them into suspected phishing URLs, and detecting those URLs to determine whether they are phishing URLs, the embodiments overcome the problem that user-triggered passive detection in the prior art cannot cope with increasingly widespread phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.
  • FIG. 1 is a schematic flowchart of Embodiment 1 of a phishing detection method provided by the present invention.
  • FIG. 2 is a schematic flowchart of Embodiment 2 of a phishing detection method provided by the present invention.
  • FIG. 3 is a schematic structural diagram of Embodiment 1 of a phishing detection apparatus provided by the present invention.
  • FIG. 4 is a schematic structural diagram of Embodiment 2 of a phishing detection apparatus provided by the present invention.
  • The inventors analyzed actual phishing report data and found that more than 90% of phishing attacks still follow the traditional pattern: a Uniform Resource Locator (URL) that impersonates a phishing target, together with its corresponding web page, is used to lure deceived users into disclosing confidential information such as their account credentials.
  • The phishing target here is the object impersonated by the phishing website, such as Taobao.com or the Industrial and Commercial Bank of China.
  • A URL consists of a host name and a URL path. For example, the URL http://item.taobao.com/member/minilogin.asp consists of the host name http://item.taobao.com/ and the URL path /member/minilogin.asp.
  • A URL that impersonates this phishing target, i.e. a phishing website, might be http://item.taobao.cvbda.co.cc/member/minilogin.asp. Therefore, by performing targeted scanning and detection of active hosts on the network, hosts that tend toward impersonation can be retrieved as suspected phishing hosts.
  • Because the URLs of phishing pages increasingly use multi-level domains and multi-level paths, the host alone is often insufficient to locate the real phishing page. The embodiment of the present invention therefore uses URL paths from a database of already-confirmed phishing URLs as a supplement, splicing them with suspected phishing host names to construct complete suspected phishing URLs. The suspected phishing URLs are then detected to identify phishing URLs.
  • FIG. 1 is a schematic flowchart of Embodiment 1 of a phishing detection method provided by the present invention. As shown in FIG. 1, this embodiment includes:
  • Step 101: Obtain a suspected phishing host name that matches a keyword of the phishing target.
  • The detecting device may obtain the suspected phishing host name that matches the keyword of the phishing target from manual input by a network administrator, or by querying a Domain Name System (DNS) server; this embodiment does not limit this.
  • The suspected phishing host name matching the keyword of the phishing target is usually a host name similar to that of the phishing target.
  • For example, if the phishing target is Taobao, the keyword can be taobao, and a suspected phishing host name matching that keyword can be http://www.taobao.co.cc/.
  • Step 102: Obtain a phishing URL path corresponding to the phishing target.
  • The phishing URL path corresponding to the phishing target is a phishing URL path that has already been used against that target; it can usually be obtained from an existing database.
  • The existing database stores the phishing target and one or more phishing URLs corresponding to it, and the phishing URL path is extracted from the phishing URL.
  • In practice there is no fixed order between step 101 and step 102; both only need to be performed before step 103.
  • Step 103: Splice the suspected phishing host name and the phishing URL path into a suspected phishing URL.
  • Step 104: Detect the suspected phishing URL, and determine whether the suspected phishing URL is a phishing URL.
  • The detection may use existing techniques such as blacklist techniques, heuristic feature detection techniques, or pattern-recognition-based detection techniques; this embodiment does not limit this.
  • Phishing detection may be performed for one phishing target or for multiple phishing targets; this embodiment does not limit this.
  • By actively acquiring the suspected phishing host name matching the keyword of the phishing target and the phishing URL path corresponding to the phishing target, splicing them into a suspected phishing URL, and detecting that URL to determine whether it is a phishing URL, this embodiment overcomes the problem that user-triggered passive detection in the prior art cannot cope with increasingly widespread phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.
  • FIG. 2 is a schematic flowchart diagram of Embodiment 2 of a phishing detection method provided by the present invention. As shown in Figure 2, this embodiment includes:
  • Step 201: Obtain a host query log from a DNS server.
  • The host query log includes information such as the queried host name, the query time, and the source IP address of the query.
  • It may be an authoritative query log or a recursive query log; this embodiment does not limit this.
  • Step 202: Determine a host name list according to the host query log. The host names queried in the host query log are extracted to form the host name list.
  • Step 203: Preprocess the host name list to form a valid host name list.
  • The preprocessing includes, but is not limited to, any one or a combination of the following: 1) removing duplicate host names from the host name list; 2) removing the host names of hosts whose ports are closed; 3) removing host names that are on a whitelist; 4) removing the host names of hosts whose PageRank value is normal.
  • Step 204: Match a keyword of the phishing target, and determine, from the valid host name list, the suspected phishing host names that match the keyword of the phishing target.
  • For example, if the phishing target is Taobao, its keyword may be taobao.
  • A combination of keywords can also be used for matching.
  • For example, the combination of item and taobao often appears in the hosts of phishing URLs targeting Taobao, so that combination can be used to match host names in the valid host name list; it matches, for example, http://item.taobao.cvbda.co.cc/ as a suspected phishing host name targeting Taobao.
  • Step 205: Read the phishing URL paths corresponding to the phishing target from a phishing database. Any publicly available phishing report data source in the prior art, such as phishtank.com, can be used as the phishing database.
  • The phishing database contains the following information: phishing targets and the phishing URLs corresponding to each phishing target.
  • There may be one or more phishing URL paths corresponding to the phishing target; this embodiment does not limit this.
  • If there are at least two such paths, step 205 further includes: sorting them in descending order of frequency of occurrence and taking the N most frequent phishing URL paths to form a high-frequency phishing path list, where N is a natural number greater than 1.
  • Step 206: Splice the suspected phishing host name and the phishing URL path into a suspected phishing URL.
  • The suspected phishing host name is spliced in turn with each phishing URL path in the high-frequency phishing path list to obtain a list of suspected phishing URLs.
  • Step 207: Access the suspected phishing URL to obtain the page corresponding to the suspected phishing URL.
  • Online access sniffing from the prior art may be used to determine whether the suspected phishing URL can be accessed online; if it cannot, the process ends, or online access sniffing continues with the next suspected phishing URL.
  • Step 208: If the page contains a login box and the keyword of the phishing target, determine that the suspected phishing URL is a phishing URL.
  • Steps 207 and 208 further verify whether the suspected phishing URL is a phishing URL, improving the accuracy of the result.
  • In this embodiment, a host query log is obtained from a DNS server, a host name list is determined from the log and preprocessed, and the keyword of the phishing target is matched to determine suspected phishing host names. Each suspected phishing host name is then spliced with the phishing URL paths obtained from the phishing database for the phishing target to form suspected phishing URLs, which are finally detected to determine whether they are phishing URLs. This not only overcomes the problem that user-triggered passive detection in the prior art cannot cope with increasingly widespread phishing attacks, discovering phishing websites earlier and improving detection efficiency, but also improves the accuracy of phishing website detection.
  • FIG. 3 is a schematic structural diagram of Embodiment 1 of a phishing detection apparatus provided by the present invention. As shown in FIG. 3, this embodiment includes:
  • a suspected host acquisition module 31, configured to obtain a suspected phishing host name that matches the keyword of the phishing target;
  • a URL path acquisition module 32, configured to acquire a phishing URL path corresponding to the phishing target;
  • a URL construction module 33, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
  • a detection module 34, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.
  • For the specific implementation of this embodiment, refer to Embodiment 1 of the phishing detection method provided by the present invention.
  • By actively acquiring the suspected phishing host name matching the keyword of the phishing target and the phishing URL path corresponding to the phishing target, splicing them into a suspected phishing URL, and detecting that URL to determine whether it is a phishing URL, this embodiment overcomes the problem that user-triggered passive detection in the prior art cannot cope with increasingly widespread phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.
  • FIG. 4 is a schematic structural diagram of Embodiment 2 of a phishing detection apparatus according to the present invention. As shown in FIG. 4, this embodiment includes:
  • a suspected host acquisition module 41, configured to obtain a suspected phishing host name that matches the keyword of the phishing target;
  • a URL path acquisition module 42, configured to acquire a phishing URL path corresponding to the phishing target;
  • a URL construction module 43, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
  • a detection module 44, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.
  • The suspected host acquisition module 41 specifically includes:
  • a log acquisition unit 411, configured to obtain a host query log from the DNS server;
  • a list determination unit 412, configured to determine a host name list according to the host query log;
  • a preprocessing unit 413, configured to preprocess the host name list to form a valid host name list; and
  • a matching unit 414, configured to match the keyword of the phishing target and determine, from the valid host name list, the suspected phishing hosts that match the keyword of the phishing target.
  • The preprocessing unit 413 is specifically configured to perform at least one of the following operations: removing duplicate host names from the host name list; removing the host names of hosts whose ports are closed; removing host names that are on a whitelist; and removing the host names of hosts whose PageRank value is normal.
  • The URL path acquisition module 42 is specifically configured to read the phishing URL path corresponding to the phishing target from the phishing database.
  • If there are at least two phishing URL paths corresponding to the phishing target, the URL path acquisition module 42 is specifically configured to sort them in descending order of frequency of occurrence;
  • the URL construction module 43 is specifically configured to splice the suspected phishing host name with the phishing URL paths in turn according to the sorted order, to obtain at least two suspected phishing URLs; and the detection module 44 is specifically configured to detect the at least two suspected phishing URLs in turn.
  • The detection module 44 specifically includes:
  • an access unit 441, configured to access the suspected phishing URL and obtain the page corresponding to the suspected phishing URL; and
  • a determination unit 442, configured to determine that the suspected phishing URL is the phishing URL if the page contains a login box and the keyword of the phishing target.
  • In this embodiment, a host query log is obtained from the DNS server, a host name list is determined from the log and preprocessed, and the keyword of the phishing target is matched to determine suspected phishing host names. Each suspected phishing host name is then spliced with the phishing URL paths obtained from the phishing database for the phishing target to form suspected phishing URLs, which are finally detected to determine whether they are phishing URLs. This not only overcomes the problem that user-triggered passive detection in the prior art cannot cope with increasingly widespread phishing attacks, discovering phishing websites earlier and improving detection efficiency, but also improves the accuracy of phishing website detection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

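The present invention provides a phishing detection method and apparatus. The method includes: acquiring a suspected phishing host name that matches a keyword of a phishing target; acquiring a phishing Uniform Resource Locator (URL) path corresponding to the phishing target; splicing the suspected phishing host name and the phishing URL path into a suspected phishing URL; and detecting the suspected phishing URL to determine whether the suspected phishing URL is a phishing URL. By actively acquiring suspected phishing host names that match the keyword of the phishing target and phishing URL paths corresponding to the phishing target, splicing them into suspected phishing URLs, and detecting those URLs to determine whether they are phishing URLs, the embodiments of the present invention overcome the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.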

Description

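Phishing detection method and apparatus
Technical Field
The present invention relates to the field of network security technologies, and in particular to a phishing detection method and apparatus.
Background
Phishing is a form of cybercrime in which recipients are lured, by spam e-mail or similar means, to a carefully designed phishing website that closely resembles the website of a target organization, and the personal sensitive information the recipients enter on that phishing website is captured. With the spread and development of e-commerce and Internet applications, the losses caused by phishing are becoming increasingly serious.
At present, the many technical means for detecting and identifying phishing attacks fall mainly into three categories: blacklist techniques, heuristic feature detection techniques, and pattern-recognition-based detection techniques. Blacklist techniques maintain a continuously updated list of phishing websites based on user reports or ratings, so as to prevent more users from visiting phishing websites that have already been discovered. Heuristic feature detection techniques use features such as the links, text content, and domain name information of phishing websites as criteria for identifying phishing websites, and apply these criteria to detect unknown phishing websites. Pattern-recognition-based detection techniques extract feature vectors from a large number of phishing website samples and train a discriminant model on them, which is then used to detect unknown phishing websites. Whichever of these techniques is used, a client or browser plug-in almost always has to passively accept a Uniform Resource Locator (URL) submitted by the user and send it to the detecting device; only then can the detecting device use the above techniques to determine whether the URL is the URL of a phishing website (a phishing URL).
In the prior art, the detecting device can only passively accept URLs submitted by clients for detection. As phishing attacks become increasingly rampant, relying solely on user-triggered passive detection is clearly insufficient to counter them.
Summary
The embodiments of the present invention provide a phishing detection method and apparatus to solve the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks.
In one aspect, an embodiment of the present invention provides a phishing detection method, including: acquiring a suspected phishing host name that matches a keyword of a phishing target;
acquiring a phishing Uniform Resource Locator (URL) path corresponding to the phishing target;
splicing the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
detecting the suspected phishing URL to determine whether the suspected phishing URL is a phishing URL.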
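In another aspect, an embodiment of the present invention provides a phishing detection apparatus, including:
a suspected host acquisition module, configured to acquire a suspected phishing host name that matches a keyword of a phishing target;
a URL path acquisition module, configured to acquire a phishing Uniform Resource Locator (URL) path corresponding to the phishing target;
a URL construction module, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
a detection module, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.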
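By actively acquiring a suspected phishing host name that matches the keyword of the phishing target and a phishing URL path corresponding to the phishing target, splicing them into a suspected phishing URL, and detecting the suspected phishing URL to determine whether it is a phishing URL, the embodiments of the present invention overcome the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of Embodiment 1 of a phishing detection method provided by the present invention.
FIG. 2 is a schematic flowchart of Embodiment 2 of a phishing detection method provided by the present invention.
FIG. 3 is a schematic structural diagram of Embodiment 1 of a phishing detection apparatus provided by the present invention.
FIG. 4 is a schematic structural diagram of Embodiment 2 of a phishing detection apparatus provided by the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.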
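In the course of arriving at the present invention, the inventors analyzed actual phishing report data and found that more than 90% of current phishing attacks still follow the traditional pattern: a Uniform Resource Locator (URL) that impersonates the phishing target, together with its corresponding web page, is used to lure deceived users into disclosing their account credentials and other confidential information. Here the phishing target is the object impersonated by the phishing website, such as Taobao or the Industrial and Commercial Bank of China. A URL consists of a host name and a URL path; for example, the URL http://item.taobao.com/member/minilogin.asp consists of the host name http://item.taobao.com/ and the URL path /member/minilogin.asp. Taking Taobao as the phishing target, a URL that impersonates it, i.e. a phishing website, might be http://item.taobao.cvbda.co.cc/member/minilogin.asp. Therefore, by performing targeted scanning and detection of active hosts on the network, hosts that show a tendency toward impersonation can be retrieved as suspected phishing hosts. On the other hand, the URLs of the pages that actually carry out the phishing are becoming structurally more complex, typically taking the form of multi-level domains with multi-level paths, so it is often difficult to detect the real phishing page from the host alone. The embodiments of the present invention therefore propose supplementing the host with URL paths from a database of already-confirmed phishing URLs, splicing them together with suspected phishing host names to construct complete suspected phishing URLs. The suspected phishing URLs are then detected to identify phishing URLs.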
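The host name / URL path split described above can be illustrated with Python's standard `urllib.parse.urlsplit`. This is a minimal sketch, not the patent's implementation: the helper name `split_url` is hypothetical, and keeping any query string with the path is an assumption the description does not address.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str]:
    """Split a URL into (host name, URL path) in the sense used above.

    The host name keeps the scheme and a trailing slash, e.g.
    "http://item.taobao.com/"; the URL path always starts with "/".
    """
    parts = urlsplit(url)
    host_name = f"{parts.scheme}://{parts.netloc}/"
    url_path = parts.path or "/"
    if parts.query:
        # Assumption: keep the query string with the path so it can be replayed.
        url_path += "?" + parts.query
    return host_name, url_path


print(split_url("http://item.taobao.cvbda.co.cc/member/minilogin.asp"))
# ('http://item.taobao.cvbda.co.cc/', '/member/minilogin.asp')
```

Splitting the phishing URL and the genuine Taobao URL yields the same path, which is exactly why recorded paths can be reused on new suspected hosts.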
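FIG. 1 is a schematic flowchart of Embodiment 1 of a phishing detection method provided by the present invention. As shown in FIG. 1, this embodiment includes:
Step 101: Acquire a suspected phishing host name that matches a keyword of the phishing target.
For example, the detecting device may acquire the suspected phishing host name matching the keyword of the phishing target from manual input by a network administrator, or by querying a Domain Name System (DNS) server; this embodiment does not limit this. A suspected phishing host name matching the keyword of the phishing target is usually a host name similar to the phishing target's host name. For example, if the phishing target is Taobao, its keyword may be taobao, and a suspected phishing host name matching that keyword may be http://www.taobao.co.cc/.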
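Step 102: Acquire a phishing URL path corresponding to the phishing target.
Here, the phishing URL path corresponding to the phishing target is a phishing URL path that has already been used against that phishing target; it can usually be obtained from an existing database. The existing database stores phishing targets and one or more phishing URLs corresponding to each phishing target, and the phishing URL path is extracted from the phishing URL.
It should be noted that, in practice, there is no fixed order between step 101 and step 102; both only need to be performed before step 103.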
Step 103: Splice the suspected phishing host name and the phishing URL path into a suspected phishing URL.
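Step 103 can be sketched as a string join. The sketch assumes, as the examples in this description suggest, that a host name may carry a scheme and a trailing slash (`http://item.taobao.cvbda.co.cc/`); defaulting to `http://` when no scheme is present is an assumption, and `splice_url` is a hypothetical name.

```python
def splice_url(host_name: str, url_path: str) -> str:
    """Step 103: join a suspected phishing host name and a phishing URL path."""
    # Assumption: bare host names get an "http://" scheme.
    base = host_name if "://" in host_name else "http://" + host_name
    # Trim slashes at the seam so exactly one "/" separates host and path.
    return base.rstrip("/") + "/" + url_path.lstrip("/")


print(splice_url("http://item.taobao.cvbda.co.cc/", "/member/minilogin.asp"))
# http://item.taobao.cvbda.co.cc/member/minilogin.asp
```

Normalizing the slash at the seam matters because host names from DNS logs usually lack a trailing slash while host names written as in this description include one.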
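Step 104: Detect the suspected phishing URL to determine whether the suspected phishing URL is a phishing URL.
The detection may use existing techniques such as blacklist techniques, heuristic feature detection techniques, or pattern-recognition-based detection techniques; this embodiment does not limit this.
In practice, phishing detection may be performed for one phishing target or for multiple phishing targets; this embodiment does not limit this.
By actively acquiring a suspected phishing host name that matches the keyword of the phishing target and a phishing URL path corresponding to the phishing target, splicing them into a suspected phishing URL, and detecting the suspected phishing URL to determine whether it is a phishing URL, this embodiment of the present invention overcomes the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.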
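FIG. 2 is a schematic flowchart of Embodiment 2 of a phishing detection method provided by the present invention. As shown in FIG. 2, this embodiment includes:
Step 201: Acquire a host query log from a DNS server.
The host query log here contains information such as the queried host name, the query time, and the source IP address of the query. It may be an authoritative query log or a recursive query log; this embodiment does not limit this.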
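Step 202: Determine a host name list according to the host query log.
The host names queried in the host query log are extracted to form the host name list.
Step 203: Preprocess the host name list to form a valid host name list.
The preprocessing here includes, but is not limited to, any one or a combination of the following: 1) removing duplicate host names from the host name list; 2) removing from the host name list the host names of hosts whose ports are closed; 3) removing from the host name list host names that are on a whitelist; 4) removing from the host name list the host names of hosts whose PageRank value is normal.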
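Steps 202 and 203 might look like the following sketch. The log line format, the function names, and the pluggable port and PageRank checks are all hypothetical: the description names the four filters but not how to evaluate them, and PageRank in particular would require an external data source, so it is passed in as a callback that by default filters nothing.

```python
import socket
from typing import Callable, Iterable


def extract_host_names(log_lines: Iterable[str]) -> list[str]:
    """Step 202: pull the queried host name out of each log line.

    Hypothetical format: "<time> <source-ip> <host-name>", whitespace-separated.
    Real authoritative or recursive DNS logs need their own parser.
    """
    hosts = []
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 3:
            hosts.append(fields[2].rstrip(".").lower())  # drop the DNS root dot
    return hosts


def port_open(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preprocess(hosts: list[str], whitelist: set[str],
               is_port_open: Callable[[str], bool] = port_open,
               has_normal_pagerank: Callable[[str], bool] = lambda h: False,
               ) -> list[str]:
    """Step 203: build the valid host name list with the four filters."""
    valid = []
    for host in dict.fromkeys(hosts):    # 1) drop duplicates, keep first-seen order
        if host in whitelist:            # 3) drop whitelisted names (cheap, so first)
            continue
        if has_normal_pagerank(host):    # 4) drop hosts with a "normal" PageRank
            continue
        if not is_port_open(host):       # 2) drop hosts whose port is closed
            continue
        valid.append(host)
    return valid
```

The cheap in-memory filters run before the network probe so that whitelisted or duplicate names never cost a TCP connection.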
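Step 204: Match the keyword of the phishing target, and determine, from the valid host name list, the suspected phishing host names that match the keyword of the phishing target.
Here, for example, if the phishing target is Taobao, the keyword of the phishing target may be taobao. Further, a combination of keywords may also be used for matching. For example, the combination of item and taobao often appears in the hosts of phishing URLs targeting Taobao, so the combination of item and taobao may be used to match host names in the valid host name list; for example, http://item.taobao.cvbda.co.cc/ is matched as a suspected phishing host name targeting Taobao.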
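A minimal sketch of the Step 204 matching, assuming simple case-insensitive substring matching; the description does not say how keywords are compared against host names, and `match_suspected_hosts` is a hypothetical helper. Every keyword in a set must appear in the host name, so a single keyword is a one-element set and the item/taobao combination is a two-element set.

```python
def match_suspected_hosts(valid_hosts: list[str],
                          keyword_sets: list[set[str]]) -> list[str]:
    """Step 204: keep hosts that contain every keyword of at least one set."""
    suspects = []
    for host in valid_hosts:
        name = host.lower()
        # Assumption: plain substring containment stands in for "matching".
        if any(all(k.lower() in name for k in kws) for kws in keyword_sets):
            suspects.append(host)
    return suspects


hosts = ["item.taobao.cvbda.co.cc", "www.taobao.co.cc", "example.org"]
print(match_suspected_hosts(hosts, [{"item", "taobao"}]))
# ['item.taobao.cvbda.co.cc']
```

In a full pipeline the genuine target hosts (for example taobao.com itself) would already have been removed by the whitelist filter of Step 203, so they do not show up as suspects here.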
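Step 205: Read the phishing URL paths corresponding to the phishing target from a phishing database.
Any publicly available phishing report data source in the prior art, such as phishtank.com, may be used as the phishing database. The phishing database contains the following information: phishing targets and the phishing URLs corresponding to each phishing target. In step 205, after the phishing URLs corresponding to the phishing target are determined in the phishing database, the path part of each phishing URL, that is, the phishing URL path, is read. There may be one or more phishing URL paths corresponding to the phishing target; this embodiment does not limit this. It should also be noted that there is no fixed order between step 205 and steps 201 to 204; they only need to be performed before step 206.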
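Reading the paths for one phishing target from a phishing report export could look like the sketch below. It assumes a CSV export with `url` and `target` columns; check the actual schema of whichever feed (such as phishtank.com) is used. `read_phishing_paths` is a hypothetical name. Duplicates are kept on purpose so that a later frequency ranking has counts to work with.

```python
import csv
import io
from urllib.parse import urlsplit


def read_phishing_paths(csv_text: str, target: str) -> list[str]:
    """Step 205: collect the path part of every phishing URL recorded for `target`.

    Assumes a CSV export with at least "url" and "target" columns.
    """
    paths = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get("target") or "").strip().lower() != target.lower():
            continue
        path = urlsplit(row["url"].strip()).path
        if path and path != "/":       # a bare host carries no reusable path
            paths.append(path)
    return paths


sample = ("url,target\n"
          "http://a.example/member/minilogin.asp,Taobao\n"
          "http://b.example/member/minilogin.asp,taobao\n"
          "http://c.example/,Taobao\n"
          "http://d.example/login.php,Other\n")
print(read_phishing_paths(sample, "Taobao"))
# ['/member/minilogin.asp', '/member/minilogin.asp']
```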
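If there are at least two phishing URL paths corresponding to the phishing target, step 205 further includes: sorting the at least two phishing URL paths corresponding to the phishing target in descending order of frequency of occurrence, and then taking the N most frequent phishing URL paths to form a high-frequency phishing path list, where N is a natural number greater than 1.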
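The frequency ranking can be sketched with `collections.Counter.most_common`, which returns elements in descending order of count, with ties kept in first-seen order. `high_frequency_paths` is a hypothetical name; rejecting an N of 1 or less simply mirrors the constraint stated above.

```python
from collections import Counter


def high_frequency_paths(paths: list[str], n: int) -> list[str]:
    """Return the N most frequent phishing URL paths, most frequent first."""
    if n <= 1:
        raise ValueError("N must be a natural number greater than 1")
    return [path for path, _count in Counter(paths).most_common(n)]


paths = ["/a", "/b", "/a", "/c", "/b", "/a"]
print(high_frequency_paths(paths, 2))
# ['/a', '/b']
```

If fewer than N distinct paths exist, `most_common(n)` simply returns all of them.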
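Step 206: Splice the suspected phishing host name and the phishing URL path into a suspected phishing URL.
Correspondingly, if there are at least two phishing URL paths corresponding to the phishing target, in step 206 the suspected phishing host name is spliced in turn with each phishing URL path in the high-frequency phishing path list to obtain a list of suspected phishing URLs.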
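Step 206 with a ranked path list amounts to a nested loop over hosts and paths. In this sketch (hypothetical name `build_suspected_urls`, assumed `http://` default scheme) each host's URLs follow the ranked path order, so the most frequent paths are probed first.

```python
def build_suspected_urls(hosts: list[str], ranked_paths: list[str]) -> list[str]:
    """Step 206: splice each suspected host with each high-frequency path in order."""
    urls = []
    for host in hosts:
        # Assumption: bare host names get an "http://" scheme.
        base = host if "://" in host else "http://" + host
        for path in ranked_paths:
            urls.append(base.rstrip("/") + "/" + path.lstrip("/"))
    return urls


print(build_suspected_urls(["http://item.taobao.cvbda.co.cc/"],
                           ["/member/minilogin.asp", "/login.php"]))
# ['http://item.taobao.cvbda.co.cc/member/minilogin.asp', 'http://item.taobao.cvbda.co.cc/login.php']
```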
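Step 207: Access the suspected phishing URL to obtain the page corresponding to the suspected phishing URL.
Online access sniffing from the prior art may be used here to determine whether the suspected phishing URL can be accessed online; if it cannot, the process ends, or online access sniffing continues with the next suspected phishing URL.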
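Online access sniffing is not specified further, so the following is only one possible sketch using the standard `urllib.request.urlopen`: any failure (DNS error, refused connection, HTTP error status, truncated response, malformed URL) is treated as "cannot be accessed online" and returns `None`. The function name, timeout, size cap, and User-Agent value are assumptions.

```python
import http.client
import urllib.request
from typing import Optional


def sniff_page(url: str, timeout: float = 5.0,
               max_bytes: int = 1_000_000) -> Optional[str]:
    """Step 207 sketch: return the page text, or None if it is not reachable."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read(max_bytes)              # cap the download size
            charset = response.headers.get_content_charset() or "utf-8"
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed or unsupported URLs.
        return None
    try:
        return body.decode(charset, errors="replace")
    except LookupError:                                  # unknown charset label
        return body.decode("utf-8", errors="replace")
```

A caller would iterate over the suspected URL list, skip entries for which `sniff_page` returns `None`, and pass the remaining page text on to the page check of Step 208.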
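Step 208: If the page contains a login box and the keyword of the phishing target, determine that the suspected phishing URL is a phishing URL.
In the course of arriving at the present invention, the inventors found that phishing website pages usually contain a login box. In addition, specifically, it is determined whether the value of the page title (meta=title) and the string at the copyright notice contain the keyword of the phishing target. Here, meta=title refers to the title part of the page's head section; it has a concrete value, generally a string of text describing the purpose of the page, which the browser displays at the top. Further, in step 208, if the page does not contain a login box and/or the keyword of the phishing target, it is determined that the suspected phishing URL is not a phishing URL.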
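The page check of Step 208 could be approximated with the standard `html.parser.HTMLParser`. Two interpretive assumptions are made here: a login box is taken to mean an `<input type="password">` element, and the keyword is accepted if it appears in either the `<title>` text or any text node containing "copyright" or the © sign. The description fixes neither rule, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _PageScanner(HTMLParser):
    """Collects the <title> text, copyright strings, and password inputs."""

    def __init__(self) -> None:
        super().__init__()          # convert_charrefs=True turns &copy; into ©
        self.in_title = False
        self.title = ""
        self.copyright_texts: list[str] = []
        self.has_password_input = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.has_password_input = True   # assumed "login box" signal

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        if "copyright" in data.lower() or "\u00a9" in data:
            self.copyright_texts.append(data)


def looks_like_phishing(html: str, keyword: str) -> bool:
    """Step 208 sketch: login box AND keyword in the title or copyright text."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()                 # flush any buffered trailing text
    kw = keyword.lower()
    keyword_hit = kw in scanner.title.lower() or any(
        kw in text.lower() for text in scanner.copyright_texts)
    return scanner.has_password_input and keyword_hit
```

Restricting the keyword search to the title and copyright text, as the description does, avoids flagging pages that merely mention the brand in passing, such as reviews or news articles.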
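In this embodiment, steps 207 and 208 further verify whether the suspected phishing URL is a phishing URL, improving the accuracy of the result.
In this embodiment of the present invention, a host query log is acquired from a DNS server, a host name list is determined from the host query log, the host name list is preprocessed and then matched against the keyword of the phishing target to determine suspected phishing host names, each suspected phishing host name is spliced with the phishing URL paths for the phishing target obtained from a phishing database to form suspected phishing URLs, and finally the suspected phishing URLs are detected to determine whether they are phishing URLs. This not only overcomes the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks, discovering phishing websites earlier and improving the efficiency of phishing website detection, but also improves the accuracy of the detection results.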
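FIG. 3 is a schematic structural diagram of Embodiment 1 of a phishing detection apparatus provided by the present invention. As shown in FIG. 3, this embodiment includes:
a suspected host acquisition module 31, configured to acquire a suspected phishing host name that matches a keyword of a phishing target;
a URL path acquisition module 32, configured to acquire a phishing URL path corresponding to the phishing target;
a URL construction module 33, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
a detection module 34, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.
For the specific implementation of this embodiment, refer to Embodiment 1 of the phishing detection method provided by the present invention. By actively acquiring a suspected phishing host name that matches the keyword of the phishing target and a phishing URL path corresponding to the phishing target, splicing them into a suspected phishing URL, and detecting the suspected phishing URL to determine whether it is a phishing URL, this embodiment of the present invention overcomes the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks, thereby discovering phishing websites earlier and improving the efficiency of phishing website detection.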
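FIG. 4 is a schematic structural diagram of Embodiment 2 of a phishing detection apparatus provided by the present invention. As shown in FIG. 4, this embodiment includes:
a suspected host acquisition module 41, configured to acquire a suspected phishing host name that matches a keyword of a phishing target;
a URL path acquisition module 42, configured to acquire a phishing URL path corresponding to the phishing target;
a URL construction module 43, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
a detection module 44, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.
The suspected host acquisition module 41 specifically includes:
a log acquisition unit 411, configured to acquire a host query log from a DNS server;
a list determination unit 412, configured to determine a host name list according to the host query log;
a preprocessing unit 413, configured to preprocess the host name list to form a valid host name list; and
a matching unit 414, configured to match the keyword of the phishing target and determine, from the valid host name list, the suspected phishing hosts that match the keyword of the phishing target.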
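Further, the preprocessing unit 413 is specifically configured to perform at least one of the following operations:
removing duplicate host names from the host name list;
removing from the host name list the host names of hosts whose ports are closed;
removing from the host name list host names that are on a whitelist;
removing from the host name list the host names of hosts whose PageRank value is normal.
Further, the URL path acquisition module 42 is specifically configured to read the phishing URL path corresponding to the phishing target from a phishing database.
Further, if there are at least two phishing URL paths corresponding to the phishing target, the URL path acquisition module 42 is specifically configured to sort the at least two phishing URL paths in descending order of frequency of occurrence; the URL construction module 43 is specifically configured to splice the suspected phishing host name with the phishing URL paths in turn according to the sorted order to obtain at least two suspected phishing URLs; and the detection module 44 is specifically configured to detect the at least two suspected phishing URLs in turn according to the sorted order.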
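Further, the detection module 44 specifically includes:
an access unit 441, configured to access the suspected phishing URL and obtain the page corresponding to the suspected phishing URL; and
a determination unit 442, configured to determine that the suspected phishing URL is the phishing URL if the page contains a login box and the keyword of the phishing target.
For the specific implementation of this embodiment, refer to Embodiment 2 of the phishing detection method provided by the present invention. In this embodiment of the present invention, a host query log is acquired from a DNS server, a host name list is determined from the host query log, the host name list is preprocessed and then matched against the keyword of the phishing target to determine suspected phishing host names, each suspected phishing host name is spliced with the phishing URL paths for the phishing target obtained from a phishing database to form suspected phishing URLs, and finally the suspected phishing URLs are detected to determine whether they are phishing URLs. This not only overcomes the problem in the prior art that user-triggered passive detection cannot cope with increasingly rampant phishing attacks, discovering phishing websites earlier and improving the efficiency of phishing website detection, but also improves the accuracy of the detection results.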
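The module structure of apparatus Embodiment 2 can be sketched as one class whose methods correspond to modules 41 to 44. This is a hypothetical, simplified wiring rather than the patent's implementation: the log format (`<time> <source-ip> <host-name>`) is assumed, the port and PageRank filters are omitted for brevity, and page access and the page-level decision are injected as callables so that any fetcher and page classifier can be plugged in.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class PhishingDetector:
    """Hypothetical wiring of modules 41-44; each hook stands in for one unit."""
    keyword_sets: list[set[str]]                                   # matching unit 414
    whitelist: set[str] = field(default_factory=set)               # preprocessing 413
    fetch_page: Callable[[str], Optional[str]] = lambda url: None  # access unit 441
    is_phishing_page: Callable[[str], bool] = lambda html: False   # determination 442
    top_n: int = 10

    # Module 41: log acquisition 411, list determination 412,
    # preprocessing 413 (dedupe + whitelist only), matching 414.
    def suspected_hosts(self, log_lines: Iterable[str]) -> list[str]:
        hosts = [fields[2].rstrip(".").lower()
                 for fields in (line.split() for line in log_lines)
                 if len(fields) >= 3]
        valid = [h for h in dict.fromkeys(hosts) if h not in self.whitelist]
        return [h for h in valid
                if any(all(k in h for k in kws) for kws in self.keyword_sets)]

    # Module 42: rank phishing URL paths by frequency of occurrence.
    def ranked_paths(self, paths: Iterable[str]) -> list[str]:
        return [p for p, _count in Counter(paths).most_common(self.top_n)]

    # Module 43: splice every host with every path, in ranked order.
    @staticmethod
    def build_urls(hosts: list[str], paths: list[str]) -> list[str]:
        return [f"http://{h.rstrip('/')}/{p.lstrip('/')}" for h in hosts for p in paths]

    # Module 44: access each URL in turn and keep the confirmed phishing URLs.
    def detect(self, urls: list[str]) -> list[str]:
        confirmed = []
        for url in urls:
            html = self.fetch_page(url)
            if html is not None and self.is_phishing_page(html):
                confirmed.append(url)
        return confirmed

    def run(self, log_lines: Iterable[str], phishing_paths: Iterable[str]) -> list[str]:
        hosts = self.suspected_hosts(log_lines)
        urls = self.build_urls(hosts, self.ranked_paths(phishing_paths))
        return self.detect(urls)
```

Injecting `fetch_page` and `is_phishing_page` keeps the pipeline testable offline and lets the detection module swap in blacklist, heuristic, or model-based checks, which the description explicitly leaves open.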
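A person of ordinary skill in the art may understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are intended only to describe, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, without causing the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.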

Claims

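1. A phishing detection method, comprising:
acquiring a suspected phishing host name that matches a keyword of a phishing target;
acquiring a phishing Uniform Resource Locator (URL) path corresponding to the phishing target;
splicing the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
detecting the suspected phishing URL to determine whether the suspected phishing URL is a phishing URL.
2. The method according to claim 1, wherein the acquiring a suspected phishing host name that matches a keyword of a phishing target specifically comprises:
acquiring a host query log from a Domain Name System (DNS) server;
determining a host name list according to the host query log;
preprocessing the host name list to form a valid host name list; and
matching the keyword of the phishing target, and determining, from the valid host name list, the suspected phishing host name that matches the keyword of the phishing target.
3. The method according to claim 2, wherein the preprocessing the host name list specifically comprises at least one of the following operations:
removing duplicate host names from the host name list;
removing from the host name list the host names of hosts whose ports are closed;
removing from the host name list host names that are on a whitelist;
removing from the host name list the host names of hosts whose PageRank value is normal.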
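4. The method according to claim 1, wherein the acquiring a phishing URL path corresponding to the phishing target specifically comprises:
reading the phishing URL path corresponding to the phishing target from a phishing database.
5. The method according to claim 1, wherein, if there are at least two phishing URL paths corresponding to the phishing target, before the splicing the suspected phishing host name and the phishing URL path into a suspected phishing URL, the method further comprises:
sorting the at least two phishing URL paths in descending order of frequency of occurrence;
the splicing the suspected phishing host name and the phishing URL path into a suspected phishing URL specifically comprises:
splicing the suspected phishing host name with the phishing URL paths in turn according to the sorted order, to obtain at least two suspected phishing URLs; and
the detecting the suspected phishing URL specifically comprises:
detecting the at least two suspected phishing URLs in turn according to the sorted order.
6. The method according to claim 1, wherein the detecting the suspected phishing URL specifically comprises:
accessing the suspected phishing URL to obtain a page corresponding to the suspected phishing URL; and
if the page contains a login box and the keyword of the phishing target, determining that the suspected phishing URL is the phishing URL.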
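7. A phishing detection apparatus, comprising:
a suspected host acquisition module, configured to acquire a suspected phishing host name that matches a keyword of a phishing target;
a URL path acquisition module, configured to acquire a phishing Uniform Resource Locator (URL) path corresponding to the phishing target;
a URL construction module, configured to splice the suspected phishing host name and the phishing URL path into a suspected phishing URL; and
a detection module, configured to detect the suspected phishing URL and determine whether the suspected phishing URL is a phishing URL.
8. The apparatus according to claim 7, wherein the suspected host acquisition module specifically comprises:
a log acquisition unit, configured to acquire a host query log from a Domain Name System (DNS) server;
a list determination unit, configured to determine a host name list according to the host query log;
a preprocessing unit, configured to preprocess the host name list to form a valid host name list; and
a matching unit, configured to match the keyword of the phishing target and determine, from the valid host name list, the suspected phishing host that matches the keyword of the phishing target.
9. The apparatus according to claim 8, wherein the preprocessing unit is specifically configured to perform at least one of the following operations:
removing duplicate host names from the host name list;
removing from the host name list the host names of hosts whose ports are closed;
removing from the host name list host names that are on a whitelist;
removing from the host name list the host names of hosts whose PageRank value is normal.
10. The apparatus according to claim 7, wherein the URL path acquisition module is specifically configured to read the phishing URL path corresponding to the phishing target from a phishing database.
11. The apparatus according to claim 7, wherein, if there are at least two phishing URL paths corresponding to the phishing target, the URL path acquisition module is specifically configured to:
sort the at least two phishing URL paths in descending order of frequency of occurrence;
the URL construction module is specifically configured to:
splice the suspected phishing host name with the phishing URL paths in turn according to the sorted order, to obtain at least two suspected phishing URLs; and
the detection module is specifically configured to:
detect the at least two suspected phishing URLs in turn according to the sorted order.
12. The apparatus according to claim 7, wherein the detection module specifically comprises:
an access unit, configured to access the suspected phishing URL and obtain a page corresponding to the suspected phishing URL; and
a determination unit, configured to determine that the suspected phishing URL is the phishing URL if the page contains a login box and the keyword of the phishing target.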
PCT/CN2011/083671 2011-07-28 2011-12-08 Phishing detection method and apparatus WO2013013475A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110212909.X 2011-07-28
CN201110212909.XA CN102316099B (zh) 2011-07-28 2011-07-28 Phishing detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2013013475A1 (zh) 2013-01-31

Family

ID=45428916

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083671 WO2013013475A1 (zh) 2011-07-28 2011-12-08 Phishing detection method and apparatus

Country Status (2)

Country Link
CN (1) CN102316099B (zh)
WO (1) WO2013013475A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
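CN103379111A (zh) * 2012-04-21 2013-10-30 中南林业科技大学 Intelligent phishing defense system
CN102833233B (zh) * 2012-08-06 2015-07-01 北京奇虎科技有限公司 Method and apparatus for identifying website pages
CN103685174B (zh) * 2012-09-07 2016-12-21 中国科学院计算机网络信息中心 Sample-independent phishing website detection method
CN103067387B (zh) * 2012-12-27 2016-01-27 中国建设银行股份有限公司 Anti-phishing monitoring system and method
CN104113539A (zh) * 2014-07-11 2014-10-22 哈尔滨工业大学(威海) Phishing website engine detection method and apparatus
CN106209488B (zh) * 2015-04-28 2021-01-29 北京瀚思安信科技有限公司 Method and device for detecting website attacks
CN105138912A (zh) * 2015-09-25 2015-12-09 北京奇虎科技有限公司 Method and apparatus for automatically generating phishing website detection rules
CN107181758A (zh) * 2017-06-30 2017-09-19 微梦创科网络科技(中国)有限公司 Method and system for identifying hacker behavior
CN107360197B (zh) * 2017-09-08 2020-12-25 杭州安恒信息技术股份有限公司 DNS-log-based phishing analysis method and apparatus
CN108804926B (zh) * 2018-05-23 2020-06-26 腾讯科技(深圳)有限公司 General-purpose web application vulnerability detection and repair method and apparatus
CN110929107A (zh) * 2019-10-23 2020-03-27 广州艾媒数聚信息咨询股份有限公司 Method, system, apparatus, and storage medium for analyzing network access logs
CN114095278B (zh) * 2022-01-19 2022-05-24 南京明博互联网安全创新研究院有限公司 Phishing website detection method based on a hybrid feature selection framework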

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101310502A (zh) * 2005-09-30 2008-11-19 趋势科技股份有限公司 Security management device, communication system, and access control method
CN101341717A (zh) * 2005-12-23 2009-01-07 国际商业机器公司 Method for evaluating and accessing network addresses
US20090300768A1 (en) * 2008-05-30 2009-12-03 Balachander Krishnamurthy Method and apparatus for identifying phishing websites in network traffic using generated regular expressions

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101303700B (zh) * 2008-06-13 2010-04-21 成都市华为赛门铁克科技有限公司 Web page collection method and system
CN101534306B (zh) * 2009-04-14 2012-01-11 深圳市腾讯计算机系统有限公司 Phishing website detection method and apparatus

Also Published As

Publication number Publication date
CN102316099B (zh) 2014-10-22
CN102316099A (zh) 2012-01-11

Similar Documents

Publication Publication Date Title
WO2013013475A1 (zh) Phishing detection method and apparatus
US9985978B2 (en) Method and system for misuse detection
John et al. deSEO: Combating Search-Result Poisoning
WO2014036801A1 (zh) Sample-independent phishing website detection method
US9123027B2 (en) Social engineering protection appliance
CN106357696B (zh) SQL injection attack detection method and system
KR100619178B1 (ko) Method and apparatus for detecting invalid clicks in an Internet search engine
KR101130357B1 (ko) Search engine spam detection using external data
US8495742B2 (en) Identifying malicious queries
Chiew et al. Leverage website favicon to detect phishing websites
CN106302440B (zh) Method for obtaining suspicious phishing websites through multiple channels
US20110087648A1 (en) Search spam analysis and detection
WO2013152610A1 (zh) Phishing website detection method and device
US20080301139A1 (en) Search Ranger System and Double-Funnel Model For Search Spam Analyses and Browser Protection
Kim et al. Detecting fake anti-virus software distribution webpages
US20180131708A1 (en) Identifying Fraudulent and Malicious Websites, Domain and Sub-domain Names
CN112929390A (zh) Intelligent network monitoring method based on multi-strategy fusion
US8719352B2 (en) Reputation management for network content classification
Marchal et al. PhishScore: Hacking phishers' minds
CN111756724A (zh) Phishing website detection method, apparatus, device, and computer-readable storage medium
Geng et al. Combating phishing attacks via brand identity and authorization features
Banerjee et al. SUT: Quantifying and mitigating url typosquatting
GB2462456A (en) A method of determining whether a website is a phishing website, and apparatus for the same
Fatt et al. Phishdentity: Leverage website favicon to offset polymorphic phishing website
Jo et al. You're not who you claim to be: Website identity check for phishing detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11869863

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/06/2014)

122 Ep: pct application non-entry in european phase

Ref document number: 11869863

Country of ref document: EP

Kind code of ref document: A1