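WO2017063274A1 - Automatic determination method for malicious-redirect and malicious-nesting bad websites - Google Patents

Automatic determination method for malicious-redirect and malicious-nesting bad websites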

Info

Publication number
WO2017063274A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain name
malicious
website
query
bad
Prior art date
Application number
PCT/CN2015/098469
Other languages
English (en)
French (fr)
Inventor
王翠翠
耿光刚
延志伟
Original Assignee
中国互联网络信息中心
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国互联网络信息中心
Publication of WO2017063274A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Definitions

  • The present invention relates to the field of network security technologies, and in particular to a method for automatically determining malicious-redirect and malicious-nesting bad websites.
  • The Domain Name System (DNS) is a core service of the Internet. It acts as a distributed database that maps domain names and IP addresses to each other and is the entry point through which users access network resources. Its intuitiveness and convenience make network resources easy to reach, but they have also given rise to widespread domain name abuse, including phishing, bad websites such as pornography and gambling sites, and botnets, resulting in leaked user information and loss of property and seriously harming the social climate.
  • A malicious-redirect website is characterized by redirecting from one URL (domain name) to another URL (domain name), sometimes several times in a row.
  • Such websites mainly provide bad services by means of Refresh, JavaScript and the like.
  • Taking JavaScript redirects as an example: JavaScript is itself a programming language and the ways of redirecting are varied; this practice is called malicious redirection.
  • In addition, search engines do not process JavaScript code, so malicious redirection is also called redirect cheating.
  • A maliciously nested website is one in which a web page embeds another web page by using some kind of frame or through JavaScript code.
  • A web crawler obtains one page while the user sees the content of another. Many websites, in particular bad websites such as pornography and gambling sites, are keen to use this nesting-based cheating technique.
  • There are essentially two reasons: 1) to deceive automated detection algorithms, evading supervision while making a profit; 2) once shut down, the site can easily be revived, because the embedded core website still exists and only the outer shell needs to be replaced for the service to continue.
  • The object of the present invention is to propose a method for automatically determining malicious-redirect and malicious-nesting bad websites. Based on domain name resolution, the method captures the set of domain names queried by these two types of websites by simulating browser access behavior, and finally determines both types through a blacklist-matching mechanism.
  • A method for automatically determining malicious-redirect and malicious-nesting bad websites includes the following steps:
  • The sources of the domain name blacklist in step 1) include: manually reported data covering the Ministry of Public Security's 28 categories of illegal and harmful content, data processed daily by the China Anti-Phishing Alliance, and data published on the network.
  • The data published on the network may be phishingtank data.
  • The recursive server in step 2) is built with BIND; query logging is enabled by editing the recursive server's configuration file.
  • The browser cache and the browser's DNS cache are cleared and disabled before the browser access behavior in step 2) is simulated.
  • NETSTATION1 and NETSTATION2 are the websites to be determined, and domainname1 to domainname4 are the domain names to be filtered that correspond to each website.
  • Analyzing the query log of the recursive server in step 4) and capturing the domain name query sequence of each website to be determined includes extracting the query log entries recorded between two consecutive visits to the non-existent website.
  • Screening the domain names in the domain name list in step 5) includes screening each domain name according to the PR value of its corresponding website.
  • In this screening, if the PR value of the website corresponding to a domain name to be filtered is less than a set threshold, the domain name is added to the list of suspected abused domain names; otherwise, the domain name is determined to be a non-abused domain name.
  • The implementation of the method mainly involves the following two aspects:
  • While the page loads, a series of cross-domain DNS query requests must be issued.
  • The method of the present invention starts from DNS query requests: a dedicated DNS recursive server is built, and the computer's DNS queries are pointed at it. By simulating browser access behavior, the set of queried domain names is captured and then intersected with the blacklist; if the intersection is not empty, the website is determined to be a bad website.
  • The method is based on domain name resolution and does not need to parse or inspect web page code. Instead, it extracts the domain name query sequence of the website to be determined by simulating access behavior, which avoids being misled by program code and gives higher accuracy. The blacklist is updated and adjusted in real time from security information published on the network, which gives the method wide applicability.
  • FIG. 1 is a schematic flowchart of a method in an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of part of the content of the common.js file in the specific implementation manner.
  • FIG. 3 is a schematic diagram of part of the content of the fery.js file in the specific embodiment.
  • The working principle of the present invention is as follows: the malicious behaviors described in the background all trigger a series of DNS query requests when the browser loads the web page. The invention therefore proposes, from the perspective of domain name resolution, a method for automatically determining malicious-redirect and malicious-nesting bad websites. The specific implementation is as follows:
  • Blacklist data sources include, but are not limited to: manually reported data covering the Ministry of Public Security's 28 categories of illegal and harmful content, data processed daily by the China Anti-Phishing Alliance, and data published on the network such as phishingtank data.
  • A recursive server is built with BIND and performs recursive resolution when a website to be determined issues a domain name query request.
  • The configuration file of the recursive server is edited to enable query logging and to disable the recursive server's cache, so that the domain name query requests issued by the website to be determined are logged.
  • The DNS queries of the server are then pointed at the recursive server, so that when the browser issues a DNS query, the request is sent to the recursive server.
  • The server, simulating a browser, issues a resolution request for bjydhsbyxgs.cn and sends it to the recursive server that was built;
  • Because the recursive server's cache is disabled in the present invention, when the recursive server receives the request it sends the query to a root name server and records the queried domain name in the query log; the root name server then returns to the recursive server the address of the top-level-domain name server for the queried domain;
  • The recursive server then sends the request to the server returned by the previous query. That server looks up its database and returns the resource record corresponding to the request, and the recursive server saves the returned record to its local cache.
  • Step (3) is repeated until the correct record is found;
  • The recursive server returns the final result to the browser and saves the result to the cache.
  • The domain name query records in its query log are as follows:
  • Each line in the query log is one query record; the content of the first pair of parentheses in each record is the queried domain name.
  • When browser access to a website is simulated, the browser cache is queried first, and the domain name resolution request is sent to the recursive server only if the cache holds no record for the website.
  • So that the recursive server records all the domain name resolution requests of the website under test, the browser cache should be cleared and disabled, preventing the browser from using cached content when it loads the page.
  • Likewise, the browser's DNS cache is cleared and disabled, preventing the browser from using its own DNS cache when it issues a DNS query.
  • A script simulates browser access behavior and visits the websites to be determined in turn. After each website is visited, a non-existent website, for example www.xxxxxxxxxxxxxxxxxxx.cn, is visited; this website is referred to as XNAME.
  • The query log of the recursive server is analyzed: the log entries between two consecutive XNAME visits are extracted, which yields the domain name query sequence of each website; these sequences are merged to form the list of domain names to be filtered.
  • The list of domain names to be filtered has the following format:
  • NETSTATION1 and NETSTATION2 are the websites to be determined, and domainname1 to domainname4 are the domain names to be filtered.
  • Each domain name in the list is preliminarily screened according to the PR (PageRank) value of its corresponding website, forming the list of suspected abused domain names.
  • In one embodiment, the PR threshold is set to 3.
  • For the list to be filtered in (5), if the PR value of the website corresponding to domainname1 is not less than 3, domainname1 is deleted from the list; the resulting list of suspected abused domain names is as follows:
  • NETSTATION1 and NETSTATION2 are the websites under test, and domainname1a to domainname4a are suspected abused domain names.
  • If domainname2a under NETSTATION1 is identical to DOMAINNAMEabuse2, NETSTATION1 is determined to be a bad website.
  • A website under the .CN top-level domain, with the URL http://www.xiansx.com.cn/, embeds, through its common.js file (part of whose content is shown in FIG. 2), a website under the .COM top-level domain, with the URL http://www.ag823.com/. Inspection of the web page code detects no bad elements, but what the user sees on opening the site is the latter, a gambling website.
  • A website under the .CN top-level domain, with the URL http://www.xiaoyanzi568.cn, appears from the page code fetched by a web crawler to be the website of 南京中茂科技有限责任公司 (Nanjing Zhongmao Technology Co., Ltd.), and no bad elements can be detected. In substance, however, the site is a typical gambling website.
  • When the method is applied to these two bad websites, the bad domain names they embed or redirect to can be captured by analyzing the query log, and matching against the blacklist completes the determination of both.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

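The present invention proposes a method for automatically determining malicious-redirect and malicious-nesting bad websites, comprising: 1) building a domain name blacklist; 2) setting up a recursive server that performs recursive resolution when a website to be determined issues a domain name query request, with query logging enabled so that the domain name query requests issued by the website to be determined are recorded; 3) using a server to simulate browser access behavior and visiting the websites to be determined in turn; 4) analyzing the query log of the recursive server to form a list of domain names to be filtered; 5) screening the domain names in the list to form a list of suspected abused domain names; 6) comparing the list of suspected abused domain names with the domain name blacklist to determine whether the website to be determined is a malicious website. Based on domain name resolution, the method captures the set of domain names queried by these two types of websites by simulating browser access behavior, and finally determines both types through a blacklist-matching mechanism.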

Description

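Automatic determination method for malicious-redirect and malicious-nesting bad websites
Technical Field
The present invention relates to the field of network security technologies, and in particular to a method for automatically determining malicious-redirect and malicious-nesting bad websites.
Background
The Domain Name System (DNS) is a core service of the Internet. It acts as a distributed database that maps domain names and IP addresses to each other and is the entry point through which users access network resources. Its intuitiveness and convenience make network resources easy to reach, but they have also given rise to widespread domain name abuse, including phishing, bad websites such as pornography and gambling sites, and botnets, resulting in leaked user information and loss of property and seriously harming the social climate.
With the introduction of real-name verification for some top-level domains (for example .CN) and stronger efforts against domain name abuse, it has become harder for offenders to profit from domain name abuse. To evade review and detection of bad applications, bad websites of the malicious-redirect and malicious-embedding types are increasingly common. The defining trait of both types is that they are "visible but unobtainable": opening the site in a browser shows plainly that it is a bad website, yet when the page source is fetched for bad-application detection, no bad elements can be found.
A malicious-redirect website is characterized by redirecting from one URL (domain name) to another URL (domain name), sometimes several times in a row. Such websites mainly provide bad services by means of Refresh, JavaScript and the like. Taking JavaScript redirects as an example, JavaScript is itself a programming language and the ways of redirecting are varied; this practice is called malicious redirection, and there is currently no complete solution for this type of website. In addition, search engines do not process JavaScript code, so malicious redirection is also called redirect cheating.
A maliciously nested website is one in which a web page embeds another web page by using some kind of frame or through JavaScript code: a web crawler obtains one page, while the user sees the content of another. Many websites, in particular bad websites such as pornography and gambling sites, are keen to use this nesting-based cheating technique, for essentially two reasons: 1) to deceive automated detection algorithms, evading supervision while making a profit; 2) once shut down, the site can easily be revived, because the embedded core website still exists and only the outer shell needs to be replaced for the service to continue. Because such websites deliver bad services mainly through malicious JavaScript redirects and embedding of external domains, and because JavaScript, as a programming language, allows countless ways of embedding, identification is extremely difficult. Similarly, such websites may also nest or embed content maliciously by means of CSS.
Thus, for malicious-redirect and malicious-nesting bad websites, traditional identification methods, including statistical learning based on text and link information and detection based on image recognition, are no longer effective.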
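Summary of the Invention
In view of the above problems, the object of the present invention is to propose a method for automatically determining malicious-redirect and malicious-nesting bad websites. Based on domain name resolution, the method captures the set of domain names queried by these two types of websites by simulating browser access behavior, and finally determines both types through a blacklist-matching mechanism.
To achieve this object, the invention adopts the following technical solution:
A method for automatically determining malicious-redirect and malicious-nesting bad websites, comprising the following steps:
1) building a domain name blacklist;
2) setting up a recursive server that performs recursive resolution when a website to be determined issues a domain name query request, with query logging enabled so that the domain name query requests issued by the website to be determined are recorded; and pointing the DNS queries of a server at this recursive server;
3) using the server to simulate browser access behavior and visiting the websites to be determined in turn;
4) analyzing the query log of the recursive server, capturing the domain name query sequence of each website to be determined, and merging the sequences to form a list of domain names to be filtered;
5) screening the domain names in the list to form a list of suspected abused domain names;
6) comparing the list of suspected abused domain names with the domain name blacklist and taking their intersection; if the intersection is not empty, the website to be determined that corresponds to the intersection is determined to be a malicious website.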
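Further, the sources of the domain name blacklist in step 1) include: manually reported data covering the Ministry of Public Security's 28 categories of illegal and harmful content, data processed daily by the China Anti-Phishing Alliance, and data published on the network.
Further, the data published on the network may be phishingtank data.
Further, the recursive server in step 2) is built with BIND; query logging is enabled by editing the recursive server's configuration file.
Further, the browser cache and the browser's DNS cache are cleared and disabled before the browser access behavior in step 2) is simulated.
Further, the list of domain names to be filtered in step 4) has the following format:
NETSTATION1—>(domainname1,domainname2,…..)
NETSTATION2—>(domainname3,domainname4,……)
……
where NETSTATION1 and NETSTATION2 are the websites to be determined, and domainname1 to domainname4 are the domain names to be filtered that correspond to each website.
Further, when the websites to be determined are visited in turn in step 3), each visit to a website to be determined is followed by a visit to a non-existent website.
Further, analyzing the query log of the recursive server in step 4) and capturing the domain name query sequence of each website to be determined includes extracting the query log entries recorded between two consecutive visits to the non-existent website.
Further, screening the domain names in the domain name list in step 5) includes screening each domain name according to the PR value of its corresponding website.
Further, this screening includes: if the PR value of the website corresponding to a domain name to be filtered is less than a set threshold, the domain name is added to the list of suspected abused domain names; otherwise, the domain name is determined to be a non-abused domain name.
As described above, the implementation of the method mainly involves the following two aspects:
(1) A blacklist-matching mechanism is proposed for determining bad websites.
The cheating behaviors of malicious-redirect and malicious-embedding bad websites usually aim to reuse the content of a core website. Therefore, before bad-website detection is performed, a large blacklist is first built; the size and timeliness of this blacklist determine how effective the method is in a real Internet detection environment.
(2) A recursive server is built to capture the set of domain names queried by bad websites.
Malicious-redirect and malicious-nesting bad websites all need to issue a series of cross-domain DNS query requests while the page loads. The method of the present invention therefore starts from DNS query requests: a dedicated DNS recursive server is built, and the computer's DNS queries are pointed at it. By simulating browser access behavior, the set of queried domain names is captured and then intersected with the blacklist; if the intersection is not empty, the website is determined to be a bad website.
Compared with traditional identification methods, the method has the following advantages:
The method is based on domain name resolution and does not need to parse or inspect web page code. Instead, it extracts the domain name query sequence of the website to be determined by simulating access behavior, which avoids being misled by program code and gives higher accuracy. The blacklist is updated and adjusted in real time from security information published on the network, which gives the method wide applicability.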
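Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the method in an embodiment of the present invention.
FIG. 2 is a schematic diagram of part of the content of the common.js file referred to in the detailed description.
FIG. 3 is a schematic diagram of part of the content of the fery.js file referred to in the detailed description.
Detailed Description
To make the above features and advantages of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The working principle of the present invention is as follows: the malicious behaviors described in the background all trigger a series of DNS query requests when the browser loads the web page. The invention therefore proposes, from the perspective of domain name resolution, a method for automatically determining malicious-redirect and malicious-nesting bad websites. The specific implementation is as follows:
(1) Building a large blacklist
First, a large blacklist is built, because the cheating behaviors of malicious-redirect and malicious-embedding bad websites usually aim to reuse the content of a core website. The size and timeliness of the blacklist determine how effective the method is in a real Internet detection environment. Blacklist data sources include, but are not limited to: manually reported data covering the Ministry of Public Security's 28 categories of illegal and harmful content, data processed daily by the China Anti-Phishing Alliance, and data published on the network such as phishingtank data.
(2) Setting up the recursive server
A recursive server is built with BIND and performs recursive resolution when a website to be determined issues a domain name query request. The configuration file of the recursive server is edited to enable query logging and to disable the recursive server's cache, so that the domain name query requests issued by the website to be determined are recorded. The DNS queries of the server are then pointed at the recursive server, so that when the browser issues a DNS query, the request is sent to the recursive server.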
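Take the resolution of the website www.bjydhsbyxgs.cn as an example:
(1) The server, simulating a browser, issues a resolution request for bjydhsbyxgs.cn and sends it to the recursive server that was built;
(2) Because the recursive server's cache is disabled in the present invention, when the recursive server receives the request it sends the query to a root name server and records the queried domain name in the query log; the root name server then returns to the recursive server the address of the top-level-domain name server for the queried domain;
(3) The recursive server then sends the request to the server returned by the previous query. That server looks up its database and returns the resource record corresponding to the request, and the recursive server saves the returned record to its local cache.
(4) Step (3) is repeated until the correct record is found;
(5) The recursive server returns the final result to the browser and saves the result to the cache.
The domain name query records in its query log are as follows: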
21-May-2015 17:49:57.349 client 192.168.189.129#35835(bjydhsbyxgs.cn):query:bjydhsbyxgs.cn IN AAAA+(192.168.189.129)
21-May-2015 17:49:57.349 client 192.168.189.129#53751(bjydhsbyxgs.cn):query:bjydhsbyxgs.cn IN A+(192.168.189.129)
21-May-2015 17:49:58.162 client 192.168.189.129#53035(www.306070.com):query:www.306070.com IN AAAA+(192.168.189.129)
21-May-2015 17:50:05.007 client 192.168.189.129#53035(www.306070.com):query:www.306070.com IN AAAA+(192.168.189.129)
21-May-2015 17:50:18.303 client 192.168.189.129#54389(www.dwz.cn):query:www.dwz.cn IN AAAA+(192.168.189.129)
21-May-2015 17:50:22.251 client 192.168.189.129#59111(www.dwz.cn):query:www.dwz.cn IN A+(192.168.189.129)
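Here, each line of the query log is one query record, and the content of the first pair of parentheses in each record is the queried domain name.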
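The rule stated above (the queried domain name is the content of the first pair of parentheses) can be sketched in Python as follows. This is only an illustration that assumes log lines shaped like the excerpt in this description; the names `QUERY_RE`, `queried_domain` and `queried_domains` are hypothetical and do not come from the patent.

```python
import re

# Assumption: each record contains "client <address>#<port>" followed by the
# first parenthesised group, which holds the queried domain name. Optional
# whitespace before the parenthesis is tolerated.
QUERY_RE = re.compile(r"client\s+\S+?#\d+\s*\(([^)]+)\)")


def queried_domain(line):
    """Return the queried domain of one log record, or None if it does not match."""
    match = QUERY_RE.search(line)
    return match.group(1).lower() if match else None


def queried_domains(lines):
    """Return the queried domains in log order, skipping lines that do not match."""
    return [d for d in map(queried_domain, lines) if d is not None]


# Two records copied from the log excerpt in this description.
sample = [
    "21-May-2015 17:49:57.349 client 192.168.189.129#35835(bjydhsbyxgs.cn):query:bjydhsbyxgs.cn IN AAAA+(192.168.189.129)",
    "21-May-2015 17:49:58.162 client 192.168.189.129#53035(www.306070.com):query:www.306070.com IN AAAA+(192.168.189.129)",
]
print(queried_domains(sample))
```

Lower-casing the result is a design choice made here so that later comparisons against a blacklist are case-insensitive; the description itself does not specify this.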
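(3) Clearing and disabling the browser's caches
When browser access to a website is simulated, the browser cache is queried first, and the domain name resolution request is sent to the recursive server only if the cache holds no record for the website. So that the recursive server records all the domain name resolution requests of the website under test, the browser cache should be cleared and disabled, preventing the browser from using cached content when it loads the page. Likewise, the browser's DNS cache is cleared and disabled, preventing the browser from using its own DNS cache when it issues a DNS query.
(4) Automated browser polling of the list of websites to be determined
A script simulates browser access behavior and visits the websites to be determined in turn. After each website is visited, a non-existent website, for example www.xxxxxxxxxxxxxxxxxxx.cn, is visited; this website is referred to as XNAME.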
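The interleaving of real visits with XNAME visits can be sketched as a visit schedule. The snippet below only builds the order of URLs to visit; it does not drive a browser. The function name `visit_schedule` is hypothetical, and the leading XNAME visit is an added assumption (not stated in the description) so that the first website is also bracketed by two XNAME queries in the log.

```python
# The non-existent sentinel website named in the description.
XNAME = "www.xxxxxxxxxxxxxxxxxxx.cn"


def visit_schedule(sites, sentinel=XNAME):
    """Return the visit order: sentinel, site 1, sentinel, site 2, ..., sentinel.

    Assumption: one extra sentinel visit is placed before the first site so
    that every site's DNS queries fall between two sentinel queries.
    """
    order = [sentinel]
    for site in sites:
        order.append(site)      # visit the website to be determined
        order.append(sentinel)  # then visit the non-existent website
    return order


# Usage with the two example sites discussed later in this description.
for url in visit_schedule(["www.xiansx.com.cn", "www.xiaoyanzi568.cn"]):
    print(url)
```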
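(5) Analysis of the recursive server's log
After the list of websites under test has been polled, the query log of the recursive server is analyzed: the log entries between two consecutive XNAME visits are extracted, which yields the domain name query sequence of each website; these sequences are merged to form the list of domain names to be filtered. The list has the following format:
NETSTATION1—>(domainname1,domainname2,…..)
NETSTATION2—>(domainname3,domainname4,……)
……
where NETSTATION1 and NETSTATION2 are the websites to be determined, and domainname1 to domainname4 are the domain names to be filtered.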
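A minimal sketch of this segmentation step is given below. It assumes the queried domains have already been extracted from the log in order, and that the websites were visited in the same order as the `sites` list; the function name `segment_by_sentinel` and the duplicate-removal step are illustrative assumptions rather than details given in the description.

```python
def segment_by_sentinel(domains, sites, sentinel):
    """Map each visited site to the domains queried between sentinel queries.

    domains:  queried domain names in log order
    sites:    websites to be determined, in the order they were visited
    sentinel: the non-existent XNAME domain
    """
    segments, current, started = [], [], False
    for name in domains:
        if name == sentinel:
            # Consecutive sentinel records (for example A and AAAA lookups)
            # produce an empty segment, which is ignored.
            if started and current:
                segments.append(current)
            current, started = [], True
        elif started:
            current.append(name)
    if len(segments) != len(sites):
        raise ValueError("number of log segments does not match number of sites")
    # dict.fromkeys removes repeated names while keeping first-seen order.
    return {site: list(dict.fromkeys(seg)) for site, seg in zip(sites, segments)}


# Usage with made-up placeholder names.
X = "xname.example"
log = [X, X, "site1.example", "site1.example", "inner.example", X, X, "site2.example", X]
print(segment_by_sentinel(log, ["site1.example", "site2.example"], X))
```

Queries that appear after the last sentinel record are discarded, since they are not bracketed by two XNAME visits.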
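(6) Preliminary screening of domain names by website PR value
Each domain name in the list is preliminarily screened according to the PR (PageRank) value of its corresponding website, forming the list of suspected abused domain names. In one embodiment, the PR threshold is set to 3. When a domain name is screened, if the PR value of its corresponding website is less than 3, the domain name is added to the list of suspected abused domain names; otherwise, the domain name is determined to be a non-abused domain name.
For example, for the list to be filtered in (5), if the PR value of the website corresponding to domainname1 is not less than 3, domainname1 is deleted from the list; the resulting list of suspected abused domain names is as follows:
NETSTATION1—>(domainname1a,domainname2a,…..)
NETSTATION2—>(domainname3a,domainname4a,……)
……
where NETSTATION1 and NETSTATION2 are the websites under test, and domainname1a to domainname4a are suspected abused domain names.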
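The PR-based screening can be sketched as a simple filter. The description does not say how PR values are obtained, so the lookup is passed in as a caller-supplied function; `pr_of`, `suspected_abuse` and the sample values are hypothetical.

```python
PR_THRESHOLD = 3  # threshold used in the embodiment described above


def suspected_abuse(candidates, pr_of, threshold=PR_THRESHOLD):
    """Keep, for each site, only the domains whose PR value is below the threshold.

    candidates: dict mapping each site to its list of domains to be filtered
    pr_of:      hypothetical callable returning the PR value of a domain's website
    """
    return {
        site: [d for d in domains if pr_of(d) < threshold]
        for site, domains in candidates.items()
    }


# Usage with made-up PR values; a real deployment would need its own PR source.
pr_table = {"domainname1": 6, "domainname2": 0, "domainname3": 3, "domainname4": 1}
candidates = {
    "NETSTATION1": ["domainname1", "domainname2"],
    "NETSTATION2": ["domainname3", "domainname4"],
}
print(suspected_abuse(candidates, pr_table.__getitem__))
```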
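(7) Blacklist matching
The list of suspected abused domain names is compared with the blacklist and their intersection is taken. If the intersection is not empty, the corresponding website is determined to be a bad website.
Take the list of suspected abused domain names in (6) as an example:
Suppose the domain names in the blacklist include
(DOMAINNAMEabuse1,DOMAINNAMEabuse2,…DOMAINNAMEabusen)
If domainname2a under NETSTATION1 is identical to DOMAINNAMEabuse2, NETSTATION1 is determined to be a bad website.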
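The matching rule (a website is bad when its suspected-abuse domains intersect the blacklist) maps directly onto a set intersection. The sketch below is illustrative only; the function name `bad_sites` and the case-insensitive comparison are assumptions not specified in the description.

```python
def bad_sites(suspected, blacklist):
    """Return the sites whose suspected-abuse domains intersect the blacklist.

    suspected: dict mapping each site to its list of suspected abused domains
    blacklist: iterable of blacklisted domain names
    The result maps each bad site to the sorted blacklisted domains it queried.
    """
    black = {d.lower() for d in blacklist}
    hits = {}
    for site, domains in suspected.items():
        common = black.intersection(d.lower() for d in domains)
        if common:  # non-empty intersection: the site is determined to be bad
            hits[site] = sorted(common)
    return hits


# Usage mirroring the example above: domainname2a also appears in the blacklist.
suspected = {
    "NETSTATION1": ["domainname1a", "domainname2a"],
    "NETSTATION2": ["domainname3a", "domainname4a"],
}
blacklist = ["DOMAINNAMEabuse1", "domainname2a"]
print(bad_sites(suspected, blacklist))
```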
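The actual determination process of the method is illustrated below using real malicious websites:
(1) Maliciously nested website
A website under the .CN top-level domain, with the URL http://www.xiansx.com.cn/, embeds, through its common.js file (part of whose content is shown in FIG. 2), a website under the .COM top-level domain, with the URL http://www.ag823.com/. Inspection of the web page code detects no bad elements, but what the user sees on opening the site is the latter, a gambling website.
(2) Redirect website: .CN redirecting to .COM
A website under the .CN top-level domain, with the URL http://www.xiaoyanzi568.cn, appears from the page code fetched by a web crawler to be the website of 南京中茂科技有限责任公司 (Nanjing Zhongmao Technology Co., Ltd.), and no bad elements can be detected. In substance, however, the site is a typical gambling website. Through its fery.js file (part of whose content is shown in FIG. 3), the site maliciously redirects to a website under the .COM top-level domain, with the URL http://www.bzy888.com/
When the method of the present invention is applied to these two bad websites, the bad domain names they embed or redirect to can be captured by analyzing the query log, and matching against the blacklist completes the determination of both.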

Claims (10)

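  1. A method for automatically determining malicious-redirect and malicious-nesting bad websites, comprising the following steps:
    1) building a domain name blacklist;
    2) setting up a recursive server that performs recursive resolution when a website to be determined issues a domain name query request, with query logging enabled so that the domain name query requests issued by the website to be determined are recorded; and pointing the DNS queries of a server at this recursive server;
    3) using the server to simulate browser access behavior and visiting the websites to be determined in turn;
    4) analyzing the query log of the recursive server, capturing the domain name query sequence of each website to be determined, and merging the sequences to form a list of domain names to be filtered;
    5) screening the domain names in the list to form a list of suspected abused domain names;
    6) comparing the list of suspected abused domain names with the domain name blacklist and taking their intersection; if the intersection is not empty, determining that the website to be determined that corresponds to the intersection is a malicious website.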
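  2. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 1, wherein the sources of the domain name blacklist in step 1) include: manually reported data covering the Ministry of Public Security's 28 categories of illegal and harmful content, data processed daily by the China Anti-Phishing Alliance, and data published on the network.
  3. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 2, wherein the data published on the network is phishingtank data.
  4. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 1, wherein the recursive server in step 2) is built with BIND, and query logging is enabled by editing the recursive server's configuration file.
  5. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 1, wherein the browser cache and the browser's DNS cache are cleared and disabled before the browser access behavior in step 2) is simulated.
  6. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 1, wherein, when the websites to be determined are visited in turn in step 3), each visit to a website to be determined is followed by a visit to a non-existent website.
  7. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 6, wherein analyzing the query log of the recursive server in step 4) and capturing the domain name query sequence of each website to be determined includes extracting the query log entries recorded between two consecutive visits to the non-existent website.
  8. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 1, wherein the list of domain names to be filtered in step 4) has the following format:
    NETSTATION1—>(domainname1,domainname2,…..)
    NETSTATION2—>(domainname3,domainname4,……)
    ……
    where NETSTATION1 and NETSTATION2 are the websites to be determined, and domainname1 to domainname4 are the domain names to be filtered that correspond to each website.
  9. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 1, wherein screening the domain names in the domain name list in step 5) includes screening each domain name according to the PR value of its corresponding website to be determined.
  10. The method for automatically determining malicious-redirect and malicious-nesting bad websites according to claim 9, wherein the screening according to the PR value includes: if the PR value of the website corresponding to a domain name to be filtered is less than a set threshold, adding the domain name to the list of suspected abused domain names; otherwise, determining that the domain name is a non-abused domain name.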
PCT/CN2015/098469 2015-10-15 2015-12-23 Automatic determination method for malicious-redirect and malicious-nesting bad websites WO2017063274A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510666766.8A CN105376217B (zh) 2015-10-15 Automatic determination method for malicious-redirect and malicious-nesting bad websites
CN201510666766.8 2015-10-15

Publications (1)

Publication Number Publication Date
WO2017063274A1 (zh) 2017-04-20

Family

ID=55378024

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098469 WO2017063274A1 (zh) 2015-10-15 2015-12-23 Automatic determination method for malicious-redirect and malicious-nesting bad websites

Country Status (2)

Country Link
CN (1) CN105376217B (zh)
WO (1) WO2017063274A1 (zh)

Also Published As

Publication number Publication date
CN105376217A (zh) 2016-03-02
CN105376217B (zh) 2019-01-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 15906156; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31/07/2018)
122 Ep: pct application non-entry in european phase. Ref document number: 15906156; Country of ref document: EP; Kind code of ref document: A1