WO2020211130A1 - Method and device for detecting dark links on websites - Google Patents

Method and device for detecting dark links on websites

Info

Publication number
WO2020211130A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
attribute
tag
target
website
Application number
PCT/CN2019/086057
Other languages
English (en)
French (fr)
Inventor
程海金
王凤杰
Original Assignee
网宿科技股份有限公司
Application filed by 网宿科技股份有限公司
Priority to EP19856424.7A (patent EP3745292A4)
Priority to US16/813,799 (patent US20200336498A1)
Publication of WO2020211130A1 (in Chinese)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Definitions

  • This application relates to the field of computer technology, and in particular to a method and device for detecting dark links on websites.
  • in the prior art, dark links are typically detected by website URL comparison or by sensitive keyword detection. In website URL comparison, the dark link detection device matches all URLs under the website, obtained through crawler technology, against a preset website URL whitelist; if a URL is not in the whitelist, the website is deemed to carry a dark link. In sensitive keyword detection, the device sends an access request to each website URL and matches the content of the response page against a sensitive keyword library preset on the device; if a response page contains content from the library, the website is deemed to carry a dark link.
  • however, the website URL whitelist in the prior art is likely to omit some of the website's URLs, so legitimate URLs may be misjudged as dark links during detection, resulting in a high false alarm rate;
  • and the preset sensitive keyword library may not contain every sensitive keyword, so some sensitive keywords go undetected, resulting in a high false negative rate.
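To make the two prior-art failure modes concrete, the following Python sketch models whitelist comparison and keyword matching over made-up data. The function names, the `example.com` URLs and the keyword library are hypothetical illustrations, not taken from any specific product.

```python
def whitelist_check(site_urls, url_whitelist):
    """Prior-art URL comparison: flag every crawled URL absent from the whitelist."""
    return [u for u in site_urls if u not in url_whitelist]


def keyword_check(pages, keyword_library):
    """Prior-art keyword detection: flag pages whose content contains a library keyword."""
    return [url for url, html in pages.items()
            if any(k in html for k in keyword_library)]


# An incomplete whitelist turns a legitimate page into a false alarm.
site_urls = ["https://example.com/", "https://example.com/news/2019.html"]
whitelist = {"https://example.com/"}
print(whitelist_check(site_urls, whitelist))  # ['https://example.com/news/2019.html']

# A hidden link whose text is not in the library goes unreported (false negative).
pages = {"https://example.com/":
         '<a href="http://bad.example" style="display:none">cheap pills</a>'}
print(keyword_check(pages, {"casino"}))  # []
```

Both checks look only at URLs or visible words, never at how a link is hidden, which is the gap the method below targets.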
  • the embodiments of the present application provide a method and device for detecting dark links on websites.
  • the technical solution is as follows:
  • a method for detecting dark links on websites includes:
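  • for each target URL of the target website, periodically obtaining all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL;
  • for each URL associated tag, detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset secure URL set; if it does not, detecting whether the attribute content of the URL associated tag includes a preset dark link attribute feature; and if the preset dark link attribute feature is included, determining that the URL corresponding to the attribute content of the URL associated tag is a dark link.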
  • the method further includes:
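  • initiating an access request to the homepage of the target website, and determining all URLs in the homepage that contain the domain name of the target website as target URLs; and initiating an access request to each target URL in turn, and adding all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.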
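  • the detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset secure URL set includes: detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to the directory of the target website, or whether the URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.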
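  • the detecting whether the attribute content of the URL associated tag includes a preset dark link attribute feature includes: detecting whether the tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset html dark link library.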
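  • the method further includes: if the preset dark link attribute feature is not included, obtaining the outer multi-layer tags of the URL associated tag and their attribute content from the response page of the target URL; sequentially detecting whether the attribute content of the outer multi-layer tags includes the preset dark link attribute feature; and when an outer target layer tag is detected to include it, stopping the detection and determining that the URL corresponding to the URL associated tag is a dark link.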
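  • the method further includes: extracting the dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature;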
  • a detection log including the attribute content of the URL associated tag, the dark link URL and the target URL is generated, and the detection log is sent to the website server of the target website.
  • if a dark link false alarm instruction issued by the website server is received, the preset secure URL set is updated according to the dark link URL.
  • a device for detecting dark links on websites includes a tag acquisition module and a tag detection module, wherein:
  • the tag obtaining module is configured to periodically obtain all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL for each target URL of the target website;
  • the tag detection module is configured to, for each of the URL associated tags, detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset secure URL set;
  • the tag detection module is further configured to detect whether the attribute content of the URL associated tag contains the preset dark link attribute feature if it does not belong to the preset secure URL set;
  • the tag detection module is further configured to determine that the URL corresponding to the attribute content of the URL-associated tag is a dark link if the preset dark link attribute feature is included.
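  • the tag acquisition module is specifically configured to:
  • initiate an access request to the homepage of the target website and determine all URLs in the homepage that contain the domain name of the target website as target URLs; then initiate an access request to each target URL in turn, and add all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
  • the tag detection module is specifically configured to: detect whether the URL corresponding to the attribute content of the URL associated tag belongs to the directory of the target website, or whether the URL domain name information in that attribute content belongs to a preset domain name whitelist;
  • the tag detection module is specifically configured to: detect whether the tag information in the attribute content of the URL associated tag is meta, or whether that attribute content matches a preset html dark link library;
  • the tag detection module is further configured to: if the preset dark link attribute feature is not included, obtain the outer multi-layer tags of the URL associated tag and their attribute content from the response page of the target URL, detect in sequence whether their attribute content includes the preset dark link attribute feature, and, upon detecting it in an outer target layer tag, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link;
  • the tag detection module is further configured to: extract the dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature;
  • generate a detection log including the attribute content of the URL associated tag, the dark link URL and the target URL, and send the detection log to the website server of the target website.
  • the tag detection module is further configured to:
  • update the preset secure URL set according to the dark link URL if a dark link false alarm instruction issued by the website server is received.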
  • in a third aspect, a dark link detection device includes a processor and a memory;
  • the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method for detecting dark links on websites as described in the first aspect.
  • a computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method for detecting dark links on websites as described in the first aspect.
  • the website dark link detection method periodically obtains the URL associated tags and their attribute content from the response pages of all URLs under the target website and checks each URL associated tag: first, whether the URL corresponding to its attribute content belongs to the preset secure URL set; then, for tags outside that set, whether their attribute content contains a preset dark link attribute feature. By combining the preset secure URL set with the preset dark link attribute features, each URL associated tag is examined from multiple angles and the website is checked at multiple levels, so dark links can be detected more accurately, reducing both the false alarm rate and the false negative rate of the dark link detection device.
  • FIG. 1 is a schematic flowchart of a method for detecting dark links on a website according to an embodiment of the application.
  • FIG. 2 is a schematic flowchart of another method for detecting dark links on a website according to an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of a dark link detection device for a website provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a dark link detection device provided by an embodiment of the application.
  • the embodiment of the application provides a method for detecting dark links on a website.
  • the method may be executed by a dark link detection device, which may be deployed by the website operator or by a dark link detection service provider; it is a network device with domain name access and data processing and analysis functions, used to detect whether a website carries dark links.
  • when the dark link detection device performs dark link detection on a website, it can start from any URL of the website, obtain the attribute content of the URL associated tags in that URL's response page, and analyze the attribute content to detect whether the URL associated tags carry a dark link. If no dark link is detected, it moves on to another URL, and so on until a dark link is detected or all URLs of the website have been checked, thereby achieving dark link detection for the website.
  • the dark link detection device can include a processor, a memory, and a transceiver.
  • the processor can be used to perform the website dark link detection processing described below;
  • the memory can be used to store the data needed and generated during the processing;
  • the transceiver can be used to receive and send relevant data during the processing.
  • Step 101 For each target URL of the target website, periodically obtain all URL-related tags and all attribute content of the URL-related tags from the response page of the target URL.
  • the target website is the website to be detected;
  • the target URL can be any URL of the target website;
  • a URL associated tag is a tag that can perform URL redirection through its attribute content, including but not limited to a tags, meta tags, iframe tags, frame tags, embed tags, and object tags.
  • the attribute content of a tag may include attributes that point to a URL (such as the href, src, and URL attributes) and other attributes (such as the style, height, width, and left attributes).
  • each detection can start from any URL of the website and obtain the tags and their attribute content from the corresponding response page.
  • a dark link is itself a URL that locates a resource.
  • a dark link is typically planted by writing the dark link URL into the attribute content of a URL associated tag that can perform URL redirection, and then hiding it through the tag's other attributes.
  • therefore, when obtaining tags, only the URL associated tags that can perform URL redirection, together with their attribute content, need to be obtained.
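Step 101 amounts to walking a response page and keeping only the redirect-capable tags with all of their attributes. The following is a minimal sketch using Python's standard-library `html.parser`; the class and function names are hypothetical, and the tag list is the one given in the description.

```python
from html.parser import HTMLParser

# Tag names the description lists as URL associated tags.
URL_ASSOCIATED_TAGS = {"a", "meta", "iframe", "frame", "embed", "object"}


class UrlTagCollector(HTMLParser):
    """Collect each URL associated tag together with all of its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag_name, {attribute_name: value})

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; bare attributes have value None.
        # Self-closing tags such as <embed .../> are routed here by the default
        # handle_startendtag, so they are collected too.
        if tag in URL_ASSOCIATED_TAGS:
            self.tags.append((tag, {name: value or "" for name, value in attrs}))


def collect_url_tags(html):
    """Return every URL associated tag in a response page, in document order."""
    collector = UrlTagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


page = ('<p>intro</p><a href="/about" class="nav">About</a>'
        '<iframe src="http://x.example/" width="0"></iframe>')
print(collect_url_tags(page))
# [('a', {'href': '/about', 'class': 'nav'}), ('iframe', {'src': 'http://x.example/', 'width': '0'})]
```

Keeping every attribute, not just the URL-bearing one, matters because the later steps inspect the other attributes for hiding tricks.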
  • the dark link detection device can obtain the URLs of the website by initiating access requests to its pages layer by layer.
  • the processing before step 101 can be as follows: initiate an access request to the homepage of the target website, and determine all URLs in the homepage that contain the domain name of the target website as target URLs; then initiate an access request to each target URL in turn, and add the URLs in each response page that contain the domain name of the target website as target URLs.
  • the dark link detection device can first initiate an access request to the homepage of the target website, determine all URLs in the received homepage that contain the domain name of the target website (including the website domain name and its subdomains) as target URLs, and store them in a preset set of URLs to be accessed. Next, the device can initiate an access request to each target URL in that set in turn, and add all URLs in the received response pages that contain the domain name of the target website (including the website domain name and its subdomains) as target URLs.
  • the newly added target URLs are stored in the set of URLs to be accessed; that is, the dark link detection device keeps repeating the access-and-add processing for newly added target URLs until all URLs under the website have been obtained. Normally, the same resource on a website can be reached through links on different pages, so accessing each URL usually yields many duplicate URLs; the URLs in a response page can therefore be deduplicated before being added as target URLs. Crawler technology can be used to implement this URL acquisition.
  • the dark link detection device can also initiate an access request to any page under the target website, which is not limited in this embodiment.
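The crawl that feeds Step 101 is a breadth-first traversal with a de-duplication set. A minimal sketch follows, assuming a caller-supplied `fetch(url)` callable (hypothetical) that returns page HTML or `None`; a dictionary of fake pages stands in for the network so the example runs offline. Treating subdomains as part of the target website follows the parenthetical in the text above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Gather raw href/src values from one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def in_site(url, domain):
    """True for the website domain itself or any of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def crawl_site(homepage, domain, fetch):
    """Breadth-first collection of every target URL reachable from the homepage."""
    target_urls = {homepage}          # also serves as the de-duplication set
    to_visit = deque([homepage])      # the set of URLs still to be accessed
    while to_visit:
        url = to_visit.popleft()
        html = fetch(url)
        if html is None:              # unreachable or non-HTML resource
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        for raw in extractor.links:
            # Resolve relative links and drop "#fragment" so duplicates collapse.
            absolute = urldefrag(urljoin(url, raw)).url
            if in_site(absolute, domain) and absolute not in target_urls:
                target_urls.add(absolute)
                to_visit.append(absolute)
    return target_urls


pages = {
    "https://example.com/": '<a href="/a.html">A</a>'
                            '<a href="https://blog.example.com/">Blog</a>'
                            '<a href="https://other.org/">Other</a>',
    "https://example.com/a.html": '<a href="/#top">Home</a><img src="./img.png">',
    "https://blog.example.com/": '<a href="https://example.com/a.html">A</a>',
}
for url in sorted(crawl_site("https://example.com/", "example.com", pages.get)):
    print(url)
```

This prints the four in-site URLs (`https://blog.example.com/`, `https://example.com/`, `https://example.com/a.html`, `https://example.com/img.png`); `other.org` is skipped because it is outside the target domain.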
  • Step 102 For each URL associated tag, it is detected whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset secure URL set.
  • the preset secure URL set may contain multiple URLs or multiple types of URLs, including all types of links permitted on the target website's pages, such as the target website's internal links or friend links.
  • by examining the attribute content through which a URL associated tag performs URL redirection, the URL corresponding to that attribute content can be determined. If this URL belongs to the preset secure URL set, no dark link has been written into the URL associated tag; if it does not, a dark link may have been written into the tag, and further detection is needed. Note that different URL associated tags use different attributes to perform URL redirection:
  • for the a tag, the attribute that performs URL redirection is the href attribute;
  • for the meta tag, it is the URL attribute;
  • for the iframe tag, it is the src attribute;
  • for the frame tag, it is the src attribute;
  • for the embed tag, it is the src attribute;
  • for the object tag, it is the codebase attribute.
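The tag-to-attribute mapping above can be encoded as a lookup table. In this sketch, `REDIRECT_ATTR` and `redirect_url` are hypothetical names. One assumption goes beyond the text: standard HTML places a meta refresh target inside `content="N; url=..."`, so the sketch falls back to that form when no `url` attribute is present.

```python
# Redirect-capable attribute for each URL associated tag, per the description.
REDIRECT_ATTR = {
    "a": "href",
    "meta": "url",
    "iframe": "src",
    "frame": "src",
    "embed": "src",
    "object": "codebase",
}


def redirect_url(tag, attrs):
    """Return the URL a URL associated tag redirects to, or None."""
    attr_name = REDIRECT_ATTR.get(tag)
    if attr_name is None:
        return None                      # not a URL associated tag
    value = attrs.get(attr_name, "").strip()
    if value:
        return value
    if tag == "meta":
        # Assumption: fall back to the standard refresh form content="N; url=...".
        content = attrs.get("content", "")
        start = content.lower().find("url=")
        if start != -1:
            return content[start + 4:].strip().strip("'\"") or None
    return None


print(redirect_url("a", {"href": "/about"}))                     # /about
print(redirect_url("meta", {"http-equiv": "refresh",
                            "content": "0; url=http://bad.example/"}))  # http://bad.example/
print(redirect_url("div", {"style": "left:-999px"}))             # None
```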
  • whether the corresponding URL belongs to the preset secure URL set can be determined from the URL domain name information in the attribute content of the URL associated tag. Accordingly, step 102 can be performed as follows: detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to the directory of the target website, or whether the URL domain name information in that attribute content belongs to a preset domain name whitelist.
  • URLs can usually be distinguished by their domain name information, so the dark link detection device can determine whether the URL corresponding to the attribute content of the URL associated tag belongs to the preset secure URL set by examining the URL domain name information in that attribute content. How the attribute content points to a URL depends on the page the URL belongs to: if the URL is in the directory of the website currently being checked, the attribute content may omit domain name information and start directly with "/", "./" or "../"; otherwise, the attribute content usually takes the form of a full URL containing the domain name.
  • if either condition holds, that is, the URL corresponding to the attribute content of the URL associated tag belongs to the directory of the target website, or the domain name in that attribute content belongs to the preset domain name whitelist, the dark link detection device can determine that the URL belongs to the preset secure URL set.
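A minimal sketch of the Step 102 check, assuming the site domain and whitelist entries are bare lowercase host names and that a whitelisted domain also covers its subdomains (both assumptions, not stated in the text). The function names are hypothetical.

```python
from urllib.parse import urlparse


def host_matches(host, domain):
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def in_secure_url_set(url, site_domain, domain_whitelist):
    """Step 102 sketch: a site-directory URL or a whitelisted domain counts as secure."""
    url = url.strip()
    # "/x", "./x" and "../x" carry no domain and point into the site's own
    # directory; "//host/x" is protocol-relative and does carry a domain.
    if url.startswith(("./", "../")) or (url.startswith("/") and not url.startswith("//")):
        return True
    host = urlparse(url).hostname or ""   # hostname is already lowercased
    if not host:
        return False                       # e.g. "javascript:..." goes on to step 103
    return host_matches(host, site_domain) or any(
        host_matches(host, d) for d in domain_whitelist)


print(in_secure_url_set("../news/1.html", "example.com", {"partner.org"}))            # True
print(in_secure_url_set("https://www.partner.org/", "example.com", {"partner.org"}))  # True
print(in_secure_url_set("http://casino.example.net/", "example.com", {"partner.org"}))  # False
```

Matching on a `.` boundary rather than a plain string suffix keeps a look-alike host such as `notexample.com` from passing as `example.com`.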
  • Step 103 If it does not belong to the preset secure URL set, detect whether the attribute content of the URL associated tag contains the preset dark link attribute feature.
  • a dark link attribute feature is an attribute feature indicating that a URL associated tag carries a dark link; specifically, such features can be obtained by manually collecting and summarizing the ways in which dark links are hidden.
  • the dark link detection device can detect whether the target website carries a dark link by judging whether the attribute content of the URL associated tag contains a preset dark link attribute feature. Specifically, it can analyze the tag's other attribute content (excluding the attribute that points to the URL) to make this determination.
  • depending on how the dark link is planted, a corresponding detection method can be used to determine whether the attribute content of the URL associated tag contains a preset dark link attribute feature. Accordingly, step 103 can be performed as follows: detecting whether the tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset html dark link library.
  • the html dark link library can record most of the known ways of planting dark links at the current technical level, and can be continuously extended and updated as techniques develop.
  • the meta tag is located in the head of a webpage's html source code and provides meta-information about the page. It is a main basis on which search engines determine the content of the webpage, but it is not displayed on the page. An attacker can therefore insert a large number of words and links unrelated to the webpage into meta tags, planting dark links without needing other attribute content to hide them.
  • for URL associated tags other than meta tags, to be discoverable by search engines yet invisible to visitors, a dark link must not only be placed in an attribute with a redirect function but also be hidden.
  • dark links are therefore usually planted so that, with the help of other attributes, the resources they point to cannot be discovered, or are hard to discover, by online visitors. Accordingly, when detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature, the device can judge whether the URL associated tag is a meta tag, or whether its attribute content matches the preset html dark link library.
  • the ways of planting dark links recorded in the html dark link library can include, but are not limited to, hiding the dark link through other attributes of the URL associated tag, for example, setting the color of the dark link to the background color and rendering the dark link text at a very small pixel size, specifically written as follows:
  • Step 104 If the preset dark link attribute feature is included, it is determined that the URL corresponding to the URL associated tag is a dark link.
  • once the dark link detection device determines that the attribute content of the URL associated tag contains a preset dark link attribute feature, it can determine that the URL corresponding to the URL associated tag is a dark link.
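The Step 103 and 104 decision can be sketched as follows. The regular expressions standing in for the html dark link library are hypothetical examples of hiding techniques (including the color-equals-background case mentioned above), not a library published with the patent; a real library would be larger and curated manually, as the text describes.

```python
import re

# Hypothetical stand-in for the html dark link library, applied to the tag's
# non-URL attributes serialized as 'name="value"' pairs.
DARK_LINK_LIBRARY = [
    re.compile(r"display\s*:\s*none", re.I),
    re.compile(r"visibility\s*:\s*hidden", re.I),
    re.compile(r"font-size\s*:\s*[01](\.\d+)?px", re.I),   # near-invisible text
    re.compile(r'\b(width|height)="0"', re.I),
]


def parse_style(style):
    """Turn 'color:#fff; background-color:#fff' into a lowercase dict."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            decls[key.strip().lower()] = value.strip().lower()
    return decls


def has_dark_link_feature(tag, attrs, url_attr):
    """Meta tags count outright; other tags are matched against the library,
    ignoring the attribute that holds the URL itself."""
    if tag == "meta":
        return True
    other = {k: v for k, v in attrs.items() if k != url_attr}
    serialized = " ".join(f'{k}="{v}"' for k, v in other.items())
    if any(p.search(serialized) for p in DARK_LINK_LIBRARY):
        return True
    style = parse_style(other.get("style", ""))
    # Text color identical to the background color hides the link visually.
    return "color" in style and style.get("color") == style.get("background-color")


print(has_dark_link_feature("a", {"href": "http://bad.example/", "style": "display:none"}, "href"))  # True
print(has_dark_link_feature("a", {"href": "/about", "class": "nav"}, "href"))  # False
print(has_dark_link_feature("a", {"href": "http://bad.example/",
                                  "style": "color:#FFF; background-color:#fff"}, "href"))  # True
```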
  • the processing after step 103 can also be as follows:
  • Step 105 If the preset dark link attribute feature is not included, obtain the outer multi-layer tags of the URL associated tag and the attribute content of the outer multi-layer tags from the response page of the target URL.
  • the outer multi-layer tags of a URL associated tag are the tags that enclose it, layer by layer.
  • the outer tags commonly used by hackers to hide dark links include, but are not limited to, div tags and marquee tags.
  • dark links can also be hidden through the attributes of the tags enclosing the URL associated tag. Therefore, if the dark link detection device checks the attribute content of the URL associated tag itself and finds no dark link attribute feature, it can go on to check the tag's outer tags.
  • since the outer tags of a URL associated tag are commonly used for hiding, the multi-layer outer tags of the URL associated tag and their attribute content can also be obtained from the response page of the target URL.
  • Step 106 Sequentially detect whether the attribute content of the outer multi-layer tags contains the preset dark link attribute feature.
  • a dark link can be hidden through the attribute content of the first outer layer of the URL associated tag, or through the attribute content of its Nth outer layer (N is a natural number).
  • the dark link detection device can therefore check the outer multi-layer tags of the URL associated tag in order, from the inside outward (or from the outside inward), for the preset dark link attribute feature.
  • for example, the position value in the style attribute of a div tag can be set to a negative number so that the dark link is not displayed within the visible page.
  • Step 107 When an outer target layer tag is detected to contain the preset dark link attribute feature, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link.
  • the dark link detection device checks the attribute content of the outer multi-layer tags of the URL associated tag layer by layer, from the inside outward (or from the outside inward), until the dark link attribute feature is found or the last outer layer has been checked.
  • if an outer target layer tag is found whose attribute content contains the preset dark link attribute feature, the detection can be stopped, and the URL corresponding to the URL associated tag can be determined to be a dark link.
  • when performing this layer-by-layer detection, a maximum number of detection layers can also be set, to improve detection efficiency or to reduce missed detections.
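Steps 105 to 107 need each URL associated tag's enclosing tags. The sketch below tracks open tags with a stack while parsing, then scans the ancestors from the inside outward, stopping at the first hit, with `max_layers` playing the role of the maximum number of detection layers. The outer-layer feature patterns (including the negative-offset case from the div example) and all names are hypothetical, and the stack handling is deliberately simple rather than a full HTML tree builder.

```python
import re
from html.parser import HTMLParser

URL_ASSOCIATED_TAGS = {"a", "meta", "iframe", "frame", "embed", "object"}
# Elements that never enclose content, so they are not pushed on the stack.
VOID_TAGS = {"meta", "embed", "img", "br", "hr", "input", "link"}

# Hypothetical outer-layer features: content pushed off-screen or hidden outright.
OUTER_FEATURES = [
    re.compile(r"(left|top|text-indent)\s*:\s*-\d", re.I),
    re.compile(r"display\s*:\s*none", re.I),
    re.compile(r"visibility\s*:\s*hidden", re.I),
]


class AncestorTracker(HTMLParser):
    """Record each URL associated tag with its enclosing tags, innermost first."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open (tag, attrs), outermost first
        self.found = []  # (tag, attrs, ancestors ordered inside -> outside)

    def handle_starttag(self, tag, attrs):
        attrs = {name: value or "" for name, value in attrs}
        if tag in URL_ASSOCIATED_TAGS:
            self.found.append((tag, attrs, self.stack[::-1]))
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def outer_layer_hit(ancestors, max_layers=5):
    """Return the 1-based layer of the first outer tag with a feature, or None."""
    for layer, (_tag, attrs) in enumerate(ancestors[:max_layers], start=1):
        text = " ".join(f'{k}="{v}"' for k, v in attrs.items())
        if any(p.search(text) for p in OUTER_FEATURES):
            return layer
    return None


page = ('<div class="wrap"><div style="position:absolute;left:-9999px">'
        '<p><a href="http://bad.example/">cheap</a></p></div></div>')
tracker = AncestorTracker()
tracker.feed(page)
tag, attrs, ancestors = tracker.found[0]
print(tag, outer_layer_hit(ancestors))           # a 2
print(outer_layer_hit(ancestors, max_layers=1))  # None
```

The second call shows the trade-off the text mentions: a small layer limit is faster but misses a dark link hidden two layers out.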
  • in addition, a detection log can be generated, as follows: extract the dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature; generate a detection log containing the attribute content of the URL associated tag, the dark link URL and the target URL, and send the detection log to the website server of the target website.
  • the dark link detection device can extract the dark link URL from the attribute content of the URL associated tag based on the dark link attribute feature.
  • the dark link detection device may generate the detection log once it determines that the URL corresponding to the attribute content of the URL associated tag is a dark link, or after the detection is completed.
  • the detection log (especially one recording a detected dark link) can also be sent as an alarm to the website party of the target website, so that the website party learns the detection result in time and can deal with the dark link.
  • the detection log includes but is not limited to the attribute content of the URL associated tag, the dark link URL, and the target URL.
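A minimal sketch of the detection log as a JSON record. The field names and the UTC timestamp are assumptions; only the three required items (attribute content, dark link URL, target URL) come from the text. Delivery to the website server is left out because the transport is not specified.

```python
import json
from datetime import datetime, timezone


def build_detection_log(target_url, tag, attrs, dark_link_url):
    """Serialize one detection result as JSON (field names are hypothetical)."""
    record = {
        "target_url": target_url,
        "tag": tag,
        "attribute_content": attrs,
        "dark_link_url": dark_link_url,
        # Assumption: a timestamp helps correlate results of periodic scans.
        "detected_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


log = build_detection_log(
    "https://example.com/news/1.html",
    "a",
    {"href": "http://bad.example/", "style": "display:none"},
    "http://bad.example/",
)
print(json.loads(log)["dark_link_url"])  # http://bad.example/
```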
  • the dark link detection device can also refine and upgrade its detection mechanism based on feedback about the detection results, as follows: if a dark link false alarm instruction issued by the website server is received, the preset secure URL set is updated according to the dark link URL.
  • after receiving the dark link false alarm instruction, the dark link detection device can automatically add the dark link URL to the preset secure URL set, thereby updating it. The next time the website is checked, the URL associated tags that produced the false alarm no longer need to be matched against the dark link library, which also improves the efficiency of detecting the website.
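The false-alarm feedback loop can be sketched as a set that grows on request. `SecureUrlSet` and `needs_feature_check` are hypothetical names, and exact-URL membership is a simplifying assumption; a deployment might store domains or path prefixes instead.

```python
class SecureUrlSet:
    """Preset secure URL set that grows when the website reports a false alarm."""

    def __init__(self, urls=()):
        self._urls = set(urls)

    def __contains__(self, url):
        return url in self._urls

    def on_false_alarm(self, dark_link_url):
        """Handle a dark link false alarm instruction from the website server."""
        self._urls.add(dark_link_url)


def needs_feature_check(url, secure):
    """Only URLs outside the secure set go on to dark link library matching."""
    return url not in secure


secure = SecureUrlSet({"https://example.com/"})
promo = "https://partner.example.org/promo"
print(needs_feature_check(promo, secure))  # True
secure.on_false_alarm(promo)
print(needs_feature_check(promo, secure))  # False
```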
  • an embodiment of the present application also provides a website dark link detection device.
  • the website dark link detection device includes a tag acquisition module 301 and a tag detection module 302, wherein:
  • the tag obtaining module 301 is configured to periodically obtain all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL for each target URL of the target website;
  • the tag detection module 302 is configured to, for each of the URL associated tags, detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset secure URL set;
  • the tag detection module 302 is further configured to detect whether the attribute content of the URL associated tag includes the preset dark link attribute feature if it does not belong to a preset secure URL set;
  • the tag detection module 302 is further configured to determine that the URL corresponding to the attribute content of the URL-associated tag is a dark link if the preset dark link attribute feature is included.
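  • the tag acquisition module 301 is specifically configured to:
  • initiate an access request to the homepage of the target website and determine all URLs in the homepage that contain the domain name of the target website as target URLs; then initiate an access request to each target URL in turn, and add all URLs containing the domain name of the target website in the response page of each target URL as target URLs.
  • the tag detection module 302 is specifically configured to: detect whether the URL corresponding to the attribute content of the URL associated tag belongs to the directory of the target website, or whether the URL domain name information in that attribute content belongs to a preset domain name whitelist;
  • the tag detection module 302 is specifically configured to: detect whether the tag information in the attribute content of the URL associated tag is meta, or whether that attribute content matches a preset html dark link library;
  • the tag detection module 302 is further configured to: if the preset dark link attribute feature is not included, obtain the outer multi-layer tags of the URL associated tag and their attribute content from the response page of the target URL, detect in sequence whether their attribute content includes the preset dark link attribute feature, and, upon detecting it in an outer target layer tag, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link;
  • the tag detection module 302 is further configured to: extract the dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature;
  • generate a detection log including the attribute content of the URL associated tag, the dark link URL, and the target URL, and send the detection log to the website server.
  • the tag detection module 302 is further configured to:
  • update the preset secure URL set according to the dark link URL if a dark link false alarm instruction issued by the website server is received.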
  • the website dark link detection device periodically obtains the URL associated tags and their attribute content from the response pages of all URLs under the target website, and checks each URL associated tag first against the preset secure URL set and then, for tags outside that set, against the preset dark link attribute features. Examining each tag from multiple angles in this way checks the website at multiple levels, so the device can detect more accurately whether the website carries dark links, reducing both the false alarm rate and the false negative rate of the dark link detection device.
  • when the website dark link detection device provided in the above embodiment performs dark link detection on a website, the division into the above functional modules is used only as an example.
  • in practice, the above functions can be assigned to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the website dark link detection device provided in the above embodiment belongs to the same concept as the embodiment of the website dark link detection method. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • Fig. 4 is a schematic structural diagram of a dark link detection device provided by an embodiment of the present application.
  • the dark link detection device 400 may vary considerably depending on configuration or performance, and may include one or more central processing units 422 (for example, one or more processors), a memory 432, and one or more storage media 430 (for example, one or more mass storage devices) storing application programs 462 or data 466.
  • the memory 432 and the storage medium 430 may be short-term storage or persistent storage.
  • the program stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the dark link detection device.
  • the central processing unit 422 may be configured to communicate with the storage medium 430, and execute a series of instruction operations in the storage medium 430 on the dark link detection device 400.
  • the dark link detection device 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input and output interfaces 458, one or more keyboards 456, and/or, one or more Operating system 461, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and so on.
  • the dark link detection device 400 may include a memory, and one or more programs, where one or more programs are stored in the memory and configured to be executed by one or more processors. In order to perform the above-mentioned website dark link detection instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

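The present application discloses a method and apparatus for detecting dark links (hidden links) on a website, in the field of computer technology. The method includes: for each target URL of a target website, periodically obtaining all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL (101); for each URL associated tag, detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set (102); if not, detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature (103); and if so, determining that the URL corresponding to the attribute content of the URL associated tag is a dark link (104). With the present application, whether dark links have been planted on a website can be detected more accurately, reducing the false positive rate and false negative rate of dark link detection devices.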

Description

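Method and Apparatus for Detecting Dark Links on a Website
Cross-Reference
This application claims priority to Chinese Patent Application No. 201910305415.2, entitled "Method and Apparatus for Detecting Dark Links on a Website" and filed on April 16, 2019, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer technology, and in particular to a method and apparatus for detecting dark links on a website.
Background
With the development of Internet technology, earning revenue from a website's network traffic has become a common marketing practice. To this end, some websites even hide links to themselves on other, legitimate websites in a form that online visitors cannot see but search engines can find (commonly called "planting dark links"), in order to attract large amounts of traffic. This inevitably diverts resources from the legitimate website and causes it economic loss; moreover, if the website behind the dark link spreads illegal content, the legitimate website also suffers reputational damage, and the organization or individual that owns it may even be held legally liable. Legitimate websites therefore commonly use dark link detection devices to check whether dark links have been planted on them, so as to prevent such harm.
In the prior art, techniques such as website URL comparison and sensitive keyword detection (e.g., gambling or pornography keywords) are mostly used to detect whether dark links have been planted on a website. In URL comparison, the dark link detection device matches every URL under the website, obtained by a web crawler, against a preset website URL whitelist; if some URL is not in the whitelist, the website is deemed to have a dark link. In sensitive keyword detection, access requests are sent to the website's URLs and the content of each response page is matched against a sensitive keyword library preset on the detection device; if a response page contains content from the library, the website is deemed to have a dark link.
In the course of making the present application, the inventors found that the prior art has at least the following problems:
The website URL whitelist in the prior art is very likely to miss some of the website's URLs, so legitimate URLs of the website may be mistaken for dark links during detection, resulting in a high false positive rate. In addition, the sensitive keyword library preset on the detection device may not contain every sensitive keyword, so some sensitive keywords may go undetected, resulting in a high false negative rate.
Summary
To solve the problems of the prior art, embodiments of the present application provide a method and apparatus for detecting dark links on a website. The technical solutions are as follows: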
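In a first aspect, a method for detecting dark links on a website is provided, the method including:
for each target URL of a target website, periodically obtaining all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL;
for each URL associated tag, detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set;
if the URL does not belong to the preset safe URL set, detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature;
if the attribute content contains a preset dark link attribute feature, determining that the URL corresponding to the attribute content of the URL associated tag is a dark link.
Optionally, before periodically obtaining, for each target URL of the target website, all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL, the method further includes:
sending an access request to the home page of the target website, and determining all URLs in the home page that contain the domain name of the target website as target URLs;
sending an access request to each target URL in turn, and adding all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
Optionally, detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set includes:
detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a directory of the target website, or whether the URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.
Optionally, detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature includes:
detecting whether the tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset HTML dark link library.
Optionally, after detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature, the method further includes:
if no preset dark link attribute feature is contained, obtaining, from the response page of the target URL, multiple outer-layer tags of the URL associated tag and the attribute content of the outer-layer tags;
detecting, layer by layer, whether the attribute content of the outer-layer tags contains a preset dark link attribute feature;
when a target outer-layer tag is detected to contain a preset dark link attribute feature, stopping the detection and determining that the URL corresponding to the URL associated tag is a dark link.
Optionally, after determining that the URL corresponding to the attribute content of the URL associated tag is a dark link, the method further includes:
extracting a dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature;
generating a detection log containing the attribute content of the URL associated tag, the dark link URL, and the target URL, and sending the detection log to the website server of the target website.
Optionally, if a dark link false positive instruction sent by the website server is received, the preset safe URL set is updated according to the dark link URL.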
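In a second aspect, an apparatus for detecting dark links on a website is provided, the apparatus including a tag obtaining module and a tag detection module, where:
the tag obtaining module is configured to, for each target URL of a target website, periodically obtain all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL;
the tag detection module is configured to, for each URL associated tag, detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set;
the tag detection module is further configured to, if the URL does not belong to the preset safe URL set, detect whether the attribute content of the URL associated tag contains a preset dark link attribute feature;
the tag detection module is further configured to, if the attribute content contains a preset dark link attribute feature, determine that the URL corresponding to the attribute content of the URL associated tag is a dark link.
Optionally, the tag obtaining module is specifically configured to:
send an access request to the home page of the target website, and determine all URLs in the home page that contain the domain name of the target website as target URLs;
send an access request to each target URL in turn, and add all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
Optionally, the tag detection module is specifically configured to:
detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a directory of the target website, or whether the URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.
Optionally, the tag detection module is specifically configured to:
detect whether the tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset HTML dark link library.
Optionally, the tag detection module is further configured to:
if no preset dark link attribute feature is contained, obtain, from the response page of the target URL, multiple outer-layer tags of the URL associated tag and the attribute content of the outer-layer tags;
detect, layer by layer, whether the attribute content of the outer-layer tags contains a preset dark link attribute feature;
when a target outer-layer tag is detected to contain a preset dark link attribute feature, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link.
Optionally, the tag detection module is further configured to:
extract a dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature;
generate a detection log containing the attribute content of the URL associated tag, the dark link URL, and the target URL, and send the detection log to the website server of the target website.
Optionally, the tag detection module is further configured to:
if a dark link false positive instruction sent by the website server is received, update the preset safe URL set according to the dark link URL.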
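In a third aspect, a dark link detection device is provided, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the method for detecting dark links on a website according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for detecting dark links on a website according to the first aspect.
The technical solutions provided by the embodiments of the present application bring the following beneficial effects:
The method provided in this embodiment periodically obtains URL associated tags and their attribute content from the response pages of all URLs under the target website and checks each URL associated tag: it first detects whether the URL corresponding to the tag's attribute content belongs to the preset safe URL set and, for tags whose URL does not, then detects whether the tag's attribute content contains a preset dark link attribute feature. Checking each URL associated tag from multiple angles in this way, using the preset safe URL set and the preset dark link attribute features, achieves multi-level detection of the website, so that whether dark links have been planted on the website can be detected more accurately and the false positive and false negative rates of the dark link detection device are reduced.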
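Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for detecting dark links on a website according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another method for detecting dark links on a website according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for detecting dark links on a website according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a dark link detection device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the drawings.
An embodiment of the present application provides a method for detecting dark links on a website. The method may be executed by a dark link detection device, which may be a network device, deployed by the website owner or by a dark link detection service provider, that has domain name access and data processing and analysis capabilities and is used to detect whether dark links have been planted on a website. When checking a website for dark links, the device may start from any URL of the website, obtain the attribute content of the URL associated tags in the response page of that URL, and analyze the attribute content to detect whether a URL associated tag hides a dark link; if no dark link is detected, it moves on to any other URL, until a dark link is detected or all URLs of the website have been checked. The dark link detection device may include a processor, a memory, and a transceiver: the processor may perform the dark link detection processing in the flow described below, the memory may store the data needed and produced during processing, and the transceiver may receive and send related data during processing.
The processing flow shown in FIG. 1 is described in detail below with reference to specific embodiments, and may be as follows: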
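Step 101: for each target URL of the target website, periodically obtain all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL.
Here, the target website is the website to be checked; a target URL may be any URL of the target website; a URL associated tag may be a tag that can cause a jump to a URL through its attribute content, including but not limited to the a, meta, iframe, frame, embed, and object tags. The attribute content of a tag may include attribute content that points to a URL (such as the href, src, and URL attributes) and other attribute content (such as the style, height, width, and left attributes). For example, in the a tag <a href="http://www.aaa.com/" name="abc" class="de" style="display:block;">text</a>, href is the attribute that points to a URL, and its content is http://www.aaa.com/; name, class, and style are other attributes, whose contents are abc, de, and display:block respectively.
In implementation, each time the website owner uses the dark link detection device to periodically check its website for dark links, the check may start from any URL of the website, obtaining the tags in its response page and their attribute content. Since a dark link is also a URL that locates a resource, a dark link may be planted by writing the dark link URL into the attribute content of a URL associated tag capable of URL jumps and hiding it through the tag's other attribute content. Therefore, when obtaining tags, only the URL associated tags capable of URL jumps and their attribute content need to be obtained.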
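As a rough illustration of step 101, the Python sketch below (standard library only) pulls every URL associated tag and all of its attributes out of a response page that is assumed to have already been downloaded as a string. The class and function names are illustrative, not taken from the application, and a static parser like this does not see tags that scripts add at runtime.

```python
from html.parser import HTMLParser

# The URL associated tags named in the text; extend as needed.
URL_ASSOCIATED_TAGS = {"a", "meta", "iframe", "frame", "embed", "object"}


class UrlTagCollector(HTMLParser):
    """Collects every URL associated tag on a page with all of its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag_name, {attribute_name: value})

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        # Self-closing forms such as <embed ... /> also arrive here via the
        # default handle_startendtag implementation.
        if tag in URL_ASSOCIATED_TAGS:
            # Valueless attributes (e.g. <iframe hidden>) are reported as None.
            self.tags.append((tag, {name: value or "" for name, value in attrs}))


def collect_url_tags(html):
    """Return [(tag, attrs), ...] for all URL associated tags in `html`."""
    parser = UrlTagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    page = ('<p><a href="http://www.aaa.com/" name="abc" class="de" '
            'style="display:block;">text</a><img src="logo.png"></p>')
    print(collect_url_tags(page))
```

Non-URL tags such as img are skipped, so later steps only ever see tags that can carry a jump URL.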
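Optionally, the dark link detection device may obtain the URLs of the website by sending access requests to multiple levels of the website's pages. Correspondingly, the processing before step 101 may be as follows: send an access request to the home page of the target website, and determine all URLs in the home page that contain the domain name of the target website as target URLs; send an access request to each target URL in turn, and add the URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
In implementation, when the full set of URLs of the target website is not preset, the dark link detection device may first send an access request to the home page of the target website, determine all URLs in the received home page that contain the domain name of the target website (including the website domain name and its subdomains) as target URLs, and store them in a preset to-be-visited URL set. Next, the device may send an access request to each target URL in the to-be-visited URL set in turn, add all URLs in the received response pages that contain the domain name of the target website (including the website domain name and its subdomains) as target URLs, and store the added target URLs in the to-be-visited URL set; that is, the device continues to visit the newly added target URLs and to add new target URLs from them, repeating this cycle until all URLs under the website have been obtained. Usually the same resource of a website can be reached from links on different pages, so visiting each URL typically yields many duplicate URLs; adding all URLs in the received response pages as target URLs may therefore mean deduplicating them first. This URL collection may specifically use web crawler technology: taking the home page as the entry point, crawl and store the home page content, then obtain new pages from any link in the home page, crawl and store their content, and so on, until all URLs under the website have been crawled. Further, after a number of rounds the newly added URLs may all be duplicates, so to improve detection efficiency a preset number of rounds (e.g., 3 to 5) may also be set. It is worth noting that all URLs obtained by visiting each URL may also be stored in a to-be-visited URL subset, and the URLs in all the subsets then stored in the to-be-visited URL set. In addition, the dark link detection device may also start by sending an access request to any page under the target website, which is not limited in this embodiment.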
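The crawl described above can be sketched as a breadth-first loop with deduplication and a cap on the number of rounds. To keep the example self-contained, the HTTP request and link extraction are replaced by a hypothetical fetch_links(url) callable; the default of 3 rounds follows the 3 to 5 range suggested in the text, and treating subdomains as part of the site follows the parenthetical above.

```python
from urllib.parse import urljoin, urlsplit


def belongs_to_site(url, site_domain):
    """True if the URL's host is the site's domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    site_domain = site_domain.lower()
    return host == site_domain or host.endswith("." + site_domain)


def collect_target_urls(home_url, site_domain, fetch_links, max_rounds=3):
    """Collect target URLs breadth-first, starting from the home page.

    fetch_links(url) is a hypothetical hook that requests the page and returns
    the raw href values found on it. The home page itself is kept as a target.
    """
    targets = {home_url}
    frontier = [home_url]                      # URLs to visit in this round
    for _ in range(max_rounds):
        next_frontier = []
        for page_url in frontier:
            for href in fetch_links(page_url):
                url = urljoin(page_url, href)  # resolve "/", "./", "../" forms
                if belongs_to_site(url, site_domain) and url not in targets:
                    targets.add(url)           # deduplicated as it is added
                    next_frontier.append(url)
        if not next_frontier:                  # no new URLs: stop early
            break
        frontier = next_frontier
    return targets
```

For example, with fetch_links = lambda u: site.get(u, []) over a small in-memory dictionary of pages, the loop stops as soon as a round discovers no new URLs, which is the "all newly added URLs are duplicates" case mentioned above.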
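Step 102: for each URL associated tag, detect whether the URL corresponding to the attribute content of the URL associated tag belongs to the preset safe URL set.
Here, the preset safe URL set may be a set containing multiple URLs or multiple classes of URLs, and the URLs in it may include all types of links allowed on the pages of the target website, such as internal links of the target website or partner (link-exchange) links.
In implementation, when each URL associated tag is checked, the URL corresponding to its attribute content can be determined from the attribute of the tag that causes the URL jump. If the URL corresponding to the attribute content of the URL associated tag belongs to the preset safe URL set, no dark link has been written into the tag; if it does not, a dark link may have been written into the tag and further detection is needed. It is worth noting that different URL associated tags use different attributes to cause URL jumps: for the a tag it is the href attribute; for the meta tag it is the URL attribute (in a refresh meta tag, the url= value carried in the content attribute); for the iframe, frame, and embed tags it is the src attribute; and for the object tag it is the codebase attribute.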
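The tag-to-attribute correspondence above can be written down as a lookup table. In this Python sketch the meta entry reads the content attribute and parses its url= part, which is how a refresh meta tag carries its target; the table layout and the helper name url_of_tag are assumptions for illustration.

```python
import re

# Attribute that makes each URL associated tag jump to a URL, per the text.
URL_ATTRIBUTE = {
    "a": "href",
    "iframe": "src",
    "frame": "src",
    "embed": "src",
    "object": "codebase",
    "meta": "content",   # refresh meta tags carry "N; url=..." inside content
}

# Matches the url=... part of a refresh meta tag's content attribute.
_META_URL = re.compile(r"url\s*=\s*['\"]?([^'\";]+)", re.IGNORECASE)


def url_of_tag(tag, attrs):
    """Return the URL a URL associated tag points to, or None if it has none."""
    name = URL_ATTRIBUTE.get(tag)
    value = attrs.get(name) if name else None
    if not value:
        return None
    if tag == "meta":
        match = _META_URL.search(value)
        return match.group(1).strip() if match else None
    return value.strip()
```

A meta tag without a url= part (for example a keywords tag) yields None here, so it would be handled purely by the attribute-feature check of step 103.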
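Optionally, whether the corresponding URL belongs to the preset safe URL set can be determined from the URL domain name information in the attribute content of the URL associated tag. Correspondingly, the processing of step 102 may be as follows: detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a directory of the target website, or detect whether the URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.
In implementation, URLs can usually be distinguished by their domain name information, so the dark link detection device can determine whether the URL corresponding to a tag's attribute content belongs to the preset safe URL set by examining the URL domain name information in that content. The way a URL-pointing attribute is written in a URL associated tag varies with the page the URL points to: if the URL belongs to a directory of the website currently being checked, the attribute content may omit the domain name and start directly with "/", "./", or "../"; if it does not, the attribute content is usually a URL that includes a domain name. Therefore, the device can determine whether the URL belongs to the preset safe URL set by detecting whether it belongs to a directory of the target website, or whether the URL domain name information in the attribute content belongs to the preset domain name whitelist. If either condition holds, that is, the attribute content of the URL associated tag belongs to a directory of the target website or its domain name is in the preset domain name whitelist, the device can determine that the URL corresponding to the attribute content belongs to the preset safe URL set.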
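One possible reading of the step 102 check, sketched in Python: a value with neither a scheme nor a host is treated as a path inside the target website's own directory, and anything else is safe only if its host is the target domain or a whitelisted domain. Trusting subdomains of allowed domains, and sending javascript: or mailto: values on to step 103, are assumptions of this sketch rather than requirements of the text.

```python
from urllib.parse import urlsplit


def is_safe_url(url, site_domain, domain_whitelist):
    """Step 102 sketch: True if `url` stays inside the target site or its
    domain is on the whitelist."""
    parts = urlsplit(url.strip())
    # No scheme and no host: a relative path such as "/x", "./x" or "../x",
    # i.e. a URL inside the target website's own directory.
    if not parts.scheme and not parts.netloc:
        return True
    host = (parts.hostname or "").lower()
    if not host:
        return False  # e.g. "javascript:..." or "mailto:..."; left to step 103
    allowed = {site_domain.lower()} | {d.lower() for d in domain_whitelist}
    # Assumption: subdomains of an allowed domain are trusted as well.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Note that a protocol-relative value such as //evil.example/x does name a host, so it falls through to the domain check instead of being treated as a local path.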
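Step 103: if the URL does not belong to the preset safe URL set, detect whether the attribute content of the URL associated tag contains a preset dark link attribute feature.
Here, a dark link attribute feature may be an attribute feature indicating that a URL associated tag is a dark link; specifically, it may be summarized by manually analyzing the ways in which dark links are hidden.
In implementation, if the URL corresponding to the attribute content of the URL associated tag does not belong to the preset safe URL set, it may be an abnormal link (neither a URL of the target website nor a partner link), and the tag's attribute content needs further checking. The dark link detection device can therefore detect whether a dark link has been planted on the target website by determining whether any of the tag's attribute content contains a preset dark link attribute feature. Specifically, the device can analyze the tag's other attribute content (apart from the URL-pointing attribute) to determine whether the attribute content contains a preset dark link attribute feature.
Optionally, a detection method matching the way the dark link is planted can be used to determine whether the attribute content of the URL associated tag contains a preset dark link attribute feature. Correspondingly, the processing of step 103 may be as follows: detect whether the tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches the preset HTML dark link library.
Here, the HTML dark link library may record most of the dark-link planting methods known at the current state of the art, and can be continually extended and updated as techniques develop.
In implementation, among all URL associated tags, the meta tag sits in the head of a page's HTML source and provides metadata about the page; it is a main basis on which search engines judge page content, but it is not displayed on the page. An attacker can therefore insert many words and links unrelated to the page into this tag and plant a dark link without needing any other attribute content to hide it. For URL associated tags other than meta, to be found by search engines yet remain invisible to visitors, the dark link must not only be planted with a jump function but also be hidden; at the current state of the art, this is usually done with other attribute content so that the resource the dark link points to cannot be, or is hard to be, noticed by online visitors. Therefore, when detecting whether the attribute content of a URL associated tag contains a preset dark link attribute feature, it can be determined whether the URL associated tag is a meta tag, or whether the attribute content of the URL associated tag matches the preset HTML dark link library. Specifically, the dark-link planting methods in the HTML dark link library may include, but are not limited to, methods that hide the dark link using other attributes of the URL associated tag, for example setting the dark link's color to the background color or setting its text to a tiny pixel size, written as follows:
<a href="dark link URL" style="color:#FFFFFF;">keyword</a>
<a href="dark link URL" style="font-size:1px;">keyword</a>
<a href="dark link URL" style="line-height:1px;">keyword</a>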
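A minimal sketch of the step 103 check, assuming the HTML dark link library is a list of regular expressions applied to the style attribute. The three patterns mirror the three examples above and are illustrative only; a real library would be larger and maintained over time, as the text notes.

```python
import re

# Hypothetical "HTML dark link library": regexes applied to a tag's style.
DARK_LINK_LIBRARY = [
    # Text colored white, i.e. the same as a white background.
    # The lookbehind keeps "background-color:#fff" from matching.
    re.compile(r"(?<![\w-])color\s*:\s*#fff(?:fff)?\b", re.IGNORECASE),
    # Text shrunk to (almost) nothing.
    re.compile(r"font-size\s*:\s*[01](?:\.\d+)?px", re.IGNORECASE),
    # Lines squeezed to (almost) zero height.
    re.compile(r"line-height\s*:\s*[01](?:\.\d+)?px", re.IGNORECASE),
]


def has_dark_link_feature(tag, attrs):
    """Step 103 sketch: a meta tag counts as a dark link feature by itself;
    other tags are matched against the library."""
    if tag == "meta":
        return True
    style = attrs.get("style", "")
    return any(pattern.search(style) for pattern in DARK_LINK_LIBRARY)
```

Only tags that already failed the step 102 safe-set check would be passed here, so a white link on a legitimate internal page is not flagged.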
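Step 104: if a preset dark link attribute feature is contained, determine that the URL corresponding to the URL associated tag is a dark link.
In implementation, if analysis of the other attribute content of the URL associated tag shows that it contains a preset dark link attribute feature, for example the URL associated tag whose URL is not in the preset safe URL set happens to be a meta tag, or the style attribute content of an a tag matches one of the dark-link patterns above, the dark link detection device can determine that the attribute content of the URL associated tag contains a preset dark link attribute feature, and in turn that the URL corresponding to the URL associated tag is a dark link.
Optionally, if the attribute content of the URL associated tag does not contain a preset dark link attribute feature, the outer-layer tags of the URL associated tag can be checked further. Correspondingly, as shown in FIG. 2, the processing after step 103 may further be as follows:
Step 105: if no preset dark link attribute feature is contained, obtain the multiple outer-layer tags of the URL associated tag and their attribute content from the response page of the target URL.
Here, the multiple outer-layer tags of the URL associated tag are the several layers of tags that enclose it. Outer-layer tags commonly used by attackers to plant dark links include, but are not limited to, the div and marquee tags.
In implementation, besides hiding a dark link with the URL associated tag's own attributes as described above, a dark link can also be hidden using the attribute content of the tag's outer-layer tags. Therefore, when the dark link detection device checks the attribute content of the URL associated tag itself and finds no dark link attribute feature, it can go on to check the tag's outer-layer tags. In this approach, the URL associated tag itself is typically left unchanged so that the link can still be indexed by search engines, and the hiding is done by its outer-layer tags. Therefore, the multiple outer-layer tags of the URL associated tag and their attribute content can also be obtained from the response page of the target URL.
Step 106: detect, layer by layer, whether the attribute content of the multiple outer-layer tags contains a preset dark link attribute feature.
In implementation, a dark link may be hidden using the attribute content of the first outer-layer tag of the URL associated tag or of its Nth outer-layer tag (N being a positive integer), so the dark link detection device can check the outer-layer tags layer by layer, from the inside out (or from the outside in), for a preset dark link attribute feature. Further, if the style attribute of a div tag is used to hide the dark link, the position offsets in the div's style attribute can be set to negative values so that the dark link falls outside the visible page, written for example as: <div style="position:absolute;left:-900px;top:-999px;"><a href="dark link">keyword 2</a></div>. If the attribute content of a marquee tag is used, the tag's height attribute (the text height) can be set very small and its scrollamount attribute (the scrolling speed of the text, i.e., how fast it flashes past) set very large, so that when the page is viewed the dark link flashes by quickly as a marquee without disturbing the page, written for example as: <marquee height=1 width=4 scrollamount=3000 scrolldelay=20000><a href="dark link URL">keyword</a></marquee>.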
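The two outer-layer patterns above could be tested per tag roughly as follows. The thresholds (height of at most 2 and scrollamount of at least 1000) are assumed values chosen for illustration; the text only says "very small" and "very large".

```python
import re


def _to_int(value):
    """Parse an HTML attribute value as an integer, or return None."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def outer_tag_hides_link(tag, attrs, max_height=2, min_scrollamount=1000):
    """Check one outer-layer tag for the two hiding patterns in the text."""
    if tag == "div":
        # A negative left/top offset pushes the content off the visible page.
        style = attrs.get("style", "")
        return re.search(r"\b(?:left|top)\s*:\s*-\d", style, re.IGNORECASE) is not None
    if tag == "marquee":
        # Tiny box plus very fast scrolling: the link only flashes past.
        height = _to_int(attrs.get("height"))
        speed = _to_int(attrs.get("scrollamount"))
        return (height is not None and speed is not None
                and height <= max_height and speed >= min_scrollamount)
    return False
```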
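It is worth noting that the above ways of planting dark links are given only for illustration and ease of understanding, and do not limit this embodiment.
Step 107: when a target outer-layer tag is detected to contain a preset dark link attribute feature, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link.
In implementation, the dark link detection device checks the attribute content of the multiple outer-layer tags of the URL associated tag layer by layer, from the inside out (or from the outside in), until a dark link attribute feature is found or the last outer-layer tag has been checked. If a target outer-layer tag is found whose attribute content contains a preset dark link attribute feature, the detection can stop, and the URL corresponding to the URL associated tag can accordingly be determined to be a dark link. In addition, when checking the outer-layer tags layer by layer, the device may also set a total number of layers to check, to improve detection efficiency or reduce missed detections.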
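A sketch of steps 105 to 107: a standard-library parser records, for each a tag, its enclosing tags from the innermost outward, and find_hiding_layer walks them inside-out, stopping at the first layer that a supplied predicate flags and giving up after an assumed maximum number of layers. The names and the default cap of 5 are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class AncestorTracker(HTMLParser):
    """For every <a> tag, records its enclosing tags from the innermost outward."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open (tag, attrs) pairs, outermost first
        self.links = []   # (a_attrs, [(tag, attrs), ...] innermost first)

    def handle_starttag(self, tag, attrs):
        attrs = {name: value or "" for name, value in attrs}
        if tag == "a":
            self.links.append((attrs, list(reversed(self.stack))))
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return


def find_hiding_layer(outer_tags, is_hiding, max_layers=5):
    """Check outer tags from the inside out and stop at the first one that
    hides the link. Returns (layer_number, tag_name) or None."""
    for layer, (tag, attrs) in enumerate(outer_tags[:max_layers], start=1):
        if is_hiding(tag, attrs):
            return layer, tag   # the "target outer-layer tag"
    return None
```

Any per-tag predicate can be plugged in as is_hiding, for example one that flags negative left/top offsets on div tags.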
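Optionally, after the URL corresponding to the attribute content of the URL associated tag is determined to be a dark link, or after the detection is completed, a detection log can be generated. The corresponding processing may be as follows: extract the dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature; generate a detection log containing the attribute content of the URL associated tag, the dark link URL, and the target URL, and send the detection log to the website server of the target website.
In implementation, for a URL associated tag whose attribute content is found to hide a dark link, that is, a tag that hits the HTML dark link library, the dark link detection device can extract the dark link URL from the tag's attribute content according to the dark link attribute feature. To make detection results easy to store and retrieve, the device can generate a detection log from the results after determining that the URL corresponding to the attribute content of the URL associated tag is a dark link, or after the detection is completed. As the website owner requires, the detection log (especially a log reporting dark links) can also be raised as an alert and sent to the owner of the target website, so that the owner learns the results in time and can guard against dark links. The detection log includes, but is not limited to, the attribute content of the URL associated tag, the dark link URL, and the target URL.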
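The detection log could be represented as below. The three required fields come from the text; the timestamp, the JSON encoding, and the helper names are assumptions, and delivering the log to the website server is left out. Extracting the URL from meta refresh content is also omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Attribute holding the jump URL for each tag (meta refresh parsing omitted).
URL_ATTRIBUTE = {"a": "href", "iframe": "src", "frame": "src",
                 "embed": "src", "object": "codebase"}


@dataclass
class DetectionLog:
    target_url: str        # page on which the URL associated tag was found
    dark_link_url: str     # URL extracted from the tag's URL-pointing attribute
    tag_attributes: dict   # full attribute content of the URL associated tag
    detected_at: str       # assumed extra field: UTC timestamp of detection


def build_detection_log(target_url, tag, attrs):
    """Assemble one log entry for a URL associated tag judged to be a dark link."""
    name = URL_ATTRIBUTE.get(tag)
    return DetectionLog(
        target_url=target_url,
        dark_link_url=attrs.get(name, "") if name else "",
        tag_attributes={"tag": tag, **attrs},
        detected_at=datetime.now(timezone.utc).isoformat(),
    )


def to_json(log):
    """Serialize the entry; JSON is an assumed wire format."""
    return json.dumps(asdict(log), ensure_ascii=False, sort_keys=True)
```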
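Optionally, the dark link detection device can optimize and upgrade its detection mechanism based on feedback on detection results. The corresponding processing may be as follows: if a dark link false positive instruction sent by the website server is received, update the preset safe URL set according to the dark link URL.
In implementation, because the safe URL set and the HTML dark link library are both preset, the device usually does not modify them until it receives an update instruction; as dark link techniques evolve, both will inevitably become incomplete at times, so false positives will inevitably occur. When the website owner finds that a received detection result is a false positive, it can send a dark link false positive instruction to the device; on receiving it, the device can automatically add the dark link URL to the preset safe URL set, updating the safe URLs. The next time the website is checked, the URL associated tag that produced the false positive no longer needs to be matched against the dark link library, which also improves the efficiency of checking the website.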
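A minimal sketch of the false positive feedback loop, assuming the safe URL set holds exact URLs plus whitelisted domains. By default only the reported dark link URL is added; whitelisting its whole domain is offered as an opt-in, since it is broader than what the text describes.

```python
from urllib.parse import urlsplit


class SafeUrlSet:
    """Safe URL set holding exact URLs plus whitelisted domains (assumed layout)."""

    def __init__(self, domains=()):
        self.domains = {d.lower() for d in domains}
        self.urls = set()

    def contains(self, url):
        if url in self.urls:
            return True
        host = (urlsplit(url).hostname or "").lower()
        return bool(host) and any(host == d or host.endswith("." + d) for d in self.domains)

    def report_false_positive(self, dark_link_url, whitelist_domain=False):
        """Apply a dark link false positive instruction from the website server."""
        if whitelist_domain:
            host = urlsplit(dark_link_url).hostname
            if host:
                self.domains.add(host.lower())   # trust the whole domain (opt-in)
        else:
            self.urls.add(dark_link_url)         # trust only this exact URL
```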
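The method provided in this embodiment periodically obtains URL associated tags and their attribute content from the response pages of all URLs under the target website and checks each URL associated tag: it first detects whether the URL corresponding to the tag's attribute content belongs to the preset safe URL set and, for tags whose URL does not, then detects whether the tag's attribute content contains a preset dark link attribute feature. Checking each URL associated tag from multiple angles in this way, using the preset safe URL set and the preset dark link attribute features, achieves multi-level detection of the website, so that whether dark links have been planted on the website can be detected more accurately and the false positive and false negative rates of the dark link detection device are reduced.
Based on the same technical concept, an embodiment of the present application further provides an apparatus for detecting dark links on a website. As shown in FIG. 3, the apparatus includes a tag obtaining module 301 and a tag detection module 302, where:
the tag obtaining module 301 is configured to, for each target URL of a target website, periodically obtain all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL;
the tag detection module 302 is configured to, for each URL associated tag, detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set;
the tag detection module 302 is further configured to, if the URL does not belong to the preset safe URL set, detect whether the attribute content of the URL associated tag contains a preset dark link attribute feature;
the tag detection module 302 is further configured to, if the attribute content contains a preset dark link attribute feature, determine that the URL corresponding to the attribute content of the URL associated tag is a dark link.
Optionally, the tag obtaining module 301 is specifically configured to:
send an access request to the home page of the target website, and determine all URLs in the home page that contain the domain name of the target website as target URLs;
send an access request to each target URL in turn, and add all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
Optionally, the tag detection module 302 is specifically configured to:
detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a directory of the target website, or whether the URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.
Optionally, the tag detection module 302 is specifically configured to:
detect whether the tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset HTML dark link library.
Optionally, the tag detection module 302 is further configured to:
if no preset dark link attribute feature is contained, obtain, from the response page of the target URL, multiple outer-layer tags of the URL associated tag and the attribute content of the outer-layer tags;
detect, layer by layer, whether the attribute content of the outer-layer tags contains a preset dark link attribute feature;
when a target outer-layer tag is detected to contain a preset dark link attribute feature, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link.
Optionally, the tag detection module 302 is further configured to:
extract a dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature;
generate a detection log containing the attribute content of the URL associated tag, the dark link URL, and the target URL, and send the detection log to the website server.
Optionally, the tag detection module 302 is further configured to:
if a dark link false positive instruction sent by the website server is received, update the preset safe URL set according to the dark link URL.
The apparatus provided in this embodiment periodically obtains URL associated tags and their attribute content from the response pages of all URLs under the target website and checks each URL associated tag: it first detects whether the URL corresponding to the tag's attribute content belongs to the preset safe URL set and, for tags whose URL does not, then detects whether the tag's attribute content contains a preset dark link attribute feature. Checking each URL associated tag from multiple angles in this way, using the preset safe URL set and the preset dark link attribute features, achieves multi-level detection of the website, so that whether dark links have been planted on the website can be detected more accurately and the false positive and false negative rates of the dark link detection device are reduced.
It should be noted that when the apparatus for detecting dark links on a website provided in the above embodiment performs dark link detection, the division into the above functional modules is used only as an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided in the above embodiment and the method embodiment belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.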
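FIG. 4 is a schematic structural diagram of the dark link detection device provided by an embodiment of the present application. The dark link detection device 400 may vary considerably depending on configuration or performance, and may include one or more central processing units 422 (e.g., one or more processors), a memory 432, and one or more storage media 430 (e.g., one or more mass storage devices) storing application programs 462 or data 466. The memory 432 and the storage medium 430 may provide transient or persistent storage. The program stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the dark link detection device. Further, the central processing unit 422 may be configured to communicate with the storage medium 430 and to execute, on the dark link detection device 400, the series of instruction operations in the storage medium 430.
The dark link detection device 400 may further include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input/output interfaces 458, one or more keyboards 456, and/or one or more operating systems 461, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The dark link detection device 400 may include a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the above dark link detection on a website.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.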

Claims (16)

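  1. A method for detecting dark links on a website, comprising:
    for each target URL of a target website, periodically obtaining all URL associated tags and all attribute content of the URL associated tags from a response page of the target URL;
    for each of the URL associated tags, detecting whether a URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set;
    if the URL does not belong to the preset safe URL set, detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature; and
    if the attribute content contains the preset dark link attribute feature, determining that the URL corresponding to the attribute content of the URL associated tag is a dark link.
  2. The method according to claim 1, wherein before the periodically obtaining, for each target URL of the target website, all URL associated tags and all attribute content of the URL associated tags from the response page of the target URL, the method further comprises:
    sending an access request to a home page of the target website, and determining all URLs in the home page that contain a domain name of the target website as target URLs; and
    sending an access request to each of the target URLs in turn, and adding all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
  3. The method according to claim 1, wherein the detecting whether a URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set comprises:
    detecting whether the URL corresponding to the attribute content of the URL associated tag belongs to a directory of the target website, or whether URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.
  4. The method according to claim 1, wherein the detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature comprises:
    detecting whether tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset HTML dark link library.
  5. The method according to claim 1, wherein after the detecting whether the attribute content of the URL associated tag contains a preset dark link attribute feature, the method further comprises:
    if the preset dark link attribute feature is not contained, obtaining, from the response page of the target URL, multiple outer-layer tags of the URL associated tag and attribute content of the multiple outer-layer tags;
    detecting, layer by layer, whether the attribute content of the multiple outer-layer tags contains the preset dark link attribute feature; and
    when a target outer-layer tag is detected to contain the preset dark link attribute feature, stopping the detection and determining that the URL corresponding to the URL associated tag is a dark link.
  6. The method according to claim 1, wherein after the determining that the URL corresponding to the attribute content of the URL associated tag is a dark link, the method further comprises:
    extracting a dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature; and
    generating a detection log containing the attribute content of the URL associated tag, the dark link URL, and the target URL, and sending the detection log to a website server of the target website.
  7. The method according to claim 6, further comprising:
    if a dark link false positive instruction sent by the website server is received, updating the preset safe URL set according to the dark link URL.
  8. An apparatus for detecting dark links on a website, comprising a tag obtaining module and a tag detection module, wherein:
    the tag obtaining module is configured to, for each target URL of a target website, periodically obtain all URL associated tags and all attribute content of the URL associated tags from a response page of the target URL;
    the tag detection module is configured to, for each of the URL associated tags, detect whether a URL corresponding to the attribute content of the URL associated tag belongs to a preset safe URL set;
    the tag detection module is further configured to, if the URL does not belong to the preset safe URL set, detect whether the attribute content of the URL associated tag contains a preset dark link attribute feature; and
    the tag detection module is further configured to, if the attribute content contains the preset dark link attribute feature, determine that the URL corresponding to the attribute content of the URL associated tag is a dark link.
  9. The apparatus according to claim 8, wherein the tag obtaining module is specifically configured to:
    send an access request to a home page of the target website, and determine all URLs in the home page that contain a domain name of the target website as target URLs; and
    send an access request to each of the target URLs in turn, and add all URLs in the response page of each target URL that contain the domain name of the target website as target URLs.
  10. The apparatus according to claim 8, wherein the tag detection module is specifically configured to:
    detect whether the URL corresponding to the attribute content of the URL associated tag belongs to a directory of the target website, or whether URL domain name information in the attribute content of the URL associated tag belongs to a preset domain name whitelist.
  11. The apparatus according to claim 8, wherein the tag detection module is specifically configured to:
    detect whether tag information in the attribute content of the URL associated tag is meta, or whether the attribute content of the URL associated tag matches a preset HTML dark link library.
  12. The apparatus according to claim 8, wherein the tag detection module is further configured to:
    if the preset dark link attribute feature is not contained, obtain, from the response page of the target URL, multiple outer-layer tags of the URL associated tag and attribute content of the multiple outer-layer tags;
    detect, layer by layer, whether the attribute content of the multiple outer-layer tags contains the preset dark link attribute feature; and
    when a target outer-layer tag is detected to contain the preset dark link attribute feature, stop the detection and determine that the URL corresponding to the URL associated tag is a dark link.
  13. The apparatus according to claim 8, wherein the tag detection module is further configured to:
    extract a dark link URL from the attribute content of the URL associated tag according to the dark link attribute feature; and
    generate a detection log containing the attribute content of the URL associated tag, the dark link URL, and the target URL, and send the detection log to a website server of the target website.
  14. The apparatus according to claim 13, wherein the tag detection module is further configured to:
    if a dark link false positive instruction sent by the website server is received, update the preset safe URL set according to the dark link URL.
  15. A dark link detection device, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the method for detecting dark links on a website according to any one of claims 1 to 7.
  16. A computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for detecting dark links on a website according to any one of claims 1 to 7.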
PCT/CN2019/086057 2019-04-16 2019-05-08 Method and apparatus for detecting dark links on a website WO2020211130A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19856424.7A EP3745292A4 (en) 2019-04-16 2019-05-08 METHOD AND DEVICE FOR DETECTING A HIDDEN LINK ON A WEBSITE
US16/813,799 US20200336498A1 (en) 2019-04-16 2020-03-10 Method and apparatus for detecting hidden link in website

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910305415.2A CN110309667B (zh) 2019-04-16 2019-04-16 Method and apparatus for detecting dark links on a website
CN201910305415.2 2019-04-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/813,799 Continuation US20200336498A1 (en) 2019-04-16 2020-03-10 Method and apparatus for detecting hidden link in website

Publications (1)

Publication Number Publication Date
WO2020211130A1 true WO2020211130A1 (zh) 2020-10-22

Family

ID=68074674

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/086057 WO2020211130A1 (zh) 2019-04-16 2019-05-08 Method and apparatus for detecting dark links on a website

Country Status (3)

Country Link
EP (1) EP3745292A4 (zh)
CN (1) CN110309667B (zh)
WO (1) WO2020211130A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112532624A (zh) * 2020-11-27 2021-03-19 深信服科技股份有限公司 Black link detection method and apparatus, electronic device, and readable storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
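CN111143722A (zh) * 2019-12-23 2020-05-12 杭州安恒信息技术股份有限公司 Web page dark link detection method, apparatus, device, and medium
CN113111274A (zh) * 2020-01-10 2021-07-13 网宿科技股份有限公司 Method and apparatus for detecting hidden dark links in a web page
CN112487321A (zh) * 2020-12-08 2021-03-12 北京天融信网络安全技术有限公司 Detection method and apparatus, storage medium, and electronic device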

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8392823B1 (en) * 2003-12-04 2013-03-05 Google Inc. Systems and methods for detecting hidden text and hidden links
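CN103856442A (zh) * 2012-11-30 2014-06-11 腾讯科技(深圳)有限公司 Black link detection method, apparatus, and system
CN105488402A (zh) * 2014-12-23 2016-04-13 哈尔滨安天科技股份有限公司 Dark link detection method and system
CN105704099A (zh) * 2014-11-26 2016-06-22 国家电网公司 Method for detecting illegal links hidden in website scripts
CN107370718A (zh) * 2016-05-12 2017-11-21 深圳市深信服电子科技有限公司 Method and apparatus for detecting black links in web pages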

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
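CN102682097A (zh) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting dark links in web pages
CN103685174B (zh) * 2012-09-07 2016-12-21 中国科学院计算机网络信息中心 Sample-independent phishing website detection method
CN103778365B (zh) * 2012-10-18 2015-05-13 腾讯科技(深圳)有限公司 Method and device for detecting hidden content of web pages
JP2014153877A (ja) * 2013-02-07 2014-08-25 Ricoh Co Ltd Image forming system
IN2013CH06148A (zh) * 2013-12-30 2015-07-03 Samsung Electronics Co Ltd
CN104503962B (zh) * 2014-06-18 2017-11-03 北京邮电大学 Web page dark link detection method
CN105138907B (zh) * 2015-07-22 2019-04-23 国家计算机网络与信息安全管理中心 Method and system for actively detecting attacked websites
CN105184159B (zh) * 2015-08-27 2018-11-27 深信服科技股份有限公司 Method and apparatus for identifying web page tampering
CN105740308A (zh) * 2015-12-19 2016-07-06 哈尔滨安天科技股份有限公司 Website dark link detection method and system based on hypertext markup language structure
CN105975523A (zh) * 2016-04-28 2016-09-28 浙江乾冠信息安全研究院有限公司 Stack-based dark link detection method
CN106209863B (zh) * 2016-07-15 2019-04-12 山谷网安科技股份有限公司 Website security monitoring method based on full-site scanning
CN108363711B (zh) * 2017-07-04 2020-11-13 北京安天网络安全技术有限公司 Method and apparatus for detecting dark links in web pages
CN107786537B (zh) * 2017-09-19 2020-04-07 杭州安恒信息技术股份有限公司 Orphan page implantation attack detection method based on Internet cross-searching
CN108282489B (zh) * 2018-02-07 2020-01-31 网宿科技股份有限公司 Vulnerability scanning method, server, and system
CN108737423B (zh) * 2018-05-24 2020-07-14 国家计算机网络与信息安全管理中心 Phishing website discovery method and system based on similarity analysis of key web page content
CN109067716B (zh) * 2018-07-18 2021-05-28 杭州安恒信息技术股份有限公司 Method and system for identifying dark links
CN109062803B (zh) * 2018-08-15 2022-03-11 杭州安恒信息技术股份有限公司 Method and apparatus for automatically generating test cases based on a crawler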

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3745292A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112532624A (zh) * 2020-11-27 2021-03-19 深信服科技股份有限公司 一种黑链检测方法、装置、电子设备及可读存储介质
CN112532624B (zh) * 2020-11-27 2023-09-05 深信服科技股份有限公司 Black link detection method and apparatus, electronic device, and readable storage medium

Also Published As

Publication number Publication date
EP3745292A1 (en) 2020-12-02
EP3745292A4 (en) 2020-12-02
CN110309667B (zh) 2022-08-30
CN110309667A (zh) 2019-10-08

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019856424

Country of ref document: EP

Effective date: 20200310

NENP Non-entry into the national phase

Ref country code: DE