WO2012151843A1 - URL filtering system, method and gateway - Google Patents

URL filtering system, method and gateway

Info

Publication number
WO2012151843A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
unit
rule file
memory
message
Prior art date
Application number
PCT/CN2011/080608
Other languages
French (fr)
Chinese (zh)
Inventor
王永光
沈蓓洁
卢勤元
李冰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2012151843A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/02: Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227: Filtering policies
    • H04L63/0236: Filtering by address, protocol, port number or service, e.g. IP-address or URL

Abstract

The present invention relates to a URL filtering system, method and gateway. The system includes an identification unit, a memory unit, a rule unit, a scanning unit and a matching unit. The method includes: generating, from a user-defined URL list, a URL rule file that the system can recognize, and loading the URL rule file into memory; when a packet is received, scanning it and determining whether it is an HTTP packet; and, when it is an HTTP packet, scanning the URL information in the HTTP packet, matching it against the URL information in the URL rule file in memory, and allowing or not allowing the HTTP packet to pass according to the matching result. The invention does not need to distinguish the type of URL, which speeds up URL processing.

Description

URL filtering system, method for filtering URLs, and gateway
Technical field
The present invention relates to the field of communications, and in particular to a Uniform/Universal Resource Locator (URL) filtering system, a method for filtering URLs, and a gateway.
Background
A URL, also known as a web page address, is the address of a standard resource on the Internet and a way of fully describing the addresses of web pages and other resources on the Internet. Every web page on the Internet has a unique URL identifier, usually called its URL address. Such an address may refer to a local disk or to a computer on a local area network, but most often it refers to a site on the Internet. Simply put, a URL is a Web address.
With the spread of networking, information on the Internet makes people's lives and work increasingly convenient, and a growing number of teenagers have access to the Internet. The quality of online information is uneven, however, and a considerable number of websites promote pornography, violence, superstition and other harmful content. To present teenagers with healthy websites, the URLs they visit need to be filtered so that unhealthy and illegal websites are blocked.
There are currently three main URL filtering methods:
First, storing URL information in a hash table. This method suits lookups of URLs with different domain names; when the domain names are the same, lookup takes a long time.
Second, using a string matching algorithm. This method suits keyword lookup, but lookup is relatively slow.
Third, using a regular-expression matching algorithm. This method suits lookups of URLs that are not known exactly, but lookup is also relatively slow.
The lookup speed of the existing methods drops markedly as the number of URL records in the URL list grows, so they cannot meet the needs of URL management in today's high-throughput networks.
Summary of the invention
An object of the present invention is to provide a URL filtering system, a method for filtering URLs, and a gateway, so as to solve the problem of slow URL lookup in the prior art.
The present invention provides a method for filtering URLs, the method comprising:
generating, from a user-defined URL list, a URL rule file that the URL filtering system can recognize, and loading the URL rule file into memory;
when the system receives a packet, scanning the packet and determining whether it is a Hyper Text Transfer Protocol (HTTP) packet and, when it is, scanning the URL information in the HTTP packet and matching it against the URL information in the URL rule file in memory; and
allowing or not allowing the HTTP packet to pass according to the matching result.
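The rule-generation and matching steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the source describes a hardware-recognizable rule file whose format it does not specify, so the loaded rules are modeled here as a plain `frozenset` of normalized URLs, and the names `normalize_url`, `build_rule_set` and `url_matches` are hypothetical.

```python
def normalize_url(url: str) -> str:
    """Drop a leading scheme and lower-case the host part, so that
    'HTTP://Example.com/a' and 'example.com/a' compare equal."""
    url = url.strip()
    for scheme in ("http://", "https://"):
        if url.lower().startswith(scheme):
            url = url[len(scheme):]
            break
    host, sep, path = url.partition("/")
    return host.lower() + sep + path


def build_rule_set(user_url_list):
    """Assumed stand-in for 'generate a URL rule file and load it into memory'."""
    return frozenset(normalize_url(u) for u in user_url_list if u.strip())


def url_matches(rule_set, url: str) -> bool:
    """Exact-match lookup of one URL against the loaded rules."""
    return normalize_url(url) in rule_set


rules = build_rule_set(["http://Bad.example/page", "blocked.example/"])
print(url_matches(rules, "bad.example/page"))
print(url_matches(rules, "good.example/page"))
```

A set lookup is used only because it is the simplest structure that shows the idea; it says nothing about how the hardware matching described in the source is actually implemented.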
In the above solution, after the URL rule file is loaded into memory, the method further comprises:
determining whether the user-defined URL list has changed and, when it has, regenerating the system-recognizable URL rule file from the changed user-defined URL list and loading the newly generated URL rule file into memory;
after the loading succeeds, the system uses the new URL rule file in memory for URL information matching and deletes the old URL rule file from memory.
In the above solution, the method further comprises: when the system determines that the received packet is not an HTTP packet, directly allowing the packet to pass.
In the above solution, the user-defined URL list is a blacklist or a whitelist.
In the above solution, allowing or not allowing the HTTP packet to pass according to the matching result comprises:
when the user-defined URL list is a blacklist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, the HTTP packet is not allowed to pass; if the match fails, the HTTP packet is allowed to pass;
when the user-defined URL list is a whitelist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, the HTTP packet is allowed to pass; if the match fails, the HTTP packet is not allowed to pass.
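The blacklist and whitelist cases reduce to a four-row decision table. A minimal Python sketch of that table follows; `ListType` and `allow_packet` are hypothetical names introduced for illustration and do not come from the source.

```python
from enum import Enum


class ListType(Enum):
    BLACKLIST = "blacklist"
    WHITELIST = "whitelist"


def allow_packet(list_type: ListType, matched: bool) -> bool:
    """Return True if the HTTP packet may pass.

    Blacklist: a match blocks the packet, a miss lets it through.
    Whitelist: a match lets the packet through, a miss blocks it.
    """
    if list_type is ListType.BLACKLIST:
        return not matched
    return matched


# Print the full decision table.
for list_type in ListType:
    for matched in (True, False):
        print(list_type.value, matched, allow_packet(list_type, matched))
```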
The present invention also provides a URL filtering system comprising an identification unit and a memory unit, the system further comprising a rule unit, a scanning unit and a matching unit, wherein:
the identification unit is configured to identify whether a received packet is an HTTP packet and to send the identification result to the scanning unit;
the rule unit is configured to generate, from a user-defined URL list, a URL rule file that the system can recognize, and to load the URL rule file into the memory unit;
the scanning unit is configured to scan the received packet and send it to the identification unit; when the identification result returned by the identification unit indicates that the received packet is an HTTP packet, to scan the URL information in the HTTP packet and send the URL information to the matching unit; and to allow or not allow the HTTP packet to pass according to the matching result returned by the matching unit;
the matching unit is configured to match the URL information in the HTTP packet against the URL information in the URL rule file in the memory unit and to send the matching result to the scanning unit.
In the above solution, the rule unit is further configured to determine whether the user-defined URL list has changed and, when it has, to regenerate the system-recognizable URL rule file from the changed user-defined URL list, load the newly generated URL rule file into the memory unit and, after the loading succeeds, notify the matching unit to use the new URL rule file for URL information matching.
In the above solution, the matching unit is further configured, after receiving the notification from the rule unit, to use the new URL rule file for URL information matching and to delete the old URL rule file from the memory unit.
In the above solution, the scanning unit is further configured to directly allow the packet to pass when the identification result returned by the identification unit indicates that the received packet is not an HTTP packet.
The present invention also provides a gateway comprising the above URL filtering system.
The present invention converts the user-defined URL list into a URL rule file that the hardware of the URL system can recognize and loads it into memory. When a packet is received, the system can quickly match the HTTP packet against the URL rule file in memory and return the matching result. The scan-and-match speed can reach at least 2 Gbps, and the type of URL does not need to be distinguished, which removes the complicated and tedious URL classification and lookup of existing methods and speeds up URL processing. The invention supports filtering against large numbers of URLs and is applicable to network devices such as an Integrated Service Gateway (ISG), a Wireless Application Protocol (WAP) gateway and a WEB gateway.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the invention and their description explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of the method for filtering URLs according to the present invention;
FIG. 2 is a schematic block diagram of the URL filtering system according to the present invention;
FIG. 3 is a schematic block diagram of the gateway according to the present invention.
Detailed description
To make the technical problem to be solved, the technical solution and the beneficial effects of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
FIG. 1 is a flowchart of the method for filtering URLs according to the present invention. This embodiment assumes that the user-defined URL list is a blacklist. As shown in FIG. 1, the method comprises the following steps:
Step S001: Generate, from the user-defined blacklist, a URL rule file that the URL filtering system can recognize;
Step S002: Load the generated URL rule file into memory;
Step S003: The system receives a packet;
Step S004: Scan the packet;
Step S005: Determine whether the packet is an HTTP packet; if so, go to step S006; otherwise, go to step S010;
Step S006: Scan the URL information in the packet;
Step S007: Match it against the URL information in the URL rule file in memory;
Step S008: Determine whether the match succeeded; if so, go to step S009; otherwise, go to step S010;
Step S009: Filter the packet;
Here, filtering the packet means not allowing the packet to pass.
Step S010: Release the packet.
Here, releasing the packet means allowing the packet to pass; the packets reaching this step include both HTTP packets and non-HTTP packets.
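Steps S004 to S010 can be sketched in software as follows. This is a hedged illustration under several assumptions, not the hardware method of the source: a packet is treated as the raw bytes of an HTTP/1.x request in origin form (for example `GET /path HTTP/1.1`), the URL is taken to be the `Host` header value followed by the request target, and the loaded rule file is modeled as a `frozenset`. The names `is_http_request`, `extract_url` and `process_packet` are hypothetical.

```python
HTTP_METHODS = (b"GET", b"POST", b"HEAD", b"PUT", b"DELETE", b"OPTIONS")


def is_http_request(payload: bytes) -> bool:
    """S005: crude check that the payload starts with an HTTP request line."""
    first_line = payload.split(b"\r\n", 1)[0]
    parts = first_line.split(b" ")
    return (len(parts) == 3 and parts[0] in HTTP_METHODS
            and parts[2].startswith(b"HTTP/"))


def extract_url(payload: bytes) -> str:
    """S006: combine the Host header value with the request target."""
    lines = payload.split(b"\r\n")
    target = lines[0].split(b" ")[1].decode("ascii", "replace")
    host = ""
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace").lower()
            break
    return host + target


def process_packet(payload: bytes, blacklist: frozenset) -> str:
    """S004 to S010 for the blacklist case; returns 'release' or 'filter'."""
    if not is_http_request(payload):   # S005 -> S010
        return "release"
    url = extract_url(payload)         # S006
    if url in blacklist:               # S007, S008
        return "filter"                # S009
    return "release"                   # S010


blacklist = frozenset({"bad.example/index.html"})
request = b"GET /index.html HTTP/1.1\r\nHost: bad.example\r\n\r\n"
print(process_packet(request, blacklist))
print(process_packet(b"\x16\x03\x01\x00", blacklist))
```

The non-HTTP payload in the last line takes the S005 to S010 branch and is released without any URL lookup, which mirrors the flow in FIG. 1.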
In other embodiments, when the user-defined URL list is a whitelist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, the HTTP packet is released; if the match fails, the HTTP packet is filtered.
In the present invention, while processing packets, the system can also determine whether the user-defined URL list has changed. If it has, the system regenerates the system-recognizable URL rule file from the changed user-defined URL list and loads the newly generated URL rule file into memory. Once loading is complete, the system uses the new URL rule file for URL information matching and deletes the old URL rule file. This lets the invention update the URL rule file in real time without interrupting the scan-and-match service. In a specific embodiment, two memory regions, memory A and memory B, can be reserved. If the old URL rule file is stored in memory A, then after the user-defined URL list changes, the newly generated URL rule file is loaded into memory B. Once loading is complete, the system uses the URL rule file in memory B for URL information matching and, at the same time, deletes the URL rule file in memory A. When the user-defined URL list changes again, the newly generated URL rule file is loaded into memory A, and so on. In other words, the system performs two tasks at the same time: one processes received packets, and the other detects whether the user-defined URL list has changed.
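The memory A / memory B scheme is a form of double buffering. The following Python sketch shows the idea under stated assumptions: each memory region is modeled as a dictionary slot holding a `frozenset` of URLs, a `threading.Lock` guards the switch between slots, and the class name `DoubleBufferedRules` and its methods are hypothetical rather than taken from the source.

```python
import threading


class DoubleBufferedRules:
    """Two rule slots, 'A' and 'B'; only one is active at a time."""

    def __init__(self, initial_rules):
        self._slots = {"A": frozenset(initial_rules), "B": None}
        self._active = "A"
        self._lock = threading.Lock()  # guards the slot switch

    @property
    def active_slot(self) -> str:
        return self._active

    def matches(self, url: str) -> bool:
        """Match against whichever slot is currently active."""
        with self._lock:
            rules = self._slots[self._active]
        return url in rules

    def reload(self, new_rules) -> None:
        """Load new rules into the idle slot, switch, then drop the old rules."""
        loaded = frozenset(new_rules)      # loading finishes before the switch
        with self._lock:
            idle = "B" if self._active == "A" else "A"
            self._slots[idle] = loaded
            old, self._active = self._active, idle
            self._slots[old] = None        # delete the old rule file


store = DoubleBufferedRules(["bad.example/"])
store.reload(["worse.example/"])
print(store.active_slot, store.matches("worse.example/"))
```

Because the new rules are fully built before the lock is taken, matching is only paused for the brief slot switch, which is the property the text describes as updating the rule file without interrupting the scan-and-match service.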
Compared with existing software-based methods, the hardware-based filtering method of the present invention increases the speed of processing HTTP packets.
FIG. 2 is a schematic block diagram of the URL filtering system according to the present invention. As shown in FIG. 2, the system comprises a scanning unit 01, an identification unit 02, a rule unit 03, a matching unit 04 and a memory unit 05, wherein:
the scanning unit 01 is configured to scan the received packet and send it to the identification unit 02, or to scan the URL information in an HTTP packet and send that URL information to the matching unit 04; and to release and/or filter the received packet according to the identification result returned by the identification unit 02 and the matching result returned by the matching unit 04;
the identification unit 02 is configured to identify whether the received packet is an HTTP packet and to send the identification result to the scanning unit 01;
the rule unit 03 is configured to generate, from the user-defined URL list, a URL rule file that the system can recognize and to load the generated URL rule file into the memory unit 05; and to determine whether the user-defined URL list has changed and, when it has, to regenerate the system-recognizable URL rule file from the changed user-defined URL list, load the newly generated URL rule file into the memory unit 05 and, after loading is complete, notify the matching unit 04 to use the new URL rule file for URL information matching;
the matching unit 04 is configured to match the received URL information against the URL information in the URL rule file in the memory unit 05 and to send the matching result to the scanning unit 01; or, on receiving the notification from the rule unit 03, to use the newly loaded URL rule file in the memory unit 05 for URL information matching and to delete the old URL rule file from the memory unit 05.
When the identification result returned by the identification unit 02 indicates that the received packet is an HTTP packet, the scanning unit 01 scans the URL information in the HTTP packet and sends the URL information to the matching unit 04, and releases or filters the HTTP packet according to the matching result returned by the matching unit 04;
when the identification result returned by the identification unit 02 indicates that the received packet is not an HTTP packet, the scanning unit 01 directly releases the packet.
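The division of labor among the five units can be illustrated with a small object model. This is only a structural sketch of the blacklist case: the class and method names are hypothetical, the HTTP check is deliberately crude, and the URL is passed in already extracted to keep the example short.

```python
class IdentificationUnit:          # unit 02
    def is_http(self, packet: bytes) -> bool:
        # Crude assumption: a request starting with a common method is HTTP.
        return packet.startswith((b"GET ", b"POST ", b"HEAD "))


class MemoryUnit:                  # unit 05
    def __init__(self):
        self.rules = frozenset()


class RuleUnit:                    # unit 03
    def __init__(self, memory: MemoryUnit):
        self.memory = memory

    def load(self, user_url_list) -> None:
        """Generate the rules from the user-defined list and load them."""
        self.memory.rules = frozenset(user_url_list)


class MatchingUnit:                # unit 04
    def __init__(self, memory: MemoryUnit):
        self.memory = memory

    def match(self, url: str) -> bool:
        return url in self.memory.rules


class ScanningUnit:                # unit 01
    def __init__(self, identifier: IdentificationUnit, matcher: MatchingUnit):
        self.identifier = identifier
        self.matcher = matcher

    def handle(self, packet: bytes, url: str) -> str:
        """Release non-HTTP packets; filter HTTP packets whose URL matches."""
        if not self.identifier.is_http(packet):
            return "release"
        return "filter" if self.matcher.match(url) else "release"


memory = MemoryUnit()
RuleUnit(memory).load(["bad.example/"])
scanner = ScanningUnit(IdentificationUnit(), MatchingUnit(memory))
print(scanner.handle(b"GET / HTTP/1.1\r\n", "bad.example/"))
```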
图 3为本发明的网关原理框图, 如图 3所示, 包括图 2所示的 URL过 滤系统, URL过滤系统包括扫描单元 01、 识别单元 02、 规则单元 03、 匹 配单元 04、 以及内存单元 05, 各单元功能参见上述对图 2的描述, 此处不 再复述。  3 is a schematic block diagram of a gateway according to the present invention. As shown in FIG. 3, the URL filtering system shown in FIG. 2 is included. The URL filtering system includes a scanning unit 01, an identification unit 02, a rule unit 03, a matching unit 04, and a memory unit 05. For the function of each unit, refer to the description of Figure 2 above, and it will not be repeated here.
The above description shows and describes preferred embodiments of the present invention. As stated above, however, it should be understood that the invention is not limited to the forms disclosed herein. These forms should not be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications, and environments, and may be modified within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims

1. A method for filtering uniform resource locators (URLs), wherein the method comprises: generating, from a user-defined URL list, a URL rule file recognizable by a URL filtering system, and loading the URL rule file into memory;
when the system receives a message, scanning the message and determining whether it is a Hypertext Transfer Protocol (HTTP) message and, when it is determined to be an HTTP message, scanning the URL information in the HTTP message and matching it against the URL information in the URL rule file in memory;
allowing or not allowing the HTTP message to pass according to the matching result.
2. The method according to claim 1, wherein, after the URL rule file is loaded into memory, the method further comprises:
determining whether the user-defined URL list has changed and, when it has, regenerating the system-recognizable URL rule file from the changed user-defined URL list and loading the newly generated URL rule file into memory;
after the loading succeeds, the system performing URL information matching using the new URL rule file in memory and deleting the old URL rule file from memory.
3. The method according to claim 1, wherein the method further comprises: when the system determines that the received message is not an HTTP message, directly allowing the message to pass.
4. The method according to claim 2, wherein the user-defined URL list is a blacklist or a whitelist.
5. The method according to any one of claims 1 to 4, wherein allowing or not allowing the HTTP message to pass according to the matching result comprises:
when the user-defined URL list is a blacklist, not allowing the HTTP message to pass if the URL information in the received HTTP message matches the URL information in the URL rule file in memory, and allowing the HTTP message to pass if the match fails; when the user-defined URL list is a whitelist, allowing the HTTP message to pass if the URL information in the received HTTP message matches the URL information in the URL rule file in memory, and not allowing the HTTP message to pass if the match fails.
6. A URL filtering system, comprising an identification unit and a memory unit, wherein the system further comprises a rule unit, a scanning unit, and a matching unit; wherein
the identification unit is configured to identify whether a received message is an HTTP message and to send the identification result to the scanning unit;
the rule unit is configured to generate, from a user-defined URL list, a URL rule file recognizable by the system, and to load the URL rule file into the memory unit;
the scanning unit is configured to scan the received message and send it to the identification unit; when the identification result returned by the identification unit indicates that the received message is an HTTP message, to scan the URL information in the HTTP message and send the URL information to the matching unit; and to allow or not allow the HTTP message to pass according to the matching result returned by the matching unit;
the matching unit is configured to match the URL information in the HTTP message against the URL information in the URL rule file in the memory unit and to send the matching result to the scanning unit.
7. The system according to claim 6, wherein
the rule unit is further configured to determine whether the user-defined URL list has changed and, when it has, to regenerate the system-recognizable URL rule file from the changed user-defined URL list, load the newly generated URL rule file into the memory unit, and, after the loading succeeds, notify the matching unit to use the new URL rule file for URL information matching.
8. The system according to claim 7, wherein the matching unit is further configured to, after receiving the notification from the rule unit, perform URL information matching using the new URL rule file and delete the old URL rule file from the memory unit.
9. The system according to any one of claims 6 to 8, wherein the scanning unit is further configured to directly allow the message to pass when the identification result returned by the identification unit indicates that the received message is not an HTTP message.
10. A gateway, wherein the gateway comprises the URL filtering system according to any one of claims 6 to 9.
PCT/CN2011/080608 2011-05-11 2011-10-10 Ulr filtering system, method and gateway WO2012151843A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2011101213726A CN102780681A (en) 2011-05-11 2011-05-11 URL (Uniform Resource Locator) filtering system and URL filtering method
CN201110121372.6 2011-05-11

Publications (1)

Publication Number Publication Date
WO2012151843A1 true WO2012151843A1 (en) 2012-11-15

Family

ID=47125437

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080608 WO2012151843A1 (en) 2011-05-11 2011-10-10 Ulr filtering system, method and gateway

Country Status (2)

Country Link
CN (1) CN102780681A (en)
WO (1) WO2012151843A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103354546A (en) * 2013-06-25 2013-10-16 亿赞普(北京)科技有限公司 Message filtering method and message filtering apparatus
CN103401850A (en) * 2013-07-19 2013-11-20 北京星网锐捷网络技术有限公司 Message filtering method and device
CN103560995A (en) * 2013-09-25 2014-02-05 深圳市共进电子股份有限公司 URL filtering method for realizing IPv4 and IPv6 at the same time
CN105302815B (en) * 2014-06-23 2019-06-07 腾讯科技(深圳)有限公司 The filter method and device of the uniform resource position mark URL of webpage
CN105938472A (en) * 2015-08-26 2016-09-14 杭州迪普科技有限公司 Web access control method and device
CN106657201B (en) * 2015-11-03 2021-08-24 中兴通讯股份有限公司 Data processing method and device of GSLB (generalized Global System for Mobile communications) scheduling system
CN106970917B (en) * 2016-01-13 2019-11-19 中国科学院声学研究所 A kind of foundation of the Hash table of blacklist URL and the lookup method of request URL
CN107404392A (en) * 2016-05-20 2017-11-28 中兴通讯股份有限公司 The processing method and processing device of the scheduling rule of uniform resource position mark URL
CN109547421A (en) * 2018-11-08 2019-03-29 锐捷网络股份有限公司 A kind of method and device for the URL that audits

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080209057A1 (en) * 2006-09-28 2008-08-28 Paul Martini System and Method for Improved Internet Content Filtering
CN101795272A (en) * 2010-01-22 2010-08-04 联想网御科技(北京)有限公司 Illegal website filtering method and device
CN102004770A (en) * 2010-11-16 2011-04-06 杭州迪普科技有限公司 Webpage auditing method and device
CN102004789A (en) * 2010-12-07 2011-04-06 苏州迈科网络安全技术股份有限公司 Application method of uniform/universal resource locator (URL) filter system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083240A1 (en) * 2007-09-24 2009-03-26 Microsoft Corporation Authorization agnostic based mechanism

Also Published As

Publication number Publication date
CN102780681A (en) 2012-11-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11865201

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11865201

Country of ref document: EP

Kind code of ref document: A1